Adding New Syntaxes to Hakyll

Configuring Hakyll's Pandoc Compiler to Support New Language Syntaxes

August 16, 2021

If you’re here, you’re probably trying to configure Hakyll to support additional languages for syntax highlighting. Or you’re “not satisfied with the built-in highlighting”.

If your journey here is similar to mine, you’ve probably stumbled on a post somewhere on the internet telling you that all you need to do is take KDE’s syntax file for your language and pass it to Pandoc on the command line with the --syntax-definition flag. And voilà! The new language is ready!

The reality is not as easy as that. We are using Hakyll, which uses Pandoc as a library under the hood to convert Markdown to HTML, so we do not have command line access to Pandoc. However, Hakyll does expose the Pandoc settings to us. We can ask Hakyll for its Pandoc settings and update them with our own preferences. For the purpose of this post we’ll walk through how to set up a new language for syntax highlighting, but the process is similar for configuring other Pandoc options for Hakyll.

Before we get started, I want to note that the code examples below use packages from Stackage’s LTS 16.31 release, which means GHC 8.8.4, Hakyll 4.13.4.0, Pandoc 2.9.2.1, and Skylighting 0.8.5. If you’re using older or newer versions, the actual code required may be different, but the idea should be similar. “Follow the types” and all will fall into place.

Identify the Pandoc Compiler and Pandoc Options

The first step is to identify where the Pandoc compiler is being applied. The default project makes use of pandocCompiler. Hakyll, however, provides additional functions for setting Pandoc options for the compiler.

Those functions are: pandocCompilerWith, pandocCompilerWithTransform, and pandocCompilerWithTransformM. For details, check out the Hakyll.Web.Pandoc module.

The function we’re interested in is pandocCompilerWith. It takes two parameters. The first is ReaderOptions from Text.Pandoc.Options, which contains options that act on the raw input as it is parsed into the Pandoc AST. The second is WriterOptions, which contains settings that act on the Pandoc AST and affect the final output.

Both ReaderOptions and WriterOptions are record types whose fields can be read and updated. The setting we want is the writerSyntaxMap field of WriterOptions. The next question is, where can we find the default WriterOptions used by Hakyll?

Hakyll exposes defaultHakyllReaderOptions and defaultHakyllWriterOptions with Hakyll’s defaults. We can reference these directly and use them when we use pandocCompilerWith.

An equivalent of pandocCompiler would be:

myPandocCompiler = pandocCompilerWith defaultHakyllReaderOptions defaultHakyllWriterOptions

With this knowledge we can now query for Hakyll’s default syntax map:

defaultHakyllSyntaxMap = writerSyntaxMap defaultHakyllWriterOptions

At this point we can build a custom Pandoc compiler and query Hakyll’s default syntax map so we can augment it. Next we need to find a way to load additional syntax definition files. Pandoc uses the Skylighting library to do its syntax highlighting. Skylighting takes care of loading and parsing syntax definitions and outputting styled code.

Loading KDE XML Syntax

In the above snippet, defaultHakyllSyntaxMap is of type SyntaxMap, which comes from the Skylighting.Types module. Skylighting also provides functions to load XML syntax files: Skylighting.Loader contains loadSyntaxFromFile and loadSyntaxesFromDir, which load a single file and a whole directory respectively.

When we make use of these functions we need to do a bit of unwrapping, because they return types like IO (Either String Syntax) for a single file or IO (Either String SyntaxMap) for a directory.

import           Data.Either (fromRight)
import           System.IO.Unsafe (unsafePerformIO)
import           Skylighting.Loader (loadSyntaxesFromDir)
import           Skylighting.Syntax (defaultSyntaxMap)
import           Skylighting.Types (SyntaxMap)

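-- Loading is an IO action that may fail with an error message.
ioResult :: IO (Either String SyntaxMap)
ioResult = loadSyntaxesFromDir "syntax"

-- unsafePerformIO escapes IO: the directory is read lazily,
-- whenever this value is first demanded.
eitherResult :: Either String SyntaxMap
eitherResult = unsafePerformIO ioResult

-- Fall back to Skylighting's default map if loading failed.
loadedSyntaxMap :: SyntaxMap
loadedSyntaxMap = fromRight defaultSyntaxMap eitherResult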

At this point we’ve successfully loaded a new SyntaxMap using Skylighting’s functions and are ready to make use of it.
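Because fromRight quietly falls back to the default map, a broken XML file can go unnoticed. As a debugging aid, here is a minimal sketch that prints either the loader’s error or the keys of the loaded map, which should be the names of the loaded syntaxes. The helper name describeLoad is my own, not part of Skylighting, and the sketch assumes the same "syntax" directory as above plus the containers and text packages.

```haskell
import qualified Data.Map as Map
import qualified Data.Text as T
import           Skylighting.Loader (loadSyntaxesFromDir)
import           Skylighting.Types (SyntaxMap)

-- | Hypothetical helper: turn the loader's result into printable lines,
-- either the error message or the key of every loaded syntax.
describeLoad :: Either String SyntaxMap -> [String]
describeLoad (Left err)        = ["Failed to load syntaxes: " ++ err]
describeLoad (Right syntaxMap) = map T.unpack (Map.keys syntaxMap)

main :: IO ()
main = do
    -- "syntax" is the same directory used in the snippet above.
    result <- loadSyntaxesFromDir "syntax"
    mapM_ putStrLn (describeLoad result)
```

If the language you added does not show up in that list, the problem is in loading the file rather than in the Hakyll configuration.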

We can join this new SyntaxMap with the default SyntaxMap Hakyll knows about:

updatedSyntaxMap :: SyntaxMap
updatedSyntaxMap =
    -- Named hakyllSyntaxMap to avoid shadowing Skylighting's defaultSyntaxMap.
    let hakyllSyntaxMap = writerSyntaxMap defaultHakyllWriterOptions
    in hakyllSyntaxMap `mappend` loadedSyntaxMap
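One detail worth knowing about this join: SyntaxMap is a Data.Map, and mappend on maps is a left-biased union. When both sides contain the same key, the entry from the left operand is kept. The sketch below illustrates this with plain string maps standing in for SyntaxMap; the map contents are made up for the example.

```haskell
import qualified Data.Map as Map

-- Stand-ins for SyntaxMap: language name -> where the definition came from.
hakyllDefaults :: Map.Map String String
hakyllDefaults = Map.fromList [("haskell", "built-in"), ("python", "built-in")]

loadedFromDisk :: Map.Map String String
loadedFromDisk = Map.fromList [("haskell", "custom"), ("elm", "custom")]

-- Defaults on the left: the built-in definition wins on a name clash.
defaultsFirst :: Map.Map String String
defaultsFirst = hakyllDefaults `mappend` loadedFromDisk

-- Loaded map on the left: the custom definition wins instead.
loadedFirst :: Map.Map String String
loadedFirst = loadedFromDisk `mappend` hakyllDefaults

main :: IO ()
main = do
    print (Map.lookup "haskell" defaultsFirst)  -- Just "built-in"
    print (Map.lookup "haskell" loadedFirst)    -- Just "custom"
    print (Map.keys defaultsFirst)              -- ["elm","haskell","python"]
```

So the order used in this post is fine for adding a brand new language, but if the goal is to replace a built-in definition, the loaded map probably needs to go on the left.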

We can then create an updated WriterOptions based on the Hakyll defaults.

pandocWriterOptions :: WriterOptions
pandocWriterOptions = defaultHakyllWriterOptions { 
                          writerSyntaxMap = updatedSyntaxMap
                      }
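The same record-update pattern should carry over to other Pandoc options. As a hedged sketch, here is how one might enable an extra reader extension on top of Hakyll’s defaults. Ext_emoji is only an illustrative choice, and the names emojiReaderOptions and emojiPandocCompiler are mine.

```haskell
import           Hakyll
import           Text.Pandoc.Options

-- | Hakyll's default reader options with one extra extension enabled.
-- 'Ext_emoji' is an arbitrary example; any 'Extension' works the same way.
emojiReaderOptions :: ReaderOptions
emojiReaderOptions = defaultHakyllReaderOptions
    { readerExtensions =
        enableExtension Ext_emoji (readerExtensions defaultHakyllReaderOptions)
    }

-- | A compiler that uses the modified reader options and Hakyll's
-- default writer options.
emojiPandocCompiler :: Compiler (Item String)
emojiPandocCompiler =
    pandocCompilerWith emojiReaderOptions defaultHakyllWriterOptions
```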

All Together

Now that we know the constituent parts of configuring a Pandoc compiler with our custom settings, we can bring it all together.

import           Data.Monoid (mappend)
import           Data.Either (fromRight)
import           System.IO.Unsafe (unsafePerformIO)

import           Hakyll
import           Text.Pandoc.Options
import           Skylighting.Loader (loadSyntaxesFromDir)
import           Skylighting.Syntax (defaultSyntaxMap)
import           Skylighting.Types (SyntaxMap)

-- | 'loadedSyntaxMap' loads the XML syntax definitions from the
-- "syntax" directory, or falls back to Skylighting's default
-- syntax map if the load operation fails.
loadedSyntaxMap :: SyntaxMap
loadedSyntaxMap =
    let ioResult = loadSyntaxesFromDir "syntax"
        eitherResult = unsafePerformIO ioResult
    in fromRight defaultSyntaxMap eitherResult

-- | 'updatedSyntaxMap' extracts the default Hakyll syntax map
-- and appends 'loadedSyntaxMap' to include the new syntaxes.
updatedSyntaxMap :: SyntaxMap
updatedSyntaxMap =
    let hakyllSyntaxMap = writerSyntaxMap defaultHakyllWriterOptions
    in hakyllSyntaxMap `mappend` loadedSyntaxMap

-- | 'pandocWriterOptions' updates Hakyll's default WriterOptions
-- with the new writerSyntaxMap.
pandocWriterOptions :: WriterOptions
pandocWriterOptions = defaultHakyllWriterOptions { 
                          writerSyntaxMap = updatedSyntaxMap
                      }

-- | 'customPandocCompiler' is the Pandoc compiler with the
-- custom loaded syntaxes and settings.
customPandocCompiler :: Compiler (Item String)
customPandocCompiler = pandocCompilerWith defaultHakyllReaderOptions pandocWriterOptions

And finally, we can use the new customPandocCompiler in the compile step for posts. This will replace what would have been the default pandocCompiler.

main :: IO ()
main = hakyll $ do
    match "posts/*" $ do
        route $ setExtension "html"
        -- postCtx is the post context defined in Hakyll's default site template.
        compile $ customPandocCompiler
            >>= loadAndApplyTemplate "templates/post.html"    postCtx
            >>= loadAndApplyTemplate "templates/default.html" postCtx
            >>= relativizeUrls
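If unsafePerformIO feels uncomfortable, one possible alternative (not the approach taken above) is to load the syntax definitions in main, before handing control to Hakyll, and to stop the build when loading fails. This is only a sketch: writerOptionsWith is a hypothetical helper, and it uses defaultContext and a single template to stay self-contained.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import           Hakyll
import           Skylighting.Loader (loadSyntaxesFromDir)
import           Skylighting.Types (SyntaxMap)
import           Text.Pandoc.Options (WriterOptions (..))

-- | Hypothetical helper: Hakyll's default writer options extended
-- with the given syntaxes.
writerOptionsWith :: SyntaxMap -> WriterOptions
writerOptionsWith extraSyntaxes = defaultHakyllWriterOptions
    { writerSyntaxMap =
        writerSyntaxMap defaultHakyllWriterOptions `mappend` extraSyntaxes
    }

main :: IO ()
main = do
    -- Load the definitions in ordinary IO, before Hakyll starts.
    result <- loadSyntaxesFromDir "syntax"
    extraSyntaxes <- case result of
        Left err        -> fail ("Could not load syntax definitions: " ++ err)
        Right syntaxMap -> return syntaxMap

    let compiler = pandocCompilerWith defaultHakyllReaderOptions
                                      (writerOptionsWith extraSyntaxes)

    hakyll $ do
        match "templates/*" $ compile templateBodyCompiler

        match "posts/*" $ do
            route $ setExtension "html"
            compile $ compiler
                >>= loadAndApplyTemplate "templates/post.html" defaultContext
                >>= relativizeUrls
```

The trade-off is that the compiler can no longer be a top-level definition; it has to be built inside main and passed to the rules that need it.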

Resources