purelyfunctional.org

Calling External Functions from JIT-compiled LLVM Modules using llvm-hs

Posted on April 2, 2018

llvm-hs provides bindings to LLVM’s ORC JIT APIs. These APIs let you JIT-compile LLVM modules and then call functions in those modules from your Haskell code. However, sometimes you want to use external libraries from within your LLVM module either because you want to make use of an existing library or because it might be easier to implement certain parts in other languages (e.g. C) than LLVM IR. Sam Griffin recently raised the question of how you can call functions in external libraries from a JIT-compiled module and while I had a rough idea of how to do this, I had never actually tried it myself. In this post, I present my findings on how you can accomplish this for both static and dynamic libraries.

We start with a very simple C file lib.c that defines a function called external_function which returns twice its argument. This is the function that we will attempt to call from our LLVM module.

#include <stdint.h>

int32_t external_function(int32_t x) {
    return 2 * x;
}

We can now compile this to an object file using gcc -fPIC -c -o lib.o lib.c. (-fPIC is only necessary when we want to produce a dynamic library but to keep things simple we will use the same object file for building the static and the dynamic library in this post).

The static library can now be created using ar rcs libexternalstatic.a lib.o. The dynamic library can be built using gcc -shared -o libexternaldynamic.so lib.o.

The LLVM module module.ll that we will be using in this post declares external_function and defines a function f which takes no argument and returns the result of applying external_function to 21.

; ModuleID = 'basic'
source_filename = ""

declare i32 @external_function(i32)

define i32 @f() {
entry:
  %0 = call i32 @external_function(i32 21)
  ret i32 %0
}

Now that we have defined the module, we are ready to write the Haskell code to JIT the module and then finally call the f function. For this post, we will declare the module using LLVM’s textual IR and load it using llvm-hs’s withModuleFromLLVMAssembly but building the module using llvm-hs-pure’s AST works as well.

There are two points that you need to pay attention to if your JIT-compiled module references external functions (for both static and dynamic libraries):

Your resolver needs some way to find the symbol. We are going to use getSymbolAddressInProcess for this, which is a function provided by llvm-hs that searches for loaded symbols in the current process.
getSymbolAddressInProcess will only find symbols in libraries that have been loaded before. This is accomplished by calling loadLibraryPermanently before you JIT the module. You can either pass the name of a dynamic library to loadLibraryPermanently or you can pass Nothing (equivalent to dlopen(NULL)) which will load the symbols in the current process including the symbols in shared libraries that the executable is linked against.

This leaves us with the following resolver:

resolver :: IRCompileLayer l -> SymbolResolver
resolver compileLayer =
  SymbolResolver
    (\s -> findSymbol compileLayer s True)
    (\s ->
       fmap
         (\a -> JITSymbol a (JITSymbolFlags False True))
         (getSymbolAddressInProcess s))
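
To get a feeling for what "searching for loaded symbols in the current process" means, here is a minimal sketch that does not use LLVM at all. It assumes the unix package and a dynamically linked libc, and uses dlsym with the default handle as a rough analogue of getSymbolAddressInProcess; the names mkStrlen and strlenViaDlsym are made up for this illustration.

```haskell
{-# LANGUAGE ForeignFunctionInterface #-}

import Foreign.C.String (CString, withCString)
import Foreign.C.Types (CSize)
import Foreign.Ptr (FunPtr)
import System.Posix.DynamicLinker (DL (Default), dlsym)

-- Turn the address of a C function of type size_t(const char *) into a
-- callable Haskell function (hypothetical helper, not part of llvm-hs).
foreign import ccall "dynamic"
  mkStrlen :: FunPtr (CString -> IO CSize) -> CString -> IO CSize

-- Look up strlen among the symbols already loaded into the current
-- process (RTLD_DEFAULT) and call it. dlsym throws an IOError if the
-- symbol cannot be found.
strlenViaDlsym :: String -> IO CSize
strlenViaDlsym str = do
  fp <- dlsym Default "strlen"
  withCString str (mkStrlen fp)
```

A symbol that is not in the dynamic symbol table of the process is not found by this kind of lookup, which is why the linker flags discussed later in this post matter.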

The implementation of main might look slightly complicated at a first glance, so let’s break it down:

We first call the aforementioned loadLibraryPermanently function to make sure that later calls to getSymbolAddressInProcess will find external_function.
Then follows a bit of boilerplate to initialize the LLVM context, load the module and create the ORC linking and compile layers.
We can now add the module to the ORC compile layer using withModule which is a bracket-style wrapper around addModule and removeModule.
Next, we mangle the symbol of the function that we want to call (f in this case) and search for the symbol in the compile layer.
Pattern matching on the resulting JITSymbol gives us back a WordPtr representing the address of f. We use wordPtrToPtr and castPtrToFunPtr to convert the WordPtr to a FunPtr.
Finally, we use a dynamic foreign import to convert the FunPtr to a Haskell function and call the resulting function.

main :: IO ()
main = do
  loadLibraryPermanently Nothing
  withContext $ \ctx ->
    withModuleFromLLVMAssembly ctx (File "module.ll") $ \mod' ->
      withHostTargetMachine $ \tm ->
        withObjectLinkingLayer $ \objectLayer ->
          withIRCompileLayer objectLayer tm $ \compileLayer -> do
            withModule
              compileLayer
              mod'
              (resolver compileLayer) $
              \_ -> do
                mainSymbol <- mangleSymbol compileLayer "f"
                (JITSymbol mainFn _) <- findSymbol compileLayer mainSymbol True
                result <- mkFun (castPtrToFunPtr (wordPtrToPtr mainFn))
                print result
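
The mkFun used in main is the dynamic foreign import mentioned in the last step. The following is a self-contained sketch of that step and the pointer conversions before it. Since there is no JIT here, it assumes a "wrapper" import as a stand-in source of a function address; mkCallback, callAddress and demo are names invented for this example.

```haskell
{-# LANGUAGE ForeignFunctionInterface #-}

import Data.Int (Int32)
import Foreign.Ptr
  ( FunPtr, WordPtr, castFunPtrToPtr, castPtrToFunPtr
  , freeHaskellFunPtr, ptrToWordPtr, wordPtrToPtr )

-- Dynamic import: converts a pointer to a C function `int32_t f(void)`
-- into an IO action. This is the shape that f from module.ll has.
foreign import ccall "dynamic"
  mkFun :: FunPtr (IO Int32) -> IO Int32

-- Wrapper import: used here only to obtain some function address
-- without a JIT. It exposes a Haskell action as a C function pointer.
foreign import ccall "wrapper"
  mkCallback :: IO Int32 -> IO (FunPtr (IO Int32))

-- Call the function living at a raw address, as main does with the
-- address stored in the JITSymbol.
callAddress :: WordPtr -> IO Int32
callAddress addr = mkFun (castPtrToFunPtr (wordPtrToPtr addr))

-- Stand-in for the JIT: produce an address, call through it, clean up.
demo :: IO Int32
demo = do
  fp <- mkCallback (pure 42)
  result <- callAddress (ptrToWordPtr (castFunPtrToPtr fp))
  freeHaskellFunPtr fp
  pure result
```

Note that nothing checks that the address really points to a function of the declared type; getting the type of the dynamic import wrong leads to undefined behavior.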

If you want to use the dynamic library, then all that’s left to do is to add extra-libraries: externaldynamic to the executable section in our cabal file. Depending on where you placed the shared library, you will also have to set extra-lib-dirs to the directory containing the library so that it is found at link time and the LD_LIBRARY_PATH environment variable to make sure it is found when you run the executable.

If you want to use the static library, things are a bit more involved. Just adding externalstatic to extra-libraries will not work since the linker omits unused symbols when linking against static libraries. Since the linker does not know about the reference to external_function in our JIT-compiled module, this symbol will not end up in the binary. To fix this, you need to use -Wl,--whole-archive,-lexternalstatic,--no-whole-archive in the ld-options section of your cabal file. This forces all symbols in the externalstatic library to be included in the final executable even if they are not referenced. We also need to ensure that the symbols end up in the dynamic symbol table since that is what getSymbolAddressInProcess looks at. The corresponding flag in GNU ld is called --export-dynamic but we use GHC’s -rdynamic option here (by adding it to ld-options), which uses --export-dynamic under the hood if you’re using GNU ld (but should also support other linkers). As for shared libraries, you might also need to set extra-lib-dirs to make sure that the library is found at link time. Since we are linking the library statically, there is no need to mess with LD_LIBRARY_PATH.

If you followed the steps thus far, you might have noticed that this still does not quite work: you no longer get symbol resolution errors, but you will get a segfault. Luckily, this can be fixed by changing the relocation model of the target machine to PIC instead of relying on the default set by withHostTargetMachine, which seems to be Static on X86. (I think this has the effect of preventing LLVM from emitting call instructions to immediates but I am not entirely sure why this is necessary. If you do know more about this, I would love to hear from you!) The custom version of withHostTargetMachine that sets the relocation model looks as follows:

withHostTargetMachine :: (TargetMachine -> IO a) -> IO a
withHostTargetMachine f = do
  initializeAllTargets
  triple <- getProcessTargetTriple
  cpu <- getHostCPUName
  features <- getHostCPUFeatures
  (target, _) <- lookupTarget Nothing triple
  withTargetOptions $ \options ->
    withTargetMachine target triple cpu features options Reloc.PIC CodeModel.Default CodeGenOpt.Default f

Conclusion

While calling functions in external libraries from a JIT-compiled module is not particularly complicated, finding all the correct linker flags can be a bit tricky, especially if you are not too familiar with linkers (which certainly applies to myself :)). Hopefully, this post can serve as a reference and spare others from having to go through the same trial and error process that I went through. You can find the full code mentioned in this blogpost on github. Note that I only tested this on Linux (specifically 64-bit Arch Linux); the linker flags might be slightly different on other systems.

MonadFix and the Lazy and Strict State Monad

Posted on March 4, 2018

In this post, I will assume rudimentary familiarity with the different Monad instances of the lazy and strict state monad and MonadFix. If you are not familiar with these concepts or want to brush up your knowledge, I recommend Kwang Yul Seo’s post on the lazy and strict state monad and Will Fancher’s post on MonadFix.

Recently, llvm-hs-pure got a new API for building modules called IRBuilder, which makes this process significantly more convenient by taking care of a lot of the necessary bookkeeping. In particular, the API is built upon a state monad that tracks variables and creates fresh variables as necessary, allows the use of monadic binds to refer to operators, and more. In the context of LLVM, references to variables or blocks often end up being circular, e.g., the branch instructions in the basic blocks of a loop form a cycle referencing each other. While monadic binds can’t be recursive by default, MonadFix and the RecursiveDo extension lift this restriction and thereby allow for a very convenient API even in the presence of recursive definitions. For a more detailed blogpost on a very similar API, I recommend Lewis’ post on the ASM monad.

Recursive functions are another case where references end up being circular and thereby require MonadFix. Sadly, this use case was completely broken in llvm-hs-pure, as Pavol Klacansky noticed in a bug report: all attempts to build modules this way led to an infinite loop and GHC’s infamous <<loop>> exception. After investigating this problem, I figured out that replacing the strict state monad by the lazy state monad solved the problem and led to the expected behavior instead of an infinite loop. In the following, I’m going to present a simplified version of the problem and explain why the two versions differ.

We’ll start out by defining a very simple type representing the instructions in our program. For this example, we only need two instructions:

A Dummy instruction and
a Reference instruction that refers to the result of another instruction by its name.

data Instr
  = Reference String
  | Dummy
  deriving Show

We can now define the Builder monad which is used to build the list of instructions. Builder is just a type synonym for a State monad with the state being a list of (String, Instr) pairs. We’ll also define a runBuilder function that runs a builder with an initial state consisting of an empty list of instructions and returns the final list.

type Builder a = State [(String, Instr)] a

runBuilder :: Builder a -> [(String, Instr)]
runBuilder a = execState a []

Emitting an instruction appends it to the list of instructions and returns the name of the instruction. We also define two convenience wrappers for emitting Dummy and Reference instructions.

emitInstr :: (String, Instr) -> Builder String
emitInstr (n, i) = do
  modify (\instrs -> instrs ++ [(n, i)])
  pure n

dummy :: String -> Builder String
dummy n = emitInstr (n, Dummy)

reference :: String -> String -> Builder String
reference n ref = do
  let instr = Reference ref
  emitInstr (n, instr)

Finally, we can define a very simple example program consisting of a Reference instruction and a Dummy instruction with the Reference instruction referencing the Dummy instruction which is defined later (that is why we need MonadFix and RecursiveDo here).

example :: Builder ()
example = mdo
  ref <- reference "ref" foo
  foo <- dummy "foo"
  pure ()

You can use the following definition for main to test this example.

main :: IO ()
main = print (runBuilder example)

This example will work with both the lazy and the strict state monad. However, if we change the definition of reference as shown below, running the example will result in an infinite loop.

reference :: String -> String -> Builder String
reference n ref = do
  let instr = Reference ref
  case ref of
    !a -> emitInstr (n, instr)

Introducing the strict pattern match here might seem silly and in this isolated example it definitely is. However, in general it is definitely possible that the way an instruction is emitted depends on the reference and thereby requires a pattern match. In llvm-hs, the call instruction checks if the callee has a void return type which resulted in the issue mentioned above. To better understand why the strict and the lazy monad behave differently here, I am going to substitute the Monad and MonadFix instances and inline the definitions.

Let us start by removing the use of mdo and replace it by an explicit use of mfix.

example = do
  mfix $ \foo -> do
    ref <- reference "ref" foo
    foo' <- dummy "foo"
    pure foo'
  pure ()

Next, we can substitute the definition of mfix. Since State s a in transformers is defined as StateT s Identity a, the definition can look a bit complicated. For this post, we are going to assume that State has not been defined as a transformer and provide definitions for this simplified version of State. You can see the recursion in mfix by a occurring both on the left and on the right of =.

newtype State s a = State { runState :: s -> (a, s) }
mfix f = State (\s -> let (a, s') = runState (f a) s in (a, s'))

In the next step, we inline this definition of mfix.

example :: State [(String, Instr)] ()
example = do
  State $ \s ->
    let (foo, s') =
          runState (do ref <- reference "ref" foo
                       foo' <- dummy "foo"
                       pure foo'
                   )
                   s
    in (foo, s')
  pure ()

Finally, we desugar do notation and inline reference, dummy and runState.

example :: State [(String, Instr)] ()
example = do
  State $ \s ->
    let (foo, s') =
          let (ref, s'') = 
                case foo of !a -> ("ref", s ++ [("ref", Reference foo)])
              (foo', s''') = ("foo", s'' ++ [("foo", Dummy)])
          in (foo', s''')
    in (foo, s')
  pure ()

The above definition uses the bind implementation of the lazy state monad. For the strict state monad, we need to change the let statement to be strict in the tuple (note that pattern matches in let statements are lazy by default):

let !(ref, s'') = 
      case foo of !a -> ("ref", s ++ [("ref", Reference foo)])
    (foo', s''') = ("foo", s'' ++ [("foo", Dummy)])
in (foo', s''')

At this point, the difference becomes clear: For the strict state monad, forcing (foo, s') forces (ref, s'') which in turn ends up forcing foo which has not yet been computed so we run into an infinite loop. For the lazy state monad, the evaluation of the (ref, s'') tuple and thereby also the case statement on foo is lazy and thus we can first evaluate that foo = "foo" before evaluating the case statement and avoid the infinite loop.
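
The lazy case can be checked with a self-contained sketch that hand-rolls the simplified State from above instead of using transformers. The Functor, Applicative and Monad instances and the modify helper are my own minimal definitions for this illustration, and seq plays the role of the strict pattern match in reference.

```haskell
{-# LANGUAGE RecursiveDo #-}

import Control.Monad.Fix (MonadFix (mfix))

-- Simplified State, not defined as a transformer.
newtype State s a = State { runState :: s -> (a, s) }

instance Functor (State s) where
  fmap f m = State $ \s -> let (a, s') = runState m s in (f a, s')

instance Applicative (State s) where
  pure a = State $ \s -> (a, s)
  mf <*> ma = State $ \s ->
    let (f, s')  = runState mf s
        (a, s'') = runState ma s'
    in (f a, s'')

instance Monad (State s) where
  -- The let pattern is lazy: this is the bind of the lazy state monad.
  m >>= k = State $ \s ->
    let (a, s') = runState m s
    in runState (k a) s'

instance MonadFix (State s) where
  mfix f = State $ \s -> let (a, s') = runState (f a) s in (a, s')

modify :: (s -> s) -> State s ()
modify f = State $ \s -> ((), f s)

data Instr = Reference String | Dummy
  deriving (Show, Eq)

type Builder a = State [(String, Instr)] a

emitInstr :: (String, Instr) -> Builder String
emitInstr (n, i) = modify (++ [(n, i)]) >> pure n

dummy :: String -> Builder String
dummy n = emitInstr (n, Dummy)

-- Strict in ref, like the version with the bang pattern.
reference :: String -> String -> Builder String
reference n ref = ref `seq` emitInstr (n, Reference ref)

example :: Builder ()
example = mdo
  _ <- reference "ref" foo
  foo <- dummy "foo"
  pure ()

-- Final instruction list, starting from an empty state.
built :: [(String, Instr)]
built = snd (runState example [])
```

Evaluating built yields the Reference instruction followed by the Dummy instruction. If the let in >>= is replaced by a case expression on the tuple, which corresponds to the strict state monad, I would expect the same program to loop, for the reason given above.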

Conclusion

When asked what the lazy state monad is for, the most common response is infinite states, as demonstrated by Kwang in the post mentioned at the beginning of this post. In this article, we have seen a different use case in combination with MonadFix, where monadic actions depend on recursive bindings and the lazy state monad prevents an infinite loop.

Haskell bindings for template-heavy C++ code

Posted on May 30, 2017

This post describes a technique for writing Haskell bindings (similar tricks apply to other languages) to template-heavy C++ code when the template instantiations that should be exposed are not statically known. I am going to assume some rudimentary knowledge of C++ templates and the Haskell C FFI.

I originally faced this problem when trying to make the bindings to ORC JIT in llvm-hs more flexible, so the examples used in this post will be based on the API of ORC JIT. However, the solution is not tied to ORC JIT or LLVM and can be applied when writing bindings to other libraries. The ORC JIT API is composed of various compile layers which are responsible for compiling LLVM modules to object files (in the examples below, the method responsible for this is called compileModule). There are base layers which just compile modules directly but, more importantly for this post, there are layers that wrap other layers and apply some sort of transformation before passing the modified module to the underlying layer. Ignoring all the irrelevant details, we can imagine that the C++ API for this looks as follows:

#include <functional>
#include <utility>

class Module;
class Object;

// The base layer which compiles a module directly to object code. The details
// of how this is done are irrelevant for this post.
class BaseLayer {
  public:
    Object *compileModule(Module *module);
};

// A transform layer which first applies a function transforming the module
// before handing off compilation to the underlying base layer.
template <typename BaseLayerT> class TransformLayer {
  public:
    TransformLayer(BaseLayerT &baseLayer,
                   std::function<Module *(Module *)> transform)
        : baseLayer(baseLayer), transform(std::move(transform)) {}
    Object *compileModule(Module *module) {
        Module *transformedModule = transform(module);
        return baseLayer.compileModule(transformedModule);
    }
    BaseLayerT &baseLayer;
    std::function<Module *(Module *)> transform;
};

Being able to compose layers is great since it gives users a lot of flexibility in how they want to build their JIT. However, it makes providing Haskell bindings for that API tricky. Let’s first consider what Haskell API we would like to end up with. It should expose the same flexibility available in the C++ interface. In particular, users should be able to choose which layers they want to use and how they should be composed. A first attempt at the low-level API might look as follows:

import Foreign.Ptr

data Object
data Module
data BaseLayer
data TransformLayer baseLayer

newBaseLayer :: IO (Ptr BaseLayer)
newTransformLayer :: Ptr a -> FunPtr (Ptr Module -> IO (Ptr Module)) -> IO (Ptr (TransformLayer a))
compileModule :: Ptr a -> Ptr Module -> IO (Ptr Object)

You might have noticed that we are being too polymorphic here: users shouldn’t be able to pass pointers to arbitrary types as compile layers. We will come back to that later.

Haskell does not support directly interfacing with C++, so we are going to need to write a C wrapper for the C++ API. But we cannot write a wrapper for newTransformLayer. C does not really have a concept of polymorphism, so we can’t write a function that accepts an arbitrary layer. You might be tempted to just accept a void* and cast it and hope for the best, but even that will not work since calling the C++ constructor of TransformLayer requires statically knowing the type of the base layer. Another non-solution would be to write a separate wrapper for newTransformLayer for each type of base layer, since this contradicts our goal of exposing the full flexibility present in the C++ API.

Before explaining the solution, let’s step back for a moment and take a look at the situation at hand: What’s causing problems here is the fact that C++ templates are a form of static polymorphism and we cannot expose that via the C API. However, we can expose dynamic polymorphism, i.e., virtual dispatch. So if LLVM would just have a CompileLayer base class that TransformLayer and BaseLayer inherit from, all would be fine. So since LLVM does not provide this base class, let’s just write it ourselves!

class CompileLayer {
  public:
    virtual Object *compileModule(Module *module) = 0;
};

But BaseLayer and TransformLayer do not inherit from this new class. So we are going to create a new class that wraps an arbitrary compile layer, inherits from CompileLayer and hands off the actual compilation to the wrapped layer.

template <typename T> class CompileLayerT : public CompileLayer {
  public:
    CompileLayerT(T layer) : layer(std::move(layer)) {}
    Object *compileModule(Module *module) override {
        return layer.compileModule(module);
    }
    T layer;
};

Now that we have the necessary machinery, we can write the non-polymorphic C wrappers which we will use via the Haskell C FFI. These wrappers instantiate the templates only for CompileLayer. Since we can wrap the other layers in CompileLayerT and upcast them to CompileLayer we have not lost any flexibility.

extern "C" {
CompileLayer *newBaseLayer() {
    return new CompileLayerT<BaseLayer>(BaseLayer());
}
CompileLayer *newTransformLayer(CompileLayer *baseLayer,
                                Module *(*transform)(Module *)) {
    return new CompileLayerT<TransformLayer<CompileLayer>>(
        TransformLayer<CompileLayer>(*baseLayer, transform));
}
Object *compileModule(CompileLayer *layer, Module *module) {
    return layer->compileModule(module);
}
}

Finally, we are ready to get back to the Haskell code. Writing the bindings to the 3 C functions that we just defined is easy.

{-# LANGUAGE ForeignFunctionInterface #-}
import Foreign.Ptr

data Object
data Module
data CompileLayer

foreign import ccall newBaseLayer ::
  IO (Ptr CompileLayer)
foreign import ccall newTransformLayer ::
  Ptr CompileLayer -> FunPtr (Ptr Module -> IO (Ptr Module)) -> IO (Ptr CompileLayer)
foreign import ccall compileModule ::
  Ptr CompileLayer -> Ptr Module -> IO (Ptr Object)

However, you have probably noticed that we have lost the separate types for BaseLayer and TransformLayer. This is fine for the FFI imports but we don’t want to present that API to the user. So we wrap the above in a nicer Haskell API: We use a typeclass to represent types which can be converted to a Ptr CompileLayer and add newtypes for BaseLayer and TransformLayer. TransformLayer has a phantom type parameter representing the base layer and our wrapper for newTransformLayer ensures that it is correctly instantiated.

foreign import ccall newBaseLayer ::
  IO (Ptr CompileLayer)
foreign import ccall newTransformLayer ::
  Ptr CompileLayer -> FunPtr (Ptr Module -> IO (Ptr Module)) -> IO (Ptr CompileLayer)
foreign import ccall compileModule ::
  Ptr CompileLayer -> Ptr Module -> IO (Ptr Object)

newtype BaseLayer = BaseLayer (Ptr CompileLayer)
newtype TransformLayer baseLayer = TransformLayer (Ptr CompileLayer)

class IsCompileLayer l where
  getCompileLayer :: l -> Ptr CompileLayer

instance IsCompileLayer BaseLayer where
  getCompileLayer (BaseLayer l) = l

instance IsCompileLayer (TransformLayer l) where
  getCompileLayer (TransformLayer l) = l

newBaseLayer' :: IO BaseLayer
newBaseLayer' = BaseLayer <$> newBaseLayer

newTransformLayer' :: IsCompileLayer l => l -> FunPtr (Ptr Module -> IO (Ptr Module)) -> IO (TransformLayer l)
newTransformLayer' baseLayer transform =
  TransformLayer <$> newTransformLayer (getCompileLayer baseLayer) transform

compileModule' :: IsCompileLayer l => l -> Ptr Module -> IO (Ptr Object)
compileModule' layer module' =
  compileModule (getCompileLayer layer) module'
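
To see what the phantom type parameter buys us without building any C++ code, here is a small pure mock of the same design. It assumes a made-up ErasedLayer type in place of Ptr CompileLayer that merely records which layers were stacked; layerStack and stacked are likewise names invented for this sketch.

```haskell
-- Hypothetical stand-in for Ptr CompileLayer: records the stacked
-- layers, outermost first, so the example runs without any C++ code.
newtype ErasedLayer = ErasedLayer [String]

newtype BaseLayer = BaseLayer ErasedLayer
newtype TransformLayer baseLayer = TransformLayer ErasedLayer

class IsCompileLayer l where
  getCompileLayer :: l -> ErasedLayer

instance IsCompileLayer BaseLayer where
  getCompileLayer (BaseLayer l) = l

instance IsCompileLayer (TransformLayer l) where
  getCompileLayer (TransformLayer l) = l

newBaseLayer' :: BaseLayer
newBaseLayer' = BaseLayer (ErasedLayer ["base"])

-- The result type remembers the type of the wrapped layer even though
-- the underlying representation is the same erased value.
newTransformLayer' :: IsCompileLayer l => l -> TransformLayer l
newTransformLayer' base =
  let ErasedLayer layers = getCompileLayer base
  in TransformLayer (ErasedLayer ("transform" : layers))

layerStack :: IsCompileLayer l => l -> [String]
layerStack l = let ErasedLayer layers = getCompileLayer l in layers

-- The composition is visible in the type.
stacked :: TransformLayer (TransformLayer BaseLayer)
stacked = newTransformLayer' (newTransformLayer' newBaseLayer')
```

Giving stacked the type TransformLayer BaseLayer instead would be rejected by the type checker, while something that is not an instance of IsCompileLayer cannot be passed as a base layer at all.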

Caveats

I’ve only shown constructors to this API. Usually, you also want to add destructors which free the allocated layers. Otherwise, you are never going to deallocate the memory which leads to a memory leak. Luckily, you can mark the destructor of CompileLayer as virtual and then use the same C wrapper for all layers.
Turning static polymorphism into dynamic polymorphism does incur a slight performance cost. In this case, this is probably irrelevant but if you are wrapping a template function that wraps “small” types, e.g., a function that accepts different types of integers and performs cheap operations on them it might matter.

Dynamic loading of Haskell modules

Posted on May 20, 2016

Even though I don’t have any particularly compelling use case for dynamic loading of Haskell modules, it is something that I’ve been wanting to do for quite some time. Sadly, I have never been able to produce anything but crashes so far. There is the plugins package but I have not gotten that to work either. The question seems to come up from time to time, e.g. on reddit, but I have not seen an example that works so far. This morning I decided to give it another shot and finally managed to get it to work!

Let us take the following module as an example:

module Plugin(f) where
f :: String
f = "Monads are just monoids in the category of endofunctors, what’s the problem?"

We want to load the module in our main executable and print the string f. The code is surprisingly simple and pretty much the same that is also used in plugins and similar to the code used in GHCi. We first need a function to create the ELF symbol name in our executable from the package, module and Haskell symbol name.

mangleSymbol :: Maybe String -> String -> String -> String
mangleSymbol pkg module' valsym =
  prefixUnderscore ++
  maybe "" (\p -> zEncodeString p ++ "_") pkg ++
  zEncodeString module' ++ "_" ++ zEncodeString valsym ++ "_closure"

For the details of prefixUnderscore take a look at the complete code. GHCi also has a similar function called nameToCLabel which can probably be used if you have a Name instead of dumb strings.
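
To illustrate what the mangling produces, here is a simplified sketch that does not depend on the ghc package. It assumes an empty underscore prefix and only handles a handful of characters; the real zEncodeString covers many more cases, and zEncodeSketch and mangleSketch are names made up for this example.

```haskell
-- Simplified Z-encoding: only a few of the characters that the real
-- zEncodeString in GHC handles.
zEncodeChar :: Char -> String
zEncodeChar 'z' = "zz"
zEncodeChar 'Z' = "ZZ"
zEncodeChar '.' = "zi"
zEncodeChar '-' = "zm"
zEncodeChar '_' = "zu"
zEncodeChar '#' = "zh"
zEncodeChar c   = [c]

zEncodeSketch :: String -> String
zEncodeSketch = concatMap zEncodeChar

-- Same shape as mangleSymbol above, assuming no underscore prefix.
mangleSketch :: Maybe String -> String -> String -> String
mangleSketch pkg module' valsym =
  maybe "" (\p -> zEncodeSketch p ++ "_") pkg ++
  zEncodeSketch module' ++ "_" ++ zEncodeSketch valsym ++ "_closure"
```

With these definitions, mangleSketch Nothing "Plugin" "f" gives Plugin_f_closure, and mangleSketch (Just "ghc-prim") "GHC.CString" "unpackCStringUtf8#" gives ghczmprim_GHCziCString_unpackCStringUtf8zh_closure, which is the symbol that shows up in the error message further down.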

To load our module we now only need to initialize the linker, load our object file and lookup the symbol of the corresponding name.

main :: IO ()
main =
  do initObjLinker
     loadObj "plugin.o"
     _ret <- resolveObjs
     ptr  <- lookupSymbol (mangleSymbol Nothing "Plugin" "f")
     case ptr of
       Nothing         -> putStrLn "Couldn’t load symbol"
       Just (Ptr addr) -> case addrToAny# addr of
                                 (# hval #) -> putStrLn hval

If you are confused by (# hval #) that’s just syntax for unboxed tuples. Also note that this is not at all typesafe. It is up to you to ensure that the symbol has the correct type.

We can now compile the plugin module using ghc plugin.hs and our main module using ghc -package ghc test.hs. However if we run ./test we get a cryptic error:

test: plugin.o: unknown symbol `ghczmprim_GHCziCString_unpackCStringUtf8zh_closure'
zsh: segmentation fault (core dumped)  ./test

Why is this symbol not found, isn’t that a standard symbol that should always be available? This is the point at which I gave up on my previous tries.

Luckily GHC has a test doing something similar (I have no idea why I have not found it before). The solution is to simply compile our executable using ghc -package ghc -rdynamic test.hs.

If we now run test we see the popular useful fact used to confuse beginners (please don’t do that):

Monads are just monoids in the category of endofunctors, what’s the problem?

You can change the text in plugin.hs, recompile it and rerun ./test (notably without recompiling test.hs) and it will show the new text.

Since I’ve never used -rdynamic before, I did a bit of digging. The reason for the error is actually independent of Haskell. It turns out that there is a so-called dynamic symbol table in an ELF executable. Dynamically loaded code can only access symbols in that table. However, by default not every symbol in the executable is added to the dynamic symbol table. Passing -rdynamic tells the linker to add all symbols to that table, no matter whether they’re used or not. That way, the dynamically loaded module has access to them.

You can also unload a module using unloadObj. Thanks to Simon Marlow, the GC then unloads the object code.

Sadly I could only test this on Linux so I have no idea if it works on Windows or OS X.

I hope this is useful for someone and look forward to seeing if and what people use it for.

Deriving a Servant Schema from your Data

Posted on January 1, 2016

This post assumes some level of familiarity with the “modern Haskell extension zoo” in particular DataKinds, PolyKinds and TypeFamilies.

Basic Setup

The scenario we are in is a bunch of static data that determines which routes are valid and which aren’t. I got the idea for this post while working on documentation for haskell-ide-engine using servant-swagger. I have simplified the code to make it independent of hie. haskell-ide-engine has a list of plugins, each having a list of commands. You can then make requests to /plugin/command, passing all additional parameters via a JSON object. Here, a Command consists of a name and a response that we send back when we get a request. Let’s take a look at the types:

data Command =
  Command {cmdName :: T.Text, response :: T.Text}
data Plugin = Plugin { cmds :: [Command]}
type Plugins = M.Map T.Text Plugin

The static data (it’s important that it’s static) looks as follows

plugin1 :: Plugin
plugin1 =
  Plugin {cmds =
            [Command "cmd1.1" "cmd1.1 response"
            ,Command "cmd1.2" "cmd1.2 response"]}

plugin2 :: Plugin
plugin2 =
  Plugin {cmds =
            [Command "cmd2.1" "cmd2.1 response"
            ,Command "cmd2.2" "cmd2.2 response"]}

pluginList :: Plugins
pluginList = M.fromList [("plugin1",plugin1),("plugin2",plugin2)]

Now we take a look at the corresponding servant schema and the handlers

type CommandName = T.Text
type PluginName = T.Text
type Param = T.Text
type ParamMap = M.Map T.Text T.Text
type API = Capture "plugin" PluginName :>
           Capture "command" CommandName :>
           ReqBody '[JSON] ParamMap :>
           Post '[JSON] T.Text

lookupCommandResponse :: CommandName -> [Command] -> Maybe T.Text
lookupCommandResponse name =
  fmap response . find (\(Command name' _) -> name == name')

server :: Server API
server plugin command params =
  case lookupCommandResponse command . cmds =<<
             M.lookup plugin pluginList of
    Nothing -> left err404
    Just r -> pure r

Nothing fancy going on here, we have a single route, which captures the plugin and the command name and extracts a map of parameters from the request body. We won’t use that map here. It’s just there to show how this can be extended to something useful. Once we have the names we just do a lookup returning the response if it was successful or a 404 otherwise.

The Problem

Obviously, the above approach works just fine but there is (at least) one problem: Even though we know all plugins and commands at compile time, we don’t tell servant about them. At a first glance this might not be so bad, but if you want to generate documentation or client bindings for that API, using something like servant-swagger this is pretty bad. The documentation you can generate from a single route with two parameters is less useful than it needs to be. Wouldn’t it be great if we could teach servant about the existing plugins and commands and thereby profit a lot more from the cool documentation and binding generation servant provides?

Generating the Schema

Since the servant API is defined at the type level, we need to move the names to the type level too. Luckily GHC provides the GHC.TypeLits module for type level strings, and we can also reflect them back to the value level. So let’s make a type level representation of plugin

data PluginType = PluginType Symbol [Symbol]

Symbol is the equivalent of String at the type level. Using DataKinds we get a PluginType kind and a 'PluginType type constructor. Now we need to create a valid servant schema from a list of these. For that, we need to do induction on type level lists, so it’s nice to have a base case, which we’ll call Fail for :<|>, that always fails. This base case or identity gives us some sort of monoid structure with :<|> being a type level mappend and Fail being mempty. Note that :<|> is not strictly associative since (a :<|> b) :<|> c is a different type than a :<|> (b :<|> c), but that doesn’t make a difference in our case.

data Fail = Fail

instance HasServer Fail where
  type ServerT Fail m = Fail
  route _ _ _ f = f (failWith NotFound)

There is nothing that interesting going on, just note that we have to fill in Fail on the value level for Fail on the type level. Equipped with the identity for :<|>, we may move on. Given a command as a symbol, we just use a type synonym to create a route for it

type CommandRoute cmd = cmd :> ReqBody '[JSON] ParamMap :>
  Post '[JSON] T.Text

So what do we do if we have a list of command names? On the value level, we would just write a function and recurse on the list. Luckily, we have functions on the type level, called type families (enabled by the TypeFamilies extension), so let’s use one:

type family CommandRoutes list where
  CommandRoutes '[] = Fail
  CommandRoutes (cmd ': cmds) = CommandRoute cmd :<|>
                                CommandRoutes cmds
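If you want to convince yourself that the type family unfolds the way we expect, you can ask GHC to prove it. The following sketch replaces servant’s combinators with hypothetical stand-ins (Fail, :<|> and Route are all redefined here as empty data types) so that it only depends on base; the recursion is the same as above.

```haskell
{-# LANGUAGE DataKinds #-}
{-# LANGUAGE KindSignatures #-}
{-# LANGUAGE TypeFamilies #-}
{-# LANGUAGE TypeOperators #-}

import Data.Type.Equality ((:~:) (..))
import GHC.TypeLits (Symbol)

-- Hypothetical stand-ins for the servant types, just for this sketch.
data Fail
data a :<|> b
infixr 3 :<|>
data Route (cmd :: Symbol)

-- Same shape as CommandRoutes above: recurse on the type level list.
type family CommandRoutes (list :: [Symbol]) where
  CommandRoutes '[] = Fail
  CommandRoutes (cmd ': cmds) = Route cmd :<|> CommandRoutes cmds

-- Refl only type checks if GHC reduces the left-hand side to exactly
-- the right-hand side, so this definition is a compile time test.
unfolded
  :: CommandRoutes '["cmd1.1", "cmd1.2"]
     :~: (Route "cmd1.1" :<|> (Route "cmd1.2" :<|> Fail))
unfolded = Refl
```

Note how the result always ends in Fail, which is exactly why we needed an identity for :<|>.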

Now that we can route a list of commands, we’ll think about how the schema for a plugin should look. Let’s assume we already have the route for all the commands. Now it’s simply a case of prepending the plugin name:

type PluginRoute plugin cmdRoutes = plugin :> cmdRoutes

So finally, let’s convert a list of PluginTypes to a servant schema. We already have all the building blocks, so it’s fairly easy:

type family PluginRoutes list where
  PluginRoutes ('PluginType name cmds ': xs)
     = (PluginRoute name (CommandRoutes cmds)) :<|> PluginRoutes xs
  PluginRoutes '[] = Fail

Generating the Servant Handlers

So now we know how to get to the servant schema, but we also need the handlers that deal with the commands. How can we get from a type level list of PluginTypes to an implementation? Type classes! We just do induction on the lists using (the value level) Fail as the base case and combining the cases using (the value level) :<|>:


class HieServer (list :: [PluginType])  where
  hieServer
    :: Proxy list -> Server (PluginRoutes list)

instance HieServer '[] where
  hieServer _ = Fail

instance (KnownSymbol plugin,CommandServer cmds,HieServer xs)
          => HieServer ('PluginType plugin cmds ': xs) where
  hieServer _ =
    pluginHandler :<|> hieServer (Proxy :: Proxy xs)
    where pluginHandler
            :: Server (PluginRoute plugin (CommandRoutes cmds))
          pluginHandler =
            cmdServer (T.pack $ symbolVal (Proxy :: Proxy plugin))
                      (Proxy :: Proxy cmds)

class CommandServer (list :: [Symbol])  where
  cmdServer
    :: T.Text -> Proxy list -> Server (CommandRoutes list)

instance CommandServer '[] where
  cmdServer _ _ = Fail

instance (KnownSymbol x,CommandServer xs)
  => CommandServer (x ': xs) where
  cmdServer plugin _ =
    cmdHandler plugin
               (Proxy :: Proxy x) :<|>
    (cmdServer plugin (Proxy :: Proxy xs))

cmdHandler
  :: KnownSymbol x => T.Text -> Proxy x -> Server (CommandRoute x)
cmdHandler plugin cmd reqVal =
  case lookupCommandResponse cmd' . cmds =<<
             M.lookup plugin pluginList of
    Nothing -> left err404
    Just r -> pure r
    where cmd' = T.pack $ symbolVal cmd
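The pattern behind HieServer and CommandServer is easier to see without servant in the way. Here is a minimal sketch of the same induction, using a hypothetical class KnownSymbols (not part of the original code) that collects all names of a type level list into an ordinary list.

```haskell
{-# LANGUAGE DataKinds #-}
{-# LANGUAGE FlexibleInstances #-}
{-# LANGUAGE KindSignatures #-}
{-# LANGUAGE ScopedTypeVariables #-}
{-# LANGUAGE TypeOperators #-}

import Data.Proxy (Proxy (..))
import GHC.TypeLits (KnownSymbol, Symbol, symbolVal)

-- Hypothetical class: one instance per list constructor.
class KnownSymbols (list :: [Symbol]) where
  symbolVals :: Proxy list -> [String]

-- Base case: the empty type level list maps to the empty list.
instance KnownSymbols '[] where
  symbolVals _ = []

-- Inductive step: reflect the head, then recurse on the tail.
instance (KnownSymbol x, KnownSymbols xs) => KnownSymbols (x ': xs) where
  symbolVals _ =
    symbolVal (Proxy :: Proxy x) : symbolVals (Proxy :: Proxy xs)
```

The instances above have the same structure as the servant ones: [] plays the role of Fail and (:) plays the role of :<|>.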

Moving command and plugin names to the type level

We want to preserve the data representation we have right now since there might be a lot of code that uses it and shoving around stuff with complicated types is often not trivial, e.g. you need to hide arguments in an existential to put it in a map. It would be great if we could just tag our existing Command type with a Symbol. That’s exactly what Const is for. There is a small problem here: Const in GHC 7.10 is not polykinded, so we can’t use a Symbol here (in GHC 8.0 it will be polykinded). Luckily vinyl provides a polykinded Const in Data.Vinyl.Functor. Let’s build a function to create a tagged command:

buildCommand
  :: KnownSymbol s
  => Proxy s -> T.Text -> Vinyl.Const Command s
buildCommand name response =
  Vinyl.Const (Command (T.pack $ symbolVal name) response)

We use the KnownSymbol type class to reflect the string back to the value level. The Proxy here is not actually needed, but I found it more intuitive to specify the type in the arguments. Now we have a slight problem: we no longer have a list of Commands but a list of Vinyl.Const Command s, with the s being different for every Command. Since a standard Haskell list is homogeneous, we can’t use it anymore. Again vinyl saves us by providing a Rec type, which stores values whose types differ only in the last type parameter and keeps track of those parameters in a type level list. Since we want to preserve the original representation, we abstract over the type of the commands, giving us:

data Plugin cmds = Plugin { cmds :: cmds }

type UntaggedPlugin = Plugin [Command]
type TaggedPlugin cmds = Plugin (Vinyl.Rec (Vinyl.Const Command)
                                           cmds)

We need to slightly change our data:

plugin1 :: TaggedPlugin '["cmd1.1","cmd1.2"]
plugin1 = Plugin (buildCommand (Proxy :: Proxy "cmd1.1")
                               "cmd1.1 response"
         Vinyl.:& buildCommand (Proxy :: Proxy "cmd1.2")
                               "cmd1.2 response"
         Vinyl.:& Vinyl.RNil)
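If you have never seen a record like this before, the following simplified stand-in may help. It is not vinyl’s actual definition (the real Rec comes with a lot more machinery), and recToList and commands are hypothetical names; the sketch uses the polykinded Const from base, which assumes GHC 8.0 or later.

```haskell
{-# LANGUAGE DataKinds #-}
{-# LANGUAGE GADTs #-}
{-# LANGUAGE KindSignatures #-}
{-# LANGUAGE PolyKinds #-}
{-# LANGUAGE TypeOperators #-}

import Data.Functor.Const (Const (..))
import Data.Kind (Type)

-- Simplified stand-in for vinyl's Rec: every element has type f t for
-- a different t, and the ts are collected in a type level list.
data Rec (f :: k -> Type) (ts :: [k]) where
  RNil :: Rec f '[]
  (:&) :: f t -> Rec f ts -> Rec f (t ': ts)
infixr 7 :&

-- Hypothetical helper: if every element is a Const, the tags are
-- phantom and we can forget them to get an ordinary list back.
recToList :: Rec (Const a) ts -> [a]
recToList RNil = []
recToList (Const x :& xs) = x : recToList xs

-- Two responses, each tagged with its command name.
commands :: Rec (Const String) '["cmd1.1", "cmd1.2"]
commands = Const "cmd1.1 response" :& Const "cmd1.2 response" :& RNil
```

Here recToList commands gives back ["cmd1.1 response", "cmd1.2 response"], so the tags cost us nothing at the value level.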

We still don’t have the plugin name. Let’s see where we want to go and work our way backwards from there:

taggedPlugins :: Vinyl.Rec (Vinyl.Const (T.Text,UntaggedPlugin))
                 '[ 'PluginType "plugin1" _
                  , 'PluginType "plugin2" _]
taggedPlugins = tag plugin1 Vinyl.:& tag plugin2
                            Vinyl.:& Vinyl.RNil

The underscores represent the list of command names. You can either write them here manually or use PartialTypeSignatures to let GHC infer them for you if you are lazy like me. Once we have this type, we can use Vinyl.recordToList to get our original value level representation:

pluginList :: Plugins
pluginList = M.fromList $ Vinyl.recordToList taggedPlugins

So what should tag do? We’re going to define that in two steps: first we wrap the TaggedPlugin in another layer of Const, this time tagged with the plugin name. Then we merge the two tags into a single type parameter of kind PluginType.

untagPlugin :: TaggedPlugin cmds -> UntaggedPlugin
untagPlugin (Plugin cmds) = Plugin $ Vinyl.recordToList cmds

retagPlugin
  :: forall name cmds.
     KnownSymbol name
  => Vinyl.Const (TaggedPlugin cmds) name
  -> Vinyl.Const (T.Text,UntaggedPlugin)
                 ('PluginType name cmds)
retagPlugin (Vinyl.Const desc) =
  Vinyl.Const $
  (T.pack $ symbolVal (Proxy :: Proxy name),untagPlugin desc)

type NamedPlugin name cmds = Vinyl.Const UntaggedPlugin
                                         ('PluginType name cmds)

tag
  :: KnownSymbol name
  => TaggedPlugin cmds
  -> Vinyl.Const (T.Text,UntaggedPlugin) ('PluginType name cmds)
tag = retagPlugin . Vinyl.Const
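One detail that is easy to miss: name only shows up in the result type of tag, so it is the type signature at the use site that decides which plugin name ends up in the value. The following sketch shows this with the Const from base and plain Strings instead of our plugin types; tagWithName and examplePlugin are hypothetical names used only here.

```haskell
{-# LANGUAGE DataKinds #-}
{-# LANGUAGE ScopedTypeVariables #-}

import Data.Functor.Const (Const (..))
import Data.Proxy (Proxy (..))
import GHC.TypeLits (KnownSymbol, Symbol, symbolVal)

data PluginType = PluginType Symbol [Symbol]

-- Hypothetical simplification of tag: merge the command tags with a
-- plugin name that is taken from the expected result type.
tagWithName
  :: forall name cmds. KnownSymbol name
  => Const [String] cmds
  -> Const (String, [String]) ('PluginType name cmds)
tagWithName (Const responses) =
  Const (symbolVal (Proxy :: Proxy name), responses)

-- The signature picks the name; the value never mentions "plugin1".
examplePlugin
  :: Const (String, [String]) ('PluginType "plugin1" '["cmd1.1"])
examplePlugin = tagWithName (Const ["cmd1.1 response"])
```

Evaluating getConst examplePlugin yields ("plugin1", ["cmd1.1 response"]), even though the string "plugin1" only appears in the type.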

Hold tight, we’re almost done! All that’s left is to throw away the data from the Rec type and make a Proxy out of it.

recProxy :: Vinyl.Rec f t -> Proxy t
recProxy _ = Proxy

So finally we can serve our API

serveAPI :: forall plugins.
            (HieServer plugins,HasServer (PluginRoutes plugins))
         => Proxy plugins -> IO ()
serveAPI plugins = run 8080 $ serve
  (Proxy :: Proxy (PluginRoutes plugins)) (hieServer plugins)

servePlugins :: IO ()
servePlugins = serveAPI (recProxy taggedPlugins)

Conclusion

To profit from servant’s full potential, you need to move as much information as possible into your API declaration. It might look like a fair amount of work, but considering you now get documentation and client bindings that might actually be useful, I think it’s worth the trouble (also it’s a lot of fun :)).

You can find the full code on github.

If you are interested, the PR adding this to haskell-ide-engine can be found here.