<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">
    <title>purelyfunctional.org</title>
    <link href="http://purelyfunctional.org/atom.xml" rel="self" />
    <link href="http://purelyfunctional.org" />
    <id>http://purelyfunctional.org/atom.xml</id>
    <author>
        <name>Moritz Kiefer</name>
        <email>moritz.kiefer@purelyfunctional.org</email>
    </author>
    <updated>2018-04-02T00:00:00Z</updated>
    <entry>
    <title>Calling External Functions from JIT-compiled LLVM Modules using llvm-hs</title>
    <link href="http://purelyfunctional.org/posts/2018-04-02-llvm-hs-jit-external-function.html" />
    <id>http://purelyfunctional.org/posts/2018-04-02-llvm-hs-jit-external-function.html</id>
    <published>2018-04-02T00:00:00Z</published>
    <updated>2018-04-02T00:00:00Z</updated>
    <summary type="html"><![CDATA[<div class="info">
    Posted on April  2, 2018
    
</div>

<p><code>llvm-hs</code> provides bindings to LLVM’s ORC JIT APIs. These APIs let you
JIT-compile LLVM modules and then call functions in those modules from
your Haskell code. However, sometimes you want to use external
libraries from within your LLVM module either because you want to make
use of an existing library or because it might be easier to implement
certain parts in other languages (e.g. C) than LLVM IR. Sam Griffin recently
<a href="https://github.com/llvm-hs/llvm-hs/issues/193">raised the question</a>
of how you can call functions in external libraries from a
JIT-compiled module and while I had a rough idea of how to do
this, I had never actually tried it myself. In this post, I
present my findings on how you can accomplish this for both static and
dynamic libraries.</p>
<p>We start with a very simple C file <code>lib.c</code> that defines a function
called <code>external_function</code> which returns twice its argument. This is
the function that we will attempt to call from our LLVM module.</p>
<pre><code class="language-c">#include &lt;stdint.h&gt;

int32_t external_function(int32_t x) {
    return 2 * x;
}
</code></pre>
<p>We can now compile this to an object file using <code>gcc -fPIC -c -o lib.o lib.c</code>. (<code>-fPIC</code> is only necessary when we want to produce a dynamic
library but to keep things simple we will use the same object file for
building the static and the dynamic library in this post).</p>
<p>The static library can now be created using <code>ar rcs libexternalstatic.a lib.o</code>. The dynamic library can be built using
<code>gcc -shared -o libexternaldynamic.so lib.o</code>.</p>
<p>The LLVM module <code>module.ll</code> that we will be using in this post
declares <code>external_function</code> and defines a function <code>f</code> which takes no
argument and returns the result of applying <code>external_function</code> to
<code>21</code>.</p>
<pre><code class="language-LLVM">; ModuleID = 'basic'
source_filename = &quot;&lt;string&gt;&quot;

declare i32 @external_function(i32)

define i32 @f() {
entry:
  %0 = call i32 @external_function(i32 21)
  ret i32 %0
}
</code></pre>
<p>Now that we have defined the module, we are ready to write the Haskell
code to JIT the module and then finally call the <code>f</code> function. For
this post, we will declare the module using LLVM’s textual IR and
load it using <code>llvm-hs</code>’s <code>withModuleFromLLVMAssembly</code> but building
the module using <code>llvm-hs-pure</code>’s AST works as well.</p>
<p>There are two points that you need to pay attention to if your JIT-compiled
module references external functions (for both static and dynamic
libraries):</p>
<ol>
<li>Your resolver needs some way to find the symbol. We are going to
use <code>getSymbolAdressInProcess</code> for this which is a function
provided by <code>llvm-hs</code> that will search for loaded symbols in the
current process.</li>
<li><code>getSymbolAddressInProcess</code> will only find symbols in libraries
that have been loaded before. This is accomplished by calling
<code>loadLibraryPermanently</code> before you JIT the module. You can either
pass the name of a dynamic library to <code>loadLibraryPermanently</code> or
you can pass <code>Nothing</code> (equivalent to <code>dlopen(NULL)</code>) which will
load the symbols in the current process including the symbols in
shared libraries that the executable is linked against.</li>
</ol>
<p>This leaves us with the following resolver:</p>
<pre><code class="language-haskell">resolver :: IRCompileLayer l -&gt; SymbolResolver
resolver compileLayer =
  SymbolResolver
    (\s -&gt; findSymbol compileLayer s True)
    (\s -&gt;
       fmap
         (\a -&gt; JITSymbol a (JITSymbolFlags False True))
         (getSymbolAddressInProcess s))
</code></pre>
<p>The implementation of <code>main</code> might look slightly complicated at a first glance, so let’s break it down:</p>
<ol>
<li>We first call the aforementioned <code>loadLibraryPermanently</code> function to make sure that later calls to <code>getSymbolAddressInProcess</code> will find <code>external_function</code>.</li>
<li>Then follows a bit of boilerplate to initialize the LLVM context,
load the module and create the ORC linking and compile layers.</li>
<li>We can now add the module to the ORC compile layer using
<code>withModule</code> which is a <code>bracket</code>-style wrapper around <code>addModule</code> and
<code>removeModule</code>.</li>
<li>Next, we mangle the symbol of the function that we want to call (<code>f</code> in this case) and
search for the symbol in the compile layer.</li>
<li>Pattern matching on the resulting <code>JITSymbol</code> gives us back a
<code>WordPtr</code> representing the address of <code>f</code>. We use
<code>wordPtrToPtr</code> and <code>castPtrToFunPtr</code> to convert the <code>WordPtr</code> to a
<code>FunPtr</code>.</li>
<li>Finally, we use a <a href="https://www.haskell.org/onlinereport/haskell2010/haskellch8.html#x15-1620008.5.1">dynamic foreign
import</a>
to convert the <code>FunPtr</code> to a Haskell function and call the resulting
function.</li>
</ol>
<pre><code class="language-haskell">main :: IO ()
main = do
  loadLibraryPermanently Nothing
  withContext $ \ctx -&gt;
    withModuleFromLLVMAssembly ctx (File &quot;module.ll&quot;) $ \mod' -&gt;
      withHostTargetMachine $ \tm -&gt;
        withObjectLinkingLayer $ \objectLayer -&gt;
          withIRCompileLayer objectLayer tm $ \compileLayer -&gt; do
            withModule
              compileLayer
              mod'
              (resolver compileLayer) $
              \_ -&gt; do
                mainSymbol &lt;- mangleSymbol compileLayer &quot;f&quot;
                (JITSymbol mainFn _) &lt;- findSymbol compileLayer mainSymbol True
                result &lt;- mkFun (castPtrToFunPtr (wordPtrToPtr mainFn))
                print result
</code></pre>
<p>If you want to use the dynamic library, then all that’s left to do is
to add <code>extra-libraries: externaldynamic</code> to the executable section in
our cabal file. Depending on where you placed the shared library, you
will also have to set <code>extra-lib-dirs</code> to the directory containing the
library so that it is found at link time and the <code>LD_LIBRARY_PATH</code>
environment variable to make sure it is found when you run the
executable.</p>
<p>If you want to use the static library, then things are a bit more
involved: Just adding <code>externalstatic</code> to <code>extra-libraries</code> will not
work since the linker will omit unused symbols when linking against
static libraries. Since the linker does not know about the reference to
<code>external_function</code> in our JIT compiled module, this symbol will
thereby not end up in the binary. To fix this you need to use
<code>-Wl,--whole-archive,-lexternalstatic,--no-whole-archive</code> in the
<code>ld-options</code> section in your cabal file. This will force all symbols
in the <code>externalstatic</code> library to be included in the final executable
even if they are not referenced. We also need to ensure that the
symbols end up in the dynamic symbol table since that is what
<code>getSymbolAddressInProcess</code> will look at. The corresponding flag in <code>GNU ld</code> is called <code>--export-dynamic</code> but we use GHC’s <code>-rdynamic</code> option
here (by adding it to <code>ld-options</code>) which will use <code>--export-dynamic</code>
under the hood if you’re using GNU ld (but should also support other linkers).
As for shared libraries, you might also
need to set <code>extra-lib-dirs</code> to make sure that the library is found at
link time. Since we are linking the library statically, there is no
need for messing with <code>LD_LIBRARY_PATH</code>. If you followed the steps thus far,
you might have noticed that this still does not quite work: You know
longer get symbol resolution errors but you will get a segfault.
Luckily, this can be fixed by changing the relocation model of the
target machine to PIC instead of relying on the default set by
<code>withHostTargetMachine</code> which seems to be <code>Static</code> on X86. (I think
this has the effect of preventing LLVM from emitting call instructions
to immediates but I am not entirely sure why this is necessary. If you
do know more about this, I would love here from you!).  The custom version
of <code>withHostTargetMachine</code> that sets the relocation model looks as
follows:</p>
<pre><code class="language-haskell">withHostTargetMachine :: (TargetMachine -&gt; IO a) -&gt; IO a
withHostTargetMachine f = do
  initializeAllTargets
  triple &lt;- getProcessTargetTriple
  cpu &lt;- getHostCPUName
  features &lt;- getHostCPUFeatures
  (target, _) &lt;- lookupTarget Nothing triple
  withTargetOptions $ \options -&gt;
    withTargetMachine target triple cpu features options Reloc.PIC CodeModel.Default CodeGenOpt.Default f
</code></pre>
<h2>Conclusion</h2>
<p>While calling functions in external libraries from a JIT-compiled
module is not particularly complicated, finding all the correct linker
flags can be a bit tricky especially if you are not too familiar with
linkers (which certainly applies to myself :)). Hopefully, this post
can serve as a reference and spare others from having to go through
the same trial and error process that I went through. You can find the
full code mentioned in this blogpost on
<a href="https://github.com/cocreature/llvm-hs-jit-external-lib.git">github</a>. Note
that I only tested this on Linux (specifically Archlinux 64bit), the
linker flags might be slightly different on other systems.</p>
<!-- LocalWords: JIT LLVM -->

]]></summary>
</entry>
<entry>
    <title>MonadFix and the Lazy and Strict State Monad</title>
    <link href="http://purelyfunctional.org/posts/2018-03-04-monadfix-lazy-strict-state.html" />
    <id>http://purelyfunctional.org/posts/2018-03-04-monadfix-lazy-strict-state.html</id>
    <published>2018-03-04T00:00:00Z</published>
    <updated>2018-03-04T00:00:00Z</updated>
    <summary type="html"><![CDATA[<div class="info">
    Posted on March  4, 2018
    
</div>

<p>In this post, I will assume rudamentary familiarity with the different
<code>Monad</code> instances of the lazy and strict state monad and
<code>MonadFix</code>. If you are not familiar with these concepts or want to
brush up your knowledge, I recommend Kwang Yul Seo’s <a href="https://kseo.github.io/posts/2016-12-28-lazy-vs-strict-state-monad.html">post on the lazy
and strict state
monad</a>
and Will Fancher’s <a href="https://elvishjerricco.github.io/2017/08/22/monadfix-is-time-travel.html">post on
<code>MonadFix</code></a>.</p>
<p>Recently, <code>llvm-hs-pure</code> got a new API for building modules called
<code>IRBuilder</code> which makes this process significantly more convenient by
taking care of a lot of the necessary book keeping. In particular, the
API is built upon a state monad that tracks variables and creates
fresh variables as necessary, allows the use of monadic binds to refer
to operators and more. In the context of LLVM references to variables
or blocks often end up being circular, e.g., the branch instructions
in the basic blocks in a loop will form a cycle referencing each
other. While monadic binds can’t be recursive by default, <code>MonadFix</code>
and the <code>RecursiveDo</code> extension lift this restriction and thereby
allow for a very convenient API even in the presence of recursive
definitions. For a more detailed blogpost on a very similar API, I
recommend Lewis’ <a href="http://wall.org/%7Elewis/2013/10/15/asm-monad.html">post on the ASM
monad</a>.</p>
<p>Recursive functions are another case where references end up being
circular and thereby require <code>MonadFix</code>. Sadly, this usecase was
completely broken in <code>llvm-hs-pure</code> as Pavol Klacansky noticed in a
<a href="https://github.com/llvm-hs/llvm-hs/issues/181">bugreport</a>: All
attempts to build modules this way led to an infinite loop and GHC’s
infamous <code>&lt;&lt;loop&gt;&gt;</code> exception. After investigating this problem, I
figured out that replacing the strict state monad by the lazy state
monad solved the problem and lead to the expected behavior instead of
an infinite loop. In the following, I’m going to present a simplified
version of the problem and explain why the two versions differ.</p>
<p>We’ll start out by defining a very simple type representing the
instructions in our program. For this example, we only need to instructions:</p>
<ol>
<li>A <code>Dummy</code> instruction and</li>
<li>a <code>Reference</code> instruction that refers to the result of another instruction by its name.</li>
</ol>
<pre><code class="language-haskell">data Instr
  = Reference String
  | Dummy
  deriving Show
</code></pre>
<p>We can now define the <code>Builder</code> monad which is used to build the list
of instructions. <code>Builder</code> is just a type synonym for a <code>State</code> monad
with the state being a list of <code>(String, Instr)</code> pairs. We’ll also
define <code>runBuilder</code> function that run a builder with an initial state
consisting of an empty list of instructions and returns the final
list.</p>
<pre><code class="language-haskell">type Builder a = State [(String, Instr)] a

runBuilder :: Builder a -&gt; [(String, Instr)]
runBuilder a = execState a []
</code></pre>
<p>Emitting an instruction appends it to the list of instructions and
returns the name of the instruction. We also define two convenience
wrappers for emitting <code>Dummy</code> and <code>Reference</code> instructions.</p>
<pre><code class="language-haskell">emitInstr :: (String, Instr) -&gt; Builder String
emitInstr (n, i) = do
  modify (\instrs -&gt; instrs ++ [(n, i)])
  pure n

dummy :: String -&gt; Builder String
dummy n = emitInstr (n, Dummy)

reference :: String -&gt; String -&gt; Builder String
reference n ref = do
  let instr = Reference ref
  emitInstr (n, instr)
</code></pre>
<p>Finally, we can define a very simple example program consisting of a
<code>Reference</code> instruction and a <code>Dummy</code> instruction with the <code>Reference</code>
instruction referencing the <code>Dummy</code> instruction which is defined
<em>later</em> (that is why we need <code>MonadFix</code> and <code>RecursiveDo</code> here).</p>
<pre><code class="language-haskell">example :: Builder ()
example = mdo
  ref &lt;- reference &quot;ref&quot; foo
  foo &lt;- dummy &quot;foo&quot;
  pure ()
</code></pre>
<p>You can use the following definition for <code>main</code> to test this example.</p>
<pre><code class="language-haskell">main :: IO ()
main = print (runBuilder example)
</code></pre>
<p>This example will work with both the lazy and the strict state monad.
However, if we change the definition of <code>reference</code> as shown below,
running the example will result in an infinite loop.</p>
<pre><code class="language-haskell">reference :: String -&gt; String -&gt; Builder String
reference n ref = do
  let instr = Reference ref
  case ref of
    !a -&gt; emitInstr (n, instr)
</code></pre>
<p>Introducing the strict pattern match here might seem silly and in this
isolated example it definitely is. However, in general it is
definitely possible that the way an instruction is emitted depends on
the reference and thereby requires a pattern match. In <code>llvm-hs</code>, the
<code>call</code> instruction <a href="https://github.com/llvm-hs/llvm-hs/blob/944ce3849b773137e4704c23e9fef715e2c8599d/llvm-hs-pure/src/LLVM/IRBuilder/Instruction.hs#L180">checks if the callee has a void return
type</a>
which resulted in the issue mentioned above. To better understand why
the strict and the lazy monad behave differently here, I am going to
substitute the <code>Monad</code> and <code>MonadFix</code> instances and inline the
definitions.</p>
<p>Let us start by removing the use of <code>mdo</code> and replace it by an explicit use of <code>mfix</code>.</p>
<pre><code class="language-haskell">example = do
  mfix $ \foo -&gt; do
    ref &lt;- reference &quot;ref&quot; foo
    foo' &lt;- dummy &quot;foo&quot;
    pure foo'
  pure ()
</code></pre>
<p>Next, we can substitute the definition of <code>mfix</code>. Since <code>State s a</code> in
<code>transformers</code> is defined as a <code>StateT s Identity a</code>, the definition
can look a bit complicated. For this post, we are going to assume that
<code>State</code> has not been defined as a transformer and provide definitions
for this simplified version of <code>State</code>. You can see the recursion in
<code>mfix</code> by <code>a</code> occuring both on the left and on the right of <code>=</code>.</p>
<pre><code class="language-haskell">newtype State s a = State { runState :: s -&gt; (a, s) }
mfix f = State (\s -&gt; let (a, s') = runState (f a) s in (a, s'))
</code></pre>
<p>In the next step, we inline this definition of <code>mfix</code>.</p>
<pre><code class="language-haskell">example :: State [(String, Instr)] ()
example = do
  State $ \s -&gt;
    let (foo, s') =
          runState (do ref &lt;- reference &quot;ref&quot; foo
                       foo' &lt;- dummy &quot;foo&quot;
                       pure foo'
                   )
                   s
    in (foo, s')
  pure ()
</code></pre>
<p>Finally, we desugar <code>do</code> notation and inline <code>reference</code>, <code>dummy</code> and <code>runState</code>.</p>
<pre><code class="language-haskell">example :: State [(String, Instr)] ()
example = do
  State $ \s -&gt;
    let (foo, s') =
          let (ref, s'') = 
                case foo of !a -&gt; (&quot;ref&quot;, s ++ [(&quot;ref&quot;, Reference foo)])
              (foo', s''') = (&quot;foo&quot;, s'' ++ [(&quot;foo&quot;, Dummy)])
          in (foo', s''')
    in (foo, s')
  pure ()
</code></pre>
<p>The above definition uses the bind implementation of the lazy state
monad, for the strict state monad, we need to change the let statement
to be strict in the tuple (note that pattern matches in <code>let</code> statements are lazy by default):</p>
<pre><code class="language-haskell">let !(ref, s'') = 
      case foo of !a -&gt; (&quot;ref&quot;, s ++ [(&quot;ref&quot;, Reference foo)])
    (foo', s''') = (&quot;foo&quot;, s'' ++ [(&quot;foo&quot;, Dummy)])
in (foo', s''')
</code></pre>
<p>At this point, the difference becomes clear: For the strict state
monad, forcing <code>(foo, s')</code> forces <code>(ref, s'')</code> which in turn ends up
forcing <code>foo</code> which has not yet been computed so we run into an
infinite loop. For the lazy state monad, the evaluation of the <code>(ref, s'')</code> tuple and thereby also the case statement on <code>foo</code> is lazy and
thus we can first evaluate that <code>foo = &quot;foo&quot;</code> before evaluating the
<code>case</code> statement and avoid the infinite loop.</p>
<h3>Conclusion</h3>
<p>When asked what the lazy state monad is for, the most common response
is infinite states as demonstrated by Kwang in the
<a href="https://kseo.github.io/posts/2016-12-28-lazy-vs-strict-state-monad.html">post</a>
mentioned at the beginning of this post. In this article, we have seen
a different usecase in combination with <code>MonadFix</code> where monadic
actions depend on recursive bindings and the lazy state monad prevents
an infinite loop.</p>

]]></summary>
</entry>
<entry>
    <title>Haskell bindings for template-heavy C++ code</title>
    <link href="http://purelyfunctional.org/posts/2017-05-30-haskell-binding-c%2B%2B-template.html" />
    <id>http://purelyfunctional.org/posts/2017-05-30-haskell-binding-c%2B%2B-template.html</id>
    <published>2017-05-30T00:00:00Z</published>
    <updated>2017-05-30T00:00:00Z</updated>
    <summary type="html"><![CDATA[<div class="info">
    Posted on May 30, 2017
    
</div>

<p>This post describes a technique for writing Haskell bindings (similar
tricks apply to other languages) to template-heavy C++ code when the
template instantiations that should be exposed are not statically
known. I am going to assume some rudimentary knowledge of C++
templates and the Haskell C FFI.</p>
<p>I originally faced this problem when trying to make the bindings to
ORC JIT in <a href="https://github.com/llvm-hs/llvm-hs">llvm-hs</a> more
flexible, so the examples used in this post will be based on the API
of ORC JIT. However, the solution is not tied to ORC JIT or LLVM and
can be applied when writing bindings to other libraries. The ORC JIT
API is composed of various compile layers which are responsible for
compiling LLVM modules to object files (The examples below call the
method responsible for this <code>compileModule</code>). There are base layers
which just compile modules directly but more importantly (for this
post), there are layers that wrap other layers and apply some sort of
transformation before passing the modified module to the underlying
layer. Ignoring all the irrelevant details, we can imagine that the
C++-API for this looks as follows:</p>
<pre><code class="language-cpp">class Module;
class Object;

// The base layer which compiles a module directly to object code. The details
// of how this is done are irrelevant for this post.
class BaseLayer {
  public:
    Object *compileModule(Module *module);
};

// A transform layer which first applies a function transforming the module
// before handing off compilation to the underlying base layer.
template &lt;typename BaseLayerT&gt; class TransformLayer {
  public:
    TransformLayer(BaseLayerT &amp;baseLayer,
                   std::function&lt;Module *(Module *)&gt; transform)
        : baseLayer(baseLayer), transform(std::move(transform)) {}
    Object *compileModule(Module *module) {
        Module *transformedModule = transform(module);
        return baseLayer.compileModule(transformedModule);
    }
    BaseLayerT &amp;baseLayer;
    std::function&lt;Module *(Module *)&gt; transform;
};
</code></pre>
<p>Being able to compose layers is great since it gives users a lot of
flexibility in how they want to build their JIT. However, it makes
providing Haskell bindings for that API tricky. Let’s first consider
what Haskell API we would like to end up with. It should expose the
same flexibility available in the C++ interface. In particular, users
should be able to choose which layers they want to use and how they
should be composed. A first attempt at the low-level API might look as
follows:</p>
<pre><code class="language-haskell">import Foreign.Ptr

data Object
data Module
data BaseLayer
data TransformLayer baseLayer

newBaseLayer :: IO (Ptr BaseLayer)
newTransformLayer :: Ptr a -&gt; FunPtr (Ptr Module -&gt; IO (Ptr Module)) -&gt; IO (Ptr (TransformLayer a))
compileModule :: Ptr a -&gt; Ptr Module -&gt; IO (Ptr Object)
</code></pre>
<p>You might have noticed that we are being to polymorphic here: Users
shouldn’t be able to use pointers to arbitrary types to be used as
compile layers. We will come back to that later.</p>
<p>Haskell does not support directly interfacing with C++, so we are
going to need to write a C wrapper to the C++ API. But we cannot write
a wrapper for <code>newTransformLayer</code>. C does not really have a concept of
polymorphism so we can’t write a function that accepts an arbitrary
layer. You might be tempted to just accept a <code>void*</code> and cast it and
hope for the best but even that will not work since calling the C++
constructor of <code>TransformLayer</code> requires statically knowing the type
of the base layer. Another non-solution would be to write different
wrappers for <code>newTransformLayer</code> for each type of base layers since
this contradicts our goal of exposing the full flexibility present in
the C++-API.</p>
<p>Before explaining the solution, let’s step back for a moment and take
a look at the situation at hand: What’s causing problems here is the
fact that C++ templates are a form of static polymorphism and we
cannot expose that via the C API. However, we can expose dynamic
polymorphism, i.e., virtual dispatch. So if LLVM would just have a
<code>CompileLayer</code> base class that <code>TransformLayer</code> and <code>BaseLayer</code>
inherit from, all would be fine. So since LLVM does not provide this
base class, let’s just write it ourselves!</p>
<pre><code class="language-cpp">class CompileLayer {
  public:
    virtual Object *compileModule(Module *module) = 0;
};
</code></pre>
<p>But <code>BaseLayer</code> and <code>TransformLayer</code> do not inherit from this new
class. So we are going to create a new class that wraps an arbitrary
compile layer, inherits from <code>CompileLayer</code> and hands of the actual
compilation to the wrapped layer.</p>
<pre><code class="language-cpp">template &lt;typename T&gt; class CompileLayerT : public CompileLayer {
  public:
    CompileLayerT(T layer) : layer(std::move(layer)) {}
    Object *compileModule(Module *module) override {
        return layer.compileModule(module);
    }
    T layer;
};
</code></pre>
<p>Now that we have the necessary machinery, we can write the
non-polymorphic C wrappers which we will use via the Haskell C
FFI. These wrappers instantiate the templates only for
<code>CompileLayer</code>. Since we can wrap the other layers in <code>CompileLayerT</code>
and upcast them to <code>CompileLayer</code> we have not lost any flexibility.</p>
<pre><code class="language-cpp">extern &quot;C&quot; {
CompileLayer *newBaseLayer() {
    return new CompileLayerT&lt;BaseLayer&gt;(BaseLayer());
}
CompileLayer *newTransformLayer(CompileLayer *baseLayer,
                                Module *(*transform)(Module *)) {
    return new CompileLayerT&lt;TransformLayer&lt;CompileLayer&gt;&gt;(
        TransformLayer&lt;CompileLayer&gt;(*baseLayer, transform));
}
Object *compileModule(CompileLayer *layer, Module *module) {
    return layer-&gt;compileModule(module);
}
}
</code></pre>
<p>Finally, we are ready to get back to the Haskell code. Writing the
bindings to the 3 C functions that we just defined is easy.</p>
<pre><code class="language-haskell">{-# LANGUAGE ForeignFunctionInterface #-}
import Foreign.Ptr

data Object
data Module
data CompileLayer

foreign import ccall newBaseLayer ::
  IO (Ptr CompileLayer)
foreign import ccall newTransformLayer ::
  Ptr CompileLayer -&gt; FunPtr (Ptr Module -&gt; IO (Ptr Module)) -&gt; IO (Ptr CompileLayer)
foreign import ccall compileModule ::
  Ptr CompileLayer -&gt; Ptr Module -&gt; IO (Ptr Object)
</code></pre>
<p>However, you have probably noticed that we have lost the separate
types for <code>BaseLayer</code> and <code>TransformLayer</code>. This is fine for the FFI
imports but we don’t want to present that API to the user. So we wrap
the above in a nicer Haskell API: We use a typeclass to represent
types which can be converted to a <code>Ptr CompileLayer</code> and add newtypes
for <code>BaseLayer</code> and <code>TransformLayer</code>. <code>TransformLayer</code> has a phantom
type parameter representing the base layer and our wrapper for
<code>newTransformLayer</code> ensures that it is correctly instantiated.</p>
<pre><code class="language-haskell">foreign import ccall newBaseLayer ::
  IO (Ptr CompileLayer)
foreign import ccall newTransformLayer ::
  Ptr CompileLayer -&gt; FunPtr (Ptr Module -&gt; IO (Ptr Module)) -&gt; IO (Ptr CompileLayer)
foreign import ccall compileModule ::
  Ptr CompileLayer -&gt; Ptr Module -&gt; IO (Ptr Object)

newtype BaseLayer = BaseLayer (Ptr CompileLayer)
newtype TransformLayer baseLayer = TransformLayer (Ptr CompileLayer)

class IsCompileLayer l where
  getCompileLayer :: l -&gt; Ptr CompileLayer

instance IsCompileLayer BaseLayer where
  getCompileLayer (BaseLayer l) = l

instance IsCompileLayer (TransformLayer l) where
  getCompileLayer (TransformLayer l) = l

newBaseLayer' :: IO BaseLayer
newBaseLayer' = BaseLayer &lt;$&gt; newBaseLayer

newTransformLayer' :: IsCompileLayer l =&gt; l -&gt; FunPtr (Ptr Module -&gt; IO (Ptr Module)) -&gt; IO (TransformLayer l)
newTransformLayer' baseLayer transform =
  TransformLayer &lt;$&gt; newTransformLayer (getCompileLayer baseLayer) transform

compileModule' :: IsCompileLayer l =&gt; l -&gt; Ptr Module -&gt; IO (Ptr Object)
compileModule' layer module' =
  compileModule (getCompileLayer layer) module'
</code></pre>
<h3>Caveats</h3>
<ol>
<li>
<p>I’ve only shown constructors to this API. Usually, you also want to
add destructors which free the allocated layers. Otherwise, you are
never going to deallocate the memory which leads to a memory
leak. Luckily, you can mark the destructor of <code>CompileLayer</code> as
<code>virtual</code> and then use the same C wrapper for all layers.</p>
</li>
<li>
<p>Turning static polymorphism into dynamic polymorphism does incur a
slight performance cost. In this case, this is probably irrelevant
but if you are wrapping a template function that wraps “small”
types, e.g., a function that accepts different types of integers
and performs cheap operations on them it might matter.</p>
</li>
</ol>

]]></summary>
</entry>
<entry>
    <title>Dynamic loading of Haskell modules</title>
    <link href="http://purelyfunctional.org/posts/2016-05-20-dynamic-loading-haskell-module.html" />
    <id>http://purelyfunctional.org/posts/2016-05-20-dynamic-loading-haskell-module.html</id>
    <published>2016-05-20T00:00:00Z</published>
    <updated>2016-05-20T00:00:00Z</updated>
    <summary type="html"><![CDATA[<div class="info">
    Posted on May 20, 2016
    
</div>

<p>Even though I don’t have any particular compelling use case for
dynamic loading of Haskell modules, it is something that I’ve been
wanting to do for quite some time. Sadly I have never been able to
produce anything but crashes so far. There is the
<a href="https://hackage.haskell.org/package/plugins">plugins package</a> but I
have not gotten that to work either. The question seems to come up
from time to time, e.g. on
<a href="https://www.reddit.com/r/haskell/comments/2z6ci1/dynamic_loadable_modules/">reddit</a>,
but I have not seen an example that works so far. This morning I
decided to give it another shot and finally managed to get it to work!</p>
<p>Let us take the following module as an example:</p>
<pre><code class="language-haskell">module Plugin(f) where
f :: String
f = &quot;Monads are just monoids in the category of endofunctors, what’s the problem?&quot;
</code></pre>
<p>We want to load the module in our main executable and print the string
<code>f</code>. The code is surprisingly simple and pretty much the same that is
also used in <code>plugins</code> and similar to the code used in <code>GHCi</code>. We
first need a function to create the <code>ELF</code> symbol name in our
executable from the package, module and <code>Haskell</code> symbol name.</p>
<pre><code class="language-haskell">mangleSymbol :: Maybe String -&gt; String -&gt; String -&gt; String
mangleSymbol pkg module' valsym =
  prefixUnderscore ++
  maybe &quot;&quot; (\p -&gt; zEncodeString p ++ &quot;_&quot;) pkg ++
  zEncodeString module' ++ &quot;_&quot; ++ zEncodeString valsym ++ &quot;_closure&quot;
</code></pre>
<p>For the details of <code>prefixUnderscore</code> take a look at the
<a href="https://gist.github.com/cocreature/2e3ca5d921d08f8e0704b19b7dd186a6">complete code</a>. <code>GHCi</code>
also has a similar function called
<a href="https://github.com/ghc/ghc/blob/791947db6db32ef7d4772a821a0823e558e3c05b/compiler/ghci/ByteCodeLink.hs#L163"><code>nameToCLabel</code></a>
which can probably be used if you have a <code>Name</code> instead of dumb
strings.</p>
<p>To load our module we now only need to initialize the linker, load our
object file and lookup the symbol of the corresponding name.</p>
<pre><code class="language-haskell">main :: IO ()
main =
  do initObjLinker
     loadObj &quot;plugin.o&quot;
     _ret &lt;- resolveObjs
     ptr  &lt;- lookupSymbol (mangleSymbol Nothing &quot;Plugin&quot; &quot;f&quot;)
     case ptr of
       Nothing         -&gt; putStrLn &quot;Couldn’t load symbol&quot;
       Just (Ptr addr) -&gt; case addrToAny# addr of
                                 (# hval #) -&gt; putStrLn hval
</code></pre>
<p>If you are confused by <code>(# hval #)</code> that’s just syntax for unboxed
tuples. Also note that this is not at all typesafe. It is up to you to
ensure that the symbol has the correct type.</p>
<p>We can now compile the plugin module using <code>ghc plugin.hs</code> and our
main module using <code>ghc -package ghc test.hs</code>. However if we run <code>./test</code> we get a cryptic error:</p>
<pre><code class="language-none">test: plugin.o: unknown symbol `ghczmprim_GHCziCString_unpackCStringUtf8zh_closure'
zsh: segmentation fault (core dumped)  ./test
</code></pre>
<p>Why is this symbol not found, isn’t that a standard symbol that should
always be available? This is the point at which I gave up on my
previous tries.</p>
<p>Luckily GHC has a
<a href="https://github.com/ghc/ghc/blob/master/testsuite/tests/rts/rdynamic.hs">test</a>
doing something similar (I have no idea why I have not found it
before). The solution is to simply compile our executable using <code>ghc -package ghc -rdynamic test.hs</code>.</p>
<p>If we now run <code>test</code> we see the popular useful fact used to confuse
beginners (please don’t do that):</p>
<pre><code class="language-none">Monads are just monoids in the category of endofunctors, what’s the problem?
</code></pre>
<p>You can change the text in <code>plugin.hs</code>, recompile it and rerun
<code>./test</code> (notably without recompiling <code>test.hs</code>) and it will show the
new text.</p>
<p>Since I’ve never used <code>rdynamic</code> before I did a bit of digging. The
reason for the error is actually independent of Haskell. It turns out
that there is a so called <code>dynamic symbol table</code> in an <code>ELF</code>
executable. Dynamically loaded code can only access symbols in that
table. However by default not every symbol in the executable is added
to the <code>dynamic symbol table</code>. Passing <code>rdynamic</code> tells the linker to
add all symbols to that table no matter if they’re used or not. That
way the dynamically loaded module has access to it.</p>
<p>You can also unload a modul using <code>unloadObj</code>. Thanks to Simon Marlow
the GC then
<a href="https://phabricator.haskell.org/rGHCbdfefb3b72a71cd0afca6e7766456c0d97c47c86">unloads the object code</a>.</p>
<p>Sadly I could only test this on Linux so I have no idea if it works on
Windows or OS X.</p>
<p>I hope this is useful for someone and look forward to see if and what
people use it for.</p>

]]></summary>
</entry>
<entry>
    <title>Deriving a Servant Schema from your Data</title>
    <link href="http://purelyfunctional.org/posts/2016-01-01-servant-from-data.html" />
    <id>http://purelyfunctional.org/posts/2016-01-01-servant-from-data.html</id>
    <published>2016-01-01T00:00:00Z</published>
    <updated>2016-01-01T00:00:00Z</updated>
    <summary type="html"><![CDATA[<div class="info">
    Posted on January  1, 2016
    
</div>

<p>This post assumes some level of familiarity with the “modern Haskell
extension zoo” in particular <code>DataKinds</code>, <code>PolyKinds</code> and <code>TypeFamilies</code>.</p>
<h1>Basic Setup</h1>
<p>The scenario we are in is a bunch of static data that determines
which routes are valid and which aren’t. I got the idea for this post
while working on documentation for
<a href="https://github.com/haskell/haskell-ide-engine.git">haskell-ide-engine</a>
using
<a href="https://hackage.haskell.org/package/servant-swagger">servant-swagger</a>. I
simplify the code to make it independent of
<code>hie</code>. So <code>haskell-ide-engine</code> has a list of plugins each having a
list of commands. You can then make requests to <code>/plugin/command</code>
passing all additional parameters via a JSON object. Here a Command
consists of a name and a response that we send back when we get a
request. Let’s take a look at the types</p>
<pre><code class="language-haskell">data Command =
  Command {cmdName :: T.Text, response :: T.Text}
data Plugin = Plugin { cmds :: [Command]}
type Plugins = M.Map T.Text Plugin
</code></pre>
<p>The static data (it’s important that it’s static) looks as follows</p>
<pre><code class="language-haskell">plugin1 :: Plugin
plugin1 =
  Plugin {cmds =
            [Command &quot;cmd1.1&quot; &quot;cmd1.1 response&quot;
            ,Command &quot;cmd1.2&quot; &quot;cmd1.2 response&quot;]}

plugin2 :: Plugin
plugin2 =
  Plugin {cmds =
            [Command &quot;cmd2.1&quot; &quot;cmd2.1 response&quot;
            ,Command &quot;cmd2.2&quot; &quot;cmd2.2 response&quot;]}

pluginList :: Plugins
pluginList = M.fromList [(&quot;plugin1&quot;,plugin1),(&quot;plugin2&quot;,plugin2)]
</code></pre>
<p>Now we take a look at the corresponding servant schema and the handlers</p>
<pre><code class="language-haskell">type CommandName = T.Text
type PluginName = T.Text
type Param = T.Text
type ParamMap = M.Map T.Text T.Text
type API = Capture &quot;plugin&quot; PluginName :&gt;
           Capture &quot;command&quot; CommandName :&gt;
           ReqBody '[JSON] ParamMap :&gt;
           Post '[JSON] T.Text

lookupCommandResponse :: CommandName -&gt; [Command] -&gt; Maybe T.Text
lookupCommandResponse name =
  fmap response . find (\(Command name' _) -&gt; name == name')

server :: Server API
server plugin command params =
  case lookupCommandResponse command . cmds =&lt;&lt;
             M.lookup plugin pluginList of
    Nothing -&gt; left err404
    Just r -&gt; pure r
</code></pre>
<p>Nothing fancy going on here, we have a single route, which captures the
plugin and the command name and extracts a map of parameters from the
request body. We won’t use that map here. It’s just there to show how
this can be extended to something useful. Once we have the names we
just do a lookup returning the response if it was successful or a 404 otherwise.</p>
<h1>The Problem</h1>
<p>Obviously, the above approach works just fine but there is (at least)
one problem: Even though we know all plugins and commands at compile
time, we don’t tell servant about them. At a first glance this might
not be so bad, but if you want to generate documentation or
client bindings for that API, using something like <code>servant-swagger</code>
this is pretty bad. The documentation you can generate from a single
route with two parameters is less useful than it needs to be. Wouldn’t
it be great if we could teach servant about the existing plugins and
commands and thereby profit a lot more from the cool documentation and
binding generation servant provides?</p>
<h1>Generating the Schema</h1>
<p>Since the servant API is defined at the type level, we need to move the
names to the type level too. Luckily GHC provides the <code>GHC.TypeLits</code>
module for type level strings, and we can also reflect them back to the
value level. So let’s make a type level representation of plugin</p>
<pre><code class="language-haskell">data PluginText = PluginText Symbol [Symbol]
</code></pre>
<p><code>Symbol</code> is the equivalent of <code>String</code> at the type level. Using
<code>DataKinds</code> we get a <code>PluginType</code> kind and a <code>'PluginType</code> type constructor.
Now we need to create a valid servant schema from a list of these.
For that, we need to do induction on type level lists, so it’s nice to
have a base case, which we’ll call <code>Fail</code> for <code>:&lt;|&gt;</code>, that always fails.
This base case or identity gives us some sort of
monoid structure with <code>:&lt;|&gt;</code> being a type level <code>mappend</code> and <code>Fail</code>
being <code>mempty</code>. Note that <code>:&lt;|&gt;</code> is not strictly associative since <code>(a :&lt;|&gt; b) :&lt;|&gt; c</code> is a different type than <code>a :&lt;|&gt; (b :&lt;|&gt; c)</code>, but that
doesn’t make a difference in our case.</p>
<pre><code class="language-haskell">data Fail = Fail

instance HasServer Fail where
  type ServerT Fail m = Fail
  route _ _ _ f = f (failWith NotFound)
</code></pre>
<p>There is nothing that interesting going on, just note that we have to
fill in <code>Fail</code> on the value level for <code>Fail</code> on the type level.
Equipped with the identity for <code>:&lt;|&gt;</code>, we may move on.
Given a command as a symbol, we just use a type synonym to create a
route for it</p>
<pre><code class="language-haskell">type CommandRoute cmd = cmd :&gt; ReqBody '[JSON] ParamMap :&gt;
  Post '[JSON] T.Text
</code></pre>
<p>So what do we do if we have a list of command names? On the value
level, we just create a function and recurse on the
list. Luckily we have functions on the type level called
<code>TypeFamilies</code> so let’s use that:</p>
<pre><code class="language-haskell">type family CommandRoutes list where
  CommandRoutes '[] = Fail
  CommandRoutes (cmd ': cmds) = CommandRoute cmd :&lt;|&gt;
                                CommandRoutes cmds
</code></pre>
<p>Now that we can route a list of commands, we’ll think about how the
schema for a plugin should look. Let’s assume we already have the route for all
the commands. Now it’s simply a case of prepending the plugin name:</p>
<pre><code class="language-haskell">type PluginRoute plugin cmdRoutes = plugin :&gt; cmdRoutes
</code></pre>
<p>So finally, let’s convert a list of <code>PluginType</code>s to a servant
schema. We already have all the building blocks, so it’s fairly easy:</p>
<pre><code class="language-haskell">type family PluginRoutes list where
  PluginRoutes ('PluginType name cmds ': xs)
     = (PluginRoute name (CommandRoutes cmds)) :&lt;|&gt; PluginRoutes xs
  PluginRoutes '[] = Fail
</code></pre>
<h1>Generating the Servant Handlers</h1>
<p>So now we know how to get to the servant schema, but we also need the
handlers that deal with the commands. How can we get from a type level
list of <code>PluginType</code>s to an implementation? Type classes! We just do
induction on the lists using (the value level) <code>Fail</code>
as the base case and combining the cases using (the value level)
<code>:&lt;|&gt;</code>:</p>
<pre><code class="language-haskell">
class HieServer (list :: [PluginType])  where
  hieServer
    :: Proxy list -&gt; Server (PluginRoutes list)

instance HieServer '[] where
  hieServer _ = Fail

instance (KnownSymbol plugin,CommandServer cmds,HieServer xs)
          =&gt; HieServer ('PluginType plugin cmds ': xs) where
  hieServer _ =
    pluginHandler :&lt;|&gt; hieServer (Proxy :: Proxy xs)
    where pluginHandler
            :: Server (PluginRoute plugin (CommandRoutes cmds))
          pluginHandler =
            cmdServer (T.pack $ symbolVal (Proxy :: Proxy plugin))
                      (Proxy :: Proxy cmds)

class CommandServer (list :: [Symbol])  where
  cmdServer
    :: T.Text -&gt; Proxy list -&gt; Server (CommandRoutes list)

instance CommandServer '[] where
  cmdServer _ _ = Fail

instance (KnownSymbol x,CommandServer xs)
  =&gt; CommandServer (x ': xs) where
  cmdServer plugin _ =
    cmdHandler plugin
               (Proxy :: Proxy x) :&lt;|&gt;
    (cmdServer plugin (Proxy :: Proxy xs))

cmdHandler
  :: KnownSymbol x =&gt; T.Text -&gt; Proxy x -&gt; Server (CommandRoute x)
cmdHandler plugin cmd reqVal =
  case lookupCommandResponse cmd' . cmds =&lt;&lt;
             M.lookup plugin pluginList of
    Nothing -&gt; left err404
    Just r -&gt; pure r
    where cmd' = T.pack $ symbolVal cmd
</code></pre>
<h1>Moving command and plugin names to the type level</h1>
<p>We want to preserve the data representation we have right now since
there might be a lot of code that uses it and shoving around stuff
with complicated types is often not trivial, e.g. you need to hide
arguments in an existential to put it in a map. It would be great if
we could just tag our existing <code>Command</code> type with a <code>Symbol</code>. That’s
exactly what <code>Const</code> is for. There is a small problem here: <code>Const</code> in
GHC 7.10 is not polykinded, so we can’t use a <code>Symbol</code> here (in GHC 8.0
it will be polykinded). Luckily
<a href="https://hackage.haskell.org/package/vinyl">vinyl</a> provides a
polykinded <code>Const</code> in <code>Data.Vinyl.Functor</code>. Let’s build a function to
create a tagged command:</p>
<pre><code class="language-haskell">buildCommand
  :: KnownSymbol s
  =&gt; Proxy s -&gt; T.Text -&gt; Vinyl.Const Command s
buildCommand name response =
  Vinyl.Const (Command (T.pack $ symbolVal name) response)
</code></pre>
<p>We use the <code>KnownSymbol</code> type class to reflect the string back to the
value level. The <code>Proxy</code> here is not actually needed, but I found it
more intuitive to specify the type in the arguments.
Now we have a slight problem: we no longer have a list of <code>Commands</code>
but a list of <code>Vinyl.Const Command s</code> with the s being different for
every <code>Command</code>. Since the standard haskell list is uniform, we can’t
use that anymore. Again Vinyl saves us by providing a
<a href="https://hackage.haskell.org/package/vinyl-0.5.1/docs/Data-Vinyl-Core.html#t:Rec">Rec type</a>,
which takes data that varies in the last type parameter and keeps
track of those parameters in a type level list. Since we want to
preserve the original representation we pull out the type of the
commands giving us</p>
<pre><code class="language-haskell">data Plugin cmds = Plugin { cmds :: cmds }

type UntaggedPlugin = Plugin [Command]
type TaggedPlugin cmds = Plugin (Vinyl.Rec (Vinyl.Const Command)
                                           cmds)
</code></pre>
<p>We need to slightly change our data</p>
<pre><code class="language-haskell">plugin1 :: TaggedPlugin '[&quot;cmd1.1&quot;,&quot;cmd1.2&quot;]
plugin1 = Plugin (buildCommand (Proxy :: Proxy &quot;cmd1.1&quot;)
                               &quot;cmd1.1 response&quot;
         Vinyl.:&amp; buildCommand (Proxy :: Proxy &quot;cmd1.2&quot;)
                               &quot;cmd1.2 response&quot;
         Vinyl.:&amp; Vinyl.RNil)
</code></pre>
<p>We still don’t have the plugin name.
Let’s see where we want to go and work our way backwards from there:</p>
<pre><code class="language-haskell">taggedPlugins :: Vinyl.Rec (Vinyl.Const (T.Text,UntaggedPlugin))
                 '[ 'PluginType &quot;plugin1&quot; _
                  , 'PluginType &quot;plugin2&quot; _]
taggedPlugins = tag plugin1 Vinyl.:&amp; tag plugin2
                            Vinyl.:&amp; Vinyl.RNil
</code></pre>
<p>The underscores represent the list of command names. You can either
write them here manually or use <code>PartialTypeSignatures</code> to let GHC
infer them for you if you are lazy like me.
Once we have this type, we can use <code>Vinyl.recordToList</code> to get our
original value level representation:</p>
<pre><code class="language-haskell">pluginList :: Plugins
pluginList = M.fromList $ Vinyl.recordToList taggedPlugins
</code></pre>
<p>So what should tag do? We’re going to define that in two steps: first we
wrap it in another layer of <code>Const</code>, this time adding the plugin
name. Then we smash them together, giving us a <code>PluginType</code> type parameter.</p>
<pre><code class="language-haskell">untagPlugin :: TaggedPlugin cmds -&gt; UntaggedPlugin
untagPlugin (Plugin cmds) = Plugin $ Vinyl.recordToList cmds

retagPlugin
  :: forall name cmds.
     KnownSymbol name
  =&gt; Vinyl.Const (TaggedPlugin cmds) name
  -&gt; Vinyl.Const (T.Text,UntaggedPlugin)
                 ('PluginType name cmds)
retagPlugin (Vinyl.Const desc) =
  Vinyl.Const $
  (T.pack $ symbolVal (Proxy :: Proxy name),untagPlugin desc)

type NamedPlugin name cmds = Vinyl.Const UntaggedPlugin
                                         ('PluginType name cmds)

tag
  :: KnownSymbol name
  =&gt; TaggedPlugin cmds
  -&gt; Vinyl.Const (T.Text,UntaggedPlugin) ('PluginType name cmds)
tag = retagPlugin . Vinyl.Const
</code></pre>
<p>Hold tight we’re almost done! All that’s left is to throw away the
data from the <code>Rec</code> type and make a <code>Proxy</code> out of it.</p>
<pre><code class="language-haskell">recProxy :: Vinyl.Rec f t -&gt; Proxy t
recProxy _ = Proxy
</code></pre>
<p>So finally we can serve our API</p>
<pre><code class="language-haskell">serveAPI :: forall plugins.
            (HieServer plugins,HasServer (PluginRoutes plugins))
         =&gt; Proxy plugins -&gt; IO ()
serveAPI plugins = run 8080 $ serve
  (Proxy :: Proxy (PluginRoutes plugins)) (hieServer plugins)

servePlugins :: IO ()
servePlugins = serveAPI (recProxy taggedPlugins)
</code></pre>
<h1>Conclusion</h1>
<p>To profit from servant’s full potential, you need to move as much
information as possible into your API declaration. It might look like
a fair amount of work, but considering you now get documentation &amp;
client bindings that might actually be useful, I think it’s worth a
trouble (also it’s a lot of fun :)).</p>
<p>You can find the full code on
<a href="https://gist.github.com/cocreature/86702eae354f37f0ed8a">github</a>.</p>
<p>If you are interested, the PR adding this to <code>haskell-ide-engine</code> can
be found <a href="https://github.com/haskell/haskell-ide-engine/pull/152">here</a>.</p>

]]></summary>
</entry>

</feed>
