Minimal (34MB) LLVM build with C API

I've hesitated to depend on LLVM because a full build is ~500MB. My runtime is 2.7MB, so LLVM is a huge dependency, and it takes several hours to build. Here I show how to build it and get it down to 34MB in size.

There are a few other options I am aware of, but few of them are as complete or labour-free as LLVM. Rolling your own isn't easy, as LLVM's documentation is partial and we don't have a proper instruction database with all the necessary information for Intel machines. The closest you get is tablegen and EXEgesis.

34MB is still large and I don't like depending on C++. But at some point you see that all these side quests eat time from the things you were there to do in the first place.

Don't let Intel eat out of your bowl. Don't accept working for them without getting paid. Don't roll your own optimizing compiler for them. Use LLVM.

The build instructions

Retrieve the LLVM sources. I tried this with LLVM version 6.0.0.

cmake \
    -DCMAKE_INSTALL_PREFIX="../llvm" \
    -DCMAKE_BUILD_TYPE="MinSizeRel" \
    ...
cmake --build . --target install

For C projects you may need llvm-config. You get it by running the following, though note that you need to copy it from the build directory yourself:

cmake --build . --target llvm-config

Build instructions dissected

So what does each cmake option do? The backslash at the end of a line continues the bash command onto the next line.

Selects the directory where LLVM is installed to:

    -DCMAKE_INSTALL_PREFIX="../llvm"

Selects the build type:

    -DCMAKE_BUILD_TYPE="MinSizeRel"

Builds only the target for the host:


Skips building the tools, which shortens the build time:


Configures the shared library (DYLIB) to be built; it is not built by default:


The last argument is the source code directory. We are in the build/ directory, and the results go into llvm/. It's good practice with cmake to use a separate build directory.
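
The options dissected above can be collected into one configure command. The following is a minimal sketch in Python that only assembles the argument list and runs nothing. The install prefix and build type come from the command earlier in this post. The three LLVM_* option names are standard LLVM CMake options that are assumed to match the descriptions above, and the source path `../llvm-src` is a placeholder, so substitute wherever you unpacked the sources.

```python
import shlex


def configure_command(source_dir, install_prefix="../llvm"):
    """Return the cmake configure command as an argv list; nothing is executed."""
    options = {
        "CMAKE_INSTALL_PREFIX": install_prefix,  # from the post: where install puts the results
        "CMAKE_BUILD_TYPE": "MinSizeRel",        # from the post: optimize for size
        "LLVM_TARGETS_TO_BUILD": "host",         # assumption: only the host backend
        "LLVM_BUILD_TOOLS": "OFF",               # assumption: do not build the tools
        "LLVM_BUILD_LLVM_DYLIB": "ON",           # assumption: build the shared library
    }
    argv = ["cmake"]
    argv += ["-D{}={}".format(name, value) for name, value in options.items()]
    argv.append(source_dir)  # the source directory goes last
    return argv


if __name__ == "__main__":
    # "../llvm-src" is a hypothetical path used only for illustration.
    print(" ".join(shlex.quote(arg) for arg in configure_command("../llvm-src")))
```

Printing the command instead of running it makes it easy to compare against what you actually typed before committing to a long build.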


How does this compare?

If you build LLVM without any parameters and request the DYLIB, the result is 1.2GB. It would appear, though, that when Linux maintainers package software they pick the "small" version right away. That may also be true of prepackaged LLVM binaries.

So what I do here could be considered "normal use" of LLVM. I still have my abhorrence toward the project due to its size, and it appears to be justified. Even with 31 target architectures dialed in, 1.2GB is a lot of space for just a compiler backend.
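
If you want to check figures like these against your own build, summing the file sizes under the install directory is enough. This is a small sketch, not the method used to get the numbers in this post. `install_size_mb` is a hypothetical helper name, and `../llvm` is the install prefix from the build command.

```python
import os


def install_size_mb(root):
    """Sum the sizes of regular files under root, in megabytes (10**6 bytes)."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Skip symlinks so a library with several symlinked names is counted once.
            if not os.path.islink(path):
                total += os.path.getsize(path)
    return total / 1e6


if __name__ == "__main__":
    prefix = "../llvm"
    if os.path.isdir(prefix):
        print("{:.1f} MB".format(install_size_mb(prefix)))
```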

You might consider rolling your own optimizing compiler backend. You may be encouraged by the fact that it can be an exciting project. Let's say it would take you a month. That is definitely not enough time, but it is what I usually expect to spend on this kind of project. Would you rather use that time on the thing that you need an optimizing compiler for?

An hour of your time is worth more than a 4GB memory stick by now, and this gets more true every year. Pollsters and telemarketers may not value your time as much, but you probably value yours. Don't let Intel architecture eat your time!

Testing that it works

I tried out the resulting binaries with the C-API example in Paul Smith's blog post and concluded that the application works. To use the execution engine's FFI interface, though, you may have to enable something else, such as linking with libffi.

Though if you already have a C-FFI in your language, which really maximizes what you can do with LLVM, you can use this build directly in your project.

If you need the JSON-encoded LLVM-C headers, I provide a utility to obtain them through my cffi-gen project. It's in that project's repository.

Note that some popular languages already have LLVM bindings that work, so for your language you may want to take a look there before starting to work on a utility that uses my binding generator. (Though improvements and effort to make the format provided by cffi-gen more useful are welcome.)

Demonstration in Lever

I replicated Paul Smith's LLVM demonstration in the Lever programming language. Here's the code for that demonstration.

First we load the LLVM C API from the JSON file you can produce with cffi-gen.

llvm6 = api.read_file(dir ++ "llvm6.json")

Then we load the library itself:

llvm = ffi.library("../llvm/lib/", llvm6)

Next populate the compilation unit:

module = llvm.ModuleCreateWithName("my_module")
param_types = [llvm.Int32Type(), llvm.Int32Type()]
ret_type = llvm.FunctionType(llvm.Int32Type(), param_types, 2, 0)
sum = llvm.AddFunction(module, "sum", ret_type)
entry = llvm.AppendBasicBlock(sum, "entry")
builder = llvm.CreateBuilder()
llvm.PositionBuilderAtEnd(builder, entry)
tmp = llvm.BuildAdd(builder,
    llvm.GetParam(sum, 0),
    llvm.GetParam(sum, 1), "tmp")
llvm.BuildRet(builder, tmp)

Running the compilation unit through the verifier:

error_ref = ffi.automem(ffi.pointer(ffi.char), 1, true)
llvm.VerifyModule(module, llvm.AbortProcessAction, error_ref)

To produce code, we first need to initialize some utilities in LLVM:



That was the platform-specific part of this program. There would be simpler, portable commands to do this, but they're inlined functions that aren't easy to export.

The execution engine converts the compilation unit into actual code that we can call:

engine = ffi.automem(llvm.ExecutionEngineRef)
engine.to = null
if llvm.CreateExecutionEngineForModule(engine, module, error_ref) != 0
    raise Error("Failed to create execution engine")
if error_ref.to != null
    raise Error('execution engine failure')

There are utilities for filling in arguments and calling the function, but I probably forgot to load something they need to work properly. Fortunately you can also obtain the address of a function and use your own FFI to call it:

address = llvm.GetFunctionAddress(engine.to, "sum")
functype = ffi.cfunc(ffi.i32, [ffi.i32, ffi.i32])
voidpp = ffi.automem(ffi.long, 1)
voidpp.to = address
sum_fn = ffi.cast(voidpp, ffi.pointer(functype)).to

print("result:", sum_fn(10, 15))
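
Turning a raw address into a callable is not specific to Lever's FFI. As an illustration only, here is a sketch of the same step using Python's ctypes. It doesn't load LLVM: a ctypes callback stands in for the JIT-compiled "sum", and `call_at` is a hypothetical helper name.

```python
import ctypes

# C signature int32_t (*)(int32_t, int32_t), matching the "sum" function built above.
SUM_TYPE = ctypes.CFUNCTYPE(ctypes.c_int32, ctypes.c_int32, ctypes.c_int32)


def call_at(address, a, b):
    """Treat an integer address as a function of type SUM_TYPE and call it."""
    fn = SUM_TYPE(address)  # a prototype called with an integer wraps that address
    return fn(a, b)


# Stand-in for the JIT output: a Python function exposed as a C-callable thunk.
# The object must stay referenced for as long as its address is in use.
_stand_in = SUM_TYPE(lambda a, b: a + b)
address = ctypes.cast(_stand_in, ctypes.c_void_p).value

print("result:", call_at(address, 10, 15))
```

With a real execution engine, the integer returned by the address lookup would take the place of `address` here. The declared signature has to match the generated function exactly, since nothing checks it for you.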

I am primarily interested in evaluating programs so I don't care much about the LLVM bitcode, but it can be used:

# Write out bitcode to file
if llvm.WriteBitcodeToFile(module, "sum.bc") != 0
    print("error writing bitcode to file, skipping")
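
Since this build leaves out the LLVM tools, there is no disassembler at hand to inspect sum.bc. A cheap sanity check is to look at the first four bytes: a raw bitcode file starts with the magic number 'B', 'C', 0xC0, 0xDE. This is a small sketch in Python with `looks_like_bitcode` as a hypothetical helper name. It only checks the header and does not validate the contents.

```python
import os

# Raw LLVM bitcode files begin with these four bytes.
BITCODE_MAGIC = b"BC\xc0\xde"


def looks_like_bitcode(path):
    """Return True if the file at path starts with the raw bitcode magic number."""
    with open(path, "rb") as f:
        return f.read(4) == BITCODE_MAGIC


if __name__ == "__main__":
    if os.path.exists("sum.bc"):
        print("sum.bc looks like bitcode:", looks_like_bitcode("sum.bc"))
```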

Although Lever is an object-based language with a GC running, it is unable to clear FFI-allocated handles without help. Remember to clean up:



I would have spent a lot less time contemplating things if I had done this when I was optimizing Lever's PNG loading. The results would probably have been more impressive, too.

I still have things to do with the next version of Lever, but generally I have made a lot of progress on usefully type-inferring code written in the language to extract more invariants.

The story of ispc made me realize that even ordinary computing platforms have been parallelised for years now, and that with SPMD we can use that power without portability issues between PC generations or having to write intrinsics everywhere. The ways to do this have been documented in ispc itself.
