A compiler for functional programs on serialized data
Gibbon is an experimental compiler that transforms high-level functional programs
to operate on serialized data.
Typically, programs that process tree-like data represent trees using pointer-based
data structures in memory (one heap object per-leaf and per-node) because such a
layout is convenient to manipulate in a high-level programming language.
This is also generally distinct from the representation of the data in
serialized form on disk,
which means that a program must perform some sort or marshaling when working with serialized data.
Gibbon unifies the in-memory and serialized formats, transforming recursive
functions to operate directly on serialized data.
Additionally, while the pointer-based structure is efficient
for random access and shape-changing modifications, it can be inefficient
for traversals that process most or all of a tree in bulk.
The Gibbon project aims to explore optimizations of recursive tree transforms
by changing how trees are stored in memory.
Currently, the Gibbon compiler has multiple front-ends: an s-expression synax
similar to Typed Racket, and a small subset of Haskell.
Gibbon is implemented in Haskell, and is set up to be built with
Cabal, but it has a number of native dependencies.
Follow the instructions below to get all dependencies or enter the Nix shell
with nix-shell
to get them via Nix.
$ sudo apt-get update
$ sudo apt-get install software-properties-common
$ sudo apt-get install libgc-dev
$ sudo apt-get install libgmp-dev
$ sudo apt-get install build-essential
$ sudo apt-get install uthash-dev
$ sudo apt-get install vim wget curl
$ wget --no-check-certificate https://mirror.racket-lang.org/installers/7.5/racket-7.5-x86_64-linux.sh
$ chmod +x racket-7.5-x86_64-linux.sh
$ ./racket-7.5-x86_64-linux.sh
curl --proto '=https' --tlsv1.2 -sSf https://get-ghcup.haskell.org | BOOTSTRAP_HASKELL_NONINTERACTIVE=1 BOOTSTRAP_HASKELL_GHC_VERSION=9.4.6 BOOTSTRAP_HASKELL_CABAL_VERSION=3.8.1.0 BOOTSTRAP_HASKELL_INSTALL_STACK=1 BOOTSTRAP_HASKELL_INSTALL_HLS=1 BOOTSTRAP_HASKELL_ADJUST_BASHRC=P sh
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh -s -- -y --default-toolchain=1.71.0
Add paths for cabal, ghcup, rust to bashrc
For Building on OSX:
You can install some of the dependencies using Homebrew:
$ brew install libgc gmp gcc ghc@9
Others require a few extra steps:
Racket: Follow the instructions on it’s website
uthash: Clone the repository and copy all the .h
files in src
to /usr/local/include
After you have both Cabal and all the dependencies installed, you can build
Gibbon from source:
$ git clone https://github.com/iu-parfunc/gibbon
$ cd gibbon && source set_env.sh
$ cd gibbon-compiler && cabal v2-build
At this point you can run the Gibbon executable:
$ cabal v2-run gibbon -- -h
If you’d like to run the testsuite, you can do so with:
$ ./run_all_tests.sh
To build the Dockerfile for dev purposes run the command below from the gibbon directory.
DOCKER_BUILDKIT=1 docker image build -t gibbon -f .devcontainer/Dockerfile .
Run the docker image using the following command.
docker run -t -i gibbon
This image does not pre-populate the gibbon folder. Use git clone to clone gibbon into a folder.
Use instructions from before to build gibbon.
To build an image with the gibbon source code already in the image run
DOCKER_BUILDKIT=1 docker image build -t gibbon -f .artifact/Dockerfile .
Run the container with
docker run -t -i gibbon
This has the gibbon source code avaiable in /gibbon
A valid Gibbon program can be written using Haskell syntax or using Racket-like s-expression syntax.
Gibbon doesn’t support every Haskell feature supported by GHC,
but informally, many simple Haskell-98 programs (sans monads) are valid Gibbon programs.
One thing to note is that the main point of entry for a Gibbon program is a
function named gibbon_main
, as opposed to the usual main
.
Here’s a simple Gibbon program that builds a binary tree and sums up its leaves in parallel
using a parallel tuple (par
):
module Main where
data Tree = Leaf Int
| Node Int Tree Tree
mkTree :: Int -> Tree
mkTree i =
if i <= 0
then Leaf 1
else
let x = (mkTree (i-1))
y = (mkTree (i-1))
in Node i x y
sumTree :: Tree -> Int
sumTree foo =
case foo of
Leaf i -> i
Node i a b ->
let tup = par (sumTree a) (sumTree b)
x = fst tup
y = snd tup
in x + y
gibbon_main = sumTree (mkTree 10)
The Gibbon compiler is able to run in several modes, which are configured via command line flags.
Most important are the flags --packed
which means “packed mode” (use serialized data structures),
--run
which means “compile then run”, and --parallel
which means “enable parallel execution”.
You can use these to run the above program as follows:
$ gibbon --run --packed --parallel Bintree.hs
This creates a file Bintree.c
which contains the C-code,
and a Bintree.exe
which is the executable for this program.
Running ./Bintree.exe
prints 1024
, the value of sumTree (mkTree 10)
.
There are many other Gibbon features which can be learned by looking at the
programs under ./examples/parallel/
, and more flags
which can be printed with gibbon --help
.
To view a complete set of primitives supported by Gibbon, you can look at the Gibbon.Prim
module located at gibbon/gibbon-stdlib/Gibbon/Prim.hs
.
This primarily stores the Gibbon
compiler, an implementation of a high-performance functional language.
This repository also contains a collection of sub-projects related to
benchmarking tree traversals and performing tree traversals on packed
representations. Here is a guide to the subdirectories:
gibbon-compiler - the prototype compiler for the Gibbon language of packed tree traversals.
gibbon - a Racket #lang for Gibbon.
ASTBenchmarks - benchmark of treewalks (compiler passes) on ASTs written with Gibbon.
Also includes scripts to fetch input datasets.
BintreeBench
- a submodule containing the tiniest binary tree microbenchmark, implemented several different languages and compilers.
core-harvest - tools to harvest realistic, large ASTs (mainly Racket) from the wild.
DEVLOG.md - detailed documentation for those hacking on this repository.