The FastPFOR C++ library: Fast integer compression
A research library with integer compression schemes.
It is broadly applicable to the compression of arrays of
32-bit integers where most integers are small.
The library seeks to exploit SIMD instructions (SSE)
whenever possible.
This library can decode at least 4 billions of compressed integers per second on most
desktop or laptop processors. That is, it can decompress data at a rate of 15 GB/s.
This is significantly faster than generic codecs like gzip, LZO, Snappy or LZ4.
It is used by the zsearch engine
as well as in GMAP and GSNAP. DuckDB derived some of their code from this library It
has been ported to Java,
C# and
Go. The Java port is used by
ClueWeb Tools.
Apache Lucene version 4.6.x uses a compression format derived from our FastPFOR
scheme.
Myth: SIMD compression requires very large blocks of integers (1024 or more).
Fact: This is not true. Our fastest scheme (SIMDBinaryPacking) works over blocks of 128 integers.
Another very fast scheme (Stream VByte) works over blocks of four integers.
Myth: SIMD compression means high speed but less compression.
Fact: This is wrong. Some schemes cannot easily be accelerated
with SIMD instructions, but many that do compress very well.
If you are working primarily with sorted lists of integers, then
you might want to use differential coding. That is you may want to
compress the deltas instead of the integers themselves. The current
library (fastpfor) is generic and was not optimized for this purpose.
However, we have another library designed to compress sorted integer
lists:
https://github.com/lemire/SIMDCompressionAndIntersection
This other library (SIMDCompressionAndIntersection) also comes complete
with new SIMD-based intersection algorithms.
There is also a C library for differential coding (fast computation of
deltas, and recovery from deltas):
https://github.com/lemire/FastDifferentialCoding
For a simple example, please see
example.cpp
in the root directory of this project.
Please see:
This library was used by several papers including the following:
It has also inspired related work such as…
This code is licensed under Apache License, Version 2.0 (ASL2.0).
This code requires a compiler supporting C++11. This was
a design decision.
It builds under
The code was tested under Windows, Linux and MacOS.
On an x64 platform, your processor should support SSSE3. This includes almost every Intel or AMD processor
sold after 2006. (Note: the key schemes require merely SSE2.) Some specific binaries will only run if your processor
supports SSE4.1. They have been purely used for specific tests however.
We also support ARM platforms through SIMDe, by wrapping.
You need cmake. On most linux distributions, you can simply do the following:
git clone https://github.com/lemire/FastPFor.git
cd FastPFor
mkdir build
cd build
cmake ..
cmake --build .
It may be necessary to set the CXX variable. The project is installable (make install
works).
To create project files for Microsoft Visual Studio, it might be useful to target 64-bit Windows (e.g., see http://www.cmake.org/cmake/help/v3.0/generator/Visual Studio 12 2013.html).
You should not assume that our objects are thread safe.
If you have several threads, each thread should have its own IntegerCODEC
objects to ensure that there is no concurrency problems.
With minor changes, all schemes will compile fine under
compilers that do not support C++11. And porting the code
to C should not be a challenge.
In any case, we already support 3 major C++ compilers so portability
is not a major issue.
Many schemes cannot be efficiently ported to Java. However
some have been. Please see:
https://github.com/lemire/JavaFastPFOR
See CSharpFastPFOR: A C# integer compression library https://github.com/Genbox/CSharpFastPFOR
See Encoding: Integer Compression Libraries for Go https://github.com/zhenjl/encoding
If you used CMake to generate the build files, the check
target will
run the unit tests. For example , if you generated Unix Makefiles
make check
will do it.
make codecs
./codecs --clusterdynamic
./codecs --uniformdynamic
Typing “make allallall” will install some testing binaries that depend
on Google Snappy. If you want to build these, you need to install
Google snappy. You can do so on a recent ubuntu machine as:
sudo apt-get install libsnappy-dev
Typing “make” will generate an “inmemorybenchmark”
executable that can process data files.
You can use it to process arrays on (sorted!) integers
on disk using the following 32-bit format: 1 unsigned 32-bit
integer indicating array length followed by the corresponding
number of 32-bit integer. Repeat.
( It is assumed that the integers are sorted.)
Once you have such a binary file somefilename you can
process it with our inmemorybenchmark:
./inmemorybenchmark --minlength 10000 somefilename
The “minlength” flag skips short arrays. (Warning: timings over
short arrays are unreliable.)
As of April 2014, we recommend getting our archive at
http://lemire.me/data/integercompression2014.html
It is the data was used for the following paper:
Daniel Lemire, Leonid Boytsov, Nathan Kurz, SIMD Compression and the Intersection of Sorted Integers, arXiv: 1401.6399, 2014
http://arxiv.org/abs/1401.6399
Our code is thoroughly tested.
One common issue is that people do not provide large enough buffers.
Some schemes can have such small compression rates that the compressed data
generated will be much larger than the input data.
I (D. Lemire) did not patent anything.
However, we implemented varint-G8UI which was patented by its authors.
DO NOT use varint-G8UI if you want to avoid patents.
The rest of the library should be patent-free.
This work was supported by NSERC grant number 26143.