A blocking, shuffling and lossless compression library that can be faster than `memcpy()`.
Author | Contact | URL |
---|---|---|
Blosc Development Team | [email protected] | https://www.blosc.org |
Note: There is a more modern version of this package called C-Blosc2
which supports many more features and is more actively maintained. Visit it at:
https://github.com/Blosc/c-blosc2
Blosc is a high-performance compressor optimized for binary data.
It has been designed to transmit data to the processor cache faster
than the traditional, non-compressed, direct memory fetch approach via
a memcpy() call. Blosc is the first compressor (that I’m aware of)
that is meant not only to reduce the size of large datasets on disk or
in memory, but also to accelerate memory-bound computations.
It uses the blocking technique
to reduce activity on the memory bus as much as possible. In short, this
technique works by dividing datasets into blocks that are small enough
to fit in the caches of modern processors and performing compression /
decompression there. It also leverages, if available, SIMD
instructions (SSE2, AVX2) and the multi-threading capabilities of CPUs
to accelerate the compression / decompression process as much as
possible.
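The blocking idea can be illustrated with a minimal sketch. It is not the Blosc implementation: it uses Python's standard zlib module as a stand-in codec, and the `BLOCK_SIZE` value and the function names are hypothetical choices made for this example.

```python
import zlib

BLOCK_SIZE = 32 * 1024  # hypothetical block size; not a Blosc default


def compress_blocks(data: bytes, block_size: int = BLOCK_SIZE) -> list:
    """Split data into fixed-size blocks and compress each one independently."""
    return [
        zlib.compress(data[start:start + block_size])
        for start in range(0, len(data), block_size)
    ]


def decompress_blocks(blocks: list) -> bytes:
    """Decompress each block on its own and join the results in order."""
    return b"".join(zlib.decompress(block) for block in blocks)


payload = bytes(range(256)) * 1024  # 256 KiB of sample data
blocks = compress_blocks(payload)
restored = decompress_blocks(blocks)
```

Because every block is compressed on its own, each one can be processed while it sits in cache, and blocks can be handled independently of each other.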
See some benchmarks about Blosc performance.
Blosc is distributed using the BSD license, see LICENSE.txt for
details.
C-Blosc is not like other compressors: it should rather be called a
meta-compressor, because it can use different compressors
and filters (programs that generally improve compression ratio). At
any rate, it can also be called a compressor because
it already comes with several compressors and filters, so it can
actually work like a regular codec.
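As a rough illustration of what "meta-compressor" means, the sketch below dispatches to one of several codecs and records the choice in a header. Everything in it is an assumption made for the example: the codecs come from Python's standard library rather than from Blosc, and the one-byte header is hypothetical, not the Blosc frame format.

```python
import bz2
import lzma
import zlib

# Hypothetical codec table: one-byte id -> (compress, decompress).
CODECS = {
    0: (zlib.compress, zlib.decompress),
    1: (bz2.compress, bz2.decompress),
    2: (lzma.compress, lzma.decompress),
}


def meta_compress(data: bytes, codec_id: int) -> bytes:
    """Compress with the chosen codec and prefix the codec id as a header."""
    compress, _ = CODECS[codec_id]
    return bytes([codec_id]) + compress(data)


def meta_decompress(frame: bytes) -> bytes:
    """Read the codec id from the header and dispatch to that codec."""
    _, decompress = CODECS[frame[0]]
    return decompress(frame[1:])


sample = b"binary data " * 1000
frames = {codec_id: meta_compress(sample, codec_id) for codec_id in CODECS}
```

The caller only picks a codec id; decompression needs no extra argument because the frame itself says which codec produced it.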
Currently C-Blosc comes with support for BloscLZ (a compressor heavily
based on FastLZ, https://ariya.github.io/FastLZ/), LZ4 and LZ4HC
(https://lz4.org/), Snappy (https://google.github.io/snappy/),
Zlib (https://zlib.net/) and
Zstandard (https://facebook.github.io/zstd/).
C-Blosc also comes with highly optimized shuffle and bitshuffle filters,
which can use SSE2 or AVX2 instructions if available
(for info on how and why shuffling works see here).
Additional compressors or filters may be added in the future.
Blosc is in charge of coordinating the different compressors and
filters so that they can leverage the
blocking technique
as well as multi-threaded execution (if several cores are
available) automatically. As a result, every codec and filter
can work at very high speeds, even if it was not initially designed
for blocking or multi-threading.
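The following sketch shows how a coordinator can add multi-threading around a codec that knows nothing about threads. It is an illustration only, assuming zlib from the Python standard library as the codec; the function names, the 64 KiB block size and the worker count are hypothetical and do not reflect how Blosc schedules its threads.

```python
import zlib
from concurrent.futures import ThreadPoolExecutor


def split(data: bytes, block_size: int) -> list:
    """Cut data into consecutive blocks of at most block_size bytes."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def parallel_compress(data: bytes, block_size: int = 64 * 1024, workers: int = 4) -> list:
    """Compress blocks concurrently; map() yields results in input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(zlib.compress, split(data, block_size)))


def parallel_decompress(blocks: list, workers: int = 4) -> bytes:
    """Decompress blocks concurrently and join them in their original order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return b"".join(pool.map(zlib.decompress, blocks))


data = bytes(i % 251 for i in range(1_000_000))
compressed = parallel_compress(data)
roundtrip = parallel_decompress(compressed)
```

The codec itself is unchanged; splitting the input and distributing the blocks is done entirely by the surrounding code.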
Finally, C-Blosc is especially well suited to binary data because
it can take advantage of the type size meta-information for improved
compression ratio by using the integrated shuffle and bitshuffle filters.
When taken together, all these features set Blosc apart from other
compression libraries.
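To see why the type size matters, here is a minimal byte-shuffle sketch in pure Python. It is not the optimized SSE2/AVX2 filter shipped with C-Blosc: the `shuffle` and `unshuffle` helpers are hypothetical, zlib stands in for the codec, and the sketch assumes the buffer length is an exact multiple of the type size.

```python
import struct
import zlib


def shuffle(data: bytes, typesize: int) -> bytes:
    """Group byte 0 of every element, then byte 1, and so on."""
    assert len(data) % typesize == 0, "length must be a multiple of typesize"
    return b"".join(data[k::typesize] for k in range(typesize))


def unshuffle(data: bytes, typesize: int) -> bytes:
    """Invert shuffle(): interleave the byte planes back into elements."""
    count = len(data) // typesize
    out = bytearray(len(data))
    for k in range(typesize):
        out[k::typesize] = data[k * count:(k + 1) * count]
    return bytes(out)


values = list(range(10_000))
raw = struct.pack(f"<{len(values)}I", *values)  # 4-byte little-endian ints
shuffled = shuffle(raw, 4)

plain_size = len(zlib.compress(raw))        # codec applied to the raw buffer
shuffled_size = len(zlib.compress(shuffled))  # codec applied after the filter
```

For slowly varying integers like these, the high-order bytes of neighbouring elements are identical, so grouping them produces long runs that the codec compresses more easily than the interleaved original.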
Blosc can be built, tested and installed using CMake.
The following procedure describes the “out of source” build.
$ cd c-blosc
$ mkdir build
$ cd build
Now run CMake configuration and optionally specify the installation
directory (e.g. ‘/usr’ or ‘/usr/local’):
$ cmake -DCMAKE_INSTALL_PREFIX=your_install_prefix_directory ..
CMake allows you to configure Blosc in many different ways, such as preferring
internal or external sources for compressors or enabling/disabling
them. Please note that configuration can also be performed using the UI
tools provided by CMake (ccmake or cmake-gui):
$ ccmake .. # run a curses-based interface
$ cmake-gui .. # run a graphical interface
Build, test and install Blosc:
$ cmake --build .
$ ctest
$ cmake --build . --target install
The static and dynamic versions of the Blosc library, together with
the header files, will be installed into the specified
CMAKE_INSTALL_PREFIX.
C-Blosc comes with full sources for LZ4, LZ4HC, Snappy, Zlib and Zstd.
In general, you need not worry about these libraries being missing from
your system (or CMake not finding them), because by default the
included sources are automatically compiled and included in the
C-Blosc library. This effectively means that you can count on
complete support for all the codecs in all Blosc deployments
(unless you explicitly exclude support for some of them).
If you want to force Blosc to use external codec libraries instead of
the included sources, you can do so:
$ cmake -DPREFER_EXTERNAL_ZSTD=ON ..
You can also disable support for some compression libraries:
$ cmake -DDEACTIVATE_SNAPPY=ON .. # in case you don't have a C++ compiler
In the examples/ directory
you can find hints on how to use Blosc inside your app.
Blosc is meant to support all platforms where a C89-compliant C
compiler can be found. The most thoroughly tested ones are Intel
(Linux, Mac OSX and Windows) and ARM (Linux), but exotic ones such as the IBM
Blue Gene Q embedded “A2” processor are reported to work too.
If you run into compilation troubles when using Mac OSX, please make
sure that you have installed the command line developer tools. You
can always install them with:
$ xcode-select --install
Blosc has an official wrapper for Python. See:
https://github.com/Blosc/python-blosc
Blosc can be used from the command line by using Bloscpack. See:
https://github.com/Blosc/bloscpack
For those who want to use Blosc as a filter in the HDF5 library,
there is a sample implementation in the hdf5-blosc project at:
https://github.com/Blosc/hdf5-blosc
There is an official mailing list for Blosc at:
[email protected]
https://groups.google.com/g/blosc
See THANKS.rst.
Enjoy data!