Concurrency primitives, safe memory reclamation mechanisms and non-blocking (including lock-free) data structures designed to aid in the research, design and implementation of high performance concurrent systems developed in C99+.
Modern concurrency primitives and building blocks for high performance applications.
GitHub Actions | Cirrus |
---|---|
Compilers tested in the past include gcc, clang, cygwin, icc, mingw32, mingw64
and suncc across all supported architectures. All new architectures are
required to pass the integration test and under-go extensive code review.
Continuous integration is currently enabled for the following targets:
darwin/clang/arm64
freebsd/clang/x86-64
linux/gcc/arm64
linux/gcc/x86-64
linux/clang/x86-64
Step 1.
./configure
For additional options try ./configure --help
Step 2.
In order to compile regressions (requires POSIX threads) use
make regressions
. In order to compile libck use make all
or make
.
Step 3.
In order to install use make install
To uninstall use make uninstall
.
See http://concurrencykit.org/ for more information.
Concurrency Kit supports any architecture using compiler built-ins as a
fallback. There is usually a performance degradation associated with this.
Concurrency Kit has specialized assembly for the following architectures:
aarch64
arm
ppc
ppc64
riscv64
s390x
sparcv9+
x86
x86_64
Concurrency primitives including architecture-specific ones. Provides wrappers
around CAS in case of missing native support. This also provides support for
RTM (transactional memory), pipeline control, read-for-ownership and more.
A simple and efficient (minimal noise) backoff function.
Abstracted compiler builtins when writing efficient concurrent data structures.
A scalable safe memory reclamation mechanism with support for idle threads and
various optimizations that make it better than or competitive with many
state-of-the-art solutions.
Implements support for hazard pointers, a simple and efficient lock-free safe
memory reclamation mechanism.
A simple concurrently-readable pointer array structure.
An efficient multi-reader and multi-writer concurrent bitmap structure.
Efficient concurrent bounded FIFO data structures with various performance
trade-off. This includes specialization for single-reader, many-reader,
single-writer and many-writer.
A reference implementation of the first published lock-free FIFO algorithm,
with specialization for single-enqueuer-single-dequeuer and
many-enqueuer-single-dequeuer and extensions to allow for node re-use.
A reference implementation of the above algorithm, implemented with safe memory
reclamation using hazard pointers.
A reference implementation of a Treiber stack with support for hazard pointers.
A reference implementation of an efficient lock-free stack, with specialized
variants for a variety of memory management strategies and bounded concurrency.
A concurrently readable friendly derivative of the BSD-queue interface. Coupled
with a safe memory reclamation mechanism, implement scalable read-side queues
with a simple search and replace.
An extremely efficient single-writer-many-reader hash set, that satisfies
lock-freedom with bounded concurrency without any usage of atomic operations
and allows for recycling of unused or deleted slots. This data structure is
recommended for use as a general hash-set if it is possible to compute values
from keys.
A specialization of the ck_hs
algorithm allowing for disjunct key-value pairs.
A variant of ck_hs
that utilizes robin-hood hashing to allow for improved
performance with higher load factors and high deletion rates.
An extremely efficient event counter implementation, a better alternative to
condition variables with specialization for fixed concurrency use-cases.
A plethora of execution barriers including: centralized barriers, combining
barriers, dissemination barriers, MCS barriers, tournament barriers.
A simple big-reader lock implementation, write-biased reader-writer lock with
scalable read-side locking.
An implementation of bytelocks, for research purposes, allowing for (in
theory), fast read-side acquisition without the use of atomic operations. In
reality, memory barriers are required on the fast path.
A generic lock cohorting interface, allows you to turn any lock into a
NUMA-friendly scalable NUMA lock. There is a significant trade-off in fast path
acquisition cost. Specialization is included for all relevant lock
implementations in Concurrency Kit. Learn more by reading “Lock Cohorting: A
General Technique for Designing NUMA Locks”.
A generic lock elision framework, allows you to turn any lock implementation
into an elision-aware implementation. This requires support for restricted
transactional memory by the underlying hardware.
Phase-fair reader-writer mutex that provides strong fairness guarantees between
readers and writers. Learn more by reading “Spin-Based Reader-Writer
Synchronization for Multiprocessor Real-Time Systems”.
A generic read-write lock cohorting interface, allows you to turn any
read-write lock into a NUMA-friendly scalable NUMA lock. There is a significant
trade-off in fast path acquisition cost. Specialization is included for all
relevant lock implementations in Concurrency Kit. Learn more by reading “Lock
Cohorting: A General Technique for Designing NUMA Locks”.
A simple centralized write-biased read-write lock.
A sequence counter lock, popularized by the Linux kernel, allows for very fast
read and write synchronization for simple data structures where deep copy is
permitted.
A single-writer specialized read-lock that is copy-safe, useful for data
structures that must remain small, be copied and contain in-band mutexes.
Task-fair locks are fair read-write locks, derived from “Scalable reader-writer
synchronization for shared-memory multiprocessors”.
A basic but very fast spinlock implementation.
Scalable and fast anderson spinlocks. This is here for reference, one of the
earliest scalable and fair lock implementations.
A basic spinlock utilizing compare_and_swap.
A basic spinlock, a C adaption of the older optimized Linux kernel spinlock for
x86. Primarily here for reference.
A basic spinlock utilizing atomic exchange.
An efficient implementation of the scalable CLH lock, providing many of the
same performance properties of MCS with a better fast-path.
A NUMA-friendly CLH lock.
An implementation of the seminal scalable and fair MCS lock.
An implementation of fair centralized locks.