A C++ implementation of timsort
A C++ implementation of TimSort, an O(n log n) stable sorting algorithm, ported from Python’s and OpenJDK’s.
See also the following links for a detailed description of TimSort:
This library requires at least C11. If you need a C98 version, you can check the 1.x.y branch of this repository.
According to the benchmarks, gfx::timsort
is slower than std::sort()
on randomized sequences, but
faster on partially-sorted ones. It can be used as a drop-in replacement for std::stable_sort
,
with the difference that it can’t fallback to a O(n log² n) algorithm when there isn’t enough extra heap memory
available.
gfx::timsort
also has a few additional features and guarantees compared to std::stable_sort
:
std::invoke
is available, only instances of++
or --
operators: only their prefix equivalentsvoid
.Merging sorted ranges efficiently is an important part of the TimSort algorithm. This library exposes gfx::timmerge
in the public API, a drop-in replacement for std::inplace_merge
with the difference that it
can’t fallback to a O(n log n) algorithm when there isn’t enough extra heap memory available. According to the
benchmarks, gfx::timmerge
is slower than std::inplace_merge
on heavily/randomly overlapping subranges of simple
elements, but it is faster for complex elements such as std::string
and on sparsely overlapping subranges.
Just like gfx::timsort
, gfx::timmerge
can take a projection function and avoids using the postfix ++
and --
operators.
The list of available signatures is as follows (in namespace gfx
):
// timsort
template <
typename RandomAccessIterator,
typename Compare = /* see below (1) */,
typename Projection = /* see below (2) */
>
void timsort(RandomAccessIterator const first, RandomAccessIterator const last,
Compare compare={}, Projection projection={});
template <
typename RandomAccessRange,
typename Compare = /* see below (1) */,
typename Projection = /* see below (2) */
>
void timsort(RandomAccessRange &range, Compare compare={}, Projection projection={});
// timmerge
template <
typename RandomAccessIterator,
typename Compare = /* see below (1) */,
typename Projection = /* see below (2) */
>
void timmerge(RandomAccessIterator first, RandomAccessIterator middle,
RandomAccessIterator last, Compare compare={}, Projection projection={});
In the signatures above:
std::less
specialization for the value_type
of the passed range or iterator.std::identity
.Example of using timsort with a comparison function and a projection function to sort a vector of strings by length:
#include <string>
#include <vector>
#include <gfx/timsort.hpp>
size_t len(const std::string& str) {
return str.size();
}
// Sort a vector of strings by length
std::vector<std::string> collection = { /* ... */ };
gfx::timsort(collection, std::less<std::string>{}, &len);
The library has been tested with the following compilers:
It should also work with more recent compilers, and most likely with some older compilers too. We used to guarantee
support as far back as Clang 3.8, but the new continuous integration environment doesn’t go that far back.
The library can be installed on the system via CMake with the following commands:
cmake -H. -Bbuild -DCMAKE_BUILD_TYPE=Release
cd build
make install
Alternatively the library is also available on conan-center-index and can be installed in your local Conan cache via
the following command:
conan install timsort/2.1.0
The following configuration macros allow gfx::timsort
and gfx::timmerge
to emit diagnostics, which can be helpful
to diagnose issues:
GFX_TIMSORT_ENABLE_ASSERT
inserts assertions in key locations in the algorithm to avoid logic errors.GFX_TIMSORT_ENABLE_AUDIT
inserts assertions that verify pre- and postconditions. These verifications canGFX_TIMSORT_ENABLE_LOG
inserts logs in key locations, which allow to follow more closely the flow of thecpp-TimSort follows semantic versioning and provides the following macros to retrieve the current major, minor
and patch versions:
GFX_TIMSORT_VERSION_MAJOR
GFX_TIMSORT_VERSION_MINOR
GFX_TIMSORT_VERSION_PATCH
The tests are written with Catch2 and can be compiled with CMake and run through CTest.
When using the project’s main CMakeLists.txt
, the CMake variable BUILD_TESTING
is ON
by default unless the
project is included as a subdirectory. The following CMake variables are available to change the way the tests are
built with CMake:
GFX_TIMSORT_USE_VALGRIND
: if ON
, the tests will be run through Valgrind (OFF
by default)GFX_TIMSORT_SANITIZE
: this variable takes a comma-separated list of sanitizers options to run the tests (empty by default)Benchmarks are available in the benchmarks
subdirectory, and can be constructed directly by passing BUILD_BENCHMARKS=ON
variable to CMake during the configuration step.
Example bench_sort output (timing scale: sec.):
c++ -v
Apple LLVM version 7.0.0 (clang-700.0.72)
Target: x86_64-apple-darwin14.5.0
Thread model: posix
c++ -I. -Wall -Wextra -g -DNDEBUG -O2 -std=c++11 example/bench.cpp -o .bin/bench
./.bin/bench
RANDOMIZED SEQUENCE
[int]
size 100000
std::sort 0.695253
std::stable_sort 0.868916
timsort 1.255825
[std::string]
size 100000
std::sort 3.438217
std::stable_sort 4.122629
timsort 5.791845
REVERSED SEQUENCE
[int]
size 100000
std::sort 0.045461
std::stable_sort 0.575431
timsort 0.019139
[std::string]
size 100000
std::sort 0.586707
std::stable_sort 2.715778
timsort 0.345099
SORTED SEQUENCE
[int]
size 100000
std::sort 0.021876
std::stable_sort 0.087993
timsort 0.008042
[std::string]
size 100000
std::sort 0.402458
std::stable_sort 2.436326
timsort 0.298639
Example bench_merge output (timing scale: milliseconds; omitted detailed results for different
middle iterator positions, reformatted to improve readability):
c++ -v
Using built-in specs.
...
Target: x86_64-pc-linux-gnu
...
gcc version 10.2.0 (GCC)
c++ -I ../include -Wall -Wextra -g -DNDEBUG -O2 -std=c++11 bench_merge.cpp -o bench_merge
./bench_merge
size 100000
element type\algorithm: std::inplace_merge timmerge
RANDOMIZED SEQUENCE
[int] approx. average 33.404430 37.047990
[std::string] approx. average 324.964249 210.297207
REVERSED SEQUENCE
[int] approx. average 11.441404 4.017482
[std::string] approx. average 305.649503 114.773898
SORTED SEQUENCE
[int] approx. average 4.291098 0.105571
[std::string] approx. average 158.238114 0.273858
Detailed bench_merge results for different middle iterator positions can be found at
https://github.com/timsort/cpp-TimSort/wiki/Benchmark-results