zsv+lib: tabular data swiss-army knife CLI + world's fastest (simd) CSV parser
Playground (without sheet viewer command): https://liquidaty.github.io/zsv
zsv+lib is a fast CSV parser library and extensible command-line utility. It
achieves high performance using SIMD operations, efficient memory
use and other optimization techniques, and can also parse
generic-delimited and fixed-width formats, as well as multi-row-span headers.
The ZSV CLI can be compiled to virtually any target, including
WebAssembly, and offers features including select, count, direct CSV sql,
flatten, serialize, 2json conversion, 2db sqlite3 conversion, stack, pretty,
2tsv, compare, paste, overwrite and more.
The ZSV CLI also includes sheet, an in-console interactive grid viewer that
includes basic navigation, filtering [[, data editing and pivot table with
drill down]], and that supports custom extensions.
Pre-built CLI packages are available via brew and nuget.
A pre-built library package is available for Node (npm install zsv-lib).
Please note, this package is still in alpha and currently only exposes a small
subset of the zsv library capabilities. More to come.
An online playground is available as well (without the sheet feature due to
browser limitations).
If you like zsv+lib, do not forget to give it a star! 🌟
Preliminary performance results compare favorably vs other CSV utilities (xsv,
tsv-utils, csvkit, mlr (miller) etc). The results below were obtained on a
pre-M1 macOS MBA; on most platforms zsvlib was 2x faster, though in some cases
the advantage was smaller (e.g. 15-25%). mlr is not shown, as it was about 25x
slower.
See the 12/19 update regarding the M1 processor at
https://github.com/liquidaty/zsv/blob/main/app/benchmark/README.md
“CSV” is an ambiguous term. This library uses the same definition as Excel. In
addition, it provides a row-level (as well as cell-level) API and provides
“normalized” CSV output (e.g. input of this"iscell1,"thisis,"cell2 becomes
"this""iscell1","thisis,cell2"). Each of these three objectives (Excel
compatibility, row-level API and normalized output) has a measurable performance
impact; conversely, much faster parsing speeds are possible, and a number of
other CSV parsers achieve them, if any of these requirements (especially Excel
compatibility) is dropped.
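To make the “normalized” output rule above more concrete, here is a minimal sketch of how a single cell value could be written back out in normalized form. It is not taken from the zsv source: the function name csv_normalize_cell is hypothetical, and the rule it applies (quote a cell only when it contains a comma, double quote or line end, and double each embedded quote) is an assumption based on the example above.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not part of the zsv API): writes one cell value to
 * `out` in normalized CSV form and returns the number of bytes written,
 * excluding the terminating NUL. Returns (size_t)-1 if `out` is too small. */
size_t csv_normalize_cell(const char *value, size_t len, char *out,
                          size_t out_size) {
  assert(value != NULL && out != NULL);

  /* Assumed rule: quote only if the value holds a comma, quote or line end */
  int needs_quotes = 0;
  size_t quote_count = 0;
  for (size_t i = 0; i < len; i++) {
    char c = value[i];
    if (c == '"') {
      quote_count++;
      needs_quotes = 1;
    } else if (c == ',' || c == '\n' || c == '\r') {
      needs_quotes = 1;
    }
  }

  /* Two surrounding quotes plus one extra byte per embedded quote */
  size_t needed = len + (needs_quotes ? quote_count + 2 : 0);
  if (needed + 1 > out_size)
    return (size_t)-1;

  if (!needs_quotes) {
    memcpy(out, value, len);
    out[len] = '\0';
    return len;
  }

  size_t n = 0;
  out[n++] = '"';
  for (size_t i = 0; i < len; i++) {
    if (value[i] == '"')
      out[n++] = '"'; /* double each embedded quote */
    out[n++] = value[i];
  }
  out[n++] = '"';
  out[n] = '\0';
  return n;
}
```

Under these assumptions, the two parsed cell values from the example (this"iscell1 and thisis,cell2) each come out quoted, while a plain value such as abc is copied unchanged.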
zsv is an extensible CSV utility, which uses zsvlib, for tasks such as slicing
and dicing, querying with SQL, combining, serializing, flattening,
converting between CSV/JSON/sqlite3 and more.
zsv is streamlined for easy development of custom dynamic extensions.
zsvlib and zsv are written in C, but since zsvlib is a library, and zsv
extensions are just shared libraries, you can extend zsv with your own code in
any programming language, so long as it has been compiled into a shared library
that implements the expected interface.
- fread-like function
- zsv CLI with the following built-in commands:
  - sheet, an in-console interactive and extendable grid viewer
  - select, count, sql query, desc (describe), flatten, serialize, 2json, 2db,
    stack, pretty, 2tsv, paste, compare, overwrite, jq, prop, rm
Download pre-built binaries and packages for macOS, Windows, Linux and BSD from
the Releases page.
You can also download pre-built binaries and packages from
Actions for the latest commits and
PRs, but these are retained only for a limited number of days.
[!IMPORTANT]
For musl libc static builds, dynamic extensions are not supported!
[!NOTE]
After v0.3.9-alpha, all package artifacts will be properly attested.
To verify, you can use the GitHub CLI like this:
gh attestation verify <downloaded-artifact> --repo liquidaty/zsv
…via Homebrew:
brew tap liquidaty/zsv
brew install zsv
…via MacPorts:
sudo port install zsv
For Linux (Debian/Ubuntu - *.deb):
# Install
sudo apt install ./zsv-amd64-linux-gcc.deb
# Uninstall
sudo apt remove zsv
For Linux (RHEL/CentOS - *.rpm):
# Install
sudo yum install ./zsv-amd64-linux-gcc.rpm
# Uninstall
sudo yum remove zsv
For Windows (*.nupkg), install with nuget.exe:
# Install via nuget custom feed (requires absolute paths)
md nuget-feed
nuget.exe add zsv .\<path>\zsv-amd64-windows-mingw.nupkg -source <path>/nuget-feed
nuget.exe install zsv -version <version> -source <path>/nuget-feed
# Uninstall
nuget.exe delete zsv <version> -source <path>/nuget-feed
For Windows (*.nupkg), install with choco.exe:
# Install
choco.exe install zsv --pre -source <directory containing .nupkg file>
# Uninstall
choco.exe uninstall zsv
The zsv parser library is available for node:
npm install zsv-lib
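Please note: this package is still in alpha and currently only exposes a small
subset of the zsv library capabilities.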
zsv CLI is also available as a container image from Packages.
The container image is published on every release. In addition to the specific
release tag, the image is also tagged as latest, i.e. zsv:latest always
points to the latest released version.
Example:
$ docker pull ghcr.io/liquidaty/zsv
# ...
$ cat worldcitiespop_mil.csv | docker run -i ghcr.io/liquidaty/zsv count
1000000
For image details, see Dockerfile. You may use this as a
baseline for your own use cases as needed.
In a GitHub Actions workflow, you can use zsv/setup-action to set up zsv+zsvlib:
- name: Set up zsv+zsvlib
  uses: liquidaty/zsv/setup-action@main
See zsv/setup-action/README for more details.
See BUILD.md for more details.
Our objectives, which we were unable to find in a pre-existing project, are:
\n or \r), embedded
There are several excellent tools that achieve high performance. Among those we
considered were xsv and tsv-utils. While they met our performance objective,
both were designed primarily as a utility and not a library, and were not easy
enough, for our needs, to customize and/or to support modular customizations
that could be maintained (or licensed) independently of the related project (in
addition to the fact that they were written in Rust and D, respectively, which
happen to be languages with which we lacked deep experience, especially for web
assembly targeting).
Others we considered were Miller (mlr), csvkit and Go (csv module), which
did not meet our performance objective. We also considered various other
libraries using SIMD for CSV parsing, but none that we tried met the “real-world
CSV” objective.
Hence, zsv was created as a library and a versatile application, both optimized
for speed and ease of development for extending and/or customizing to your
needs.
zsv comes with several built-in commands:
- sheet: an in-console, interactive grid viewer
- echo: read CSV from stdin and write it back out to stdout. This is mostly
- select: re-shape CSV by skipping leading garbage, combining header rows into
- sql: treat one or more CSV files like database tables and query with SQL
- desc: provide a quick description of your table data
- pretty: format for console (fixed-width) display, or convert to markdown
- 2json: convert CSV to JSON. Optionally, output in
- 2tsv: convert to TSV (tab-delimited) format
- compare: compare two or more tables of data and output the differences
- paste (alpha): horizontally paste two tables together (given inputs X and Y,
- serialize (inverse of flatten): convert an NxM table to a single 3x (Nx(M-1))
- flatten (inverse of serialize): flatten a table by combining rows that share
- stack: merge CSV files vertically
- jq: run a jq filter
- 2db: convert from JSON to sqlite3 db
- overwrite: overwrite a cell value; changes will be reflected in any zsv
- prop: view or save parsing options associated with a file, such as initial

Each of these can also be built as an independent executable named zsv_xxx
where xxx is the command name.
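As a rough illustration of the reshaping that serialize describes (an NxM table becoming a three-column table with M-1 output rows per data row), the sketch below may help. It is not zsv code: the function name serialize_table is hypothetical, and it assumes that the first column acts as the row identifier, that row 0 is the header, and that no cell needs CSV quoting. The actual output format of zsv serialize may differ.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Illustration only (not zsv source): turns a rows x cols table, stored
 * row-major in `cells` with the header in row 0, into three-column
 * "id,column,value" lines. The first column is assumed to identify the row,
 * and cells are assumed to need no CSV quoting.
 * Returns the number of lines written, or -1 if `out` is too small. */
int serialize_table(const char *const *cells, size_t rows, size_t cols,
                    char *out, size_t out_size) {
  assert(cells != NULL && out != NULL);
  if (out_size == 0)
    return -1;

  size_t used = 0;
  int lines = 0;
  out[0] = '\0';

  for (size_t r = 1; r < rows; r++) {   /* skip the header row */
    const char *id = cells[r * cols];   /* first column identifies the row */
    for (size_t c = 1; c < cols; c++) { /* the remaining M-1 columns */
      int n = snprintf(out + used, out_size - used, "%s,%s,%s\n", id,
                       cells[c],             /* column name from the header */
                       cells[r * cols + c]); /* value at (row, column) */
      if (n < 0 || (size_t)n >= out_size - used)
        return -1; /* output did not fit */
      used += (size_t)n;
      lines++;
    }
  }
  return lines;
}
```

For a header of city,pop,country and two data rows, this yields four lines, one per (row, non-identifier column) pair, which is the kind of long format that flatten would then reverse.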
After installing, run zsv help to see usage details. The typical syntax is
zsv <command> <parameters>, e.g.
zsv sql my_population_data.csv "select * from data where population > 100000"
Simple API usage examples include:
Pull parsing:
zsv_parser parser = zsv_new(...);
while (zsv_next_row(parser) == zsv_status_row) { // for each row
  // ...
  size_t cell_count = zsv_cell_count(parser);
  for (size_t i = 0; i < cell_count; i++) { // for each cell
    struct zsv_cell c = zsv_get_cell(parser, i);
    // %.*s expects an int precision, so cast the length
    fprintf(stderr, "Cell: %.*s\n", (int)c.len, c.str);
    // ...
  }
}
Push parsing:
static void my_row_handler(void *ctx) {
  zsv_parser p = ctx;
  size_t cell_count = zsv_cell_count(p);
  for (size_t i = 0; i < cell_count; i++) { // for each cell
    // ...
  }
}

int main() {
  zsv_parser p = zsv_new(NULL);
  zsv_set_row_handler(p, my_row_handler);
  zsv_set_context(p, p); // the parser itself is passed to the handler as ctx
  while (zsv_parse_more(p) == zsv_status_ok)
    ; // my_row_handler is called for each parsed row
  return 0;
}
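The difference between the two styles above is mainly about who drives the loop. The self-contained toy below may make that clearer; it is not the zsv parser, every toy_ name is hypothetical, and it splits on commas and newlines only, ignoring quoting entirely. In pull style the caller asks for each row; in push style the parser iterates and calls a handler once per row.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy reader for illustration only: NOT the zsv parser, no quote handling */
struct toy_parser {
  const char *data; /* remaining unparsed input (NUL-terminated) */
  const char *row;  /* start of the current row */
  size_t row_len;   /* length of the current row, excluding the newline */
};

/* Pull style: the caller requests the next row. Returns 1 if one was read. */
int toy_next_row(struct toy_parser *p) {
  assert(p != NULL && p->data != NULL);
  if (*p->data == '\0')
    return 0; /* end of input */
  const char *end = strchr(p->data, '\n');
  p->row = p->data;
  p->row_len = end ? (size_t)(end - p->data) : strlen(p->data);
  p->data = end ? end + 1 : p->data + p->row_len;
  return 1;
}

/* Number of comma-separated cells in the current row */
size_t toy_cell_count(const struct toy_parser *p) {
  size_t count = 1;
  for (size_t i = 0; i < p->row_len; i++)
    if (p->row[i] == ',')
      count++;
  return count;
}

/* Push style: the parser drives the loop and calls the handler per row */
typedef void (*toy_row_handler)(struct toy_parser *p, void *ctx);

void toy_parse_all(struct toy_parser *p, toy_row_handler handler, void *ctx) {
  while (toy_next_row(p))
    handler(p, ctx);
}

/* Example handler: adds the current row's cell count to the size_t in ctx */
void count_cells(struct toy_parser *p, void *ctx) {
  *(size_t *)ctx += toy_cell_count(p);
}
```

Pull style tends to fit code that wants to keep control flow in one place, while push style suits streaming input that arrives in chunks; which one is preferable depends on the application.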
Full application code examples can be found at
examples/lib/README.md.
An example of using the API, compiled to wasm and called via JavaScript, is in
examples/js/README.md.
For more sophisticated (but at this time, only sporadically
commented/documented) use cases, see the various CLI C source files in the app
directory such as app/serialize.c.
You can extend zsv by providing a pre-compiled shared or static library that
defines the functions specified in extension_template.h and which zsv loads
in one of three ways:
zsv
You can build and run a sample extension by running make test from
app/ext_example.
The easiest way to implement your own extension is to copy and customize the
template files in app/ext_template.
This release does not yet implement the full range of core features that are
planned for implementation prior to beta release. If you are interested in
helping, please post an issue.
main
main
.clang-format
(version 15 or later) for C source updates.