A simple utility to parse and split csv-files into manageable chunks
csv-split is a simple utility that can parse and break up large CSV files into smaller peices, with various options on how it does that.
Clone the repository, cd into the directory, and run the following:
make && make install
csv-split [OPTIONS] FILE OUT-PATH
or
csv-split [OPTIONS] --stdin PREFIX OUT-PATH
–g, --group-col
The zero based column with values that must remain together. If specified, csv-split will not seperate
rows with the same value in this column acros multiple files. This assumes the file is already sorted
by this colum, as csv-split doesn’t sort the file.
-n, --num-rows
The maximum number of rows to put in each file. If we’re grouping column values (see above), you can
end up with files with slightly more rows
–stdin
If specified, csv-split will read data from STDIN instead of a provided file, and the file argument
will be treated as a prefix to use when writing output chunks.
-t, --trigger
Each time csv-split writes a file, it can be configured to run a command specified by this option.
Two environment variables will be set prior to the execution of the command:
CSV-PAYLOAD_FILE -- The filename that was written
CSV_ROWCOUNT -- How many rows are in this file
-d, --header
If you pass the --header option, csv-split will treat the first row of the input csv file as a header
and inject it into each split file. By default, the header row is not counted toward the total number
of rows written per file, but can be counted if you pass 1 to this argument (e.g. -d1, --header=1).