csvtk - A cross-platform, efficient and practical CSV/TSV toolkit


Like the FASTA/Q formats in bioinformatics, CSV/TSV are basic and ubiquitous file formats in both bioinformatics and data science.

People usually process tabular data with spreadsheet software such as MS Excel. However, that work is all clicking and typing: it is not automated and is time-consuming to repeat, especially when you want to apply similar operations to different datasets or for different purposes.

You can also accomplish some CSV/TSV manipulations with shell commands, but extra code is needed to handle the header line, and shell commands do not support selecting columns by name.
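
As a rough illustration of the extra work involved, the plain-shell sketch below sorts a small CSV by its second column while keeping the header on top, then selects a column by name. The file `people.csv` and its contents are made up for this example, and the naive comma splitting assumes there are no quoted fields.

```shell
# Hypothetical sample data; not part of csvtk's test data.
workdir=$(mktemp -d)
cat > "$workdir/people.csv" <<'EOF'
name,age
carol,41
alice,30
bob,25
EOF

# A plain sort would treat "name,age" as data, so split the header off first.
sorted=$(
  head -n 1 "$workdir/people.csv"
  tail -n +2 "$workdir/people.csv" | sort -t, -k2,2n
)
echo "$sorted"

# Selecting a column by name needs a lookup of its position in the header.
ages=$(awk -F, -v col=age '
  NR == 1 { for (i = 1; i <= NF; i++) if ($i == col) c = i }
  c { print $c }
' "$workdir/people.csv")
echo "$ages"
```

With csvtk, the same two steps would be roughly `csvtk sort -k age:n` and `csvtk cut -f age`.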

csvtk is convenient for rapid data investigation and is also easy to integrate into analysis pipelines. It could save you lots of time in (not) writing Python/R scripts.



Features

  • Cross-platform (Linux/Windows/Mac OS X/OpenBSD/FreeBSD)
  • Lightweight and out-of-the-box: no dependencies, no compilation, no configuration
  • Fast, with multiple-CPU support (some commands)
  • Practical functions provided by N subcommands
  • Supports STDIN and gzipped input/output files; easy to use in pipes
  • Most subcommands support unselecting fields and fuzzy fields, e.g. -f "-id,-name" for all fields except "id" and "name", and -F -f "a.*" for all fields with prefix "a."
  • Supports some common plots (see usage)
  • Seamless support for data with a meta line of separator declaration used by MS Excel


45 subcommands in total.


Information

  • headers: prints headers
  • dim: dimensions of CSV file
  • nrow: prints number of records
  • ncol: prints number of columns
  • summary: summary statistics of selected numeric fields (grouped by group fields)
  • watch: online monitoring and histogram of selected field
  • corr: calculates Pearson correlation between numeric columns

Format conversion

  • pretty: converts CSV to readable aligned table
  • csv2tab: converts CSV to tabular format
  • tab2csv: converts tabular format to CSV
  • space2tab: converts space delimited format to TSV
  • transpose: transposes CSV data
  • csv2md: converts CSV to markdown format
  • csv2json: converts CSV to JSON format
  • xlsx2csv: converts XLSX to CSV format

Set operations

  • head: prints first N records
  • concat: concatenates CSV/TSV files by rows
  • sample: sampling by proportion
  • cut: selects parts of fields
  • grep: greps data by selected fields with patterns/regular expressions
  • uniq: unique data without sorting
  • freq: frequencies of selected fields
  • inter: intersection of multiple files
  • filter: filters rows by values of selected fields with arithmetic expression
  • filter2: filters rows by awk-like arithmetic/string expressions
  • join: joins files by selected fields (inner, left and outer join)
  • split: splits CSV/TSV into multiple files according to column values
  • splitxlsx: splits XLSX sheet into multiple sheets according to column values
  • collapse: collapses one field with selected fields as keys
  • comb: computes combinations of items at every row


Edit

  • add-header: adds column names
  • del-header: deletes column names
  • rename: renames column names with new names
  • rename2: renames column names by regular expression
  • replace: replaces data of selected fields by regular expression
  • round: rounds floats to n decimal places
  • mutate: creates new columns from selected fields by regular expression
  • mutate2: creates new column from selected fields by awk-like arithmetic/string expressions
  • sep: separates column into multiple columns
  • gather: gathers columns into key-value pairs


Ordering

  • sort: sorts by selected fields



Misc

  • cat: streams file and reports progress
  • version: prints version information and checks for update
  • genautocomplete: generates shell autocompletion script


Download Page

csvtk is implemented in the Go programming language; executable binary files for most popular operating systems are freely available on the release page.

Method 1: Download binaries (latest stable/dev version)

Just download the compressed executable file for your operating system, and decompress it with the tar -zxvf *.tar.gz command or other tools. Then:

  1. For Linux-like systems

    1. If you have root privilege, simply copy it to /usr/local/bin:

      sudo cp csvtk /usr/local/bin/
    2. Or copy it to any directory in the environment variable PATH:

      mkdir -p $HOME/bin/; cp csvtk $HOME/bin/
  2. For windows, just copy


Method 2: Install via conda (latest stable version)

conda install -c bioconda csvtk

Method 3: For Go developer (latest stable/dev version)

go get -u

Method 4: For ArchLinux AUR users (may be not the latest)

yaourt -S csvtk


Bash-completion

Note: The current version supports Bash only. This should work for *nix systems with Bash installed.


  1. run:

    csvtk genautocomplete
  2. create and edit the ~/.bash_completion file if you don't have it.

    nano ~/.bash_completion

    add the following:

    for bcfile in ~/.bash_completion.d/* ; do
      . $bcfile
    done

Compared to csvkit

Attention: this table has not been updated for 2 years.

| Feature | csvtk | csvkit | Note |
|:--|:--|:--|:--|
| Read Gzip | Yes | Yes | read gzip files |
| Fields ranges | Yes | Yes | e.g. `-f 1-4,6` |
| Unselect fields | Yes | -- | for excluding the first column |
| Fuzzy fields | Yes | -- | for columns with name prefix "ab" |
| Reorder fields | Yes | Yes | it means `-f 1,2` is different from `-f 2,1` |
| Rename columns | Yes | -- | rename with new name(s) or from existing names |
| Sort by multiple keys | Yes | Yes | bash sort like operations |
| Sort by number | Yes | -- | e.g. `-k 1:n` |
| Multiple sort | Yes | -- | e.g. `-k 2:r -k 1:nr` |
| Pretty output | Yes | Yes | convert CSV to readable aligned table |
| Unique data | Yes | -- | unique data of selected fields |
| Frequency | Yes | -- | frequencies of selected fields |
| Sampling | Yes | -- | sampling by proportion |
| Mutate fields | Yes | -- | create new columns from selected fields |
| Replace | Yes | -- | replace data of selected fields |

Similar tools:

  • csvkit - A suite of utilities for converting to and working with CSV, the king of tabular file formats.
  • xsv - A fast CSV toolkit written in Rust.
  • miller - Miller is like sed, awk, cut, join, and sort for name-indexed data such as CSV and tabular JSON
  • tsv-utils - Command line utilities for tab-separated value files written in the D programming language.


More examples and tutorial.


  1. The CSV parser requires all lines to have the same number of fields/columns. Even lines containing only spaces will cause an error. Use '-I/--ignore-illegal-row' to skip these lines if necessary.
  2. By default, csvtk thinks your files have header row, if not, switch flag
  3. Column names should be unique.
  4. By default, lines starting with
    will be ignored, if the header row starts with
    , please assign flag
    another rare symbol, e.g.
  5. By default, csvtk handles CSV files, use flag
    for tab-delimited files.
  6. If
    exists in tab-delimited files, use flag
  7. Do not mix field indexes and column names.
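
Point 1 can be checked before running csvtk. The plain-shell sketch below reports lines whose field count differs from the header's. It assumes a hypothetical file `ragged.csv` and splits on bare commas, so it is only a rough check for files without quoted commas.

```shell
# Hypothetical sample data with two malformed rows.
workdir=$(mktemp -d)
cat > "$workdir/ragged.csv" <<'EOF'
id,name,score
1,ann,3.5
2,bob
3,cy,4.0,extra
EOF

# Remember the header's field count, then report every line that differs.
bad=$(awk -F, '
  NR == 1 { n = NF; next }
  NF != n { print NR ": " NF " fields, expected " n }
' "$workdir/ragged.csv")
echo "$bad"
```

Lines reported this way are the kind that would need fixing, or skipping with '-I/--ignore-illegal-row'.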


  1. Pretty result

    $ csvtk pretty names.csv
    id   first_name   last_name   username
    11   Rob          Pike        rob
    2    Ken          Thompson    ken
    4    Robert       Griesemer   gri
    1    Robert       Thompson    abc
    NA   Robert       Abel        123
  2. Summary of selected numeric fields, supporting "group-by"

    $ cat testdata/digitals2.csv \
        | csvtk summary --ignore-non-digits --fields f4:sum,f5:sum --groups f1,f2 \
        | csvtk pretty
    f1    f2     f4:sum   f5:sum
    bar   xyz    7.00     106.00
    bar   xyz2   4.00     4.00
    foo   bar    6.00     3.00
    foo   bar2   4.50     5.00
  3. Select fields/columns (cut)

- By index: `csvtk cut -f 1,2`
- By names: `csvtk cut -f first_name,username`
- **Unselect**: `csvtk cut -f -1,-2` or `csvtk cut -f -first_name`
- **Fuzzy fields**: `csvtk cut -F -f "*_name,username"`
- Field ranges: `csvtk cut -f 2-4` for column 2,3,4 or `csvtk cut -f -3--1` for discarding column 1,2,3
- All fields: `csvtk cut -F -f "*"`
  4. Search by selected fields (grep) (matched parts will be highlighted in red)
- By exactly matching: `csvtk grep -f first_name -p Robert -p Rob`
- By regular expression: `csvtk grep -f first_name -r -p Rob`
- By pattern list: `csvtk grep -f first_name -P name_list.txt`
- Remove rows containing missing data (NA): `csvtk grep -F -f "*" -r -p "^$" -v `
  5. Rename column names (rename, rename2)
- Setting new names: `csvtk rename -f A,B -n a,b` or `csvtk rename -f 1-3 -n a,b,c`
- Replacing with original names by regular expression: `cat ../testdata/c.csv | ./csvtk rename2 -F -f "*" -p "(.*)" -r 'prefix_$1'` for adding prefix to all column names.
  6. Edit data with regular expression (replace)
- Remove Chinese characters:  `csvtk replace -F -f "*_name" -p "\p{Han}+" -r ""`
  7. Create new column from selected fields by regular expression (mutate)
- By default, copy a column: `csvtk mutate -f id `
- Extract prefix of data as group name (get "A" from "A.1" as group name):
  `csvtk mutate -f sample -n group -p "^(.+?)\."`
  8. Sort by multiple keys (sort)
- By single column : `csvtk sort -k 1` or `csvtk sort -k last_name`
- By multiple columns: `csvtk sort -k 1,2` or `csvtk sort -k 1 -k 2` or `csvtk sort -k last_name,age`
- Sort by number: `csvtk sort -k 1:n` or  `csvtk sort -k 1:nr` for reverse number
- Complex sort: `csvtk sort -k region -k age:n -k id:nr`
- In natural order: `csvtk sort -k chr:N`
  9. Join multiple files by keys (join)
- All files have same key column: `csvtk join -f id file1.csv file2.csv`
- Files have different key columns: `csvtk join -f "username;username;name" names.csv phone.csv adress.csv -k`
  10. Filter by numbers (filter)
- Single field: `csvtk filter -f "id>0"`
- **Multiple fields**: `csvtk filter -f "1-3>0"`
- Using `--any` to print a record if any of the fields satisfies the condition: `csvtk filter -f "1-3>0" --any`
- **Fuzzy fields**: `csvtk filter -F -f "A*!=0"`
  11. Filter rows by awk-like arithmetic/string expressions (filter2)
- Using field index: `csvtk filter2 -f '$3>0'`
- Using column names: `csvtk filter2 -f '$id > 0'`
- Both arithmetic and string expressions: `csvtk filter2 -f '$id > 3 || $username=="ken"'`
- More complicated: `csvtk filter2 -H -t -f '$1 > 2 && $2 % 2 == 0'`
  12. Plotting

    • plot histogram with data of the second column:

      csvtk -t plot hist testdata/grouped_data.tsv.gz -f 2 | display


    • plot boxplot with data of the "GC Content" (third) column, group information is the "Group" column.

      csvtk -t plot box testdata/grouped_data.tsv.gz -g "Group" \
          -f "GC Content" --width 3 | display


  • plot horiz boxplot with data of the "Length" (second) column, group information is the "Group" column.

     csvtk -t plot box testdata/grouped_data.tsv.gz -g "Group" -f "Length"  \
         --height 3 --width 5 --horiz --title "Horiz box plot" | display


  • plot line plot with X-Y data

      csvtk -t plot line testdata/xy.tsv -x X -y Y -g Group | display


  • plot scatter plot with X-Y data

      csvtk -t plot line testdata/xy.tsv -x X -y Y -g Group --scatter | display
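
To try a few of the commands above on a throwaway file, a sketch like the following may help. It assumes `csvtk` is on your PATH (it is skipped otherwise), and the raw CSV content is reconstructed here from the `csvtk pretty names.csv` output shown in example 1, so it may not match the project's actual test file byte for byte.

```shell
# Rebuild a small names.csv in a temporary directory (assumed content).
workdir=$(mktemp -d)
cat > "$workdir/names.csv" <<'EOF'
id,first_name,last_name,username
11,Rob,Pike,rob
2,Ken,Thompson,ken
4,Robert,Griesemer,gri
1,Robert,Thompson,abc
NA,Robert,Abel,123
EOF

if command -v csvtk >/dev/null 2>&1; then
  # Select columns by name.
  csvtk cut -f first_name,username "$workdir/names.csv"
  # Keep rows whose first_name is exactly "Robert".
  csvtk grep -f first_name -p Robert "$workdir/names.csv"
  # Sort rows by the last_name column.
  csvtk sort -k last_name "$workdir/names.csv"
else
  echo "csvtk not found in PATH; skipping the csvtk commands" >&2
fi
```

Piping any of these into `csvtk pretty` gives the aligned view shown in example 1.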



We are grateful to Zhiluo Deng and Li Peng for suggesting features and reporting bugs.

Thanks to Albert Vilella for feature suggestions, which make csvtk feature-rich.


Create an issue to report bugs, propose new functions or ask for help.

Or leave a comment.


MIT License

