A list of command line tools for manipulating structured text data
The following is a list of text-based file formats and command line tools for manipulating each.
Tools that work with lines of fields separated by delimiters but do not necessarily support CSV field quoting.
Awk is a POSIX-standard command line tool and programming language. If you use Linux, macOS, or a BSD, you almost certainly have it installed. See below for Windows.
`sed` in a single Windows executable.
| Name | Description |
|------|-------------|
| comm | Select the lines common to two sorted files or the lines contained in only one of them. (Manual: `man 1 comm` on your system, GNU, FreeBSD.) |
| cut | Select portions of each line in one or more files. (Manual: `man 1 cut`, GNU, FreeBSD.) |
| grep | Select the lines from one or more files that match or do not match a pattern. (Manual: `man 1 grep`, GNU, FreeBSD.) |
| join | Take two files sorted by a common field and join their lines on the value of that field. By default, lines with values that do not appear in the other file are discarded. (Manual: `man 1 join`, GNU, FreeBSD.) |
| paste | Merge corresponding lines of several files, or combine consecutive lines of one file into one. (Manual: `man 1 paste`, GNU, FreeBSD.) |
| sort | Sort lines by key fields. (Manual: `man 1 sort`, GNU, FreeBSD.) |
| uniq | Find or remove repeated adjacent lines. (Manual: `man 1 uniq`, GNU, FreeBSD.) |
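The default behavior of `join` is an inner join, and the following minimal Python sketch makes that concrete for two inputs already sorted on their first field. It is not how any `join` implementation is written. The function names are hypothetical, and the sketch makes three simplifying assumptions: it splits on a single separator character, it assumes each key appears at most once per input, and it compares keys by code point (which corresponds to sorting with `LC_ALL=C`).

```python
def split_first_field(line: str, sep: str) -> tuple[str, str]:
    # Split "key rest..." into the join key and the remaining fields.
    key, _, rest = line.partition(sep)
    return key, rest


def join_sorted(left: list[str], right: list[str], sep: str = " ") -> list[str]:
    # Merge-join two inputs that are already sorted by their first field.
    # Assumes each key occurs at most once per input (real join handles repeats).
    out = []
    i = j = 0
    while i < len(left) and j < len(right):
        lkey, lrest = split_first_field(left[i], sep)
        rkey, rrest = split_first_field(right[j], sep)
        if lkey < rkey:
            i += 1  # key only in the left input: dropped, as join does by default
        elif lkey > rkey:
            j += 1  # key only in the right input: dropped as well
        else:
            # Output the key, then the other fields from the left and the right.
            out.append(sep.join(part for part in (lkey, lrest, rrest) if part))
            i += 1
            j += 1
    return out


print(join_sorted(["alice 30", "bob 25", "carol 41"],
                  ["alice paris", "carol oslo", "dave rome"]))
# ['alice 30 paris', 'carol 41 oslo']
```

Because both inputs are sorted, a single forward pass is enough. This is why `join` and `comm` require sorted input.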
CSV, TSV, and other delimiter-separated value formats. Tools belong on this list if they support field quoting.
| Name and link | Description |
|---------------|-------------|
| csv-nix-tools | List *nix system information such as environment variables, files, processes, network connections, and users as CSV. Manipulate and pretty-print CSV. Execute CSV rows as commands. |
| csv2md | Convert CSV to Markdown tables. |
| csv2html | Convert CSV to HTML tables. |
| csvfaker | Generate CSV files with fake data. Supports different types of fake data in different locales: names, cities, jobs, email addresses, and others. |
| csvfix (unofficial mirror) | A multitool. Compare, filter, normalize, split, and validate CSV files. Reorder, remove, split, and merge fields. Convert data between fixed-width, multi-line, XML, and DSV formats. Generate SQL statements. |
| csvkit | A suite of command-line tools for converting to and working with CSV: convert, clean, cut, grep, join, sort, stack, format, render, query, analyze, etc. |
| csvquote | Transform CSV to and from a format processable with Awk-like tools. |
| csvtk | Search, sample, cut, join, transpose, and sort CSV/TSV files. Rename columns. Replace fields and generate new fields from existing fields. Plot data as vector or raster histograms and box, line, and scatter plots. Convert CSV to Markdown. Convert XLSX to CSV. Split XLSX sheets. |
| dasel | See the JSON section. |
| jp (sgreben) | Plot data. See the JSON section. |
| Mario | See the JSON section. |
| MCMD (M-Command) | Select, sample, cut, join, sort, reformat, and generate CSV files. Contains a large set of commands. |
| Miller | Like Awk, sed, cut, join, and sort for name-indexed data such as CSV and tabular JSON. |
| pawk | Process text with Awk-like patterns, but Python code. |
| rows | A Python library with a CLI. Convert between a number of file formats for tabular data: CSV, XLS, XLSX, ODS, and others. Query the data (via SQLite). Combine tables. Generate schemas. |
| rq | See the JSON section. |
| tab | A non-Turing-complete statically typed programming language for data processing. An alternative to Awk. |
| eBay's TSV utilities | Filtering, statistics, sampling, joins, and other operations on TSV files. High performance, especially good for large datasets. Written in D. |
| tv | View delimited files in the terminal. |
| VisiData | Interactively explore data in TSV, CSV, XLS, XLSX, HDF5, JSON, and other formats. Introduction. |
| xsv | Index, slice, analyze, split, and join CSV files. |
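Field quoting is what separates these tools from the plain delimiter tools. In the made-up sample below, a comma inside a quoted field breaks a naive split on commas, which is roughly what `cut -d,` does. Python's standard `csv` module, used here only for illustration, respects the quotes and the doubled-quote escape.

```python
import csv
import io

# A made-up CSV sample: one field contains a comma, another contains quotes.
data = 'id,name,comment\n1,"Smith, John","said ""hi"""\n'

# Naive splitting ignores quoting and yields four "fields" for a three-field row.
naive = [line.split(",") for line in data.splitlines()]
print(naive[1])   # ['1', '"Smith', ' John"', '"said ""hi"""']

# A quoting-aware parser recovers the three intended fields.
parsed = list(csv.reader(io.StringIO(data)))
print(parsed[1])  # ['1', 'Smith, John', 'said "hi"']
```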
See the big comparison table. It covers
watch -d. | | lobar | Explore JSON interactively or process it in batch with a wrapper for
hxselect) for manipulating HTML and XML files from W3C. Written in C, quite old-fashioned, but still relevant and maintained. |
| Mario | Supports XML. See the JSON section. |
| pup | Query HTML pages with CSS selectors. Static binaries available for releases. Inspired by jq. |
| Saxon | Query XML and HTML data with XPath. Documentation. |
| sml2 | Convert between XML and SML, a simplified XML representation. |
| Temme | Query HTML with CSS-like selectors to extract JSON. Temme extends CSS selectors with value capture patterns. |
| tidy-html5 | Validate, fix, and reformat HTML(5), XHTML, and XML documents. Convert HTML to XHTML. |
| tq | Query HTML with CSS selectors. |
| Xidel | Query or modify XML and HTML pages with XPath, XQuery 3, and CSS selectors. |
| xml-to-json-fast | Convert XML to JSON. Can handle very large XML files. |
| xml2 | Convert XML and HTML to and from flat, greppable lists of "path=value" statements. Source code mirror. |
| xmljson | Convert multiple and large XML files to JSON. Written in Swift. |
| XMLLint | Query (with XPath), validate, and reformat XML documents. |
| XMLStarlet | Query, modify, and validate XML documents. |
| xq | jq wrapper for XML documents. |
| xsltproc | Transform XML documents using XSLT and EXSLT. |
See also: Grep and Sed Equivalent for XML Command Line Processing on StackOverflow.
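Several of the tools above query XML with XPath. As a rough illustration of this style of query, Python's standard `xml.etree.ElementTree` supports a limited subset of XPath. The sample document is made up. Full XPath 1.0, or XQuery, requires one of the tools listed.

```python
import xml.etree.ElementTree as ET

# A made-up document for illustration.
DOC = """<library>
  <book lang="en"><title>Dune</title><year>1965</year></book>
  <book lang="fr"><title>Vendredi</title><year>1967</year></book>
</library>"""

root = ET.fromstring(DOC)
# Attribute predicates like [@lang='en'] are part of ElementTree's XPath subset.
english = [book.findtext("title") for book in root.findall("./book[@lang='en']")]
# iter() walks every descendant element with the given tag.
years = [int(year.text) for year in root.iter("year")]
print(english, years)  # ['Dune'] [1965, 1967]
```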
With a format converter like Remarshal (below) you can use JSON tools to process YAML and TOML, but make sure you do not lose data in the conversion.
| Name and link | Description |
|---------------|-------------|
| dasel | Supports TOML and YAML. See the JSON section. |
| gojq | Supports YAML. See the JSON section. |
| Mario | Supports YAML. See the JSON section. |
| Remarshal | Convert between CBOR, JSON, MessagePack, TOML, and YAML. Validate each of the formats. Pretty-print JSON, TOML, and YAML. |
| rq | Supports TOML and YAML. See the JSON section. |
| shyaml | Query YAML. Can output null-terminated strings for use in shell scripts. |
| validtoml | Validate TOML. |
| validyaml | Validate or pretty-print YAML. |
| yaml-tools | A set of CLI tools to manipulate YAML files (merge, delete, etc.) with comment preservation, based on ruamel.yaml. |
| yq (kislyuk) | jq wrapper for YAML. |
| yq (mikefarah) | Query, modify, and merge YAML. Convert to and from JSON. |
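Date and time values are one common way for data to be lost in conversion. TOML and YAML have native date and time values, while JSON has none. The sketch below uses only Python's standard library. It shows that a datetime has to be turned into some other representation (an ISO 8601 string here, chosen for illustration) and that it comes back from JSON as a plain string. The sketch does not show how any particular converter, Remarshal included, handles this case. Check the converter's own documentation for that.

```python
import datetime as dt
import json

# TOML and YAML can hold a native timestamp like this one; JSON cannot.
original = {"released": dt.datetime(2020, 5, 17, 7, 32, tzinfo=dt.timezone.utc)}

try:
    json.dumps(original)
except TypeError as err:
    print("Not encodable as is:", err)

# A converter has to choose a representation; an ISO 8601 string is one option.
as_json = json.dumps(original, default=lambda value: value.isoformat())
print(as_json)  # {"released": "2020-05-17T07:32:00+00:00"}

# Reading it back gives a plain string, so the type information is gone.
round_trip = json.loads(as_json)
print(type(round_trip["released"]).__name__)  # str
```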
| Name and link | Description |
|---------------|-------------|
| hostctl | Add and remove entries in `/etc/hosts`. Disable (comment out) and enable (uncomment) entries. Not idempotent. Preserves arbitrary comments above its section of the hosts file. Works with groups of entries called "profiles". |
| hostess | Add and remove entries in `/etc/hosts`. Disable (comment out) and enable (uncomment) entries. Check if a hostname exists. Reformat the hosts file. Convert the entries to JSON. Idempotent. Removes arbitrary comments. |
| hosts | Add and remove entries in `/etc/hosts`. Change a hostname's IP address. Idempotent. Preserves arbitrary comments. Can be used as a Tcl library. |
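In the table above, "idempotent" means that running the same command twice leaves the file exactly as running it once would. Below is a minimal sketch of an idempotent add, written as a hypothetical `add_host` function that is not taken from any of these tools. It ignores comments when matching. It also does not handle a hostname that is already mapped to a different address.

```python
def add_host(lines: list[str], ip: str, hostname: str) -> list[str]:
    # Return hosts-file lines that map ip to hostname exactly once.
    for line in lines:
        # Drop any trailing comment, then split into address and names.
        fields = line.split("#", 1)[0].split()
        if fields and fields[0] == ip and hostname in fields[1:]:
            return list(lines)  # already present: change nothing
    return list(lines) + [f"{ip}\t{hostname}"]


hosts = ["127.0.0.1\tlocalhost", "# managed entries below"]
once = add_host(hosts, "10.0.0.5", "db.local")
twice = add_host(once, "10.0.0.5", "db.local")
print(once == twice)  # True: the second run is a no-op
```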
| Name and link | Platform | License | Description |
|---------------|----------|---------|-------------|
| cfget | Any with Python 2.x? | GNU GPLv2+ | Retrieve properties as shell script commands to set the corresponding variables (with `--dump exports`). Retrieve properties' values as plain text. Substitute values from an INI file in an Autoconf-style template. Supports plug-ins. Chokes on section names and keys with spaces. |
| confget | Linux, FreeBSD | Two-clause BSD | Retrieve properties and sections as shell script commands to set the corresponding variables. Retrieve properties' values as plain text. Check for existence of properties. List sections. Find values that match a pattern. Read-only. |
| crudini | Any with Python 2.x | GNU GPLv2 | Retrieve properties and sections as INI fragments or shell script commands to set the corresponding variables. Retrieve properties' values as plain text. Set properties. Remove properties and sections. Create empty sections. Merge INI files. Changes files in place. |
| inicomp | Windows, *nix | Apache 2.0 | Compare INI (and also Windows .reg) files. |
| IniFile (DOS version) | Windows (x86, x86-64), MS-DOS | Closed-source freeware | Retrieve properties and sections as batch file commands to set the corresponding variables. Set properties. Remove properties and sections. Changes files in place. |
| initool | Linux, FreeBSD, Windows | MIT | Retrieve properties and sections as INI fragments. Retrieve properties' values as plain text. Set properties. Check for existence of properties and sections. Remove properties and sections. Outputs the updated INI file. |
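Several of these tools print INI properties as shell variable assignments. Here is a minimal sketch of that idea using Python's standard `configparser` and `shlex` modules. The section, the keys, the `db_` prefix, and the `section_as_shell` function are all made up for illustration. The sketch assumes that each prefixed key is a valid shell identifier. Note that `configparser` lowercases keys by default.

```python
import configparser
import shlex

# A made-up INI file for illustration.
INI_TEXT = """
[database]
host = db.example.com
password = it's secret
"""


def section_as_shell(parser: configparser.ConfigParser, section: str, prefix: str = "") -> list[str]:
    # One NAME=value line per key; shlex.quote makes the shell read each value literally.
    # Assumes prefix + key is a valid shell variable name.
    return [f"{prefix}{key}={shlex.quote(value)}" for key, value in parser[section].items()]


parser = configparser.ConfigParser()
parser.read_string(INI_TEXT)
exports = section_as_shell(parser, "database", prefix="db_")
print("\n".join(exports))
# db_host=db.example.com
# db_password='it'"'"'s secret'
```

A shell script can then `eval` such output. Quoting each value is what makes this safe when a value contains quotes or spaces.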
| Name and link | Description |
|---------------|-------------|
| Augeas | Query and modify a number of file formats. Not all of the formats are equally well supported by Augeas, and for some, only a limited subset of all valid files can be parsed. |
| Elektra | Query and modify configuration files. Shares Augeas' limitations when it comes to application-specific configuration files (it uses the same lenses), but has better support for generic formats such as JSON and INI. |
| Name and link | Description |
|---------------|-------------|
| Squawk | Query Apache and Nginx log files. See the SQL-based tool comparison. |
| lnav | Query and watch log files. Has batch and interactive modes. Supported formats include the Common Log Format, CUPS page_log, syslog, strace, and generic timestamped messages. Can perform SQL queries. |
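In the Common Log Format, each line has the form `host ident authuser [date] "request" status bytes`. The Python sketch below parses such lines with a regular expression and counts status codes, which is the kind of query these tools run. The pattern is a simplifying assumption: it ignores edge cases such as escaped quotes inside the request. The first sample line is the example from the Apache HTTP Server documentation. The other two are made up.

```python
import re
from collections import Counter

# host ident authuser [date] "request" status bytes
CLF = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-)$'
)


def parse_clf(line: str):
    # Return the fields as a dict, or None if the line is not in this format.
    m = CLF.match(line)
    return m.groupdict() if m else None


log_lines = [
    '127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" 200 2326',
    '127.0.0.1 - - [10/Oct/2000:13:56:01 -0700] "GET /missing HTTP/1.0" 404 -',
    'not a log line',
]
# Count responses by status code, skipping lines that do not parse.
statuses = Counter(rec["status"] for rec in map(parse_clf, log_lines) if rec)
print(statuses)
```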
Listed below are restricted programming language interpreters and templating tools that produce structured text output. They are generally intended to remove repetition in configuration files. They are distinct from unstructured templating tools like the `jinja2` CLI program, which should not be added to this table.
| Name and link | Description | File format |
|---------------|-------------|-------------|
| Firebird | Firebird is a FOSS database that can be used from a single file, like SQLite. "isql is a program that allows the user to issue arbitrary SQL commands". | Binary |
| Fsdb | A flat-file database for shell scripting. | Text-based, TSV with a header or "key: value" |
| GNU Recutils | "[A] set of tools and libraries to access human-editable, plain text databases called recfiles." | Text-based, roughly "key: value" |
| SDB | "[A] simple string key/value database based on djb's cdb disk storage and supports JSON and arrays introspection." | Binary |
| sqlite3(1) | "[A] simple command-line utility [...] that allows the user to manually enter and execute SQL statements against an SQLite database." | Binary |
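Both sqlite3(1) and Python's standard `sqlite3` module use the SQLite library. As a result, the statements you would type at the `sqlite3` prompt can also be run from a script. The following is a minimal sketch that uses a made-up table in an in-memory database.

```python
import sqlite3

# ":memory:" gives a throwaway database; pass a file name to keep it on disk.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE hosts (name TEXT PRIMARY KEY, ip TEXT NOT NULL)")
con.executemany("INSERT INTO hosts VALUES (?, ?)",
                [("web", "10.0.0.2"), ("db", "10.0.0.5")])
con.commit()
rows = con.execute("SELECT name, ip FROM hosts ORDER BY name").fetchall()
print(rows)  # [('db', '10.0.0.5'), ('web', '10.0.0.2')]
con.close()
```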
The contents of this document are licensed under the Creative Commons Attribution 4.0 International License. By contributing you agree to release your contribution under this license.