Tools to manipulate CSV files from the Command Line

Cleaning data and manually processing CSV files to make them readable to scripts and other applications is a tedious task.

There are some nice tools to manipulate CSV files from the command line in Linux, which can ease some of the burden and even provide lightweight alternatives to R and other analysis tools.

The following is a list of the tools I am aware of, together with their main characteristics: an “x” means the function is covered, “-” means partial coverage, “?” means that I did not check:

Name	Check	Clean	Stats	Math	Pivot	Transpose	Select	Query	Sort	Join	Convert from Excel
datamash	x	-	x	x	x	x	x
xsv	?	x	x	x			x	x	x	x
miller	-	x	x	x	x	x (xtab)	x	x (expr)	x	x
csvkit	?	x	x				x	x (sql)	x	x	x
pivot	?				x
q	?							x (sql)

where:

Check: check file structure
Clean: file “cleaning”, such as changing separators, field delimiters, pretty printing tables, aligning fields
Stats: computation of descriptive statistics (min, max, average, variance, first and third quartile, …)
Math: computation on columns (sum, count, uniq, average, …)
Pivot: pivoting data
Transpose: transposing rows and columns
Query: performing queries and selecting data¹
Sort: sorting data²
Join: joining two CSV file over the value of a column
Convert Excel: conversion of Excel files

Recommendations

If you want to use these tools extensively, the quirks you need to pay attention to include:

managing input files formats, especially for specific cases, such as fields surrounded by quotes, field separators appearing in quoted fields
management of empty fields
management of “N/A” when piping commands

I usually end up using datamash, mainly because of their simple and clean syntax. Sometimes I use xsv to preprocess the input file. Recently, however, I started appreciating miller, which is very flexible, feature rich, and robust in managing input file formats.

See A quick tutorial on Miller for more details.

Footnotes:

Different from grep, since queries are performed on values in columns.

Similar to sort.