Run SQL Queries on CSV Files from the Command Line

Working with CSV files is routine for sysadmins, data engineers, and developers. But loading CSVs into a full database just to run a quick query is overkill. Several command-line tools let you run SQL queries directly on CSV files – no database server required.

This guide covers the best tools for querying CSV files with SQL from the terminal. We start with lightweight Python-based options and move through SQLite, DuckDB, and Miller. Each tool has different strengths depending on file size, query complexity, and your workflow.

Prerequisites

To follow along, you need:

  • A Linux or macOS system with terminal access
  • Python 3 with pip (for q and csvkit)
  • SQLite 3 (pre-installed on most Linux distributions)
  • Basic familiarity with SQL syntax

We will use a sample CSV file throughout the examples. Create it first:

cat > employees.csv << 'EOF'
id,name,department,salary,start_date
1,Alice,Engineering,95000,2019-03-15
2,Bob,Marketing,72000,2020-07-01
3,Carol,Engineering,105000,2018-01-20
4,Dave,Sales,68000,2021-11-10
5,Eve,Engineering,98000,2020-02-28
6,Frank,Marketing,75000,2019-09-05
7,Grace,Sales,71000,2022-04-18
8,Hank,Engineering,110000,2017-06-12
9,Ivy,Marketing,69000,2023-01-03
10,Jack,Sales,73000,2021-08-22
EOF

For the JOIN examples later, create a second file:

cat > departments.csv << 'EOF'
department,manager,budget
Engineering,Alice,500000
Marketing,Frank,200000
Sales,Dave,150000
EOF

Step 1: Install q – Run SQL on CSV Files with Python

The q tool is a Python utility that lets you run SQL queries directly on CSV and TSV files. It treats each file as a virtual table and supports most standard SQL syntax. Install it with pip:

pip install q-text-as-data

Confirm the installation by checking the version:

q --version

The q tool auto-detects delimiters in most cases, but you can set one explicitly with -d; pass -H to indicate the file has a header row.
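Delimiter auto-detection of the kind q performs can be sketched with the standard library's csv.Sniffer, which guesses the delimiter from a sample of the data. The two inline samples here are illustrative, not files from this guide:

```python
import csv

# Two small inline samples: one comma-separated, one tab-separated
SAMPLE_CSV = "id,name\n1,Alice\n2,Bob\n"
SAMPLE_TSV = "id\tname\n1\tAlice\n2\tBob\n"

# csv.Sniffer inspects the sample text and guesses the delimiter,
# similar in spirit to q's automatic delimiter detection
sniffer = csv.Sniffer()
csv_delim = sniffer.sniff(SAMPLE_CSV).delimiter
tsv_delim = sniffer.sniff(SAMPLE_TSV).delimiter
print(repr(csv_delim), repr(tsv_delim))
```

Once you have the delimiter, you can pass it to csv.reader the same way you would pass it to q's -d flag.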

Step 2: Basic SQL Queries on CSV – SELECT, WHERE, ORDER BY

With q installed, you can run standard SQL against your CSV files immediately. The file path acts as the table name in the FROM clause.

Select all rows from the CSV file:

q -H -d, "SELECT * FROM employees.csv"

This returns all rows and columns from the file. The -H flag tells q to use the first row as column headers, and -d, sets the comma delimiter.

Filter rows with a WHERE clause to find employees in the Engineering department:

q -H -d, "SELECT name, salary FROM employees.csv WHERE department='Engineering'"

The output shows only Engineering staff with their salaries:

Alice,95000
Carol,105000
Eve,98000
Hank,110000

Sort results by salary in descending order:

q -H -d, "SELECT name, department, salary FROM employees.csv ORDER BY salary DESC"

This ranks all employees from highest to lowest salary:

Hank,Engineering,110000
Carol,Engineering,105000
Eve,Engineering,98000
Alice,Engineering,95000
Frank,Marketing,75000
Jack,Sales,73000
Bob,Marketing,72000
Grace,Sales,71000
Ivy,Marketing,69000
Dave,Sales,68000

Combine WHERE with ORDER BY and LIMIT to get the top 3 highest-paid engineers:

q -H -d, "SELECT name, salary FROM employees.csv WHERE department='Engineering' ORDER BY salary DESC LIMIT 3"

Expected output showing the top earners:

Hank,110000
Carol,105000
Eve,98000

Step 3: Aggregate Functions – COUNT, SUM, AVG, GROUP BY

SQL aggregation works the same way on CSV files as it does in a database. This is where querying CSV files from the command line becomes genuinely powerful for quick data analysis.

Count employees per department:

q -H -d, "SELECT department, COUNT(*) as headcount FROM employees.csv GROUP BY department"

The result shows the distribution across departments:

Engineering,4
Marketing,3
Sales,3

Calculate the average salary per department:

q -H -d, "SELECT department, AVG(salary) as avg_salary, MIN(salary) as min_salary, MAX(salary) as max_salary FROM employees.csv GROUP BY department"

This gives a salary breakdown for each team:

Engineering,102000.0,95000,110000
Marketing,72000.0,69000,75000
Sales,70666.6666666667,68000,73000

Get the total payroll across all departments:

q -H -d, "SELECT SUM(salary) as total_payroll FROM employees.csv"

The total sum of all salaries:

836000
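If you want to sanity-check these aggregates without q, the same GROUP BY logic takes only a few lines of standard-library Python. This is a minimal sketch with the sample rows embedded inline so it runs as-is:

```python
import csv
import io
from collections import defaultdict

# The sample data from employees.csv, embedded so the script is self-contained
CSV_DATA = """\
id,name,department,salary,start_date
1,Alice,Engineering,95000,2019-03-15
2,Bob,Marketing,72000,2020-07-01
3,Carol,Engineering,105000,2018-01-20
4,Dave,Sales,68000,2021-11-10
5,Eve,Engineering,98000,2020-02-28
6,Frank,Marketing,75000,2019-09-05
7,Grace,Sales,71000,2022-04-18
8,Hank,Engineering,110000,2017-06-12
9,Ivy,Marketing,69000,2023-01-03
10,Jack,Sales,73000,2021-08-22
"""

# Group salaries by department, like GROUP BY department
salaries = defaultdict(list)
for row in csv.DictReader(io.StringIO(CSV_DATA)):
    salaries[row["department"]].append(int(row["salary"]))

# Headcount and average per department, like COUNT(*) and AVG(salary)
for dept, vals in sorted(salaries.items()):
    print(f"{dept},{len(vals)},{sum(vals) / len(vals)}")

# Total payroll, like SELECT SUM(salary)
total_payroll = sum(sum(v) for v in salaries.values())
print(total_payroll)  # 836000
```

This confirms the figures above: Engineering averages 102000.0 across 4 people, and the total payroll is 836000.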

Step 4: JOIN Multiple CSV Files

One of the most useful features of SQL on CSV files is the ability to JOIN data from separate files. This eliminates the need to merge files manually or write custom scripts.

Join the employees and departments files to see each employee alongside their department manager and budget:

q -H -d, "SELECT e.name, e.department, e.salary, d.manager, d.budget FROM employees.csv e JOIN departments.csv d ON e.department = d.department"

The result merges data from both CSV files into a single output:

Alice,Engineering,95000,Alice,500000
Bob,Marketing,72000,Frank,200000
Carol,Engineering,105000,Alice,500000
Dave,Sales,68000,Dave,150000
Eve,Engineering,98000,Alice,500000
Frank,Marketing,75000,Frank,200000
Grace,Sales,71000,Dave,150000
Hank,Engineering,110000,Alice,500000
Ivy,Marketing,69000,Frank,200000
Jack,Sales,73000,Dave,150000

You can add aggregations on top of joins. For example, find each department’s total salary as a percentage of its budget:

q -H -d, "SELECT e.department, SUM(e.salary) as total_salary, d.budget, ROUND(CAST(SUM(e.salary) AS FLOAT) / d.budget * 100, 1) as pct_used FROM employees.csv e JOIN departments.csv d ON e.department = d.department GROUP BY e.department"

This shows how much of each department’s budget goes to salaries:

Engineering,408000,500000,81.6
Marketing,216000,200000,108.0
Sales,212000,150000,141.3

Step 5: Query CSV Files with csvkit (csvsql)

csvkit is a suite of command-line tools for working with CSV files. The csvsql command from csvkit lets you run SQL queries on CSV data, and the toolkit includes other useful utilities like csvcut, csvgrep, and csvstat. At the time of writing, the current version is 2.2.0.

Install csvkit with pip:

pip install csvkit

Run a SQL query on the employees CSV file using csvsql:

csvsql --query "SELECT name, department, salary FROM employees WHERE salary > 90000 ORDER BY salary DESC" employees.csv

The output includes a header row by default, an improvement over q's default behavior:

name,department,salary
Hank,Engineering,110000
Carol,Engineering,105000
Eve,Engineering,98000
Alice,Engineering,95000

Notice that csvsql uses the filename without the .csv extension as the table name. So employees.csv becomes the employees table in your query.

csvkit also includes standalone tools that do not require SQL at all. Use csvstat for quick statistics:

csvstat employees.csv

This prints summary statistics for every column – min, max, mean, unique values, and more. Use csvcut to select specific columns and csvgrep for filtering rows without writing SQL.
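To make the behavior of csvcut and csvgrep concrete, here is a standard-library Python sketch of the same two operations: column selection and row filtering. The cut and grep function names are hypothetical helpers named after the csvkit commands, and the data is a small inline subset of the sample file:

```python
import csv
import io

# A small inline subset of employees.csv, so the sketch runs as-is
CSV_DATA = """\
id,name,department,salary,start_date
1,Alice,Engineering,95000,2019-03-15
2,Bob,Marketing,72000,2020-07-01
4,Dave,Sales,68000,2021-11-10
"""

def cut(rows, fields):
    """Keep only the named columns, like `csvcut -c name,salary`."""
    return [{f: r[f] for f in fields} for r in rows]

def grep(rows, field, value):
    """Keep rows where a column matches a value, like `csvgrep -c department -m Sales`."""
    return [r for r in rows if r[field] == value]

rows = list(csv.DictReader(io.StringIO(CSV_DATA)))
print(cut(rows, ["name", "salary"]))
print(grep(rows, "department", "Sales"))
```

The real csvkit tools add conveniences such as regex matching and streaming from stdin, but the core idea is this simple.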

Step 6: Use SQLite to Query CSV Files

SQLite is pre-installed on most Linux systems and macOS. It can import CSV files directly into an in-memory database for querying. This approach gives you the full power of SQLite’s SQL dialect without installing any additional tools.

Import and query a CSV file in a single command:

sqlite3 :memory: -cmd ".mode csv" -cmd ".import employees.csv employees" "SELECT department, COUNT(*) as count, AVG(salary) as avg_salary FROM employees GROUP BY department ORDER BY avg_salary DESC;"

SQLite returns the grouped results with averages sorted highest first:

Engineering,4,102000.0
Marketing,3,72000.0
Sales,3,70666.6666666667

The :memory: argument creates a temporary in-memory database that disappears when the command finishes. The .import directive loads the CSV file into a table named employees.
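The same in-memory import-and-query pattern is available from Python's standard sqlite3 module, which is handy when the query is one step in a larger script. This is a minimal sketch with the sample rows embedded inline rather than read from disk:

```python
import csv
import io
import sqlite3

# The sample data from employees.csv, embedded so the script is self-contained
CSV_DATA = """\
id,name,department,salary,start_date
1,Alice,Engineering,95000,2019-03-15
2,Bob,Marketing,72000,2020-07-01
3,Carol,Engineering,105000,2018-01-20
4,Dave,Sales,68000,2021-11-10
5,Eve,Engineering,98000,2020-02-28
6,Frank,Marketing,75000,2019-09-05
7,Grace,Sales,71000,2022-04-18
8,Hank,Engineering,110000,2017-06-12
9,Ivy,Marketing,69000,2023-01-03
10,Jack,Sales,73000,2021-08-22
"""

# ":memory:" creates a temporary database, just like the CLI example
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employees "
    "(id INTEGER, name TEXT, department TEXT, salary INTEGER, start_date TEXT)"
)

reader = csv.reader(io.StringIO(CSV_DATA))
next(reader)  # skip the header row, like .import does for a fresh table
conn.executemany("INSERT INTO employees VALUES (?, ?, ?, ?, ?)", reader)

rows = conn.execute(
    "SELECT department, COUNT(*), AVG(salary) "
    "FROM employees GROUP BY department ORDER BY AVG(salary) DESC"
).fetchall()
for dept, count, avg_salary in rows:
    print(f"{dept},{count},{avg_salary}")
```

The output matches the sqlite3 CLI result above: Engineering first with an average of 102000.0, Sales last.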

For joining multiple CSV files with SQLite, import both files before running the query:

sqlite3 :memory: \
  -cmd ".mode csv" \
  -cmd ".import employees.csv employees" \
  -cmd ".import departments.csv departments" \
  "SELECT e.name, e.department, e.salary, d.manager FROM employees e JOIN departments d ON e.department = d.department WHERE e.salary > 90000;"

This returns only the high earners matched with their department managers:

Alice,Engineering,95000,Alice
Carol,Engineering,105000,Alice
Eve,Engineering,98000,Alice
Hank,Engineering,110000,Alice

SQLite is the best option when you need full SQL support and do not want to install anything extra. If you manage databases regularly, check out our guide on managing MySQL, PostgreSQL, and SQL Server using SQLPad Editor.

Step 7: Query CSV Files with DuckDB

DuckDB is an embedded analytical database designed for fast processing of large datasets. It reads CSV, Parquet, and JSON files natively and handles files with millions of rows far faster than q, csvkit, or SQLite. At the time of writing, the current stable release is version 1.5.0.

Install DuckDB on Linux:

curl -fsSL https://install.duckdb.org | sh

On macOS with Homebrew:

brew install duckdb

Query a CSV file directly without importing it first:

duckdb -c "SELECT department, COUNT(*) as headcount, ROUND(AVG(salary), 0) as avg_salary FROM 'employees.csv' GROUP BY department ORDER BY avg_salary DESC"

DuckDB auto-detects the CSV format and returns formatted output:

┌─────────────┬───────────┬────────────┐
│ department  │ headcount │ avg_salary │
│   varchar   │   int64   │   double   │
├─────────────┼───────────┼────────────┤
│ Engineering │         4 │   102000.0 │
│ Marketing   │         3 │    72000.0 │
│ Sales       │         3 │    70667.0 │
└─────────────┴───────────┴────────────┘

DuckDB handles JOINs across CSV files just as easily:

duckdb -c "SELECT e.name, e.department, e.salary, d.budget FROM 'employees.csv' e JOIN 'departments.csv' d ON e.department = d.department WHERE e.salary > 90000 ORDER BY e.salary DESC"

The results include column headers and type information:

┌─────────┬─────────────┬────────┬────────┐
│  name   │ department  │ salary │ budget │
│ varchar │   varchar   │ int64  │ int64  │
├─────────┼─────────────┼────────┼────────┤
│ Hank    │ Engineering │ 110000 │ 500000 │
│ Carol   │ Engineering │ 105000 │ 500000 │
│ Eve     │ Engineering │  98000 │ 500000 │
│ Alice   │ Engineering │  95000 │ 500000 │
└─────────┴─────────────┴────────┴────────┘

Where DuckDB really shines is with large files. A CSV with a million rows that takes q or csvsql 30+ seconds to process finishes in under a second with DuckDB. It also supports reading compressed files (.csv.gz), Parquet, and JSON out of the box.
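To test these timing claims on your own hardware, you can generate a larger synthetic CSV with a short standard-library script. The file name big.csv and the column layout are illustrative; N is kept small here so the script finishes instantly, but you can raise it to 1_000_000 to reproduce a large-file comparison:

```python
import csv
import random

# Number of data rows to generate; raise to 1_000_000 for a real benchmark
N = 1000
DEPARTMENTS = ["Engineering", "Marketing", "Sales"]

# Write a synthetic CSV with a header row plus N random data rows
with open("big.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["id", "department", "salary"])
    for i in range(1, N + 1):
        writer.writerow([i, random.choice(DEPARTMENTS), random.randint(50000, 150000)])
```

You can then time the same GROUP BY query against big.csv with each tool (for example, prefix the commands with the shell's time builtin) and compare.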

Step 8: Process CSV Files with Miller (mlr)

Miller (mlr) is a data processing tool that handles CSV, TSV, JSON, and other structured formats. It does not use SQL syntax – instead it has its own verb-based language. At the time of writing, the current version is 6.17.0. Miller is a good fit when you need data transformations beyond what SQL offers.

Install Miller on Ubuntu/Debian:

sudo apt install miller

On macOS:

brew install miller

On Fedora/RHEL:

sudo dnf install miller

Filter rows using the filter verb, similar to a WHERE clause:

mlr --csv filter '$department == "Engineering"' employees.csv

The output keeps CSV format with headers:

id,name,department,salary,start_date
1,Alice,Engineering,95000,2019-03-15
3,Carol,Engineering,105000,2018-01-20
5,Eve,Engineering,98000,2020-02-28
8,Hank,Engineering,110000,2017-06-12

Sort by salary and select specific fields:

mlr --csv cut -f name,department,salary then sort -nr salary employees.csv

This produces a sorted list with only the selected columns:

name,department,salary
Hank,Engineering,110000
Carol,Engineering,105000
Eve,Engineering,98000
Alice,Engineering,95000
Frank,Marketing,75000
Jack,Sales,73000
Bob,Marketing,72000
Grace,Sales,71000
Ivy,Marketing,69000
Dave,Sales,68000

Run aggregations with the stats1 verb, which works like GROUP BY:

mlr --csv stats1 -a count,mean,min,max -f salary -g department employees.csv

The result shows full statistics per department:

department,salary_count,salary_mean,salary_min,salary_max
Engineering,4,102000,95000,110000
Marketing,3,72000,69000,75000
Sales,3,70666.666667,68000,73000

Miller excels at converting between formats. Convert CSV to JSON for use with other tools:

mlr --icsv --ojson head -n 2 employees.csv

This converts the first two CSV rows to JSON format:

[
{ "id": 1, "name": "Alice", "department": "Engineering", "salary": 95000, "start_date": "2019-03-15" },
{ "id": 2, "name": "Bob", "department": "Marketing", "salary": 72000, "start_date": "2020-07-01" }
]
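For comparison, the same CSV-to-JSON conversion can be sketched with Python's csv and json modules. Unlike Miller, which infers numeric types automatically, csv.DictReader yields every value as a string, so this sketch converts the numeric columns explicitly:

```python
import csv
import io
import json

# The first two sample rows, embedded inline so the script runs as-is
CSV_DATA = """\
id,name,department,salary,start_date
1,Alice,Engineering,95000,2019-03-15
2,Bob,Marketing,72000,2020-07-01
"""

records = list(csv.DictReader(io.StringIO(CSV_DATA)))

# DictReader yields strings; convert the numeric columns by hand,
# which Miller's type inference would do for us
for r in records:
    r["id"] = int(r["id"])
    r["salary"] = int(r["salary"])

print(json.dumps(records, indent=2))
```

This prints a JSON array equivalent to Miller's output above, one object per CSV row.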

Step 9: Tool Comparison – Which CSV Query Tool to Use

Each tool has trade-offs. The right choice depends on your file size, how often you query CSV files, and whether you prefer SQL or a domain-specific language.

Tool            | Best For                                                          | SQL Support
q               | Quick one-off SQL queries on small CSV files                      | Full SQLite SQL
csvkit (csvsql) | CSV toolkit with SQL plus extra utilities like csvstat, csvcut    | Full SQL via SQLAlchemy
SQLite          | No install needed – available everywhere, full SQL power          | Full SQLite SQL
DuckDB          | Large files (millions of rows), analytics, Parquet support        | Full SQL with extensions
Miller (mlr)    | Data transformations, format conversion (CSV to JSON), streaming  | No – uses verb-based DSL

For files under 100,000 rows, all tools perform acceptably. Once you cross into millions of rows, DuckDB is the clear winner – it uses columnar processing and parallelism that the other tools lack. For data pipelines and format conversion, Miller’s verb-based approach is more natural than SQL.

If you only need occasional CSV queries and do not want to install anything, SQLite with .import is the simplest path. For daily use, install DuckDB and q – DuckDB for heavy lifting and q for quick lookups.

Conclusion

Running SQL queries on CSV files from the command line saves time and avoids spinning up a database for simple data analysis. The q tool and csvsql work well for small files, SQLite is universally available, DuckDB handles large-scale analytics, and Miller covers format transformation workflows.

For production data pipelines, consider loading frequently queried CSV data into a proper database like PostgreSQL or MariaDB. For ad-hoc analysis and quick data exploration, these command-line tools are all you need.
