Text2SQL: Natural Language CSV Query Tool

Query your CSV data using natural language questions. This tool uses a fine-tuned small language model to convert your questions into SQL queries and execute them against your data.

Model

This app uses distil-qwen3-0.6b-text2sql, a compact 0.6B parameter model fine-tuned for Text2SQL tasks.

Model Performance

Metric	Teacher (DeepSeek-V3)	Base Model	Fine-tuned Model
LLM-as-a-Judge	76%	36%	74%
Exact Match	38%	24%	40%
ROUGE	88.6%	69.3%	88.5%

The model achieves 2x improvement over the base Qwen3-0.6B, approaching teacher performance at 1/1000th the size.

Supported SQL Features

Simple: SELECT, WHERE, COUNT, SUM, AVG, MAX, MIN
Medium: JOIN, GROUP BY, HAVING, ORDER BY, LIMIT
Complex: Subqueries, multiple JOINs

Installation

1. Install Ollama

# macOS
brew install ollama

# Linux
curl -fsSL https://ollama.com/install.sh | sh

Or download from ollama.com.

2. Download the Model

Download the GGUF model from HuggingFace:

# Download the GGUF file
curl -L -o model.gguf \
  https://huggingface.co/distil-labs/distil-qwen3-0.6b-text2sql/resolve/main/model.gguf

3. Create the Ollama Model

ollama create text2sql -f Modelfile

4. Install Python Dependencies

pip install pandas openai

Usage

Basic Query

python app.py --csv data.csv --question "How many rows are there?"

Show Generated SQL

python app.py --csv data.csv --question "What is the average price?" --show-sql

Multiple Tables (for JOINs)

python app.py --csv orders.csv --csv customers.csv \
  --question "Show total orders per customer"

Command Line Options

Option	Description	Default
`--csv`	Path to CSV file (can be repeated)	Required
`--question`	Natural language question	Required
`--model`	Ollama model name	`text2sql`
`--port`	Ollama server port	`11434`
`--show-sql`	Print generated SQL query	`false`

Examples

Example 1: Employee Data

python app.py --csv example_data/employees.csv \
  --question "How many employees work in Engineering?" --show-sql

Output:

Generated SQL: SELECT COUNT(*) FROM employees WHERE department = 'Engineering';

 COUNT(*)
        4

Example 2: Aggregation

python app.py --csv example_data/employees.csv \
  --question "What is the average salary per department?" --show-sql

Output:

Generated SQL: SELECT department, AVG(salary) FROM employees GROUP BY department;

 department  AVG(salary)
Engineering      87500.0
  Marketing      58333.3
      Sales      68333.3

Example 3: JOIN Query

python app.py --csv example_data/employees.csv --csv example_data/projects.csv \
  --question "List all projects with their lead names" --show-sql

Output:

Generated SQL: SELECT p.name, e.name AS lead_name FROM projects p JOIN employees e ON p.lead_id = e.id;

              name      lead_name
  Website Redesign  Alice Johnson
        Mobile App Carol Williams
   CRM Integration     Henry Chen
Marketing Campaign    David Brown
   Sales Dashboard      Bob Smith

Project Structure

.
├── README.md           # This file
├── Modelfile           # Ollama model configuration
├── model.gguf          # Quantized model weights (download separately)
├── model_client.py     # Python client for the model
├── app.py              # Main CLI application
└── example_data/       # Sample CSV files
    ├── employees.csv
    └── projects.csv

How It Works

Load CSV: The app loads your CSV file(s) into an in-memory SQLite database
Generate Schema: It automatically infers the SQL schema from your data
Generate SQL: Your natural language question + schema are sent to the model
Execute Query: The generated SQL is executed against the SQLite database
Return Results: Results are displayed as a formatted table

Limitations

Optimized for SQLite syntax
Best with 1-2 table schemas
May struggle with highly complex nested subqueries
Trained on English questions only

License

Model: Apache 2.0
App Code: MIT

Text2SQL: Natural Language CSV Query Tool

Model

Model Performance

Supported SQL Features

Installation

1. Install Ollama

2. Download the Model

3. Create the Ollama Model

4. Install Python Dependencies

Usage

Basic Query

Show Generated SQL

Multiple Tables (for JOINs)

Command Line Options

Examples

Example 1: Employee Data

Example 2: Aggregation

Example 3: JOIN Query

Project Structure

How It Works

Limitations

License

Cookie preferences