Query your CSV data using natural language questions. This tool uses a fine-tuned small language model to convert your questions into SQL queries and execute them against your data.
Model
This app uses distil-qwen3-0.6b-text2sql, a compact 0.6B parameter model fine-tuned for Text2SQL tasks.
Model Performance
| Metric | Teacher (DeepSeek-V3) | Base Model | Fine-tuned Model |
|---|---|---|---|
| LLM-as-a-Judge | 76% | 36% | 74% |
| Exact Match | 38% | 24% | 40% |
| ROUGE | 88.6% | 69.3% | 88.5% |
The model achieves 2x improvement over the base Qwen3-0.6B, approaching teacher performance at 1/1000th the size.
Supported SQL Features
- Simple: SELECT, WHERE, COUNT, SUM, AVG, MAX, MIN
- Medium: JOIN, GROUP BY, HAVING, ORDER BY, LIMIT
- Complex: Subqueries, multiple JOINs
Installation
1. Install Ollama
# macOS
brew install ollama
# Linux
curl -fsSL https://ollama.com/install.sh | sh
Or download from ollama.com.
2. Download the Model
Download the GGUF model from HuggingFace:
# Download the GGUF file
curl -L -o model.gguf \
https://huggingface.co/distil-labs/distil-qwen3-0.6b-text2sql/resolve/main/model.gguf
3. Create the Ollama Model
ollama create text2sql -f Modelfile
4. Install Python Dependencies
pip install pandas openai
Usage
Basic Query
python app.py --csv data.csv --question "How many rows are there?"
Show Generated SQL
python app.py --csv data.csv --question "What is the average price?" --show-sql
Multiple Tables (for JOINs)
python app.py --csv orders.csv --csv customers.csv \
--question "Show total orders per customer"
Command Line Options
| Option | Description | Default |
|---|---|---|
--csv | Path to CSV file (can be repeated) | Required |
--question | Natural language question | Required |
--model | Ollama model name | text2sql |
--port | Ollama server port | 11434 |
--show-sql | Print generated SQL query | false |
Examples
Example 1: Employee Data
python app.py --csv example_data/employees.csv \
--question "How many employees work in Engineering?" --show-sql
Output:
Generated SQL: SELECT COUNT(*) FROM employees WHERE department = 'Engineering';
COUNT(*)
4
Example 2: Aggregation
python app.py --csv example_data/employees.csv \
--question "What is the average salary per department?" --show-sql
Output:
Generated SQL: SELECT department, AVG(salary) FROM employees GROUP BY department;
department AVG(salary)
Engineering 87500.0
Marketing 58333.3
Sales 68333.3
Example 3: JOIN Query
python app.py --csv example_data/employees.csv --csv example_data/projects.csv \
--question "List all projects with their lead names" --show-sql
Output:
Generated SQL: SELECT p.name, e.name AS lead_name FROM projects p JOIN employees e ON p.lead_id = e.id;
name lead_name
Website Redesign Alice Johnson
Mobile App Carol Williams
CRM Integration Henry Chen
Marketing Campaign David Brown
Sales Dashboard Bob Smith
Project Structure
.
├── README.md # This file
├── Modelfile # Ollama model configuration
├── model.gguf # Quantized model weights (download separately)
├── model_client.py # Python client for the model
├── app.py # Main CLI application
└── example_data/ # Sample CSV files
├── employees.csv
└── projects.csv
How It Works
- Load CSV: The app loads your CSV file(s) into an in-memory SQLite database
- Generate Schema: It automatically infers the SQL schema from your data
- Generate SQL: Your natural language question + schema are sent to the model
- Execute Query: The generated SQL is executed against the SQLite database
- Return Results: Results are displayed as a formatted table
Limitations
- Optimized for SQLite syntax
- Best with 1-2 table schemas
- May struggle with highly complex nested subqueries
- Trained on English questions only
License
- Model: Apache 2.0
- App Code: MIT