← All content
Demo Question AnsweringOn-Prem / Edge

Text2SQL: Natural Language CSV Query Tool

Query your CSV data using natural language questions. This tool uses a fine-tuned small language model to convert your questions into SQL queries and execute them against your data.

Model

This app uses distil-qwen3-0.6b-text2sql, a compact 0.6B parameter model fine-tuned for Text2SQL tasks.

Model Performance

MetricTeacher (DeepSeek-V3)Base ModelFine-tuned Model
LLM-as-a-Judge76%36%74%
Exact Match38%24%40%
ROUGE88.6%69.3%88.5%

The model achieves 2x improvement over the base Qwen3-0.6B, approaching teacher performance at 1/1000th the size.

Supported SQL Features

  • Simple: SELECT, WHERE, COUNT, SUM, AVG, MAX, MIN
  • Medium: JOIN, GROUP BY, HAVING, ORDER BY, LIMIT
  • Complex: Subqueries, multiple JOINs

Installation

1. Install Ollama

# macOS
brew install ollama

# Linux
curl -fsSL https://ollama.com/install.sh | sh

Or download from ollama.com.

2. Download the Model

Download the GGUF model from HuggingFace:

# Download the GGUF file
curl -L -o model.gguf \
  https://huggingface.co/distil-labs/distil-qwen3-0.6b-text2sql/resolve/main/model.gguf

3. Create the Ollama Model

ollama create text2sql -f Modelfile

4. Install Python Dependencies

pip install pandas openai

Usage

Basic Query

python app.py --csv data.csv --question "How many rows are there?"

Show Generated SQL

python app.py --csv data.csv --question "What is the average price?" --show-sql

Multiple Tables (for JOINs)

python app.py --csv orders.csv --csv customers.csv \
  --question "Show total orders per customer"

Command Line Options

OptionDescriptionDefault
--csvPath to CSV file (can be repeated)Required
--questionNatural language questionRequired
--modelOllama model nametext2sql
--portOllama server port11434
--show-sqlPrint generated SQL queryfalse

Examples

Example 1: Employee Data

python app.py --csv example_data/employees.csv \
  --question "How many employees work in Engineering?" --show-sql

Output:

Generated SQL: SELECT COUNT(*) FROM employees WHERE department = 'Engineering';

 COUNT(*)
        4

Example 2: Aggregation

python app.py --csv example_data/employees.csv \
  --question "What is the average salary per department?" --show-sql

Output:

Generated SQL: SELECT department, AVG(salary) FROM employees GROUP BY department;

 department  AVG(salary)
Engineering      87500.0
  Marketing      58333.3
      Sales      68333.3

Example 3: JOIN Query

python app.py --csv example_data/employees.csv --csv example_data/projects.csv \
  --question "List all projects with their lead names" --show-sql

Output:

Generated SQL: SELECT p.name, e.name AS lead_name FROM projects p JOIN employees e ON p.lead_id = e.id;

              name      lead_name
  Website Redesign  Alice Johnson
        Mobile App Carol Williams
   CRM Integration     Henry Chen
Marketing Campaign    David Brown
   Sales Dashboard      Bob Smith

Project Structure

.
├── README.md           # This file
├── Modelfile           # Ollama model configuration
├── model.gguf          # Quantized model weights (download separately)
├── model_client.py     # Python client for the model
├── app.py              # Main CLI application
└── example_data/       # Sample CSV files
    ├── employees.csv
    └── projects.csv

How It Works

  1. Load CSV: The app loads your CSV file(s) into an in-memory SQLite database
  2. Generate Schema: It automatically infers the SQL schema from your data
  3. Generate SQL: Your natural language question + schema are sent to the model
  4. Execute Query: The generated SQL is executed against the SQLite database
  5. Return Results: Results are displayed as a formatted table

Limitations

  • Optimized for SQLite syntax
  • Best with 1-2 table schemas
  • May struggle with highly complex nested subqueries
  • Trained on English questions only

License

  • Model: Apache 2.0
  • App Code: MIT