# Research: Open-Source Models for SQL/Schema Generation (March 2026)

## Key Finding: Text-to-DDL is Nearly Untouched

**Text-to-SQL** (query generation) is a crowded, mature space. **Text-to-DDL** (schema generation — what Mantara does) has almost no dedicated models or benchmarks. Mantara is in a unique position.

---

## Top Models for SQL Generation

### Tier 1: State-of-the-Art

| Model | Size | BIRD Score | Key Feature |
|-------|------|-----------|-------------|
| **Snowflake Arctic-Text2SQL-R1** | 7B/14B/32B | 71.83% (32B, SOTA) | RL-trained with execution rewards |
| **OmniSQL** | 7B/14B/32B | Matches GPT-4o | Trained on SynSQL-2.5M (2.5M samples) |
| **XiYanSQL-QwenCoder** | 3B-32B | 69.03% (32B) | Supports PostgreSQL dialect |
| **Contextual AI SQL** | System | Was #1 Feb 2025 | Multi-candidate generation + ranking |

### Tier 2: Established

| Model | Size | Notes |
|-------|------|-------|
| **Defog SQLCoder** | 7B/8B/34B | Most popular SQL model on HF (87K+ downloads) |
| **DuckDB-NSQL** | 7B | Available on Ollama |
| **Qwen2.5-Coder** | 7B/32B | 95.73% on text-to-SQL benchmarks |

### For Mantara Specifically
- **Best local option:** XiYanSQL-QwenCoder-14B (PostgreSQL dialect support)
- **Best if budget allows:** Arctic-Text2SQL-R1-7B (SOTA even at 7B via RL training)
- **Current choice (Qwen2.5-Coder-7B):** Solid baseline, strong code understanding

---

## Benchmarks

| Benchmark | Focus | Status |
|-----------|-------|--------|
| **BIRD-SQL** | Real-world SQL | Premier benchmark, updated Nov 2025 |
| **Spider 2.0** | Enterprise-scale | Active |
| **RSchema/Text2Schema** | Schema DESIGN (only one!) | 381-500 pairs, March 2025 |

**No large-scale benchmark exists for NL-to-DDL.** This is a major opportunity.

---

## Fine-tuning for Schema Generation

### Datasets
| Dataset | Size | Content |
|---------|------|---------|
| **SynSQL-2.5M** | 2.5M samples | DB + SQL + NL + CoT (Apache 2.0) |
| **Gretel synthetic_text_to_sql** | 105K records | NL + SQL + context, 100 domains |
| **b-mc2/sql-create-context** | 78K pairs | NL + SQL + CREATE TABLE context |

### Recommended Approach
1. Base: Qwen2.5-Coder-7B or Qwen3-4B
2. Method: QLoRA (single consumer GPU)
3. Framework: Unsloth + HuggingFace TRL
4. Key insight: Arctic-Text2SQL-R1 proved that RL with execution rewards beats SFT dramatically

---

## Emerging Approaches

- **Multi-agent pipelines** (like Mantara V2) validated by Text2Schema paper
- **RAG for schema generation** — retrieve similar schemas as context
- **Grammar-constrained DDL** — enforce valid SQL at token level
- **Inference-time scaling** — generate 5-10 candidates, rank best one
- **RL fine-tuning** — execute generated DDL, reward valid schemas

---

## Sources
- Snowflake Arctic-Text2SQL-R1 (snowflake.com)
- OmniSQL / SynSQL-2.5M (github.com/RUCKBReasoning/OmniSQL)
- Text2Schema/SchemaAgent (arXiv 2503.23886)
- Defog SQLCoder (huggingface.co/defog)
- XiYanSQL-QwenCoder (github.com/XGenerationLab)
- BIRD Benchmark (bird-bench.github.io)
- Oxen.ai: Fine-tune Qwen3 for Text2SQL