# Architecture

This doc explains how Mantara works end to end — the design decisions, what each piece does, and why things are structured the way they are.

## The core idea

The LLM's job is to understand what the user wants and produce a structured JSON representation of the database schema. That's it. The LLM never writes SQL.

SQL is rendered deterministically by Python from the JSON. This means JSON and SQL can never drift apart — they come from the same source. It also means SQL rendering is instant, free, and testable without an API key.

This was a deliberate choice. Early experiments with having the LLM generate both JSON and SQL led to inconsistencies — a table would exist in the JSON but be missing from the SQL, or column types wouldn't match. Splitting the responsibilities fixed that entirely.

## Data flow

```
User describes a business system (text, FSD, CSV, etc.)
    │
    ▼
LLM generates a MantaraSchema JSON object
(using OpenAI Structured Outputs — the response is guaranteed
 to conform to the Pydantic model, no parsing needed)
    │
    ▼
business_validator.py runs 8 checks
(naming conventions, submenu IDs, table structure,
 referential integrity, JSON Schema compliance)
    │
    ▼
renderer.py converts JSON → PostgreSQL DDL
(CREATE SCHEMA, ENUMs, tables, deferred FKs,
 menu/submenu seed data, COMMENTs — all deterministic)
    │
    ▼
Output: .json + .sql files
```

## The two pipelines

### v1 — single call

For simple inputs. One LLM call with the full system prompt, user input, and `response_format=MantaraSchema`. The model returns a structured JSON object directly — no parsing, no regex, no hoping the model formats things correctly. OpenAI's Structured Outputs handles that.

Works well for up to ~15 tables. Fast (15-30 seconds).
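
In code, the v1 call is roughly the following. This is a minimal sketch: the `models` import path and the `generate_v1` wrapper are assumptions, not the actual module layout, but the `client.beta.chat.completions.parse` call is the standard OpenAI SDK entry point for Structured Outputs.

```python
from openai import OpenAI

from models import MantaraSchema  # the Pydantic model described under "Key modules"

client = OpenAI()

def generate_v1(system_prompt: str, user_input: str) -> MantaraSchema:
    """One structured-outputs call; the SDK parses the response into the model."""
    completion = client.beta.chat.completions.parse(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": user_input},
        ],
        response_format=MantaraSchema,  # conformance guaranteed by the API
    )
    return completion.choices[0].message.parsed
```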

### v2 — multi-step

For complex inputs like full FSDs with 50+ entities. Breaks the problem into 3 separate LLM calls:

**Step 1: Input Analysis** (gpt-4o-mini)
Classifies the input type and extracts entities, relationships, and ENUM candidates. This is a cheap call: just understanding what we're working with.

**Step 2: Schema Planning** (gpt-4o)
Takes the analysis and designs the full schema structure: menus, submenus, tables, columns, foreign keys, ENUMs. This is the most important step — it's where the actual database design happens.

**Step 3: JSON Generation** (gpt-4o + Structured Outputs)
Takes the plan and produces the final MantaraSchema JSON. Structured Outputs ensures it's valid.

If the generated JSON fails business validation, there's a repair loop that feeds the errors back to the model and asks it to fix them (max 2 retries). This catches most issues — things like non-snake_case enum values or missing submenu_id columns.
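
A sketch of that loop, with `generate_json` standing in for the step-3 call and `run_business_checks` for business_validator.py (both names, and the `extra=` keyword, are assumptions about the real signatures):

```python
MAX_REPAIRS = 2

def generate_with_repair(plan):
    """Step 3 with the repair loop: validate, feed specific errors back, retry."""
    schema = generate_json(plan)                # structured-outputs call (step 3)
    for _ in range(MAX_REPAIRS):
        errors = run_business_checks(schema)    # the 8 checks in business_validator.py
        if not errors:
            break
        repair_note = (
            "The schema failed validation with these errors:\n"
            + "\n".join(errors)
            + "\nFix them and return the corrected schema."
        )
        schema = generate_json(plan, extra=repair_note)
    return schema  # remaining errors, if any, are surfaced to the caller
```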

The analysis feeds into planning, and planning feeds into generation. Each step has a focused prompt, so the model isn't trying to do everything at once.

## Key modules

### models.py
Pydantic models that mirror the mantara.schema.v1 spec: Column, ForeignKey, Table, Submenu, Menu, EnumType, MantaraSchema. These are what OpenAI Structured Outputs uses as the response format. The `to_json_dict()` method adds back the `$schema` marker, since `$schema` isn't a valid Python identifier and can't be a model field name.
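
A plausible shape of that method, assuming Pydantic v2 (the real field list is much longer):

```python
from pydantic import BaseModel

class MantaraSchema(BaseModel):
    system_name: str
    schema_name: str
    # ... description, menus, enum_types (abridged)

    def to_json_dict(self) -> dict:
        """Re-attach the '$schema' marker that the model itself can't carry."""
        return {"$schema": "mantara.schema.v1", **self.model_dump()}
```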

### renderer.py
~330 lines of deterministic SQL generation. Handles:
- CREATE SCHEMA
- Menu and submenu tables with INSERT seed data
- ENUM types
- Business tables with all columns and constraints
- COMMENT ON TABLE for every table
- Deferred foreign keys — if table B references table A but A hasn't been created yet, the FK is generated as an `ALTER TABLE ... ADD CONSTRAINT` at the end (sketched below)
- SQL string escaping

No LLM involvement. Same JSON in → same SQL out, every time.
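
The deferred-FK rule is the one interesting piece of logic. A sketch, with illustrative helper and field names (the real ones in renderer.py may differ):

```python
def render_tables(tables) -> str:
    """Emit CREATE TABLE in input order; FKs to not-yet-created tables
    become ALTER TABLE ... ADD CONSTRAINT statements appended at the end."""
    created, ddl, deferred = set(), [], []
    for t in tables:
        lines = [render_column(c) for c in t.columns]
        for fk in t.foreign_keys:
            if fk.references_table in created:
                lines.append(render_inline_fk(fk))       # target exists: inline REFERENCES
            else:
                deferred.append(render_alter_fk(t, fk))  # target missing: defer
        ddl.append(f"CREATE TABLE {t.table_name} (\n  " + ",\n  ".join(lines) + "\n);")
        created.add(t.table_name)
    return "\n\n".join(ddl + deferred)
```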

### business_validator.py
8 validation checks that run after JSON generation, before SQL rendering. Catches things the LLM gets wrong: naming conventions, structural rules, missing tables, referential integrity. The v2 repair loop uses these errors to guide the model's fix attempts.
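
As a flavor of what these checks look like, here is a sketch of the snake_case rule on enum values (field names follow the spec example below; the actual check may be stricter):

```python
import re

SNAKE_CASE = re.compile(r"^[a-z][a-z0-9_]*$")

def check_enum_values(schema) -> list[str]:
    """Every enum value must be snake_case (e.g. 'a_positive', not 'A+')."""
    errors = []
    for enum in schema.enum_types:
        for value in enum.values:
            if not SNAKE_CASE.match(value):
                errors.append(f"enum {enum.type_name}: value '{value}' is not snake_case")
    return errors
```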

### backends/
The LLM call is behind a protocol (`SchemaGenerator`). Currently only OpenAI is implemented, but the abstraction is there for adding local model backends later. The idea is that swapping the LLM provider should be a backend change only — everything downstream (validation, rendering, CLI, UI) stays the same.
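
The seam is small; it looks something like this (the method name and signature are assumptions):

```python
from typing import Protocol

from models import MantaraSchema  # hypothetical import path

class SchemaGenerator(Protocol):
    def generate(self, system_prompt: str, user_input: str) -> MantaraSchema: ...

class OpenAIGenerator:
    """Currently the only implementation. A local-model backend would
    satisfy the same protocol; validation, rendering, CLI, and UI
    never see the difference."""
    def generate(self, system_prompt: str, user_input: str) -> MantaraSchema:
        ...  # wraps the structured-outputs call shown under "v1 — single call"
```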

### app.py
Streamlit web UI. Two columns: input on the left (text area or file upload), settings on the right (pipeline version, model). Supports CSV, PDF, DOCX, and TXT uploads; shows live progress for v2 steps, a validation checklist, and download buttons.

For CSV uploads, the app parses the FSD structure (Section, Module, Feature, Description, Entities, Relationships, ENUM Candidates, and Priority columns) and converts it into a structured prompt.

Auto-switches to v2 if the input exceeds 2000 characters, since v1 tends to time out or produce incomplete results for large inputs.
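
The heuristic itself is tiny; a sketch (the threshold is from the behavior described above, the function name is hypothetical):

```python
V2_THRESHOLD = 2000  # characters

def pick_pipeline(user_input: str, requested: str) -> str:
    """Large inputs always take the v2 path, whatever the user selected."""
    return "v2" if len(user_input) > V2_THRESHOLD else requested
```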

## The mantara.schema.v1 spec

The intermediate JSON format that sits between the LLM and the SQL renderer. Structure:

```
{
  "$schema": "mantara.schema.v1",
  "system_name": "...",
  "schema_name": "...",      // snake_case, used as PostgreSQL schema
  "description": "...",
  "menus": [
    {
      "menu_id": 1,
      "menu_name": "...",
      "submenus": [
        {
          "submenu_id": 101,   // menu_id * 100 + sequence
          "tables": [
            {
              "table_name": "...",
              "columns": [...],
              "foreign_keys": [...]
            }
          ]
        }
      ]
    }
  ],
  "enum_types": [
    {
      "type_name": "schema.name_enum",
      "values": ["value_one", "value_two"]
    }
  ]
}
```

Every table must have `id SERIAL PRIMARY KEY` as the first column and `submenu_id INT` as the second. This is enforced by validation.
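
For example, the column list of every table starts like this (the field names follow the JSON example above but are otherwise assumptions about the Column model):

```python
# How the two mandatory columns might look in the JSON...
mandatory_columns = [
    {"name": "id", "type": "SERIAL", "primary_key": True},
    {"name": "submenu_id", "type": "INT"},
]
# ...and the DDL the renderer emits for them:
#   id SERIAL PRIMARY KEY,
#   submenu_id INT,
```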

## System prompt

The system prompt (`prompts/system_prompt.md`) is ~735 lines. It's essentially an ontology definition — it tells the model exactly what mantara.schema.v1 is, what the conventions are, what a menu/submenu structure looks like, how to name things, how to handle ENUMs, and gives examples. It also has specific guidance for tricky cases like blood type enums (use `a_positive` not `A+`).

This is the most important file in the project for output quality. Most tuning happens here.

## File upload handling

The UI supports four formats (the extraction calls are sketched below the list):
- **TXT/MD** — read directly as the prompt
- **PDF** — extracted via PyMuPDF (fitz)
- **DOCX** — extracted via python-docx
- **CSV** — parsed as a structured FSD with columns: Section, Module, Feature, Description, Entities, Relationships, ENUM Candidates, Priority. Converted to a formatted prompt with headers and sections.
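
The PDF and DOCX paths presumably reduce to the libraries' standard extraction calls, roughly:

```python
import fitz                 # PyMuPDF
from docx import Document   # python-docx

def extract_pdf(data: bytes) -> str:
    """Concatenate the text of every page."""
    with fitz.open(stream=data, filetype="pdf") as doc:
        return "\n".join(page.get_text() for page in doc)

def extract_docx(path: str) -> str:
    """Concatenate the text of every paragraph."""
    return "\n".join(p.text for p in Document(path).paragraphs)
```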

## Testing

48 unit tests across three modules, all runnable without an API key:
- `test_renderer.py` — SQL output structure, deferred FKs, edge cases (sketched after this list)
- `test_business_validator.py` — each validation check with positive and negative cases
- `test_models.py` — serialization, JSON Schema compliance, round-trip
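
A flavor of the renderer tests, as a sketch (the import path and fixture names are hypothetical):

```python
from renderer import render_sql  # hypothetical import path

def test_rendering_is_deterministic(sample_schema):
    """Same JSON in, same SQL out, every time."""
    assert render_sql(sample_schema) == render_sql(sample_schema)

def test_forward_fk_is_deferred(schema_with_forward_reference):
    """An FK to a not-yet-created table must appear as a trailing ALTER TABLE."""
    sql = render_sql(schema_with_forward_reference)
    assert "ALTER TABLE" in sql
    assert sql.rindex("CREATE TABLE") < sql.index("ALTER TABLE")
```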

The eval harness (`eval.py`) runs test prompts through the actual pipeline (needs API key) and reports: pass/fail, table count, ENUM count, latency, errors.

## Why certain decisions were made

**Why Structured Outputs instead of raw JSON parsing?**
Because, left to free-form generation, the model sometimes produces malformed or nonconforming JSON: missing brackets, trailing commas, missing or misnamed keys. Structured Outputs guarantees the response matches the Pydantic model. No parsing failures, ever.

**Why deterministic SQL rendering?**
Because LLMs are inconsistent with SQL syntax. One run might use `SERIAL`, another `INT GENERATED ALWAYS AS IDENTITY`. One might put `NOT NULL` before `DEFAULT`, another after. Deterministic rendering means consistent, testable output.

**Why a repair loop instead of just retrying from scratch?**
Because most errors are small — a non-snake_case enum value, a missing column. Feeding the specific errors back to the model and asking it to fix them is cheaper and more effective than regenerating everything.

**Why split analysis and planning in v2?**
The analysis step uses a cheap model (gpt-4o-mini) for classification and entity extraction. Planning uses a strong model (gpt-4o) for the actual schema design. Different tasks, different cost/capability tradeoffs. The analysis also gives the planner a cleaner view of the input — especially useful when the raw input is a messy FSD.
