# Mantara v8 · Short Description

> **One-liner:** A standalone schema engine that takes a Type-S input (system / schema / menu+submenu+fields) and produces a fully validated PostgreSQL schema (JSON `mantara.schema.v1` + SQL DDL). Multi-step LLM pipeline with deterministic SQL rendering. Independently sellable; LLMatica-Forge calls it via Step 5.

## Why it exists

Most schema generators do one LLM call: text → SQL. That's why they hallucinate enums, miss foreign keys, and produce schemas that won't compile.

Mantara splits the work into **3 LLM calls + deterministic rendering**, with validation and a capped repair loop in between (six steps in total):
1. **Analyze** (gpt-4o-mini) — classify input, extract entities/relationships
2. **Plan** (gpt-4o) — design the menu/submenu/table hierarchy
3. **Generate** (gpt-4o + Structured Outputs) — emit `MantaraSchema` JSON
4. **Validate** (Python, 25 business validators) — compile-time + business rules
5. **Repair** (LLM, max 3 retries) — re-generate with error feedback if validation fails
6. **Render** (Python, deterministic) — `MantaraSchema` → SQL DDL

**The key invariant:** SQL is rendered from JSON by Python. The LLM never writes SQL directly. JSON and SQL can't drift.
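The control flow of steps 3 to 6 can be sketched roughly as follows. This is a minimal illustration, not Mantara's actual code: `run_pipeline` and the three callables are hypothetical stand-ins, and rendering even when errors remain after the last retry is an assumption of the sketch.

```python
from typing import Callable

MAX_REPAIRS = 3  # from the text: at most 3 repair retries


def run_pipeline(
    generate: Callable[[str, list[str]], dict],
    validate: Callable[[dict], list[str]],
    render: Callable[[dict], str],
    type_s_input: str,
) -> tuple[dict, str, list[str]]:
    """Generate -> validate -> (repair)* -> render, using hypothetical callables."""
    schema = generate(type_s_input, [])          # first generation, no feedback yet
    errors = validate(schema)
    attempts = 0
    while errors and attempts < MAX_REPAIRS:
        schema = generate(type_s_input, errors)  # repair: regenerate with error feedback
        errors = validate(schema)
        attempts += 1
    # SQL is always derived from the JSON by code, never written by the LLM.
    return schema, render(schema), errors


# Stub callables standing in for the LLM, the validators, and the renderer.
def fake_generate(text: str, errors: list[str]) -> dict:
    cols = ["id", "name"] if errors else ["name"]  # "fixes" the schema once it sees errors
    return {"tables": [{"name": "patient", "columns": cols}]}


def fake_validate(schema: dict) -> list[str]:
    return [
        f"{t['name']}: missing primary key 'id'"
        for t in schema["tables"]
        if "id" not in t["columns"]
    ]


def fake_render(schema: dict) -> str:
    return "\n".join(
        f"CREATE TABLE {t['name']} ({', '.join(c + ' TEXT' for c in t['columns'])});"
        for t in schema["tables"]
    )


schema, sql, errors = run_pipeline(
    fake_generate, fake_validate, fake_render, "Build a hospital system"
)
```

Because `render` only ever sees the validated JSON object, the SQL file is a pure function of `schema.json`, which is what keeps the two artifacts from drifting.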

## What v8 changed

v8 introduced the **cfg_*** lookup table mandate. Every status, type, category, or similar enumeration-like field becomes a separate table (e.g., `cfg_order_status`) with seed values, instead of a PostgreSQL `ENUM` type. This was the v8 system prompt's central design decision.

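Why: a PostgreSQL ENUM can only be changed through DDL (`ALTER TYPE`), and existing values cannot be dropped without recreating the type, so users can't edit it at runtime; rows in cfg_* tables can be inserted, updated, or deactivated like any other data. Real enterprise systems need editable lookup data.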
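As a rough illustration of what the mandate means for the rendered DDL, the sketch below emits a lookup table plus seed rows for one status-like field. The function name `render_cfg_table` and the column layout (`id`, `code`, `is_active`) are assumptions made for this example; the source only states that such fields become `cfg_*` tables with seed values.

```python
def render_cfg_table(field: str, values: list[str]) -> str:
    """Render a cfg_* lookup table and its seed rows (hypothetical column layout)."""
    table = f"cfg_{field}"
    ddl = (
        f"CREATE TABLE {table} (\n"
        "    id SERIAL PRIMARY KEY,\n"
        "    code TEXT NOT NULL UNIQUE,\n"
        "    is_active BOOLEAN NOT NULL DEFAULT TRUE\n"
        ");"
    )
    seeds = []
    for value in values:
        escaped = value.replace("'", "''")  # escape single quotes for a SQL literal
        seeds.append(f"INSERT INTO {table} (code) VALUES ('{escaped}');")
    return "\n".join([ddl, *seeds])


cfg_sql = render_cfg_table("order_status", ["pending", "shipped", "cancelled"])
print(cfg_sql)
```

A referencing table would then carry an ordinary foreign key to `cfg_order_status(id)`, so adding a new status later is an `INSERT` rather than a schema migration.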

## What it produces

```
mantara_v8/output/<run_id>/
├── schema.json          ← MantaraSchema (Pydantic, mantara.schema.v1)
├── schema.sql           ← PostgreSQL DDL
├── validation.json      ← 25 business validator results
├── steps.json           ← per-step timing + tokens
└── input.txt            ← the Type-S input that fed the engine
```

## Real entry points

| Path | Use case |
|---|---|
| `python main.py "Build a hospital system"` | Quick CLI on a typed description |
| `python main.py --input fsd.csv` | CLI on a file (CSV/PDF/DOCX/TXT) |
| `streamlit run app.py` | Web UI with v1/v2 toggle, model picker, live progress |
| `from generator_v2 import generate_v2` | Programmatic — what LLMatica-Forge Step 5 uses |

## Two pipeline modes

| | v1 — Single Call | v2 — Multi-Step |
|---|---|---|
| **Best for** | Short inputs (<2000 chars), simple systems | FSDs, complex enterprise specs |
| **LLM calls** | 1 | 3 (analyze + plan + generate) |
| **Time** | 15–30s | 90–300s |
| **Cost** | ~$0.05–0.20 | ~$0.30–0.80 |
| **Auto-switch** | streamlit app picks v2 if input >2000 chars | — |
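The auto-switch rule in the table amounts to a length check. A minimal sketch, assuming a hypothetical helper named `pick_mode` (the Streamlit app's real function name and its handling of inputs of exactly 2000 characters are not documented here):

```python
V2_THRESHOLD_CHARS = 2000  # threshold taken from the table above


def pick_mode(text: str, threshold: int = V2_THRESHOLD_CHARS) -> str:
    """Return "v2" for inputs longer than the threshold, otherwise "v1"."""
    return "v2" if len(text) > threshold else "v1"


short_mode = pick_mode("Build a hospital system")
long_mode = pick_mode("x" * 5000)
```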

## Key features

- **Structured Outputs** — Pydantic schemas enforce JSON shape at the API level. Wrong shape never reaches our code.
- **25 business validators** — primary key, FK target, table naming, cfg_* mandate, submenu_id presence, COMMENT ON discipline, etc.
- **Repair loop** — LLM gets the validator errors back, re-generates. Max 3 retries. Capped to prevent oscillation.
- **Deterministic SQL renderer** — same JSON → byte-identical SQL. Auto-injects `CHECK >= 0` for `price`/`amount`/etc., `CHECK end_date >= start_date` for date pairs, sanitizes MySQL syntax (`ON UPDATE CURRENT_TIMESTAMP` → removed for PostgreSQL).
- **Forward-reference handling** — FKs to not-yet-created tables are deferred to ALTER TABLE statements at end.
- **Multi-backend support** — OpenAI (default), Anthropic, Ollama (local).
- **FSD analyzer** — pre-extracts a structured analysis from prose specs (entities, workflows, rules) before the LLM sees it.
- **Web UI with live progress** — Streamlit app shows each step as it runs.
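The renderer features listed above (deterministic output, auto-injected `CHECK` constraints, deferred forward references) can be illustrated with a small sketch. The `Column` and `Table` dataclasses here are simplified stand-ins, not the Pydantic models in `models.py`, and the `NON_NEGATIVE` name set and the `fk_<table>_<column>` constraint naming are assumptions.

```python
from dataclasses import dataclass, field
from typing import Optional

NON_NEGATIVE = {"price", "amount"}  # assumed set; the text says `price`/`amount`/etc.


@dataclass
class Column:
    name: str
    sql_type: str
    references: Optional[str] = None  # name of the referenced table, if any


@dataclass
class Table:
    name: str
    columns: list[Column] = field(default_factory=list)


def render(tables: list[Table]) -> str:
    """Render tables in input order; FKs to not-yet-created tables become ALTER TABLEs."""
    created: set[str] = set()
    statements: list[str] = []
    deferred: list[str] = []
    for table in tables:
        lines = []
        for col in table.columns:
            line = f"    {col.name} {col.sql_type}"
            if col.name in NON_NEGATIVE:
                line += f" CHECK ({col.name} >= 0)"
            if col.references:
                if col.references in created:
                    line += f" REFERENCES {col.references}(id)"
                else:
                    # Forward reference: emit the constraint after all CREATE TABLEs.
                    deferred.append(
                        f"ALTER TABLE {table.name} ADD CONSTRAINT "
                        f"fk_{table.name}_{col.name} FOREIGN KEY ({col.name}) "
                        f"REFERENCES {col.references}(id);"
                    )
            lines.append(line)
        statements.append(f"CREATE TABLE {table.name} (\n" + ",\n".join(lines) + "\n);")
        created.add(table.name)
    return "\n\n".join(statements + deferred)


tables = [
    Table("order_item", [
        Column("id", "SERIAL PRIMARY KEY"),
        Column("order_id", "INTEGER", references="orders"),  # orders is defined later
        Column("price", "NUMERIC(12,2)"),
    ]),
    Table("orders", [Column("id", "SERIAL PRIMARY KEY")]),
]
ddl = render(tables)
```

The function has no randomness, timestamps, or unordered iteration over the input, so the same list of tables always produces the same string; that is the property the "byte-identical SQL" claim relies on.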

## What Mantara does NOT do

- ❌ Read `enriched_cir.json` or any LLMatica-Forge artifact (Step 5's adapter does that conversion)
- ❌ Generate frontend / backend code (that's Step 6 of LLMatica-Forge)
- ❌ Modify Mantara's own input file format — Type-S is the contract
- ❌ Write SQL directly — Python renders SQL from validated JSON
- ❌ Emit PostgreSQL ENUM types (v8 mandate: cfg_* tables only)

## Where to read more

- **README.md** — quickstart + project structure
- **ARCHITECTURE.md** — multi-step pipeline, validators, renderer details
- **fulllevelofdetail.md** — full reference: every module, every validator, prompt versioning, backend abstraction
- **prompts/system_prompt_v8.md** — the v8 system prompt (the moat)
- **generator_v2.py** — multi-step pipeline orchestrator
- **business_validator.py** — 25 validators
- **renderer.py** — deterministic SQL emitter
- **models.py** — Pydantic schemas (MantaraSchema, Table, Column, …)

## Latest test results

| Mode | Sample | Status | Time | LLM calls | Cost | Tables | cfg_* | Validators |
|---|---|---|---|---|---|---|---|---|
| v2 | ASN warehouse | ✅ pass | ~5 min | 3 | $0.71 | 22 | 0 (orphan in JSON, lifted by Step 5 cfg_enforcer) | 13 errors final |
| v2 | QSR kiosk | ✅ pass | ~4 min | 3 | $0.42 | 24 | 3 | 22 errors final |
