# Mantara v8 · Full Level of Detail

> Comprehensive technical reference for the Mantara v8 schema engine. For a quick overview see [SHORT_DESCRIPTION.md](./SHORT_DESCRIPTION.md). For high-level architecture see [ARCHITECTURE.md](./ARCHITECTURE.md).

---

## 1. Goal

**Validated schema engine.** Take a Type-S input (system / schema / menu+submenu+fields description, optionally with FSD prose) and produce a complete PostgreSQL schema as both:
- `MantaraSchema` JSON (Pydantic, conforms to `mantara.schema.v1`)
- SQL DDL (deterministic Python rendering of the JSON)

The engine is **independently shippable** — it doesn't know about LLMatica-Forge, CIR, or PRD. Type-S is the contract.

---

## 2. Input contract

### 2.1 Type-S format (the canonical input)

```
SYSTEM: <system name>
SCHEMA: <postgres schema name (snake_case)>

DESCRIPTION:
<one-paragraph product description>

MENUS:
  Menu 1: <menu name>
    Submenu 1.1: <submenu name>
      Fields:
        - <field_name>: <type> [constraints] — <description>
        - ...
      Rules:
        - <business rule prose>
    Submenu 1.2: ...

ANTICIPATED LOOKUPS:
  - cfg_<entity>_<status|type>: [val1, val2, ...]

[OPTIONAL] FSD ANALYSIS:
  <pre-extracted entities, workflows, rules block — fed by Step 5 adapter>

[OPTIONAL] CONSTITUTION:
  <preservation contract: count of entities, count of states>
```

### 2.2 Sources of Type-S input

| Source | Path |
|---|---|
| Direct CLI prompt | `python main.py "build a salon system"` |
| File (CSV/PDF/DOCX/TXT) | `python main.py --input fsd.csv` |
| Streamlit UI | textarea or file_uploader |
| LLMatica-Forge Step 5 | `mantara_adapter.py` builds Type-S from CIR + entities.json |

---

## 3. Output contract

```
output/<run_id>/
├── schema.json          ← MantaraSchema dict, conforms to mantara.schema.v1
├── schema.sql           ← PostgreSQL DDL (CREATE SCHEMA → CREATE TABLE → INSERT seeds → ALTER FK)
├── validation.json      ← 25 business validator results
├── steps.json           ← per-step timing + tokens (analyze/plan/generate/validate/repair)
└── input.txt            ← the Type-S input echoed back
```

### 3.1 `schema.json` shape

```python
class MantaraSchema(BaseModel):
    system_name: str
    schema_name: str
    description: str
    menus: list[Menu]            # menus → submenus → tables → columns + foreign_keys
    enum_types: list[EnumType]   # cfg lookups (v8: lifted to tables by Step 5 cfg_enforcer)
    assumptions: list[str]
    open_questions: list[str]
```

### 3.2 SQL ordering (in `schema.sql`)

1. Header comment (system / schema name)
2. Assumptions block
3. Open questions block
4. `CREATE SCHEMA <name>;`
5. `CREATE TABLE <schema>.menu` + `INSERT` seed
6. `CREATE TABLE <schema>.submenu` + `INSERT` seed
7. `CREATE TYPE` blocks (v8: should be empty after Step 5 cfg_enforcer)
8. Per menu, per submenu: `CREATE TABLE` + `COMMENT ON`
9. Deferred ALTER TABLE FOREIGN KEY statements (forward references)
10. `END OF SCHEMA` footer

---

## 4. The 5-stage pipeline (v2 mode)

```
┌──────────────────────────────────────────────────────────────────┐
│ STAGE 1 — INPUT ANALYZE (gpt-4o-mini, ~3-5s, ~$0.005)            │
│  fsd_analyzer.py                                                  │
│   • Classifies input: prose / structured / Type-S                 │
│   • Extracts: entities, attributes, relationships, business rules │
│   • Output: AnalysisResult (Pydantic) — fed to Stage 2 + 3        │
└──────────────────────────────────────────────────────────────────┘
                          ▼
┌──────────────────────────────────────────────────────────────────┐
│ STAGE 2 — SCHEMA PLAN (gpt-4o, ~30-60s, ~$0.10-0.20)             │
│  generator_v2.py::_plan_schema()                                  │
│   • Designs menu/submenu/table hierarchy                          │
│   • Decides cfg_* lookup candidates                               │
│   • Output: PlanResult (Pydantic) — list of {table, fields, FKs}  │
└──────────────────────────────────────────────────────────────────┘
                          ▼
┌──────────────────────────────────────────────────────────────────┐
│ STAGE 3 — JSON GENERATE (gpt-4o + Structured Outputs, ~60-120s)  │
│  generator_v2.py::_generate_json()                                │
│   • Emits MantaraSchema JSON conforming to Pydantic models        │
│   • Structured Outputs: shape guaranteed at API level             │
│   • Cost: ~$0.20-0.40                                             │
└──────────────────────────────────────────────────────────────────┘
                          ▼
┌──────────────────────────────────────────────────────────────────┐
│ STAGE 4 — VALIDATE (Python, $0, ~0.1s)                           │
│  business_validator.py — 25 validators                            │
│   1.  CREATE_SCHEMA_PRESENT                                       │
│   2.  MENU_TABLE_PRESENT                                          │
│   3.  SUBMENU_TABLE_PRESENT                                       │
│   4.  EVERY_TABLE_HAS_PK                                          │
│   5.  EVERY_TABLE_HAS_SUBMENU_ID                                  │
│   6.  FK_TARGETS_EXIST                                            │
│   7.  CFG_TABLE_NAMING (cfg_*)                                    │
│   8.  NO_POSTGRES_ENUMS (v8 mandate)                              │
│   9.  COMMENT_ON_TABLE_DISCIPLINE                                 │
│   10. NO_DUPLICATE_TABLE_NAMES                                    │
│   11. NO_DUPLICATE_COLUMN_NAMES_IN_TABLE                          │
│   12. NO_DUPLICATE_PRIMARY_KEYS                                   │
│   13. CFG_TABLES_HAVE_SEED_DATA                                   │
│   14. SUBMENU_FK_TO_MENU                                          │
│   15. CHARACTER_ENCODING_UTF8                                     │
│   16. NO_RESERVED_KEYWORDS_AS_COLUMN_NAMES                        │
│   17. SCHEMA_NAME_LOWERCASE                                       │
│   18. TABLE_NAMES_LOWERCASE                                       │
│   19. CONSISTENT_PK_NAMING                                        │
│   20. NO_FORWARD_FK_WITHOUT_DEFERRAL                              │
│   21. NO_INVALID_DATA_TYPES                                       │
│   22. CHECK_CONSTRAINT_VALIDITY                                   │
│   23. NO_ORPHAN_FOREIGN_KEYS                                      │
│   24. ENTITY_HAS_FIELDS_FROM_INPUT (upstream-thin-entity)         │
│   25. RULE_REPRESENTED_AS_CONSTRAINT_OR_COMMENT                   │
└──────────────────────────────────────────────────────────────────┘
                          ▼  (if errors)
┌──────────────────────────────────────────────────────────────────┐
│ STAGE 5 — REPAIR LOOP (LLM, max 3 retries)                       │
│  generator_v2.py::_repair()                                       │
│   • Feeds validator errors back to gpt-4o                         │
│   • Asks for re-emission with same input + error feedback         │
│   • Re-validates after each retry                                 │
│   • Aborts if errors increase 2 turns in a row (oscillation)      │
└──────────────────────────────────────────────────────────────────┘
                          ▼
┌──────────────────────────────────────────────────────────────────┐
│ STAGE 6 — RENDER (Python, $0, ~0.05s)                            │
│  renderer.py::render_sql(schema: MantaraSchema) -> str            │
│   • Deterministic — same input → byte-identical output            │
│   • Auto-injects CHECK >= 0 for price/amount/qty/etc.             │
│   • Auto-injects CHECK end_date >= start_date for date pairs      │
│   • Sanitizes MySQL syntax (ON UPDATE CURRENT_TIMESTAMP removed)  │
│   • Defers forward-reference FKs to ALTER TABLE statements        │
│   • Quotes string literals (single-quote escape)                  │
└──────────────────────────────────────────────────────────────────┘
                          ▼
                    schema.json + schema.sql
```

---

## 5. The 17 modules

```
mantara_v8/
├── main.py                  CLI entry (~150 lines)
├── app.py                   Streamlit web UI (~500 lines)
├── config.py                env vars, paths, model config (~50 lines)
├── models.py                Pydantic schemas (~65 lines)
├── llm_client.py            backend dispatch (openai/anthropic/ollama)
├── backends/
│   ├── openai_backend.py    OpenAI Structured Outputs
│   ├── anthropic_backend.py Anthropic API
│   └── ollama_backend.py    Local Ollama
├── generator.py             v1 — single-call pipeline
├── generator_v2.py          v2 — multi-step pipeline (THE FLAGSHIP)
├── fsd_analyzer.py          Stage 1 — input analysis
├── business_validator.py    Stage 4 — 25 validators
├── renderer.py              Stage 6 — deterministic SQL renderer (~440 lines)
├── eval.py                  evaluation harness
├── comparator.py            schema comparison utilities
├── comparison_models.py     comparator Pydantic models
├── prompts/
│   ├── system_prompt.md          v1 (legacy)
│   ├── system_prompt_v8.md       v8 (production — cfg_* mandate)
│   ├── analyze_prompt.md
│   ├── plan_prompt.md
│   └── generate_prompt.md
├── schemas/
│   └── mantara_schema_v1.json    JSON Schema for MantaraSchema
└── tests/                   pytest suite
```

---

## 6. The renderer (Stage 6 deep-dive)

`renderer.py::render_sql(schema: MantaraSchema) -> str`

### 6.1 Auto-injected constraints

| Pattern | Injected | When |
|---|---|---|
| `CHECK (col >= 0)` | non-negative columns | name matches: `price\|cost\|amount\|qty\|salary\|fee\|charge\|revenue\|balance\|discount\|tax_amount\|quantity\|...` |
| `CHECK (end_date >= start_date)` | date range constraint | both `start_date`/`end_date` columns exist (or `check_in_date`/`check_out_date`, `usage_start_date`/`usage_end_date`, `planned_start_date`/`planned_end_date`) |

Skipped if column already has a CHECK constraint inline.

### 6.2 Sanitization

| MySQL-only syntax | Action |
|---|---|
| `ON UPDATE CURRENT_TIMESTAMP` | stripped (PostgreSQL doesn't support) |

### 6.3 Forward-reference deferral

If a FK targets a table not yet created in the SQL, the FK is moved to:
```sql
ALTER TABLE schema.tbl ADD CONSTRAINT fk_tbl_col
    FOREIGN KEY (col) REFERENCES schema.target(target_id);
```
…appended after all CREATE TABLE statements.

### 6.4 Other discipline

- All strings single-quoted with `''` escape for embedded quotes
- Section separators: `-- ` + 76 `=` characters
- Comments: `COMMENT ON TABLE` and `COMMENT ON COLUMN` for every entity
- Schema name validated lowercase
- Tables grouped by menu, submenus listed in order

---

## 7. The 25 business validators

| # | Validator | Severity | Auto-fix in repair? |
|---|---|---|---|
| 1 | CREATE_SCHEMA_PRESENT | error | yes |
| 2 | MENU_TABLE_PRESENT | error | yes |
| 3 | SUBMENU_TABLE_PRESENT | error | yes |
| 4 | EVERY_TABLE_HAS_PK | error | yes |
| 5 | EVERY_TABLE_HAS_SUBMENU_ID | warn | yes |
| 6 | FK_TARGETS_EXIST | error | yes |
| 7 | CFG_TABLE_NAMING | warn | yes |
| 8 | NO_POSTGRES_ENUMS | error (v8) | yes |
| 9 | COMMENT_ON_TABLE_DISCIPLINE | warn | yes |
| 10 | NO_DUPLICATE_TABLE_NAMES | error | yes |
| 11 | NO_DUPLICATE_COLUMN_NAMES_IN_TABLE | error | yes |
| 12 | NO_DUPLICATE_PRIMARY_KEYS | error | yes |
| 13 | CFG_TABLES_HAVE_SEED_DATA | warn | yes |
| 14 | SUBMENU_FK_TO_MENU | error | yes |
| 15 | CHARACTER_ENCODING_UTF8 | warn | yes |
| 16 | NO_RESERVED_KEYWORDS_AS_COLUMN_NAMES | error | yes |
| 17 | SCHEMA_NAME_LOWERCASE | error | yes |
| 18 | TABLE_NAMES_LOWERCASE | error | yes |
| 19 | CONSISTENT_PK_NAMING | warn | yes |
| 20 | NO_FORWARD_FK_WITHOUT_DEFERRAL | error | yes (renderer) |
| 21 | NO_INVALID_DATA_TYPES | error | yes |
| 22 | CHECK_CONSTRAINT_VALIDITY | warn | yes |
| 23 | NO_ORPHAN_FOREIGN_KEYS | error | yes |
| 24 | ENTITY_HAS_FIELDS_FROM_INPUT | warn (upstream-thin) | partial |
| 25 | RULE_REPRESENTED_AS_CONSTRAINT_OR_COMMENT | warn | yes |

---

## 8. Configuration

### 8.1 Environment variables

```ini
# API Keys
OPENAI_API_KEY=sk-proj-...
ANTHROPIC_API_KEY=sk-ant-...

# Backend selection
MANTARA_BACKEND=openai                  # openai | anthropic | ollama

# Model override
MANTARA_MODEL=gpt-4o                    # gpt-4o-mini for cheaper runs
MANTARA_TEMPERATURE=0.0                 # 0.0 = deterministic; LLMatica Step 5 best-of-n overrides per call

# Token caps
MANTARA_MAX_TOKENS=8000
MANTARA_MAX_DOLLARS_PER_RUN=0.50

# Prompt version
MANTARA_PROMPT_VERSION=v8               # loads prompts/system_prompt_v8.md (cfg_* mandate)
                                         # set to v1 for legacy ENUM-based behavior

# Repair loop
MANTARA_MAX_RETRIES=3                   # max repair attempts before abort

# Logging
MANTARA_LOG_LEVEL=INFO
```

### 8.2 Backend abstraction

`llm_client.py` dispatches to backend implementations under `backends/`:
- `openai_backend.py` — gpt-4o + Structured Outputs (production)
- `anthropic_backend.py` — Claude (alternative; doesn't have Structured Outputs)
- `ollama_backend.py` — Local Ollama (zero-cost path; v2 pipeline limited)

---

## 9. Cost & latency table

| Pipeline | Stage | Model | Tokens | Cost | Latency |
|---|---|---|---|---|---|
| **v1** | single call | gpt-4o | ~3K in / ~5K out | ~$0.05 | 15–30s |
| **v2** | analyze | gpt-4o-mini | ~2K in / ~1K out | ~$0.005 | 3–5s |
| **v2** | plan | gpt-4o | ~5K in / ~3K out | ~$0.10–0.20 | 30–60s |
| **v2** | generate | gpt-4o | ~6K in / ~6K out | ~$0.20–0.40 | 60–120s |
| **v2** | repair (per retry) | gpt-4o | ~8K in / ~6K out | ~$0.20–0.40 | 60–120s |
| **v2 total** | (no repair) | — | — | **~$0.30–0.60** | **~90–180s** |
| **v2 total** | (3 repairs) | — | — | **~$0.90–1.80** | **~270–540s** |

---

## 10. Closed gaps

- ✅ v1 single-call pipeline
- ✅ v2 multi-step pipeline (analyze → plan → generate → validate → repair)
- ✅ 25 business validators
- ✅ Deterministic SQL renderer
- ✅ OpenAI Structured Outputs integration
- ✅ Backend abstraction (openai / anthropic / ollama)
- ✅ Streamlit web UI with live progress
- ✅ FSD analyzer (Stage 1)
- ✅ Repair loop with oscillation guard
- ✅ Auto-inject CHECK constraints (non-negative, date-range)
- ✅ Forward-reference FK deferral
- ✅ MySQL-syntax sanitization
- ✅ v8 system prompt (cfg_* lookup table mandate)
- ✅ TEMPERATURE re-read per call (enables LLMatica's Best-of-N)
- ✅ COMMENT ON discipline

---

## 11. Open gaps

- ⚠️ **cfg_* table emission gap** — v8 prompt mandates cfg_* tables but generator sometimes emits `enum_types[]` instead. **Mitigated** in LLMatica via Step 5 `cfg_enforcer` post-processor (deterministic lift). Without that wrapper, raw Mantara output may have orphan enum_types.
- ⚠️ **Repair loop oscillation** — errors can fluctuate (e.g., 17→13→31→13). LLM-internal failure mode (Huang ICLR 2024). LLMatica mitigates via Best-of-N (Phase D); raw Mantara CLI users hit this directly.
- ⚠️ **Ollama v2 incomplete** — v2 pipeline assumes Structured Outputs; Ollama doesn't support that natively. v1 works on Ollama.
- ⚠️ **No CHECK constraint inference from rules** — generator sometimes leaves business rules as comments only. Could deterministically promote to CHECK constraints.

---

## 12. Roadmap

- Strengthen v8 prompt with worked cfg_* examples (3-5 expansions)
- Native cfg_* renderer mode (skip `_render_enums` if v8)
- Add `--no-repair` flag for predictable-cost runs
- Anthropic Structured Outputs path when API supports it
- Ollama v2 support via JSON-schema mode

---

## 13. Where to read more

- **README.md** — project structure + quickstart
- **ARCHITECTURE.md** — pipeline diagram + module roles
- **SHORT_DESCRIPTION.md** — one-pager
- **prompts/system_prompt_v8.md** — the v8 system prompt (your moat)
- **generator_v2.py** — v2 pipeline orchestrator
- **business_validator.py** — 25 validators
- **renderer.py** — deterministic SQL renderer
- **models.py** — Pydantic schemas
- **fsd_analyzer.py** — Stage 1 input analysis

---

## 14. References

- **OpenAI Structured Outputs** — https://platform.openai.com/docs/guides/structured-outputs
- **Pydantic v2** — JSON schema generation + validation
- **PostgreSQL DDL** — https://www.postgresql.org/docs/current/ddl.html
- **Mantara schema spec** — `schemas/mantara_schema_v1.json`