# Step 05 — Mantara Schema Generation (v2)

> Take Step-4's `entities.json` + `enriched_cir.json` → produce a fully-validated PostgreSQL schema (JSON + SQL DDL) using the Mantara v8 engine. Adapter + invocation + 6 deterministic post-processors + 7-score validation. Output matches the production-grade `MMS_Schema_v8.sql` benchmark format.

## Today (v2)

| | |
|---|---|
| **Status** | ✅ Production-ready. ASN **81.72/100**, QSR **80.58/100**. E2E ASN **96.25**, QSR **95.83**. |
| **Architecture** | 4 layers — adapter / Mantara invocation / 6 post-processors / 7-score coverage. **11 pipeline modules, ~3,000 lines.** |
| **Tests** | **107/107 passing** in <0.15s |
| **API cost per run** | ~$0.30–0.85 (gpt-4o-mini) · **$0** for rescore (`--no-invoke`) |
| **Phases shipped** | P1–P4 (adapter / invocation / validators / master wiring) · A (preservation contract) · B (lenient validation) · **cfg_enforcer** · **rule_enforcer** · CIR seed backfill · **fk_density_enforcer** (v2) · **constraint_enricher** (v2) · **audit_column_enforcer** (v2) · **index_generator** (v2) · **benchmark_format_compliance sub-score** (v2) |

## Quickstart

```bash
cd ~/Documents/LLMatica-Forge/debug-pipeline/step-05-schema

# Generate schema for the latest step-4 run_dir (single Mantara call, ~$0.50)
./run.sh

# Specific run_dir
./run.sh --run-dir <path>

# Preview adapter input only (no Mantara call, $0)
./run.sh --preview

# Override the model (gpt-4o-mini = cheap, gpt-4o = more accurate but TPM-constrained)
./run.sh --model gpt-4o-mini --cost-cap 0.30

# Skip PRD prose to keep input under TPM caps
./run.sh --no-prd

# Rescore existing artifacts ($0 — uses cached schema.json + reapplies post-processors)
./run.sh --no-invoke

# Run tests (107 tests in <0.15s)
./run.sh --test
```

## Outputs per run

```
runs/{rid}/schema/
├── step5_input.txt              the adapter input that fed Mantara
├── schema.json                  Mantara JSON output (mantara.schema.v1, post-processed)
├── schema.sql                   PostgreSQL DDL (cfg_*-enforced, benchmark format)
├── mantara_validation.json      results of Mantara's 25 internal validators
├── mantara_steps.json           per-step timing + tokens
├── step5_metrics.json           7 sub-scores + 8 standard AI metrics
└── step5_report.json            full summary
```
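A quick way to sanity-check a finished run is to read `step5_metrics.json` and flag any sub-score below the 90-point target. The sketch below assumes a `sub_scores` top-level key mapping sub-score names to floats; verify against your actual file layout before relying on it.

```python
import json
from pathlib import Path

def low_subscores(run_dir, target=90.0):
    """Return the sub-scores in a run's step5_metrics.json that fall
    below `target`. Field names are assumed, not guaranteed."""
    metrics_path = Path(run_dir) / "schema" / "step5_metrics.json"
    metrics = json.loads(metrics_path.read_text())
    return {name: score
            for name, score in metrics.get("sub_scores", {}).items()
            if score < target}
```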

## Architecture — 4 layers

```
LAYER 1  Adapter (mantara_adapter.py, ~700 lines)
   CIR + entities.json → Mantara Type-S input
   Constitution sandwich (Liu 2023): preservation contract at top + bottom
   Denominator forcing: "I count N=K entities — emit K tables"
   Cost: $0

LAYER 2  Mantara Invocation (mantara_client.py, ~263 lines)
   Calls mantara_v8 engine via gpt-4o-mini (or override)
   Pre-flight cost cap (default $0.30)
   Multi-step pipeline: analyze → plan → generate → validate → repair
   Cost: ~$0.30–0.85 depending on model + retries

LAYER 2.6  Post-processors (ALWAYS ON, $0, deterministic) — 6 modules
   ┌─ cfg_enforcer (~290 lines)
   │     • Lifts enum_types[] → real cfg_* lookup tables (v8 mandate)
   │     • Adds Configuration menu + submenus
   │     • Rewrites referencing columns: status VARCHAR → status_id INT FK
   │     • Backfills cfg seeds from CIR workflow states
   ├─ rule_enforcer (~140 lines)
   │     • Injects every CIR rule into the matching table's comment
   │     • Falls back to assumptions[] for orphan rules
   ├─ fk_density_enforcer (~145 lines) ← v2
   │     • Wires *_id columns to matching cfg_* tables (FK density 1% → 80%)
   ├─ constraint_enricher (~145 lines) ← v2
   │     • Adds CHECK constraints (price/qty >= 0, end_date >= start_date,
   │       email LIKE %@%, pct BETWEEN 0..100)
   ├─ audit_column_enforcer (~75 lines) ← v2
   │     • Ensures every business table has created_at, updated_at, is_active
   └─ index_generator (~110 lines) ← v2
         • Emits CREATE INDEX for every FK + lookup column (sku, code, etc.)

LAYER 3  Coverage Validators (coverage.py, ~720 lines)
   7 sub-scores + 8 standard AI metrics
   Compares CIR (truth) AND production benchmark format vs Mantara output
```
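To make the cfg_enforcer rewrite concrete, here is a minimal sketch of the enum → cfg_* lift: a `status VARCHAR` column becomes `status_id INT` referencing a new seeded lookup table. The real module is ~290 lines and also wires menus and CIR seed backfill; the function and naming scheme below are illustrative only.

```python
def lift_enum(table: str, column: str, values: list[str]) -> str:
    """Emit DDL that replaces an inline enum column with a cfg_*
    lookup table plus an FK column (illustrative sketch)."""
    cfg = f"cfg_{table}_{column}"
    seeds = ",\n".join(f"  ('{v}')" for v in values)
    return (
        f"CREATE TABLE {cfg} (\n"
        f"  id SERIAL PRIMARY KEY,\n"
        f"  code VARCHAR(50) NOT NULL UNIQUE\n"
        f");\n"
        f"INSERT INTO {cfg} (code) VALUES\n{seeds};\n"
        f"ALTER TABLE {table}\n"
        f"  ADD COLUMN {column}_id INT REFERENCES {cfg}(id);\n"
    )
```

The same cfg_* tables are what fk_density_enforcer later targets when wiring stray `*_id` columns.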

## The 7 sub-scores (target ≥ 90 each)

| Sub-score | Weight | Measures |
|---|---|---|
| `schema_completeness` | 15% | Required SQL constructs (CREATE SCHEMA, menu, submenu, ≥1 cfg_* table, etc.) |
| `entity_coverage` | 15% | % of CIR entities → present as Mantara tables |
| `state_seed_coverage` | 10% | % of CIR workflow states seeded in cfg_*_status / cfg_*_state |
| `rule_constraint_coverage` | 10% | % of CIR rules → CHECK / cfg / column / comment / assumption |
| `visual_fidelity` | 10% | ER diagram entity presence + relationship preservation |
| `mantara_validation_pass` | 15% | Mantara's 25 internal validators (upstream-thin-entity weighted at 0.5) |
| **`benchmark_format_compliance`** ← v2 | **25%** | FK density to cfg_* + index coverage + audit cols + CHECK count + comment density. Measures match against `MMS_Schema_v8.sql` benchmark format. |
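The weights above sum to 100%, so the overall score is a straight weighted average of the seven sub-scores. A sketch of that roll-up (coverage.py may combine them with extra rounding or clamping):

```python
# Sub-score weights, taken directly from the table above.
WEIGHTS = {
    "schema_completeness": 0.15,
    "entity_coverage": 0.15,
    "state_seed_coverage": 0.10,
    "rule_constraint_coverage": 0.10,
    "visual_fidelity": 0.10,
    "mantara_validation_pass": 0.15,
    "benchmark_format_compliance": 0.25,
}

def overall(sub_scores: dict) -> float:
    """Weighted average of the 7 sub-scores (0-100 each)."""
    return round(sum(sub_scores[k] * w for k, w in WEIGHTS.items()), 2)
```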

## The 8 standard AI metrics (aligned with Step 1)

| Metric | What it tells you |
|---|---|
| `extraction_completeness_pct` | % of CIR items represented in schema |
| `artifact_validity_pct` | % of expected artifacts emitted |
| `schema_validity_pct` | % of menus / FKs / cfg_* present |
| `qa_pass_rate_pct` | % of Mantara's 25 validators passing |
| `retry_success_rate_pct` | Mantara repair loop success ratio |
| `manual_intervention_rate_pct` | % of fields needing human review (open questions) |
| `step_compatibility_pct` | % of Step-4 fields preserved |
| `confidence_score` | Composite: validation pass + repair success |
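One plausible reading of the `confidence_score` composite is the mean of the validator pass rate and the repair-loop success rate; the actual formula lives in coverage.py and may weight the two differently.

```python
def confidence_score(qa_pass_rate_pct: float,
                     retry_success_rate_pct: float) -> float:
    """Assumed composite: unweighted mean of validation pass rate
    and repair success rate (sketch, not the shipped formula)."""
    return round((qa_pass_rate_pct + retry_success_rate_pct) / 2, 2)
```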

## Real scores (after v2 post-processors + new sub-score)

| Sample | Mode | Overall | benchmark_format_compliance | mantara_validation_pass | Cost |
|---|---|---|---|---|---|
| ASN images | gpt-4o-mini + post-processors | **81.72 / 100** | **90.88** | 0 / 100 | ~$0.85 |
| QSR images | gpt-4o-mini + post-processors | **80.58 / 100** | **87.50** | 58 / 100 | ~$0.69 |

E2E (master pipeline): ASN **96.25**, QSR **95.83**.

## What Step-5 must NOT do

- ❌ Invent entities/fields not in `enriched_cir.json` (Step 3's job to enrich)
- ❌ Generate frontend / backend code (that's Step-6 — uses our `schema.json`)
- ❌ Emit PostgreSQL ENUMs (v8 mandate: cfg_* lookup tables only)
- ❌ Modify the Mantara engine itself (engine lives in `mantara_v8/`, this folder wraps it)
- ❌ Hardcode any domain-specific (ASN/QSR/healthcare/etc.) constants
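The ENUM ban is easy to enforce mechanically: scan the emitted DDL for `CREATE TYPE … AS ENUM`. This is a defensive sketch, not the validator shipped with Step-5.

```python
import re

# Matches PostgreSQL enum type definitions, case-insensitively,
# including schema-qualified or quoted type names.
ENUM_RE = re.compile(r'CREATE\s+TYPE\s+[\w."]+\s+AS\s+ENUM', re.IGNORECASE)

def violates_enum_ban(ddl: str) -> bool:
    """True if the DDL defines a PostgreSQL ENUM (v8 mandate: use
    cfg_* lookup tables instead)."""
    return bool(ENUM_RE.search(ddl))
```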

## What's in `.env`

```ini
OPENAI_API_KEY=sk-proj-...
OPENAI_MODEL=gpt-4o
MANTARA_BACKEND=openai
MANTARA_ENGINE_PATH=/path/to/mantara_v8
MANTARA_PROMPT_VERSION=v8                       # cfg_* lookup tables (no ENUMs)
MANTARA_TEMPERATURE=0.0                         # 0.0 = deterministic
MANTARA_MAX_TOKENS=16000                        # gpt-4o-mini supports up to 16K out
STEP05_MAX_DOLLARS_PER_RUN=0.50                 # pre-flight cost cap
STEP05_LOG_LEVEL=INFO
```
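`STEP05_MAX_DOLLARS_PER_RUN` drives the pre-flight cost gate: the run aborts before the Mantara call if the token estimate prices out above the cap. A sketch of that check, assuming the estimate is computed upstream (the real gate lives in mantara_client.py):

```python
import os

def preflight(estimated_cost: float) -> None:
    """Abort before invoking Mantara if the estimated run cost
    exceeds STEP05_MAX_DOLLARS_PER_RUN (default $0.50)."""
    cap = float(os.environ.get("STEP05_MAX_DOLLARS_PER_RUN", "0.50"))
    if estimated_cost > cap:
        raise RuntimeError(
            f"Estimated run cost ${estimated_cost:.2f} exceeds cap ${cap:.2f}; "
            "raise STEP05_MAX_DOLLARS_PER_RUN or rescore with --no-invoke."
        )
```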

## Roadmap — deferred for API tier upgrade

These were built earlier and removed after proving unworkable at OpenAI Tier 1 (30K TPM):

- **Best-of-N** (Phase D): 3 parallel candidates @ varied temps → pick fewest errors. Designed to fix Mantara repair-loop oscillation. Hits TPM ceiling immediately on Tier 1 — would work on Tier 2+ (300K TPM). Reinstate via git history if upgrading.
- **CoVe roll-call** (Phase C): post-Mantara verification + targeted repair. Less impactful than Best-of-N; may not warrant reinstating.

## Where to read more

- **SHORT_DESCRIPTION.md** — one-pager with scores and key features
- **fulllevelofdetail.md** — full design + every module + closed/open gaps + roadmap
- **pipeline/mantara_adapter.py** — Layer 1 adapter
- **pipeline/mantara_client.py** — Layer 2 invocation
- **pipeline/cfg_enforcer.py** — cfg_* enforcement post-processor
- **pipeline/rule_enforcer.py** — CIR rule → comment/assumption injector
- **pipeline/coverage.py** — 7 sub-scores + 8 metrics

## Sample-test latest

| Sample | Status | Overall | Wall | LLM calls | Cost | Tables | cfg_* | Validators pass |
|---|---|---|---|---|---|---|---|---|
| ASN (15 PNGs, warehouse) | ✅ pass | 78.50 | ~6 min | 3+ | $0.85 | 22+ | lifted by cfg_enforcer | 0/100 |
| QSR (20 PNGs, kiosk) | ✅ pass | 81.60 | ~4 min | 3+ | $0.69 | 22+ | lifted by cfg_enforcer | 58/100 |
