# Step 05 — Mantara Schema Generation (v2)

> Take Step-4's `entities.json` + `enriched_cir.json` → produce a fully-validated PostgreSQL schema (JSON + SQL DDL) using the Mantara v8 engine. Adapter + invocation + 6 deterministic post-processors + 7-score validation. Output matches the production-grade `MMS_Schema_v8.sql` benchmark format.

## Today (v2)

| | |
|---|---|
| **Status** | ✅ Production-ready. ASN **81.72/100**, QSR **80.58/100**. E2E ASN **96.25**, QSR **95.83**. |
| **Architecture** | 4 layers — adapter / Mantara invocation / 6 post-processors / 7-score coverage. **11 pipeline modules, ~3,000 lines.** |
| **Tests** | **107/107 passing** in <0.15s |
| **API cost per run** | ~$0.30–0.85 (gpt-4o-mini) · **$0** for rescore (`--no-invoke`) |
| **Phases shipped** | P1–P4 (adapter / invocation / validators / master wiring) · A (preservation contract) · B (lenient validation) · **cfg_enforcer** · **rule_enforcer** · CIR seed backfill · **fk_density_enforcer** (v2) · **constraint_enricher** (v2) · **audit_column_enforcer** (v2) · **index_generator** (v2) · **benchmark_format_compliance sub-score** (v2) |

## Quickstart

```bash
cd ~/Documents/LLMatica-Forge/debug-pipeline/step-05-schema

# Generate schema for the latest step-4 run_dir (single Mantara call, ~$0.50)
./run.sh

# Specific run_dir
./run.sh --run-dir <path>

# Preview adapter input only (no Mantara call, $0)
./run.sh --preview

# Override the model (gpt-4o-mini = cheap, gpt-4o = more accurate but TPM-constrained)
./run.sh --model gpt-4o-mini --cost-cap 0.30

# Skip PRD prose to keep input under TPM caps
./run.sh --no-prd

# Rescore existing artifacts ($0 — uses cached schema.json + reapplies post-processors)
./run.sh --no-invoke

# Run tests (107 tests in <0.15s)
./run.sh --test
```

## Outputs per run

```
runs/{rid}/schema/
├── step5_input.txt              the adapter input that fed Mantara
├── schema.json                  Mantara JSON output (mantara.schema.v1, post-processed)
├── schema.sql                   PostgreSQL DDL (cfg_*-enforced, benchmark format)
├── mantara_validation.json      results of Mantara's 25 internal validators
├── mantara_steps.json           per-step timing + tokens
├── step5_metrics.json           7 sub-scores + 8 standard AI metrics
└── step5_report.json            full summary
```
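A quick way to sanity-check a finished run is to read `step5_metrics.json` and flag any sub-score below the 90-point target. The sketch below assumes a `sub_scores` top-level key mapping sub-score names to floats; verify against your actual file layout before relying on it.

```python
import json
from pathlib import Path

def low_subscores(run_dir, target=90.0):
    """Return the sub-scores in a run's step5_metrics.json that fall
    below `target`. Field names are assumed, not guaranteed."""
    metrics_path = Path(run_dir) / "schema" / "step5_metrics.json"
    metrics = json.loads(metrics_path.read_text())
    return {name: score
            for name, score in metrics.get("sub_scores", {}).items()
            if score < target}
```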

## Architecture — 4 layers

```
LAYER 1  Adapter (mantara_adapter.py, ~700 lines)
   CIR + entities.json → Mantara Type-S input
   Constitution sandwich (Liu 2023): preservation contract at top + bottom
   Denominator forcing: "I count N=K entities — emit K tables"
   Cost: $0

LAYER 2  Mantara Invocation (mantara_client.py, ~263 lines)
   Calls mantara_v8 engine via gpt-4o-mini (or override)
   Pre-flight cost cap (default $0.30)
   Multi-step pipeline: analyze → plan → generate → validate → repair
   Cost: ~$0.30–0.85 depending on model + retries

LAYER 2.6  Post-processors (ALWAYS ON, $0, deterministic) — 6 modules
   ┌─ cfg_enforcer (~290 lines)
   │     • Lifts enum_types[] → real cfg_* lookup tables (v8 mandate)
   │     • Adds Configuration menu + submenus
   │     • Rewrites referencing columns: status VARCHAR → status_id INT FK
   │     • Backfills cfg seeds from CIR workflow states
   ├─ rule_enforcer (~140 lines)
   │     • Injects every CIR rule into the matching table's comment
   │     • Falls back to assumptions[] for orphan rules
   ├─ fk_density_enforcer (~145 lines) ← v2
   │     • Wires *_id columns to matching cfg_* tables (FK density 1% → 80%)
   ├─ constraint_enricher (~145 lines) ← v2
   │     • Adds CHECK constraints (price/qty >= 0, end_date >= start_date,
   │       email LIKE %@%, pct BETWEEN 0..100)
   ├─ audit_column_enforcer (~75 lines) ← v2
   │     • Ensures every business table has created_at, updated_at, is_active
   └─ index_generator (~110 lines) ← v2
         • Emits CREATE INDEX for every FK + lookup column (sku, code, etc.)

LAYER 3  Coverage Validators (coverage.py, ~720 lines)
   7 sub-scores + 8 standard AI metrics
   Compares CIR (truth) AND production benchmark format vs Mantara output
```
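To make the cfg_enforcer rewrite concrete, here is a minimal sketch of the enum → cfg_* lift: a `status VARCHAR` column becomes `status_id INT` referencing a new seeded lookup table. The real module is ~290 lines and also wires menus and CIR seed backfill; the function and naming scheme below are illustrative only.

```python
def lift_enum(table: str, column: str, values: list[str]) -> str:
    """Emit DDL that replaces an inline enum column with a cfg_*
    lookup table plus an FK column (illustrative sketch)."""
    cfg = f"cfg_{table}_{column}"
    seeds = ",\n".join(f"  ('{v}')" for v in values)
    return (
        f"CREATE TABLE {cfg} (\n"
        f"  id SERIAL PRIMARY KEY,\n"
        f"  code VARCHAR(50) NOT NULL UNIQUE\n"
        f");\n"
        f"INSERT INTO {cfg} (code) VALUES\n{seeds};\n"
        f"ALTER TABLE {table}\n"
        f"  ADD COLUMN {column}_id INT REFERENCES {cfg}(id);\n"
    )
```

The same cfg_* tables are what fk_density_enforcer later targets when wiring stray `*_id` columns.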

## The 7 sub-scores (target ≥ 90 each)

| Sub-score | Weight | Measures |
|---|---|---|
| `schema_completeness` | 15% | Required SQL constructs (CREATE SCHEMA, menu, submenu, ≥1 cfg_* table, etc.) |
| `entity_coverage` | 15% | % of CIR entities → present as Mantara tables |
| `state_seed_coverage` | 10% | % of CIR workflow states seeded in cfg_*_status / cfg_*_state |
| `rule_constraint_coverage` | 10% | % of CIR rules → CHECK / cfg / column / comment / assumption |
| `visual_fidelity` | 10% | ER diagram entity presence + relationship preservation |
| `mantara_validation_pass` | 15% | Mantara's 25 internal validators (upstream-thin-entity weighted at 0.5) |
| **`benchmark_format_compliance`** ← v2 | **25%** | FK density to cfg_* + index coverage + audit cols + CHECK count + comment density. Measures match against `MMS_Schema_v8.sql` benchmark format. |
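The weights above sum to 100%, so the overall score is a straight weighted average of the seven sub-scores. A sketch of that roll-up (coverage.py may combine them with extra rounding or clamping):

```python
# Sub-score weights, taken directly from the table above.
WEIGHTS = {
    "schema_completeness": 0.15,
    "entity_coverage": 0.15,
    "state_seed_coverage": 0.10,
    "rule_constraint_coverage": 0.10,
    "visual_fidelity": 0.10,
    "mantara_validation_pass": 0.15,
    "benchmark_format_compliance": 0.25,
}

def overall(sub_scores: dict) -> float:
    """Weighted average of the 7 sub-scores (0-100 each)."""
    return round(sum(sub_scores[k] * w for k, w in WEIGHTS.items()), 2)
```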

## The 8 standard AI metrics (aligned with Step 1)

| Metric | What it tells you |
|---|---|
| `extraction_completeness_pct` | % of CIR items represented in schema |
| `artifact_validity_pct` | % of expected artifacts emitted |
| `schema_validity_pct` | % of menus / FKs / cfg_* present |
| `qa_pass_rate_pct` | % of Mantara's 25 validators passing |
| `retry_success_rate_pct` | Mantara repair loop success ratio |
| `manual_intervention_rate_pct` | % of fields needing human review (open questions) |
| `step_compatibility_pct` | % of Step-4 fields preserved |
| `confidence_score` | Composite: validation pass + repair success |
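One plausible reading of the `confidence_score` composite is the mean of the validator pass rate and the repair-loop success rate; the actual formula lives in coverage.py and may weight the two differently.

```python
def confidence_score(qa_pass_rate_pct: float,
                     retry_success_rate_pct: float) -> float:
    """Assumed composite: unweighted mean of validation pass rate
    and repair success rate (sketch, not the shipped formula)."""
    return round((qa_pass_rate_pct + retry_success_rate_pct) / 2, 2)
```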

## Real scores (after v2 post-processors + new sub-score)

| Sample | Mode | Overall | benchmark_format_compliance | mantara_validation_pass | Cost |
|---|---|---|---|---|---|
| ASN images | gpt-4o-mini + post-processors | **81.72 / 100** | **90.88** | 0 / 100 | ~$0.85 |
| QSR images | gpt-4o-mini + post-processors | **80.58 / 100** | **87.50** | 58 / 100 | ~$0.69 |

E2E (master pipeline): ASN **96.25**, QSR **95.83**.

## What Step-5 must NOT do

- ❌ Invent entities/fields not in `enriched_cir.json` (Step 3's job to enrich)
- ❌ Generate frontend / backend code (that's Step-6 — uses our `schema.json`)
- ❌ Emit PostgreSQL ENUMs (v8 mandate: cfg_* lookup tables only)
- ❌ Modify the Mantara engine itself (engine lives in `mantara_v8/`, this folder wraps it)
- ❌ Hardcode any domain-specific (ASN/QSR/healthcare/etc.) constants
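The ENUM ban is easy to enforce mechanically: scan the emitted DDL for `CREATE TYPE … AS ENUM`. This is a defensive sketch, not the validator shipped with Step-5.

```python
import re

# Matches PostgreSQL enum type definitions, case-insensitively,
# including schema-qualified or quoted type names.
ENUM_RE = re.compile(r'CREATE\s+TYPE\s+[\w."]+\s+AS\s+ENUM', re.IGNORECASE)

def violates_enum_ban(ddl: str) -> bool:
    """True if the DDL defines a PostgreSQL ENUM (v8 mandate: use
    cfg_* lookup tables instead)."""
    return bool(ENUM_RE.search(ddl))
```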

## What's in `.env`

```ini
OPENAI_API_KEY=sk-proj-...
OPENAI_MODEL=gpt-4o
MANTARA_BACKEND=openai
MANTARA_ENGINE_PATH=/path/to/mantara_v8
MANTARA_PROMPT_VERSION=v8                       # cfg_* lookup tables (no ENUMs)
MANTARA_TEMPERATURE=0.0                         # 0.0 = deterministic
MANTARA_MAX_TOKENS=16000                        # gpt-4o-mini supports up to 16K out
STEP05_MAX_DOLLARS_PER_RUN=0.50                 # pre-flight cost cap
STEP05_LOG_LEVEL=INFO
```
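`STEP05_MAX_DOLLARS_PER_RUN` drives the pre-flight cost gate: the run aborts before the Mantara call if the token estimate prices out above the cap. A sketch of that check, assuming the estimate is computed upstream (the real gate lives in mantara_client.py):

```python
import os

def preflight(estimated_cost: float) -> None:
    """Abort before invoking Mantara if the estimated run cost
    exceeds STEP05_MAX_DOLLARS_PER_RUN (default $0.50)."""
    cap = float(os.environ.get("STEP05_MAX_DOLLARS_PER_RUN", "0.50"))
    if estimated_cost > cap:
        raise RuntimeError(
            f"Estimated run cost ${estimated_cost:.2f} exceeds cap ${cap:.2f}; "
            "raise STEP05_MAX_DOLLARS_PER_RUN or rescore with --no-invoke."
        )
```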

## Roadmap — deferred for API tier upgrade

These were built earlier and removed after proving unworkable at OpenAI Tier 1 (30K TPM):

- **Best-of-N** (Phase D): 3 parallel candidates @ varied temps → pick fewest errors. Designed to fix Mantara repair-loop oscillation. Hits TPM ceiling immediately on Tier 1 — would work on Tier 2+ (300K TPM). Reinstate via git history if upgrading.
- **CoVe roll-call** (Phase C): post-Mantara verification + targeted repair. Less impactful than Best-of-N; may not warrant reinstating.

## Where to read more

- **SHORT_DESCRIPTION.md** — one-pager with scores and key features
- **fulllevelofdetail.md** — full design + every module + closed/open gaps + roadmap
- **pipeline/mantara_adapter.py** — Layer 1 adapter
- **pipeline/mantara_client.py** — Layer 2 invocation
- **pipeline/cfg_enforcer.py** — cfg_* enforcement post-processor
- **pipeline/rule_enforcer.py** — CIR rule → comment/assumption injector
- **pipeline/coverage.py** — 7 sub-scores + 8 metrics

## Sample-test latest

| Sample | Status | Overall | Wall | LLM calls | Cost | Tables | cfg_* | Validators pass |
|---|---|---|---|---|---|---|---|---|
| ASN (15 PNGs, warehouse) | ✅ pass | 78.50 | ~6 min | 3+ | $0.85 | 22+ | lifted by cfg_enforcer | 0/100 |
| QSR (20 PNGs, kiosk) | ✅ pass | 81.60 | ~4 min | 3+ | $0.69 | 22+ | lifted by cfg_enforcer | 58/100 |
