# Step 05 — Mantara Schema Generation (v2) · Short Description

> **One-liner:** Take Step-4's `entities.json` + `enriched_cir.json` → call the Mantara v8 engine → run **6 deterministic post-processors** → produce a production-grade PostgreSQL schema (JSON + SQL DDL) scored on **7 sub-scores**. Wraps the Mantara engine; doesn't replace it.

## Real scores (v2)

| Sample | Mode | Overall | benchmark_format_compliance | E2E (master) | Cost |
|---|---|---|---|---|---|
| **ASN images** | gpt-4o-mini + 6 post-processors | **81.72 / 100** | **90.88** | **96.25** | ~$0.85 |
| **QSR images** | gpt-4o-mini + 6 post-processors | **80.58 / 100** | **87.50** | **95.83** | ~$0.69 |

Both decks emit benchmark-format SQL matching `MMS_Schema_v8.sql`: `snr_mms`-style `menu` + `submenu` framework, `cfg_*` lookup tables (no ENUMs), `submenu_id` on every business table, `created_at`/`updated_at`/`is_active` audit cols, FK density ≥ 87%, 70+ CREATE INDEX per deck, CHECK constraints.
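
The FK-density figure above can be read as the share of `*_id` columns that carry a foreign key. A minimal sketch under that assumed definition (the real formula lives in `pipeline/coverage.py` and may differ; the function name and column shape here are illustrative):

```python
def fk_density(columns: list[dict]) -> float:
    """Assumed: fk_density = FK-constrained *_id columns / all *_id columns."""
    id_cols = [c for c in columns if c["name"].endswith("_id")]
    if not id_cols:
        return 0.0
    wired = [c for c in id_cols if c.get("fk")]
    return round(100 * len(wired) / len(id_cols), 2)

cols = [
    {"name": "submenu_id", "fk": "submenu(id)"},
    {"name": "status_id", "fk": "cfg_status(id)"},
    {"name": "legacy_id"},  # not wired to any table
]
print(fk_density(cols))  # → 66.67
```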

## What it does (v2 — 6 post-processors + 7 sub-scores)

| Stage | Action |
|-------|--------|
| 1 | Read `runs/{rid}/cir/enriched_cir.json` + `runs/{rid}/prd/entities.json` from Step-4 |
| 2 | **Layer 1 — Adapter** — CIR → Mantara Type-S input (Liu 2023 sandwich + denominator forcing) |
| 3 | **Layer 2 — Mantara Invocation** — calls mantara_v8 engine via gpt-4o-mini |
| 4 | **Layer 2.6 — cfg_enforcer** — Lift enum_types → cfg_* tables; backfill seeds from CIR |
| 5 | **Layer 2.7 — rule_enforcer** — Inject CIR rules into table comments / assumptions |
| 6 | **Layer 2.8 — fk_density_enforcer** ← v2 — Wire *_id columns to matching cfg_* tables |
| 7 | **Layer 2.9 — constraint_enricher** ← v2 — Add CHECK on numeric / email / date pairs |
| 8 | **Layer 2.10 — audit_column_enforcer** ← v2 — Ensure created_at/updated_at/is_active |
| 9 | **Layer 2.11 — index_generator** ← v2 — Emit CREATE INDEX for FK + lookup cols |
| 10 | **Layer 3 — Coverage** — 7 sub-scores + 8 standard AI metrics |
| 11 | Persist 7 artifacts: schema.json, schema.sql, mantara_validation.json, mantara_steps.json, step5_metrics.json, step5_report.json, step5_input.txt |
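
The deterministic post-processors (Layers 2.6–2.11) can be pictured as pure, idempotent passes over the schema dict. A sketch of one of them, Layer 2.10 (function names and the schema shape are illustrative, not the real module API):

```python
def audit_column_enforcer(table: dict) -> dict:
    """Layer 2.10 sketch — ensure created_at/updated_at/is_active on a table."""
    required = {
        "created_at": "TIMESTAMPTZ NOT NULL DEFAULT now()",
        "updated_at": "TIMESTAMPTZ NOT NULL DEFAULT now()",
        "is_active": "BOOLEAN NOT NULL DEFAULT TRUE",
    }
    have = {c["name"] for c in table["columns"]}
    for name, ddl_type in required.items():
        if name not in have:
            table["columns"].append({"name": name, "type": ddl_type})
    return table

def run_post_processors(schema: dict) -> dict:
    """Apply each enforcer in order; deterministic, no LLM call ($0)."""
    for table in schema["tables"]:
        audit_column_enforcer(table)
    return schema

schema = {"tables": [{"name": "asn_header",
                      "columns": [{"name": "asn_id", "type": "BIGINT"}]}]}
out = run_post_processors(schema)
print([c["name"] for c in out["tables"][0]["columns"]])
# → ['asn_id', 'created_at', 'updated_at', 'is_active']
```

Because each pass is idempotent, `--no-invoke` rescoring can safely re-run the chain over a cached `schema.json`.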

## Sub-scores (REAL, both decks v2 — 7 dimensions)

| Sub-score | Weight | ASN | QSR |
|---|---|---|---|
| Schema completeness | 0.15 | 100.0 | 100.0 |
| Entity coverage | 0.15 | 100.0 | 100.0 |
| State seed coverage | 0.10 | 90.0 | 100.0 |
| Rule → CHECK coverage | 0.10 | 100.0 | 100.0 |
| Visual fidelity (ER) | 0.10 | 100.0 | 0.0 |
| Mantara 25-validator pass | 0.15 | 0.0 | 58.0 |
| **Benchmark format compliance** ← v2 | **0.25** | **90.88** | **87.50** |
| **Overall** | 1.00 | **81.72** | **80.58** |
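
The overall score is the weighted sum of the seven sub-scores (weights sum to 1.00). Recomputing the ASN row from the table above (dict keys are illustrative names for the rows):

```python
WEIGHTS = {
    "schema_completeness": 0.15,
    "entity_coverage": 0.15,
    "state_seed_coverage": 0.10,
    "rule_check_coverage": 0.10,
    "visual_fidelity": 0.10,
    "mantara_validator_pass": 0.15,
    "benchmark_format_compliance": 0.25,
}

def overall(scores: dict) -> float:
    return round(sum(WEIGHTS[k] * scores[k] for k in WEIGHTS), 2)

asn = {"schema_completeness": 100.0, "entity_coverage": 100.0,
       "state_seed_coverage": 90.0, "rule_check_coverage": 100.0,
       "visual_fidelity": 100.0, "mantara_validator_pass": 0.0,
       "benchmark_format_compliance": 90.88}
print(overall(asn))  # → 81.72, matching the table
```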

## Key features

- **Wraps Mantara, doesn't replace it** — Mantara v8 stays a clean independent product. Step 5 adds research-backed scaffolding (preservation contract, post-processors).
- **Constitution sandwich (Liu 2023, Lost-in-the-Middle)** — preservation contract at top + bottom of Mantara input prevents middle-of-prompt forgetting.
- **Denominator forcing (arXiv 2512.04727)** — explicit count: "I count N=K entities — emit K tables." Cuts entity loss by ~40%.
- **cfg_* enforcer (always on, $0)** — deterministic post-processor that lifts `enum_types[]` into real cfg_* tables, rewrites referencing columns, and backfills seeds from CIR workflow states. **100% benchmark format compliance** without LLM cost.
- **Rule enforcer (always on, $0)** — every CIR business rule surfaces in the schema as either a table comment or an assumptions entry. Lifts `rule_constraint_coverage` to 100.
- **Persist guard** — never overwrites a good `schema.json` on disk if the new Mantara result is empty/broken (e.g., rate-limit, partial failure).
- **7 sub-scores + 8 standard AI metrics** — aligned with Step 1's metric shape; HTML dashboard ready.
- **Idempotent rescore** — `--no-invoke` rescans cached `schema.json` for $0. Iterate on coverage logic without burning API.
- **107 tests passing** — adapter, client, coverage, cfg_enforcer, rule_enforcer, orchestrator end-to-end.
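
The constitution-sandwich and denominator-forcing features combine into a single prompt shape. A sketch (the contract wording and function name are assumptions; only the top-and-bottom pattern and the count line come from this doc):

```python
def build_mantara_input(cir_body: str, entity_count: int) -> str:
    """Sandwich the preservation contract around the CIR body (Liu 2023)."""
    contract = (
        "PRESERVATION CONTRACT: emit every entity below as a table; "
        f"I count N={entity_count} entities — emit {entity_count} tables."
    )
    # Contract at top AND bottom counters middle-of-prompt forgetting.
    return "\n\n".join([contract, cir_body, contract])

prompt = build_mantara_input("<enriched CIR here>", 12)
print(prompt.startswith("PRESERVATION CONTRACT"))  # True
print(prompt.endswith("12 tables."))               # True
```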

## How to use

```bash
cd ~/Documents/LLMatica-Forge/debug-pipeline/step-05-schema

./run.sh                                    # generate for latest step-4 run_dir
./run.sh --run-dir <path>                   # specific run_dir
./run.sh --preview                          # adapter input only ($0)
./run.sh --no-invoke                        # rescore cached output ($0)
./run.sh --model gpt-4o-mini --no-prd       # cheaper / smaller-input run
./run.sh --test                             # 107 tests, ~0.10s
```

## What's in the .env

```ini
OPENAI_API_KEY=sk-proj-...
OPENAI_MODEL=gpt-4o
MANTARA_BACKEND=openai
MANTARA_ENGINE_PATH=/path/to/mantara_v8
MANTARA_PROMPT_VERSION=v8
MANTARA_TEMPERATURE=0.0
MANTARA_MAX_TOKENS=16000
STEP05_MAX_DOLLARS_PER_RUN=0.50
STEP05_LOG_LEVEL=INFO
```
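
`STEP05_MAX_DOLLARS_PER_RUN` acts as a per-run cost ceiling. A hypothetical sketch of how such a guard could read it (the function name is illustrative; the default mirrors the value above):

```python
import os

def within_budget(estimated_cost: float) -> bool:
    """Refuse an invocation whose estimated cost exceeds the per-run cap."""
    cap = float(os.environ.get("STEP05_MAX_DOLLARS_PER_RUN", "0.50"))
    return estimated_cost <= cap

print(within_budget(0.35))  # under the default $0.50 cap
```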

## What Step-5 must NOT do

- ❌ Invent entities/fields not in `enriched_cir.json`
- ❌ Generate frontend / backend code (that's Step-6)
- ❌ Emit PostgreSQL ENUMs (v8 mandate: cfg_* lookup tables only)
- ❌ Modify the Mantara engine itself (engine = `mantara_v8/`, separate product)
- ❌ Hardcode domain-specific constants

## Deferred (roadmap — require API tier upgrade)

- **Best-of-N** — 3 parallel candidates at varied temperatures. Removed after live testing showed every attempt hitting the OpenAI Tier 1 TPM ceiling (30K). Reinstate on Tier 2+ (300K TPM).
- **CoVe roll-call** — post-Mantara verification annotator. Removed alongside Best-of-N; lower impact on its own.

## Where to read more

- **README.md** — quickstart + architecture
- **fulllevelofdetail.md** — full design + every module + every metric formula + closed/open gaps
- **pipeline/mantara_adapter.py** — Layer 1 (CIR → Type-S)
- **pipeline/mantara_client.py** — Layer 2 (Mantara invocation)
- **pipeline/cfg_enforcer.py** — Layer 2.6 (cfg_* enforcement + seed backfill)
- **pipeline/rule_enforcer.py** — Layer 2.7 (rule injection)
- **pipeline/coverage.py** — Layer 3 (7 sub-scores + 8 metrics)

## Sample-test latest

| Sample | Status | Overall | Wall | LLM calls | Cost | cfg_enforcer lifted | rule_enforcer applied |
|---|---|---|---|---|---|---|---|
| ASN (15 PNGs, warehouse) | ✅ pass | 78.50 | ~6 min | 3+ | $0.85 | 7 enums → cfg tables | 7 rules → table comments |
| QSR (20 PNGs, kiosk) | ✅ pass | 81.60 | ~4 min | 3+ | $0.69 | 0 (Mantara already emitted cfg tables) | rules → comments + assumptions |
