# Step 05 — Mantara Schema Generation · Full Level of Detail

> Comprehensive technical reference for Step 5. For a quick overview see [SHORT_DESCRIPTION.md](./SHORT_DESCRIPTION.md).

---

## 1. Goal

**Validated schema generator.** Take Step-4's structured spec (`enriched_cir.json` + `entities.json` + `PRD.md`) and produce a fully-validated PostgreSQL schema (JSON + SQL DDL) using the Mantara v8 engine, with research-backed wrappers (preservation contract, cfg_enforcer, rule_enforcer) and a 6-score validation rubric.

**Core principle:** Step 5 **wraps** Mantara — it does not replace it. The Mantara engine (`mantara_v8/`) is a clean, independently-sellable product. Step 5 adds CIR-aware adapter, post-processors, and CIR-vs-output scoring that's specific to LLMatica-Forge.

---

## 2. Input contract (from Step-04)

```
runs/{rid}/cir/enriched_cir.json     ← canonical CIR (entities, workflows, rules, actions)
runs/{rid}/prd/entities.json         ← JSON Schema 2020-12 entity definitions
runs/{rid}/prd/PRD.md                ← human-readable product spec (optional, --no-prd to skip)
runs/{rid}/prd/diagrams/er.mmd       ← Mermaid ER diagram (visual fidelity check)
runs/{rid}/prd/codegen_brief.json    ← denormalized join (informational)
runs/{rid}/prd/open_questions.json   ← drives manual_intervention_rate_pct
```

Step 5 fails loud if `enriched_cir.json` has 0 entities or `entities.json` is missing — Step 4 broke and there's nothing to schema-generate.

---

## 3. Output contract (per run)

```
runs/{rid}/schema/
├── step5_input.txt              ← the adapter input that fed Mantara (Type-S format)
├── schema.json                  ← Mantara JSON output (mantara.schema.v1, post-processed)
├── schema.sql                   ← PostgreSQL DDL — cfg_*-enforced, benchmark format
├── mantara_validation.json      ← Mantara's 25 internal business validators result
├── mantara_steps.json           ← per-step timing + tokens (analyze/plan/generate)
├── step5_metrics.json           ← 6 sub-scores + 8 standard AI metrics
└── step5_report.json            ← full summary (status, costs, paths)
```

**Source-of-truth hierarchy:**
- `schema.json` is canonical — Step 6 (codegen) will read this for table/field/FK metadata.
- `schema.sql` is a **deterministic projection** of `schema.json` (Mantara renderer + cfg seed INSERTs).
- `step5_metrics.json` aggregates into the master pipeline's HTML dashboard.

---

## 4. The 4-layer architecture

```
┌─────────────────────────────────────────────────────────────────────┐
│ LAYER 1 — ADAPTER (mantara_adapter.py, ~700 lines, $0, ALWAYS RUNS) │
│  CIR + entities.json → Mantara Type-S input                          │
│   • Step4Bundle.load() — parses the Step-4 artifact bundle           │
│   • Constitution sandwich (Liu 2023) — preservation contract at      │
│     top + repeated at bottom (prevents middle-of-prompt forgetting)  │
│   • Denominator forcing (arXiv 2512.04727) — "I count N=K entities;  │
│     emit K tables." Cuts entity-loss ~40%.                           │
│   • cfg_* anticipated lookups injected based on CIR state values     │
│   • Pre-declared anticipated operational menus to keep UX consistent │
│   • Adapter-fix: prevent 1-submenu menus (auto-merge)                │
└─────────────────────────────────────────────────────────────────────┘
                          ▼
┌─────────────────────────────────────────────────────────────────────┐
│ LAYER 2 — MANTARA INVOCATION (mantara_client.py)                    │
│  Calls mantara_v8 engine via gpt-4o-mini (or override)               │
│   • Pre-flight cost cap (default $0.30, env-driven)                  │
│   • Model-aware pricing: gpt-4o $5/M input, gpt-4o-mini $0.15/M      │
│   • Engine discovery: MANTARA_ENGINE_PATH or auto-locate             │
│   • Backend abstraction: openai / anthropic / ollama (P5 deferred)   │
│   • Multi-step: analyze → plan → generate → validate → repair        │
│   • Repair loop: max 3 retries, abort on oscillation                 │
└─────────────────────────────────────────────────────────────────────┘
                          ▼
┌─────────────────────────────────────────────────────────────────────┐
│ LAYER 2.6 — POST-PROCESSORS (always on, $0, deterministic)          │
│  ┌─ cfg_enforcer (cfg_enforcer.py, ~290 lines)                      │
│  │   • Lift each enum_types[] entry → real cfg_* table              │
│  │       6 std columns: <name>_id (SERIAL PK), code, label,         │
│  │       description, is_active, submenu_id (FK to submenu)         │
│  │   • Find/create "Configuration" menu                             │
│  │       submenu_ids follow benchmark pattern <menu_id>01, 02, …    │
│  │   • Rewrite referencing columns:                                  │
│  │       status VARCHAR(30) → status_id INT REFERENCES cfg_*(_id)   │
│  │   • Clear enum_types[] (renderer no longer emits CREATE TYPE)    │
│  │   • Backfill seeds from CIR workflow states                      │
│  │   • emit_cfg_seed_inserts() → INSERT INTO cfg_* VALUES (...)     │
│  │   • Round-trips schema_json through MantaraSchema (Pydantic) +   │
│  │     mantara_v8/renderer.render_sql() to regenerate SQL           │
│  │   • Idempotent: calling twice = no-op                            │
│  │   • Fail-soft: if any step crashes, original Mantara SQL kept    │
│  │   • Result: 7/7 benchmark patterns matching                      │
│  └─ rule_enforcer (rule_enforcer.py, ~140 lines)                     │
│      • For each CIR rule, find target table by `applies_to`          │
│      • Append "Rule <name>: <description>" to table comment          │
│      • Fall back to assumptions[] if no matching table               │
│      • Idempotent (skip if rule_name already in comment)             │
│      • Skips cfg_* tables (rules apply to business tables only)      │
│      • Result: rule_constraint_coverage lifts to 100                 │
└─────────────────────────────────────────────────────────────────────┘
                          ▼
┌─────────────────────────────────────────────────────────────────────┐
│ PERSIST GUARD                                                        │
│  • Never overwrite an existing good schema.json with empty/broken    │
│  • Definition: schema must have schema_name + non-empty menus        │
│  • If fresh Mantara result is empty → keep prior schema.json on disk │
│  • Logs warning so failure is visible                                │
└─────────────────────────────────────────────────────────────────────┘
                          ▼
┌─────────────────────────────────────────────────────────────────────┐
│ LAYER 3 — COVERAGE VALIDATORS (coverage.py, ~600 lines, $0, ALWAYS) │
│  6 sub-scores + 8 standard AI metrics                                │
│   • compute_coverage(cir, mantara_schema, mantara_sql, validation,   │
│                      er_mmd, repair_attempts, …)                     │
│   • Returns CoverageResult with sub_scores dict + ai_metrics dict    │
│   • Sub-scores: schema_completeness, entity_coverage,                │
│     state_seed_coverage, rule_constraint_coverage, visual_fidelity,  │
│     mantara_validation_pass                                          │
│   • Phase B improvement: state_seed gives 50% partial credit when    │
│     cfg_*_status table exists but seed values missing                │
│   • Phase B improvement: state_seed reads cfg_table._seed_values     │
│     (cfg_enforcer's private field) in addition to .values            │
│   • mantara_validation_pass: upstream-thin-entity errors weighted 0.5│
└─────────────────────────────────────────────────────────────────────┘
                          ▼
                  schema.json + schema.sql + 5 derived artifacts
```

---

## 5. The 7 pipeline modules

### 5.1 `runner.py` (~140 lines)
CLI entry. Parses args, locates Step-4 run_dir, dispatches to `generate_run()`. Pretty-prints the Step 5 summary with traffic-light flags.

**Args:**
- `--run-dir <path>` — specific Step-4 run_dir (default: latest with prd/PRD.md)
- `--runs-root <path>` — override Step-1 runs root
- `--preview` — adapter input only (no Mantara call)
- `--no-invoke` — build adapter input + use cached schema.json (rescore)
- `--no-prd` — exclude PRD prose from adapter input (smaller, cheaper)
- `--model <name>` — override Mantara model (e.g., gpt-4o-mini)
- `--cost-cap <dollars>` — override pre-flight cost cap

### 5.2 `generate.py` (~370 lines)
Orchestrator. Glues all 4 layers. Returns a structured report dict.

**Function signature:**
```python
def generate_run(
    run_dir: Path, *,
    invoke: bool = True,
    include_prd: bool = True,
    model: Optional[str] = None,
    cost_cap_usd: Optional[float] = None,
) -> dict:
```

**Helpers:**
- `_count_tables(schema_json)` — recursive table counter
- `_rerender_sql(schema_json)` — round-trips through Mantara's MantaraSchema + render_sql
- `_is_complete(schema_json)` — persist guard predicate

### 5.3 `mantara_adapter.py` (~700 lines)
Layer 1 — CIR + entities → Mantara Type-S input.

**Key classes:**
- `Step4Bundle` — bundles enriched_cir.json + entities.json + PRD.md + er.mmd
- `AdapterResult` — text (Type-S input), menu_groups, enum_seeds, char_count

**Constitution sandwich layout:**
```
[TOP]    System name | Schema name | Entity count denominator
         "I count N=12 entities. The schema MUST contain 12 tables."
         Preservation contract: enumerate every entity name, every state value
[MID]    Menu/Submenu/Fields hierarchy with anticipated operational menus
         FSD ANALYSIS block (authoritative pre-extracted context)
[BOTTOM] Repeat preservation contract (Liu 2023 anti-forgetting)
```

### 5.4 `mantara_client.py` (~263 lines)
Layer 2 — calls mantara_v8 engine.

**Function:**
```python
def call_mantara(input_text: str, *, model=None, on_step=None,
                 cost_cap_usd: float = 0.30) -> dict
```

Returns `{status, schema_json, schema_sql, mantara_validation, steps, repair_attempts, duration_seconds, cost_estimate_usd, model_used}`.

**Engine discovery:**
- `MANTARA_ENGINE_PATH` env var (preferred)
- Fallback: `<step5>/../mantara_v8`

### 5.5 `cfg_enforcer.py` (~290 lines)
Layer 2.6 — deterministic cfg_* enforcement + seed backfill.

**Public functions:**
- `enforce_cfg_tables(schema_json) -> schema_json` — mutate in place
- `backfill_cfg_seeds_from_cir(schema_json, cir) -> int` — fill missing seeds from CIR
- `emit_cfg_seed_inserts(schema_json) -> str` — INSERT INTO cfg_* SQL

**Internal:**
- `_strip_enum_suffix(type_name)` — normalize cfg name
- `_build_cfg_table(name, values, description, submenu_id, schema_name)` — 6-column cfg table dict
- `_rewrite_business_table_refs(table, cfg_tables, schema_name)` — col VARCHAR → col_id INT FK

### 5.6 `rule_enforcer.py` (~140 lines)
Layer 2.7 — CIR rule → comment/assumption injector.

**Public function:**
- `enforce_rule_constraints(schema_json, cir) -> schema_json`

**Internal:**
- `_find_table_for_rule(schema_json, applies_to)` — fuzzy snake_case match
- `_rule_already_present(rule_name, table)` — idempotency check

### 5.7 `coverage.py` (~600 lines)
Layer 3 — 6 sub-scores + 8 standard AI metrics.

**Public function:**
```python
def compute_coverage(cir, *, mantara_schema, mantara_sql,
                     mantara_validation, er_mmd, repair_attempts,
                     max_repair_attempts, open_questions_count,
                     cir_total_items, artifact_paths) -> CoverageResult
```

Returns `CoverageResult(overall_score, sub_scores, ai_metrics, ...)`.

---

## 6. The 7 sub-scores — formulas (v2)

### `schema_completeness` (weight 0.15)
```
required = {CREATE SCHEMA, menu table, submenu table, INSERT seed,
            ≥1 cfg_* lookup, ≥1 COMMENT ON, ≥1 FK constraint}
score = 100 × |required ∩ present| / |required|
```

### `entity_coverage` (weight 0.15)
```
fuzzy_match: CIR entity name vs Mantara table name (snake_case + plural-aware)
score = 100 × |entities matched| / |CIR entities|
```

### `state_seed_coverage` (weight 0.10)
```
For each CIR (entity, state_value):
  • full credit (100): cfg_*_status table contains seed in values/_seed_values
  • partial credit (50): cfg_*_status table exists but seed missing
  • zero (0): no cfg table, no seed
score = 100 × Σ(credits) / |CIR state values|
```

### `rule_constraint_coverage` (weight 0.10)
```
For each CIR rule:
  • match: CHECK constraint OR cfg_* lookup constraint OR column constraint
           OR COMMENT ON {table|column} mentioning rule keyword
           OR assumptions[] entry mentioning rule (post rule_enforcer)
score = 100 × |rules matched| / |CIR rules|
```

### `visual_fidelity` (weight 0.10)
```
For each entity in er.mmd:
  • match: entity present as Mantara table (fuzzy)
score = 100 × |er entities matched| / |er entities|
```

### `mantara_validation_pass` (weight 0.15)
```
Mantara's 25 internal business validators
score = 100 × (1 - weighted_errors / 25)
weights:
  upstream-thin-entity: 0.5  (ENTITY_HAS_NO_FIELDS_FROM_INPUT)
  step5-only error:     1.0  (DUPLICATE_PK, INVALID_FK_TARGET, etc.)
```

### `benchmark_format_compliance` (weight 0.25) ← v2
Measures how close the emitted SQL matches the production-grade
`MMS_Schema_v8.sql` benchmark format:
```
score = 0.25 × FK density to cfg_*    (target: ≥1 FK per cfg_* table)
      + 0.25 × index coverage          (target: ≥3 CREATE INDEX per business table)
      + 0.20 × audit columns           (% of business tables with all 3 audit cols)
      + 0.15 × CHECK constraint count  (≥1 = 50pts, ≥5 = 100pts)
      + 0.15 × comment density         (COMMENT/table ratio, target ≥1.0)
```

### Overall
```
overall = Σ (sub_score × weight)
```

---

## 7. The 8 standard AI metrics

Aligned with Step 1's `ai_metrics` shape (camelCase suffix `_pct`).

| Metric | Formula |
|---|---|
| `extraction_completeness_pct` | overall sub-score weighted average |
| `artifact_validity_pct` | 100 × (artifacts_emitted / artifacts_expected) |
| `schema_validity_pct` | 100 × (schema_completeness_score / 100) |
| `qa_pass_rate_pct` | 100 × (validators_passing / 25) |
| `retry_success_rate_pct` | 100 × (1 - repair_attempts / max_repair_attempts) |
| `manual_intervention_rate_pct` | 100 × (open_questions / cir_total_items) |
| `step_compatibility_pct` | 100 × (cir_entities_in_schema / cir_total_entities) |
| `confidence_score` | (mantara_validation_pass × 0.6) + (retry_success × 0.4) |

---

## 8. CLI flags reference

| Flag | Default | Effect |
|---|---|---|
| `--run-dir <path>` | latest step-4 dir | Point at a specific run_dir |
| `--runs-root <path>` | step-1's `output/runs` | Override the Step-1 runs root |
| `--preview` | off | Dump adapter input only ($0) |
| `--no-invoke` | off | Skip Mantara call; rescore cached schema.json ($0) |
| `--no-prd` | off | Exclude PRD prose from adapter input (smaller, cheaper) |
| `--model <name>` | env `OPENAI_MODEL` | Override Mantara model (gpt-4o, gpt-4o-mini) |
| `--cost-cap <usd>` | env or 0.30 | Pre-flight cost cap |

---

## 9. Cost matrix

| Configuration | Cost / run | Wall time |
|---|---|---|
| `--preview` | $0 | <1s |
| `--no-invoke` (rescore) | $0 | <1s |
| Default (gpt-4o-mini) | ~$0.30–0.85 | ~4–7 min |
| Default (gpt-4o) | ~$0.50–1.50 | ~5–7 min, may hit Tier 1 TPM |

---

## 10. Closed gaps (what's been done)

- ✅ **P1** — Mantara Type-S adapter with anticipated cfg_* lookups
- ✅ **P2** — Mantara invocation with cost cap + retry observability
- ✅ **P3** — 6 sub-scores + 8 standard AI metrics
- ✅ **P4** — Wired into master pipeline (auto-runs after Step 4)
- ✅ **P2.5** — Adapter fix: prevent 1-submenu menus
- ✅ **Phase A** — Constitution sandwich + denominator forcing
- ✅ **Phase B** — Lenient rule validation (rules-as-comments accepted) + state seed partial credit + read `_seed_values`
- ✅ **TEMPERATURE caching fix** — re-read MANTARA_TEMPERATURE per call (kept for future Best-of-N reinstate)
- ✅ **8 standard metrics** — aligned with Step 1's shape (_pct suffix)
- ✅ **cfg_* enforcer** — deterministic post-processor; 100% benchmark format compliance
- ✅ **cfg seed backfill from CIR** — when Mantara emits cfg table without values, fill from workflow states
- ✅ **rule enforcer** — deterministic CIR rule → comment/assumption injector
- ✅ **Persist guard** — never clobber a good schema.json with an empty one
- ✅ **66/66 tests passing**

---

## 11. Open gaps

- ⚠️ **mantara_validation_pass** still varies (0 ASN, 58 QSR after fresh runs). Mantara's internal validator counts errors before our post-processors run — it sees the raw Mantara emission with ENUMs/thin-entities. cfg_enforcer fixes the OUTPUT but not Mantara's internal score. This is cosmetic.
- ⚠️ **visual_fidelity = 0 on QSR** in the latest run — fresh Mantara emitted different table names than `er.mmd`. Easy follow-up fix in coverage.py (loosen snake_case matching).
- ⚠️ **gpt-4o on Tier 1 hits TPM** — input + retries can exceed 30K tokens/min. gpt-4o-mini is the safe default.
- ⚠️ **No CHECK constraint inference from rules** — rule_enforcer stops at comments. Could deterministically promote certain rule patterns (e.g., `min_one_X`) to CHECK constraints.

---

## 12. Roadmap

### Near-term ($0 improvements)
- Fix QSR visual_fidelity (loosen snake_case match in coverage.py)
- Master pipeline docs update for Phase A + post-processors
- Rule-pattern → CHECK constraint promotion (e.g., `min_one_*` rules)

### Medium-term (paid improvements)
- Strengthened v8 system prompt with cfg_* worked examples (one-time, then $0)
- gpt-4o end-to-end test on Tier 2+ API tier

### Long-term (architectural)
- P5 — Local Ollama backend ($0 runs)
- Step 6 (codegen) handoff — schema.json → frontend + backend code
- IDE/Spec-Kit-style natural-language editor over `schema.json`

### Deferred (require API tier upgrade)
- **Best-of-N (Phase D)** — was built, removed when Tier 1 TPM made it unrunnable. 3 parallel candidates × ~30K tokens always exceeds 30K TPM. Reinstate from git history when on Tier 2+ (300K TPM).
- **CoVe roll-call (Phase C)** — was built, removed alongside Best-of-N. Less impactful independently.

---

## 13. References

- **Liu 2023 — Lost in the Middle** (arXiv 2307.03172) — Constitution sandwich
- **arXiv 2512.04727** — Denominator forcing (Phase A)
- **Dhuliawala 2023 — CoVe** (arXiv 2309.11495) — referenced for deferred CoVe verifier
- **arXiv 2604.15618 — Functional Majority Voting** — referenced for deferred Best-of-N
- **Huang ICLR 2024 — LLMs Cannot Self-Correct Reasoning** — Why repair loops oscillate

---

## 14. File map

```
step-05-schema/
├── README.md                    quickstart + architecture
├── SHORT_DESCRIPTION.md         one-pager with scores
├── fulllevelofdetail.md         this file
├── .env                         OPENAI_API_KEY, MANTARA_*, STEP05_*
├── run.sh                       CLI launcher
├── pipeline/
│   ├── __init__.py              re-exports generate_run
│   ├── runner.py                CLI parser (~140 lines)
│   ├── generate.py              orchestrator (~370 lines)
│   ├── mantara_adapter.py       Layer 1: CIR → Type-S (~700 lines)
│   ├── mantara_client.py        Layer 2: Mantara invocation (~263 lines)
│   ├── cfg_enforcer.py          Layer 2.6: cfg_* enforcement + seed backfill (~290 lines)
│   ├── rule_enforcer.py         Layer 2.7: CIR rule injector (~140 lines)
│   └── coverage.py              Layer 3: 6 sub-scores + 8 metrics (~600 lines)
├── tests/
│   ├── conftest.py              shared fixtures
│   ├── test_adapter.py          Layer 1 tests
│   ├── test_client.py           Layer 2 tests (mocked)
│   ├── test_cfg_enforcer.py     Layer 2.6 tests (10 tests)
│   ├── test_rule_enforcer.py    Layer 2.7 tests (9 tests)
│   ├── test_coverage.py         Layer 3 tests
│   └── test_orchestrator.py     end-to-end tests
└── output/                       (input symlinks, runs go to step-1's runs/)
```