# Components

In dpg, the "components" are the pipeline step modules. Each module is a self-contained unit with a defined input contract, transformation logic, output contract, and error path. This document describes each pipeline step as a component.

---

## COMPONENT: Step 01 — Input Ingestion

```
FILE: pipeline/step-01-input-ingestion/pipeline/ingestion.py
```

**INPUT (Function signature):**

| Parameter | Type | Required | Source |
|-----------|------|----------|--------|
| `inputs_dir` | `Path` | yes | `shared/config.INPUTS_DIR` or CLI `--inputs-dir` |
| `user_prompt` | `str \| None` | no | CLI `--prompt` |
| `images_dir` | `Path \| None` | no | CLI `--images-dir` |
| `output_dir` | `Path \| None` | no | computed by orchestrator as `s01_out` |

**PROCESS:**
Checks whether an explicit `images_dir` is provided; if not, looks for `inputs_dir/images/`. Collects image files matching `IMG_EXTENSIONS`, returns an `IngestedInputs` dataclass, and optionally writes an `ingested_inputs.json` artifact.

**OUTPUT:**
```python
IngestedInputs(
    user_prompt:  str | None,
    images_dir:   Path | None,   # resolved images directory
    image_paths:  list[Path]     # sorted list of image files found
)
```
Goes to: `orchestrator.generate_project()` → passed to step-02.

**CONSUMES FROM CONFIG:**
- `shared.config.IMG_EXTENSIONS = {".png", ".jpg", ".jpeg", ".gif", ".webp"}`
- `shared.config.INPUTS_DIR` (default input root)

**ERROR PATH:**
- `FileNotFoundError` if explicit `images_dir` doesn't exist → propagates to `RunLog.step()`
- Missing `inputs_dir/images/` is silently treated as "no images" (not an error)

---

## COMPONENT: Step 02 — PRD Generation

```
FILE: pipeline/step-02-prd-generation/pipeline/prd_generator.py
```

**INPUT (Function signatures):**

`generate_from_prompt()` — used when no images are provided:

| Parameter | Type | Required | Source |
|-----------|------|----------|--------|
| `user_prompt` | `str` | yes | `IngestedInputs.user_prompt` |
| `output_dir` | `Path \| None` | no | `s02_out` from orchestrator |

`generate_prd()` — used when images are provided:

| Parameter | Type | Required | Source |
|-----------|------|----------|--------|
| `images_dir` | `Path \| None` | yes | `IngestedInputs.images_dir` |
| `user_prompt` | `str \| None` | no | `IngestedInputs.user_prompt` |
| `output_dir` | `Path \| None` | no | `s02_out` |

**PROCESS:**
- `generate_from_prompt()`: Makes two Bedrock calls (one for PRD generation, one for DDL generation) using `BedrockLLMClient`.
- `generate_prd()`: Delegates to the Mantara sub-pipeline (`mantara/main.py`) which handles image-to-PRD synthesis.
- Sub-step 02c: Optionally runs Mantara schema + Dalfin JSON compilation (separate `try/except` — failure is non-fatal).

**OUTPUT:**
```python
LoadedPRD(
    path:     Path,           # path to full_prd.md
    text:     str,            # PRD markdown content
    images:   list[PRDImage], # embedded images (from PDF/PPTX source)
    ddl_path: Path | None     # path to schema.sql (None if generation failed)
)
```
Goes to: `orchestrator.generate_project()` → `ddl_path` passed to step-03, `text`+`images` passed to step-04.

**CONSUMES FROM CONFIG:**
- `shared.config.BACKEND_MODEL_ID` (via `BedrockLLMClient`)
- `shared.config.BACKEND_TEMPERATURE`

**ERROR PATH:**
- `ValueError` if `ingested.user_prompt` is `None` in prompt mode
- `ValueError` if `generate_prd()` returns `None`
- `ValueError` if DDL is not produced → orchestrator raises before step-03

**Artifacts written:**
- `s02_out/full_prd.md`
- `s02_out/schema.sql`
- `s02_out/metadata.json`
- `s02_out/mantara_schema.json` (optional)
- `s02_out/dalfin.json` (optional)

---

## COMPONENT: Step 03 — Backend Generation

```
FILE: pipeline/step-03-backend-generation/pipeline/backend_gen/orchestrator.py
      class Orchestrator
```

**INPUT (Constructor + run()):**

| Parameter | Type | Required | Source |
|-----------|------|----------|--------|
| `llm_client` | `LLMClient` | yes | `create_llm_client()` → `BedrockLLMClient` |
| `output_dir` | `Path` | yes | `backend_dir` (project_root/backend/) |
| `loaded_input.ddl_file` | `str` | yes | `LoadedPRD.ddl_path` |
| `loaded_input.prd_file` | `str` | yes | `LoadedPRD.path` |
| `loaded_input.prompt_text` | `str \| None` | no | `IngestedInputs.user_prompt` |
| `loaded_input.image_files` | `list[str] \| None` | no | `IngestedInputs.image_paths` |

**PROCESS (9 sub-steps):**
1. `RequirementAnalyzer.analyze()` — LLM extracts system name, modules, entities, endpoints
2. `consolidate()` — normalize names and validate integrity
3. `FastAPICodeGenerator.generate()` — LLM generates per-module Python files
4. `generate_manifest_json()` — AST-parse generated files → `api_manifest.json`
5. `MarkdownGenerator.generate()` — LLM generates `BACKEND_SPEC.md`
6. `ReadmeGenerator.generate()` — LLM generates `README.md`
7. `ProjectFileGenerator` — deterministic `pyproject.toml` + `requirements.txt`
8. `OutputWriter.write_all()` — writes all files to disk
9. `ValidationRunner.validate()` — AST-validates all generated `.py` files
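
Sub-step 9 can be sketched with the standard-library `ast` module: every generated `.py` file must parse, and any `SyntaxError` propagates. The function name is hypothetical; the real `ValidationRunner` may check more than syntax:

```python
import ast

def validate_python_sources(generated_files: dict[str, str]) -> None:
    """Raise SyntaxError if any generated .py file fails to parse."""
    for filename, source in generated_files.items():
        if filename.endswith(".py"):
            ast.parse(source, filename=filename)  # raises SyntaxError on invalid code
```

This is also the mechanism behind sub-step 4: the same parsed ASTs can be walked to extract routes for `api_manifest.json`.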

**OUTPUT:**
```python
{
    "system_name":     str,
    "api_manifest":    dict,          # structured endpoint catalogue
    "module_names":    list[str],
    "endpoint_count":  int,
    "file_count":      int,
    "generated_files": dict[str, str] # filename → source code
}
```
`api_manifest` shape:
```json
{
  "version": "1.0",
  "modules": [
    {
      "name": "orders",
      "prefix": "/orders",
      "endpoints": [
        { "method": "GET", "path": "/orders", "summary": "..." }
      ]
    }
  ],
  "enums": {}
}
```
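
As a consumer-side example, `endpoint_count` can be derived from the manifest shape above. The traversal follows the documented JSON exactly; the helper itself is illustrative:

```python
def count_endpoints(api_manifest: dict) -> int:
    """Sum endpoint entries across all modules of an api_manifest dict."""
    return sum(len(m.get("endpoints", [])) for m in api_manifest.get("modules", []))
```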

**Artifacts written:**
- `project_root/backend/` — full FastAPI project
- `project_root/api_manifest.json`
- `s03_out/api_manifest.json`
- `s03_out/summary.json`

**ERROR PATH:**
- `SyntaxError` if generated Python is invalid (raised by `ValidationRunner`)
- `boto3 ClientError` on LLM call failure
- Both propagate to `RunLog.step("step-03-backend-generation")`

---

## COMPONENT: Step 04 — IR Generation

```
FILE: pipeline/step-04-ir-generation/pipeline/ir_pipeline/services/multi_page_service.py
      generate_app_ir()
```

**INPUT:**

| Parameter | Type | Required | Source |
|-----------|------|----------|--------|
| `images_dir` | `Path \| None` | no | `IngestedInputs.images_dir` |
| `user_prompt` | `str \| None` | no | `IngestedInputs.user_prompt` |
| `api_manifest` | `dict \| None` | no | step-03 output |
| `prd_text` | `str \| None` | no | `LoadedPRD.text` |
| `prd_images` | `list[dict] \| None` | no | `LoadedPRD.images_as_dicts` |
| `model_config` | `MultiPageModelConfig \| None` | no | defaults from env vars |

**PROCESS:**
1. Encodes all images to `EncodedImage` (base64)
2. Calls `detect_pages()` with Opus — one LLM call → `AppPlan` (all pages)
3. Sanitizes page IDs to snake_case via `sanitize_page_id()`
4. For each page: calls `generate_ir_bundle_for_page()` with Opus — one LLM call → `IRBundle`
5. Assembles `AppIRBundle`
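
Step 3's `sanitize_page_id()` could plausibly look like the sketch below, assuming "snake_case" means lowercase alphanumerics joined by underscores; the real implementation may differ:

```python
import re

def sanitize_page_id(raw: str) -> str:
    """Normalize an LLM-proposed page id to snake_case."""
    s = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", "_", raw)  # split camelCase boundaries
    s = re.sub(r"[^0-9a-zA-Z]+", "_", s)             # non-alphanumerics -> underscore
    return s.strip("_").lower() or "page"            # never return an empty id
```

Stable snake_case ids matter downstream: step-05 derives the `<PascalCase>Page.tsx` filenames from them.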

**OUTPUT:**
```python
AppIRBundle(
    app_plan:       AppPlan,           # app_name, pages, shared_state, design_system
    ir_pages:       list[IRPage],      # page_node + ir_bundle per page
    run_id:         str,
    encoded_images: list[EncodedImage]
)
```

`IRBundle` shape: 12 sections — `page_ir`, `data_ir`, `data_fetch_ir`, `data_model_ir`, `behaviour_ir`, `component_ir`, `layout_ir`, `navigation_ir`, `realtime_ir`, `metadata`, `page_data_contract`, `schema_ir`
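
The section list can double as a completeness check when loading a `<page_id>_ir.json` file. The set is copied from the description above; the helper is illustrative:

```python
IR_SECTIONS = {
    "page_ir", "data_ir", "data_fetch_ir", "data_model_ir", "behaviour_ir",
    "component_ir", "layout_ir", "navigation_ir", "realtime_ir", "metadata",
    "page_data_contract", "schema_ir",
}

def missing_sections(bundle: dict) -> set[str]:
    """Return the IRBundle section names absent from a loaded page IR dict."""
    return IR_SECTIONS - bundle.keys()
```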

**Artifacts written:**
- `s04_out/ir/app_plan.json`
- `s04_out/ir/<page_id>_ir.json` × N pages

**CONSUMES FROM CONFIG:**
- `MultiPageModelConfig.detection_model` → `claude-opus-4-6` (default)
- `MultiPageModelConfig.ir_model` → `claude-opus-4-6` (default)

**ERROR PATH:**
- `ValueError` if no inputs provided
- `pydantic.ValidationError` if LLM returns invalid JSON for `AppPlan` or `IRBundle`
- Both propagate to `RunLog.step("step-04-ir-generation")`

---

## COMPONENT: Step 05 — React Generation

```
FILE: pipeline/step-05-react-generation/pipeline/react_gen.py
      generate_react_pages()
```

**INPUT:**

| Parameter | Type | Required | Source |
|-----------|------|----------|--------|
| `app_ir` | `AppIRBundle` | yes | step-04 output |
| `api_manifest` | `dict \| None` | no | step-03 output |
| `model_config` | `MultiPageModelConfig \| None` | no | defaults from env vars |

**PROCESS:**
1. For each `IRPage`: calls `generate_page_react_code()` — one Sonnet call → TSX source
2. `generate_router_code(app_plan)` — deterministic, no LLM
3. `generate_context_code(app_plan)` — deterministic, no LLM
4. Assembles `MultiPageBundle`
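
Steps 2–3 are deterministic string templating, which can be sketched as below. The TSX template and helper names are illustrative, not the real `generate_router_code()` output; only the no-LLM, ids-to-router shape comes from the description above:

```python
def generate_router_code_sketch(page_ids: list[str]) -> str:
    """Build App.tsx content from snake_case page ids, with no LLM call."""
    def pascal(pid: str) -> str:
        # snake_case id -> PascalCase component name, e.g. order_details -> OrderDetailsPage
        return "".join(part.capitalize() for part in pid.split("_")) + "Page"

    imports = "\n".join(
        f'import {pascal(p)} from "./pages/{pascal(p)}";' for p in page_ids
    )
    routes = "\n".join(
        f'        <Route path="/{p}" element={{<{pascal(p)} />}} />' for p in page_ids
    )
    return f"{imports}\n\nexport default function App() {{ /* ...router setup... */\n{routes}\n}}\n"
```

Keeping the router and context deterministic means only the per-page TSX depends on model output, which bounds what the LLM can break.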

**OUTPUT:**
```python
MultiPageBundle(
    app_plan:     AppPlan,
    pages:        list[PageBundle],  # page_node + ir_bundle + react_code (TSX string)
    router_code:  str,               # App.tsx content
    context_code: str,               # AppContext.tsx content
    run_id:       str
)
```

**Artifacts written (by orchestrator after this call):**
- `s05_out/pages/<PascalCase>Page.tsx` × N
- `s05_out/AppContext.tsx`
- `s05_out/App.tsx`
- `project_root/frontend/src/pages/<PascalCase>Page.tsx` × N
- `project_root/frontend/src/App.tsx`
- `project_root/frontend/src/AppContext.tsx`

**CONSUMES FROM CONFIG:**
- `MultiPageModelConfig.react_model` → `claude-sonnet-4-5` (default)

**ERROR PATH:**
- `boto3 ClientError` on LLM failure
- Propagates to `RunLog.step("step-05-react-generation")`

---

## COMPONENT: RunLog

```
FILE: shared/run_log.py
```

**Input:** Any step registers itself via `with run_log.step("name") as ev:`
**Input:** Any LLM caller registers via `run_log.record_llm_call(...)`
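
The `step()` contract (timing, status transitions, error capture, re-raise) can be re-implemented as a minimal sketch. Field names follow the `StepEvent` state shape documented in this section; the real `shared/run_log.py` may differ in detail:

```python
import time
from contextlib import contextmanager

class RunLogSketch:
    """Illustrative re-implementation of the RunLog.step() contract."""

    def __init__(self):
        self.steps = []

    @contextmanager
    def step(self, name: str):
        ev = {"name": name, "status": "running", "error": None,
              "duration_ms": None, "llm_calls": [], "notes": {}}
        self.steps.append(ev)
        start = time.perf_counter()
        try:
            yield ev
            ev["status"] = "ok"
        except Exception as exc:
            ev["status"] = "failed"
            ev["error"] = repr(exc)
            raise  # errors still propagate to the caller
        finally:
            ev["duration_ms"] = (time.perf_counter() - start) * 1000
```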

**State shape before finalization:**
```python
RunLog {
    run_id:     str,
    started_at: str (ISO 8601),
    steps:      list[StepEvent],
    _stack:     list[StepEvent],  # currently running steps
    summary:    dict
}

StepEvent {
    name:        str,
    started_at:  str,
    duration_ms: float | None,
    status:      "running" | "ok" | "failed",
    error:       str | None,
    llm_calls:   list[LLMCall],
    notes:       dict
}

LLMCall {
    step:          str,
    model:         str,
    label:         str | None,
    duration_ms:   float,
    input_tokens:  int,
    output_tokens: int
}
```

**Output after `finalize()`:**
- `master_out/run_log.json` — full telemetry as JSON
- `master_out/run_log.md` — human-readable markdown table

**ERROR PATH:**
- `finalize()` is always called in `generate_project()`'s `finally:` block
- `record_llm_call()` silently no-ops on any internal error

---

## Related source files

- [pipeline/step-01-input-ingestion/pipeline/ingestion.py](../pipeline/step-01-input-ingestion/pipeline/ingestion.py)
- [pipeline/step-02-prd-generation/pipeline/prd_generator.py](../pipeline/step-02-prd-generation/pipeline/prd_generator.py)
- [pipeline/step-03-backend-generation/pipeline/backend_gen/orchestrator.py](../pipeline/step-03-backend-generation/pipeline/backend_gen/orchestrator.py)
- [pipeline/step-04-ir-generation/pipeline/ir_pipeline/services/multi_page_service.py](../pipeline/step-04-ir-generation/pipeline/ir_pipeline/services/multi_page_service.py)
- [pipeline/step-05-react-generation/pipeline/react_gen.py](../pipeline/step-05-react-generation/pipeline/react_gen.py)
- [shared/run_log.py](../shared/run_log.py)
- [shared/schemas/ir_bundle.py](../shared/schemas/ir_bundle.py)
