# Data Pipeline Flowchart

End-to-end data transformation from user input to generated project files.

```mermaid
flowchart LR
    A["User Action\ninput: --prompt str\n--images-dir Path"] -->|"argv: list[str]"| B["main._build_parser()\ninput: sys.argv\noutput: argparse.Namespace"]

    B -->|"args.prompt: str\nargs.output: Path|None"| C["cmd_generate(args)\ninput: Namespace\noutput: project_root: Path"]

    C -->|"user_prompt: str|None\nimages_dir: Path|None\noutput_dir: Path"| D["generate_project()\ninput: user_prompt, images_dir\noutput: Path (project_root)"]

    D -->|"inputs_dir: Path\nuser_prompt: str|None\nimages_dir: Path|None"| E["ingest()\nSTEP 01\noutput: IngestedInputs"]

    E -->|"IngestedInputs:\nuser_prompt: str|None\nimages_dir: Path|None\nimage_paths: list[Path]"| F{"images_dir\nis None?"}

    F -->|"YES\n(prompt-only mode)"| G["generate_from_prompt()\nSTEP 02\nBedrock × 2 calls"]

    F -->|"NO\n(image mode)"| H["generate_prd()\nSTEP 02\nMantara sub-pipeline"]

    G -->|"LoadedPRD:\npath: Path\ntext: str\nddl_path: Path"| I["backend_gen.Orchestrator.run()\nSTEP 03\nBedrock × N calls"]

    H -->|"LoadedPRD:\npath: Path\ntext: str\nddl_path: Path|None"| I

    I -->|"api_manifest: dict\nsystem_name: str\nbackend files on disk"| J["generate_app_ir()\nSTEP 04\nOpus × (1 + N_pages) calls"]

    J -->|"AppIRBundle:\napp_plan: AppPlan\nir_pages: list[IRPage]\nrun_id: str"| K["generate_react_pages()\nSTEP 05\nSonnet × N_pages calls"]

    K -->|"MultiPageBundle:\npages: list[PageBundle]\nrouter_code: str\ncontext_code: str"| L["scaffold_frontend()\n+\nwrite_generated_files()"]

    L -->|"files written to\nfrontend_dir/src/"| M["Project on disk\nbackend/ + frontend/ + ir/\nready to run"]

    style A fill:#e8f4f8,stroke:#2c7bb6
    style M fill:#d7f0d7,stroke:#2d8a2d
    style F fill:#fff3cd,stroke:#856404
```
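
The branch at the `images_dir is None?` node can be sketched in a few lines. This is a minimal illustration, not the pipeline's actual code: the `IngestedInputs` fields are copied from the edge label in the flowchart, while `choose_prd_step` is a hypothetical helper that only returns the name of the Step 02 function the diagram routes to.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from pathlib import Path


@dataclass
class IngestedInputs:
    """Fields mirror the Step 01 output shown in the flowchart."""

    user_prompt: str | None
    images_dir: Path | None
    image_paths: list[Path] = field(default_factory=list)


def choose_prd_step(inputs: IngestedInputs) -> str:
    """Hypothetical helper: name the Step 02 function for these inputs."""
    if inputs.images_dir is None:
        # Prompt-only mode: the diagram routes to generate_from_prompt().
        return "generate_from_prompt"
    # Image mode: the diagram routes to generate_prd().
    return "generate_prd"
```

Both branches yield a `LoadedPRD`, so everything from Step 03 onward is independent of which mode produced the PRD.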

---

## Per-Step Input/Output Detail

```mermaid
flowchart LR
    subgraph S01["STEP 01 — Input Ingestion"]
        direction TB
        S01_IN["INPUT\nuser_prompt: str|None\nimages_dir: Path|None\ninputs_dir: Path"]
        S01_PROC["_collect_images()\nscan for .png/.jpg/.jpeg/.gif/.webp"]
        S01_OUT["OUTPUT\nIngestedInputs dataclass\nimage_paths: list[Path]"]
        S01_IN --> S01_PROC --> S01_OUT
    end

    subgraph S02["STEP 02 — PRD Generation"]
        direction TB
        S02_IN["INPUT\nuser_prompt: str\n(or images_dir: Path)"]
        S02_LLM1["Bedrock call 1\nclaude-3-5-sonnet\n→ PRD markdown"]
        S02_LLM2["Bedrock call 2\nclaude-3-5-sonnet\n→ Postgres DDL"]
        S02_OUT["OUTPUT\nLoadedPRD\npath + text + ddl_path"]
        S02_IN --> S02_LLM1 --> S02_LLM2 --> S02_OUT
    end

    subgraph S03["STEP 03 — Backend Generation"]
        direction TB
        S03_IN["INPUT\nddl_file: str\nprd_file: str"]
        S03_ANALYZE["RequirementAnalyzer\nLLM → structured analysis"]
        S03_CODEGEN["FastAPICodeGenerator\nLLM × N modules → Python files"]
        S03_MANIFEST["generate_manifest_json()\nAST parse → api_manifest.json"]
        S03_OUT["OUTPUT\napi_manifest: dict\nbackend/ directory on disk"]
        S03_IN --> S03_ANALYZE --> S03_CODEGEN --> S03_MANIFEST --> S03_OUT
    end

    subgraph S04["STEP 04 — IR Generation"]
        direction TB
        S04_IN["INPUT\napi_manifest: dict\nprd_text: str\nencoded images"]
        S04_DETECT["detect_pages()\nOpus + vision → AppPlan"]
        S04_IR["generate_ir_bundle_for_page()\nOpus × N pages → IRBundle"]
        S04_OUT["OUTPUT\nAppIRBundle\napp_plan + ir_pages"]
        S04_IN --> S04_DETECT --> S04_IR --> S04_OUT
    end

    subgraph S05["STEP 05 — React Generation"]
        direction TB
        S05_IN["INPUT\nAppIRBundle\napi_manifest: dict"]
        S05_TSX["generate_page_react_code()\nSonnet × N pages → TSX"]
        S05_ROUTER["generate_router_code()\ndeterministic → App.tsx"]
        S05_CTX["generate_context_code()\ndeterministic → AppContext.tsx"]
        S05_OUT["OUTPUT\nMultiPageBundle\n+ files written to frontend/src/"]
        S05_IN --> S05_TSX --> S05_ROUTER --> S05_CTX --> S05_OUT
    end

    S01 -->|IngestedInputs| S02
    S02 -->|LoadedPRD| S03
    S03 -->|api_manifest| S04
    S04 -->|AppIRBundle| S05
```
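
As an illustration of the Step 01 scan, the sketch below filters a directory by the five suffixes listed in the diagram. It is written under assumptions the diagram does not confirm: the scan is non-recursive, suffix matching is case-insensitive, and results are sorted. The public name `collect_images` is used here to avoid implying this is the real `_collect_images()`.

```python
from __future__ import annotations

from pathlib import Path

# Suffixes taken from the Step 01 node in the diagram.
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg", ".gif", ".webp"}


def collect_images(images_dir: Path | None) -> list[Path]:
    """Return image files directly inside images_dir (assumed non-recursive)."""
    if images_dir is None or not images_dir.is_dir():
        # Prompt-only mode, or a missing directory, yields no images.
        return []
    return sorted(
        p
        for p in images_dir.iterdir()
        if p.is_file() and p.suffix.lower() in IMAGE_SUFFIXES
    )
```

Returning an empty list rather than raising keeps prompt-only runs on the same code path as image runs.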

---

## Bedrock Model Assignment

```mermaid
flowchart LR
    ENV["Environment Variables\nBEDROCK_MODEL_ID\nBEDROCK_OPUS_MODEL_ID\nBEDROCK_HAIKU_MODEL_ID\netc."]

    ENV --> M1["BACKEND_MODEL_ID\nclaude-3-5-sonnet-20241022\nUsed by: Step 02 + Step 03"]
    ENV --> M2["PAGE_DETECTION_MODEL\nclaude-opus-4-6\nUsed by: Step 04a"]
    ENV --> M3["IR_MODEL_ID\nclaude-opus-4-6\nUsed by: Step 04b (×N pages)"]
    ENV --> M4["REACT_MODEL_ID\nclaude-sonnet-4-5\nUsed by: Step 05 (×N pages)"]

    M1 -->|"invoke_model\n(boto3 direct)"| B1["AWS Bedrock\nbedrock-runtime"]
    M2 -->|"invoke\n(LangChain Converse)"| B1
    M3 -->|"invoke\n(LangChain Converse)"| B1
    M4 -->|"invoke\n(LangChain Converse)"| B1
```
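
One way to picture the environment-to-model mapping is a lookup with per-role defaults. The sketch below is an assumption-heavy illustration: the diagram does not say which environment variable feeds which role, so the `ASSUMED_ENV_VARS` table is a guess, and `BEDROCK_SONNET_MODEL_ID` is a hypothetical name that does not appear in the diagram. The default strings are the short labels from the diagram, not necessarily full Bedrock model identifiers.

```python
from __future__ import annotations

import os
from typing import Mapping

# Defaults copied from the diagram labels (may not be full Bedrock model IDs).
DEFAULT_MODELS = {
    "BACKEND_MODEL_ID": "claude-3-5-sonnet-20241022",
    "PAGE_DETECTION_MODEL": "claude-opus-4-6",
    "IR_MODEL_ID": "claude-opus-4-6",
    "REACT_MODEL_ID": "claude-sonnet-4-5",
}

# ASSUMPTION: role -> environment variable. Not confirmed by the diagram.
ASSUMED_ENV_VARS = {
    "BACKEND_MODEL_ID": "BEDROCK_MODEL_ID",
    "PAGE_DETECTION_MODEL": "BEDROCK_OPUS_MODEL_ID",
    "IR_MODEL_ID": "BEDROCK_OPUS_MODEL_ID",
    "REACT_MODEL_ID": "BEDROCK_SONNET_MODEL_ID",  # hypothetical variable name
}


def resolve_models(env: Mapping[str, str] | None = None) -> dict[str, str]:
    """Return role -> model ID, preferring non-empty environment overrides."""
    env = os.environ if env is None else env
    return {
        role: env.get(ASSUMED_ENV_VARS[role]) or default
        for role, default in DEFAULT_MODELS.items()
    }
```

Passing the environment in as a mapping keeps the lookup testable without touching `os.environ`.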

---

## Error Propagation

```mermaid
flowchart TD
    BEDROCK["Bedrock raises\nClientError\n(e.g. ThrottlingException)"]
    BEDROCK --> TLC["timed_llm_call() OR\nBedrockLLMClient.generate()\nlogs FAILED, re-raises"]
    TLC --> SERVICE["Service function\n(detect_pages, generate_ir_bundle, etc.)\nno catch — propagates"]
    SERVICE --> RLSTEP["RunLog.step() context manager\nsets StepEvent.status = 'failed'\nrecords error = repr(exc)\nre-raises"]
    RLSTEP --> ORCH["generate_project()\ntry/finally:\nrun_log.finalize() ALWAYS called"]
    ORCH --> FINALLY["run_log.json + run_log.md\nwritten even on failure"]
    ORCH --> MAIN["cmd_generate()\ncatches FileNotFoundError, ValueError\nprints to stderr\nreturns exit code 2"]
```
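
The record-then-re-raise behaviour of `RunLog.step()` and the `try/finally` in `generate_project()` can be sketched as follows. This is a simplified stand-in, not the code in `shared/run_log.py`: the `"running"` and `"ok"` statuses, the `finalized` flag, and the `run_pipeline` driver are assumptions made for illustration, and `finalize()` here only sets a flag instead of writing `run_log.json` and `run_log.md`.

```python
from __future__ import annotations

from contextlib import contextmanager
from dataclasses import dataclass, field


@dataclass
class StepEvent:
    name: str
    status: str = "running"  # assumed initial status
    error: str | None = None


@dataclass
class RunLog:
    events: list[StepEvent] = field(default_factory=list)
    finalized: bool = False

    @contextmanager
    def step(self, name: str):
        """Record one step; mark it failed and re-raise on any exception."""
        event = StepEvent(name)
        self.events.append(event)
        try:
            yield event
        except Exception as exc:
            event.status = "failed"
            event.error = repr(exc)
            raise  # propagate to the orchestrator unchanged
        else:
            event.status = "ok"  # assumed success status

    def finalize(self) -> None:
        # Stand-in for writing run_log.json and run_log.md.
        self.finalized = True


def run_pipeline(run_log: RunLog, failing: bool) -> None:
    """Hypothetical driver showing the try/finally shape from the diagram."""
    try:
        with run_log.step("step_04_ir"):
            if failing:
                raise RuntimeError("simulated Bedrock failure")
    finally:
        run_log.finalize()  # runs whether or not the step raised
```

Because the exception is re-raised at every layer, the caller still sees the original error, while the run log retains a record of which step failed.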

---

## Related Source Files

- [pipeline/master-pipeline/pipeline/orchestrator.py](../../pipeline/master-pipeline/pipeline/orchestrator.py) — wires all steps
- [shared/llm_logging.py](../../shared/llm_logging.py) — timed_llm_call
- [shared/run_log.py](../../shared/run_log.py) — telemetry
- [shared/bedrock_client.py](../../shared/bedrock_client.py) — ChatBedrockConverse factory
- [shared/bedrock_raw_client.py](../../shared/bedrock_raw_client.py) — direct boto3 client
