# Services

This document traces every major service function in the dpg pipeline, detailing each function's inputs, outputs, and error paths.

---

## SERVICE: ingest()

```
FILE: pipeline/step-01-input-ingestion/pipeline/ingestion.py
```

### Full Pipeline

```
Step 1 → orchestrator calls ingest(inputs_dir, user_prompt, images_dir, output_dir)
Step 2 → _collect_images() scans images_dir or inputs_dir/images/
Step 3 → IngestedInputs dataclass is constructed
Step 4 → output_dir/ingested_inputs.json is written
Step 5 → IngestedInputs is returned to orchestrator
```

```
┌──────────────────────────────────────────────────────────────────┐
│ STEP: _collect_images() — ingestion.py                           │
├──────────────────────────────────────────────────────────────────┤
│ INPUT                                                             │
│   from:    ingest()                                               │
│   data:    { directory: Path | None }                             │
│   via:     parameter                                              │
│   required: no (None → returns [])                               │
├──────────────────────────────────────────────────────────────────┤
│ PROCESS                                                           │
│   Iterates the directory and collects files whose suffix is in   │
│   IMG_EXTENSIONS = {.png, .jpg, .jpeg, .gif, .webp}.            │
│   Returns sorted list of matching Paths.                         │
├──────────────────────────────────────────────────────────────────┤
│ OUTPUT                                                            │
│   returns: list[Path]  (sorted by filename)                      │
│   to:      ingest() → IngestedInputs.image_paths                 │
│   triggers: nothing — pure data                                  │
├──────────────────────────────────────────────────────────────────┤
│ ERROR PATH                                                        │
│   throws:  nothing — returns [] if dir is None / missing         │
│   caught:  N/A                                                    │
│   shown:   no error surfaced; empty image list passed downstream │
└──────────────────────────────────────────────────────────────────┘

┌──────────────────────────────────────────────────────────────────┐
│ STEP: ingest() — ingestion.py                                    │
├──────────────────────────────────────────────────────────────────┤
│ INPUT                                                             │
│   from:    orchestrator.generate_project()                        │
│   data:    {                                                      │
│              inputs_dir: Path,          (required)               │
│              user_prompt: str | None,   (optional)               │
│              images_dir: Path | None,   (optional)               │
│              output_dir: Path | None    (optional)               │
│            }                                                      │
│   via:     keyword arguments                                      │
├──────────────────────────────────────────────────────────────────┤
│ PROCESS                                                           │
│   If images_dir is None, checks inputs_dir/images/ for images.  │
│   If images_dir is provided but missing → raises FileNotFoundError│
│   Constructs IngestedInputs dataclass, writes ingested_inputs.json│
├──────────────────────────────────────────────────────────────────┤
│ OUTPUT                                                            │
│   returns: IngestedInputs {                                      │
│              user_prompt:  str | None,                           │
│              images_dir:   Path | None,                          │
│              image_paths:  list[Path]                            │
│            }                                                      │
│   to:      orchestrator.generate_project() → step-02 inputs      │
│   written: output_dir/ingested_inputs.json                       │
├──────────────────────────────────────────────────────────────────┤
│ ERROR PATH                                                        │
│   throws:  FileNotFoundError if explicit images_dir doesn't exist│
│   caught:  RunLog.step("step-01-input-ingestion") context manager│
│   shown:   step status="failed" in run_log.md                   │
└──────────────────────────────────────────────────────────────────┘
```

---

## SERVICE: generate_from_prompt()

```
FILE: pipeline/step-02-prd-generation/pipeline/prd_generator.py
```

### Full Pipeline

```
Step 1 → orchestrator calls generate_from_prompt(user_prompt, output_dir)
Step 2 → create_llm_client() → BedrockLLMClient
Step 3 → _llm_generate_prd(llm, user_prompt) → Bedrock call 1
Step 4 → _llm_generate_ddl(llm, prd_text, user_prompt) → Bedrock call 2
Step 5 → write prd_file + ddl_file + metadata.json
Step 6 → return LoadedPRD
```

```
┌──────────────────────────────────────────────────────────────────┐
│ STEP: generate_from_prompt() — prd_generator.py                  │
├──────────────────────────────────────────────────────────────────┤
│ INPUT                                                             │
│   from:    orchestrator.generate_project() step-02 block         │
│   data:    {                                                      │
│              user_prompt: str,         (required)                │
│              output_dir:  Path | None  (optional)                │
│            }                                                      │
│   via:     keyword arguments                                      │
├──────────────────────────────────────────────────────────────────┤
│ PROCESS                                                           │
│   Makes two sequential Bedrock LLM calls:                        │
│   1. _llm_generate_prd() — prompt → markdown PRD                 │
│   2. _llm_generate_ddl() — PRD + prompt → Postgres DDL           │
│   Writes both to disk (tempdir if output_dir is None).           │
├──────────────────────────────────────────────────────────────────┤
│ OUTPUT                                                            │
│   returns: LoadedPRD {                                           │
│              path:     Path ("full_prd.md"),                      │
│              text:     str  (PRD markdown),                       │
│              images:   []   (no images in prompt mode),           │
│              ddl_path: Path ("schema.sql")                        │
│            }                                                      │
│   to:      orchestrator → step-03 reads ddl_path + path          │
│   triggers: orchestrator raises ValueError if ddl_path is None   │
├──────────────────────────────────────────────────────────────────┤
│ ERROR PATH                                                        │
│   throws:  boto3 ClientError (Bedrock call failure)              │
│   caught:  RunLog.step("step-02-prd-generation") records failure │
│   shown:   pipeline aborts; run_log.md shows failed step         │
└──────────────────────────────────────────────────────────────────┘
```

---

## SERVICE: generate_backend()

```
FILE: pipeline/step-03-backend-generation/pipeline/__init__.py
     → delegates to backend_gen/orchestrator.Orchestrator.run()
```

### Full Pipeline (9 internal sub-steps)

```
Step 1 → RequirementAnalyzer.analyze() — LLM: inputs → structured analysis
Step 2 → consolidate(analysis_result)   — normalize + validate
Step 3 → FastAPICodeGenerator.generate() — LLM × modules: → Python files
Step 4 → generate_manifest_json(module_files) — AST: files → api_manifest
Step 5 → MarkdownGenerator.generate()   — LLM: → BACKEND_SPEC.md
Step 6 → ReadmeGenerator.generate()     — LLM: → README.md
Step 7 → ProjectFileGenerator.generate_pyproject() — deterministic
Step 8 → OutputWriter.write_all()       — writes all files to output_dir
Step 9 → ValidationRunner.validate()    — AST-validates generated Python
```

```
┌──────────────────────────────────────────────────────────────────┐
│ STEP: Orchestrator.run() — backend_gen/orchestrator.py           │
├──────────────────────────────────────────────────────────────────┤
│ INPUT                                                             │
│   from:    generate_backend() called by orchestrator.py step-03  │
│   data:    LoadedGenerationInput {                               │
│              ddl_file:     str (path to schema.sql),             │
│              prd_file:     str (path to full_prd.md),            │
│              prompt_text:  str | None,                           │
│              image_files:  list[str] | None                      │
│            }                                                      │
│   via:     constructor + run() method                             │
├──────────────────────────────────────────────────────────────────┤
│ PROCESS                                                           │
│   Runs 9 internal sub-steps sequentially. Each sub-step logs its │
│   progress. LLM calls go to BedrockLLMClient (claude-3-5-sonnet).│
│   Validates Python syntax of generated files via AST parsing.    │
├──────────────────────────────────────────────────────────────────┤
│ OUTPUT                                                            │
│   returns: dict {                                                │
│              system_name:    str,                                │
│              api_manifest:   dict,                               │
│              module_names:   list[str],                          │
│              endpoint_count: int,                                │
│              file_count:     int,                                │
│              generated_files: dict[str, str]                     │
│            }                                                      │
│   written: output_dir/ contains all backend Python files         │
│   to:      orchestrator extracts api_manifest → step-04 + step-05│
├──────────────────────────────────────────────────────────────────┤
│ ERROR PATH                                                        │
│   throws:  boto3 ClientError, SyntaxError (if validation fails)  │
│   caught:  propagates to RunLog.step("step-03-backend-generation")│
│   shown:   step fails; partial backend files remain on disk      │
└──────────────────────────────────────────────────────────────────┘
```
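Sub-step 9's AST validation (the `SyntaxError` in the error path above) reduces to a loop over the generated sources. A minimal sketch, assuming `generated_files` maps filename to source text as in the OUTPUT block:

```python
import ast

def validate_python_sources(generated_files: dict[str, str]) -> None:
    """Sub-step 9: AST-parse each generated .py file; SyntaxError propagates."""
    for name, source in generated_files.items():
        if name.endswith(".py"):
            ast.parse(source, filename=name)  # raises SyntaxError on bad code
```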

---

## SERVICE: generate_app_ir()

```
FILE: pipeline/step-04-ir-generation/pipeline/ir_pipeline/services/multi_page_service.py
```

### Full Pipeline

```
Step 1 → encode images from prd_images list + images_dir
Step 2 → detect_pages(encoded_images, model, user_prompt) → AppPlan
Step 3 → sanitize_page_id() for each page
Step 4 → for each page_node:
           generate_ir_bundle_for_page(page_node, page_images, app_plan, ...)
           → IRPage(page_node, ir_bundle)
Step 5 → return AppIRBundle(app_plan, ir_pages, run_id, encoded_images)
```
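Step 3's `sanitize_page_id()` is not specified here; a plausible sketch, assuming the rule is "lowercase slug, alphanumerics and underscores only":

```python
import re

def sanitize_page_id(raw: str) -> str:
    """Normalize an LLM-proposed page id to a safe slug (assumed rule)."""
    slug = re.sub(r"[^a-z0-9]+", "_", raw.strip().lower()).strip("_")
    return slug or "page"  # fall back if nothing survives
```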

```
┌──────────────────────────────────────────────────────────────────┐
│ STEP: generate_app_ir() — multi_page_service.py                  │
├──────────────────────────────────────────────────────────────────┤
│ INPUT                                                             │
│   from:    orchestrator.generate_project() step-04 block         │
│   data:    {                                                      │
│              images_dir:   Path | None,                          │
│              user_prompt:  str | None,                           │
│              api_manifest: dict | None,                          │
│              prd_text:     str | None,                           │
│              prd_images:   list[dict] | None  (media_type+data)  │
│            }                                                      │
│   via:     keyword arguments                                      │
├──────────────────────────────────────────────────────────────────┤
│ PROCESS                                                           │
│   Encodes all images to EncodedImage (base64). Calls detect_pages│
│   (one Opus call) to get AppPlan. Then calls                     │
│   generate_ir_bundle_for_page (one Opus call per page).          │
├──────────────────────────────────────────────────────────────────┤
│ OUTPUT                                                            │
│   returns: AppIRBundle {                                         │
│              app_plan:       AppPlan,                            │
│              ir_pages:       list[IRPage],                       │
│              run_id:         str,                                │
│              encoded_images: list[EncodedImage]                  │
│            }                                                      │
│   to:      orchestrator → step-05 generate_react_pages()         │
│   side effect: RunLog records N+1 LLM calls                      │
├──────────────────────────────────────────────────────────────────┤
│ ERROR PATH                                                        │
│   throws:  ValueError if no images + no text + no manifest       │
│            pydantic ValidationError if LLM returns bad JSON      │
│   caught:  RunLog.step("step-04-ir-generation")                  │
│   shown:   pipeline aborts; IR JSON not written                  │
└──────────────────────────────────────────────────────────────────┘
```

---

## SERVICE: generate_react_pages()

```
FILE: pipeline/step-05-react-generation/pipeline/react_gen.py
```

### Full Pipeline

```
Step 1 → for each IRPage in AppIRBundle.ir_pages:
           generate_page_react_code(ir_bundle, page_node, app_plan, model, api_manifest)
           → react_code: str  (TSX source)
Step 2 → generate_router_code(app_plan) → deterministic App.tsx
Step 3 → generate_context_code(app_plan) → deterministic AppContext.tsx
Step 4 → return MultiPageBundle
```

```
┌──────────────────────────────────────────────────────────────────┐
│ STEP: generate_react_pages() — react_gen.py                      │
├──────────────────────────────────────────────────────────────────┤
│ INPUT                                                             │
│   from:    orchestrator.generate_project() step-05 block         │
│   data:    {                                                      │
│              app_ir:       AppIRBundle,                          │
│              api_manifest: dict | None                           │
│            }                                                      │
│   via:     keyword arguments                                      │
├──────────────────────────────────────────────────────────────────┤
│ PROCESS                                                           │
│   For each IRPage calls Sonnet once to generate TSX code.        │
│   Router and context are generated deterministically (no LLM).   │
│   All code is collected into a MultiPageBundle.                  │
├──────────────────────────────────────────────────────────────────┤
│ OUTPUT                                                            │
│   returns: MultiPageBundle {                                     │
│              app_plan:    AppPlan,                               │
│              pages:       list[PageBundle],                     │
│                           (page_node + ir_bundle + react_code)  │
│              router_code: str  (App.tsx),                        │
│              context_code: str (AppContext.tsx),                 │
│              run_id:      str                                    │
│            }                                                      │
│   to:      orchestrator → write_generated_files()                │
│   side effect: RunLog records N LLM calls (one per page)         │
├──────────────────────────────────────────────────────────────────┤
│ ERROR PATH                                                        │
│   throws:  boto3 ClientError if LLM call fails                   │
│   caught:  RunLog.step("step-05-react-generation")               │
│   shown:   pipeline aborts; partial TSX files may remain         │
└──────────────────────────────────────────────────────────────────┘
```

---

## SERVICE: scaffold_frontend() + write_generated_files()

```
FILE: pipeline/step-05-react-generation/pipeline/scaffolder.py
```

```
┌──────────────────────────────────────────────────────────────────┐
│ STEP: scaffold_frontend() — scaffolder.py                        │
├──────────────────────────────────────────────────────────────────┤
│ INPUT                                                             │
│   from:    orchestrator.generate_project() step-05 block         │
│   data:    {                                                      │
│              output_dir:   Path  (frontend_dir),                 │
│              api_manifest: dict,                                 │
│              api_base_url: str,                                  │
│              app_plan:     AppPlan                               │
│            }                                                      │
├──────────────────────────────────────────────────────────────────┤
│ PROCESS                                                           │
│   Creates Vite project skeleton: package.json, tsconfig.json,    │
│   vite.config.ts, index.html, src/ structure. Writes             │
│   src/config.ts with API_BASE_URL baked in.                      │
├──────────────────────────────────────────────────────────────────┤
│ OUTPUT                                                            │
│   written: frontend_dir/ directory tree (Vite + React skeleton)  │
│   to:      write_generated_files() fills in the page components  │
├──────────────────────────────────────────────────────────────────┤
│ ERROR PATH                                                        │
│   throws:  IOError if disk write fails                            │
│   caught:  propagates to RunLog step                             │
│   shown:   step fails                                            │
└──────────────────────────────────────────────────────────────────┘

┌──────────────────────────────────────────────────────────────────┐
│ STEP: write_generated_files() — scaffolder.py                    │
├──────────────────────────────────────────────────────────────────┤
│ INPUT                                                             │
│   from:    orchestrator.generate_project() step-05 block         │
│   data:    {                                                      │
│              frontend_dir:  Path,                                │
│              router_code:   str  (App.tsx content),              │
│              context_code:  str  (AppContext.tsx content),        │
│              pages:         dict[filename: str, tsx_code: str]   │
│            }                                                      │
├──────────────────────────────────────────────────────────────────┤
│ PROCESS                                                           │
│   Writes App.tsx to src/, AppContext.tsx to src/, each page TSX  │
│   to src/pages/<PascalCaseName>Page.tsx.                         │
├──────────────────────────────────────────────────────────────────┤
│ OUTPUT                                                            │
│   written: frontend_dir/src/App.tsx                              │
│            frontend_dir/src/AppContext.tsx                        │
│            frontend_dir/src/pages/<PageName>Page.tsx × N         │
│   to:      assembled project (npm install && npm run dev)       │
├──────────────────────────────────────────────────────────────────┤
│ ERROR PATH                                                        │
│   throws:  IOError on disk write failure                          │
│   caught:  propagates up                                          │
│   shown:   step fails in RunLog                                  │
└──────────────────────────────────────────────────────────────────┘
```

---

## SERVICE: validate_and_load_prd()

```
FILE: shared/media/prd_loader.py
```

```
┌──────────────────────────────────────────────────────────────────┐
│ STEP: validate_and_load_prd() — prd_loader.py                    │
├──────────────────────────────────────────────────────────────────┤
│ INPUT                                                             │
│   from:    prd_generator.generate_prd() or                      │
│            generate_prd_from_ddl()                               │
│   data:    prd_file: str | Path                                  │
│   required: yes                                                   │
├──────────────────────────────────────────────────────────────────┤
│ PROCESS                                                           │
│   Validates extension is in {.txt, .md, .pdf, .pptx, .docx}.    │
│   Dispatches to _extract_from_pdf / _extract_from_pptx /         │
│   _extract_from_docx or plain read_text() for .md/.txt.          │
│   Returns LoadedPRD with full text + embedded images as base64.  │
├──────────────────────────────────────────────────────────────────┤
│ OUTPUT                                                            │
│   returns: LoadedPRD { path, text: str, images: list[PRDImage] } │
│   to:      prd_generator wraps in LoadedPRD with ddl_path set    │
│   image shape: PRDImage { media_type: str, data: str (base64) }  │
├──────────────────────────────────────────────────────────────────┤
│ ERROR PATH                                                        │
│   throws:  FileNotFoundError if path doesn't exist               │
│            ValueError if extension unsupported                   │
│            ValueError if file is empty (no text + no images)     │
│            ImportError if pymupdf/python-pptx/python-docx missing│
│   caught:  caller (prd_generator)                                │
│   shown:   propagates to RunLog step-02                          │
└──────────────────────────────────────────────────────────────────┘
```

---

## SERVICE: build_chat_model()

```
FILE: shared/bedrock_client.py
```

```
┌──────────────────────────────────────────────────────────────────┐
│ STEP: build_chat_model() — bedrock_client.py                     │
├──────────────────────────────────────────────────────────────────┤
│ INPUT                                                             │
│   from:    step-04 and step-05 service files                     │
│   data:    {                                                      │
│              model_name:  str | None    (optional, env-var wins) │
│              temperature: float = 0,                             │
│              max_tokens:  int | None                             │
│            }                                                      │
├──────────────────────────────────────────────────────────────────┤
│ PROCESS                                                           │
│   Resolves AWS credentials from env vars (explicit keys, profile,│
│   or implicit default chain). Resolves model name via            │
│   _MODEL_ALIASES dict. Constructs boto3 Session + applies        │
│   botocore.Config with configured timeouts.                      │
├──────────────────────────────────────────────────────────────────┤
│ OUTPUT                                                            │
│   returns: ChatBedrockConverse instance                          │
│   to:      page_detection_service, ir_generation_service,        │
│            react_generation_service                              │
├──────────────────────────────────────────────────────────────────┤
│ ERROR PATH                                                        │
│   throws:  RuntimeError if credentials not found                 │
│            ModuleNotFoundError if boto3 or langchain-aws missing  │
│   caught:  caller at module import time                          │
│   shown:   printed to stderr; process exits                      │
└──────────────────────────────────────────────────────────────────┘
```

---

## SERVICE: RunLog.step() + record_llm_call()

```
FILE: shared/run_log.py
```

```
┌──────────────────────────────────────────────────────────────────┐
│ STEP: RunLog.step() context manager — run_log.py                 │
├──────────────────────────────────────────────────────────────────┤
│ INPUT                                                             │
│   from:    orchestrator.generate_project() — each step block     │
│   data:    name: str  (e.g. "step-01-input-ingestion")           │
│   via:     with run_log.step("name") as ev:                      │
├──────────────────────────────────────────────────────────────────┤
│ PROCESS                                                           │
│   Creates StepEvent, appends to self.steps, pushes to _stack.   │
│   On exit: records duration_ms, sets status="ok" or "failed".   │
│   On failure: records error = repr(exc), then re-raises.         │
├──────────────────────────────────────────────────────────────────┤
│ OUTPUT                                                            │
│   yields:  StepEvent  (caller can set ev.notes)                  │
│   to:      RunLog.steps list → finalize() → run_log.json/md      │
├──────────────────────────────────────────────────────────────────┤
│ ERROR PATH                                                        │
│   on exception: status="failed" recorded; exception re-raised   │
│   shown:   in run_log.md step table column "Status"              │
└──────────────────────────────────────────────────────────────────┘

┌──────────────────────────────────────────────────────────────────┐
│ STEP: RunLog.record_llm_call() — run_log.py                      │
├──────────────────────────────────────────────────────────────────┤
│ INPUT                                                             │
│   from:    timed_llm_call() OR BedrockLLMClient.generate()       │
│   data:    {                                                      │
│              model:         str,                                 │
│              duration_ms:   float,                               │
│              input_tokens:  int,                                 │
│              output_tokens: int,                                 │
│              label:         str | None                           │
│            }                                                      │
├──────────────────────────────────────────────────────────────────┤
│ PROCESS                                                           │
│   No-op if _stack is empty. Otherwise appends LLMCall to the    │
│   currently-active StepEvent's llm_calls list.                   │
├──────────────────────────────────────────────────────────────────┤
│ OUTPUT                                                            │
│   StepEvent.llm_calls grows by 1 entry                          │
│   to:      RunLog.finalize() aggregates all into run_log.json    │
├──────────────────────────────────────────────────────────────────┤
│ ERROR PATH                                                        │
│   throws:  nothing (silently no-ops on errors inside)            │
│   shown:   N/A — intentionally fire-and-forget                   │
└──────────────────────────────────────────────────────────────────┘
```

---

## Related source files

- [pipeline/step-01-input-ingestion/pipeline/ingestion.py](../pipeline/step-01-input-ingestion/pipeline/ingestion.py)
- [pipeline/step-02-prd-generation/pipeline/prd_generator.py](../pipeline/step-02-prd-generation/pipeline/prd_generator.py)
- [pipeline/step-03-backend-generation/pipeline/backend_gen/orchestrator.py](../pipeline/step-03-backend-generation/pipeline/backend_gen/orchestrator.py)
- [pipeline/step-04-ir-generation/pipeline/ir_pipeline/services/multi_page_service.py](../pipeline/step-04-ir-generation/pipeline/ir_pipeline/services/multi_page_service.py)
- [pipeline/step-05-react-generation/pipeline/react_gen.py](../pipeline/step-05-react-generation/pipeline/react_gen.py)
- [shared/run_log.py](../shared/run_log.py)
- [shared/bedrock_client.py](../shared/bedrock_client.py)
- [shared/media/prd_loader.py](../shared/media/prd_loader.py)
