# LLM Backend Configuration Guide

Mantara v8 supports **6 backends**. Pick the one that fits your billing setup,
TPM (tokens-per-minute) tolerance, and quality bar. Switch between them by
changing two env vars — **no code changes needed**.

Backend code lives in `mantara_v8/backends/`. Each is one file (~150 LOC) that
implements the same `generate(prompt) → MantaraSchema` contract.
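The shared contract can be sketched as follows. This is an illustrative sketch only — the real `MantaraSchema` lives in `mantara_v8`, and the field names and `EchoBackend` class here are invented for the example:

```python
from dataclasses import dataclass, field
from typing import Protocol


@dataclass
class MantaraSchema:
    # Hypothetical field -- the real schema is defined in mantara_v8.
    tables: dict = field(default_factory=dict)


class Backend(Protocol):
    """Every file in backends/ exposes the same entry point."""
    def generate(self, prompt: str) -> MantaraSchema: ...


class EchoBackend:
    """Trivial stand-in showing the shape of a backend implementation."""
    def generate(self, prompt: str) -> MantaraSchema:
        return MantaraSchema(tables={"prompt_len": len(prompt)})
```

Because every backend returns the same type, the rest of the pipeline never needs to know which one produced the schema.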

---

## 60-second overview

| Backend | When to use | TPM cap (approx) | Cost per Step-5 run | Quality |
|---|---|---|---|---|
| **openai** | Default, easiest | 30k–800k (tier-based) | $0.85–1.00 | Excellent |
| **bedrock** | High volume, separate billing | 50k+ (raisable) | $1.20–1.50 | Excellent |
| **anthropic** | Direct Claude, no AWS | 40k–400k (tier-based) | $1.20–1.50 | Excellent |
| **ollama** | Local, free, dev iteration | unlimited (local) | $0 | Weaker |
| **claude_cli** | Use your existing Claude Code subscription | n/a | included in Pro/Max | Excellent |
| **llamacpp** | Fully local, custom GGUF model | unlimited (local) | $0 | Varies |

---

## Where to plug in your credentials

**File to edit:** `mantara_v8/.env` (copy from `.env.example` if it doesn't exist)

```bash
cd debug-pipeline/mantara_v8
cp .env.example .env       # one-time
$EDITOR .env               # paste your keys
```

The `.env` is gitignored — your real keys never get committed.

After editing `.env`, **just run the pipeline normally** — Step 5 reads the env on every invocation:

```bash
cd ../step-05-schema
./run.sh --run-dir <run_dir>
```

That's it. No code changes. No restart.
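Selection presumably reduces to an env lookup on each invocation — roughly like the sketch below (function and registry names are illustrative, not the actual Step 5 code):

```python
import os

# Illustrative registry -- the real pipeline maps these names to the
# modules in mantara_v8/backends/.
BACKENDS = {"openai", "bedrock", "anthropic", "ollama", "claude_cli", "llamacpp"}


def pick_backend() -> str:
    """Read MANTARA_BACKEND on every invocation; default to openai."""
    name = os.environ.get("MANTARA_BACKEND", "openai")
    if name not in BACKENDS:
        raise ValueError(f"unknown backend: {name!r}")
    return name
```

Since nothing is cached between invocations, editing `.env` takes effect on the very next run.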

---

## Per-backend setup

### 1. OpenAI (default)

**Where to plug in:** `mantara_v8/.env` → `OPENAI_API_KEY=...`

```env
MANTARA_BACKEND=openai
OPENAI_API_KEY=sk-proj-...
OPENAI_MODEL=gpt-4o
```

- **Get a key:** https://platform.openai.com/api-keys
- **Hit the TPM cap a lot?** Upgrade your tier: https://platform.openai.com/account/limits
- **Tier 1 (default)** = 30k TPM for gpt-4o. A single Mantara request (~30k input tokens, before output) already exceeds the per-minute budget, so Tier 1 runs hit 429s.
- **Tier 2** ($50 paid + 7 days) = 80k TPM. Fits comfortably.
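A back-of-envelope check for whether a run fits a TPM cap, using the token counts from the cost section below (this assumes all of a single request's tokens count against one minute's budget):

```python
def fits_tpm(input_tokens: int, output_tokens: int, tpm_cap: int) -> bool:
    """A single request must fit inside one minute's token budget."""
    return input_tokens + output_tokens <= tpm_cap


# Typical ASN-sized run: ~30k input + ~5k output.
fits_tpm(30_000, 5_000, 30_000)   # Tier 1 -> False, hence the 429s
fits_tpm(30_000, 5_000, 80_000)   # Tier 2 -> True
```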

### 2. AWS Bedrock — recommended if you have AWS

**Where to plug in:** `mantara_v8/.env` → AWS section

```env
MANTARA_BACKEND=bedrock
AWS_ACCESS_KEY_ID=AKIA...
AWS_SECRET_ACCESS_KEY=...
AWS_REGION=ap-southeast-7
BEDROCK_MODEL=apac.anthropic.claude-sonnet-4-20250514-v1:0
```

- **One-time setup:**
  1. https://console.aws.amazon.com/bedrock/home?#/modelaccess → request access for **Claude Sonnet 4**
  2. Create an IAM user with policy `AmazonBedrockFullAccess` (or just `bedrock:InvokeModel`)
  3. Generate access key/secret for that user → paste above
  4. Pick a region where Sonnet 4 is enabled: `us-west-2`, `us-east-1`, `eu-west-3`, `ap-southeast-1`, `ap-southeast-7`
  5. `pip install anthropic` (the SDK is shared)
- **Why use it:** higher TPM cap than OpenAI Tier 1, separate AWS billing, no quota issues.

### 3. Anthropic Direct (Claude without AWS)

**Where to plug in:** `mantara_v8/.env` → `ANTHROPIC_API_KEY=...`

```env
MANTARA_BACKEND=anthropic
ANTHROPIC_API_KEY=sk-ant-...
ANTHROPIC_MODEL=claude-sonnet-4-20250514
```

- **Get a key:** https://console.anthropic.com/settings/keys
- Same model as Bedrock; just a different gateway.

### 4. Ollama (local, free)

**Where to plug in:** `mantara_v8/.env` → `OLLAMA_*`

```env
MANTARA_BACKEND=ollama
OLLAMA_BASE_URL=http://localhost:11434/v1
OLLAMA_MODEL=qwen2.5-coder:7b
```

- **Setup:**
  ```bash
  brew install ollama
  ollama serve              # in another terminal
  ollama pull qwen2.5-coder:7b   # ~4 GB
  ```
- **Quality reality:** noticeably weaker than gpt-4o for schema gen. Output is structurally correct (the deterministic normaliser fixes most issues) but business columns are sparser. Good for dev iteration when you don't want to burn API credits.

### 5. Claude Code CLI

**Where to plug in:** `mantara_v8/.env` → `CLAUDE_CLI_*`

```env
MANTARA_BACKEND=claude_cli
CLAUDE_CLI_BIN=claude
CLAUDE_CLI_MODEL=claude-sonnet-4
```

- Uses your local `claude` CLI installation, billed via your Claude Code subscription (no extra API charges).
- Requires Claude Code installed + authenticated (`claude` works in your terminal).
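One way to confirm the CLI prerequisite programmatically — an illustrative check, not something the shipped backend necessarily does:

```python
import shutil


def cli_available(binary: str = "claude") -> bool:
    """True if the configured CLAUDE_CLI_BIN is on PATH."""
    return shutil.which(binary) is not None
```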

### 6. llama.cpp

**Where to plug in:** `mantara_v8/.env` → `LLAMACPP_*`

```env
MANTARA_BACKEND=llamacpp
LLAMACPP_BASE_URL=http://localhost:8080/v1
LLAMACPP_MODEL=local
```

- Setup:
  ```bash
  git clone https://github.com/ggerganov/llama.cpp
  cd llama.cpp && make
  ./llama-server -m <path-to-model.gguf> --port 8080
  ```
- For when you want a specific GGUF model that Ollama doesn't carry.

---

## Quick switching examples

**Burn through OpenAI quota during development → flip to local:**
```bash
# BSD/macOS sed shown; on GNU/Linux drop the empty '' after -i
sed -i '' 's/^MANTARA_BACKEND=.*/MANTARA_BACKEND=ollama/' mantara_v8/.env
./run.sh --run-dir <run_dir>
```

**Demo for a customer who has AWS but no OpenAI:**
```bash
sed -i '' 's/^MANTARA_BACKEND=.*/MANTARA_BACKEND=bedrock/' mantara_v8/.env
./run.sh --run-dir <run_dir>
```

**One-off override (without editing .env):**
```bash
MANTARA_BACKEND=anthropic ANTHROPIC_API_KEY=sk-ant-... ./run.sh --run-dir <run_dir>
```
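The one-off override works because variables already present in the process environment take precedence over values read from `.env` (assuming the loader follows the python-dotenv default of not overriding existing variables). A minimal sketch of that precedence:

```python
import os


def effective_config(dotenv_values: dict, env=os.environ) -> dict:
    """Process env wins over .env file values (python-dotenv default)."""
    merged = dict(dotenv_values)
    for key in dotenv_values:
        if key in env:
            merged[key] = env[key]
    return merged
```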

---

## Troubleshooting

| Symptom | Likely cause | Fix |
|---|---|---|
| `RateLimitError 429: Request too large for gpt-4o ... TPM Limit 30000` | OpenAI Tier 1 cap | Upgrade tier OR switch to `bedrock`/`anthropic`/`ollama` |
| `RateLimitError 429: insufficient_quota` | OpenAI account out of credits | Add credits OR switch backend |
| `RuntimeError: AWS credentials not set` | Bedrock without keys | Fill in `AWS_ACCESS_KEY_ID` + `AWS_SECRET_ACCESS_KEY` in `.env` |
| `RuntimeError: anthropic package not installed` | Bedrock/Anthropic without SDK | `pip install anthropic` |
| `ConnectionRefused: localhost:11434` | Ollama not running | `ollama serve` in another terminal |
| `claude: command not found` | Claude CLI backend without CLI | Install Claude Code or pick a different backend |
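Most of the failures above are missing-credential problems that could be caught before a run. A hypothetical preflight check — the required-variable map mirrors the per-backend sections above, but this helper is not part of the shipped code:

```python
import os

# Which env vars each backend needs before Step 5 will work.
REQUIRED = {
    "openai": ["OPENAI_API_KEY"],
    "bedrock": ["AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY", "AWS_REGION"],
    "anthropic": ["ANTHROPIC_API_KEY"],
    "ollama": [],        # local server; fails at connect time instead
    "claude_cli": [],    # needs the `claude` binary, not an env var
    "llamacpp": [],
}


def missing_vars(backend: str, env=os.environ) -> list[str]:
    """Return the credentials that are absent for the chosen backend."""
    return [v for v in REQUIRED.get(backend, []) if not env.get(v)]
```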

---

## Cost comparison (typical ASN-sized run, ~30k input + 5k output)

| Backend | Step 5 cost | Notes |
|---|---|---|
| openai gpt-4o | $0.85–1.00 | Best quality, hits TPM cap on Tier 1 |
| openai gpt-4o-mini | $0.30–0.50 | Thinner schema, no TPM issues |
| bedrock Sonnet 4 | $1.20–1.50 | Comparable quality, AWS billing |
| anthropic Sonnet 4 | $1.20–1.50 | Same as Bedrock |
| ollama qwen2.5-coder:7b | $0 | Local; quality varies by hardware |
| claude_cli | included | Whatever your Claude Code plan is |
| llamacpp | $0 | Local |

---

## What this MR ships vs what stays for follow-up

**Shipped in this MR:**
- `.env.example` with every backend documented
- `BACKENDS.md` (this file) — the user-facing setup guide
- All backend code (`mantara_v8/backends/*.py`) — already existed, now usable

**Not yet shipped (potential follow-up):**
- Auto-fallback on 429 (try gpt-4o → on rate-limit, retry with gpt-4o-mini)
- Per-step backend override (different backend for Step 4 vs Step 5)
- Bedrock TPM-aware request batching
