Cookbook: Agentic Upload Engine
Build an intelligent file ingestion system that automatically extracts, translates, and understands uploaded documents using OCR and AI agents.
What You'll Build
A service that:
1. Accepts file uploads (PDF, images, JSON, CSV)
2. Automatically performs OCR on images
3. Translates content if needed
4. Returns structured, searchable data
Phase 1: Clone & Setup
1. Get the Code
2. Install Dependencies
3. Configure Environment
Key Variables:
# LLM Configuration
LLM_MODEL=gpt-4o
LLM_API_KEY=your-api-key
LLM_BASE_URL=https://switchai.traylinx.com/v1
# File Engine (Required!)
FILE_ENGINE_BASE_URL=https://api.traylinx.com/file-engine
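The variables above could be read and checked at startup roughly as follows. This is a minimal sketch, not the service's actual loader: `Settings` and `load_settings` are hypothetical names, and the only facts taken from this page are the four variable names and that `FILE_ENGINE_BASE_URL` is required. Treating the three `LLM_*` variables as optional is an assumption.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    """Hypothetical container for the variables listed above."""
    llm_model: Optional[str]
    llm_api_key: Optional[str]
    llm_base_url: Optional[str]
    file_engine_base_url: str


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read settings from a mapping (the process environment by default)."""
    file_engine = env.get("FILE_ENGINE_BASE_URL", "").strip()
    if not file_engine:
        # The cookbook marks this variable as required, so fail early.
        raise RuntimeError("FILE_ENGINE_BASE_URL is required but not set")
    return Settings(
        # Assumption: the LLM_* values may be absent and are passed through as None.
        llm_model=env.get("LLM_MODEL"),
        llm_api_key=env.get("LLM_API_KEY"),
        llm_base_url=env.get("LLM_BASE_URL"),
        # Strip a trailing slash so paths can be appended predictably.
        file_engine_base_url=file_engine.rstrip("/"),
    )
```

Passing a plain dict instead of `os.environ` makes the check easy to exercise in tests without touching the real environment.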
Phase 2: Run the Service
Phase 3: Upload a Simple File
Upload a PDF
curl -X POST http://localhost:8000/v1/upload \
-H "Authorization: Bearer your-token" \
-F "file=@invoice.pdf"
Response:
{
  "file_id": "file_abc123",
  "filename": "invoice.pdf",
  "extracted_text": "Invoice #12345\nDate: 2025-01-15\nTotal: $500.00",
  "metadata": {
    "pages": 1,
    "has_images": false
  }
}
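The same request can be issued from Python using only the standard library. The sketch below is an illustration rather than an official client: `build_multipart` and `upload` are hypothetical helper names, and the only details taken from the curl example are the URL, the `Authorization: Bearer` header, and the form field name `file`.

```python
import json
import urllib.request
import uuid
from pathlib import Path
from typing import Dict, Optional, Tuple


def build_multipart(filename: str, content: bytes,
                    fields: Optional[Dict[str, str]] = None) -> Tuple[bytes, str]:
    """Encode one file part named "file" plus optional text fields.

    Returns the request body and the matching Content-Type header value.
    """
    boundary = uuid.uuid4().hex
    parts = []
    for name, value in (fields or {}).items():
        text_part = (
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
            f"{value}\r\n"
        )
        parts.append(text_part.encode())
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        "Content-Type: application/octet-stream\r\n\r\n"
    )
    parts.append(head.encode() + content + b"\r\n")
    # The closing delimiter carries two extra dashes.
    parts.append(f"--{boundary}--\r\n".encode())
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"


def upload(path: str, token: str,
           url: str = "http://localhost:8000/v1/upload") -> dict:
    """POST a file as in the curl example and return the decoded JSON reply."""
    file_path = Path(path)
    body, content_type = build_multipart(file_path.name, file_path.read_bytes())
    request = urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={"Authorization": f"Bearer {token}", "Content-Type": content_type},
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())
```

With the service running locally, `upload("invoice.pdf", "your-token")` would return the response as a Python dict.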
Phase 4: OCR Processing
Upload an image and automatically extract text.
Upload an Image
curl -X POST http://localhost:8000/v1/upload \
-H "Authorization: Bearer your-token" \
-F "file=@receipt.jpg"
The engine:
1. Detects that the file is an image
2. Sends it to the OCR processor
3. Returns the extracted text
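This page does not say how the engine decides that a file is an image. One common approach is to inspect the leading "magic bytes" instead of trusting the file name. The sketch below shows that technique for the three image formats listed under Supported File Types; `sniff_image_type` and `needs_ocr` are hypothetical names, and whether the real engine works this way is an assumption.

```python
from typing import Optional


def sniff_image_type(data: bytes) -> Optional[str]:
    """Return "jpeg", "png" or "webp" based on leading magic bytes, else None."""
    # JPEG files start with the SOI marker followed by another marker byte.
    if data.startswith(b"\xff\xd8\xff"):
        return "jpeg"
    # PNG files start with a fixed 8-byte signature.
    if data.startswith(b"\x89PNG\r\n\x1a\n"):
        return "png"
    # WebP is a RIFF container: "RIFF", a 4-byte size, then "WEBP".
    if data[:4] == b"RIFF" and data[8:12] == b"WEBP":
        return "webp"
    return None


def needs_ocr(data: bytes) -> bool:
    """Step 1 of the list above: route to OCR only if the bytes look like an image."""
    return sniff_image_type(data) is not None
```

Checking content rather than the extension means a mislabeled upload, such as a PNG saved as `scan.pdf`, would still reach the OCR step.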
Phase 5: Upload + Translate
Process a document AND translate it in one request.
curl -X POST http://localhost:8000/v1/upload \
-H "Authorization: Bearer your-token" \
-F "file=@german_contract.pdf" \
-F "translate_to=en"
Response:
{
  "file_id": "file_xyz789",
  "original_text": "Vertrag zwischen...",
  "translated_text": "Contract between...",
  "source_language": "de",
  "target_language": "en"
}
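A client may want to turn this response into a typed object instead of passing a raw dict around. The following is a small sketch under that assumption: `TranslationResult` and `parse_translation` are hypothetical names, and the field list is copied from the sample response above, not from a published schema.

```python
import json
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class TranslationResult:
    """Mirrors the keys of the sample upload + translate response."""
    file_id: str
    original_text: str
    translated_text: str
    source_language: str
    target_language: str


def parse_translation(payload: str) -> TranslationResult:
    """Parse a JSON response body; raises KeyError if an expected key is missing."""
    data = json.loads(payload)
    # Pick only the known keys so extra fields in the response are ignored.
    return TranslationResult(**{f.name: data[f.name] for f in fields(TranslationResult)})
```

Ignoring unknown keys keeps the parser tolerant if the service adds fields later, while a missing key still fails loudly.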
The Processing Pipeline
        File Upload
             │
             ▼
     ┌───────────────┐
     │ Type Detector │  ← PDF? Image? JSON?
     └───────┬───────┘
             │
      ┌──────┴───────┐
      │              │
      ▼              ▼
 ┌─────────┐   ┌───────────┐
 │   OCR   │   │  Parser   │
 │ (Images)│   │ (Text/PDF)│
 └────┬────┘   └─────┬─────┘
      │              │
      └──────┬───────┘
             ▼
      ┌─────────────┐
      │ Translator? │  ← If translate_to provided
      └──────┬──────┘
             ▼
      ┌─────────────┐
      │   Output    │
      └─────────────┘
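The control flow in the diagram can be sketched in a few lines. This is an illustration of the routing logic only: `process` is a hypothetical function, every stage is passed in as a callable so no real OCR or translation backend is implied, and the output keys are placeholders, not the service's response schema.

```python
from typing import Callable, Dict, Optional

# Stage signatures. All stages are injected, so the sketch stays
# independent of any particular OCR, parsing, or translation backend.
Detector = Callable[[bytes], str]
Extractor = Callable[[bytes], str]
Translator = Callable[[str, str], str]


def process(data: bytes, *, detect: Detector, ocr: Extractor, parser: Extractor,
            translator: Translator,
            translate_to: Optional[str] = None) -> Dict[str, str]:
    """Detect the type, extract text via OCR or the parser, then optionally translate."""
    kind = detect(data)
    # Branch: images go to OCR, everything else goes to the parser.
    text = ocr(data) if kind == "image" else parser(data)
    result = {"text": text}
    # Optional stage: runs only when the caller supplied a target language.
    if translate_to:
        result["translated_text"] = translator(text, translate_to)
        result["target_language"] = translate_to
    return result
```

Injecting the stages keeps the branching testable with simple lambdas, for example `process(b"...", detect=lambda d: "image", ocr=lambda d: "text", parser=lambda d: "", translator=lambda t, lang: t)`.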
Phase 6: A2A Authentication
Enable machine-to-machine file uploads.
Add Credentials to .env
Upload as an Agent
curl -X POST http://localhost:8000/v1/upload \
-H "X-Agent-Secret-Token: your-token" \
-H "X-Agent-User-Id: your-agent-id" \
-F "file=@data.csv"
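A client that supports both the user flow from the earlier phases and the agent flow shown here needs to pick the right headers. The helper below is a sketch: `auth_headers` is a hypothetical name, the header names are copied from the curl examples on this page, and treating the two credential styles as mutually exclusive is an assumption the page does not confirm.

```python
from typing import Dict, Optional


def auth_headers(*, bearer_token: Optional[str] = None,
                 agent_token: Optional[str] = None,
                 agent_id: Optional[str] = None) -> Dict[str, str]:
    """Build request headers for either a user (Bearer) or an agent (A2A) upload."""
    # Assumption: a request carries one credential style, never both.
    if bearer_token and (agent_token or agent_id):
        raise ValueError("choose either bearer or agent credentials, not both")
    if bearer_token:
        return {"Authorization": f"Bearer {bearer_token}"}
    if agent_token and agent_id:
        return {
            "X-Agent-Secret-Token": agent_token,
            "X-Agent-User-Id": agent_id,
        }
    raise ValueError("provide bearer_token, or both agent_token and agent_id")
```

Keyword-only arguments make call sites explicit about which credential is being used, for example `auth_headers(agent_token="your-token", agent_id="your-agent-id")`.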
Supported File Types
| Type | Extensions | Processing |
|---|---|---|
| Documents | .pdf, .docx | Text extraction |
| Images | .jpg, .png, .webp | OCR |
| Data | .json, .csv | Parsing |
| Text | .txt, .md | Direct read |
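The table translates directly into a lookup that a client could use to reject unsupported files before uploading. This is a sketch: `PROCESSING_BY_EXTENSION` and `processing_for` are hypothetical names, the entries are copied from the table, and matching extensions case-insensitively is an assumption.

```python
from pathlib import Path

# Mirrors the table above: extension -> (type, processing step).
PROCESSING_BY_EXTENSION = {
    ".pdf": ("Documents", "Text extraction"),
    ".docx": ("Documents", "Text extraction"),
    ".jpg": ("Images", "OCR"),
    ".png": ("Images", "OCR"),
    ".webp": ("Images", "OCR"),
    ".json": ("Data", "Parsing"),
    ".csv": ("Data", "Parsing"),
    ".txt": ("Text", "Direct read"),
    ".md": ("Text", "Direct read"),
}


def processing_for(filename: str) -> str:
    """Return the processing step for a file name, or raise ValueError if unsupported."""
    # Assumption: extensions are compared case-insensitively.
    suffix = Path(filename).suffix.lower()
    if suffix not in PROCESSING_BY_EXTENSION:
        raise ValueError(f"unsupported file type: {suffix or filename}")
    return PROCESSING_BY_EXTENSION[suffix][1]
```

For instance, `processing_for("german_contract.pdf")` yields the "Text extraction" step listed for documents.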
Next Steps
- Build a file search index with vector embeddings
- Add webhook callbacks for async processing
- Integrate with the RAG Cookbook for document Q&A