Traylinx Router Agent¶

Status: ✅ Production Ready (Sprint 1 & 2 Complete)
Version: 2.0.0
Port: 8080

Overview¶

The Router Agent is the central message broker for the Traylinx platform. It provides two core capabilities:

Request Routing (Sprint 1) - Routes requests to agents based on capabilities
Event Publishing (Sprint 2) - Publishes events to multiple subscribers in parallel

Think of it as a smart switchboard that knows which agent can handle which request, combined with a broadcast system that delivers events to every interested subscriber.

🎯 Key Features¶

Request Routing (Sprint 1)¶

✅ Capability-Based Discovery - Find agents via Registry
✅ Automatic Agent Selection - Pick best agent by score
✅ Request Forwarding - Forward to target agent with A2A auth
✅ Retry Logic - Try multiple agents on failure
✅ Error Handling - Graceful degradation
✅ Performance Tracking - Report stats back to Registry

Event Publishing (Sprint 2)¶

✅ Subscriber Discovery - Query Subscription Service
✅ Parallel Fan-out - Deliver to multiple subscribers concurrently
✅ Delivery Tracking - Success/failure per subscriber
✅ Error Recovery - Continue on partial failures
✅ Metrics Collection - Track event delivery stats

Technical Features¶

✅ A2A Authentication - All endpoints secured
✅ Health Checks - Monitor Registry & Subscription Service
✅ Structured Logging - Comprehensive operation logs
✅ Metrics Endpoint - In-memory stats (Prometheus-ready)
✅ Dependency Injection - Clean architecture
✅ Async/Await - High-performance async operations

Architecture¶

Request Routing Flow¶

1. Client sends request with capabilities
2. Router queries Registry for matching agents
3. Registry returns ranked list of agents
4. Router selects best agent (by score)
5. Router forwards request to agent
6. If failure, try next agent (up to 3 attempts)
7. Router reports stats back to Registry
8. Return response to client
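
The failover part of this flow (steps 4 to 6) can be sketched as follows. This is a minimal illustration, not the Router's actual implementation: the `route_with_failover` name, the dict shape of an agent, and the injected `forward` callable (standing in for the real HTTP call) are all assumptions made for the example.

```python
import asyncio
from typing import Any, Awaitable, Callable

# Hypothetical shapes: an agent is a dict with "agent_key" and "score";
# `forward` stands in for the real HTTP call to the selected agent.
Agent = dict[str, Any]
Forward = Callable[[Agent, dict], Awaitable[dict]]


async def route_with_failover(
    agents: list[Agent],
    payload: dict,
    forward: Forward,
    max_agents_to_try: int = 3,
) -> dict:
    """Try the highest-scored agents in order until one succeeds."""
    ranked = sorted(agents, key=lambda a: a["score"], reverse=True)
    if not ranked:
        return {"success": False, "error": "No agents found matching capabilities"}

    for agent in ranked[:max_agents_to_try]:
        try:
            data = await forward(agent, payload)
        except Exception:
            continue  # step 6: fall through to the next candidate
        return {"success": True, "data": data, "agent_key": agent["agent_key"]}

    return {"success": False, "error": "All agents failed"}
```

Injecting `forward` keeps the retry loop testable without a running Registry or agent.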

Event Publishing Flow¶

1. Publisher sends event to Router
2. Router queries Subscription Service for subscribers
3. Subscription Service returns matching subscribers
4. Router fans out event to all subscribers in parallel
5. Router tracks delivery success/failure
6. Return delivery summary to publisher
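
Steps 4 to 6 of this flow map naturally onto `asyncio.gather` with a semaphore that caps concurrency. The sketch below is illustrative only: `fan_out` and the injected `deliver` callable are hypothetical names, and the returned summary simply mirrors the response fields documented in this README.

```python
import asyncio
from typing import Awaitable, Callable

# Hypothetical delivery callable: (agent_key, event) -> None, raises on failure.
Deliver = Callable[[str, dict], Awaitable[None]]


async def fan_out(
    subscribers: list[str],
    event: dict,
    deliver: Deliver,
    max_parallel: int = 50,
) -> dict:
    """Deliver `event` to every subscriber concurrently and summarize results."""
    semaphore = asyncio.Semaphore(max_parallel)

    async def deliver_one(agent_key: str):
        async with semaphore:  # at most `max_parallel` deliveries in flight
            try:
                await deliver(agent_key, event)
                return None
            except Exception as exc:
                # A failed delivery is recorded, not raised, so others continue.
                return {"agent_key": agent_key, "error": str(exc)}

    results = await asyncio.gather(*(deliver_one(k) for k in subscribers))
    errors = [r for r in results if r is not None]
    return {
        "success": True,
        "delivered": len(subscribers) - len(errors),
        "failed": len(errors),
        "total_subscribers": len(subscribers),
        "errors": errors or None,
    }
```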

Quick Start¶

Prerequisites¶

Python 3.9+
Poetry
Agent Registry running (port 8000)
Subscription Service running (port 8001)

1. Install Dependencies¶

poetry install

2. Configure Environment¶

Create .env file:

# Registry Service
REGISTRY_SERVICE_URL=http://localhost:8000

# Subscription Service
SUBSCRIPTION_SERVICE_URL=http://localhost:8001

# Router Configuration
ROUTER_TIMEOUT_SECONDS=30
ROUTER_RETRY_ATTEMPTS=3
ROUTER_MAX_AGENTS_TO_TRY=3

# Event Publishing
EVENT_FANOUT_TIMEOUT_SECONDS=30
EVENT_MAX_PARALLEL_DELIVERIES=50

# Agent Identity
ROUTER_AGENT_KEY=traylinx-router-agent

3. Start Router¶

poetry run uvicorn app.main:app --host 0.0.0.0 --port 8080 --reload

4. Verify Health¶

# Liveness
curl http://localhost:8080/health

# Readiness (checks Registry & Subscription Service)
curl http://localhost:8080/ready

# Metrics
curl http://localhost:8080/metrics

5. View API Docs¶

Open http://localhost:8080/docs

API Reference¶

Base URL¶

http://localhost:8080

Authentication¶

All endpoints require A2A authentication:

Authorization: Bearer {access_token}
X-Agent-Key: {agent_key}
X-Agent-Secret-Token: {agent_secret}

🔀 1. Request Routing¶

Route a request to an agent based on capabilities.

Endpoint: POST /a2a/route

Request¶

{
  "capabilities": [
    {"key": "domain", "value": "flights"},
    {"key": "op", "value": "search"}
  ],
  "endpoint": "/a2a/search",
  "payload": {
    "query": "NYC to LAX",
    "date": "2025-12-01"
  },
  "timeout": 30
}

Response (Success)¶

{
  "success": true,
  "data": {
    "results": [
      {"flight": "AA123", "price": 299}
    ]
  },
  "agent_key": "flights-search-agent",
  "latency_ms": 245
}

Response (Failure)¶

{
  "success": false,
  "error": "No agents found matching capabilities",
  "latency_ms": 50
}

Error Scenarios¶

Error	HTTP Status	Description
`No agents found`	404	No agents match the capabilities
`All agents failed`	503	All candidate agents failed
`Registry unavailable`	503	Cannot connect to Registry
`Agent timeout`	504	Agent didn't respond in time
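
A client may want to translate these errors into its own handling logic. The following sketch encodes the table as a lookup; the status codes come from the table, while the `500` fallback and the `is_retryable` helper are assumptions added for illustration.

```python
# Status codes copied from the table above; the 500 fallback is an assumption.
ERROR_STATUS = {
    "No agents found": 404,
    "All agents failed": 503,
    "Registry unavailable": 503,
    "Agent timeout": 504,
}


def status_for_error(message: str) -> int:
    """Return the HTTP status for a Router error message."""
    for prefix, status in ERROR_STATUS.items():
        if message.startswith(prefix):
            return status
    return 500


def is_retryable(status: int) -> bool:
    """Treat 503/504 as transient; a 404 usually needs a registration change."""
    return status in (503, 504)
```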

Routing Behavior¶

Agent Discovery: Query Registry with capabilities
Agent Selection: Pick agent with highest score (success_rate, latency, freshness)
Request Forwarding: Forward to {agent.base_url}{endpoint}
Retry Logic: On failure, try next agent (up to 3 total attempts)
Stats Reporting: Report success/failure/latency to Registry
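
The README names the three inputs to the score (success rate, latency, freshness) but not the formula. Purely as an illustration of how such inputs could be combined, here is a hypothetical weighted score; the weights, the `latency_budget_ms` and `freshness_window_s` parameters, and the function itself are assumptions, not the Registry's actual algorithm.

```python
def agent_score(
    success_rate: float,             # fraction in [0, 1]
    avg_latency_ms: float,
    seconds_since_heartbeat: float,
    *,
    latency_budget_ms: float = 1000.0,   # assumed: latency at which the term hits 0
    freshness_window_s: float = 300.0,   # assumed: age at which the term hits 0
) -> float:
    """Combine the three signals into a score in [0, 1] (hypothetical weights)."""
    latency_term = max(0.0, 1.0 - avg_latency_ms / latency_budget_ms)
    freshness_term = max(0.0, 1.0 - seconds_since_heartbeat / freshness_window_s)
    return 0.5 * success_rate + 0.3 * latency_term + 0.2 * freshness_term
```

Whatever the real formula is, the Router only needs the resulting order: it picks the highest-scored agent first and falls back down the list.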

📡 2. Event Publishing¶

Publish an event to all subscribers.

Endpoint: POST /a2a/event

Request¶

{
  "event_type": "job.completed",
  "job_id": "job-123",
  "payload": {
    "status": "success",
    "result": "All tests passed",
    "duration": 120
  },
  "timeout": 30
}

Response (Success)¶

{
  "success": true,
  "delivered": 3,
  "failed": 0,
  "total_subscribers": 3,
  "latency_ms": 150,
  "errors": null
}

Response (Partial Failure)¶

{
  "success": true,
  "delivered": 2,
  "failed": 1,
  "total_subscribers": 3,
  "latency_ms": 200,
  "errors": [
    {
      "agent_key": "agent-c",
      "error": "Connection timeout"
    }
  ]
}

Response (No Subscribers)¶

{
  "success": true,
  "delivered": 0,
  "failed": 0,
  "total_subscribers": 0,
  "latency_ms": 10
}

Event Publishing Behavior¶

Query Subscribers: Call Subscription Service with event details
Subscriber Discovery: Get list of matching agents
Parallel Fan-out: Deliver to all subscribers concurrently (up to 50 at once)
Track Results: Count successful and failed deliveries
Return Summary: Report delivery statistics to publisher

Event Delivery¶

Target Endpoint: {agent.base_url}/a2a/event/receive
Concurrency: Up to 50 parallel deliveries
Timeout: Configurable per event (default 30s)
Error Handling: Continue on partial failures

🏥 Health & Monitoring¶

Health Endpoints¶

Liveness Probe¶

GET /health

Returns 200 if service is running.

Readiness Probe¶

GET /ready

Checks connectivity to:

Agent Registry
Subscription Service

Returns 200 if both are healthy, 503 otherwise.

Response:

{
  "service": "traylinx-router-agent",
  "ready": true,
  "registry": "healthy",
  "subscription_service": "healthy"
}
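
As a sketch of how the status code and body relate, the helper below derives both from the two dependency checks. The `readiness` function and the `"unhealthy"` label are assumptions for illustration; only the `"healthy"` value and the 200/503 rule come from this README.

```python
def readiness(registry_ok: bool, subscription_ok: bool) -> tuple[int, dict]:
    """Return (status_code, body) for the readiness probe."""
    ready = registry_ok and subscription_ok
    body = {
        "service": "traylinx-router-agent",
        "ready": ready,
        # "unhealthy" is an assumed label for a failed dependency check.
        "registry": "healthy" if registry_ok else "unhealthy",
        "subscription_service": "healthy" if subscription_ok else "unhealthy",
    }
    return (200 if ready else 503), body
```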

Metrics¶

GET /metrics

Returns in-memory metrics:

{
  "uptime_seconds": 3600,
  "routing_requests_total": 1523,
  "routing_requests_success": 1487,
  "routing_requests_failed": 36,
  "success_rate": 97.6,
  "latency_avg_ms": 234.5,
  "latency_p95_ms": 450.2,
  "latency_p99_ms": 890.1,
  "agent_selection_counts": {
    "flights-search-agent": 892,
    "hotels-search-agent": 595
  },
  "event_publishes": 234,
  "event_deliveries": 702,
  "event_failures": 12
}
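
The derived fields can be reproduced from raw counters and latency samples. The sketch below uses the nearest-rank percentile method, which is one common choice and only an assumption about how the Router computes `latency_p95_ms` and `latency_p99_ms`; the function names are illustrative.

```python
import math


def success_rate(success: int, total: int) -> float:
    """Percentage of successful requests, rounded to one decimal place."""
    return round(100 * success / total, 1) if total else 0.0


def percentile(samples: list[float], pct: int) -> float:
    """Nearest-rank percentile: the smallest sample covering `pct` percent."""
    if not samples:
        raise ValueError("no samples recorded")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]
```

With the counters from the example response, `success_rate(1487, 1523)` gives `97.6`.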

Configuration¶

Environment Variables¶

Variable	Default	Description
`REGISTRY_SERVICE_URL`	`http://localhost:8000`	Agent Registry URL
`SUBSCRIPTION_SERVICE_URL`	`http://localhost:8001`	Subscription Service URL
`ROUTER_TIMEOUT_SECONDS`	`30`	Default request timeout
`ROUTER_RETRY_ATTEMPTS`	`3`	Max retry attempts
`ROUTER_MAX_AGENTS_TO_TRY`	`3`	Max agents to try per request
`EVENT_FANOUT_TIMEOUT_SECONDS`	`30`	Event delivery timeout
`EVENT_MAX_PARALLEL_DELIVERIES`	`50`	Max parallel event deliveries
`ROUTER_AGENT_KEY`	`traylinx-router-agent`	Router's agent key
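
A minimal, standard-library sketch of loading these variables with the defaults from the table is shown below. The `RouterSettings` class and `load_settings` function are hypothetical; the project's own `app/config.py` may be structured differently.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class RouterSettings:
    registry_service_url: str
    subscription_service_url: str
    router_timeout_seconds: int
    router_retry_attempts: int
    router_max_agents_to_try: int
    event_fanout_timeout_seconds: int
    event_max_parallel_deliveries: int
    router_agent_key: str


def load_settings(env: Optional[Mapping[str, str]] = None) -> RouterSettings:
    """Read settings from `env` (default: os.environ), using the table's defaults."""
    env = os.environ if env is None else env
    return RouterSettings(
        registry_service_url=env.get("REGISTRY_SERVICE_URL", "http://localhost:8000"),
        subscription_service_url=env.get("SUBSCRIPTION_SERVICE_URL", "http://localhost:8001"),
        router_timeout_seconds=int(env.get("ROUTER_TIMEOUT_SECONDS", "30")),
        router_retry_attempts=int(env.get("ROUTER_RETRY_ATTEMPTS", "3")),
        router_max_agents_to_try=int(env.get("ROUTER_MAX_AGENTS_TO_TRY", "3")),
        event_fanout_timeout_seconds=int(env.get("EVENT_FANOUT_TIMEOUT_SECONDS", "30")),
        event_max_parallel_deliveries=int(env.get("EVENT_MAX_PARALLEL_DELIVERIES", "50")),
        router_agent_key=env.get("ROUTER_AGENT_KEY", "traylinx-router-agent"),
    )
```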

A2A (Agent-to-Agent) Authentication¶

Incoming Requests¶

The Router protects all endpoints with @require_a2a_auth:

from fastapi import APIRouter
from traylinx_auth_client import require_a2a_auth

# Assumed location of the request model (see app/models.py in Project Structure)
from app.models import RouteRequest

router = APIRouter()

@router.post("/a2a/route")
@require_a2a_auth
async def route_request(request: RouteRequest):
    # Only authenticated agents reach here
    ...

Outgoing Requests¶

When calling other services, the Router uses get_request_headers():

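from traylinx_auth_client import get_request_headers
import httpx


# Illustrative wrapper; the function name is not part of the Router's API
async def forward_to_agent(agent_url: str, payload: dict) -> httpx.Response:
    # Build the A2A auth headers for the outgoing call
    headers = get_request_headers()
    async with httpx.AsyncClient() as client:
        return await client.post(
            agent_url,
            headers=headers,
            json=payload,
        )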

This automatically includes:

Authorization: Bearer {access_token}
X-Agent-Key: {agent_key}
X-Agent-Secret-Token: {agent_secret}

📁 Project Structure¶

traylinx_router_agent/
├── app/
│   ├── __init__.py
│   ├── main.py                    # FastAPI application
│   ├── config.py                  # Settings
│   ├── models.py                  # Request/Response models
│   ├── exceptions.py              # Custom exceptions
│   ├── dependencies.py            # Dependency injection
│   ├── routers/
│   │   ├── __init__.py
│   │   ├── health.py              # Health checks
│   │   ├── router.py              # Routing endpoint
│   │   └── events.py              # Event endpoint
│   └── services/
│       ├── __init__.py
│       ├── registry_client.py     # Registry API client
│       ├── subscription_client.py # Subscription API client
│       ├── agent_client.py        # Agent communication
│       ├── routing_service.py     # Routing logic
│       ├── event_service.py       # Event fan-out logic
│       └── metrics.py             # Metrics collection
├── tests/
│   ├── __init__.py
│   ├── test_router.py             # Unit tests
│   └── integration_test.py        # Integration tests
├── pyproject.toml                 # Dependencies
├── poetry.lock                    # Locked dependencies
├── README.md                      # This file
└── API_REFERENCE.md              # Detailed API docs

🧪 Testing¶

Unit Tests¶

poetry run pytest tests/test_router.py -v

Integration Tests¶

Requires all services running:

# Terminal 1: Start Registry
cd ../traylinx_agent_registry
poetry run uvicorn app.main:app --port 8000

# Terminal 2: Start Subscription Service
cd ../traylinx_subscription_service
poetry run uvicorn app.main:app --port 8001

# Terminal 3: Start Router
cd ../traylinx_router_agent
poetry run uvicorn app.main:app --port 8080

# Terminal 4: Run tests
poetry run pytest tests/integration_test.py -v

🐛 Troubleshooting¶

Router Won't Start¶

# Check dependencies
poetry install

# Check Python version
python --version  # Should be 3.9+

# Run with debug logging
LOG_LEVEL=DEBUG poetry run uvicorn app.main:app --port 8080

Registry Connection Issues¶

# Test Registry connectivity
curl http://localhost:8000/health

# Check REGISTRY_SERVICE_URL
echo $REGISTRY_SERVICE_URL

Subscription Service Connection Issues¶

# Test Subscription Service connectivity
curl http://localhost:8001/health

# Check SUBSCRIPTION_SERVICE_URL
echo $SUBSCRIPTION_SERVICE_URL

Routing Failures¶

Common issues:

1. No agents found: Check that the Registry has agents with matching capabilities
2. All agents failed: Check that the target agents are running and healthy
3. Authentication errors: Verify the A2A auth setup

Check logs for detailed error messages.

Performance¶

Routing Performance¶

Average Latency: ~200-300ms (Registry lookup + agent call)
P95 Latency: ~500ms
P99 Latency: ~1000ms
Throughput: 100+ requests/second (single instance)

Event Publishing Performance¶

Fan-out Latency: ~50-200ms for 10 subscribers
Parallel Deliveries: Up to 50 concurrent
Throughput: 50+ events/second (single instance)

Optimization Tips¶

Use caching: Consider caching Registry responses
Tune timeouts: Adjust based on agent response times
Scale horizontally: Run multiple Router instances
Monitor metrics: Watch /metrics for bottlenecks
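
The caching tip could look like the following time-to-live cache keyed by the requested capabilities. This is a sketch under stated assumptions: `TTLCache` and `capability_key` are hypothetical names, and the Router does not necessarily cache Registry responses today.

```python
import time
from typing import Any, Callable, Optional


def capability_key(capabilities: list[dict]) -> tuple:
    """Order-independent cache key for a capabilities list."""
    return tuple(sorted((c["key"], c["value"]) for c in capabilities))


class TTLCache:
    """Tiny in-memory cache whose entries expire after `ttl_seconds`."""

    def __init__(self, ttl_seconds: float, clock: Callable[[], float] = time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so expiry can be tested without sleeping
        self._entries: dict[tuple, tuple[float, Any]] = {}

    def get(self, key: tuple) -> Optional[Any]:
        entry = self._entries.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:
            del self._entries[key]  # drop the stale entry
            return None
        return value

    def set(self, key: tuple, value: Any) -> None:
        self._entries[key] = (self._clock() + self._ttl, value)
```

A short TTL limits how long the Router could keep routing to an agent whose score or health has changed in the Registry.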

🚦 Production Checklist¶

Before deploying to production:

[ ] Configure production service URLs
[ ] Set appropriate timeouts
[ ] Configure log aggregation
[ ] Add Prometheus metrics
[ ] Set up monitoring alerts
[ ] Load test routing and events
[ ] Configure container orchestration
[ ] Set up horizontal pod autoscaling
[ ] Review retry and timeout settings
[ ] Test failure scenarios

Related Documentation¶

API Reference: API_REFERENCE.md
Agent Registry: ../traylinx_agent_registr../index.md
Subscription Service: ../traylinx_subscription_servic../index.md
Ecosystem Architecture: ../TRAYLINX_API_DOCUMENTATION.md
Development Status: ../TRAYLINX_DEVELOPMENT_STATUS.md
Sprint 2 Summary: ../SPRINT_2_COMPLETE.md

🔄 Version History¶

v2.0.0 (Sprint 2)¶

✅ Added event publishing endpoint
✅ Integrated Subscription Service client
✅ Implemented parallel event fan-out
✅ Added event metrics
✅ Updated health checks

v1.0.0 (Sprint 1)¶

✅ Capability-based routing
✅ Registry integration
✅ Retry logic
✅ Performance tracking
✅ A2A authentication

🤝 Support¶

For issues or questions:

1. Check logs for error messages
2. Verify Registry and Subscription Service are healthy
3. Test A2A authentication
4. Review configuration settings

📄 License¶

[Your License Here]

Built with: FastAPI, httpx, traylinx_auth_client, Pydantic v2
