NAT Traversal & Circuit Relay v2¶
Last Updated: January 2026
Stargate Version: v0.2.0+
Overview¶
Traylinx Stargate v0.2.0 introduces comprehensive NAT traversal capabilities, enabling agents behind firewalls and NAT to communicate reliably through Circuit Relay v2.
The NAT Problem¶
Most agents run behind Network Address Translation (NAT), which prevents direct peer-to-peer connections:
Agent A (NAT)  ❌ Cannot connect directly ❌  Agent B (NAT)
│                                              │
│         ✅ Solution: Relay Node ✅             │
│                     │                        │
└─────────────────────┴────────────────────────┘
Common NAT Scenarios:
- Home networks behind routers
- Corporate networks with firewalls
- Cloud instances with security groups
- Mobile devices on cellular networks
Circuit Relay v2¶
Circuit Relay v2 is a libp2p protocol that enables peers to communicate through public relay nodes.
Architecture¶
┌─────────────────────────────────────────────────────────────┐
│                    CIRCUIT RELAY v2 FLOW                    │
└─────────────────────────────────────────────────────────────┘
Peer A (NAT)         Relay Node              Peer B (NAT)
│                    (Public IP)                   │
│                         │                        │
│ 1. Connect to relay     │                        │
│────────────────────────▶│                        │
│                         │ 2. Connect to relay    │
│                         │◀───────────────────────│
│                         │                        │
│ 3. Request relay to B   │                        │
│────────────────────────▶│                        │
│                         │ 4. Establish circuit   │
│                         │───────────────────────▶│
│                         │                        │
│           5. Data flows through relay            │
│◀────────────────────────┼───────────────────────▶│
Connection Priority¶
Stargate uses a 3-tier fallback strategy:
- Direct P2P (Lowest latency) - Attempts direct connection first
- Circuit Relay v2 (Medium latency) - Falls back to relay if direct fails
- NATS Relay (Highest reliability) - Final fallback for maximum compatibility
Features¶
1. Automatic NAT Detection¶
Stargate automatically detects your NAT configuration:
from traylinx_stargate import StarGateNode
node = StarGateNode(display_name="my-agent")
await node.start()
# Check NAT status
status = node.get_status()
print(f"NAT Type: {status['nat_status']['nat_type']}")
print(f"Public Addresses: {status['nat_status']['public_addrs']}")
print(f"Requires Relay: {status['nat_status']['requires_relay']}")
NAT Types Detected:
- public - Direct internet connection
- private - Behind NAT (10.x, 172.16-31.x, 192.168.x)
- symmetric - Symmetric NAT (requires relay)
- unknown - Unable to determine
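The private ranges listed above are the RFC 1918 blocks, so the address-based part of the classification can be sketched with the standard `ipaddress` module. `classify_address` is a hypothetical helper, not a Stargate function, and it makes two simplifying assumptions: any IPv4 address outside the three blocks counts as `public`, and symmetric NAT is not detected at all, since that requires probing external servers rather than inspecting a local address.

```python
import ipaddress

# RFC 1918 private ranges: 10.x, 172.16-31.x, 192.168.x
PRIVATE_NETWORKS = [
    ipaddress.ip_network("10.0.0.0/8"),
    ipaddress.ip_network("172.16.0.0/12"),
    ipaddress.ip_network("192.168.0.0/16"),
]


def classify_address(addr: str) -> str:
    """Return "private", "public", or "unknown" for an IPv4 address string."""
    try:
        ip = ipaddress.IPv4Address(addr)
    except ipaddress.AddressValueError:
        return "unknown"  # not a parseable IPv4 address
    if any(ip in net for net in PRIVATE_NETWORKS):
        return "private"
    return "public"


for sample in ("192.168.1.5", "172.20.0.9", "203.0.113.42", "not-an-ip"):
    print(sample, "->", classify_address(sample))
```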
2. Connection Pooling¶
Reuses connections to frequently-contacted peers:
# Connections are automatically pooled
result1 = await node.call("peer_id", "method1", {})
result2 = await node.call("peer_id", "method2", {}) # Reuses connection
# Pool statistics (reads a private attribute, which may change between releases)
node.transport._connection_pool.get_stats()
# {'size': 5, 'max_size': 50, 'ttl': 300.0}
Benefits:
- Reduced latency for repeated calls
- Lower network overhead
- Automatic connection cleanup
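To make the pooling behavior concrete, here is a minimal sketch of a size-limited pool with a time-to-live, using the documented defaults (50 connections, 300 seconds). The class below is an illustration only: its eviction policy (least recently used), its choice not to refresh the TTL on reuse, and the injectable `clock` parameter are assumptions, not a description of Stargate's internal pool.

```python
import time
from collections import OrderedDict
from typing import Any, Callable, Optional


class ConnectionPool:
    """Illustrative pool keyed by peer ID, bounded by max_size and ttl."""

    def __init__(
        self,
        max_size: int = 50,
        ttl: float = 300.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.max_size = max_size
        self.ttl = ttl
        self._clock = clock  # injectable so expiry can be tested without sleeping
        self._entries: "OrderedDict[str, tuple[Any, float]]" = OrderedDict()

    def get(self, peer_id: str) -> Optional[Any]:
        """Return a pooled connection, or None if it is absent or expired."""
        entry = self._entries.get(peer_id)
        if entry is None:
            return None
        conn, stored_at = entry
        if self._clock() - stored_at > self.ttl:
            del self._entries[peer_id]  # expired: drop it so the caller reconnects
            return None
        self._entries.move_to_end(peer_id)  # mark as most recently used
        return conn

    def put(self, peer_id: str, conn: Any) -> None:
        """Store a connection, evicting the least recently used entry if full."""
        self._entries[peer_id] = (conn, self._clock())
        self._entries.move_to_end(peer_id)
        while len(self._entries) > self.max_size:
            self._entries.popitem(last=False)

    def get_stats(self) -> dict:
        return {"size": len(self._entries), "max_size": self.max_size, "ttl": self.ttl}


pool = ConnectionPool()
pool.put("peer_id", object())
print(pool.get_stats())
```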
3. Connection Retry with Exponential Backoff¶
Automatic retry for transient failures:
# Retry is automatic with configurable parameters
result = await node.call(
    "peer_id",
    "method",
    payload,
    max_retries=3,       # Default: 3
    backoff_factor=2.0,  # Default: 2.0 (exponential)
)
Retry Schedule:
- Attempt 1: Immediate
- Attempt 2: After 2 seconds
- Attempt 3: After 4 seconds
- Attempt 4: After 8 seconds
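The schedule above is consistent with a delay of `backoff_factor ** n` seconds before the n-th retry. The sketch below assumes that formula; the exact computation Stargate uses is not stated here. `backoff_delays` and `call_with_retry` are hypothetical helpers, and the `sleep` parameter exists only so the delays can be observed without waiting.

```python
import asyncio
from typing import Awaitable, Callable


def backoff_delays(max_retries: int = 3, backoff_factor: float = 2.0) -> list[float]:
    """Delay in seconds before each retry: factor**1, factor**2, ..."""
    return [backoff_factor ** n for n in range(1, max_retries + 1)]


async def call_with_retry(
    operation: Callable[[], Awaitable[object]],
    max_retries: int = 3,
    backoff_factor: float = 2.0,
    sleep: Callable[[float], Awaitable[None]] = asyncio.sleep,
) -> object:
    """Run operation once, then retry on ConnectionError up to max_retries times."""
    delays = backoff_delays(max_retries, backoff_factor)
    for attempt in range(max_retries + 1):
        try:
            return await operation()
        except ConnectionError:
            if attempt == max_retries:
                raise  # retries exhausted: surface the last error
            await sleep(delays[attempt])


print(backoff_delays())
```

With the defaults, one initial attempt plus three retries gives the four attempts listed in the schedule.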
4. Comprehensive Metrics¶
Track connection performance:
# Get metrics for a specific peer
metrics = node.transport.get_metrics("peer_id")
print(f"Connection Type: {metrics.connection_type}") # direct/relay/nats
print(f"Latency: {metrics.latency_ms}ms")
print(f"Success Rate: {metrics.success_rate:.1%}")
print(f"Total Requests: {metrics.total_requests}")
print(f"Failed Requests: {metrics.failed_requests}")
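The fields printed above suggest how the success rate relates to the request counters. The dataclass below is a sketch of that relationship, not the library's `ConnectionMetrics` class: the name `ConnectionMetricsSketch`, the derivation of `success_rate` from the two counters, and the value returned when no requests have been made are all assumptions.

```python
from dataclasses import dataclass


@dataclass
class ConnectionMetricsSketch:
    """Illustrative stand-in exposing the same field names as the example above."""

    connection_type: str  # "direct", "relay", or "nats"
    latency_ms: float
    total_requests: int = 0
    failed_requests: int = 0

    @property
    def success_rate(self) -> float:
        """Fraction of requests that succeeded, in the range 0.0 to 1.0."""
        if self.total_requests == 0:
            return 1.0  # assumption: no traffic yet counts as healthy
        return (self.total_requests - self.failed_requests) / self.total_requests


metrics = ConnectionMetricsSketch("relay", latency_ms=120.0, total_requests=200, failed_requests=10)
print(f"Success Rate: {metrics.success_rate:.1%}")
```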
5. Relay Health Monitoring¶
Automatic health checks for relay nodes:
# Health checks run every 5 seconds
# Failures detected within 10 seconds
# Get relay health status
health = node.transport.get_relay_health_status()
for relay, status in health.items():
    print(f"Relay: {relay}")
    print(f"  Healthy: {status['is_healthy']}")
    print(f"  Consecutive Failures: {status['consecutive_failures']}")
    print(f"  Last Success: {status['last_success']}")
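A 5-second check interval with detection within 10 seconds is consistent with marking a relay unhealthy after two consecutive failed checks. The tracker below sketches that idea under this assumption; the threshold of 2 and the `RelayHealth` class are illustrative, and only the field names mirror the status dictionary shown above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RelayHealth:
    """Tracks consecutive health-check failures for a single relay."""

    failure_threshold: int = 2  # assumption: 2 missed checks at a 5 s interval
    consecutive_failures: int = 0
    last_success: Optional[float] = None

    @property
    def is_healthy(self) -> bool:
        return self.consecutive_failures < self.failure_threshold

    def record(self, ok: bool, now: float) -> None:
        """Record one health-check result taken at time `now` (in seconds)."""
        if ok:
            self.consecutive_failures = 0
            self.last_success = now
        else:
            self.consecutive_failures += 1


relay = RelayHealth()
for t, ok in [(0.0, True), (5.0, False), (10.0, False)]:
    relay.record(ok, now=t)
    print(f"t={t:>4}s healthy={relay.is_healthy} failures={relay.consecutive_failures}")
```

Requiring more than one failure before flagging a relay avoids reacting to a single dropped check, at the cost of a slightly longer detection time.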
Using Circuit Relay v2¶
Quick Start¶
import asyncio
from traylinx_stargate import StarGateNode

async def main():
    # Create node with libp2p transport
    node = StarGateNode(
        display_name="my-agent",
        transport="libp2p",
    )
    await node.start()
    # Enable Circuit Relay v2 (uses default Traylinx relays)
    await node.transport.enable_circuit_relay_v2()
    # Now you can connect to peers through relays
    result = await node.call("peer_id", "ping", {"message": "hello"})
    print(result)
    await node.stop()

asyncio.run(main())
Custom Relay Nodes¶
Use your own relay infrastructure:
# Configure custom relay nodes
custom_relays = [
    "/ip4/203.0.113.42/tcp/4001/p2p/QmRelay1...",
    "/ip4/203.0.113.43/tcp/4001/p2p/QmRelay2...",
]
await node.transport.enable_circuit_relay_v2(relay_addrs=custom_relays)
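A relay address is a multiaddr: a sequence of `/protocol/value` pairs naming the network address, the port, and the relay's peer ID. Checking addresses before passing them to `enable_circuit_relay_v2` can catch typos early. The helpers below are a hypothetical sketch that handles only simple pair-style addresses like the ones above; they are not a full multiaddr parser (protocols without a value, such as `/quic`, are not supported), and the peer ID used in the example is a made-up placeholder.

```python
import ipaddress


def parse_multiaddr(addr: str) -> dict[str, str]:
    """Split "/proto/value/proto/value" into {proto: value}."""
    if not addr.startswith("/"):
        raise ValueError(f"multiaddr must start with '/': {addr!r}")
    parts = addr[1:].split("/")
    if len(parts) % 2 != 0 or "" in parts:
        raise ValueError(f"expected /protocol/value pairs: {addr!r}")
    return dict(zip(parts[0::2], parts[1::2]))


def validate_relay_addr(addr: str) -> dict[str, str]:
    """Check that a relay address has a host, a TCP port, and a peer ID."""
    fields = parse_multiaddr(addr)
    if "ip4" in fields:
        # /ip4/ takes an IPv4 literal; hostnames belong under /dns4/.
        ipaddress.IPv4Address(fields["ip4"])  # raises ValueError otherwise
    elif "dns4" not in fields:
        raise ValueError(f"no ip4 or dns4 component in {addr!r}")
    if "tcp" not in fields or not 0 < int(fields["tcp"]) <= 65535:
        raise ValueError(f"missing or invalid tcp port in {addr!r}")
    if "p2p" not in fields:
        raise ValueError(f"missing p2p peer ID in {addr!r}")
    return fields


print(validate_relay_addr("/ip4/203.0.113.42/tcp/4001/p2p/QmExamplePeer"))
```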
Running a Relay Node¶
Deploy your own relay node for private networks:
# Start a relay node
traylinx stargate relay --port 4001 --max-connections 1000
# The relay will display its multiaddr:
# Relay node started: QmYourRelayPeerID...
# Multiaddrs: ['/ip4/203.0.113.42/tcp/4001/p2p/QmYourRelayPeerID...']
See Also: Relay Node Deployment Guide
Default Relay Nodes¶
Stargate includes default relay nodes operated by Traylinx:
| Relay | Address | Location | Status |
|---|---|---|---|
| relay1.traylinx.io | /dns4/relay1.traylinx.io/tcp/4001/p2p/QmRelay1... | US East | Planned |
| relay2.traylinx.io | /dns4/relay2.traylinx.io/tcp/4001/p2p/QmRelay2... | EU West | Planned |
Note: Default relay addresses are placeholders and will be updated when production relays are deployed.
Performance Characteristics¶
Latency Comparison¶
| Connection Type | Typical Latency | Use Case |
|---|---|---|
| Direct P2P | 5-50ms | Same network or public IPs |
| Circuit Relay v2 | 50-200ms | NAT traversal required |
| NATS Relay | 100-300ms | Maximum compatibility |
Resource Usage¶
Connection Pool:
- Default: 50 connections max
- TTL: 300 seconds (5 minutes)
- Memory: ~2-4 MB per connection
Relay Health Checks:
- Interval: 5 seconds
- Failure detection: < 10 seconds
- Bandwidth: Minimal (~1 KB/check)
Troubleshooting¶
Connection Failures¶
Symptom: Cannot connect to peer
Diagnosis:
# Check NAT status
status = node.get_status()
print(status['nat_status'])
# Check relay status
if node.transport._circuit_relay_enabled:
    print(f"Relays: {node.transport._connected_relays}")
else:
    print("Circuit Relay v2 not enabled")
Solutions:
1. Enable Circuit Relay v2: await node.transport.enable_circuit_relay_v2()
2. Check relay health: node.transport.get_relay_health_status()
3. Verify firewall allows outbound connections on port 4001
High Latency¶
Symptom: Slow response times
Diagnosis:
# Check connection metrics
metrics = node.transport.get_metrics("peer_id")
print(f"Connection Type: {metrics.connection_type}")
print(f"Latency: {metrics.latency_ms}ms")
Solutions:
1. If using relay, deploy geographically closer relay nodes
2. Check network connectivity
3. Consider direct connection if both peers have public IPs
Relay Failures¶
Symptom: Relay connections failing
Diagnosis:
# Check relay health
health = node.transport.get_relay_health_status()
for relay, status in health.items():
    if not status['is_healthy']:
        print(f"Unhealthy relay: {relay}")
        print(f"Failures: {status['consecutive_failures']}")
Solutions:
1. Configure multiple relay nodes for redundancy
2. Deploy your own relay nodes
3. Check relay node logs for issues
Best Practices¶
1. Use Multiple Relays¶
Configure at least 2-3 relay nodes for redundancy:
relays = [
    "/dns4/relay1.example.com/tcp/4001/p2p/QmRelay1...",
    "/dns4/relay2.example.com/tcp/4001/p2p/QmRelay2...",
    "/dns4/relay3.example.com/tcp/4001/p2p/QmRelay3...",
]
await node.transport.enable_circuit_relay_v2(relay_addrs=relays)
2. Monitor Connection Metrics¶
Track performance over time:
# Periodically check metrics
all_metrics = node.transport.get_metrics()
for peer_id, metrics in all_metrics.items():
    if metrics.success_rate < 0.9:  # Less than 90% success
        print(f"Warning: Low success rate for {peer_id}")
3. Handle Connection Failures Gracefully¶
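node.call already retries transient failures, so handle the case where every retry fails by falling back to an alternative:
async def call_with_fallback(node, peer_id, method, payload):
    try:
        return await node.call(peer_id, method, payload)
    except ConnectionError:
        # fallback_method is an application-defined coroutine
        # (for example, an alternative peer or method)
        return await fallback_method(payload)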
4. Deploy Relay Nodes for Production¶
For production deployments, run your own relay infrastructure:
- Deploy in multiple geographic regions
- Use high-bandwidth servers (1+ Gbps)
- Monitor relay health and performance
- Configure appropriate connection limits
See: Relay Node Deployment Guide
Security Considerations¶
Relay Node Trust¶
Relay nodes can see:
- Connection metadata (who is connecting to whom)
- Message sizes and timing
Relay nodes cannot see:
- Message content (encrypted end-to-end)
- Private keys or identities
Recommendation: Only use trusted relay nodes or deploy your own.
End-to-End Encryption¶
All messages are encrypted regardless of connection type:
# Messages are automatically encrypted with Noise protocol
# No additional configuration needed
result = await node.call("peer_id", "method", {"sensitive": "data"})
Rate Limiting¶
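Relay nodes enforce connection limits. On a relay you operate, the limit is set with the --max-connections option when the relay is started (see Running a Relay Node).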
API Reference¶
enable_circuit_relay_v2(relay_addrs=None, hop_enabled=False)¶
Enable Circuit Relay v2 for NAT traversal.
Parameters:
- relay_addrs (list[str] | None): List of relay node multiaddrs. If None, uses default Traylinx relay nodes.
- hop_enabled (bool): If True, this node can act as a relay for others. Default: False
Returns:
- bool: True if relay was successfully enabled
Raises:
- ConnectionError: If not connected to the network
get_metrics(peer_id=None)¶
Get connection metrics for one or all peers.
Parameters:
- peer_id (str | None): Specific peer ID. If None, returns all metrics.
Returns:
- dict[str, ConnectionMetrics] when peer_id is None; ConnectionMetrics or None when a specific peer_id is given
get_relay_health_status(relay_addr=None)¶
Get health status for relay nodes.
Parameters:
- relay_addr (str | None): Specific relay address. If None, returns all relay health status.
Returns:
- dict: Health status information
See Also¶
- Stargate Architecture - System design overview
- Protocols - A2A messaging specification
- Security - Authentication and encryption
- Relay Node Deployment Guide - Deploy your own relays