Deployment Overview

This section covers deploying AgentWeave agents to various production environments, from containers to cloud platforms.

Deployment Options

AgentWeave supports multiple deployment strategies to fit your infrastructure:

Deployment Target Best For Complexity Documentation
Docker Development, simple deployments Low Docker Guide
Kubernetes Production, scalable deployments Medium Kubernetes Guide
Helm Kubernetes with templating Medium Helm Guide
AWS AWS-native deployments Medium-High AWS Guide
GCP GCP-native deployments Medium-High GCP Guide
Azure Azure-native deployments Medium-High Azure Guide
Hybrid/Multi-Cloud Cross-cloud architectures High Hybrid Guide

Prerequisites

Before deploying AgentWeave agents, ensure you have:

Required Infrastructure

  1. SPIRE Infrastructure
    • SPIRE Server (central identity authority)
    • SPIRE Agent (on each node/container)
    • Configured trust domain
    • Registration entries for workloads
  2. OPA Infrastructure
    • OPA server or sidecar
    • Authorization policies loaded
    • Policy bundles (optional)
  3. Network Connectivity
    • mTLS-capable network
    • DNS or service discovery
    • Firewall rules for agent communication

Optional Components

  • Tailscale: For simplified cross-cloud networking
  • Prometheus: For metrics collection
  • Jaeger/Zipkin: For distributed tracing
  • ELK/Loki: For centralized logging

Architecture Patterns

Each agent pod runs with SPIRE and OPA sidecars:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
┌─────────────────────────────────┐
│         Agent Pod               │
│  ┌──────────────────────────┐   │
│  │  Agent Container         │   │
│  │  (Your Application)      │   │
│  └──────────┬───────────────┘   │
│             │                   │
│  ┌──────────┴───────────────┐   │
│  │  SPIRE Agent Sidecar     │   │
│  └──────────┬───────────────┘   │
│             │                   │
│  ┌──────────┴───────────────┐   │
│  │  OPA Sidecar             │   │
│  └──────────────────────────┘   │
└─────────────────────────────────┘

Pros:

  • Strong isolation between agents
  • Independent scaling per agent
  • Minimal shared infrastructure

Cons:

  • Higher resource usage
  • More complex configuration

Pattern 2: DaemonSet Model

SPIRE Agent runs as DaemonSet, shared by all agents on node:

1
2
3
4
5
6
7
8
9
10
11
12
13
┌─────────────────────────────────┐
│           Node                  │
│  ┌──────────┐  ┌──────────┐     │
│  │ Agent 1  │  │ Agent 2  │     │
│  │  + OPA   │  │  + OPA   │     │
│  └────┬─────┘  └─────┬────┘     │
│       │              │          │
│       └──────┬───────┘          │
│              │                  │
│  ┌───────────┴──────────────┐   │
│  │  SPIRE Agent (DaemonSet) │   │
│  └──────────────────────────┘   │
└─────────────────────────────────┘

Pros:

  • Efficient resource usage
  • Simplified SPIRE management
  • Faster pod startup

Cons:

  • SPIRE Agent is shared resource
  • Node-level failure affects all agents

Pattern 3: Centralized OPA

Central OPA cluster for all authorization decisions:

1
2
3
4
5
6
7
8
9
10
┌──────────┐  ┌──────────┐  ┌──────────┐
│ Agent 1  │  │ Agent 2  │  │ Agent 3  │
└────┬─────┘  └─────┬────┘  └─────┬────┘
     │              │              │
     └──────────────┼──────────────┘
                    │
         ┌──────────┴───────────┐
         │   OPA Cluster        │
         │  (HA, Load Balanced) │
         └──────────────────────┘

Pros:

  • Centralized policy management
  • Reduced per-agent overhead
  • Easier policy updates

Cons:

  • Network dependency for authz
  • Potential bottleneck
  • Higher latency

Security Considerations

Production Security Checklist

Before deploying to production, verify:

  • SPIRE trust domain is properly configured
  • SPIRE registration entries use restrictive selectors
  • OPA policies default to deny
  • TLS 1.3 is enforced (minimum 1.2)
  • Peer verification is set to "strict"
  • Audit logging is enabled
  • Secrets are stored in vault/secret manager
  • Network policies restrict agent communication
  • Resource limits are set on containers
  • Health checks are configured
  • Monitoring and alerting are active

Common Security Pitfalls

  1. Overly Permissive SPIRE Selectors
    1
    2
    3
    4
    5
    6
    7
    8
    9
    10
    11
    
    # BAD: Too broad
    spire-server entry create \
      -spiffeID spiffe://example.com/agent/my-agent \
      -selector k8s:ns:default
    
    # GOOD: Specific
    spire-server entry create \
      -spiffeID spiffe://example.com/agent/my-agent \
      -selector k8s:ns:production \
      -selector k8s:sa:my-agent \
      -selector k8s:pod-label:app:my-agent
    
  2. Weak OPA Policies
    1
    2
    3
    4
    5
    6
    7
    8
    9
    10
    11
    12
    
    # BAD: Allows everything
    package agentweave.authz
    default allow = true
    
    # GOOD: Explicit allow rules
    package agentweave.authz
    default allow = false
    
    allow {
        input.caller_spiffe_id == "spiffe://example.com/agent/orchestrator"
        input.action == "search"
    }
    
  3. Disabling Peer Verification
    1
    2
    3
    4
    5
    6
    7
    
    # BAD: Don't do this in production
    transport:
      peer_verification: "none"
    
    # GOOD: Always verify
    transport:
      peer_verification: "strict"
    

Environment-Specific Guides

Choose your deployment target:

Container Platforms

Cloud Providers

Advanced Scenarios

Quick Start by Environment

Local Development (5 minutes)

1
2
3
4
5
6
7
8
9
# Clone starter template
git clone https://github.com/aj-geddes/agentweave-starter.git
cd agentweave-starter

# Start infrastructure
docker-compose up -d

# Deploy your agent
docker-compose up agent-search

See Docker Guide for details.

Kubernetes Production (30 minutes)

1
2
3
4
5
6
7
8
9
10
# Install SPIRE
kubectl apply -f https://github.com/aj-geddes/agentweave/releases/latest/download/spire.yaml

# Install OPA
kubectl apply -f https://github.com/aj-geddes/agentweave/releases/latest/download/opa.yaml

# Deploy agent with Helm
helm install my-agent agentweave/agentweave \
  --set agent.name=my-agent \
  --set spiffe.trustDomain=example.com

See Kubernetes Guide for details.

AWS EKS (1 hour)

1
2
3
4
5
6
7
8
9
10
# Create EKS cluster with Terraform
terraform apply

# Install SPIRE with OIDC federation
kubectl apply -f manifests/spire-aws.yaml

# Deploy agent with IAM roles
helm install my-agent agentweave/agentweave \
  --set cloud.provider=aws \
  --set aws.iamRole=arn:aws:iam::123456789012:role/my-agent

See AWS Guide for details.

Resource Requirements

Minimum Requirements (Per Agent)

Component CPU Memory Storage
Agent Container 100m 128Mi 1Gi
SPIRE Agent (sidecar) 50m 64Mi 100Mi
OPA (sidecar) 50m 64Mi 100Mi
Total 200m 256Mi ~1.2Gi
Component CPU Memory Storage
Agent Container 500m 512Mi 10Gi
SPIRE Agent (sidecar) 100m 128Mi 1Gi
OPA (sidecar) 100m 128Mi 1Gi
Total 700m 768Mi ~12Gi

Scaling Considerations

  • Horizontal Scaling: Add more agent replicas for throughput
  • Vertical Scaling: Increase resources for complex workloads
  • SPIRE Server: 2-4 CPU, 4-8Gi memory for 1000+ workloads
  • OPA: Consider centralized cluster for 50+ agents

Monitoring and Observability

All deployment guides include setup for:

  • Metrics: Prometheus endpoints on port 9090
  • Traces: OpenTelemetry OTLP export
  • Logs: Structured JSON to stdout
  • Health Checks: Liveness and readiness probes

Example Prometheus scrape config:

1
2
3
4
5
6
7
8
9
10
11
scrape_configs:
  - job_name: 'agentweave-agents'
    kubernetes_sd_configs:
      - role: pod
    relabel_configs:
      - source_labels: [__meta_kubernetes_pod_label_app]
        regex: agentweave-agent
        action: keep
      - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
        action: keep
        regex: true

Troubleshooting

Common deployment issues and solutions:

SPIRE Agent Connection Failed

1
2
3
4
5
6
7
8
# Check SPIRE Agent socket
kubectl exec -it my-agent-pod -c agent -- ls -l /run/spire/sockets/

# Verify SPIRE Agent is running
kubectl get pods -l app=spire-agent

# Check SPIRE Agent logs
kubectl logs -l app=spire-agent -f

OPA Policy Denied

1
2
3
4
5
6
7
8
9
10
11
# Test policy locally
agentweave authz check \
  --caller spiffe://example.com/agent/caller \
  --callee spiffe://example.com/agent/callee \
  --action search

# Check OPA logs
kubectl logs -l app=opa -f

# Validate policy syntax
opa test policies/

Agent Not Getting SVID

1
2
3
4
5
6
7
8
# Check SPIRE registration entry
spire-server entry show -spiffeID spiffe://example.com/agent/my-agent

# Verify selectors match pod
kubectl get pod my-agent-pod -o yaml | grep -A 10 metadata

# Check SPIRE Agent attestation
kubectl logs -l app=spire-agent | grep attestation

Next Steps

  1. Choose your deployment environment from the guides above
  2. Review Security Best Practices
  3. Set up Monitoring and Observability
  4. Plan your High Availability strategy

Related Documentation: