Threat Model

This document describes the AgentWeave SDK threat model: what assets we protect, who we defend against, what attack vectors we consider, and how we mitigate threats.

Table of Contents

  1. Threat Model
    1. Assets to Protect
      1. Primary Assets
      2. Secondary Assets
    2. Trust Boundaries
      1. Boundary 1: Agent Process
      2. Boundary 2: Network Communication
      3. Boundary 3: Authorization Decision
      4. Boundary 4: External Systems
    3. Threat Actors
      1. Tier 1: Opportunistic Attackers
      2. Tier 2: Skilled Attackers
      3. Tier 3: Advanced Persistent Threats (APT)
      4. Tier 4: Insider Threats
    4. STRIDE Analysis
      1. Spoofing Identity
      2. Tampering with Data
      3. Repudiation
      4. Information Disclosure
      5. Denial of Service (DoS)
      6. Elevation of Privilege
    5. Attack Vectors
      1. 1. Network-Based Attacks
      2. 2. Identity-Based Attacks
      3. 3. Authorization Attacks
      4. 4. Application-Level Attacks
      5. 5. Operational Attacks
    6. Residual Risks
      1. 1. SPIRE Infrastructure Compromise
      2. 2. Policy Misconfiguration
      3. 3. Application Logic Vulnerabilities
      4. 4. Distributed Denial of Service
      5. 5. Zero-Day Exploits
      6. 6. Insider Threats with Legitimate Access
    7. Assumptions
    8. Threat Model Maintenance
    9. References
    10. Next Steps

Assets to Protect

Primary Assets

  1. Agent Identity Credentials (SVIDs)
    • X.509 certificates issued by SPIRE
    • Private keys associated with identities
    • SPIFFE IDs (identity URIs)
    • Impact if compromised: Impersonation attacks, unauthorized access
  2. Authorization Policies
    • OPA policy files (Rego)
    • Policy data (allowlists, capabilities)
    • Impact if compromised: Unauthorized access, privilege escalation
  3. Agent Communication
    • Request/response payloads
    • Task parameters and results
    • Context data passed between agents
    • Impact if compromised: Data leakage, tampering
  4. Configuration Data
    • Agent configurations (YAML)
    • SPIRE connection details
    • OPA endpoints
    • Impact if compromised: System misconfiguration, security bypass attempts
  5. Audit Logs
    • Authorization decisions
    • Request traces
    • Security events
    • Impact if compromised: Loss of forensic evidence, compliance violations

Secondary Assets

  1. Agent Availability
    • Uptime and responsiveness
    • Resource capacity
    • Impact if compromised: Denial of service
  2. System Integrity
    • Agent binary and dependencies
    • Configuration immutability
    • Impact if compromised: Code execution, backdoors

Trust Boundaries

Boundary 1: Agent Process

1
2
3
4
5
6
7
8
9
10
11
12
13
┌─────────────────────────────────────────────┐
│         Agent Process (Trusted)             │
│  • Agent code                               │
│  • AgentWeave SDK                           │
│  • Python runtime                           │
└─────────────────────────────────────────────┘
              │ SPIFFE Workload API
              │ (Unix domain socket)
┌─────────────▼─────────────────────────────┐
│      SPIRE Agent (Trusted)                │
│  • Issues SVIDs                           │
│  • Validates workload identity            │
└───────────────────────────────────────────┘

Crossing this boundary:

  • Agent fetches SVIDs via Unix socket (local only)
  • Attestation proves agent identity to SPIRE
  • Trust established through platform-specific attestation

Boundary 2: Network Communication

1
2
3
4
5
6
7
8
9
10
11
12
13
14
┌─────────────────┐                  ┌─────────────────┐
│   Agent A       │                  │   Agent B       │
│   (Trusted)     │                  │   (Untrusted)   │
└────────┬────────┘                  └────────┬────────┘
         │                                    │
         │  mTLS Handshake                    │
         │  • Verify peer SVID                │
         │  • Validate trust domain           │
         │◄───────────────────────────────────►│
         │                                    │
         │  Encrypted Communication           │
         │  • TLS 1.3                         │
         │  • Mutual authentication           │
         │◄───────────────────────────────────►│

Crossing this boundary:

  • mTLS handshake with peer verification
  • Trust domain validation
  • All data encrypted in transit
  • Authorization check before processing request

Boundary 3: Authorization Decision

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
┌─────────────────────────────────────────────┐
│           Agent Runtime (Trusted)           │
│                                             │
│  Request received ──────┐                  │
│                         │                  │
│                         ▼                  │
│              ┌──────────────────┐          │
│              │  OPA Authorization│          │
│              │  (Policy Engine) │          │
│              └──────────────────┘          │
│                         │                  │
│                         ▼                  │
│            Allow / Deny Decision           │
│                         │                  │
│         ┌───────────────┴────────────┐     │
│         ▼                            ▼     │
│   Execute Capability          Log & Reject │
└─────────────────────────────────────────────┘

Crossing this boundary:

  • Every capability invocation checked
  • Default deny policy enforced
  • Authorization context evaluated
  • Decision logged for audit

Boundary 4: External Systems

1
2
3
4
5
6
7
8
9
10
11
┌─────────────────┐
│   Agent         │
└────────┬────────┘
         │
         │ (Application-level)
         ▼
┌─────────────────┐
│  External API   │
│  Database       │
│  Service        │
└─────────────────┘

Crossing this boundary:

  • Agent is responsible for authentication to external systems
  • AgentWeave provides secure context but not external auth
  • Application must implement additional security controls

Threat Actors

Tier 1: Opportunistic Attackers

  • Motivation: Casual exploitation, automated attacks
  • Capabilities: Script kiddies, automated scanners
  • Mitigations: Default security, secure defaults, input validation

Tier 2: Skilled Attackers

  • Motivation: Targeted data theft, service disruption
  • Capabilities: Exploit development, social engineering
  • Mitigations: Defense in depth, monitoring, incident response

Tier 3: Advanced Persistent Threats (APT)

  • Motivation: Long-term access, espionage, sabotage
  • Capabilities: Zero-day exploits, supply chain attacks, insider threats
  • Mitigations: Comprehensive defense, threat hunting, security audits

Tier 4: Insider Threats

  • Motivation: Malicious insiders, compromised credentials
  • Capabilities: Legitimate access, insider knowledge
  • Mitigations: Least privilege, audit logging, separation of duties

STRIDE Analysis

Spoofing Identity

Threats:

  • Attacker impersonates legitimate agent
  • Stolen or forged credentials
  • SVID replay attacks

Mitigations:

  • ✅ SPIFFE cryptographic identity (cannot forge without private key)
  • ✅ Platform attestation prevents identity theft
  • ✅ Short-lived SVIDs (1 hour TTL) limit exposure
  • ✅ Automatic rotation prevents replay
  • ✅ Mutual TLS verifies both parties
  • ✅ Trust domain validation prevents cross-domain attacks

Residual Risk: Low - requires compromise of SPIRE server or node

Tampering with Data

Threats:

  • Man-in-the-middle (MITM) attacks
  • Message modification in transit
  • Request payload tampering

Mitigations:

  • ✅ TLS 1.3 encryption protects data in transit
  • ✅ Perfect forward secrecy (PFS) prevents decryption of past sessions
  • ✅ Message integrity via TLS MAC
  • ✅ Mutual authentication prevents MITM
  • ✅ Input validation at application layer

Residual Risk: Low - requires breaking TLS 1.3 cryptography

Repudiation

Threats:

  • Agent denies performing action
  • Caller denies making request
  • No proof of authorization decision

Mitigations:

  • ✅ Comprehensive audit logging with SPIFFE IDs
  • ✅ Request traces with distributed trace IDs
  • ✅ Authorization decisions logged
  • ✅ Immutable audit logs (when using write-once storage)
  • ✅ Timestamps and correlation IDs

Residual Risk: Low - full audit trail maintained

Information Disclosure

Threats:

  • Eavesdropping on communication
  • Log exposure with sensitive data
  • Error messages leak system info
  • Unauthorized access to capabilities

Mitigations:

  • ✅ TLS 1.3 encryption for all communication
  • ✅ No plaintext fallback possible
  • ✅ Authorization enforced before data access
  • ✅ Secure error handling (no stack traces to callers)
  • ✅ Audit logs configurable to redact sensitive fields
  • ✅ Network policies restrict communication paths

Residual Risk: Medium - application must protect data in use

Denial of Service (DoS)

Threats:

  • Request flooding
  • Resource exhaustion
  • SVID rotation failures
  • OPA policy evaluation overload

Mitigations:

  • ✅ Rate limiting per capability
  • ✅ Circuit breakers prevent cascading failures
  • ✅ Request timeouts
  • ✅ Resource limits (memory, CPU)
  • ✅ Connection limits
  • ⚠️ Application-level DoS protection required

Residual Risk: Medium - distributed DoS harder to mitigate

Elevation of Privilege

Threats:

  • Bypass authorization checks
  • Exploit policy logic bugs
  • Gain unauthorized capabilities
  • Privilege escalation through federation

Mitigations:

  • ✅ Default deny authorization
  • ✅ OPA policy enforcement cannot be bypassed
  • ✅ Least privilege by default
  • ✅ Trust domain isolation
  • ✅ Capability-level granularity
  • ✅ No security bypass flags or configs
  • ✅ Policy testing framework

Residual Risk: Low - requires policy misconfiguration

Attack Vectors

1. Network-Based Attacks

MITM Attack on Agent Communication

Attack: Attacker intercepts traffic between agents

Mitigations:

  • mTLS required for all communication
  • Peer verification prevents MITM
  • Network policies restrict routing
  • No plaintext fallback

Likelihood: Very Low Impact: High Risk: Low

TLS Downgrade Attack

Attack: Force use of weaker TLS version

Mitigations:

  • TLS 1.3 enforced (min version)
  • No version negotiation allowed
  • Cipher suite hardening
  • Secure renegotiation disabled

Likelihood: Very Low Impact: High Risk: Low

2. Identity-Based Attacks

SVID Theft

Attack: Steal SVID from agent process

Mitigations:

  • SVIDs stored in memory only
  • Platform attestation required for issuance
  • Short TTL (1 hour)
  • Process isolation
  • Read-only root filesystem

Likelihood: Low Impact: High Risk: Medium

SPIRE Server Compromise

Attack: Compromise SPIRE server to issue fraudulent SVIDs

Mitigations:

  • SPIRE server should run in isolated environment
  • Database encryption
  • Server authentication
  • Monitoring and alerting
  • Regular security audits

Likelihood: Low Impact: Critical Risk: High (external to SDK)

3. Authorization Attacks

Policy Bypass

Attack: Circumvent OPA policy evaluation

Mitigations:

  • Policy enforcement in SDK code path (cannot bypass)
  • No configuration to disable authorization
  • Complete mediation (all requests checked)
  • Default deny

Likelihood: Very Low Impact: Critical Risk: Low

Policy Logic Bugs

Attack: Exploit bugs in Rego policies

Mitigations:

  • Policy testing framework
  • Example policies provided
  • Policy review best practices
  • OPA built-in functions (audited)

Likelihood: Medium Impact: High Risk: Medium

4. Application-Level Attacks

Malicious Capability Implementation

Attack: Developer writes capability that leaks data

Mitigations:

  • ⚠️ Application responsibility
  • Code review required
  • Context isolation per request
  • Audit logging of all invocations

Likelihood: Medium Impact: High Risk: Medium (application-level)

Dependency Vulnerabilities

Attack: Exploit vulnerability in Python dependencies

Mitigations:

  • Regular dependency scanning
  • Pin dependency versions
  • Security update process
  • Minimal dependencies

Likelihood: Medium Impact: Varies Risk: Medium

5. Operational Attacks

Configuration Tampering

Attack: Modify agent configuration to weaken security

Mitigations:

  • Configuration validation at startup
  • Immutable config (ConfigMap in K8s)
  • No flags to disable security
  • Config file checksums

Likelihood: Low Impact: High Risk: Medium

Log Tampering

Attack: Modify audit logs to hide attacks

Mitigations:

  • Write-once storage (recommended)
  • Send logs to SIEM immediately
  • Log signing (optional)
  • Access controls on log files

Likelihood: Medium Impact: Medium Risk: Medium

Residual Risks

Even with all mitigations, some risks remain:

1. SPIRE Infrastructure Compromise

Risk: If SPIRE server is compromised, attacker can issue arbitrary SVIDs

Acceptance Rationale:

  • SPIRE security is external to SDK
  • Follow SPIRE production deployment guide
  • Use HA setup with monitoring
  • Regular security audits

2. Policy Misconfiguration

Risk: Overly permissive OPA policies grant excessive access

Acceptance Rationale:

  • Policy testing is operator responsibility
  • Provide policy examples and best practices
  • Policy review tooling recommended
  • Start with restrictive policies, expand as needed

3. Application Logic Vulnerabilities

Risk: Bugs in capability implementation compromise security

Acceptance Rationale:

  • Application code is outside SDK scope
  • SDK provides secure primitives
  • Audit logging enables detection
  • Secure coding guidelines provided

4. Distributed Denial of Service

Risk: Coordinated attack from many sources overwhelms agent

Acceptance Rationale:

  • DDoS mitigation is infrastructure concern
  • Use cloud provider DDoS protection
  • Rate limiting reduces impact
  • Circuit breakers prevent cascading failures

5. Zero-Day Exploits

Risk: Unknown vulnerabilities in dependencies (Python, OpenSSL, etc.)

Acceptance Rationale:

  • Keep dependencies updated
  • Subscribe to security advisories
  • Incident response plan required
  • Defense in depth limits blast radius

6. Insider Threats with Legitimate Access

Risk: Malicious insider with valid credentials abuses access

Acceptance Rationale:

  • Least privilege limits damage
  • Comprehensive audit logging enables detection
  • Separation of duties recommended
  • Anomaly detection can flag suspicious behavior

Assumptions

The threat model makes these assumptions:

  1. SPIRE is Securely Deployed
    • SPIRE server is protected and monitored
    • Platform attestation is trustworthy
    • Node agents are authentic
  2. Platform is Trusted
    • Kubernetes/OS is not compromised
    • Container runtime is secure
    • Host OS kernel is trustworthy
  3. Network Infrastructure
    • Network stack is reliable
    • DNS is not poisoned
    • Time synchronization is accurate (for cert validation)
  4. Cryptography is Sound
    • TLS 1.3 implementation is correct
    • X.509 certificate validation works
    • Random number generators are secure
  5. Operator Competence
    • Security best practices followed
    • Regular updates applied
    • Monitoring and alerting configured
    • Incident response plan exists

Threat Model Maintenance

This threat model should be reviewed:

  • Quarterly: Routine review of new threats
  • When: New features are added to SDK
  • When: New attack vectors are published
  • When: Security incidents occur

References

Next Steps