Audit Logging Guide

Audit logging is critical for security monitoring, compliance, incident response, and forensics. AgentWeave provides comprehensive audit logging for all security-relevant events.

Table of Contents

  1. Audit Logging Guide
    1. What Gets Logged
      1. 1. Authorization Decisions
      2. 2. Capability Invocations
      3. 3. Identity Events
      4. 4. Authentication Events
      5. 5. Security Events
      6. 6. Agent Lifecycle Events
    2. Audit Log Configuration
      1. Basic Configuration
      2. Log Levels
      3. Field Selection
      4. Payload Logging
    3. Log Destinations
      1. 1. File Destination
      2. 2. Syslog Destination
      3. 3. Cloud Logging
      4. 4. SIEM Integration
    4. Log Retention
      1. Retention Requirements
      2. Retention Configuration
      3. Archive to Cold Storage
    5. Log Analysis
      1. Common Queries
      2. Security Queries
    6. Alerting on Security Events
      1. Prometheus Alerts
      2. SIEM Alerts
    7. Compliance Reporting
      1. SOC 2 Audit Report
      2. HIPAA Access Report
      3. PCI DSS Cardholder Data Access
    8. Log Security
      1. Protect Log Files
      2. Encrypt Logs in Transit
      3. Sign Logs
      4. Immutable Storage
    9. Best Practices
      1. Do's
      2. Don'ts
    10. Troubleshooting
      1. Logs Not Appearing
      2. High Log Volume
    11. Summary
    12. Next Steps

What Gets Logged

AgentWeave logs security-relevant events at multiple layers:

1. Authorization Decisions

Every authorization check is logged:

{
  "timestamp": "2024-01-15T10:30:00.123Z",
  "level": "info",
  "event_type": "authorization",
  "caller_spiffe_id": "spiffe://example.com/agent/api-gateway/prod",
  "callee_spiffe_id": "spiffe://example.com/agent/data-processor/prod",
  "capability": "process_data",
  "action": "execute",
  "decision": "allow",
  "reason": "same_trust_domain",
  "trace_id": "a1b2c3d4-e5f6-7890-abcd-ef1234567890",
  "span_id": "1234567890abcdef"
}

Logged fields:

  • timestamp: ISO 8601 timestamp with milliseconds
  • event_type: Type of event (authorization, capability_call, etc.)
  • caller_spiffe_id: Who made the request
  • callee_spiffe_id: Who received the request
  • capability: Capability being invoked
  • action: Specific action (execute, query, etc.)
  • decision: allow or deny
  • reason: Why access was allowed/denied (from OPA)
  • trace_id: Distributed trace ID for correlation
  • span_id: Span ID for detailed tracing
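As a rough illustration of how a downstream consumer might check these fields, the sketch below parses one JSON log line and verifies that the documented authorization fields are present. The function name `parse_authorization_record` is hypothetical and not part of AgentWeave; only the field names come from the list above.

```python
import json
from datetime import datetime

# Field names taken from the "Logged fields" list above.
REQUIRED_FIELDS = (
    "timestamp", "event_type", "caller_spiffe_id", "callee_spiffe_id",
    "capability", "action", "decision", "reason", "trace_id", "span_id",
)


def parse_authorization_record(line: str) -> dict:
    """Parse one JSON log line and check the documented authorization fields."""
    record = json.loads(line)
    missing = [name for name in REQUIRED_FIELDS if name not in record]
    if missing:
        raise ValueError(f"missing fields: {', '.join(missing)}")
    if record["decision"] not in ("allow", "deny"):
        raise ValueError(f"unexpected decision: {record['decision']!r}")
    # Swap the trailing "Z" for an explicit offset so that
    # datetime.fromisoformat also accepts it on Python versions before 3.11.
    record["timestamp"] = datetime.fromisoformat(
        record["timestamp"].replace("Z", "+00:00")
    )
    return record
```

A consumer could call this on every line of an exported log and route records that raise `ValueError` to a dead-letter queue for review.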

2. Capability Invocations

Every capability call is logged:

{
  "timestamp": "2024-01-15T10:30:00.456Z",
  "level": "info",
  "event_type": "capability_call",
  "caller_spiffe_id": "spiffe://example.com/agent/api-gateway/prod",
  "callee_spiffe_id": "spiffe://example.com/agent/data-processor/prod",
  "capability": "process_data",
  "status": "success",
  "duration_ms": 123.45,
  "trace_id": "a1b2c3d4-e5f6-7890-abcd-ef1234567890",
  "span_id": "abcdef1234567890"
}

3. Identity Events

SVID rotation and identity changes:

{
  "timestamp": "2024-01-15T10:15:00.789Z",
  "level": "info",
  "event_type": "svid_update",
  "spiffe_id": "spiffe://example.com/agent/data-processor/prod",
  "expiry": "2024-01-15T11:15:00Z",
  "ttl_seconds": 3600,
  "trust_domain": "example.com"
}

4. Authentication Events

mTLS handshake results:

{
  "timestamp": "2024-01-15T10:30:00.012Z",
  "level": "info",
  "event_type": "authentication",
  "peer_spiffe_id": "spiffe://example.com/agent/api-gateway/prod",
  "peer_trust_domain": "example.com",
  "tls_version": "1.3",
  "cipher_suite": "TLS_AES_256_GCM_SHA384",
  "status": "success"
}

5. Security Events

Anomalies and security-relevant events:

{
  "timestamp": "2024-01-15T10:30:05.678Z",
  "level": "warning",
  "event_type": "security_event",
  "description": "High rate of authorization denials",
  "caller_spiffe_id": "spiffe://unknown.com/agent/suspicious",
  "denial_count": 50,
  "time_window_seconds": 60
}
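The guide does not describe how AgentWeave detects a high denial rate. One plausible approach is a per-caller sliding window, sketched below. The class `DenialRateMonitor` and its parameters are hypothetical; the emitted dictionary mirrors the fields of the example event above.

```python
from collections import defaultdict, deque


class DenialRateMonitor:
    """Flag callers whose denials within a sliding window reach a threshold."""

    def __init__(self, threshold: int = 50, window_seconds: float = 60.0):
        self.threshold = threshold
        self.window_seconds = window_seconds
        self._denials = defaultdict(deque)  # caller SPIFFE ID -> denial times

    def record_denial(self, caller_spiffe_id: str, now: float):
        """Record one denial; return a security_event dict at or above the threshold."""
        times = self._denials[caller_spiffe_id]
        times.append(now)
        # Drop denials that have fallen out of the window.
        while times and now - times[0] > self.window_seconds:
            times.popleft()
        if len(times) < self.threshold:
            return None
        return {
            "event_type": "security_event",
            "level": "warning",
            "description": "High rate of authorization denials",
            "caller_spiffe_id": caller_spiffe_id,
            "denial_count": len(times),
            "time_window_seconds": int(self.window_seconds),
        }
```

Passing the clock in as `now` keeps the logic deterministic and easy to test; a production caller would typically pass `time.monotonic()`.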

6. Agent Lifecycle Events

Startup, shutdown, configuration changes:

{
  "timestamp": "2024-01-15T10:00:00.000Z",
  "level": "info",
  "event_type": "agent_start",
  "agent_spiffe_id": "spiffe://example.com/agent/data-processor/prod",
  "version": "1.0.0",
  "config_hash": "sha256:abc123..."
}

Audit Log Configuration

Basic Configuration

Enable audit logging in your agent configuration:

observability:
  audit_log:
    enabled: true
    level: "info"  # debug, info, warning, error
    format: "json"  # json or text

Log Levels

Choose an appropriate log level:

observability:
  audit_log:
    level: "info"

Levels:

  • debug: All events including verbose diagnostics
  • info: Normal operational events (recommended)
  • warning: Warnings and errors only
  • error: Errors only

Recommendations:

  • Production: info (captures all security events)
  • Development: debug (helps debugging)
  • High-volume: warning (reduces log volume, but drops info-level records such as the authorization and capability-call events shown above)
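To make the trade-off concrete, the sketch below assumes the conventional threshold semantics, where an event is written only if its level is at or above the configured level. The numeric ordering and the function `is_emitted` are illustrative assumptions, not AgentWeave internals.

```python
# Assumed severity ordering; higher numbers are more severe.
LEVEL_ORDER = {"debug": 10, "info": 20, "warning": 30, "error": 40}


def is_emitted(event_level: str, configured_level: str) -> bool:
    """Return True if an event at event_level passes the configured threshold."""
    return LEVEL_ORDER[event_level] >= LEVEL_ORDER[configured_level]


# An info-level authorization record is kept at "info" but dropped at "warning".
kept_at_info = is_emitted("info", "info")
kept_at_warning = is_emitted("info", "warning")
```

Under these semantics, raising the level to warning removes routine allow decisions from the audit trail, which may matter for compliance evidence.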

Field Selection

Control which fields are logged:

observability:
  audit_log:
    fields:
      - "timestamp"
      - "event_type"
      - "caller_spiffe_id"
      - "callee_spiffe_id"
      - "capability"
      - "action"
      - "decision"
      - "reason"
      - "trace_id"
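The effect of a field list can be pictured as a projection over each record. The following sketch is an illustration only, with `select_fields` as a hypothetical helper; it keeps the configured fields and silently skips any that a given event type does not carry.

```python
def select_fields(record: dict, fields: list) -> dict:
    """Return a copy of record with only the configured fields, in config order."""
    return {name: record[name] for name in fields if name in record}


configured = ["timestamp", "event_type", "caller_spiffe_id", "decision", "trace_id"]
full_record = {
    "timestamp": "2024-01-15T10:30:00.123Z",
    "level": "info",
    "event_type": "authorization",
    "caller_spiffe_id": "spiffe://example.com/agent/api-gateway/prod",
    "decision": "allow",
    "span_id": "1234567890abcdef",
}
slim_record = select_fields(full_record, configured)
```

Here `slim_record` omits `level` and `span_id`, and also omits `trace_id` because the input record does not contain it.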

Payload Logging

Warning: Logging payloads can expose sensitive data.

observability:
  audit_log:
    include_payloads: false  # Recommended for production

    # If you must log payloads, redact sensitive fields
    redact_fields:
      - "password"
      - "ssn"
      - "credit_card"
      - "api_key"

Best Practice: Never log payloads in production unless required for compliance and properly secured.
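How redaction is applied internally is not documented here. A minimal sketch, assuming that matching is by key name at any nesting depth, might look like the following. The `redact` function and the `[REDACTED]` mask are hypothetical.

```python
# Key names from the redact_fields example above.
REDACT_FIELDS = frozenset({"password", "ssn", "credit_card", "api_key"})


def redact(value, redact_fields=REDACT_FIELDS, mask="[REDACTED]"):
    """Recursively mask the values of sensitive keys in dicts and lists."""
    if isinstance(value, dict):
        return {
            key: mask if key.lower() in redact_fields
            else redact(item, redact_fields, mask)
            for key, item in value.items()
        }
    if isinstance(value, list):
        return [redact(item, redact_fields, mask) for item in value]
    return value  # Scalars pass through unchanged.
```

Key-based redaction cannot catch sensitive values stored under unexpected names or embedded in free-text fields, which is one more reason to keep `include_payloads` disabled where possible.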


Log Destinations

1. File Destination

Write logs to a local file:

observability:
  audit_log:
    destination: "file"
    file_path: "/var/log/agentweave/audit.log"
    max_size_mb: 100
    max_backups: 10
    max_age_days: 30
    compress: true

Considerations:

  • Set up log rotation (max_size_mb, max_backups)
  • Ensure sufficient disk space
  • Protect file with proper permissions (600)
  • Not recommended as the only destination in production (use centralized logging)

2. Syslog Destination

Send logs to a syslog server:

observability:
  audit_log:
    destination: "syslog"
    syslog_address: "logs.example.com:514"
    syslog_protocol: "tcp"  # tcp, udp, or tls
    syslog_facility: "local0"
    syslog_tag: "agentweave-audit"

Protocols:

  • tcp: Reliable delivery (recommended)
  • udp: Lower overhead, may lose logs
  • tls: Encrypted syslog (port 6514)

TLS Syslog:

observability:
  audit_log:
    destination: "syslog"
    syslog_address: "logs.example.com:6514"
    syslog_protocol: "tls"
    syslog_tls_verify: true
    syslog_tls_ca_cert: "/etc/ssl/syslog-ca.pem"

3. Cloud Logging

AWS CloudWatch

observability:
  audit_log:
    destination: "cloudwatch"
    cloudwatch_group: "/aws/agentweave/audit"
    cloudwatch_stream: "agent-data-processor-prod"
    cloudwatch_region: "us-east-1"

Google Cloud Logging

observability:
  audit_log:
    destination: "gcp_logging"
    gcp_project: "my-project"
    gcp_log_name: "agentweave-audit"

Azure Monitor

observability:
  audit_log:
    destination: "azure_monitor"
    workspace_id: "12345678-1234-1234-1234-123456789012"
    workspace_key_env: "AZURE_WORKSPACE_KEY"

4. SIEM Integration

Splunk

observability:
  audit_log:
    destination: "splunk"
    splunk_url: "https://splunk.example.com:8088"
    splunk_token_env: "SPLUNK_HEC_TOKEN"
    splunk_index: "agentweave_audit"
    splunk_source: "agentweave"
    splunk_sourcetype: "agentweave:audit"

Elastic Stack (ELK)

observability:
  audit_log:
    destination: "elasticsearch"
    elasticsearch_url: "https://elasticsearch.example.com:9200"
    elasticsearch_index: "agentweave-audit"
    elasticsearch_api_key_env: "ELASTIC_API_KEY"

Datadog

observability:
  audit_log:
    destination: "datadog"
    datadog_api_key_env: "DD_API_KEY"
    datadog_site: "datadoghq.com"
    datadog_service: "agentweave"
    datadog_source: "audit"

Log Retention

Retention Requirements

Configure retention based on compliance needs:

Compliance    Minimum Retention
SOC 2         1 year
HIPAA         6 years
PCI DSS       1 year (3 months online)
GDPR          As needed for purpose
FedRAMP       1 year

Retention Configuration

In Cloud Logging

AWS CloudWatch:

aws logs put-retention-policy \
  --log-group-name /aws/agentweave/audit \
  --retention-in-days 2557  # 7 years; only fixed values are accepted (HIPAA's 6-year minimum maps to 2192)
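CloudWatch Logs accepts only a fixed set of retention periods, so a value computed as years × 365 can be rejected by `put-retention-policy`. The sketch below picks the smallest accepted value that covers a given number of years. The helper name `cloudwatch_retention_for` is hypothetical.

```python
# Retention periods (in days) accepted by CloudWatch Logs put-retention-policy.
CLOUDWATCH_RETENTION_DAYS = (
    1, 3, 5, 7, 14, 30, 60, 90, 120, 150, 180, 365, 400, 545, 731,
    1096, 1827, 2192, 2557, 2922, 3288, 3653,
)


def cloudwatch_retention_for(years: int) -> int:
    """Return the smallest accepted retention value covering `years` years."""
    needed = years * 365
    for days in CLOUDWATCH_RETENTION_DAYS:
        if days >= needed:
            return days
    raise ValueError(f"{years} years exceeds the longest fixed retention period")
```

For example, a six-year HIPAA requirement maps to 2192 days and a seven-year policy maps to 2557 days.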

GCP Logging:

gcloud logging buckets update _Default \
  --location=global \
  --retention-days=2555

Azure Monitor:

az monitor log-analytics workspace update \
  --resource-group myResourceGroup \
  --workspace-name myWorkspace \
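  --retention-time 730  # Workspace-level retention is capped at 730 days; keep older data via table-level long-term retention or export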

In SIEM

Configure retention in your SIEM:

Splunk:

[agentweave_audit]
coldPath = $SPLUNK_DB/agentweave_audit/colddb
homePath = $SPLUNK_DB/agentweave_audit/db
thawedPath = $SPLUNK_DB/agentweave_audit/thaweddb
maxTotalDataSizeMB = 500000
# 7 years (7 * 365 * 86400 seconds); Splunk .conf comments must be on their own line
frozenTimePeriodInSecs = 220752000

Archive to Cold Storage

For long-term retention, archive to object storage:

# Example: Archive to S3 after 90 days
observability:
  audit_log:
    destination: "cloudwatch"
    cloudwatch_group: "/aws/agentweave/audit"
    archive:
      enabled: true
      after_days: 90
      s3_bucket: "agentweave-audit-archive"
      s3_prefix: "audit-logs/"

Log Analysis

Common Queries

Find All Access by Specific Agent

Splunk:

index=agentweave_audit caller_spiffe_id="spiffe://example.com/agent/api-gateway/prod"
| table timestamp, callee_spiffe_id, capability, decision

Elastic:

{
  "query": {
    "term": {
      "caller_spiffe_id": "spiffe://example.com/agent/api-gateway/prod"
    }
  }
}

Find All Authorization Denials

Splunk:

index=agentweave_audit event_type=authorization decision=deny
| stats count by caller_spiffe_id, reason
| sort -count

Elastic:

{
  "query": {
    "bool": {
      "must": [
        {"term": {"event_type": "authorization"}},
        {"term": {"decision": "deny"}}
      ]
    }
  },
  "aggs": {
    "by_caller": {
      "terms": {"field": "caller_spiffe_id"},
      "aggs": {
        "by_reason": {
          "terms": {"field": "reason"}
        }
      }
    }
  }
}
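When no SIEM is available, for example while working from a local JSON-lines file, the same denial breakdown can be approximated in a few lines of Python. This is a sketch under the assumption that each line is one JSON record with the fields shown earlier; `count_denials` is a hypothetical helper.

```python
import json
from collections import Counter


def count_denials(lines):
    """Count authorization denials per (caller, reason), most frequent first."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # Skip blank lines between records.
        record = json.loads(line)
        if (record.get("event_type") == "authorization"
                and record.get("decision") == "deny"):
            key = (record.get("caller_spiffe_id"), record.get("reason"))
            counts[key] += 1
    return counts.most_common()
```

Any iterable of strings works as input, so an open file handle on the audit log can be passed directly.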

Find Access to Specific Capability

Splunk:

index=agentweave_audit capability="process_sensitive_data"
| table timestamp, caller_spiffe_id, decision, duration_ms

Trace Specific Request

Splunk:

index=agentweave_audit trace_id="a1b2c3d4-e5f6-7890-abcd-ef1234567890"
| sort timestamp
| table timestamp, event_type, caller_spiffe_id, callee_spiffe_id, capability, decision

Find High-Volume Callers

Splunk:

index=agentweave_audit event_type=capability_call
| stats count by caller_spiffe_id
| sort -count
| head 20

Security Queries

Detect Brute Force Attempts

Splunk:

index=agentweave_audit event_type=authorization decision=deny
| bin _time span=1m
| stats count by _time, caller_spiffe_id
| where count > 10

Detect Unusual Access Patterns

Splunk (combinations seen fewer than 10 times are treated as unusual):

index=agentweave_audit event_type=authorization
| stats count by caller_spiffe_id, callee_spiffe_id, capability
| where count < 10

Find Access Outside Business Hours

Splunk:

index=agentweave_audit event_type=capability_call
| eval hour=tonumber(strftime(_time, "%H"))
| where hour < 6 OR hour > 20
| table timestamp, caller_spiffe_id, capability

Detect Lateral Movement

Splunk (flags callers that invoke more than 10 distinct agents):

index=agentweave_audit event_type=capability_call
| stats dc(callee_spiffe_id) as unique_targets by caller_spiffe_id
| where unique_targets > 10
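The distinct-count logic behind this query is simple enough to reproduce offline. The sketch below assumes records have already been parsed into dictionaries; `callers_with_many_targets` is a hypothetical helper, and the threshold of 10 simply mirrors the query.

```python
from collections import defaultdict


def callers_with_many_targets(records, max_targets: int = 10) -> dict:
    """Return {caller: distinct callee count} for callers above max_targets."""
    targets = defaultdict(set)
    for record in records:
        if record.get("event_type") == "capability_call":
            targets[record["caller_spiffe_id"]].add(record["callee_spiffe_id"])
    return {
        caller: len(callees)
        for caller, callees in targets.items()
        if len(callees) > max_targets
    }
```

A fixed threshold is a blunt instrument: gateways legitimately call many agents, so in practice the limit would likely need to be tuned per caller or compared against a historical baseline.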

Alerting on Security Events

Prometheus Alerts

groups:
  - name: agentweave-security
    rules:
      # High denial rate
      - alert: HighAuthzDenialRate
        expr: rate(agentweave_authz_denied_total[5m]) > 10
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "High authorization denial rate"
          description: "{{ $value }} denials per second over the last 5 minutes"

      # Unknown caller
      - alert: UnknownCallerAttempt
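        # increase() lets the alert resolve once denials stop; a raw counter stays above 0
        expr: increase(agentweave_authz_denied_total{reason="unknown_caller"}[5m]) > 0
        labels:
          severity: critical
        annotations:
          summary: "Unknown agent attempted access"
          description: "Calling agent not recognized"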

      # SVID rotation failure
      - alert: SVIDRotationFailed
        # increase() lets the alert resolve once rotation succeeds again
        expr: increase(agentweave_svid_rotation_errors_total[5m]) > 0
        labels:
          severity: critical
        annotations:
          summary: "SVID rotation failed"
          description: "Agent failed to rotate SVID"

      # Unusual capability usage
      - alert: UnusualAdminCapability
        expr: rate(agentweave_capability_calls_total{capability="admin"}[1h]) > 1
        labels:
          severity: warning
        annotations:
          summary: "Unusual admin capability usage"

SIEM Alerts

Splunk Alert: Multiple Failures from Same Caller

index=agentweave_audit event_type=authorization decision=deny
| bin _time span=5m
| stats count by _time, caller_spiffe_id
| where count > 20

Action: Send email, create ticket, trigger webhook

Elastic Watcher: Access to Sensitive Capability

{
  "trigger": {
    "schedule": {"interval": "5m"}
  },
  "input": {
    "search": {
      "request": {
        "indices": ["agentweave-audit"],
        "body": {
          "query": {
            "bool": {
              "must": [
                {"term": {"capability": "delete_all_data"}},
                {"range": {"timestamp": {"gte": "now-5m"}}}
              ]
            }
          }
        }
      }
    }
  },
  "condition": {
    "compare": {"ctx.payload.hits.total": {"gt": 0}}
  },
  "actions": {
    "send_email": {
      "email": {
        "to": "security@example.com",
        "subject": "Critical: delete_all_data capability invoked",
        "body": "Someone invoked delete_all_data capability. Review immediately."
      }
    }
  }
}

Compliance Reporting

SOC 2 Audit Report

Generate report of all authorization decisions:

Splunk:

index=agentweave_audit event_type=authorization
  earliest=-30d@d latest=now
| stats count by decision, reason
| eventstats sum(count) as total
| eval percentage=round((count/total)*100, 2)
| table decision, reason, count, percentage

HIPAA Access Report

Who accessed PHI and when:

Splunk:

index=agentweave_audit capability="get_patient_data"
  earliest=-1y@y latest=now
| table timestamp, caller_spiffe_id, decision, trace_id
| sort timestamp desc

PCI DSS Cardholder Data Access

Splunk:

index=agentweave_audit capability="process_payment"
  earliest=-1y@y latest=now
| stats count by caller_spiffe_id, decision
| table caller_spiffe_id, decision, count

Log Security

Protect Log Files

If using file destination:

# Set proper permissions
chmod 600 /var/log/agentweave/audit.log
chown agentweave:agentweave /var/log/agentweave/audit.log

# Prevent modification
chattr +a /var/log/agentweave/audit.log  # Append-only

Encrypt Logs in Transit

Use TLS for syslog:

observability:
  audit_log:
    destination: "syslog"
    syslog_protocol: "tls"
    syslog_tls_verify: true

Sign Logs

For tamper-evidence, consider log signing:

observability:
  audit_log:
    signing:
      enabled: true
      key_path: "/etc/agentweave/signing-key.pem"
      algorithm: "RS256"

Each log entry includes a signature:

{
  "timestamp": "2024-01-15T10:30:00.123Z",
  "event_type": "authorization",
  "caller_spiffe_id": "spiffe://example.com/agent/api-gateway",
  // ... other fields ...
  "signature": "eyJhbGciOiJSUzI1NiIs..."
}
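The exact signature format AgentWeave produces is not specified here. To illustrate the general idea of tamper-evidence, the sketch below signs a canonical JSON form of each entry and verifies it later. It deliberately substitutes HMAC-SHA256 (a symmetric scheme available in the Python standard library) for the asymmetric RS256 shown in the configuration, so it is a stand-in for the concept and not a compatible verifier. All function names are hypothetical.

```python
import hashlib
import hmac
import json


def canonical_bytes(entry: dict) -> bytes:
    """Serialize the entry deterministically, excluding any existing signature."""
    unsigned = {key: value for key, value in entry.items() if key != "signature"}
    return json.dumps(unsigned, sort_keys=True, separators=(",", ":")).encode("utf-8")


def sign_entry(entry: dict, key: bytes) -> dict:
    """Return a copy of the entry with an HMAC-SHA256 signature attached."""
    signature = hmac.new(key, canonical_bytes(entry), hashlib.sha256).hexdigest()
    return {**entry, "signature": signature}


def verify_entry(entry: dict, key: bytes) -> bool:
    """Recompute the signature and compare it in constant time."""
    expected = hmac.new(key, canonical_bytes(entry), hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, entry.get("signature", ""))
```

Canonical serialization matters because two JSON encodings of the same record (different key order or whitespace) would otherwise produce different signatures. With an asymmetric scheme such as RS256, verifiers need only the public key, which is usually preferable for audit logs.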

Immutable Storage

Use write-once storage for compliance:

  • AWS S3: Object Lock
  • GCP: Bucket lock
  • Azure: Immutable blob storage

AWS S3 Example:

aws s3api put-object-lock-configuration \
  --bucket agentweave-audit-archive \
  --object-lock-configuration '{
    "ObjectLockEnabled": "Enabled",
    "Rule": {
      "DefaultRetention": {
        "Mode": "COMPLIANCE",
        "Years": 7
      }
    }
  }'

Best Practices

Do's

Enable audit logging in production

observability:
  audit_log:
    enabled: true

Send logs to centralized SIEM

observability:
  audit_log:
    destination: "syslog"
    syslog_address: "siem.example.com:514"

Configure appropriate retention

observability:
  audit_log:
    retention_days: 2555  # 7 years; HIPAA requires at least 6

Set up automated alerts

# Prometheus alerts, SIEM alerts, etc.

Review logs regularly

  • Daily: Security events
  • Weekly: Access patterns
  • Monthly: Compliance reports

Test log pipeline

# Ensure logs are reaching SIEM
agentweave test-audit-log

Don'ts

Don't log sensitive payloads

observability:
  audit_log:
    include_payloads: false  # Keep this false!

Don't use only local file logging in production

# ❌ Bad for production
observability:
  audit_log:
    destination: "file"

# ✅ Good for production
observability:
  audit_log:
    destination: "syslog"

Don't ignore log volume

  • Monitor log volume metrics
  • Set up alerts for unusual volume
  • Plan capacity ahead of expected growth

Don't forget log security

  • Encrypt in transit (TLS)
  • Protect access (RBAC)
  • Prevent tampering (immutable storage)

Troubleshooting

Logs Not Appearing

Check agent logs:

kubectl logs -n agentweave pod/data-processor-abc123 | grep audit

Verify configuration:

agentweave validate config/production.yaml

Test connectivity:

# Syslog
nc -zv logs.example.com 514

# HTTPS
curl -I https://splunk.example.com:8088

High Log Volume

Reduce verbosity:

observability:
  audit_log:
    level: "warning"  # Instead of "info"

Filter events:

observability:
  audit_log:
    exclude_events:
      - "health_check"
      - "heartbeat"

Sample logs:

observability:
  audit_log:
    sampling:
      enabled: true
      rate: 0.1  # Log 10% of events
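How AgentWeave chooses which events to keep when sampling is not described here. One common design, sketched below as an assumption, is to hash the trace ID so that every event belonging to the same trace is either kept or dropped together. The function `keep_event` is hypothetical.

```python
import hashlib


def keep_event(trace_id: str, rate: float) -> bool:
    """Deterministically keep roughly `rate` of traces, based on a hash of the ID."""
    digest = hashlib.sha256(trace_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big")  # Uniform integer in [0, 2**64).
    return bucket < int(rate * 2**64)
```

Because the decision depends only on the trace ID, different agents handling the same request reach the same verdict without coordination. Note that sampling of any kind leaves gaps in the audit trail, so it may be unsuitable for events that serve as compliance evidence.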

Summary

Audit logging provides:

  • Security monitoring: Detect attacks and anomalies
  • Compliance: Evidence for auditors
  • Forensics: Investigate incidents
  • Operational insights: Understand access patterns

Key Recommendations:

  1. Enable audit logging in production
  2. Send logs to centralized SIEM
  3. Configure retention per compliance requirements
  4. Set up automated alerts
  5. Review logs regularly
  6. Protect logs from tampering

Next Steps