colony metrics

View and manage metrics for monitoring colony performance.

Synopsis

colony metrics <SUBCOMMAND>

Description

The metrics system collects and displays performance data about your colony, including agent activity, task completion, workflow execution, and system resources.

Subcommands

list

List all registered metrics with current values.

colony metrics list

Output:

Available Metrics

Agent Metrics
  agent.tasks.completed [counter] - 15.00 tasks
    Number of tasks completed by agents
  agent.tasks.failed [counter] - 2.00 tasks
    Number of tasks failed by agents

Task Metrics
  task.queue.depth [gauge] - 5.00 tasks
    Current number of tasks in the queue

Workflow Metrics
  workflow.runs.completed [counter] - 8.00 runs
    Number of workflow runs completed

System Metrics
  system.memory.used [gauge] - 1024.50 MB
    Memory used by the colony system

show

Show detailed statistics for a specific metric.

colony metrics show <NAME> [--hours HOURS]

Arguments:

NAME - Metric name (e.g., agent.tasks.completed)

Options:

--hours - Time period in hours (default: 1)

Example:

colony metrics show agent.tasks.completed --hours 24

Output:

Metric: agent.tasks.completed
Type: Counter
Description: Number of tasks completed by agents
Unit: tasks

Statistics (last 24 hours)

  Current: 45.00 tasks
  Average: 38.50 tasks
  Min: 15.00 tasks
  Max: 45.00 tasks
  Total: 462.00 tasks
  Data Points: 12

Recent Values

  2024-01-15 14:30:00 - 45.00 tasks
  2024-01-15 13:30:00 - 42.00 tasks
  2024-01-15 12:30:00 - 38.00 tasks
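
The summary figures are plain aggregates over the data points in the selected window. As an illustration only (this is not Colony's implementation), the sketch below derives the same kind of summary from a list of raw values with awk. The helper name summarize_values is hypothetical.

```shell
# summarize_values: hypothetical helper, not part of the colony CLI.
# Reads one numeric value per line on stdin (oldest first) and prints
# the same kind of summary that `colony metrics show` displays.
summarize_values() {
  awk '
    NR == 1 { min = $1 + 0; max = $1 + 0 }
    {
      if ($1 + 0 < min) min = $1 + 0
      if ($1 + 0 > max) max = $1 + 0
      total += $1
      current = $1 + 0      # last line read is the most recent value
    }
    END {
      if (NR == 0) { print "no data points"; exit }
      printf "Current: %.2f\n", current
      printf "Average: %.2f\n", total / NR
      printf "Min: %.2f\n", min
      printf "Max: %.2f\n", max
      printf "Total: %.2f\n", total
      printf "Data Points: %d\n", NR
    }
  '
}

# Usage with the three recent values shown above, oldest first
printf '%s\n' 38 42 45 | summarize_values
```

Note that for a counter the Total line adds up successive readings of a running count, so Current is usually the more meaningful figure.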

export

Export all metrics as JSON.

colony metrics export [-o FILE]

Options:

-o, --output - Output file path (default: stdout)

Example:

# Export to file
colony metrics export -o metrics.json

# Export to stdout and keep only agent metrics
colony metrics export | jq '.[] | select(.name | contains("agent"))'

JSON Format:

[
  {
    "name": "agent.tasks.completed",
    "metric_type": "Counter",
    "description": "Number of tasks completed by agents",
    "unit": "tasks",
    "points": [
      {
        "timestamp": "2024-01-15T14:30:00Z",
        "value": 45.0,
        "labels": {}
      }
    ]
  }
]
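
Each metric carries its own points array, so a common first step when post-processing the export is to reduce every metric to its most recent value. The sketch below assumes jq is installed and that timestamps are ISO 8601 UTC strings as in the sample (so they sort lexicographically). The helper name latest_values is hypothetical.

```shell
# latest_values: hypothetical helper, not part of the colony CLI.
# Reads the JSON array produced by `colony metrics export` on stdin and
# prints "<name><TAB><latest value><TAB><unit>" for each metric that
# has at least one data point.
latest_values() {
  jq -r '
    .[]
    | select(.points | length > 0)
    | [.name, (.points | max_by(.timestamp) | .value), .unit]
    | @tsv
  '
}

# Usage (requires a running colony installation):
#   colony metrics export | latest_values
```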

clear

Clear old metrics data or all metrics.

colony metrics clear [--all]

Options:

--all - Clear all metrics, not just old data

Examples:

# Clear data older than the retention period (24 hours)
colony metrics clear

# Clear all metrics
colony metrics clear --all

init

Initialize sample metrics for testing.

colony metrics init

Creates the following standard metrics:

agent.tasks.completed
agent.tasks.failed
task.queue.depth
task.execution_time
workflow.runs.completed
system.memory.used

record

Record a sample metric value (for testing).

colony metrics record <NAME> <VALUE>

Arguments:

NAME - Metric name
VALUE - Numeric value to record

Example:

colony metrics record agent.tasks.completed 10
colony metrics record system.cpu.percent 45.5

Metric Types

Counter

Monotonically increasing values:

Task completions
Workflow runs
Errors

Use case: Counting events, where the total only ever increases.
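
Because a counter only grows, its absolute value is usually less informative than how much it grew over a window. A minimal sketch, assuming you already have two readings of the same counter. The helper name counter_increase is hypothetical.

```shell
# counter_increase: hypothetical helper, not part of the colony CLI.
# Usage: counter_increase <earlier reading> <later reading>
# Prints how much the counter grew. Returns non-zero if the later
# reading is smaller, which for a monotonic counter would suggest a
# reset or cleared data.
counter_increase() {
  awk -v first="$1" -v last="$2" 'BEGIN {
    if (last + 0 < first + 0) {
      print "counter decreased" > "/dev/stderr"
      exit 1
    }
    printf "%.2f\n", last - first
  }'
}

# Example: readings of 38 and 45 tasks
counter_increase 38 45
```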

Gauge

Point-in-time values that can go up or down:

Queue depth
Memory usage
Active agents

Use case: Current state snapshots.

Histogram

Distribution of values over time:

Execution times
Response latencies
Resource usage patterns

Use case: Understanding value distributions.
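
To make "distribution" concrete: a histogram answers questions such as "how long do 95% of tasks take at most?". The sketch below computes a nearest-rank percentile from raw samples. It illustrates the idea only and says nothing about how Colony stores histograms. The helper name percentile is hypothetical.

```shell
# percentile: hypothetical helper, not part of the colony CLI.
# Usage: percentile <P>   (P between 1 and 100), values on stdin, one per line.
# Nearest-rank method: the value at position ceil(P * N / 100) in the
# sorted list of N samples.
percentile() {
  sort -n | awk -v p="$1" '
    { v[NR] = $1 }
    END {
      if (NR == 0) exit 1
      rank = p * NR / 100
      idx = int(rank)
      if (idx < rank) idx++      # round up to the next whole position
      if (idx < 1) idx = 1
      print v[idx]
    }
  '
}

# Example: ten execution times in seconds
printf '%s\n' 12 7 30 9 15 11 8 10 14 13 | percentile 95
```

A single slow run (30 seconds here) dominates the 95th percentile while barely moving the median, which is why histograms are preferred over averages for latency-style metrics.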

Standard Metrics

Agent Metrics

agent.tasks.completed - Tasks completed by agents
agent.tasks.failed - Tasks that failed
agent.active_time - Time agents spent active
agent.idle_time - Time agents spent idle

Task Metrics

task.queue.depth - Current tasks in queue
task.wait_time - Time tasks wait before starting
task.execution_time - Time to complete tasks

Workflow Metrics

workflow.runs.started - Workflows started
workflow.runs.completed - Workflows completed
workflow.runs.failed - Workflows that failed
workflow.step.duration - Time per workflow step

System Metrics

system.memory.used - Memory usage
system.cpu.percent - CPU utilization

Integration

Export to Monitoring Tools

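# Prometheus format (planned; the --format option is not available yet)
colony metrics export --format prometheus

# Grafana (use JSON export; curl reads the request body from stdin via @-)
colony metrics export | curl -X POST -H "Content-Type: application/json" --data-binary @- grafana-api/metrics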

# Custom processing
colony metrics export | jq '...'

Automated Monitoring

# Continuous monitoring
while true; do
  colony metrics export -o metrics-$(date +%s).json
  sleep 60
done
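
The loop writes a new file every minute and never removes any, so the directory grows without bound. One way to cap it is sketched here with find. The function name prune_snapshots is hypothetical, and the one-day cutoff is an assumption chosen to mirror the default 24-hour retention.

```shell
# prune_snapshots: hypothetical helper, not part of the colony CLI.
# Usage: prune_snapshots [DIR]   (defaults to the current directory)
# Deletes metrics-*.json files in DIR that were last modified at least
# 24 hours ago.
prune_snapshots() {
  dir=${1:-.}
  # -mtime +0 matches files whose age in whole days is greater than 0
  find "$dir" -maxdepth 1 -type f -name 'metrics-*.json' -mtime +0 -exec rm -f {} +
}
```

Calling prune_snapshots after each export inside the loop keeps roughly one day of snapshots on disk.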

Alerting

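# Check thresholds (latest value taken from the JSON export, rounded down)
QUEUE_DEPTH=$(colony metrics export | jq -r '.[] | select(.name == "task.queue.depth") | .points | max_by(.timestamp) | (.value // 0) | floor')
if [ "${QUEUE_DEPTH:-0}" -gt 100 ]; then
  echo "Alert: Queue depth is $QUEUE_DEPTH"
fi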
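
Metric values are floating-point (the JSON export shows 45.0), and the shell's integer test -gt rejects non-integer operands. An alternative sketch that compares decimals directly by delegating to awk. The helper name exceeds_threshold is hypothetical.

```shell
# exceeds_threshold: hypothetical helper, not part of the colony CLI.
# Usage: exceeds_threshold <value> <limit>
# Succeeds (exit status 0) when value > limit; both may be decimals.
exceeds_threshold() {
  awk -v v="$1" -v limit="$2" 'BEGIN { exit !(v + 0 > limit + 0) }'
}

# Example with literal numbers; in practice the value would come from colony
if exceeds_threshold 120.5 100; then
  echo "Alert: Queue depth is 120.5"
fi
```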

Examples

Monitor Task Completion Rate

# Show last 24 hours
colony metrics show agent.tasks.completed --hours 24

# Compare with failures
colony metrics show agent.tasks.failed --hours 24

Track System Resources

# Memory usage
colony metrics show system.memory.used

# Record CPU utilization (Linux top; the second field is the user CPU percentage)
colony metrics record system.cpu.percent "$(top -bn1 | grep "Cpu(s)" | awk '{print $2}')"
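
The output format of top varies between platforms and versions, so the extracted field is not guaranteed to be a plain number. A defensive sketch that validates the value before recording it. The helper name is_number is hypothetical, and the colony call appears only as a comment because it needs a running installation.

```shell
# is_number: hypothetical helper, not part of the colony CLI.
# Succeeds when the argument is a plain decimal number such as 45 or 45.5.
is_number() {
  printf '%s\n' "$1" | grep -Eq '^-?[0-9]+(\.[0-9]+)?$'
}

# Example: only record values that parse as numbers
cpu="45.5"   # in practice: the field extracted from top
if is_number "$cpu"; then
  echo "would record: $cpu"   # e.g. colony metrics record system.cpu.percent "$cpu"
else
  echo "skipping non-numeric value: $cpu" >&2
fi
```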

Generate Daily Reports

#!/bin/bash
# daily-metrics.sh
DATE=$(date +%Y-%m-%d)
mkdir -p reports   # make sure the output directory exists
colony metrics export -o "reports/metrics-$DATE.json"

# Email report
mail -s "Colony Metrics $DATE" admin@example.com < "reports/metrics-$DATE.json"

Retention

Default: 24 hours
Configurable: Not yet (future feature)
Pruning: colony metrics clear removes data older than the retention period
