Container Performance Monitoring
Monitor container performance effectively using docker stats, cAdvisor, Prometheus, and Grafana. Learn to set resource limits and optimize container performance for production workloads.
Container performance monitoring is essential for maintaining application reliability, optimizing resource usage, and preventing outages. Without monitoring, you risk containers consuming excessive CPU or memory, causing performance degradation for other containers or the host system.
Key metrics to monitor include: CPU usage, memory consumption, network I/O, disk I/O, container uptime, and restart counts. Monitoring helps you identify resource bottlenecks, plan capacity, and debug performance issues before they impact users.
The `docker stats` command provides real-time resource usage for running containers. It shows CPU percentage, memory usage and limit, network I/O, and block I/O. This is the quickest way to check container performance from the command line.
# Show stats for all running containers
docker stats
# Show stats for specific containers
docker stats container1 container2
# Show stats once (non-streaming)
docker stats --no-stream
# Show stats with custom format
docker stats --format "table {{.Name}}\t{{.CPUPerc}}\t{{.MemUsage}}"
# Show stats for all containers (including stopped)
docker stats --all
# Show only container IDs
docker stats --no-stream --format "{{.ID}}"
# Watch specific containers with refresh interval
watch -n 2 "docker stats --no-stream container1 container2"
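For scripted checks, the one-shot output can be piped through a small filter. The following is a minimal sketch rather than a standard tool: `high_cpu_containers` is a hypothetical helper name, the 80% default is an arbitrary threshold, and the function assumes input lines of the form `name cpu%`, as produced by the format string in its header comment.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print containers whose CPU percentage exceeds a threshold.
# Reads "name cpu%" lines on stdin, for example from:
#   docker stats --no-stream --format "{{.Name}} {{.CPUPerc}}"
high_cpu_containers() {
  local threshold="${1:-80}"   # percent; 80 is an arbitrary default
  awk -v limit="$threshold" '
    {
      cpu = $2
      sub(/%$/, "", cpu)                  # drop the trailing percent sign
      if (cpu + 0 > limit + 0) print $1   # compare numerically
    }'
}
```

A typical call would be `docker stats --no-stream --format "{{.Name}} {{.CPUPerc}}" | high_cpu_containers 80`. Keep in mind that the CPU percentage reported by `docker stats` can exceed 100% on multi-core hosts.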
Resource limits prevent a single container from consuming all host resources. Always set limits in production to ensure fair resource distribution and prevent denial-of-service, whether accidental or malicious.
# Memory limits
docker run -d --memory=512m --memory-swap=1g --memory-reservation=256m nginx
# --memory: hard limit (512 MB)
# --memory-swap: total memory + swap (1 GB)
# --memory-reservation: soft limit (256 MB)
# CPU limits
docker run -d --cpus=0.5 --cpuset-cpus=0-1 nginx
# --cpus: limit to 0.5 CPU cores
# --cpuset-cpus: pin to specific CPU cores
# CPU shares (relative weight)
docker run -d --cpu-shares=512 nginx
# Default is 1024; higher shares get more CPU when under contention
# Disk I/O limits
docker run -d --device-read-bps=/dev/sda:1mb --device-write-bps=/dev/sda:1mb nginx
# Limit read/write to 1 MB per second
# PIDs limit
docker run -d --pids-limit=100 nginx
# Kernel memory limit (deprecated since Docker 20.10 and not supported on cgroup v2 hosts; prefer --memory)
docker run -d --kernel-memory=100m nginx
- Set memory limits based on application profiling - Observe normal usage and add 20-30% buffer.
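- Use memory reservation for burstable workloads - The reservation is a soft limit: containers can exceed it while memory is available, and Docker tries to shrink them back toward it when the host runs low.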
- Set CPU limits to prevent noisy neighbors - Especially important in multi-tenant environments.
- Use CPU shares for relative priority - Critical services get higher shares.
- Monitor memory swap usage - High swap indicates memory pressure.
- Test limits in staging before production - Ensure your application works within limits.
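The "observed usage plus a buffer" rule from the first item can be turned into a quick calculation. This is only a sketch: `suggest_memory_limit` is a hypothetical helper, the 25% default buffer is one arbitrary point in the 20-30% range, and the peak value has to come from your own profiling.

```shell
#!/usr/bin/env bash
# Hypothetical helper: suggest a --memory value from an observed peak.
# $1: observed peak usage in MiB (integer)
# $2: buffer in percent (optional, defaults to 25)
suggest_memory_limit() {
  local peak_mib="$1"
  local buffer_pct="${2:-25}"
  # Integer arithmetic, rounded up to the next whole MiB
  local limit_mib=$(( (peak_mib * (100 + buffer_pct) + 99) / 100 ))
  printf '%sm\n' "$limit_mib"
}
```

For a container that peaks at 400 MiB, `docker run -d --memory="$(suggest_memory_limit 400 25)" nginx` would start it with a 500 MiB hard limit.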
cAdvisor (Container Advisor) is an open-source tool from Google that collects, aggregates, processes, and exports container performance metrics. It runs as a container and automatically discovers all containers on the host, exposing metrics via a web UI and Prometheus endpoint.
# Run cAdvisor as a container
docker run -d \
  --name=cadvisor \
  --restart=always \
  --network=host \
  --volume=/:/rootfs:ro \
  --volume=/var/run:/var/run:ro \
  --volume=/sys:/sys:ro \
  --volume=/var/lib/docker/:/var/lib/docker:ro \
  --volume=/dev/disk/:/dev/disk:ro \
  gcr.io/cadvisor/cadvisor:latest
# Access cAdvisor web UI
# http://localhost:8080
# Check cAdvisor metrics endpoint
curl http://localhost:8080/metrics
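The metrics endpoint returns plain text in the Prometheus exposition format, so it can be inspected without any extra tooling. The sketch below assumes cAdvisor attaches a `name` label to per-container series and that `container_memory_usage_bytes` is present in the output; `memory_by_container` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list memory usage per named container in MiB.
# Reads Prometheus text-format metrics on stdin, for example from:
#   curl -s http://localhost:8080/metrics
memory_by_container() {
  awk '
    index($0, "container_memory_usage_bytes{") == 1 {
      # Pull out the value of the name="..." label; skip series without one
      if (!match($0, /[{,]name="[^"]*"/)) next
      name = substr($0, RSTART + 7, RLENGTH - 8)
      if (name == "") next
      # The sample value is the first field after the closing brace
      n = split($0, parts, "[}] ")
      split(parts[n], fields, " ")
      printf "%s %.1f\n", name, fields[1] / 1048576
    }'
}
```

Running `curl -s http://localhost:8080/metrics | memory_by_container` should print one `name MiB` line per named container. Series with an empty or missing `name` label, such as the root cgroup, are skipped.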
For production environments, combine cAdvisor with Prometheus (metrics collection) and Grafana (visualization). This provides long-term metric storage, alerting, and beautiful dashboards.
# docker-compose.yml for monitoring stack
version: '3.8'
services:
  prometheus:
    image: prom/prometheus:latest
    volumes:
      - ./prometheus.yml:/etc/prometheus/prometheus.yml
      - prometheus_data:/prometheus
    ports:
      - "9090:9090"
    command:
      - '--config.file=/etc/prometheus/prometheus.yml'
      - '--storage.tsdb.path=/prometheus'
  grafana:
    image: grafana/grafana:latest
    ports:
      - "3000:3000"
    environment:
      - GF_SECURITY_ADMIN_PASSWORD=admin
    volumes:
      - grafana_data:/var/lib/grafana
  cadvisor:
    image: gcr.io/cadvisor/cadvisor:latest
    ports:
      - "8080:8080"
    volumes:
      - /:/rootfs:ro
      - /var/run:/var/run:ro
      - /sys:/sys:ro
      - /var/lib/docker/:/var/lib/docker:ro
volumes:
  prometheus_data:
  grafana_data:
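# prometheus.yml
global:
  scrape_interval: 15s
scrape_configs:
  - job_name: 'docker'
    static_configs:
      # cAdvisor, reached by its Compose service name; inside the Prometheus
      # container, localhost would point at Prometheus itself
      - targets: ['cadvisor:8080']
  - job_name: 'prometheus'
    static_configs:
      - targets: ['localhost:9090']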
- CPU usage - Source: docker stats, cAdvisor. Sustained usage above 80% indicates the container needs more CPU.
- Memory usage - Source: docker stats, cAdvisor. Usage approaching the limit indicates a memory leak or an undersized limit.
- Memory swap - Source: cAdvisor. High swap usage indicates memory pressure and degrades performance.
- Network I/O - Source: docker stats, cAdvisor. Sudden spikes may indicate attacks or traffic issues.
- Disk I/O - Source: docker stats, cAdvisor. High disk I/O may impact other containers sharing the disk.
- Container restarts - Source: docker inspect, docker events. Frequent restarts indicate application crashes.
# CPU usage (percent of one CPU core, summed per container)
sum by (name) (rate(container_cpu_usage_seconds_total{name=~".+"}[1m])) * 100
# Memory usage (bytes)
container_memory_usage_bytes{name=~".+"}
# Memory limit
container_spec_memory_limit_bytes{name=~".+"}
# Network received bytes
rate(container_network_receive_bytes_total{name=~".+"}[1m])
# Network transmitted bytes
rate(container_network_transmit_bytes_total{name=~".+"}[1m])
# Container restarts
changes(container_start_time_seconds{name=~".+"}[1h])
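Besides the Prometheus web UI, these expressions can be evaluated from the command line through the Prometheus HTTP API at `/api/v1/query`. The sketch below assumes Prometheus is reachable on `http://localhost:9090`, matching the published port in the Compose file; `prom_query` and the `PROMETHEUS_URL` variable are hypothetical names introduced for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run an instant PromQL query and print the JSON response.
# PROMETHEUS_URL is an assumed override; it defaults to the local Compose setup.
prom_query() {
  local query="$1"
  local base_url="${PROMETHEUS_URL:-http://localhost:9090}"
  # --get sends the URL-encoded query as a query-string parameter
  curl --silent --get --data-urlencode "query=${query}" "${base_url}/api/v1/query"
}
```

For example, `prom_query 'container_memory_usage_bytes{name=~".+"}'` returns the current memory usage of every named container as JSON.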
# Prometheus alerting rules
groups:
  - name: container_alerts
    rules:
      - alert: HighContainerCPU
        expr: sum by (name) (rate(container_cpu_usage_seconds_total{name=~".+"}[5m])) * 100 > 80
        for: 5m
        annotations:
          summary: "High CPU usage on container {{ $labels.name }}"
      - alert: HighContainerMemory
        # Containers without a memory limit report a limit of 0, so exclude them
        expr: container_memory_usage_bytes{name=~".+"} / (container_spec_memory_limit_bytes{name=~".+"} > 0) > 0.9
        for: 5m
        annotations:
          summary: "High memory usage on container {{ $labels.name }}"
      - alert: ContainerRestarting
        expr: changes(container_start_time_seconds{name=~".+"}[10m]) > 2
        annotations:
          summary: "Container {{ $labels.name }} is restarting frequently"
      - alert: ContainerMissing
        # absent() without label matchers fires only when no container metrics exist at all
        expr: absent(container_start_time_seconds)
        for: 5m
        annotations:
          summary: "No container metrics are being reported"
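Rule files are worth validating before Prometheus loads them. The sketch below uses `promtool`, the checker that ships with Prometheus, and assumes it is installed on the PATH. The file names are placeholders: it assumes the rules above are saved as `alert_rules.yml` and listed under `rule_files` in `prometheus.yml`. `validate_prometheus_files` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: validate Prometheus config and rule files before loading them.
# Assumes promtool is on PATH; the default file names are placeholders.
validate_prometheus_files() {
  local config_file="${1:-prometheus.yml}"
  local rules_file="${2:-alert_rules.yml}"
  promtool check config "$config_file" || return 1
  promtool check rules "$rules_file" || return 1
  echo "configuration and rules look valid"
}
```

Calling `validate_prometheus_files` in a CI step or before restarting Prometheus catches YAML indentation mistakes and invalid PromQL expressions early.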
- Use multi-stage builds - Reduce image size, which reduces pull and start times.
- Pin image versions - Use specific tags or digests so deployments are reproducible and cached layers are reused.
- Set resource limits - Prevents containers from consuming excessive resources.
- Use Alpine or slim base images - Smaller images pull faster, use less disk space, and expose a smaller attack surface.
- Configure log rotation - Prevent disk space exhaustion from logs.
- Use a read-only root filesystem - Blocks writes to the container's writable layer and improves security.
- Profile your application - Identify bottlenecks in the application, not just the container layer.
- Use volume mounts for I/O-heavy workloads - Writing directly to the host filesystem bypasses storage driver overhead.
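Several of these tips map directly onto `docker run` flags. The following sketch combines resource limits, log rotation for the `json-file` logging driver, and a read-only root filesystem. `run_tuned_container` is a hypothetical helper, and every value in it is a placeholder to be replaced with numbers from your own profiling.

```shell
#!/usr/bin/env bash
# Hypothetical helper: start a container with several of the tips applied.
# The limit values are placeholders; derive real ones from profiling.
run_tuned_container() {
  local image="$1"   # a pinned image reference, not a floating tag
  docker run -d \
    --memory=512m --cpus=0.5 \
    --log-driver=json-file --log-opt max-size=10m --log-opt max-file=3 \
    --read-only --tmpfs /tmp \
    "$image"
}
```

With `--read-only`, the application can only write to mounted volumes and tmpfs paths, so an image may need additional `--tmpfs` or volume mounts for any directories it writes to at runtime.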
Effective container performance monitoring is essential for production reliability. Use docker stats for quick checks, cAdvisor for detailed metrics, and Prometheus/Grafana for production monitoring.