Swarm Services & Tasks
Services and tasks are the core concepts for running applications in Docker Swarm. This guide covers replicated vs global services, scaling, rolling updates, health checks, service discovery, and task management.
In Docker Swarm, a service is the definition of how you want your application to run. It specifies the container image, number of replicas, ports, networks, and update policies. Think of a service as the "desired state" declaration.
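A task is the unit of scheduling in Swarm: one container, plus the instructions for running it, that belongs to a single service. Swarm creates and manages tasks to reach the desired state defined in the service. For a service with 3 replicas, Swarm creates 3 tasks (containers) spread across the cluster. Tasks are scheduled onto available nodes (managers run tasks too unless they are drained). If a task fails, Swarm does not restart it in place; it starts a replacement task to restore the desired count.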
```text
Service: web (desired replicas: 3)
┌─────────────────┬─────────────────┬─────────────────┐
│ Task 1 (Node 1) │ Task 2 (Node 2) │ Task 3 (Node 3) │
│ Container: web  │ Container: web  │ Container: web  │
│ Status: Running │ Status: Running │ Status: Running │
└─────────────────┴─────────────────┴─────────────────┘
```
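The desired-versus-actual model above can be checked from the command line. The sketch below assumes a POSIX shell; `unconverged_services` is a hypothetical helper (not a Docker command) that reads the `NAME RUNNING/DESIRED` lines printed by `docker service ls --format '{{.Name}} {{.Replicas}}'` and lists every service whose running task count has not yet caught up with its desired count.

```shell
#!/bin/sh
# unconverged_services (hypothetical helper): reads lines of the form
#   <name> <running>/<desired> [extra text]
# on stdin and prints the services whose running count differs from the
# desired count. Any trailing text after the counts is ignored.
unconverged_services() {
  while read -r name replicas _rest; do
    # Skip lines without a running/desired pair
    case "$replicas" in
      */*) ;;
      *) continue ;;
    esac
    running=${replicas%%/*}   # text before the slash
    desired=${replicas#*/}    # text after the slash
    if [ "$running" != "$desired" ]; then
      echo "$name $running/$desired"
    fi
  done
}
```

For example, `docker service ls --format '{{.Name}} {{.Replicas}}' | unconverged_services` prints nothing once every service has converged.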
```shell
# Simple service with 1 replica
docker service create --name web nginx
# Service with 3 replicas and published port
docker service create --name web --replicas 3 -p 80:80 nginx
# Service with custom command
docker service create --name worker --replicas 5 alpine ping 8.8.8.8
# Service with environment variables
docker service create --name api \
  --replicas 2 \
  -e NODE_ENV=production \
  -e DATABASE_URL=postgres://db:5432 \
  myapi:latest
# Service with resource limits
docker service create --name app \
  --replicas 3 \
  --limit-cpu 0.5 \
  --limit-memory 512M \
  --reserve-cpu 0.25 \
  --reserve-memory 256M \
  myapp:latest
```
| Feature | Replicated Service | Global Service |
|---|---|---|
| Purpose | Scale application horizontally | Run one copy on every eligible node |
| Number of tasks | Fixed number (replicas) | One per node (grows and shrinks with the cluster) |
| Use cases | Web servers, APIs, databases | Monitoring agents, log collectors, network plugins |
| Scaling | Manual (`docker service scale`), or automated by an external tool | Automatic when nodes are added/removed |
| Placement constraints | Optional; restrict which nodes can receive replicas | Optional; restrict which nodes get a task |
```shell
# Replicated service (default) - 5 instances
docker service create --name web --replicas 5 nginx
# Global service - one instance on every node
docker service create --name node-exporter --mode global prom/node-exporter
# Global service with constraints (only on manager nodes)
docker service create --name manager-agent \
  --mode global \
  --constraint node.role==manager \
  myagent:latest
```
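For a global service, the expected task count follows the cluster rather than a replica number. A rough sketch, assuming a POSIX shell and ignoring placement constraints: the hypothetical `count_active_nodes` helper counts nodes whose availability is `Active`, which is approximately how many tasks an unconstrained global service such as `node-exporter` should end up with (drained nodes run none, and paused nodes receive no new tasks).

```shell
#!/bin/sh
# count_active_nodes (hypothetical helper): reads one node availability per
# line (Active, Pause or Drain) and prints how many are Active, i.e. able to
# receive new tasks.
count_active_nodes() {
  grep -c '^Active$'
}

# Usage: compare the expected and actual task counts of a global service
#   docker node ls --format '{{.Availability}}' | count_active_nodes
#   docker service ps node-exporter --filter desired-state=running -q | wc -l
```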
```shell
# Scale to 10 replicas
docker service scale web=10
# Scale down to 3 replicas
docker service scale web=3
# Update replicas during service update
docker service update --replicas 8 web
# Get current replica count
docker service ls
docker service inspect web --format '{{.Spec.Mode.Replicated.Replicas}}'
# Change the replica count and the update policy in one command. Scaling only
# adds or removes tasks, so existing tasks keep running; --update-parallelism
# is stored and applies to later rollouts (image or configuration changes)
docker service update --replicas 15 --update-parallelism 2 web
```
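Rolling updates let you change a service (for example, its image) without taking it offline. Swarm replaces tasks one at a time, or in batches set by `--update-parallelism`, and waits `--update-delay` between batches. With several replicas and working health checks, the service stays available throughout; note that the default order is stop-first, so a single-replica service is briefly down unless you use `--update-order start-first`. You can also configure what happens when an update fails.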
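The batching described above is simple arithmetic. For example, with 5 replicas, `--update-parallelism 2` and `--update-delay 10s`, Swarm works through ceil(5 / 2) = 3 batches, adding roughly two 10-second pauses on top of the time each batch needs to start (and to pass `--update-monitor`, if set). The helpers below are a hypothetical sketch of that calculation in POSIX shell, assuming the delay sits between consecutive batches; they are not Docker commands.

```shell
#!/bin/sh
# update_batches REPLICAS PARALLELISM (hypothetical helper)
# Number of batches needed to replace every task. A parallelism of 0 means
# "update all tasks at once", i.e. a single batch.
update_batches() {
  if [ "$2" -eq 0 ]; then
    echo 1
  else
    echo $(( ($1 + $2 - 1) / $2 ))   # ceiling division
  fi
}

# min_update_wait REPLICAS PARALLELISM DELAY_SECONDS (hypothetical helper)
# Seconds spent in --update-delay pauses, assuming one pause between each pair
# of consecutive batches. Task start-up and monitor time come on top of this.
min_update_wait() {
  batches=$(update_batches "$1" "$2")
  echo $(( (batches - 1) * $3 ))
}
```

Raising parallelism shortens the rollout but replaces more capacity at once, so it is a trade-off between speed and how many tasks are serving traffic mid-update.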
```shell
# Basic rolling update
docker service update --image nginx:alpine web
# Advanced rolling update settings
docker service update \
  --image nginx:1.25 \
  --update-parallelism 2 \
  --update-delay 10s \
  --update-order start-first \
  --update-failure-action rollback \
  web
# Update with health check integration
docker service update \
  --image myapp:2.0 \
  --update-parallelism 1 \
  --update-order start-first \
  --update-monitor 30s \
  --update-max-failure-ratio 0.3 \
  myapp
# Monitor update progress
docker service ps web --filter "desired-state=running"
docker service inspect web --pretty | grep -A 10 Update
```
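To script around a rollout, you can poll the service's `UpdateStatus`. Recent CLI versions already wait for convergence unless `--detach` is passed, so this is mainly useful for detached updates or CI jobs. A sketch assuming a POSIX shell and access to a manager; `update_finished` and `wait_for_update` are hypothetical helpers, and the state names are the values Swarm reports in `UpdateStatus.State`.

```shell
#!/bin/sh
# update_finished STATE (hypothetical helper)
# Succeeds when Swarm has stopped changing tasks for this update. Note that a
# "paused" update has stopped but still needs manual action.
update_finished() {
  case "$1" in
    completed|paused|rollback_completed|rollback_paused) return 0 ;;
    *) return 1 ;;  # updating, rollback_started, or an empty state
  esac
}

# wait_for_update SERVICE [POLL_SECONDS] (hypothetical helper)
# Polls UpdateStatus.State until the update settles, then prints the state.
# A service that was never updated has no UpdateStatus and returns at once.
wait_for_update() {
  while :; do
    state=$(docker service inspect \
      --format '{{if .UpdateStatus}}{{.UpdateStatus.State}}{{end}}' "$1") || return 1
    if [ -z "$state" ] || update_finished "$state"; then
      echo "${state:-never-updated}"
      return 0
    fi
    sleep "${2:-5}"
  done
}

# Usage: docker service update --detach --image nginx:1.25 web && wait_for_update web
```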
Swarm supports automatic and manual rollbacks. You can configure rollback parameters during service creation or update. If an update fails, Swarm can automatically revert to the previous version.
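```shell
# Manual rollback to the previous service spec
docker service rollback web
# Configure automatic rollback on failure
docker service create \
  --name web \
  --replicas 5 \
  --update-parallelism 2 \
  --update-failure-action rollback \
  --rollback-parallelism 1 \
  --rollback-monitor 20s \
  nginx:1.20
# Check the outcome of the last update or rollback (e.g. "rollback_completed");
# Swarm keeps only the current and previous spec, not a full history
docker service inspect web --format '{{json .UpdateStatus}}'
# Task history, including tasks shut down by an update or rollback
docker service ps web
# Return to a specific image version by updating to it; this is a normal
# rolling update (--force is only needed to redeploy an unchanged spec)
docker service update --image nginx:1.19 web
```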
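`docker service rollback` returns to the previous service spec (`PreviousSpec` in `docker service inspect` output) rather than to an arbitrary older release, so it can help to check which image each spec points at before acting. A sketch assuming a POSIX shell and manager access; `image_without_digest` and `show_images` are hypothetical helpers.

```shell
#!/bin/sh
# image_without_digest REF (hypothetical helper)
# Drops the "@sha256:..." suffix the CLI records when it resolves a tag, e.g.
# "nginx:1.20@sha256:abc" -> "nginx:1.20". Keep the full reference if you need
# to redeploy exactly the same image.
image_without_digest() {
  echo "${1%%@*}"
}

# show_images SERVICE (hypothetical helper)
# Prints the image of the current spec and of PreviousSpec, which is what
# "docker service rollback" would return to.
show_images() {
  current=$(docker service inspect \
    --format '{{.Spec.TaskTemplate.ContainerSpec.Image}}' "$1") || return 1
  previous=$(docker service inspect \
    --format '{{if .PreviousSpec}}{{.PreviousSpec.TaskTemplate.ContainerSpec.Image}}{{end}}' "$1") || return 1
  echo "current:  $(image_without_digest "$current")"
  echo "previous: $(image_without_digest "${previous:-none}")"
}
```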
Health checks tell Swarm whether a container is functioning correctly. Swarm uses health check status to decide whether to replace unhealthy tasks and whether rolling updates are successful. A proper health check is essential for production services.
```shell
# Service with health check (the command runs inside the container,
# so the image must provide curl)
docker service create --name web \
  --replicas 3 \
  --health-cmd "curl -f http://localhost/ || exit 1" \
  --health-interval 30s \
  --health-timeout 10s \
  --health-retries 3 \
  --health-start-period 30s \
  nginx
```
```yaml
# In docker-compose.yml for Swarm
version: '3.8'
services:
  web:
    image: nginx
    deploy:
      replicas: 3
      update_config:
        order: start-first
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost/"]
      interval: 30s
      timeout: 10s
      retries: 3
      start_period: 30s
```
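How quickly a broken task is detected depends on these settings. As a rough rule of thumb (a sketch, not a Docker formula): a container is marked unhealthy after `retries` consecutive failed probes, and if each probe may take up to `timeout` and the next one starts `interval` after the previous finishes, the worst case is about `retries × (interval + timeout)`. With the values above (30s, 10s, 3 retries) that is about 120 seconds. Failures during `start_period` do not count toward the retries. `unhealthy_after` is a hypothetical POSIX shell helper.

```shell
#!/bin/sh
# unhealthy_after INTERVAL_S TIMEOUT_S RETRIES (hypothetical helper)
# Rough worst case, in seconds, between a container starting to fail and being
# marked unhealthy: RETRIES failed probes, each taking up to TIMEOUT_S and
# separated by INTERVAL_S waits.
unhealthy_after() {
  echo $(( $3 * ($1 + $2) ))
}
```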
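Swarm provides built-in service discovery through DNS. Services attached to the same overlay network can reach each other by service name. In the default `vip` endpoint mode, the name resolves to a single virtual IP, and Swarm's internal load balancer spreads connections across the service's healthy tasks; the name `tasks.<service>` resolves to the individual task IPs instead. Services can therefore communicate by name without knowing which nodes or containers are behind it.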
```shell
# Create two services on the same overlay network
docker network create -d overlay appnet
docker service create --name api --network appnet --replicas 3 myapi
docker service create --name web --network appnet -p 80:80 nginx
# From within the web service, you can access api by name
# curl http://api:3000/data
# Show the endpoint mode (vip by default) and published ports
docker service inspect web --pretty | grep -A 5 Endpoint
# List the tasks that stand behind the "api" name
docker service ps api
# Test DNS resolution from inside a task. There is no "docker service exec";
# run docker exec on a node that hosts a web task
docker exec "$(docker ps -q --filter label=com.docker.swarm.service.name=web | head -n 1)" getent hosts api
```
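The difference between the service name and `tasks.<service>` is easy to see from inside a task. The sketch below (POSIX shell; `unique_addresses` is a hypothetical helper) de-duplicates `getent ahosts` output, which lists each address once per socket type. It assumes the task image ships glibc's `getent`, as Debian-based images such as the stock `nginx` image do, and that `web` and `api` share the `appnet` network as above.

```shell
#!/bin/sh
# unique_addresses (hypothetical helper): reads `getent ahosts` output on stdin
# and prints each distinct IP address once. getent repeats every address for
# each socket type (STREAM, DGRAM, RAW), so the first column is de-duplicated.
unique_addresses() {
  awk '{ print $1 }' | sort -u
}

# Usage, on a node that runs a task of the "web" service:
#   cid=$(docker ps -q --filter label=com.docker.swarm.service.name=web | head -n 1)
#   docker exec "$cid" getent ahosts api | unique_addresses        # the service VIP
#   docker exec "$cid" getent ahosts tasks.api | unique_addresses  # one IP per task
```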
Placement constraints let you control which nodes can run specific services. This is useful for hardware-specific workloads (GPUs, SSDs), compliance (data locality), or isolating critical services.
```shell
# Run only on nodes with the SSD label. Each replica is an independent database,
# and the official postgres image needs POSTGRES_PASSWORD (or
# POSTGRES_PASSWORD_FILE pointing at a secret) before it will start
docker service create --name db \
  --constraint node.labels.storage==ssd \
  --replicas 2 \
  postgres
# Run only on manager nodes
docker service create --name manager-app \
  --constraint node.role==manager \
  --replicas 1 \
  myapp
# Run only on nodes whose role is not manager (workers only)
docker service create --name worker-app \
  --constraint node.role!=manager \
  --replicas 10 \
  worker
# Combine multiple constraints (all must match)
docker service create --name gpu-app \
  --constraint node.labels.gpu==nvidia \
  --constraint node.labels.region==us-east \
  --replicas 2 \
  tensorflow/tensorflow:latest-gpu
# Add a label to a node
docker node update --label-add gpu=nvidia node1
```
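Labels and constraints are easy to script when several nodes need the same role. A sketch in POSIX shell: `label_nodes` and `constraint_flags` are hypothetical helpers; the first wraps `docker node update --label-add`, the second prints `--constraint node.labels.KEY==VALUE` arguments for `docker service create`.

```shell
#!/bin/sh
# label_nodes KEY=VALUE NODE... (hypothetical helper)
# Adds the same label to every listed node; stops at the first failure.
label_nodes() {
  label=$1
  shift
  for node in "$@"; do
    docker node update --label-add "$label" "$node" || return 1
  done
}

# constraint_flags KEY=VALUE... (hypothetical helper)
# Prints "--constraint node.labels.KEY==VALUE" for each argument, one per line.
constraint_flags() {
  for pair in "$@"; do
    printf '%s node.labels.%s==%s\n' --constraint "${pair%%=*}" "${pair#*=}"
  done
}

# Usage (values must not contain spaces, because the output is word-split):
#   label_nodes gpu=nvidia node1 node2
#   docker service create --name gpu-app \
#     $(constraint_flags gpu=nvidia region=us-east) \
#     --replicas 2 tensorflow/tensorflow:latest-gpu
#   # Show the labels currently set on each node
#   docker node inspect --format '{{.Description.Hostname}} {{.Spec.Labels}}' $(docker node ls -q)
```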
```shell
# List services
docker service ls
# List tasks for a service
docker service ps web
docker service ps web --filter "desired-state=running"
docker service ps web --format "table {{.ID}}\t{{.Node}}\t{{.CurrentState}}"
# Inspect a service
docker service inspect web
docker service inspect web --pretty
# View service logs
docker service logs web
docker service logs -f --tail 50 web
docker service logs --since 1h web
# Remove a service
docker service rm web
# Update service configuration
docker service update --env-add NODE_ENV=production web
docker service update --publish-rm 8080 --publish-add 80:80 web
docker service update --reserve-cpu 0.5 --limit-cpu 1 web
# Swarm has no pause command: scaling to 0 stops all tasks but keeps the
# service definition; scaling up again starts new tasks
docker service scale web=0
docker service scale web=5
```
- Always use health checks - Critical for rolling updates and self-healing.
- Set resource limits - Prevent services from consuming all host resources.
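- Use rolling updates with start-first order - Combined with health checks, replacement tasks start and become healthy before old ones stop, giving zero-downtime deployments. Nodes need spare capacity for the extra tasks during the update.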
- Pin image versions - Never use `:latest` in production. Use specific tags or digests.
- Use placement constraints for special hardware - Ensure GPU workloads only run on GPU nodes.
- Monitor service health - Use `docker service ps` and external monitoring tools.
- Use secrets for sensitive data - Never pass passwords via environment variables in clear text.
- Set update failure action to rollback - Automatic rollback reduces downtime when updates fail.
- Test rolling updates in staging first - Verify update behavior before production.
```yaml
# docker-compose.yml for Swarm deployment
version: '3.8'
services:
  web:
    image: nginx:1.25-alpine
    ports:
      - "80:80"
    deploy:
      replicas: 5
      update_config:
        parallelism: 2
        delay: 10s
        order: start-first
        failure_action: rollback
      rollback_config:
        parallelism: 1
        order: stop-first
      restart_policy:
        condition: on-failure
        max_attempts: 3
      resources:
        limits:
          cpus: '0.5'
          memory: 512M
    healthcheck:
      # Stock nginx serves no /health route; point this at an endpoint your config provides
      test: ["CMD", "curl", "-f", "http://localhost/"]
      interval: 30s
      timeout: 5s
      retries: 3
      start_period: 10s
    networks:
      - webnet
  api:
    image: myapi:2.1.0
    deploy:
      replicas: 3
      placement:
        constraints:
          - node.labels.region == primary
      update_config:
        parallelism: 1
        order: start-first
    healthcheck:
      test: ["CMD", "node", "health.js"]
      interval: 30s
    networks:
      - webnet
    secrets:
      - api_key
  redis:
    image: redis:7-alpine
    deploy:
      # One replica: in global mode every node would run a separate Redis with
      # its own local volume, and the "redis" name would balance across them
      replicas: 1
    volumes:
      - redis_data:/data
    networks:
      - webnet
networks:
  webnet:
    driver: overlay
volumes:
  redis_data:
secrets:
  api_key:
    external: true
# Deploy: docker stack deploy -c docker-compose.yml myapp
```
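The stack declares `api_key` as an external secret, so it must exist before `docker stack deploy` runs, and inside the `api` tasks Swarm exposes it as a file at `/run/secrets/api_key`. A sketch in POSIX shell: the commented commands run on a manager (`API_KEY` is a hypothetical environment variable holding the value), and `read_secret` is a hypothetical helper for use inside a task; its `SECRETS_DIR` override exists only to make it testable.

```shell
#!/bin/sh
# On a manager, create the external secret before deploying the stack
# (printf avoids a trailing newline in the stored value):
#   printf '%s' "$API_KEY" | docker secret create api_key -
#   docker stack deploy -c docker-compose.yml myapp
#   docker stack services myapp   # replica counts per service
#   docker stack ps myapp         # tasks and the nodes running them

# read_secret NAME (hypothetical helper, for use inside a task)
# Swarm mounts each granted secret read-only at /run/secrets/<name>;
# SECRETS_DIR overrides that location, e.g. for local testing.
read_secret() {
  dir=${SECRETS_DIR:-/run/secrets}
  if [ -r "$dir/$1" ]; then
    cat "$dir/$1"
  else
    echo "secret '$1' not found in $dir" >&2
    return 1
  fi
}
```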
Services and tasks are the foundation of Swarm orchestration. Master these concepts to deploy scalable, resilient applications across your cluster.