Swarm Services & Tasks

Services and tasks are the core concepts for running applications in Docker Swarm. This guide covers replicated vs global services, scaling, rolling updates, health checks, service discovery, and task management.

What are Services and Tasks?

In Docker Swarm, a service is the definition of how you want your application to run. It specifies the container image, number of replicas, ports, networks, and update policies. Think of a service as the "desired state" declaration.

A task is a single running container that belongs to a service. Swarm creates and manages tasks to meet the desired state defined in the service. For a service with 3 replicas, Swarm creates 3 tasks (containers) across the cluster. By default, tasks can be scheduled on any active node, managers included. When a task fails, Swarm replaces it with a new task rather than restarting the old one.

Service: web (desired replicas: 3)
┌──────────────────┬──────────────────┬──────────────────┐
│ Task 1 (Node 1)  │ Task 2 (Node 2)  │ Task 3 (Node 3)  │
│ Container: web   │ Container: web   │ Container: web   │
│ Status: Running  │ Status: Running  │ Status: Running  │
└──────────────────┴──────────────────┴──────────────────┘

Swarm constantly monitors your service's desired state. If a task crashes, Swarm automatically creates a new one to maintain the replica count. This is called self-healing.
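One way to watch self-healing from the command line is to compare running and desired task counts. The sketch below is illustrative only: `service_converged` is a hypothetical helper name, and it assumes the `{{.Replicas}}` column of `docker service ls` has the form `running/desired` (for example `2/3`).

```shell
# Print "converged" when running tasks match the desired count, else "pending".
# Hypothetical helper; assumes the Replicas column looks like "2/3".
service_converged() {
  name="$1"
  # --filter name= is a prefix match, so select the exact name with awk.
  replicas=$(docker service ls --filter "name=$name" \
      --format '{{.Name}} {{.Replicas}}' \
    | awk -v n="$name" '$1 == n { print $2 }')
  [ -n "$replicas" ] || { echo "unknown"; return 2; }
  running=${replicas%%/*}   # text before the slash
  desired=${replicas##*/}   # text after the slash
  if [ "$running" = "$desired" ]; then
    echo "converged"
  else
    echo "pending"
    return 1
  fi
}
```

Calling `service_converged web` right after killing a container should report `pending` for a short time and then `converged` once Swarm has started the replacement task.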
Creating Services: The Basics
    # Simple service with 1 replica
    docker service create --name web nginx

    # Service with 3 replicas and published port
    docker service create --name web --replicas 3 -p 80:80 nginx

    # Service with custom command
    docker service create --name worker --replicas 5 alpine ping 8.8.8.8

    # Service with environment variables
    docker service create --name api \
      --replicas 2 \
      -e NODE_ENV=production \
      -e DATABASE_URL=postgres://db:5432 \
      myapi:latest

    # Service with resource limits
    docker service create --name app \
      --replicas 3 \
      --limit-cpu 0.5 \
      --limit-memory 512M \
      --reserve-cpu 0.25 \
      --reserve-memory 256M \
      myapp:latest
Service Types: Replicated vs Global
| Feature               | Replicated Service             | Global Service                                    |
|-----------------------|--------------------------------|---------------------------------------------------|
| Purpose               | Scale application horizontally | Run on every node                                 |
| Number of tasks       | Fixed number (replicas)        | One per node (auto-scales with cluster)           |
| Use cases             | Web servers, APIs, databases   | Monitoring agents, log collectors, network plugins |
| Scaling               | Manual or automatic (external) | Automatic when nodes added/removed                |
| Placement constraints | Optional                       | Constraints control which nodes run the service   |

    # Replicated service (default) - 5 instances
    docker service create --name web --replicas 5 nginx

    # Global service - one instance on every node
    docker service create --name node-exporter --mode global prom/node-exporter

    # Global service with constraints (only on manager nodes)
    docker service create --name manager-agent \
      --mode global \
      --constraint node.role==manager \
      myagent:latest
Scaling Services Up and Down
    # Scale to 10 replicas
    docker service scale web=10

    # Scale down to 3 replicas
    docker service scale web=3

    # Update replicas during service update
    docker service update --replicas 8 web

    # Get current replica count
    docker service ls
    docker service inspect web --format '{{.Spec.Mode.Replicated.Replicas}}'

    # Scale and set the batch size for later rolling updates in one command
    # (scaling itself doesn't cause downtime)
    docker service update --replicas 15 --update-parallelism 2 web

Scaling does not restart existing containers. When scaling up, Swarm starts new tasks and distributes them across available nodes; when scaling down, it stops the surplus tasks. The load balancer automatically includes new tasks once they are running.
Rolling Updates: Zero-Downtime Deployments

Rolling updates allow you to update a service without downtime. Swarm replaces tasks one by one (or in batches), ensuring the service remains available throughout the update. You can configure parallelism, delay between batches, and failure handling.

    # Basic rolling update
    docker service update --image nginx:alpine web

    # Advanced rolling update settings
    docker service update \
      --image nginx:1.25 \
      --update-parallelism 2 \
      --update-delay 10s \
      --update-order start-first \
      --update-failure-action rollback \
      web

    # Update with health check integration
    docker service update \
      --image myapp:2.0 \
      --update-parallelism 1 \
      --update-order start-first \
      --update-monitor 30s \
      --update-max-failure-ratio 0.3 \
      myapp

    # Monitor update progress
    docker service ps web --filter "desired-state=running"
    docker service inspect web --pretty | grep -A 10 Update

To achieve zero downtime, use `--update-order start-first`. This starts new tasks before stopping old ones. Also ensure your service has health checks configured so Swarm knows when a task is ready.
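In a deployment script it can help to block until the rolling update has finished. The following is a minimal sketch, not a definitive implementation: `wait_for_update` is a hypothetical helper, the 5-second poll interval and default of 60 attempts are arbitrary choices, and it relies on the `UpdateStatus.State` field reported by `docker service inspect` (values such as `updating`, `completed`, `paused`, `rollback_completed`).

```shell
# Poll a service until its rolling update completes, pauses, or rolls back.
# Hypothetical helper; poll interval and attempt count are assumptions.
wait_for_update() {
  service="$1"
  attempts="${2:-60}"
  while [ "$attempts" -gt 0 ]; do
    # UpdateStatus is absent on a service that was never updated,
    # so guard the field access inside the template.
    state=$(docker service inspect \
      --format '{{if .UpdateStatus}}{{.UpdateStatus.State}}{{end}}' "$service")
    case "$state" in
      completed)
        echo "update completed"
        return 0 ;;
      paused|rollback_*)
        echo "update did not complete: $state"
        return 1 ;;
    esac
    attempts=$((attempts - 1))
    sleep 5
  done
  echo "timed out waiting for update"
  return 2
}
```

A typical use would be `docker service update --detach --image nginx:1.25 web && wait_for_update web`, so the script fails when Swarm pauses the update or rolls it back.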
Rolling Back Failed Updates

Swarm supports automatic and manual rollbacks. You can configure rollback parameters during service creation or update. If an update fails, Swarm can automatically revert to the previous version.

    # Manual rollback to the previous service spec
    docker service rollback web

    # Configure automatic rollback on failure
    docker service create \
      --name web \
      --replicas 5 \
      --update-parallelism 2 \
      --update-failure-action rollback \
      --rollback-parallelism 1 \
      --rollback-monitor 20s \
      nginx:1.20

    # Check whether the last update completed or was rolled back
    docker service inspect web --format '{{json .UpdateStatus}}'

    # Deploy a specific older image (this is a new update, not a rollback;
    # --force redeploys tasks even if the spec is unchanged)
    docker service update --image nginx:1.19 --force web
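
Before rolling back, it can be useful to see what `docker service rollback` would revert to. The sketch below assumes the service object exposes a `PreviousSpec` field holding the spec from before the most recent update, which means only one step of history is available; `rollback_target` is a hypothetical helper name.

```shell
# Print the image a rollback would restore, or "none" if there is no previous spec.
# Hypothetical helper; assumes PreviousSpec holds the pre-update service spec.
rollback_target() {
  image=$(docker service inspect \
    --format '{{if .PreviousSpec}}{{.PreviousSpec.TaskTemplate.ContainerSpec.Image}}{{end}}' \
    "$1")
  if [ -z "$image" ]; then
    echo "none"
    return 1
  fi
  # Swarm may pin the tag to a digest (image:tag@sha256:...); show only image:tag.
  echo "${image%%@*}"
}
```

For example, `rollback_target web` printing `nginx:1.20` would suggest that a rollback returns the service to that image.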
Health Checks: Keeping Services Healthy

Health checks tell Swarm whether a container is functioning correctly. Swarm uses health check status to decide whether to replace unhealthy tasks and whether rolling updates are successful. A proper health check is essential for production services.

    # Service with health check
    docker service create --name web \
      --replicas 3 \
      --health-cmd "curl -f http://localhost/ || exit 1" \
      --health-interval 30s \
      --health-timeout 10s \
      --health-retries 3 \
      --health-start-period 30s \
      nginx

    # In docker-compose.yml for Swarm
    version: '3.8'
    services:
      web:
        image: nginx
        deploy:
          replicas: 3
          update_config:
            order: start-first
        healthcheck:
          test: ["CMD", "curl", "-f", "http://localhost/"]
          interval: 30s
          timeout: 10s
          retries: 3
          start_period: 30s

Health checks are critical for rolling updates. Without them, Swarm doesn't know if new tasks are healthy before proceeding. Always add health checks to production services.
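The `|| exit 1` in the health command matters because Docker interprets exit code 0 as healthy and 1 as unhealthy (2 is reserved), while `curl -f` exits with other codes such as 22 on HTTP errors. A minimal sketch of a reusable check, assuming `curl` is available in the image; `check_http` is a hypothetical helper and the 5-second timeout is an arbitrary choice:

```shell
# Return 0 when the URL answers with a success status, 1 otherwise.
# Hypothetical helper; normalizes curl's exit codes to the 0/1 Docker expects.
check_http() {
  url="${1:-http://localhost/}"
  if curl -fsS --max-time 5 "$url" >/dev/null 2>&1; then
    return 0
  fi
  return 1
}
```

Saved as a script inside the image that ends by calling `check_http`, this could replace the inline command passed to `--health-cmd`.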
Service Discovery and DNS

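Swarm provides built-in service discovery through DNS. Every service gets a DNS name (the service name) that other services on the same overlay network can resolve. By default the name resolves to a single virtual IP (VIP), and Swarm's internal load balancer distributes connections across the service's healthy tasks; with `--endpoint-mode dnsrr`, the name resolves directly to the task IP addresses instead. Services can communicate by name without knowing the underlying infrastructure.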

    # Create two services on the same network
    docker network create -d overlay appnet
    docker service create --name api --network appnet --replicas 3 myapi
    docker service create --name web --network appnet -p 80:80 nginx

    # From within the web service, you can access api by name
    # curl http://api:3000/data

    # Check service discovery
    docker service inspect web --pretty | grep -A 5 Endpoint

    # List the tasks (and their nodes) behind a service
    docker service ps api

    # Test DNS resolution from within a task. There is no `docker service exec`;
    # run this on a node that hosts a web task and exec into its container.
    docker exec $(docker ps -q --filter name=web | head -n 1) getent hosts api
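
Besides the service name, Swarm's DNS also answers for `tasks.<service>`, which returns the individual task addresses rather than the virtual IP. The sketch below counts those addresses from inside a container attached to the same overlay network. It is a sketch under assumptions: `count_task_ips` is a hypothetical helper, and it assumes `getent` exists in the image and prints one address per line.

```shell
# Count the distinct task IPs behind a service, as seen by Swarm DNS.
# Hypothetical helper; run inside a container on the service's overlay network.
count_task_ips() {
  service="$1"
  getent hosts "tasks.$service" \
    | awk '{ print $1 }' \
    | sort -u \
    | wc -l \
    | tr -d ' '
}
```

With the `api` service from the example running 3 healthy replicas, `count_task_ips api` would be expected to print 3.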
Placement Constraints: Controlling Where Tasks Run

Placement constraints let you control which nodes can run specific services. This is useful for hardware-specific workloads (GPUs, SSDs), compliance (data locality), or isolating critical services.

    # Run only on nodes with SSD label
    docker service create --name db \
      --constraint node.labels.storage==ssd \
      --replicas 2 \
      postgres

    # Run only on manager nodes
    docker service create --name manager-app \
      --constraint node.role==manager \
      --replicas 1 \
      myapp

    # Run only on nodes without role manager (workers only)
    docker service create --name worker-app \
      --constraint node.role!=manager \
      --replicas 10 \
      worker

    # Combine multiple constraints
    docker service create --name gpu-app \
      --constraint node.labels.gpu==nvidia \
      --constraint node.labels.region==us-east \
      --replicas 2 \
      tensorflow:latest

    # Add a label to a node
    docker node update --label-add gpu=nvidia node1
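
If no node satisfies a constraint, the service's tasks stay pending, so a quick pre-flight check can save debugging time. A minimal sketch, assuming the `node.label` filter of `docker node ls` is available in your Docker version; `count_nodes_with_label` is a hypothetical helper name:

```shell
# Count the nodes that carry a given node label (key or key=value).
# Hypothetical helper; run it on a manager node.
count_nodes_with_label() {
  docker node ls --filter "node.label=$1" --format '{{.Hostname}}' \
    | awk 'NF { n++ } END { print n + 0 }'
}
```

For example, checking that `count_nodes_with_label gpu=nvidia` prints at least 1 before creating `gpu-app` avoids a service whose tasks can never be scheduled.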
Service Management Commands Reference
    # List services
    docker service ls

    # List tasks for a service
    docker service ps web
    docker service ps web --filter "desired-state=running"
    docker service ps web --format "table {{.ID}}\t{{.Node}}\t{{.CurrentState}}"

    # Inspect a service
    docker service inspect web
    docker service inspect web --pretty

    # View service logs
    docker service logs web
    docker service logs -f web --tail 50
    docker service logs --since 1h web

    # Remove a service
    docker service rm web

    # Update service configuration
    docker service update --env-add NODE_ENV=production web
    docker service update --publish-rm 8080 --publish-add 80:80 web
    docker service update --reserve-cpu 0.5 --limit-cpu 1 web

    # Pause/resume service (stop scheduling new tasks)
    docker service scale web=0   # Stop all tasks
    docker service scale web=5   # Resume
Service Best Practices
  • Always use health checks - Critical for rolling updates and self-healing.
  • Set resource limits - Prevent services from consuming all host resources.
  • Use rolling updates with start-first order - Achieve zero-downtime deployments.
  • Pin image versions - Never use `:latest` in production. Use specific tags or digests.
  • Use placement constraints for special hardware - Ensure GPU workloads only run on GPU nodes.
  • Monitor service health - Use `docker service ps` and external monitoring tools.
  • Use secrets for sensitive data - Never pass passwords via environment variables in clear text.
  • Set update failure action to rollback - Automatic rollback reduces downtime when updates fail.
  • Test rolling updates in staging first - Verify update behavior before production.
Real-World Example: Complete Stack with Health Checks and Rolling Updates
    # docker-compose.yml for Swarm deployment
    version: '3.8'

    services:
      web:
        image: nginx:1.25-alpine
        ports:
          - "80:80"
        deploy:
          replicas: 5
          update_config:
            parallelism: 2
            delay: 10s
            order: start-first
            failure_action: rollback
          rollback_config:
            parallelism: 1
            order: stop-first
          restart_policy:
            condition: on-failure
            max_attempts: 3
          resources:
            limits:
              cpus: '0.5'
              memory: 512M
        healthcheck:
          test: ["CMD", "curl", "-f", "http://localhost/health"]
          interval: 30s
          timeout: 5s
          retries: 3
          start_period: 10s
        networks:
          - webnet

      api:
        image: myapi:2.1.0
        deploy:
          replicas: 3
          placement:
            constraints:
              - node.labels.region == primary
          update_config:
            parallelism: 1
            order: start-first
        healthcheck:
          test: ["CMD", "node", "health.js"]
          interval: 30s
        networks:
          - webnet
        secrets:
          - api_key

      redis:
        image: redis:7-alpine
        deploy:
          mode: global
        volumes:
          - redis_data:/data
        networks:
          - webnet

    networks:
      webnet:
        driver: overlay

    volumes:
      redis_data:

    secrets:
      api_key:
        external: true

    # Deploy:
    docker stack deploy -c docker-compose.yml myapp
Frequently Asked Questions
What's the difference between a service and a task?
A service is the desired state definition (image, replicas, ports). A task is a running container that Swarm creates to fulfill that desired state. If a task fails, Swarm creates a new one.
How many replicas should I run?
For production, run at least 3 replicas, ideally spread across different nodes: the service stays reachable while up to 2 tasks are down, although with reduced capacity. For development, 1-2 replicas is fine. More replicas provide better load distribution and fault tolerance.
Can I update a service without downtime?
Yes! Use rolling updates with `--update-order start-first`. New tasks start before old tasks stop, ensuring continuous availability. Health checks ensure new tasks are ready before continuing.
What happens if a task fails?
Swarm automatically replaces failed tasks according to the restart policy (the default condition is `any`, so a task is replaced whenever its container exits). If a node fails, Swarm reschedules its tasks on healthy nodes.
How do I scale services automatically?
Swarm doesn't have built-in auto-scaling. You need external tools (like Docker Swarm autoscaler) or cloud provider integrations to scale based on CPU/memory metrics.
Can I run a service on specific nodes only?
Yes! Use placement constraints with `--constraint` flag. You can constrain by node role, node labels, or engine labels. Add custom labels to nodes with `docker node update --label-add`.
What's the difference between `docker service update` and `docker service scale`?
`scale` is a shorthand for changing only the replica count. `update` can change replica count plus image, environment variables, ports, and other service parameters. Use `scale` for simplicity when only changing replica count.
How do I debug a service that won't start?
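Run `docker service ps --no-trunc` with the service name to see the full error message for each task, and `docker service logs` with the service name for container logs. Also check node resources and placement constraints: a task that no node can satisfy stays in the pending state.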
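The first of those checks can be wrapped in a small filter that shows only the tasks reporting an error. This is an illustrative sketch: `failed_tasks` is a hypothetical helper, and the `|` separator in the format string is an arbitrary choice that assumes task names and states do not contain that character.

```shell
# Print "task-name: error" for every task of a service that reports an error.
# Hypothetical helper built on the .Name, .CurrentState, and .Error placeholders.
failed_tasks() {
  docker service ps --no-trunc \
      --format '{{.Name}}|{{.CurrentState}}|{{.Error}}' "$1" \
    | awk -F'|' '$3 != "" { print $1 ": " $3 }'
}
```

Running `failed_tasks web` prints nothing when every task is healthy, which makes it easy to use in a script or a quick loop over several services.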

Services and tasks are the foundation of Swarm orchestration. Master these concepts to deploy scalable, resilient applications across your cluster.