Docker Swarm Mode
Docker Swarm Mode is Docker's native clustering and orchestration solution. Learn about manager nodes, worker nodes, Raft consensus, and how to deploy highly available, scalable containerized applications across multiple hosts.
Docker Swarm Mode is Docker's native container orchestration platform. It turns a group of Docker hosts into a single, virtual Docker host. Swarm provides multi-host networking, service discovery, load balancing, rolling updates, secret management, and self-healing—all using familiar Docker commands.
Unlike Kubernetes (which requires learning new concepts and tools), Swarm is built directly into the Docker engine. If you know Docker, you already understand most of Swarm. This makes Swarm the natural choice for teams already using Docker who need to move from single-host to multi-host deployments.
┌─────────────────────────────────────────────────────────┐
│                  Docker Swarm Cluster                   │
│   ┌─────────────┐   ┌─────────────┐   ┌─────────────┐   │
│   │   Manager   │   │   Manager   │   │   Worker    │   │
│   │    Node     │   │    Node     │   │    Node     │   │
│   │  (Leader)   │   │ (Follower)  │   │             │   │
│   └─────────────┘   └─────────────┘   └─────────────┘   │
│          │                 │                 │          │
│          └─────────────────┴─────────────────┘          │
│                     Overlay Network                     │
└─────────────────────────────────────────────────────────┘
A Swarm consists of two types of nodes: Manager nodes and Worker nodes.
Manager Nodes are responsible for cluster management, scheduling services, maintaining cluster state, and handling orchestration. They use the Raft consensus algorithm to maintain a consistent cluster state. For high availability, run an odd number of manager nodes (3, 5, or 7).
Worker Nodes are the workhorses that actually run your containers. They receive tasks from managers and execute them. Workers do not participate in Raft consensus and are not involved in cluster management decisions.
# Check node roles
docker node ls
# Promote a worker to manager
docker node promote worker1
# Demote a manager to worker
docker node demote manager2
# Inspect a node
docker node inspect manager1 --pretty
Swarm uses the Raft consensus algorithm to maintain a consistent cluster state across all manager nodes. Raft ensures that all managers agree on the current state of the cluster, even if some managers fail.
How Raft works: One manager is elected as the leader. All cluster changes go through the leader, which replicates them to follower managers. A majority of managers (more than half) must agree to commit a change. This means with 3 managers, you can tolerate 1 failure; with 5 managers, you can tolerate 2 failures.
Best practice: Run an odd number of manager nodes. Never run 2 managers (a single failure loses quorum). Never run 4 managers (they tolerate only 1 failure, the same as 3, while adding one more node that has to stay healthy). 3 or 5 managers are ideal for most deployments.
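The majority rule comes down to simple arithmetic: N managers need floor(N/2) + 1 votes to commit a change, so they can lose N minus that number and keep working. The sketch below uses illustrative shell functions (not Docker commands) to print both values for 1 to 7 managers. The output makes it easy to see why 2 and 4 managers buy no extra fault tolerance over 1 and 3.

```shell
#!/bin/sh
# Raft majority arithmetic (illustrative helpers, not part of Docker):
#   quorum    = floor(N / 2) + 1
#   tolerance = N - quorum

# Print the number of managers that must agree to commit a change.
raft_quorum() {
    echo $(( $1 / 2 + 1 ))
}

# Print how many managers can fail while a quorum still remains.
raft_tolerance() {
    echo $(( $1 - ($1 / 2 + 1) ))
}

for n in 1 2 3 4 5 6 7; do
    printf '%d managers: quorum %d, tolerates %d failure(s)\n' \
        "$n" "$(raft_quorum "$n")" "$(raft_tolerance "$n")"
done
```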
# Check Raft cluster status
docker system info | grep -A 10 Raft
# Check whether this node is the Raft leader (prints true or false)
docker node inspect self --format '{{.ManagerStatus.Leader}}'
# Check manager reachability
docker node ls
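`docker node ls` accepts a Go template through `--format`, and the `{{.ManagerStatus}}` placeholder prints `Leader`, `Reachable`, `Unreachable`, or nothing for a worker. The sketch below feeds that column into a hypothetical `manager_health` helper, which reports how many more managers can fail before quorum is lost. If quorum is already gone, manager commands such as `docker node ls` fail outright, so this check is only useful while the cluster is healthy or degraded.

```shell
#!/bin/sh
# Summarize manager health from the MANAGER STATUS column.
# Input: one status per line (Leader, Reachable, Unreachable, or an empty
# line for a worker), e.g. from:
#   docker node ls --format '{{.ManagerStatus}}'
manager_health() (
    total=0
    up=0
    while IFS= read -r status; do
        case "$status" in
            Leader|Reachable) total=$((total + 1)); up=$((up + 1)) ;;
            Unreachable)      total=$((total + 1)) ;;
            *)                ;;  # worker (empty status) or unknown value
        esac
    done
    quorum=$((total / 2 + 1))
    # spare = how many more reachable managers can fail while keeping quorum
    echo "managers=$total reachable=$up quorum=$quorum spare=$((up - quorum))"
)

# On a manager node:
#   docker node ls --format '{{.ManagerStatus}}' | manager_health
```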
# On the first manager node (10.0.0.1)
docker swarm init --advertise-addr 10.0.0.1
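# Output prints the join command for workers; to get the manager join command, run on the manager:
docker swarm join-token manager
# To add a worker node (10.0.0.2), use the worker token
docker swarm join --token SWMTKN-1-xxx 10.0.0.1:2377
# To add another manager node (10.0.0.3), use the manager token (the token decides the role)
docker swarm join --token SWMTKN-1-xxx 10.0.0.1:2377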
# View nodes in cluster
docker node ls
# Leave the swarm (on a worker)
docker swarm leave
# Leave the swarm on a manager (--force is required; demote it first to protect quorum)
docker swarm leave --force
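The join token, not a command-line flag, decides whether a node joins as a worker or a manager. A small sketch, assuming a hypothetical `join_command` helper and an illustrative `MANAGER_ADDR` variable, fetches the right token with `docker swarm join-token -q` on an existing manager and prints the command to run on the new node.

```shell
#!/bin/sh
# Print the full "docker swarm join" command for a worker or a manager.
# Run on an existing manager. MANAGER_ADDR is an example default.
MANAGER_ADDR=${MANAGER_ADDR:-10.0.0.1:2377}

join_command() (
    role=$1                      # "worker" or "manager"
    case "$role" in
        worker|manager) ;;
        *) echo "usage: join_command worker|manager" >&2; exit 2 ;;
    esac
    # -q prints only the token for the requested role.
    token=$(docker swarm join-token -q "$role") || exit 1
    echo "docker swarm join --token $token $MANAGER_ADDR"
)

# Example: join_command manager
# If a token leaks, invalidate it for future joins:
#   docker swarm join-token --rotate worker
```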
In Swarm, you deploy services instead of individual containers. A service is a description of your desired state: which image to use, how many replicas, which ports to expose, etc. Swarm then creates tasks (containers) to meet that desired state.
# Create a service with 3 replicas
docker service create --name web --replicas 3 -p 80:80 nginx
# List services
docker service ls
# List tasks (containers) for a service
docker service ps web
# Scale a service
docker service scale web=5
# Update service image (rolling update)
docker service update --image nginx:alpine web
# Update with custom rollout settings
docker service update \
--image myapp:2.0 \
--update-parallelism 2 \
--update-delay 10s \
--update-failure-action rollback \
web
# Remove service
docker service rm web
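The desired-state model is what makes scaling and self-healing work. Managers keep comparing the declared replica count with the tasks actually running, then start or stop tasks until the two match. The toy function below is a hypothetical helper that models only that comparison. The real orchestrator also handles scheduling, placement, and task state transitions.

```shell
#!/bin/sh
# Toy model of desired-state reconciliation (not Swarm's actual code):
# compare desired replicas with running tasks and print the needed action.
reconcile() (
    desired=$1
    running=$2
    if [ "$running" -lt "$desired" ]; then
        echo "start $((desired - running))"
    elif [ "$running" -gt "$desired" ]; then
        echo "stop $((running - desired))"
    else
        echo "ok"
    fi
)

reconcile 5 3   # after "docker service scale web=5" with 3 running; prints: start 2
reconcile 3 2   # a node failed and took 1 of 3 tasks with it; prints: start 1
```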
Swarm supports two service modes:
Replicated services run a specified number of tasks (replicas) across available nodes. This is the default and works for most applications where you want multiple instances for load balancing and redundancy.
Global services run exactly one task on every available node in the swarm (drained nodes are skipped, and placement constraints still apply). This suits monitoring agents, log collectors, or any daemon that needs to run on every node.
# Replicated service (default) - 3 instances across the cluster
docker service create --name web --replicas 3 nginx
# Global service - one instance on every node
docker service create --name monitoring --mode global prom/node-exporter
# Check service details
docker service inspect web --pretty
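For a global service, the expected task count follows from the node list rather than from `--replicas`. Nodes set to `Drain` availability (`docker node update --availability drain <node>`) run no tasks, and `Pause` nodes receive no new ones. The rough sketch below ignores placement constraints and uses a hypothetical `eligible_nodes` helper to count the nodes that are both `Ready` and `Active`.

```shell
#!/bin/sh
# Count nodes that can run a global service's task (placement constraints
# are ignored). Input: "<Status> <Availability>" per line, e.g. from:
#   docker node ls --format '{{.Status}} {{.Availability}}'
eligible_nodes() {
    grep -c '^Ready Active$'
}

# Compare with the number of running tasks:
#   docker service ps monitoring --filter desired-state=running
```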
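Overlay networks allow containers on different Swarm nodes to communicate as if they were on the same network. Swarm automatically creates one overlay network, `ingress`, for the routing mesh; for service-to-service communication, attach services to a user-defined overlay network, where they can reach each other by service name through Swarm's built-in DNS. Swarm's management traffic is encrypted by default, but application traffic on an overlay network is encrypted only if the network is created with `--opt encrypted`.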
# Create an overlay network
docker network create -d overlay --attachable my-overlay
# Deploy service on overlay network
docker service create --name web --network my-overlay --replicas 3 nginx
# Deploy another service on same network
docker service create --name api --network my-overlay --replicas 2 myapi
# Services can communicate by name (web, api)
docker service logs api   # Inspect api's logs, e.g. for requests it sends to web
# List networks
docker network ls
# Inspect overlay network
docker network inspect my-overlay
Swarm provides built-in load balancing through the routing mesh. When you publish a port on a service (e.g., `-p 80:80`), any node in the swarm can accept traffic on that port and route it to a healthy task, even if the task isn't running on that node.
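Additionally, Swarm has internal DNS-based service discovery and load balancing. By default, when a service makes a request to another service by name (e.g., `http://web:80`), Swarm's DNS resolves the name to a single virtual IP (VIP), and connections to that VIP are load-balanced across the service's healthy tasks. If a service is created with `--endpoint-mode dnsrr`, DNS instead returns the tasks' IP addresses directly, in round-robin fashion.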
# Publish port on all nodes (routing mesh)
docker service create --name web --replicas 3 -p 80:80 nginx
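# Publish port only on nodes running a task (host mode, no routing mesh);
# each node can run only one task bound to host port 80
docker service create --name web --replicas 3 --publish published=80,target=80,mode=host nginx
# Test internal load balancing from an api task on this node
# (api and web must share an overlay network; the api image must include curl)
docker exec "$(docker ps -q --filter label=com.docker.swarm.service.name=api | head -n 1)" curl -s http://web:80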
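With `--endpoint-mode dnsrr`, a lookup of the service name returns the task IPs instead of one VIP, and spreading requests over them amounts to cycling through that list. The sketch below is a toy model of round-robin selection, not Docker's resolver or IPVS. The `pick_rr` helper and the IP addresses are made up for illustration.

```shell
#!/bin/sh
# Toy round-robin selection over a service's task IPs.
# pick_rr N IP... prints the address used for request N (0-based).
# Requires at least one IP.
pick_rr() (
    n=$1
    shift
    shift $(( n % $# ))   # skip ahead to the (n mod count)-th address
    echo "$1"
)

# Six requests over three tasks hit each task twice.
for req in 0 1 2 3 4 5; do
    pick_rr "$req" 10.0.1.5 10.0.1.6 10.0.1.7
done
```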
Swarm supports zero-downtime rolling updates. When you update a service's image, Swarm replaces tasks one by one (or with configurable parallelism), ensuring the service remains available during the update.
# Deploy initial version
docker service create --name web --replicas 5 nginx:1.20
# Rolling update to new version (2 at a time, 10s between batches)
docker service update \
--image nginx:1.21 \
--update-parallelism 2 \
--update-delay 10s \
web
# Rollback to previous version
docker service rollback web
# Set update order (start-first for zero downtime)
docker service update \
--update-order start-first \
--update-parallelism 1 \
web
# Check update status
docker service ps web --filter "desired-state=running"
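To estimate how long a rolling update takes, count the batches. With R replicas and parallelism P there are ceil(R/P) batches. Assuming `--update-delay` is applied between batches, the update waits at least (batches - 1) × delay, plus the time each batch needs to pull and start. The hypothetical `update_plan` helper below sketches that arithmetic and treats parallelism 0 as "all at once", which is what Docker does.

```shell
#!/bin/sh
# Rough rolling-update estimate (image pull and start time are ignored).
# update_plan REPLICAS PARALLELISM DELAY_SECONDS
update_plan() (
    replicas=$1
    parallelism=$2
    delay=$3
    # --update-parallelism 0 means update every task at once.
    if [ "$parallelism" -eq 0 ]; then parallelism=$replicas; fi
    batches=$(( (replicas + parallelism - 1) / parallelism ))   # ceiling division
    wait=$(( (batches - 1) * delay ))                           # delays between batches
    echo "batches=$batches min_wait=${wait}s"
)

update_plan 5 2 10   # the nginx:1.21 update above; prints: batches=3 min_wait=20s
```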
Swarm provides built-in encrypted secrets management. Secrets are stored encrypted in the Raft log and are only accessible to services that have been granted access. Secrets are mounted as files in the container's `/run/secrets/` directory.
# Create a secret from a file (db_password.txt is an example filename)
docker secret create db_password ./db_password.txt
# Create a secret from stdin (printf, unlike echo, adds no trailing newline to the secret)
printf "admin123" | docker secret create api_key -
# Use secret in service
docker service create --name db \
--secret db_password \
-e POSTGRES_PASSWORD_FILE=/run/secrets/db_password \
postgres
# List secrets
docker secret ls
# Inspect secret (only metadata, value not shown)
docker secret inspect db_password
# Remove secret
docker secret rm db_password
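Images such as postgres accept `POSTGRES_PASSWORD_FILE`, so the password can come from `/run/secrets/` instead of a plain environment variable. If your own image needs the same behaviour, its entrypoint could use a helper like the one below. This is a sketch of the `*_FILE` convention, not the postgres image's own script. The `read_setting` name and the `DB_PASSWORD` variable are illustrative.

```shell
#!/bin/sh
# Read a setting from VAR, or from the file named by VAR_FILE
# (for example VAR_FILE=/run/secrets/db_password). Setting both is an error.
read_setting() (
    var=$1
    # Indirect expansion via eval; only pass trusted variable names.
    eval "value=\${$var:-}"
    eval "file=\${${var}_FILE:-}"
    if [ -n "$value" ] && [ -n "$file" ]; then
        echo "read_setting: both $var and ${var}_FILE are set" >&2
        exit 1
    fi
    if [ -n "$file" ]; then
        cat "$file"
    else
        printf '%s\n' "$value"
    fi
)

# In an entrypoint:
#   DB_PASSWORD=$(read_setting DB_PASSWORD) || exit 1
```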
Swarm stacks use a docker-compose.yml file (version 3+) to define multi-service applications. The syntax is very similar to regular Compose, with additional `deploy` options for Swarm-specific features.
# docker-compose.yml for Swarm
version: '3.8'
services:
  web:
    image: nginx:alpine
    ports:
      - "80:80"
    deploy:
      replicas: 3
      update_config:
        parallelism: 1
        delay: 10s
        order: start-first
      restart_policy:
        condition: on-failure
    networks:
      - webnet
  redis:
    image: redis:alpine
    deploy:
      placement:
        constraints:
          - node.role == manager
    networks:
      - webnet
networks:
  webnet:
    driver: overlay
# Deploy the stack
docker stack deploy -c docker-compose.yml myapp
# List stacks
docker stack ls
# List services in stack
docker stack services myapp
# List tasks in stack
docker stack ps myapp
# Remove stack
docker stack rm myapp
- Run an odd number of manager nodes (3 or 5) - Provides high availability without compromising Raft consensus.
- Use dedicated manager nodes - Don't run workloads on manager nodes in production to ensure cluster stability.
- Secure the swarm - Keep the default mutual TLS between nodes, rotate join tokens regularly (`docker swarm join-token --rotate worker`), and use secrets for sensitive data.
- Set resource limits - Prevent services from consuming all host resources.
- Use health checks - Swarm will restart unhealthy tasks automatically.
- Prefer rolling updates - Use `start-first` order and appropriate parallelism for zero downtime.
- Monitor the cluster - Watch manager health, node status, and service metrics.
- Back up the Raft data - Back up `/var/lib/docker/swarm` on a manager node for disaster recovery, stopping Docker on that manager first so the files are consistent.
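The backup step can be sketched as a small script. Docker's guidance is to stop Docker on the manager, copy `/var/lib/docker/swarm`, and then start Docker again. If autolock is enabled, you also need the unlock key to restore. Stopping a manager temporarily reduces fault tolerance, so keep quorum in mind. The `backup_swarm` name, the archive naming, and the use of `systemctl` (a systemd host) are assumptions.

```shell
#!/bin/sh
# Sketch: archive a manager's swarm state while Docker is stopped.
# Run as root on a manager. Paths are parameters so the function can be
# tried against a scratch directory first.
backup_swarm() (
    src=${1:-/var/lib/docker/swarm}
    dest_dir=${2:-.}
    archive="$dest_dir/swarm-backup-$(date +%Y%m%d%H%M%S).tar.gz"
    systemctl stop docker || exit 1
    status=0
    tar -czf "$archive" -C "$(dirname "$src")" "$(basename "$src")" || status=$?
    # Restart Docker even if the archive step failed.
    systemctl start docker
    if [ "$status" -eq 0 ]; then echo "$archive"; fi
    exit "$status"
)

# Example: backup_swarm /var/lib/docker/swarm /root/backups
```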
Docker Swarm Mode transforms multiple Docker hosts into a single, powerful cluster. It's the perfect stepping stone from single-host Docker to production-grade container orchestration.