Swarm Services & Tasks

Updated June 2026 15 min read Advanced

Services and tasks are the core concepts for running applications in Docker Swarm. This guide covers replicated vs global services, scaling, rolling updates, health checks, service discovery, and task management.


What are Services and Tasks?

In Docker Swarm, a service is the definition of how you want your application to run. It specifies the container image, number of replicas, ports, networks, and update policies. Think of a service as the "desired state" declaration.

A task is a single running container that belongs to a service. Swarm creates and manages tasks to meet the desired state defined in the service: for a service with 3 replicas, Swarm creates 3 tasks (one container each) and spreads them across the cluster. Tasks are scheduled on any available node (manager nodes run tasks too unless their availability is set to drain). A task is never restarted in place; if it fails, Swarm replaces it with a new task.


    Service: web (desired replicas: 3)
    ┌──────────────────┬──────────────────┬──────────────────┐
    │ Task 1 (Node 1)  │ Task 2 (Node 2)  │ Task 3 (Node 3)  │
    │ Container: web   │ Container: web   │ Container: web   │
    │ Status: Running  │ Status: Running  │ Status: Running  │
    └──────────────────┴──────────────────┴──────────────────┘

Swarm continuously compares the actual state of the cluster with each service's desired state. If a task crashes or its node goes down, Swarm schedules a replacement task to restore the replica count. This reconciliation is called self-healing.
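A quick way to spot services whose actual state has not yet caught up with the desired state is the REPLICAS column of `docker service ls`, which prints running/desired (for example `1/3`). The sketch below assumes that format; `under_replicated` is a hypothetical helper name, not a Docker command. It lists services that are short of tasks, for instance while Swarm is still replacing failed ones.

```shell
# under_replicated: read "NAME RUNNING/DESIRED ..." lines on stdin and print
# the services whose running task count is below the desired count.
# Hypothetical helper; feed it from:
#   docker service ls --format '{{.Name}} {{.Replicas}}'
under_replicated() {
  awk '{
    # $2 looks like "2/3"; trailing text such as "(max 1 per node)" is ignored
    if (split($2, r, "/") == 2 && r[1] + 0 < r[2] + 0)
      printf "%s: %d of %d tasks running\n", $1, r[1], r[2]
  }'
}
```

Usage: `docker service ls --format '{{.Name}} {{.Replicas}}' | under_replicated`.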

Creating Services: The Basics

# Simple service with 1 replica
docker service create --name web nginx

# Service with 3 replicas and published port
docker service create --name web --replicas 3 -p 80:80 nginx

# Service with custom command
docker service create --name worker --replicas 5 alpine ping 8.8.8.8

# Service with environment variables
docker service create --name api \
  --replicas 2 \
  -e NODE_ENV=production \
  -e DATABASE_URL=postgres://db:5432 \
  myapi:latest

# Service with resource limits
docker service create --name app \
  --replicas 3 \
  --limit-cpu 0.5 \
  --limit-memory 512M \
  --reserve-cpu 0.25 \
  --reserve-memory 256M \
  myapp:latest
            

Service Types: Replicated vs Global

Feature	Replicated Service	Global Service
Purpose	Scale application horizontally	Run on every node
Number of tasks	Fixed number (replicas)	One per eligible node (follows cluster size)
Use cases	Web servers, APIs, databases	Monitoring agents, log collectors, network plugins
Scaling	Manual or automatic (external)	Automatic when nodes are added or removed
Placement constraints	Optional; limit which nodes can receive replicas	Optional; limit which nodes get the one-per-node task

# Replicated service (default) - 5 instances
docker service create --name web --replicas 5 nginx

# Global service - one instance on every node
docker service create --name node-exporter --mode global prom/node-exporter

# Global service with constraints (only on manager nodes)
docker service create --name manager-agent \
  --mode global \
  --constraint node.role==manager \
  myagent:latest
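Because a global service runs one task per eligible node, its task count follows the cluster. As a rough sanity check you can count the nodes that are `Ready` and `Active` in `docker node ls` (run on a manager); drained nodes receive no tasks. The hypothetical `global_task_count` helper below does that, optionally counting only managers to mirror the `node.role==manager` example. It is a simplified model that ignores labels, resource reservations and other constraints.

```shell
# global_task_count [managers]: estimate how many tasks a global service runs
# by counting nodes that are Ready and Active. Reads lines produced by:
#   docker node ls --format '{{.Status}} {{.Availability}} {{.ManagerStatus}}'
# Simplified model: labels, resources and other constraints are ignored.
global_task_count() {
  awk -v only="${1:-all}" '
    $1 == "Ready" && $2 == "Active" {
      # Workers have an empty ManagerStatus column, so they have only 2 fields
      if (only == "managers" && NF < 3) next
      n++
    }
    END { print n + 0 }'
}
```

Usage: `docker node ls --format '{{.Status}} {{.Availability}} {{.ManagerStatus}}' | global_task_count managers`.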
            

Scaling Services Up and Down

# Scale to 10 replicas
docker service scale web=10

# Scale down to 3 replicas
docker service scale web=3

# Update replicas during service update
docker service update --replicas 8 web

# Get current replica count
docker service ls
docker service inspect web --format '{{.Spec.Mode.Replicated.Replicas}}'

# Change the replica count and the rolling-update parallelism together
# (--update-parallelism applies to later image/config updates, not to scaling itself)
docker service update --replicas 15 --update-parallelism 2 web
            

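Scaling does not restart existing tasks. Scaling up schedules additional tasks on eligible nodes (each new task still has to pull the image and start, so capacity is not added instantly); scaling down stops surplus tasks. The service's load balancer starts routing to new tasks once they are running. Note that `docker service scale` only works for replicated services; a global service's task count follows the number of eligible nodes.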

Rolling Updates: Zero-Downtime Deployments

Rolling updates allow you to update a service without downtime. Swarm replaces tasks one by one (or in batches), ensuring the service remains available throughout the update. You can configure parallelism, delay between batches, and failure handling.

# Basic rolling update
docker service update --image nginx:alpine web

# Advanced rolling update settings
docker service update \
  --image nginx:1.25 \
  --update-parallelism 2 \
  --update-delay 10s \
  --update-order start-first \
  --update-failure-action rollback \
  web

# Update with health check integration
docker service update \
  --image myapp:2.0 \
  --update-parallelism 1 \
  --update-order start-first \
  --update-monitor 30s \
  --update-max-failure-ratio 0.3 \
  myapp

# Monitor update progress
docker service ps web --filter "desired-state=running"
docker service inspect web --pretty | grep -A 10 Update
            

To achieve zero downtime, use `--update-order start-first`. This starts new tasks before stopping old ones. Also ensure your service has health checks configured so Swarm knows when a task is ready.
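The update settings translate into simple arithmetic. Swarm works through the tasks roughly `parallelism` at a time, so 5 replicas with `--update-parallelism 2` take three batches, and with `--update-delay 10s` at least two 10-second pauses fall between them. The hypothetical helper below computes both numbers; treat the time as a lower bound, since image pulls, container startup, health checks and `--update-monitor` all add to it.

```shell
# update_plan REPLICAS PARALLELISM DELAY_SECONDS
# Prints how many batches a rolling update needs and a lower bound on the time
# spent in --update-delay pauses (startup and monitoring time are not included).
update_plan() {
  replicas=$1 parallelism=$2 delay=$3
  if [ "$parallelism" -eq 0 ]; then
    batches=1   # --update-parallelism 0 means "update all tasks at once"
  else
    # ceil(replicas / parallelism) with integer arithmetic
    batches=$(( (replicas + parallelism - 1) / parallelism ))
  fi
  # At least one delay separates each pair of consecutive batches
  pauses=0
  if [ "$batches" -gt 1 ]; then pauses=$(( batches - 1 )); fi
  echo "batches=$batches min_delay_seconds=$(( pauses * delay ))"
}
```

For the advanced example above (parallelism 2, delay 10s) on a 5-replica service, `update_plan 5 2 10` prints `batches=3 min_delay_seconds=20`.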

Rolling Back Failed Updates

Swarm supports automatic and manual rollbacks. You can configure rollback parameters during service creation or update. If an update fails, Swarm can automatically revert to the previous version.

# Manual rollback to previous version
docker service rollback web

# Configure automatic rollback on failure
docker service create \
  --name web \
  --replicas 5 \
  --update-parallelism 2 \
  --update-failure-action rollback \
  --rollback-parallelism 1 \
  --rollback-monitor 20s \
  nginx:1.20

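# View the update/rollback state (e.g. rollback_started, rollback_completed)
docker service inspect web --format '{{json .UpdateStatus}}'

# Task history, including tasks shut down during the rollback
docker service ps web

# Deploy a specific older image (a regular rolling update, not a rollback)
docker service update --image nginx:1.19 web

# Redeploy every task without changing the spec
docker service update --force web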
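When `--update-failure-action rollback` fires, the service ends up running its previous spec, which is easy to miss in CI output. A small wrapper can read `UpdateStatus.State` from `docker service inspect` (the Engine API reports values such as `updating`, `paused`, `completed`, `rollback_started`, `rollback_paused` and `rollback_completed`) and turn it into an exit code. `update_outcome` is a hypothetical helper name, and the exit codes are an arbitrary choice for this sketch.

```shell
# update_outcome SERVICE: summarize the last update of SERVICE.
# Returns 0 for a completed update (or none at all), 1 for a rollback, pause or
# unknown state, 2 if inspect fails, 3 while an update is still in progress.
update_outcome() {
  # UpdateStatus is absent for services that were never updated, hence the "if"
  state=$(docker service inspect "$1" \
    --format '{{if .UpdateStatus}}{{.UpdateStatus.State}}{{end}}') || return 2
  case $state in
    "") echo "no update recorded" ;;
    completed) echo "update completed" ;;
    updating|rollback_started) echo "in progress: $state"; return 3 ;;
    rollback_completed) echo "rolled back to the previous spec"; return 1 ;;
    paused|rollback_paused) echo "paused: $state"; return 1 ;;
    *) echo "unexpected state: $state"; return 1 ;;
  esac
}
```

Usage in a deploy script: `docker service update --image myapp:2.0 myapp; update_outcome myapp || exit 1`.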
            

Health Checks: Keeping Services Healthy

Health checks tell Swarm whether a container is functioning correctly. Swarm uses health check status to decide whether to replace unhealthy tasks and whether rolling updates are successful. A proper health check is essential for production services.

# Service with health check
docker service create --name web \
  --replicas 3 \
  --health-cmd "curl -f http://localhost/ || exit 1" \
  --health-interval 30s \
  --health-timeout 10s \
  --health-retries 3 \
  --health-start-period 30s \
  nginx

# In docker-compose.yml for Swarm
version: '3.8'
services:
  web:
    image: nginx
    deploy:
      replicas: 3
      update_config:
        order: start-first
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost/"]
      interval: 30s
      timeout: 10s
      retries: 3
      start_period: 30s
            

Health checks are critical for rolling updates. Without them, Swarm considers a new task ready as soon as its container is running, even if the application inside is not yet serving requests. Always add health checks to production services.
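The health-check settings also decide how quickly a broken task is noticed. Going by Docker's documented behavior (a probe runs `interval` after the previous one completes, counts as failed when it exits non-zero or exceeds `timeout`, and `retries` consecutive failures mark the container unhealthy), a rough worst case is retries × (interval + timeout). The hypothetical helper below just does that arithmetic; it ignores the start period, during which failures are not counted.

```shell
# unhealthy_after INTERVAL TIMEOUT RETRIES   (all values in seconds)
# Rough worst case between a container breaking right after a passing probe
# and Docker marking it unhealthy: each of the RETRIES failing probes starts
# INTERVAL after the previous one finished and may run for up to TIMEOUT.
unhealthy_after() {
  echo $(( $3 * ($1 + $2) ))
}
```

With the values used above (30s interval, 10s timeout, 3 retries), `unhealthy_after 30 10 3` prints 120, noticeably longer than a 30s `--update-monitor` window, so it can be worth choosing the two together.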

Service Discovery and DNS

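Swarm provides built-in service discovery through DNS. Every service attached to a user-defined overlay network gets a DNS name (the service name) that other services on the same network can resolve. By default (endpoint mode `vip`) that name resolves to a single virtual IP, and Swarm's internal load balancer spreads connections across the service's running tasks; with `--endpoint-mode dnsrr` the name resolves directly to the task IPs instead. The name `tasks.<service>` returns the individual task IPs. Services can therefore communicate by name without knowing where their tasks run.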

# Create two services on the same network
docker network create -d overlay appnet
docker service create --name api --network appnet --replicas 3 myapi
docker service create --name web --network appnet -p 80:80 nginx

# From within the web service, you can access api by name
# curl http://api:3000/data

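# Show the endpoint mode (vip or dnsrr) and published ports
docker service inspect web --pretty | grep -A 5 Endpoint

# Show the virtual IP(s) assigned to the api service on each network
docker service inspect api --format '{{json .Endpoint.VirtualIPs}}'

# List the tasks (and nodes) behind the api name
docker service ps api

# Test DNS resolution from inside a web task. There is no "docker service exec",
# so run docker exec on a node that hosts a web task (getent ships with
# glibc-based images; use nslookup instead if your image includes it)
docker exec "$(docker ps -q --filter label=com.docker.swarm.service.name=web | head -n 1)" getent hosts api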
            

Placement Constraints: Controlling Where Tasks Run

Placement constraints let you control which nodes can run specific services. This is useful for hardware-specific workloads (GPUs, SSDs), compliance (data locality), or isolating critical services.

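# Run only on nodes with the SSD label (one replica: separate postgres tasks
# do not share or replicate data). The postgres image needs a password to start;
# db_password is an example secret created beforehand with "docker secret create".
docker service create --name db \
  --constraint node.labels.storage==ssd \
  --secret db_password \
  -e POSTGRES_PASSWORD_FILE=/run/secrets/db_password \
  --replicas 1 \
  postgres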

# Run only on manager nodes
docker service create --name manager-app \
  --constraint node.role==manager \
  --replicas 1 \
  myapp

# Run only on nodes without role manager (workers only)
docker service create --name worker-app \
  --constraint node.role!=manager \
  --replicas 10 \
  worker

# Combine multiple constraints (a node must satisfy all of them)
docker service create --name gpu-app \
  --constraint node.labels.gpu==nvidia \
  --constraint node.labels.region==us-east \
  --replicas 2 \
  tensorflow/tensorflow:latest-gpu

# Add a label to a node
docker node update --label-add gpu=nvidia node1
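Multiple `--constraint` flags are combined with a logical AND: a node is eligible only if every expression holds. The toy model below makes that rule concrete for `node.labels.*` equality and inequality. It is a simplified illustration with a hypothetical helper name, not Swarm's actual matcher, which also understands `node.role`, `node.hostname`, `engine.labels.*` and more.

```shell
# matches_constraints "LABELS" CONSTRAINT...
# Toy model of node filtering: every constraint must hold (logical AND).
# LABELS is a space-separated list of key=value node labels. Only
# node.labels.KEY==VALUE and node.labels.KEY!=VALUE are understood; in this
# model a node without the label fails "==" and passes "!=".
matches_constraints() {
  labels=" $1 "; shift
  for c in "$@"; do
    case $c in
      node.labels.*!=*)
        kv=${c#node.labels.}; key=${kv%%!=*}; val=${kv#*!=}
        case $labels in *" $key=$val "*) return 1 ;; esac ;;
      node.labels.*==*)
        kv=${c#node.labels.}; key=${kv%%==*}; val=${kv#*==}
        case $labels in *" $key=$val "*) ;; *) return 1 ;; esac ;;
      *)
        echo "unsupported constraint: $c" >&2; return 2 ;;
    esac
  done
  return 0
}
```

For example, `matches_constraints "gpu=nvidia region=eu-west" node.labels.gpu==nvidia node.labels.region==us-east` fails because only one of the two constraints holds.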
            

Service Management Commands Reference

# List services
docker service ls

# List tasks for a service
docker service ps web
docker service ps web --filter "desired-state=running"
docker service ps web --format "table {{.ID}}\t{{.Node}}\t{{.CurrentState}}"

# Inspect a service
docker service inspect web
docker service inspect web --pretty

# View service logs
docker service logs web
docker service logs -f web --tail 50
docker service logs --since 1h web

# Remove a service
docker service rm web

# Update service configuration
docker service update --env-add NODE_ENV=production web
docker service update --publish-rm 8080 --publish-add 80:80 web
docker service update --reserve-cpu 0.5 --limit-cpu 1 web

# There is no pause command: scaling a replicated service to 0 stops all of its
# tasks, and scaling back up starts fresh ones
docker service scale web=0   # Stop all tasks
docker service scale web=5   # Resume
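Scaling to 0 forgets the previous replica count, so "resume" needs that number from somewhere. One way is to record it before scaling down. The sketch below stores it in a file under `${STATE_DIR:-/tmp}`; the helper names and the storage location are assumptions, and it is meant for replicated services only, since global services have no replica count to read.

```shell
# pause_service SERVICE: remember the current replica count, then scale to 0.
# resume_service SERVICE: scale back to the remembered count.
# The count is kept in a plain file under ${STATE_DIR:-/tmp} (an assumption).
pause_service() {
  count=$(docker service inspect "$1" \
    --format '{{.Spec.Mode.Replicated.Replicas}}') || return 1
  printf '%s\n' "$count" > "${STATE_DIR:-/tmp}/$1.replicas" || return 1
  docker service scale "$1=0"
}

resume_service() {
  count=$(cat "${STATE_DIR:-/tmp}/$1.replicas") || return 1
  docker service scale "$1=$count"
}
```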
            

Service Best Practices

Always use health checks - Critical for rolling updates and self-healing.
Set resource limits - Prevent services from consuming all host resources.
Use rolling updates with start-first order - Achieve zero-downtime deployments.
Pin image versions - Never use `:latest` in production. Use specific tags or digests.
Use placement constraints for special hardware - Ensure GPU workloads only run on GPU nodes.
Monitor service health - Use `docker service ps` and external monitoring tools.
Use secrets for sensitive data - Never pass passwords via environment variables in clear text.
Set update failure action to rollback - Automatic rollback reduces downtime when updates fail.
Test rolling updates in staging first - Verify update behavior before production.
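Pinning image versions can be checked mechanically before a deploy. The sketch below scans Compose files with awk and flags `image:` lines that have no tag or use `:latest`, skipping images pinned by digest. `check_pinned_images` is a hypothetical helper name, and it deliberately handles only the simple `image: NAME` form; a registry port such as `registry.example.com:5000/app` is not mistaken for a tag.

```shell
# check_pinned_images FILE...: print every "image:" line in the given Compose
# files that is untagged or uses :latest; exit non-zero if any were found.
check_pinned_images() {
  awk -v q="'" '
    $1 == "image:" {
      img = $2
      gsub("[\"" q "]", "", img)      # drop surrounding quotes
      if (img ~ /@sha256:/) next      # pinned by digest
      name = img
      sub(/.*\//, "", name)           # drop registry host:port and namespace
      if (name !~ /:/ || name ~ /:latest$/) {
        printf "%s:%d: unpinned image %s\n", FILENAME, FNR, img
        bad = 1
      }
    }
    END { exit bad }
  ' "$@"
}
```

Usage: `check_pinned_images docker-compose.yml && docker stack deploy -c docker-compose.yml myapp`.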

Real-World Example: Complete Stack with Health Checks and Rolling Updates

# docker-compose.yml for Swarm deployment
version: '3.8'

services:
  web:
    image: nginx:1.25-alpine
    ports:
      - "80:80"
    deploy:
      replicas: 5
      update_config:
        parallelism: 2
        delay: 10s
        order: start-first
        failure_action: rollback
      rollback_config:
        parallelism: 1
        order: stop-first
      restart_policy:
        condition: on-failure
        max_attempts: 3
      resources:
        limits:
          cpus: '0.5'
          memory: 512M
    healthcheck:
      # Stock nginx has no /health route (it would return 404), so probe /
      test: ["CMD", "curl", "-f", "http://localhost/"]
      interval: 30s
      timeout: 5s
      retries: 3
      start_period: 10s
    networks:
      - webnet

  api:
    image: myapi:2.1.0
    deploy:
      replicas: 3
      placement:
        constraints:
          - node.labels.region == primary
      update_config:
        parallelism: 1
        order: start-first
    healthcheck:
      test: ["CMD", "node", "health.js"]
      interval: 30s
    networks:
      - webnet
    secrets:
      - api_key

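  redis:
    image: redis:7-alpine
    deploy:
      # A single replica: in global mode every node would run its own Redis
      # with separate data, and the service name would balance across them
      replicas: 1
    volumes:
      # Named volumes are local to a node; pin the task with a placement
      # constraint if the data has to survive rescheduling
      - redis_data:/data
    networks:
      - webnet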

networks:
  webnet:
    driver: overlay

volumes:
  redis_data:

secrets:
  api_key:
    external: true

# Deploy: docker stack deploy -c docker-compose.yml myapp
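The stack above depends on two things that `docker stack deploy` does not create: the external secret `api_key` and a node labeled `region=primary` (without one, the `api` tasks stay Pending). A minimal preparation script might look like this; `deploy_myapp`, `API_KEY` and `PRIMARY_NODE` are hypothetical names, and it assumes it runs on a manager node in the directory containing docker-compose.yml.

```shell
# deploy_myapp: create the prerequisites the stack expects, then deploy it.
# Assumes: run on a manager, API_KEY holds the secret value, PRIMARY_NODE is
# the hostname or ID of the node that should receive region=primary.
deploy_myapp() {
  # External secret used by the api service; create it only if missing
  if ! docker secret inspect api_key >/dev/null 2>&1; then
    printf '%s' "$API_KEY" | docker secret create api_key - >/dev/null || return 1
  fi

  # Label required by the api placement constraint (node.labels.region == primary)
  docker node update --label-add region=primary "$PRIMARY_NODE" >/dev/null || return 1

  # Create or update the stack; re-running it updates changed services
  # according to their update_config
  docker stack deploy -c docker-compose.yml myapp
}
```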
            

Frequently Asked Questions

What's the difference between a service and a task?

A service is the desired state definition (image, replicas, ports). A task is a running container that Swarm creates to fulfill that desired state. If a task fails, Swarm creates a new one.

How many replicas should I run?

For production, run at least 3 replicas for high availability: with tasks spread across different nodes, the service stays reachable (at reduced capacity) even if two of them fail. For development, 1-2 replicas are fine. More replicas provide better load distribution and fault tolerance.

Can I update a service without downtime?

Yes! Use rolling updates with `--update-order start-first`. New tasks start before old tasks stop, ensuring continuous availability. Health checks ensure new tasks are ready before continuing.

What happens if a task fails?

Swarm replaces failed tasks with new ones according to the service's restart policy (`--restart-condition`, default `any`). If a node fails, Swarm reschedules that node's replicated-service tasks on healthy nodes.

How do I scale services automatically?

Swarm doesn't have built-in autoscaling. You need an external tool (a community-maintained autoscaler, your own script, or a cloud provider integration) that reads CPU/memory metrics and adjusts the replica count, for example with `docker service scale`.
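The core of such an external tool is a scaling rule plus a call to `docker service scale`. The sketch below borrows the proportional rule used by Kubernetes' Horizontal Pod Autoscaler, desired = ceil(current × observed / target), and clamps the result to a minimum and maximum. The helper name, the CPU-percentage input and the metrics source are all assumptions, and a real autoscaler would also need smoothing and cooldown periods.

```shell
# desired_replicas CURRENT CPU_PERCENT TARGET_PERCENT MIN MAX
# Proportional scaling rule borrowed from Kubernetes' Horizontal Pod Autoscaler:
#   desired = ceil(CURRENT * CPU_PERCENT / TARGET_PERCENT), clamped to [MIN, MAX]
# CPU_PERCENT is the average utilization across the service's tasks, taken from
# whatever metrics system you run (an assumption; Swarm does not provide it).
desired_replicas() {
  want=$(( ($1 * $2 + $3 - 1) / $3 ))   # integer ceiling division
  if [ "$want" -lt "$4" ]; then want=$4; fi
  if [ "$want" -gt "$5" ]; then want=$5; fi
  echo "$want"
}
```

For example, `docker service scale web="$(desired_replicas 3 90 60 2 10)"` scales `web` from 3 to 5 replicas when average CPU is 90% against a 60% target.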

Can I run a service on specific nodes only?

Yes! Use placement constraints with `--constraint` flag. You can constrain by node role, node labels, or engine labels. Add custom labels to nodes with `docker node update --label-add`.

What's the difference between `docker service update` and `docker service scale`?

`scale` is a shorthand for changing only the replica count. `update` can change replica count plus image, environment variables, ports, and other service parameters. Use `scale` for simplicity when only changing replica count.

How do I debug a service that won't start?

Run `docker service ps --no-trunc <service>` to see the full error messages for failed tasks, and `docker service logs <service>` for container logs. Also check node resources and placement constraints: a task that no node can satisfy stays in the Pending state.
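To cut the `docker service ps` output down to just the failures, you can request a custom format that includes the task error and drop tasks whose error field is empty. `task_errors` is a hypothetical helper name; `{{.Name}}`, `{{.Node}}`, `{{.CurrentState}}` and `{{.Error}}` are standard `docker service ps` format placeholders.

```shell
# task_errors SERVICE: list only the tasks of SERVICE that recorded an error,
# as "TASK on NODE (STATE): ERROR". Hypothetical helper name.
task_errors() {
  docker service ps "$1" --no-trunc \
    --format '{{.Name}}|{{.Node}}|{{.CurrentState}}|{{.Error}}' |
    awk -F'|' '$4 != "" { printf "%s on %s (%s): %s\n", $1, $2, $3, $4 }'
}
```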


Services and tasks are the foundation of Swarm orchestration. Master these concepts to deploy scalable, resilient applications across your cluster.