Dockerfile Basics
A Dockerfile is a text document that contains all the commands to build a Docker image. This guide covers every essential instruction: FROM, RUN, COPY, ADD, CMD, ENTRYPOINT, ENV, ARG, WORKDIR, EXPOSE, and VOLUME, with practical examples and best practices.
A Dockerfile is a text file that contains a series of instructions for building a Docker image. When you run docker build, Docker reads the Dockerfile and executes the instructions in order, producing a reusable image. Instructions that change the filesystem (RUN, COPY, ADD) each add a layer to the image; the others record metadata. Dockerfiles are the foundation of reproducible container builds: they allow you to version, share, and automate your infrastructure as code.
Dockerfile instructions are executed in order, from top to bottom, and the result of each one is cached. If an instruction and its inputs haven't changed, Docker reuses the cached result, making subsequent builds much faster. Once one instruction's cache is invalidated, every instruction after it is rebuilt as well. This is why ordering instructions from least frequently changed to most frequently changed is a key optimization technique.
# Example Dockerfile for a Node.js application
FROM node:18-alpine
WORKDIR /app
COPY package*.json ./
RUN npm ci --omit=dev
COPY . .
EXPOSE 3000
CMD ["node", "server.js"]
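The same ordering idea carries over to other ecosystems. As a rough sketch, assuming a hypothetical Python project with a requirements.txt manifest and an app.py entry point (neither comes from this guide), the manifest is copied and installed before the rest of the source, so editing code does not invalidate the dependency layer:

```dockerfile
# Sketch only: requirements.txt and app.py are assumed file names.
FROM python:3.11-slim
WORKDIR /app
# Changes rarely: this layer and the install below stay cached
# until requirements.txt itself changes.
COPY requirements.txt ./
RUN pip install --no-cache-dir -r requirements.txt
# Changes often: only the instructions from here on are rebuilt
# after a source edit.
COPY . .
CMD ["python", "app.py"]
```

If the two COPY instructions were merged into a single COPY . . placed before the install, every source edit would trigger a full reinstall of the dependencies.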
| Instruction | Purpose | Example |
|---|---|---|
| FROM | Sets the base image | FROM node:18-alpine |
| RUN | Executes commands during build | RUN apt-get update && apt-get install -y curl |
| COPY | Copies files from host to image | COPY . /app |
| ADD | Advanced copy with URL/tar support | ADD https://example.com/file.tar.gz /tmp/ |
| CMD | Default command for container | CMD ["npm", "start"] |
| ENTRYPOINT | Main command wrapper | ENTRYPOINT ["docker-entrypoint.sh"] |
| WORKDIR | Sets working directory | WORKDIR /app |
| ENV | Sets environment variables | ENV NODE_ENV=production |
| ARG | Build-time variables | ARG VERSION=latest |
| EXPOSE | Documents container ports | EXPOSE 8080 |
| VOLUME | Creates mount point for volumes | VOLUME /data |
| USER | Sets user for RUN, CMD, ENTRYPOINT | USER node |
| LABEL | Adds metadata | LABEL version="1.0" |
| HEALTHCHECK | Checks container health | HEALTHCHECK CMD curl -f http://localhost/ \|\| exit 1 |
FROM is the first instruction in almost every Dockerfile. It specifies the base image to build upon. All subsequent instructions run in the context of this base image. You can use any image from Docker Hub or a private registry.
Choose a minimal base image for smaller images with less attack surface. Alpine variants are popular for their small size (about 5 MB, versus about 70 MB for Ubuntu). For Node.js, use node:18-alpine; for Python, use python:3.11-slim. Multi-stage builds use multiple FROM statements; by default, the final image is built from the last FROM.
# Basic FROM instructions (alternatives: a single-stage Dockerfile uses only one)
FROM ubuntu:22.04
FROM node:18-alpine
FROM python:3.11-slim
FROM nginx:alpine
# Multi-stage build example
FROM node:18 AS builder
# ... build steps ...
FROM nginx:alpine
COPY --from=builder /app/build /usr/share/nginx/html
RUN executes commands in a new layer on top of the current image and commits the results. It's used to install packages, create directories, or perform any setup needed for your application. Each RUN instruction creates a new layer, which affects image size.
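Best practice: combine related commands into a single RUN using && to reduce the number of layers. Also, clean up temporary files in the same RUN command, because files deleted by a later instruction still occupy space in the earlier layer. For apt-get, always combine apt-get update and apt-get install in the same RUN; otherwise a cached apt-get update layer can leave the install step working from a stale package index.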
# Bad: Multiple layers, no cleanup
RUN apt-get update
RUN apt-get install -y curl
RUN apt-get install -y git
RUN rm -rf /var/lib/apt/lists/*
# Good: Single layer with cleanup
RUN apt-get update && \
    apt-get install -y curl git && \
    rm -rf /var/lib/apt/lists/*
# Installing npm packages
RUN npm ci --omit=dev
# Creating a directory
RUN mkdir -p /app/data
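Why cleanup has to happen in the same RUN can be seen with a throwaway file. The following is a minimal sketch, assuming an Alpine base image and a made-up 50 MB file at /tmp/blob; it is an illustration, not something to ship:

```dockerfile
# Sketch only: /tmp/blob is a dummy file used to show layer behavior.
FROM alpine:3.19
# Two layers: the first stores the 50 MB file, the second only
# marks it as deleted. The image still carries the 50 MB.
RUN dd if=/dev/zero of=/tmp/blob bs=1M count=50
RUN rm /tmp/blob
# One layer: the file is gone before the layer is committed,
# so nothing is left behind.
RUN dd if=/dev/zero of=/tmp/blob bs=1M count=50 && rm /tmp/blob
```

Running docker history on the resulting image should show the size attributed to each instruction, which makes the difference between the two approaches visible.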
COPY copies files and directories from the build context (the directory you pass to docker build, usually . for the current directory) into the image. It's straightforward and predictable: it just copies files.
ADD does everything COPY does, plus two additional features: it can download a file from a URL, and it automatically extracts local tar archives (including gzip, bzip2, and xz compressed ones). An archive downloaded from a URL is not extracted. Because of these additional behaviors, ADD can be unpredictable. Docker's official best practices recommend using COPY unless you specifically need ADD's features.
# COPY - simple file copying
COPY package.json /app/
COPY . /app
COPY --chown=node:node . /app
# ADD - advanced features
# Downloads from a URL (the archive is not extracted)
ADD https://example.com/file.tar.gz /tmp/
# Auto-extracts a local tar archive
ADD app.tar.gz /app/
ADD --chown=node:node . /app
# Best practice: Use COPY for local files
COPY package*.json ./
COPY src/ ./src/
# Use ADD only when you need URL or extraction
ADD --chmod=755 https://example.com/script.sh /usr/local/bin/
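Because ADD does not unpack an archive that it downloads from a URL, a remote tarball is often fetched and extracted inside a RUN instead. A minimal sketch, assuming a Debian-based image, the placeholder URL used in this guide, and a hypothetical target directory /opt/tool:

```dockerfile
# Sketch only: the URL and /opt/tool are placeholders.
FROM debian:bookworm-slim
# Fetch and unpack the archive in a single layer, so the packed
# tarball is never stored in the image.
RUN apt-get update && \
    apt-get install -y --no-install-recommends ca-certificates curl && \
    mkdir -p /opt/tool && \
    curl -fsSL https://example.com/file.tar.gz | tar -xz -C /opt/tool && \
    rm -rf /var/lib/apt/lists/*
```

The trade-off is that curl stays installed in the image; a multi-stage build can avoid that if it matters.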
CMD provides the default command and arguments for a container. Arguments given to docker run after the image name replace it. Only one CMD takes effect per Dockerfile; if there are several, the last one wins.
ENTRYPOINT defines the executable that runs when the container starts. It's harder to override (it requires the --entrypoint flag). ENTRYPOINT is often paired with CMD: ENTRYPOINT fixes the executable and CMD supplies default arguments that users can replace. Together, they give the container a flexible, predictable command-line interface.
# ENTRYPOINT + CMD pattern
ENTRYPOINT ["npm"]
CMD ["start"]
# Override: docker run myapp install (runs npm install)
# Shell form vs Exec form
# Exec form (preferred - no shell processing)
CMD ["node", "app.js"]
ENTRYPOINT ["docker-entrypoint.sh"]
# Shell form (uses /bin/sh -c)
CMD node app.js
# Common patterns
ENTRYPOINT ["python", "app.py"]
# CMD supplies the default arguments: docker run myapp runs python app.py --help
CMD ["--help"]
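Note: shell form runs the command under /bin/sh -c, so the shell, not your application, becomes PID 1. Signals such as SIGTERM may then never reach the application, and the container may not stop gracefully.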
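Shell form is sometimes still wanted for environment variable expansion. One common workaround, sketched here with an assumed app.js entry point and an assumed --port flag, is to prefix the command with exec so that the shell is replaced by the application process:

```dockerfile
# Sketch only: app.js, --port, and PORT are assumed names.
FROM node:18-alpine
WORKDIR /app
COPY . .
ENV PORT=3000
# Shell form is kept for the $PORT expansion; exec replaces the
# /bin/sh process with node, so the SIGTERM from docker stop is
# addressed to node rather than to the shell.
CMD exec node app.js --port "$PORT"
```

The application still needs its own SIGTERM handler to shut down cleanly; exec only makes sure the signal is sent to the right process.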
WORKDIR sets the working directory for any RUN, CMD, ENTRYPOINT, COPY, and ADD instructions that follow. If the directory doesn't exist, it's created automatically. Using WORKDIR is better than RUN cd: a cd inside RUN lasts only for that one instruction, whereas WORKDIR persists across instructions and lets later paths be relative.
ENV sets environment variables that persist in the final container. Use ENV for configuration that shouldn't change between builds (like application paths, default ports).
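ARG defines build-time variables that only exist during the build. They're not set as environment variables in the running container, but their values can still be seen with docker history, so don't use ARG for secrets. ARG is useful for version numbers, cache busting, or conditional builds.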
# WORKDIR examples
WORKDIR /app
# Copies the build context to /app
COPY . .
WORKDIR /app/src
# Runs in /app/src
RUN make
# ENV examples
ENV NODE_ENV=production
ENV PORT=3000
ENV PATH="/app/bin:${PATH}"
# ARG examples
ARG VERSION=1.0.0
ARG DEBIAN_FRONTEND=noninteractive
RUN echo "Building version ${VERSION}"
# Build with build-arg
# docker build --build-arg VERSION=2.0.0 -t myapp .
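ARG and ENV are often combined when a build-time value should also be readable at run time. A minimal sketch, with NODE_TAG and APP_VERSION as illustrative variable names that do not come from this guide:

```dockerfile
# Sketch only: NODE_TAG and APP_VERSION are illustrative names.
# An ARG declared before FROM is visible to FROM lines only,
# unless it is redeclared inside the stage.
ARG NODE_TAG=18-alpine
FROM node:${NODE_TAG}
# Build-time value; override with --build-arg APP_VERSION=2.0.0
ARG APP_VERSION=1.0.0
# Copy it into an ENV so the running container can read it too.
ENV APP_VERSION=${APP_VERSION}
RUN echo "Building version ${APP_VERSION}"
```

Without the ENV line, APP_VERSION would be available to RUN during the build but would be unset inside a container started from the image.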
EXPOSE informs Docker that the container listens on the specified network ports at runtime. It's documentation—it doesn't actually publish the port. You still need -p or -P to make ports accessible. Use EXPOSE to communicate to users which ports your application uses.
VOLUME creates a mount point for external storage. Any data written to a volume persists even after the container is deleted. Use VOLUME for database storage, logs, or any data that should survive container removal.
# EXPOSE examples (documentation only)
EXPOSE 80
EXPOSE 8080/tcp
EXPOSE 53/udp
# VOLUME examples
VOLUME /data
VOLUME ["/var/log", "/var/db"]
# Complete example
FROM postgres:15
EXPOSE 5432
VOLUME /var/lib/postgresql/data
ENV POSTGRES_DB=mydb
CMD ["postgres"]
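Since EXPOSE and VOLUME only declare intent, the actual publishing and mounting happen at run time. The sketch below annotates a small Dockerfile with the matching docker run flags; the image name myweb and the volume name webcache are hypothetical:

```dockerfile
# Sketch only: myweb and webcache are hypothetical names.
FROM nginx:alpine
# EXPOSE records metadata; nothing is reachable from the host yet.
EXPOSE 80
# Publishing happens at run time:
#   docker run -p 8080:80 myweb   maps host port 8080 to container port 80
#   docker run -P myweb           maps each exposed port to an ephemeral host port
# VOLUME marks the path as a mount point; an anonymous volume is
# created there unless one is supplied at run time.
VOLUME /var/cache/nginx
# Supplying a named volume at run time:
#   docker run -v webcache:/var/cache/nginx myweb
```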
# Stage 1: Build
FROM node:18-alpine AS builder
WORKDIR /build
# Copy package files
COPY package*.json ./
RUN npm ci
# Copy source and build, then drop devDependencies so they are not copied into the final stage
COPY . .
RUN npm run build && npm prune --omit=dev
# Stage 2: Production
FROM node:18-alpine
# Create non-root user
RUN addgroup -g 1001 -S nodejs && \
    adduser -S nodejs -u 1001
WORKDIR /app
# Copy built artifacts from builder
COPY --from=builder --chown=nodejs:nodejs /build/package*.json ./
COPY --from=builder --chown=nodejs:nodejs /build/node_modules ./node_modules
COPY --from=builder --chown=nodejs:nodejs /build/dist ./dist
# Environment
ENV NODE_ENV=production
ENV PORT=3000
# Documentation
EXPOSE 3000
USER nodejs
# Health check
HEALTHCHECK --interval=30s --timeout=3s --start-period=5s --retries=3 \
    CMD node -e "require('http').get('http://localhost:3000/health', (r) => {process.exit(r.statusCode === 200 ? 0 : 1)})"
# Start application
CMD ["node", "dist/server.js"]
- Order layers from least to most frequently changing. Put infrequently changed instructions (FROM, ENV, WORKDIR) first, and frequently changed instructions (COPY source code) last to maximize cache reuse.
- Use specific base image tags, not latest. FROM node:18-alpine is reproducible; FROM node:latest can change unexpectedly.
- Combine RUN commands to reduce the number of layers. Use && and clean up in the same layer.
- Use .dockerignore to exclude unnecessary files (node_modules, .git, .env) from the build context, speeding up builds.
- Run as non-root user for better security. Create a user and switch to it before the CMD instruction.
- Use multi-stage builds to keep final images small. Build tools and intermediate artifacts don't need to be in production images.
- Prefer COPY over ADD unless you need URL download or tar extraction.
- Use exec form for CMD and ENTRYPOINT to ensure proper signal handling.
CMD and ENTRYPOINT together: ENTRYPOINT sets the executable and CMD supplies its default arguments, for example ENTRYPOINT ["nginx"]; CMD ["-g", "daemon off;"].
Build arguments: pass a value for an ARG with --build-arg: docker build --build-arg VERSION=1.0 -t myapp . ARG values are not set as environment variables in the final image.
Shell form vs exec form: shell form runs the command through /bin/sh -c. Exec form is preferred because it handles signals correctly and has no shell processing. Use exec form for CMD and ENTRYPOINT unless you need shell features.
Multiple commands in one RUN: use && to chain commands. For readability, use backslashes for line continuation: RUN apt-get update && apt-get install -y curl && rm -rf /var/lib/apt/lists/*.
Multi-stage builds: copy artifacts from an earlier stage with COPY --from=stage_name. Only the last stage's artifacts are in the final image.
Reducing image size: clean up in the same layer (apt-get clean, rm -rf /tmp/*), and use multi-stage builds to exclude build tools from the final image.
Mastering Dockerfiles is essential for creating reproducible, secure, and efficient container images. Start with a simple Dockerfile and gradually add optimizations as you learn.