Dockerfile Basics

A Dockerfile is a text document that contains all the commands to build a Docker image. This guide covers every essential instruction: FROM, RUN, COPY, ADD, CMD, ENTRYPOINT, ENV, ARG, WORKDIR, EXPOSE, and VOLUME, with practical examples and best practices.

What is a Dockerfile?

A Dockerfile is a text file that contains a series of instructions for building a Docker image. When you run docker build, Docker reads the Dockerfile and executes the instructions in order, producing a reusable image. Only RUN, COPY, and ADD add filesystem layers; other instructions, such as ENV or CMD, change the image's configuration. Dockerfiles are the foundation of reproducible container builds: they allow you to version, share, and automate your infrastructure as code.

Docker caches the result of each instruction. On the next build, if an instruction is unchanged (and, for COPY and ADD, so are the files it copies), Docker reuses the cached result, which makes rebuilds much faster. As soon as one step's cache is invalidated, every step after it is rebuilt as well. This is why ordering instructions from least frequently changed to most frequently changed is a key optimization technique.

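# Example Dockerfile for a Node.js application
FROM node:18-alpine
WORKDIR /app
# Copy only the dependency manifests first, so the install layer stays cached until they change
COPY package*.json ./
RUN npm ci --omit=dev
# Copy the application source last, since it changes most often
COPY . .
EXPOSE 3000
CMD ["node", "server.js"]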
Dockerfile Instructions Reference
Instruction | Purpose | Example
FROM | Sets the base image | FROM node:18-alpine
RUN | Executes commands during the build | RUN apt-get update && apt-get install -y curl
COPY | Copies files from the build context into the image | COPY . /app
ADD | Like COPY, plus URL download and local tar extraction | ADD https://example.com/file.tar.gz /tmp/
CMD | Default command (or default ENTRYPOINT arguments) for the container | CMD ["npm", "start"]
ENTRYPOINT | Executable the container always runs | ENTRYPOINT ["docker-entrypoint.sh"]
WORKDIR | Sets the working directory | WORKDIR /app
ENV | Sets environment variables | ENV NODE_ENV=production
ARG | Defines build-time variables | ARG VERSION=latest
EXPOSE | Documents the ports the container listens on | EXPOSE 8080
VOLUME | Creates a mount point for volumes | VOLUME /data
USER | Sets the user for subsequent RUN, CMD, and ENTRYPOINT | USER node
LABEL | Adds metadata | LABEL version="1.0"
HEALTHCHECK | Checks container health | HEALTHCHECK CMD curl -f http://localhost/ || exit 1
FROM: Setting the Base Image

FROM is the first instruction in almost every Dockerfile. It specifies the base image to build upon. All subsequent instructions run in the context of this base image. You can use any image from Docker Hub or a private registry.

Choose a minimal base image for smaller, more secure images: fewer installed packages means less to download and a smaller attack surface. Alpine variants are popular for their small size (roughly 5 MB versus about 70 MB for Ubuntu). For Node.js, use node:18-alpine; for Python, use python:3.11-slim. Multi-stage builds use multiple FROM statements, and by default the final image is built from the last stage.

# Basic FROM instructions (each line is an alternative base image)
FROM ubuntu:22.04
FROM node:18-alpine
FROM python:3.11-slim
FROM nginx:alpine

# Multi-stage build example
FROM node:18 AS builder
# ... build steps ...
FROM nginx:alpine
COPY --from=builder /app/build /usr/share/nginx/html
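Every FROM starts a new stage, so docker build can also stop at an intermediate stage with --target. This is handy for debugging or CI. The sketch below assumes a Node project whose npm run build script writes static files to /app/build. The stage name, paths, and image tags are illustrative.

```dockerfile
# Stage "builder": full Node image with build tooling
FROM node:18 AS builder
WORKDIR /app
COPY package*.json ./
RUN npm ci
COPY . .
# Assumes package.json defines a "build" script that outputs to /app/build
RUN npm run build

# Final stage: only the static output is copied into nginx
FROM nginx:alpine
COPY --from=builder /app/build /usr/share/nginx/html

# docker build -t myapp .                          -> builds the final (last) stage
# docker build --target builder -t myapp:build .   -> stops after the "builder" stage
```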
RUN: Executing Commands During Build

RUN executes commands in a new layer on top of the current image and commits the results. It's used to install packages, create directories, or perform any setup needed for your application. Each RUN instruction creates a new layer, which affects image size.

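Best practice: combine multiple commands into a single RUN using && to reduce the number of layers. Also clean up temporary files in the same RUN command that created them. Files deleted by a later RUN still take up space in the earlier layer. For apt-get, always combine apt-get update and apt-get install in the same RUN. Otherwise a cached apt-get update layer can be reused with a stale package index, and the install may fail or fetch outdated versions.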

# Bad: multiple layers, no cleanup
RUN apt-get update
RUN apt-get install -y curl
RUN apt-get install -y git
RUN rm -rf /var/lib/apt/lists/*

# Good: single layer with cleanup
RUN apt-get update && \
    apt-get install -y curl git && \
    rm -rf /var/lib/apt/lists/*

# Installing npm packages (production dependencies only)
RUN npm ci --omit=dev

# Creating a directory
RUN mkdir -p /app/data
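Chaining with && matters for correctness, not just layer count. In a RUN, /bin/sh -c reports the exit status of the last command only. With ; a failed install could therefore go unnoticed. With && each command runs only if the previous one succeeded, and any failure stops the build. A sketch on a Debian base (the packages are just examples):

```dockerfile
FROM debian:bookworm-slim

# Risky: if apt-get install fails, the rm still succeeds and the build continues
# RUN apt-get update; apt-get install -y curl; rm -rf /var/lib/apt/lists/*

# Safer: any failing command fails the whole RUN (and the build);
# --no-install-recommends keeps optional packages out of the layer
RUN apt-get update && \
    apt-get install -y --no-install-recommends ca-certificates curl && \
    rm -rf /var/lib/apt/lists/*
```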
COPY vs ADD: Copying Files

COPY copies files and directories from the build context (the directory where you run docker build) into the image. It's straightforward and predictable—it just copies files.

ADD does everything COPY does, plus two additional features. It can download files from URLs, and it automatically extracts local tar archives (uncompressed, gzip, bzip2, or xz). Archives downloaded from a URL are not extracted. Because of these extra behaviors, ADD can be unpredictable. Docker's official best practices recommend using COPY unless you specifically need ADD's features.

# COPY: simple file copying
COPY package.json /app/
COPY . /app
COPY --chown=node:node . /app

# ADD: advanced features
# Downloads from a URL (a remote archive is NOT extracted)
ADD https://example.com/file.tar.gz /tmp/
# Auto-extracts a local tar archive
ADD app.tar.gz /app/
ADD --chown=node:node . /app

# Best practice: use COPY for local files
COPY package*.json ./
COPY src/ ./src/

# Use ADD only when you need URL download or extraction
ADD --chmod=755 https://example.com/script.sh /usr/local/bin/
Unless you need automatic tar extraction or URL download, use COPY. It's more transparent and less prone to unexpected behavior.
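When you do need a remote archive unpacked, remember that ADD downloads URLs but only extracts local archives. A common alternative is to download and extract in a single RUN, so the archive itself never lands in a layer. The sketch below uses Alpine and assumes app.tar.gz exists in the build context. The example.com URL and the /opt paths are placeholders.

```dockerfile
FROM alpine:3.19

# Local tar archive: ADD extracts it into /opt/app automatically
ADD app.tar.gz /opt/app/

# Remote archive: download and extract in one RUN, then remove the temporary
# curl install in the same layer so neither the archive nor curl stays in the image
RUN apk add --no-cache --virtual .fetch-deps curl && \
    mkdir -p /opt/vendor && \
    curl -fsSL https://example.com/file.tar.gz | tar -xz -C /opt/vendor && \
    apk del .fetch-deps
```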
CMD and ENTRYPOINT: Defining Container Behavior

CMD provides defaults for an executing container. Any arguments passed after the image name in docker run override it. Only one CMD takes effect: if a Dockerfile contains several, only the last one is used.

ENTRYPOINT defines the executable that runs when the container starts. It's harder to override, since that requires the --entrypoint flag. ENTRYPOINT is often combined with CMD: ENTRYPOINT fixes the executable, and CMD supplies default arguments that users can replace at docker run time. Together, they give the image a predictable interface that can still be customized.

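# Separate examples; a Dockerfile uses only its last CMD and last ENTRYPOINT

# ENTRYPOINT + CMD pattern
ENTRYPOINT ["npm"]
CMD ["start"]
# docker run myapp          -> runs: npm start
# docker run myapp install  -> runs: npm install

# Shell form vs exec form
# Exec form (preferred: no shell processing)
CMD ["node", "app.js"]
ENTRYPOINT ["docker-entrypoint.sh"]
# Shell form (runs via /bin/sh -c)
CMD node app.js

# Common pattern: fixed executable, overridable default arguments
ENTRYPOINT ["python", "app.py"]
CMD ["--help"]
# docker run myapp          -> runs: python app.py --help
# docker run myapp --debug  -> runs: python app.py --debug (arguments replace CMD)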
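Use the exec form (JSON array syntax) for CMD and ENTRYPOINT. With the shell form, /bin/sh -c typically runs as PID 1 and does not forward signals such as SIGTERM to your application. As a result, docker stop waits for its timeout (10 seconds by default) and then kills the container with SIGKILL.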
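Here are three ways to start the same process. This is a minimal sketch assuming a server.js entry point. Only the last CMD would take effect, so the alternatives are commented out. An application running as PID 1 still has to handle SIGTERM itself, or be started with docker run --init. This is because signals that PID 1 has no handler for are ignored.

```dockerfile
FROM node:18-alpine
WORKDIR /app
COPY . .

# Shell form: runs /bin/sh -c "node server.js"; the shell may be PID 1
# and SIGTERM from docker stop may never reach node
# CMD node server.js

# Shell form with exec: the shell replaces itself with node
# CMD exec node server.js

# Exec form (preferred): node is started directly as PID 1
CMD ["node", "server.js"]
```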
WORKDIR, ENV, ARG: Environment & Directories

WORKDIR sets the working directory for any RUN, CMD, ENTRYPOINT, COPY, and ADD instructions that follow. If the directory doesn't exist, it's created automatically. Prefer WORKDIR over RUN cd. A cd only affects the RUN instruction it appears in, whereas WORKDIR persists for all following instructions, and relative paths resolve against it.
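This minimal sketch on an Alpine base shows why RUN cd doesn't carry over (the file names are placeholders). Each RUN starts in the current working directory, which is / when neither the base image nor the Dockerfile has set one.

```dockerfile
FROM alpine:3.19

# cd only changes the directory inside this one RUN's shell
RUN mkdir -p /app && cd /app && touch created-in-app

# The next RUN starts in the working directory again (/ here), so this file lands in /
RUN touch created-in-root

# WORKDIR applies to every following instruction
WORKDIR /app
RUN touch also-in-app

# Lists created-in-app and also-in-app
RUN ls /app
```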

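ENV sets environment variables that are available to later build instructions and persist in the final image, so the running container sees them too. They can be overridden at run time with docker run -e. Use ENV for configuration that shouldn't change between builds, such as application paths or default ports.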

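ARG defines build-time variables that only exist during the build; they are not set as environment variables in the running container. ARG is useful for version numbers, cache busting, or conditional builds. Don't use ARG for secrets, because build argument values can show up in docker history.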

# WORKDIR examples
WORKDIR /app
# Copies into /app
COPY . .
WORKDIR /app/src
# Runs in /app/src
RUN make

# ENV examples
ENV NODE_ENV=production
ENV PORT=3000
ENV PATH="/app/bin:${PATH}"

# ARG examples
ARG VERSION=1.0.0
ARG DEBIAN_FRONTEND=noninteractive
RUN echo "Building version ${VERSION}"
# Override at build time:
# docker build --build-arg VERSION=2.0.0 -t myapp .
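ARG scoping has two details that often surprise people. First, an ARG declared before the first FROM is only in scope for FROM lines. Second, an ARG's value is gone once the build ends unless you copy it into ENV. A sketch assuming a Node base image; APP_VERSION is a hypothetical variable name.

```dockerfile
# An ARG declared before the first FROM can be used in FROM lines
ARG NODE_VERSION=18
FROM node:${NODE_VERSION}-alpine

# After FROM it is out of scope; redeclaring it without a value reuses the earlier default
ARG NODE_VERSION
ARG APP_VERSION=1.0.0

# Copy the build-time value into ENV so the running container can see it
ENV APP_VERSION=${APP_VERSION}

RUN echo "Node ${NODE_VERSION}, app ${APP_VERSION}"

# docker build --build-arg APP_VERSION=2.0.0 -t myapp .
# docker run --rm myapp printenv APP_VERSION
```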
EXPOSE and VOLUME: Documentation & Data Persistence

EXPOSE informs Docker that the container listens on the specified network ports at runtime. It's documentation—it doesn't actually publish the port. You still need -p or -P to make ports accessible. Use EXPOSE to communicate to users which ports your application uses.

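VOLUME creates a mount point for external storage. If nothing is mounted there when the container starts, Docker creates an anonymous volume. Data written to a volume persists after the container is deleted, unless the volume is removed with docker rm -v or the container was started with --rm. Use VOLUME for database storage, logs, or any data that should survive container removal.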

# EXPOSE examples (documentation only)
EXPOSE 80
EXPOSE 8080/tcp
EXPOSE 53/udp

# VOLUME examples
VOLUME /data
VOLUME ["/var/log", "/var/db"]

# Complete example
FROM postgres:15
EXPOSE 5432
VOLUME /var/lib/postgresql/data
ENV POSTGRES_DB=mydb
CMD ["postgres"]
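EXPOSE and VOLUME only describe the image; the actual port mapping and volume choice happen at docker run. The sketch below reuses the Postgres example. The image tag myapp-db, the volume name pgdata, and the password are placeholders. The official postgres image won't initialize a new database without POSTGRES_PASSWORD (or an explicit trust setting), so it is passed at run time here.

```dockerfile
FROM postgres:15
EXPOSE 5432
VOLUME /var/lib/postgresql/data
ENV POSTGRES_DB=mydb

# EXPOSE alone publishes nothing; map the port when starting the container:
#   docker run -d -p 5432:5432 -e POSTGRES_PASSWORD=example myapp-db
# -P instead publishes every EXPOSEd port to a random host port.
# Without -v, Docker creates an anonymous volume for /var/lib/postgresql/data;
# a named volume is easier to find and reuse after the container is removed:
#   docker run -d -p 5432:5432 -e POSTGRES_PASSWORD=example \
#     -v pgdata:/var/lib/postgresql/data myapp-db
```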
Complete Dockerfile Example: Node.js Application
# Stage 1: Build
FROM node:18-alpine AS builder
WORKDIR /build

# Copy package files
COPY package*.json ./
RUN npm ci

# Copy source and build
COPY . .
RUN npm run build
# Drop devDependencies so only production packages are copied to the final stage
RUN npm prune --omit=dev

# Stage 2: Production
FROM node:18-alpine

# Create non-root user (in the nodejs group)
RUN addgroup -g 1001 -S nodejs && \
    adduser -S -u 1001 -G nodejs nodejs

WORKDIR /app

# Copy built artifacts from builder
COPY --from=builder --chown=nodejs:nodejs /build/package*.json ./
COPY --from=builder --chown=nodejs:nodejs /build/node_modules ./node_modules
COPY --from=builder --chown=nodejs:nodejs /build/dist ./dist

# Environment
ENV NODE_ENV=production
ENV PORT=3000

# Documentation
EXPOSE 3000
USER nodejs

# Health check
HEALTHCHECK --interval=30s --timeout=3s --start-period=5s --retries=3 \
  CMD node -e "require('http').get('http://localhost:3000/health', (r) => {process.exit(r.statusCode === 200 ? 0 : 1)})"

# Start application
CMD ["node", "dist/server.js"]
This example demonstrates multi-stage builds (smaller final image), non-root users (better security), health checks, and proper layer ordering for cache optimization.
Dockerfile Best Practices
  • Order layers from least to most frequently changing. Put infrequently changed instructions (FROM, ENV, WORKDIR) first, and frequently changed instructions (COPY source code) last to maximize cache reuse.
  • Use specific base image tags, not latest. FROM node:18-alpine is reproducible; FROM node:latest can change unexpectedly.
  • Combine RUN commands to reduce the number of layers. Use && and clean up in the same layer.
  • Use .dockerignore to exclude unnecessary files (node_modules, .git, .env) from the build context, speeding up builds.
  • Run as non-root user for better security. Create a user and switch to it before the CMD instruction.
  • Use multi-stage builds to keep final images small. Build tools and intermediate artifacts don't need to be in production images.
  • Prefer COPY over ADD unless you need URL download or tar extraction.
  • Use exec form for CMD and ENTRYPOINT so that signals such as SIGTERM reach your application directly.
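Several of these practices fit in a few lines. The official Node.js images already ship an unprivileged user called node, so simple apps may not need to create their own. This is a sketch assuming a server.js entry point and an npm lockfile (npm ci requires one).

```dockerfile
# Specific tag, not latest
FROM node:18-alpine
WORKDIR /app

# Dependencies first: this layer stays cached until package*.json changes
COPY package*.json ./
RUN npm ci --omit=dev

# Source last, since it changes most often
COPY . .

# Drop root privileges for runtime; the RUN steps above still ran as root
USER node

# Exec form so signals reach node directly
CMD ["node", "server.js"]
```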
Frequently Asked Questions
What's the difference between CMD and ENTRYPOINT?
CMD sets a default command that can be overridden; ENTRYPOINT sets a command that is harder to override. They're often used together: ENTRYPOINT for the executable, CMD for default arguments. Example: ENTRYPOINT ["nginx"]; CMD ["-g", "daemon off;"].
Why does Docker reinstall packages even when my package.json hasn't changed?
Because a previous instruction (like COPY .) invalidates the cache. Order matters: copy package.json FIRST, then run npm install, THEN copy the rest of the code. This way, npm install only runs when package.json changes.
What is a .dockerignore file?
Similar to .gitignore, .dockerignore excludes files from the build context. Use it to exclude node_modules, .git, logs, temporary files, and secrets from being sent to the Docker daemon during build.
How do I pass build-time variables to Dockerfile?
Use ARG in the Dockerfile and pass values with --build-arg: docker build --build-arg VERSION=1.0 -t myapp . ARG values are not available as environment variables in the running container.
What's the difference between shell form and exec form?
Exec form (JSON array) runs the command directly without a shell. Shell form uses /bin/sh -c. Exec form is preferred because it handles signals correctly and has no shell processing. Use exec form for CMD and ENTRYPOINT unless you need shell features.
How do I run multiple commands in a single RUN instruction?
Use && to chain commands, and backslashes for line continuation to keep long instructions readable:
RUN apt-get update && \
    apt-get install -y curl && \
    rm -rf /var/lib/apt/lists/*
Can I have multiple FROM statements in one Dockerfile?
Yes! This is called multi-stage builds. Each FROM begins a new stage. You can copy artifacts from previous stages using COPY --from=stage_name. Only the last stage's artifacts are in the final image.
Why is my image so large even with alpine base?
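Each RUN instruction creates a layer that persists, so files deleted by a later instruction still count toward the image size. Combine commands, clean up temporary files (apt-get clean, rm -rf /tmp/*) in the same RUN that created them, and use multi-stage builds to exclude build tools from the final image.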

Mastering Dockerfiles is essential for creating reproducible, secure, and efficient container images. Start with a simple Dockerfile and gradually add optimizations as you learn.