Dockerfile Basics

Updated May 2026 · 14 min read · Beginner to Intermediate

A Dockerfile is a text document that contains all the commands to build a Docker image. This guide covers every essential instruction: FROM, RUN, COPY, ADD, CMD, ENTRYPOINT, ENV, ARG, WORKDIR, EXPOSE, and VOLUME, with practical examples and best practices.

What is a Dockerfile?

A Dockerfile is a text file containing a series of instructions for building a Docker image. Instructions that change the filesystem (RUN, COPY, ADD) each add a layer to the image; the others record metadata. When you run docker build, Docker reads the Dockerfile and executes the instructions in order, producing a reusable image. Dockerfiles are the foundation of reproducible container builds: they let you version, share, and automate your image definitions as code.

Dockerfile instructions are executed in order, from top to bottom, and the result of each step is cached. If an instruction and its inputs haven't changed, Docker reuses the cached result, making subsequent builds much faster. Once one step changes, that step and every step after it is rebuilt. This is why ordering instructions from least frequently changed to most frequently changed is a key optimization technique.

# Example Dockerfile for a Node.js application
FROM node:18-alpine

WORKDIR /app

COPY package*.json ./
RUN npm ci --omit=dev

COPY . .

EXPOSE 3000

CMD ["node", "server.js"]

Dockerfile Instructions Reference

Instruction	Purpose	Example
`FROM`	Sets the base image	`FROM node:18-alpine`
`RUN`	Executes commands during build	`RUN apt-get update && apt-get install -y curl`
`COPY`	Copies files from host to image	`COPY . /app`
`ADD`	Copies files; also accepts URL sources and extracts local tar archives	`ADD https://example.com/file.tar.gz /tmp/`
`CMD`	Default command for container	`CMD ["npm", "start"]`
`ENTRYPOINT`	Sets the executable that runs when the container starts	`ENTRYPOINT ["docker-entrypoint.sh"]`
`WORKDIR`	Sets working directory	`WORKDIR /app`
`ENV`	Sets environment variables	`ENV NODE_ENV=production`
`ARG`	Build-time variables	`ARG VERSION=latest`
`EXPOSE`	Documents container ports	`EXPOSE 8080`
`VOLUME`	Creates mount point for volumes	`VOLUME /data`
`USER`	Sets user for RUN, CMD, ENTRYPOINT	`USER node`
`LABEL`	Adds metadata	`LABEL version="1.0"`
`HEALTHCHECK`	Checks container health	`HEALTHCHECK CMD curl -f http://localhost/ || exit 1`

FROM: Setting the Base Image

FROM is the first instruction in almost every Dockerfile; only ARG, comments, and parser directives may come before it. It specifies the base image to build upon. All subsequent instructions run in the context of this base image. You can use any image from Docker Hub or a private registry.

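Choose a minimal base image for smaller images with less attack surface. Alpine variants are popular for their small size (roughly 5 MB, versus about 70 MB for Ubuntu). For Node.js, use node:18-alpine; for Python, use python:3.11-slim. Alpine uses musl instead of glibc, so software that ships prebuilt glibc binaries may need a slim Debian-based tag instead. Multi-stage builds use multiple FROM statements; by default, the final image is built from the last FROM.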

# Basic FROM instructions (alternatives; a single-stage Dockerfile uses only one)
FROM ubuntu:22.04
FROM node:18-alpine
FROM python:3.11-slim
FROM nginx:alpine

# Multi-stage build example
FROM node:18 AS builder
# ... build steps ...

FROM nginx:alpine
COPY --from=builder /app/build /usr/share/nginx/html
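
The base image tag can also be parameterized. The sketch below assumes a build argument named NODE_VERSION (a name chosen for illustration, not taken from the examples above). An ARG declared before FROM is usable only in FROM lines, so it is redeclared inside the stage to read it there.

```dockerfile
# ARG is the only instruction allowed before FROM.
# NODE_VERSION is a hypothetical name used for illustration.
ARG NODE_VERSION=18
FROM node:${NODE_VERSION}-alpine

# An ARG declared before FROM is out of scope inside the stage.
# Redeclaring it without a value brings the default (or --build-arg value) back.
ARG NODE_VERSION
RUN echo "Base image: node:${NODE_VERSION}-alpine"
```

Building with docker build --build-arg NODE_VERSION=20 . would select node:20-alpine instead, assuming that tag is available in your registry.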
            

RUN: Executing Commands During Build

RUN executes commands in a new layer on top of the current image and commits the results. It's used to install packages, create directories, or perform any setup needed for your application. Each RUN instruction creates a new layer, which affects image size.

Best practice: combine multiple commands into a single RUN using && to reduce the number of layers. Also, clean up temporary files in the same RUN command to avoid persisting them. For apt-get, always combine apt-get update and apt-get install in the same RUN to avoid cache issues.

# Bad: multiple layers, and the cleanup in the last RUN cannot shrink the earlier layers
RUN apt-get update
RUN apt-get install -y curl
RUN apt-get install -y git
RUN rm -rf /var/lib/apt/lists/*

# Good: Single layer with cleanup
RUN apt-get update && \
    apt-get install -y curl git && \
    rm -rf /var/lib/apt/lists/*

# Installing npm packages
RUN npm ci --omit=dev

# Creating a directory
RUN mkdir -p /app/data
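
The same single-layer rule applies to other package managers. As a minimal sketch for an Alpine base (used by several examples in this guide), apk's --no-cache flag avoids storing a package index in the first place, so no separate cleanup step is needed. The alpine:3.19 tag is an assumption; use whichever release you build on.

```dockerfile
FROM alpine:3.19

# --no-cache fetches the package index on the fly and does not store it,
# so there is nothing to delete afterwards
RUN apk add --no-cache curl git
```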
            

COPY vs ADD: Copying Files

COPY copies files and directories from the build context (the directory you pass to docker build, usually .) into the image. It's straightforward and predictable: it just copies files.

ADD does everything COPY does, plus two additional features: it can take a URL as the source (and downloads the file), and it automatically extracts local tar archives (including gzip, bzip2, and xz compressed ones). The two features do not combine: an archive fetched from a URL is saved as-is, not extracted. Because of these additional behaviors, ADD can be unpredictable. The official Docker best practices recommend using COPY unless you specifically need ADD's features.

# COPY - simple file copying
COPY package.json /app/
COPY . /app
COPY --chown=node:node . /app

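# ADD - advanced features
# (comments must sit on their own line; a trailing # is parsed as part of the instruction)
# Downloads from a URL (the remote archive is not extracted)
ADD https://example.com/file.tar.gz /tmp/
# Auto-extracts a local tar archive
ADD app.tar.gz /app/
ADD --chown=node:node . /app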

# Best practice: Use COPY for local files
COPY package*.json ./
COPY src/ ./src/

# Use ADD only when you need URL or extraction
ADD --chmod=755 https://example.com/script.sh /usr/local/bin/
            

Unless you need automatic tar extraction or URL download, use COPY. It's more transparent and less prone to unexpected behavior.
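
When an archive lives at a URL, ADD downloads it but leaves it packed. One common alternative, sketched here under the assumption of a Debian-based image and the placeholder URL from the example above, is to download and unpack in a single RUN so the archive itself never lands in a layer. The /opt/tool path is hypothetical.

```dockerfile
FROM debian:bookworm-slim

# curl and CA certificates are not part of the slim image
RUN apt-get update && \
    apt-get install -y --no-install-recommends ca-certificates curl && \
    rm -rf /var/lib/apt/lists/*

# Stream the archive straight into tar; only the extracted files are written.
# https://example.com/file.tar.gz is a placeholder, /opt/tool a hypothetical target.
RUN mkdir -p /opt/tool && \
    curl -fsSL https://example.com/file.tar.gz | tar -xz -C /opt/tool
```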

CMD and ENTRYPOINT: Defining Container Behavior

CMD provides the default command, or default arguments, for a container. Anything you pass to docker run after the image name replaces it. Only one CMD takes effect: if a Dockerfile contains several, the last one wins.

ENTRYPOINT defines the executable that runs when the container starts. Overriding it requires the --entrypoint flag of docker run. ENTRYPOINT is often paired with CMD: ENTRYPOINT fixes the executable, and CMD supplies default arguments that are appended to it and that users can replace at run time.

# ENTRYPOINT + CMD pattern
ENTRYPOINT ["npm"]
CMD ["start"]
# Override: docker run myapp install (runs npm install)

# Shell form vs Exec form
# Exec form (preferred - no shell processing)
CMD ["node", "app.js"]
ENTRYPOINT ["docker-entrypoint.sh"]

# Shell form (uses /bin/sh -c)
CMD node app.js

# Common patterns
# docker run myapp          runs: python app.py --help
# docker run myapp <args>   runs: python app.py <args> (the arguments replace CMD)
ENTRYPOINT ["python", "app.py"]
CMD ["--help"]
            

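Use the exec form (JSON array syntax) for CMD and ENTRYPOINT. The shell form wraps the command in /bin/sh -c, so the shell runs as PID 1 and typically does not forward signals such as SIGTERM to your application. docker stop then waits for its timeout and kills the container instead of letting it shut down gracefully.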
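
Sometimes the shell form is wanted anyway, for example to expand an environment variable in the command line. A minimal sketch of one way to keep signal delivery intact in that case is to prefix the command with exec, so the shell replaces itself with the application process. The --port flag is a hypothetical option of server.js, used only for illustration.

```dockerfile
FROM node:18-alpine
WORKDIR /app
COPY . .
ENV PORT=3000

# Shell form: /bin/sh -c expands ${PORT}, then exec replaces the shell with node.
# node therefore runs as PID 1 and receives SIGTERM from docker stop itself,
# provided the application installs a handler for it.
# (--port is a hypothetical flag of server.js)
CMD exec node server.js --port "${PORT}"
```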

WORKDIR, ENV, ARG: Environment & Directories

WORKDIR sets the working directory for any RUN, CMD, ENTRYPOINT, COPY, and ADD instructions that follow. If the directory doesn't exist, it's created automatically. Prefer WORKDIR over RUN cd: a cd inside RUN affects only that one instruction, while WORKDIR persists for the instructions that follow and lets you use relative paths.

ENV sets environment variables that are available for the rest of the build and persist in containers started from the image. Use ENV for defaults such as application paths or the default port. They can still be overridden at run time with docker run -e.

ARG defines build-time variables that only exist during the build. They're not set in the running container. ARG is useful for version numbers, cache busting, or conditional builds. Don't pass secrets through ARG: the values can show up in the image history (docker history).

# WORKDIR examples
WORKDIR /app
# Copies the build context into /app
COPY . .

WORKDIR /app/src
# Runs in /app/src
RUN make

# ENV examples
ENV NODE_ENV=production
ENV PORT=3000
ENV PATH="/app/bin:${PATH}"

# ARG examples
ARG VERSION=1.0.0
ARG DEBIAN_FRONTEND=noninteractive
RUN echo "Building version ${VERSION}"

# Build with build-arg
# docker build --build-arg VERSION=2.0.0 -t myapp .
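
Because ARG values are gone once the build finishes, a value that is also needed at run time has to be copied into an ENV. A small sketch of that hand-off, assuming an Alpine base and a hypothetical variable name APP_VERSION:

```dockerfile
FROM alpine:3.19

# Build-time only: visible to the instructions below, not to the running container
ARG VERSION=1.0.0

# Copy the build argument into an environment variable so it persists at run time
# (APP_VERSION is a hypothetical name)
ENV APP_VERSION=${VERSION}

RUN echo "Building version ${VERSION}"

# Exec form with an explicit shell: sh expands $APP_VERSION when the container starts
CMD ["sh", "-c", "echo Running version $APP_VERSION"]
```

Built with --build-arg VERSION=2.0.0, a container from this image should print Running version 2.0.0, while VERSION itself is unset inside the container.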
            

EXPOSE and VOLUME: Documentation & Data Persistence

EXPOSE informs Docker that the container listens on the specified network ports at runtime. It's documentation only and doesn't publish the port. To reach the port from the host, you still need -p (publish specific ports) or -P (publish all exposed ports on random host ports). Use EXPOSE to communicate to users which ports your application uses.

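VOLUME declares a mount point whose contents live outside the container's writable layer. If nothing is mounted there at run time, Docker creates an anonymous volume for it. Data written to a volume survives removal of the container unless the volume is removed too (for example by docker rm -v or docker run --rm, which also delete anonymous volumes). Use VOLUME for database storage, logs, or any data that should survive container removal, and mount a named volume when you want to find and reuse that data later.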

# EXPOSE examples (documentation only)
EXPOSE 80
EXPOSE 8080/tcp
EXPOSE 53/udp

# VOLUME examples
VOLUME /data
VOLUME ["/var/log", "/var/db"]

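# Complete example (the official postgres image already declares this port,
# volume, and command; they are repeated here for illustration)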
FROM postgres:15
EXPOSE 5432
VOLUME /var/lib/postgresql/data
ENV POSTGRES_DB=mydb
CMD ["postgres"]
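
Since neither instruction publishes a port or names a volume by itself, both happen at docker run time. The sketch below pairs a small Dockerfile with the commands that would use it. The image name mysite and the volume name site-data are made up for illustration.

```dockerfile
FROM nginx:alpine

# Documents the port nginx listens on; does not publish it
EXPOSE 80

# Marks the web root as a mount point
VOLUME /usr/share/nginx/html

# Build, then publish container port 80 on host port 8080 and mount a
# named volume so the content outlives the container:
# docker build -t mysite .
# docker run -d -p 8080:80 -v site-data:/usr/share/nginx/html mysite
```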
            

Complete Dockerfile Example: Node.js Application

# Stage 1: Build
FROM node:18-alpine AS builder

WORKDIR /build

# Copy package files
COPY package*.json ./
RUN npm ci

# Copy source and build, then drop devDependencies so they are not copied into the final stage
COPY . .
RUN npm run build && npm prune --omit=dev

# Stage 2: Production
FROM node:18-alpine

# Create non-root user and put it in the new group explicitly
RUN addgroup -g 1001 -S nodejs && \
    adduser -S -u 1001 -G nodejs nodejs

WORKDIR /app

# Copy built artifacts from builder
COPY --from=builder --chown=nodejs:nodejs /build/package*.json ./
COPY --from=builder --chown=nodejs:nodejs /build/node_modules ./node_modules
COPY --from=builder --chown=nodejs:nodejs /build/dist ./dist

# Environment
ENV NODE_ENV=production
ENV PORT=3000

# Documentation
EXPOSE 3000
USER nodejs

# Health check
HEALTHCHECK --interval=30s --timeout=3s --start-period=5s --retries=3 \
  CMD node -e "require('http').get('http://localhost:3000/health', (r) => {process.exit(r.statusCode === 200 ? 0 : 1)})"

# Start application
CMD ["node", "dist/server.js"]
            

This example demonstrates multi-stage builds (smaller final image), non-root users (better security), health checks, and proper layer ordering for cache optimization.

Dockerfile Best Practices

Order layers from least to most frequently changing. Put infrequently changed instructions (FROM, ENV, WORKDIR) first, and frequently changed instructions (COPY source code) last to maximize cache reuse.
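Use specific base image tags, not latest. FROM node:18-alpine stays on one major version; FROM node:latest can jump to a new major version unexpectedly. Tags can still be updated in place, so pin the image digest as well when you need fully reproducible builds.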
Combine RUN commands to reduce the number of layers. Use && and clean up in the same layer.
Use .dockerignore to exclude unnecessary files (node_modules, .git, .env) from the build context, speeding up builds.
Run as non-root user for better security. Create a user and switch to it before the CMD instruction.
Use multi-stage builds to keep final images small. Build tools and intermediate artifacts don't need to be in production images.
Prefer COPY over ADD unless you need URL download or tar extraction.
Use exec form for CMD and ENTRYPOINT to ensure proper signal handling.
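
Several of these practices fit into one short Dockerfile. The following is a sketch rather than a drop-in file: it assumes a Node.js project with a package-lock.json and a server.js entry point, as in the first example, and relies on the node user that the official Node.js images ship with.

```dockerfile
# Pinned tag instead of latest
FROM node:18-alpine

WORKDIR /app

# Dependency manifests first, so this layer is reused until they change
COPY --chown=node:node package*.json ./
RUN npm ci --omit=dev

# Source code last, because it changes most often
COPY --chown=node:node . .

# Drop root privileges before the container command runs
USER node

# Exec form, so node receives signals directly
CMD ["node", "server.js"]
```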

Frequently Asked Questions

What's the difference between CMD and ENTRYPOINT?

CMD sets a default command that can be overridden; ENTRYPOINT sets a command that is harder to override. They're often used together: ENTRYPOINT for the executable, CMD for default arguments. Example: ENTRYPOINT ["nginx"]; CMD ["-g", "daemon off;"].

Why does Docker reinstall packages even when my package.json hasn't changed?

Because a previous instruction (like COPY .) invalidates the cache. Order matters: copy package.json FIRST, then run npm install, THEN copy the rest of the code. This way, npm install only runs when package.json changes.

What is a .dockerignore file?

Similar to .gitignore, .dockerignore excludes files from the build context. Use it to exclude node_modules, .git, logs, temporary files, and secrets from being sent to the Docker daemon during build.

How do I pass build-time variables to Dockerfile?

Use ARG in the Dockerfile and pass values with --build-arg: docker build --build-arg VERSION=1.0 -t myapp . ARG values are not set as environment variables in the running container, but they can appear in the image history, so don't use them for secrets.

What's the difference between shell form and exec form?

Exec form (JSON array) runs the command directly without a shell, so the process receives signals such as SIGTERM itself, but shell features like variable expansion and pipes are not available. Shell form wraps the command in /bin/sh -c. Use exec form for CMD and ENTRYPOINT unless you need shell features.

How do I run multiple commands in a single RUN instruction?

Use && to chain commands: RUN apt-get update && apt-get install -y curl && rm -rf /var/lib/apt/lists/*. For readability, split a long chain across several lines by ending each line with a backslash.

Can I have multiple FROM statements in one Dockerfile?

Yes! This is called a multi-stage build. Each FROM begins a new stage. You can copy artifacts from earlier stages using COPY --from=stage_name. By default, only the contents of the last stage end up in the final image.

Why is my image so large even with alpine base?

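Each RUN instruction creates a layer that persists, so files deleted by a later instruction still take up space in the earlier layer. Combine commands, clean up in the same RUN that created the files (apk add --no-cache on Alpine, rm -rf /var/lib/apt/lists/* on Debian or Ubuntu, rm -rf /tmp/*), and use multi-stage builds to exclude build tools from the final image.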
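
The effect is easy to reproduce. In this sketch (alpine:3.19, the file name big.bin, and the size-demo image name are arbitrary choices), the first stage deletes a 50 MB file in a later layer and the second deletes it in the same layer that created it.

```dockerfile
# Variant A: the file is created in one layer and removed in the next.
# The first layer still contains the 50 MB, so the image stays large.
FROM alpine:3.19 AS two-layers
RUN dd if=/dev/zero of=/tmp/big.bin bs=1M count=50
RUN rm /tmp/big.bin

# Variant B: created and removed within one RUN, so no layer ever stores the file.
FROM alpine:3.19 AS one-layer
RUN dd if=/dev/zero of=/tmp/big.bin bs=1M count=50 && \
    rm /tmp/big.bin

# Build each variant and compare the per-layer sizes:
# docker build --target two-layers -t size-demo:a .
# docker build --target one-layer  -t size-demo:b .
# docker history size-demo:a
# docker history size-demo:b
```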

Mastering Dockerfiles is essential for creating reproducible, secure, and efficient container images. Start with a simple Dockerfile and gradually add optimizations as you learn.