Linux Namespaces & Cgroups: Complete Guide to Container Isolation

Master Linux container isolation technologies with namespaces and cgroups. Understand the fundamental building blocks of containers, from process isolation to resource limits, with practical examples and deep technical insights.

[Figure: Complete container isolation architecture showing namespaces and cgroups working together, with seccomp filters, capabilities, and AppArmor/SELinux profiles as additional defense-in-depth layers]

Why Linux Namespaces & Cgroups?

Namespaces and cgroups are the fundamental Linux kernel technologies that enable modern containerization.

  • Isolation: Namespaces provide process isolation (PID, network, mount, etc.)
  • Resource Control: Cgroups limit and monitor resource usage (CPU, memory, I/O)
  • Security: Combined with capabilities and seccomp for defense in depth
  • Efficiency: Lightweight compared to full virtualization
  • Portability: Containers run consistently across different environments
  • Density: Run multiple isolated environments on a single host
  • Orchestration: Foundation for Kubernetes, Docker, and container orchestration

1. Linux Namespaces Deep Dive

  • PID Namespace (unshare --pid --fork): Process ID isolation. Each namespace has its own PID 1.
  • Network Namespace (ip netns add mynet): Network stack isolation. Private interfaces, routing, iptables.
  • Mount Namespace (unshare --mount --propagation private): Filesystem mount isolation. Private mount points and rootfs.
  • IPC Namespace (unshare --ipc): Inter-process communication isolation (System V IPC, POSIX message queues).
  • UTS Namespace (unshare --uts): Hostname and NIS domain name isolation.
  • User Namespace (unshare --user --map-root-user): User and group ID isolation with mapping. Enables rootless containers.

Namespace Types Comparison

Namespace | Flag | Isolates | Introduced | Key Use Case
PID | CLONE_NEWPID | Process IDs | Linux 2.6.24 | Container process trees
Network | CLONE_NEWNET | Network devices, ports, routing | Linux 2.6.24 | Container networking
Mount | CLONE_NEWNS | Mount points, filesystems | Linux 2.4.19 | Container root filesystem
IPC | CLONE_NEWIPC | System V IPC, POSIX queues | Linux 2.6.19 | Inter-process communication
UTS | CLONE_NEWUTS | Hostname and NIS domain | Linux 2.6.19 | Container hostname
User | CLONE_NEWUSER | User and group IDs | Linux 3.8 | Rootless containers, security
Cgroup | CLONE_NEWCGROUP | Cgroup root directory | Linux 4.6 | Cgroup namespace isolation
Time | CLONE_NEWTIME | Boot and monotonic clocks | Linux 5.6 | Container time offsets
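
To see which of these namespace types a running kernel actually exposes, one option is to list the entries under /proc/self/ns. The sketch below assumes a Linux system with /proc mounted; list_ns_types is a hypothetical helper name, not a standard tool.

```shell
# List the namespace types exposed for a process.
# Each entry in /proc/<pid>/ns is a symlink such as "pid -> pid:[4026531836]".
list_ns_types() {
    ns_dir="${1:-/proc/self/ns}"
    for entry in "$ns_dir"/*; do
        # Skip the unexpanded glob when the directory is missing or empty
        [ -e "$entry" ] || [ -L "$entry" ] || continue
        basename "$entry"
    done
}

list_ns_types
```

On kernels older than 5.6 the time entries would be absent from the output, in line with the "Introduced" column above.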

2. Working with Namespaces

Namespace API & System Calls

clone() - Create new process with new namespaces
int clone(int (*fn)(void *), void *stack, int flags, void *arg, ...);
// flags: CLONE_NEWPID | CLONE_NEWNET | CLONE_NEWNS | CLONE_NEWIPC | CLONE_NEWUTS | CLONE_NEWUSER
unshare() - Move calling process to new namespace
int unshare(int flags);
setns() - Join existing namespace
int setns(int fd, int nstype);
ioctl() - Query namespace relationships
ioctl(fd, NS_GET_USERNS); // Get the user namespace that owns the namespace referred to by fd
ioctl(fd, NS_GET_PARENT); // Get the parent namespace (PID and user namespaces only)
Namespace file descriptors in /proc/[pid]/ns/
ls -la /proc/self/ns/
lrwxrwxrwx 1 root root 0 Dec 9 10:00 cgroup -> 'cgroup:[4026531835]'
lrwxrwxrwx 1 root root 0 Dec 9 10:00 ipc -> 'ipc:[4026531839]'
lrwxrwxrwx 1 root root 0 Dec 9 10:00 mnt -> 'mnt:[4026531840]'
lrwxrwxrwx 1 root root 0 Dec 9 10:00 net -> 'net:[4026531992]'
lrwxrwxrwx 1 root root 0 Dec 9 10:00 pid -> 'pid:[4026531836]'
lrwxrwxrwx 1 root root 0 Dec 9 10:00 pid_for_children -> 'pid:[4026531836]'
lrwxrwxrwx 1 root root 0 Dec 9 10:00 user -> 'user:[4026531837]'
lrwxrwxrwx 1 root root 0 Dec 9 10:00 uts -> 'uts:[4026531838]'
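
The link targets in this listing encode the namespace type and an inode number, and two processes share a namespace exactly when their targets match. A minimal sketch of both checks follows; ns_inode and same_ns are hypothetical helper names introduced for illustration.

```shell
# Extract the inode number from a link target like "pid:[4026531836]".
ns_inode() {
    printf '%s\n' "$1" | sed -n 's/^[a-z_]*:\[\([0-9][0-9]*\)]$/\1/p'
}

# Succeed when two processes share the given namespace type.
# $1 = namespace type (pid, net, mnt, ...), $2 and $3 = PIDs
same_ns() {
    a=$(readlink "/proc/$2/ns/$1") || return 2
    b=$(readlink "/proc/$3/ns/$1") || return 2
    [ "$a" = "$b" ]
}

ns_inode 'pid:[4026531836]'   # prints 4026531836
```

On a live system, something like same_ns net 1 $$ would report whether the current shell still shares the network namespace of PID 1. Reading another user's /proc/<pid>/ns links generally requires root.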

Practical Namespace Examples

# Create and enter new PID namespace
sudo unshare --pid --fork --mount-proc /bin/bash
ps aux # Shows only processes in new namespace
echo $$ # PID 1 in this namespace
# Create network namespace
sudo ip netns add mynet
sudo ip netns list
sudo ip netns exec mynet ip addr show
sudo ip netns exec mynet ping 8.8.8.8 # No network connectivity yet
# Create veth pair for network namespace
sudo ip link add veth0 type veth peer name veth1
sudo ip link set veth1 netns mynet
sudo ip addr add 10.0.0.1/24 dev veth0
sudo ip link set veth0 up
sudo ip netns exec mynet ip addr add 10.0.0.2/24 dev veth1
sudo ip netns exec mynet ip link set veth1 up
sudo ip netns exec mynet ip route add default via 10.0.0.1
# Create mount namespace with private propagation
sudo unshare --mount --propagation private /bin/bash
mount -t tmpfs tmpfs /mnt # Private mount not visible to host
mount --make-rshared / # Change propagation type
# Create user namespace (rootless)
unshare --user --map-root-user /bin/bash
id # Shows root in namespace, regular user on host
cat /proc/self/uid_map # Shows ID mapping
# Create UTS namespace (hostname isolation)
sudo unshare --uts /bin/bash
hostname container1 # Change hostname only in namespace
hostname # Shows container1
# Create IPC namespace
sudo unshare --ipc /bin/bash
ipcs -a # Shows empty IPC facilities
# Create multiple namespaces at once
sudo unshare --pid --net --mount --ipc --uts --user --map-root-user --fork /bin/bash
# Inspect namespace IDs
ls -la /proc/$$/ns/ # Current process namespaces
readlink /proc/$$/ns/pid # Get PID namespace ID
sudo lsns -p $$ # List namespaces for process
sudo nsenter -t PID -n /bin/bash # Enter network namespace of process
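
The uid_map file printed in the user namespace example holds lines of three fields: the first ID inside the namespace, the first ID outside it, and the length of the range. The sketch below translates an inside UID to its outside UID; map_uid is a hypothetical helper, and the sample maps (a caller with UID 1000, a subordinate range starting at 100000) are assumptions chosen for illustration.

```shell
# Translate a UID as seen inside a user namespace to the UID outside it.
# $1 = UID inside the namespace; map lines are read from stdin.
map_uid() {
    awk -v id="$1" '
        id >= $1 && id < $1 + $3 { print $2 + (id - $1); found = 1; exit }
        END { if (!found) exit 1 }   # unmapped IDs yield a non-zero status
    '
}

# --map-root-user run by UID 1000 produces the single line "0 1000 1"
echo "0 1000 1" | map_uid 0            # prints 1000
# A wider range, similar to what rootless runtimes configure
echo "0 100000 65536" | map_uid 33     # prints 100033
```

On a real system the map can be fed in directly, for example map_uid 0 < /proc/self/uid_map.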

C Program: Creating a Simple Container

// simple-container.c - Create a minimal container with namespaces

#define _GNU_SOURCE
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sched.h>
#include <signal.h>
#include <sys/wait.h>
#include <sys/mount.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <string.h>

#define STACK_SIZE (1024 * 1024)
static char container_stack[STACK_SIZE];

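// Container process function: runs as PID 1 inside the new namespaces
int container_main(void* arg) {
    (void)arg;
    printf("Container [%d] - Inside container!\n", getpid());
    
    // Set hostname in UTS namespace
    if (sethostname("container", 9) == -1)
        perror("sethostname");
    
    // Stop mount events from propagating back to the host. On hosts where /
    // is a shared mount, skipping this step would change the host's /proc too.
    if (mount(NULL, "/", NULL, MS_REC | MS_PRIVATE, NULL) == -1)
        perror("mount --make-rprivate");
    
    // Remount /proc to get accurate PID view
    if (mount("proc", "/proc", "proc", 0, NULL) == -1)
        perror("mount proc");
    
    // Replace this process with a shell so the shell itself is PID 1
    execl("/bin/bash", "bash", (char *)NULL);
    perror("execl");  // Only reached if exec fails
    return 1;
}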

int main(int argc, char *argv[]) {
    printf("Host [%d] - Starting container...\n", getpid());
    
    // Flags for namespace creation
    int flags = CLONE_NEWPID | CLONE_NEWNS | CLONE_NEWUTS | CLONE_NEWIPC | CLONE_NEWNET | SIGCHLD;
    
    // Create child process with new namespaces
    pid_t pid = clone(container_main, container_stack + STACK_SIZE, flags, NULL);
    if (pid == -1) {
        perror("clone");
        exit(EXIT_FAILURE);
    }
    
    printf("Host [%d] - Container PID: %d\n", getpid(), pid);
    
    // Wait for container process to exit
    waitpid(pid, NULL, 0);
    
    printf("Host [%d] - Container exited\n", getpid());
    return 0;
}

// Compile: gcc -o simple-container simple-container.c
// Run: sudo ./simple-container

[Figure: Namespace isolation layers. The host view (PID 1: systemd, eth0: 192.168.1.100, hostname server1) passes through the PID, network, UTS, user, mount, and IPC namespaces to produce the container view (PID 1: /sbin/init, eth0: 172.17.0.2, hostname myapp, UID 0 mapped, container root filesystem).]

3. Cgroups (Control Groups) Mastery

Cgroups v2 Architecture

# Cgroups v2 is mounted at /sys/fs/cgroup (unified hierarchy)
mount -t cgroup2 cgroup2 /sys/fs/cgroup
ls -la /sys/fs/cgroup/ # View cgroup hierarchy
# Check cgroup version
grep cgroup /proc/filesystems
stat -fc %T /sys/fs/cgroup/ # Should show cgroup2fs
# Create a new cgroup
sudo mkdir /sys/fs/cgroup/myapp
ls -la /sys/fs/cgroup/myapp/ # Controller files appear
# Available controllers
cat /sys/fs/cgroup/cgroup.controllers
cat /sys/fs/cgroup/cgroup.subtree_control
# Enable controllers for subtree
echo "+cpu +memory +io +pids" > /sys/fs/cgroup/cgroup.subtree_control
# Do not also enable controllers in myapp/cgroup.subtree_control here: a non-root
# cgroup cannot both hold processes and pass controllers on to child cgroups
# Move process to cgroup
echo $$ > /sys/fs/cgroup/myapp/cgroup.procs
cat /sys/fs/cgroup/myapp/cgroup.procs # List processes in cgroup
# Set resource limits
# CPU: Limit to 50% of a core (50ms per 100ms period)
echo "50000 100000" > /sys/fs/cgroup/myapp/cpu.max
# Memory: Limit to 100MB
echo "100M" > /sys/fs/cgroup/myapp/memory.max
echo "50M" > /sys/fs/cgroup/myapp/memory.swap.max # Swap limit
# IO: Limit write bandwidth to 1 MiB/s (add rbps=... to limit reads as well)
echo "8:16 wbps=1048576" > /sys/fs/cgroup/myapp/io.max
# 8:16 is the major:minor device number (check the MAJ:MIN column of lsblk)
# PID: Limit number of processes
echo "100" > /sys/fs/cgroup/myapp/pids.max
# Monitor cgroup usage
cat /sys/fs/cgroup/myapp/cpu.stat # CPU statistics
cat /sys/fs/cgroup/myapp/memory.current # Current memory usage
cat /sys/fs/cgroup/myapp/memory.events # Memory events (OOM kills)
cat /sys/fs/cgroup/myapp/io.stat # IO statistics
# Set CPU weight (for shares)
echo "100" > /sys/fs/cgroup/myapp/cpu.weight # Default is 100
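# Read memory pressure (PSI) for the cgroup
cat /sys/fs/cgroup/myapp/memory.pressure
# Delete cgroup (it must contain no processes and no child cgroups)
echo $$ > /sys/fs/cgroup/cgroup.procs # Move this shell back to the root cgroup first
rmdir /sys/fs/cgroup/myapp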
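
The cpu.max value above is a quota and a period, both in microseconds, so the quota for a given CPU share is simple arithmetic. A small sketch of that calculation follows; cpu_max_line is a hypothetical helper and the 100 ms default period is taken from the example above.

```shell
# Build a cpu.max line from a CPU allowance in percent of one core.
# Output format: "<quota_us> <period_us>"
cpu_max_line() {
    percent="$1"
    period_us="${2:-100000}"    # 100 ms period, as in the example above
    echo "$(( percent * period_us / 100 )) $period_us"
}

cpu_max_line 50     # prints "50000 100000"  (half a core)
cpu_max_line 200    # prints "200000 100000" (two full cores)
```

The result could then be written to a cgroup, for example cpu_max_line 50 > /sys/fs/cgroup/myapp/cpu.max when run as root.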

Cgroups v1 vs v2 Comparison

Feature | Cgroups v1 | Cgroups v2 | Benefits
Hierarchy | Multiple hierarchies | Single unified hierarchy | Simpler, consistent
Controllers | Separate mount points | All in one place | Easier management
CPU | cpu and cpuacct as separate controllers | Single cpu controller with accounting built in (cpuset stays separate) | Better CPU scheduling
Memory | memory.limit_in_bytes plus a combined memory+swap limit (memsw) | memory.max with a separate memory.swap.max | Better memory management
IO | blkio controller | io controller | Unified IO control
Write Interface | Various files | Consistent .max files | Predictable API
Pressure | Not available | Pressure stall information (PSI) | Better monitoring
Default | Legacy default | Stable since kernel 4.5; default on most current distributions | Modern systems
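
Because the two versions expose different files, scripts usually need to know which one is mounted. One common approach is to look at the filesystem type of /sys/fs/cgroup, as the stat command earlier in this section does. The sketch below classifies that value; cgroup_mode is a hypothetical helper name.

```shell
# Classify the cgroup setup from the filesystem type of /sys/fs/cgroup.
# $1 = output of: stat -fc %T /sys/fs/cgroup
cgroup_mode() {
    case "$1" in
        cgroup2fs) echo "v2 (unified)" ;;
        tmpfs)     echo "v1 or hybrid" ;;   # v1 controllers mounted below a tmpfs
        *)         echo "unknown" ;;
    esac
}

cgroup_mode cgroup2fs   # prints "v2 (unified)"
cgroup_mode tmpfs       # prints "v1 or hybrid"
```

On a live system the call would be cgroup_mode "$(stat -fc %T /sys/fs/cgroup)".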

Practical Cgroup Examples

cgroup-manager.sh - Comprehensive Cgroup Management
#!/bin/bash
# cgroup-manager.sh - Create and manage cgroups with resource limits

set -euo pipefail

CGROUP_ROOT="/sys/fs/cgroup"
CGROUP_NAME="mycontainer"
CGROUP_PATH="$CGROUP_ROOT/$CGROUP_NAME"

# Colors for output
RED='\033[0;31m'
GREEN='\033[0;32m'
YELLOW='\033[1;33m'
BLUE='\033[0;34m'
NC='\033[0m' # No Color

log() {
    local level="$1"
    local message="$2"
    
    case "$level" in
        "INFO") echo -e "${BLUE}[INFO]${NC} $message" ;;
        "SUCCESS") echo -e "${GREEN}[SUCCESS]${NC} $message" ;;
        "WARNING") echo -e "${YELLOW}[WARNING]${NC} $message" ;;
        "ERROR") echo -e "${RED}[ERROR]${NC} $message" ;;
    esac
}

check_cgroup_v2() {
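    # Check the filesystem type directly; grepping mount output would also
    # match a hybrid setup that mounts cgroup2 at /sys/fs/cgroup/unified
    if [[ "$(stat -fc %T "$CGROUP_ROOT")" != "cgroup2fs" ]]; then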
        log "ERROR" "Cgroups v2 not mounted at /sys/fs/cgroup"
        log "INFO" "Try: sudo mount -t cgroup2 cgroup2 /sys/fs/cgroup"
        exit 1
    fi
    log "SUCCESS" "Cgroups v2 is mounted"
}

create_cgroup() {
    log "INFO" "Creating cgroup: $CGROUP_NAME"
    
    if [[ -d "$CGROUP_PATH" ]]; then
        log "WARNING" "Cgroup already exists, removing..."
        rmdir "$CGROUP_PATH" 2>/dev/null || true
    fi
    
    mkdir -p "$CGROUP_PATH"
    
    # Enable controllers for this cgroup
    echo "+cpu +memory +io +pids" > "$CGROUP_ROOT/cgroup.subtree_control"
    
    log "SUCCESS" "Created cgroup at $CGROUP_PATH"
}

set_resource_limits() {
    local cpu_limit="${1:-0.5}"    # 50% of a core
    local memory_limit="${2:-100M}" # 100MB memory
    local pid_limit="${3:-100}"    # 100 processes max
    
    log "INFO" "Setting resource limits:"
    log "INFO" "  CPU: ${cpu_limit} cores"
    log "INFO" "  Memory: $memory_limit"
    log "INFO" "  PIDs: $pid_limit"
    
    # CPU limit (format: $MAX $PERIOD)
    # $MAX microseconds per $PERIOD microseconds
    local cpu_period=100000  # 100ms in microseconds
    local cpu_max=$(echo "$cpu_limit * $cpu_period" | bc | cut -d. -f1)
    echo "$cpu_max $cpu_period" > "$CGROUP_PATH/cpu.max"
    
    # Memory limit
    echo "$memory_limit" > "$CGROUP_PATH/memory.max"
    echo "50M" > "$CGROUP_PATH/memory.swap.max"  # Limit swap usage
    
    # PID limit
    echo "$pid_limit" > "$CGROUP_PATH/pids.max"
    
    # IO limits (example: limit to 1MB/s write on sda)
    # Get device major:minor for sda
    if [[ -b /dev/sda ]]; then
        local device_info=$(lsblk -nd -o MAJ:MIN /dev/sda)
        echo "$device_info wbps=1048576" > "$CGROUP_PATH/io.max"
    fi
    
    # CPU weight (for sharing)
    echo "100" > "$CGROUP_PATH/cpu.weight"
    
    log "SUCCESS" "Resource limits configured"
}

add_process_to_cgroup() {
    local pid="${1:-$$}"
    
    log "INFO" "Adding process $pid to cgroup"
    echo "$pid" > "$CGROUP_PATH/cgroup.procs"
    
    # Verify
    if grep -q "^$pid$" "$CGROUP_PATH/cgroup.procs"; then
        log "SUCCESS" "Process $pid added to cgroup"
    else
        log "ERROR" "Failed to add process to cgroup"
    fi
}

run_in_cgroup() {
    local command="${1:-/bin/bash}"
    
    log "INFO" "Running command in cgroup: $command"
    
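    # Start a subshell, move it into the cgroup, then exec the command.
    # $BASHPID is the subshell's own PID; $$ would still be the parent script's PID.
    (
        echo "$BASHPID" > "$CGROUP_PATH/cgroup.procs"
        exec $command
    ) &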
    
    local child_pid=$!
    log "SUCCESS" "Started process $child_pid in cgroup"
    echo "$child_pid"
}

monitor_cgroup() {
    log "INFO" "Monitoring cgroup $CGROUP_NAME..."
    echo "=== Cgroup Statistics ==="
    
    while true; do
        clear
        echo "Cgroup: $CGROUP_NAME"
        echo "======================"
        
        # CPU
        echo -e "\n${BLUE}CPU Usage:${NC}"
        cat "$CGROUP_PATH/cpu.stat" 2>/dev/null || echo "N/A"
        
        # Memory
        echo -e "\n${GREEN}Memory Usage:${NC}"
        local mem_current=$(cat "$CGROUP_PATH/memory.current" 2>/dev/null || echo "0")
        local mem_max=$(cat "$CGROUP_PATH/memory.max" 2>/dev/null || echo "max")
        echo "Current: $(numfmt --to=iec $mem_current)"
        echo "Limit: $mem_max"
        
        # Memory events
        if [[ -f "$CGROUP_PATH/memory.events" ]]; then
            echo -e "\nMemory Events:"
            grep -E "(oom|pressure)" "$CGROUP_PATH/memory.events" || true
        fi
        
        # PIDs
        echo -e "\n${YELLOW}Process Count:${NC}"
        # pids.current counts tasks (threads included), which is what pids.max limits
        local pids_current=$(cat "$CGROUP_PATH/pids.current" 2>/dev/null || echo "0")
        local pids_max=$(cat "$CGROUP_PATH/pids.max" 2>/dev/null || echo "max")
        echo "Current: $pids_current"
        echo "Limit: $pids_max"
        
        # IO
        if [[ -f "$CGROUP_PATH/io.stat" ]]; then
            echo -e "\n${RED}IO Statistics:${NC}"
            cat "$CGROUP_PATH/io.stat" 2>/dev/null || true
        fi
        
        echo -e "\nPress Ctrl+C to exit..."
        sleep 2
    done
}

cleanup() {
    log "INFO" "Cleaning up cgroup: $CGROUP_NAME"
    
    # Kill all processes in cgroup
    if [[ -f "$CGROUP_PATH/cgroup.procs" ]]; then
        while read -r pid; do
            kill -9 "$pid" 2>/dev/null || true
        done < "$CGROUP_PATH/cgroup.procs"
    fi
    
    # Remove cgroup
    rmdir "$CGROUP_PATH" 2>/dev/null && log "SUCCESS" "Cgroup removed" || log "WARNING" "Could not remove cgroup (may not be empty)"
}

show_usage() {
    echo "Usage: $0 [command]"
    echo "Commands:"
    echo "  create       - Create cgroup with default limits"
    echo "  limits       - Set resource limits"
    echo "  add <pid>    - Add process to cgroup"
    echo "  run <cmd>    - Run command in cgroup"
    echo "  monitor      - Monitor cgroup statistics"
    echo "  cleanup      - Remove cgroup"
    echo "  help         - Show this help"
}

main() {
    local command="${1:-help}"
    
    case "$command" in
        "create")
            check_cgroup_v2
            create_cgroup
            set_resource_limits
            ;;
        "limits")
            # ${N:-} keeps set -u from aborting when an optional argument is omitted
            set_resource_limits "${2:-}" "${3:-}" "${4:-}"
            ;;
        "add")
            add_process_to_cgroup "${2:-}"
            ;;
        "run")
            run_in_cgroup "${2:-}"
            ;;
        "monitor")
            monitor_cgroup
            ;;
        "cleanup")
            cleanup
            ;;
        "help"|*)
            show_usage
            ;;
    esac
}

# Run main function
main "$@"
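
The monitor loop above prints memory.events as raw text. When a script needs a single counter, such as the number of OOM kills, the file can be parsed as key/value pairs. A minimal sketch follows; memory_event is a hypothetical helper, and the sample file contents are made up for the demonstration.

```shell
# Print the value of one counter from a cgroup v2 memory.events file.
# $1 = path to memory.events, $2 = counter name (for example oom_kill)
memory_event() {
    awk -v key="$2" '$1 == key { print $2; found = 1 } END { exit !found }' "$1"
}

# Demonstration against a sample file; on a real system pass
# /sys/fs/cgroup/<name>/memory.events instead.
sample=$(mktemp)
printf 'low 0\nhigh 12\nmax 3\noom 1\noom_kill 1\n' > "$sample"
memory_event "$sample" oom_kill    # prints 1
memory_event "$sample" high        # prints 12
rm -f "$sample"
```

A non-zero exit status signals that the requested counter is not present in the file.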

4. Building a Container from Scratch

Minimal Container Implementation

1 Container Setup Script
#!/bin/bash
# minimal-container.sh - Build a container from scratch using namespaces and cgroups

set -euo pipefail

CONTAINER_NAME="mycontainer"
CONTAINER_ID=$(uuidgen | cut -d- -f1)
CONTAINER_ROOT="/var/containers/$CONTAINER_ID"
CONTAINER_CGROUP="containers/$CONTAINER_ID"

# Container image (busybox)
IMAGE_URL="https://busybox.net/downloads/binaries/1.35.0-x86_64-linux-musl/busybox"
IMAGE_FILE="/tmp/busybox"

# Colors
RED='\033[0;31m'
GREEN='\033[0;32m'
YELLOW='\033[1;33m'
NC='\033[0m'

log() {
    echo -e "${GREEN}[+]${NC} $1"
}

error() {
    echo -e "${RED}[!]${NC} $1" >&2
    exit 1
}

# Download busybox if not present
download_busybox() {
    if [[ ! -f "$IMAGE_FILE" ]]; then
        log "Downloading busybox..."
        # -f fails on HTTP errors instead of saving an error page; -L follows redirects
        curl -fsSL -o "$IMAGE_FILE" "$IMAGE_URL" || error "Failed to download busybox"
        chmod +x "$IMAGE_FILE"
    fi
}

# Setup container root filesystem
setup_rootfs() {
    log "Setting up container root filesystem..."
    
    mkdir -p "$CONTAINER_ROOT"
    mkdir -p "$CONTAINER_ROOT/bin"
    mkdir -p "$CONTAINER_ROOT/etc"
    mkdir -p "$CONTAINER_ROOT/proc"
    mkdir -p "$CONTAINER_ROOT/sys"
    mkdir -p "$CONTAINER_ROOT/tmp"
    mkdir -p "$CONTAINER_ROOT/dev"
    mkdir -p "$CONTAINER_ROOT/home"
    
    # Copy busybox
    cp "$IMAGE_FILE" "$CONTAINER_ROOT/bin/busybox"
    
    # Create symlinks for busybox applets
    "$CONTAINER_ROOT/bin/busybox" --list | while read applet; do
        ln -sf /bin/busybox "$CONTAINER_ROOT/bin/$applet"
    done
    
    # Create minimal /etc files
    cat > "$CONTAINER_ROOT/etc/passwd" << EOF
root:x:0:0:root:/root:/bin/sh
EOF
    
    cat > "$CONTAINER_ROOT/etc/group" << EOF
root:x:0:
EOF
    
    cat > "$CONTAINER_ROOT/etc/hostname" << EOF
$CONTAINER_NAME
EOF
    
    cat > "$CONTAINER_ROOT/etc/hosts" << EOF
127.0.0.1   localhost $CONTAINER_NAME
::1         localhost ip6-localhost ip6-loopback
EOF
    
    # Create container init script
    cat > "$CONTAINER_ROOT/init.sh" << 'EOF'
#!/bin/sh
# Container init script

# Mount proc and sys
mount -t proc proc /proc
mount -t sysfs sysfs /sys
mount -t tmpfs tmpfs /tmp
mount -t devtmpfs devtmpfs /dev

# Set hostname
hostname $(cat /etc/hostname)

# Drop to shell
echo "Container started as PID $$"
exec /bin/sh
EOF
    
    chmod +x "$CONTAINER_ROOT/init.sh"
}

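# Create cgroup for container
setup_cgroup() {
    log "Setting up cgroup..."
    
    CGROUP_PATH="/sys/fs/cgroup/$CONTAINER_CGROUP"
    mkdir -p "$CGROUP_PATH"
    
    # Controller files such as cpu.max only exist in a child cgroup once the
    # controllers are enabled in every ancestor's cgroup.subtree_control
    echo "+cpu +memory +pids" > /sys/fs/cgroup/cgroup.subtree_control
    echo "+cpu +memory +pids" > /sys/fs/cgroup/containers/cgroup.subtree_control
    
    # Set resource limits
    echo "50000 100000" > "$CGROUP_PATH/cpu.max"      # 50% CPU
    echo "100M" > "$CGROUP_PATH/memory.max"           # 100MB memory
    echo "50M" > "$CGROUP_PATH/memory.swap.max"       # 50MB swap
    echo "100" > "$CGROUP_PATH/pids.max"              # 100 processes max
}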

# Start container
start_container() {
    log "Starting container $CONTAINER_NAME..."
    
    # Create network namespace
    ip netns add "$CONTAINER_NAME" 2>/dev/null || true
    
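    # Start the container from a subshell that joins the cgroup first, so the
    # limits are inherited by unshare and by every process it forks. Moving only
    # the parent afterwards would leave already-forked children unconstrained.
    # $BASHPID is the subshell's own PID; $$ would still name this script.
    (
        echo "$BASHPID" > "/sys/fs/cgroup/$CONTAINER_CGROUP/cgroup.procs"
        exec unshare \
            --pid --fork \
            --mount \
            --uts \
            --ipc \
            --net=/var/run/netns/$CONTAINER_NAME \
            --user --map-root-user \
            --cgroup \
            chroot "$CONTAINER_ROOT" /init.sh
    ) &
    
    # $! is the unshare process; the container's PID 1 is its child
    CONTAINER_PID=$!
    log "Container started with PID: $CONTAINER_PID"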
    
    # Setup network in namespace
    setup_network
    
    log "Container is running!"
    log "Attach with: nsenter -t $CONTAINER_PID -a"
}

# Setup container network
setup_network() {
    log "Setting up container network..."
    
    # Create veth pair
    ip link add veth0 type veth peer name veth1
    
    # Move veth1 to container namespace
    ip link set veth1 netns "$CONTAINER_NAME"
    
    # Configure host side
    ip addr add 10.0.0.1/24 dev veth0
    ip link set veth0 up
    
    # Configure container side
    ip netns exec "$CONTAINER_NAME" ip addr add 10.0.0.2/24 dev veth1
    ip netns exec "$CONTAINER_NAME" ip link set veth1 up
    ip netns exec "$CONTAINER_NAME" ip link set lo up
    ip netns exec "$CONTAINER_NAME" ip route add default via 10.0.0.1
    
    # Enable NAT
    iptables -t nat -A POSTROUTING -s 10.0.0.0/24 -j MASQUERADE
    iptables -A FORWARD -i veth0 -j ACCEPT
    iptables -A FORWARD -o veth0 -j ACCEPT
    
    # Enable IP forwarding
    echo 1 > /proc/sys/net/ipv4/ip_forward
}

# Stop container
stop_container() {
    log "Stopping container..."
    
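    # Kill the launcher, then anything still left in the container's cgroup
    # (rmdir below fails while the cgroup still holds processes)
    kill -9 "$CONTAINER_PID" 2>/dev/null || true
    if [[ -f "/sys/fs/cgroup/$CONTAINER_CGROUP/cgroup.procs" ]]; then
        while read -r pid; do
            kill -9 "$pid" 2>/dev/null || true
        done < "/sys/fs/cgroup/$CONTAINER_CGROUP/cgroup.procs"
    fi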
    
    # Cleanup network
    ip netns del "$CONTAINER_NAME" 2>/dev/null || true
    ip link del veth0 2>/dev/null || true
    
    # Cleanup cgroup
    rmdir "/sys/fs/cgroup/$CONTAINER_CGROUP" 2>/dev/null || true
    
    log "Container stopped"
}

# Cleanup container filesystem
cleanup() {
    log "Cleaning up..."
    rm -rf "$CONTAINER_ROOT"
}

# Main execution
main() {
    download_busybox
    setup_rootfs
    setup_cgroup
    start_container
    
    # Wait for user input
    echo "Press Enter to stop container..."
    read
    
    stop_container
    cleanup
}

# Run main function
main "$@"
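
The script writes to cpu.max, memory.max, and pids.max, so it depends on the cpu, memory, and pids controllers being available on the host. A pre-flight check along the following lines could catch a missing controller before anything is created. This is a sketch only; missing_controllers is a hypothetical helper, not part of the script above.

```shell
# Print every required controller that is absent from an "available" list.
# $1 = contents of cgroup.controllers, $2 = space-separated required names
missing_controllers() {
    for want in $2; do
        case " $1 " in
            *" $want "*) ;;                  # present
            *) printf '%s\n' "$want" ;;      # absent
        esac
    done
}

missing_controllers "cpuset cpu io memory pids" "cpu memory pids"   # prints nothing
missing_controllers "memory pids" "cpu memory pids"                 # prints cpu
```

On a live system the first argument would be "$(cat /sys/fs/cgroup/cgroup.controllers)".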

2 Container Lifecycle Management
#!/bin/bash
# container-manager.sh - Complete container lifecycle management

CONTAINERS_DIR="/var/containers"
CGROUP_ROOT="/sys/fs/cgroup"

list_containers() {
    echo "Running containers:"
    echo "=================="
    
    # Treat any process whose PID namespace differs from PID 1's as containerized
    # (reading other processes' ns links requires root)
    local host_ns ns dir
    host_ns=$(readlink /proc/1/ns/pid)
    for dir in /proc/[0-9]*; do
        ns=$(readlink "$dir/ns/pid" 2>/dev/null) || continue
        if [[ "$ns" != "$host_ns" ]]; then
            echo "PID: ${dir#/proc/} | Namespace: $ns"
        fi
    done
    
    echo ""
    echo "Cgroups:"
    echo "========"
    # minimal-container.sh creates its cgroups under $CGROUP_ROOT/containers/
    find "$CGROUP_ROOT/containers" -mindepth 1 -maxdepth 1 -type d 2>/dev/null | while read -r cg; do
        echo "Cgroup: $(basename "$cg")"
        [[ -f "$cg/cgroup.procs" ]] && echo "  PIDs: $(tr '\n' ' ' < "$cg/cgroup.procs")"
    done
}

inspect_container() {
    local container_pid="$1"
    
    if [[ ! -d "/proc/$container_pid" ]]; then
        echo "Error: Process $container_pid does not exist"
        return 1
    fi
    
    echo "=== Container Inspection: PID $container_pid ==="
    echo ""
    
    # Namespaces
    echo "Namespaces:"
    ls -la "/proc/$container_pid/ns/" | tail -n +2
    echo ""
    
    # Cgroup
    echo "Cgroup:"
    cat "/proc/$container_pid/cgroup"
    echo ""
    
    # Process tree
    echo "Process Tree:"
    pstree -p "$container_pid"
    echo ""
    
    # Resource usage
    echo "Resource Usage:"
    echo "CPU: $(ps -p $container_pid -o %cpu --no-headers)%"
    echo "Memory: $(ps -p $container_pid -o %mem --no-headers)%"
    echo ""
    
    # Network
    echo "Network Namespace:"
    ip netns identify "$container_pid" 2>/dev/null || echo "Not in network namespace"
}

container_stats() {
    local container_pid="$1"
    local cgroup_path=$(grep -o "containers/.*" "/proc/$container_pid/cgroup")
    
    if [[ -z "$cgroup_path" ]]; then
        echo "No cgroup found for container"
        return 1
    fi
    
    echo "=== Container Statistics ==="
    echo ""
    
    # CPU
    echo "CPU:"
    cat "/sys/fs/cgroup/$cgroup_path/cpu.stat" 2>/dev/null || echo "N/A"
    echo ""
    
    # Memory
    echo "Memory:"
    echo "Current: $(cat /sys/fs/cgroup/$cgroup_path/memory.current 2>/dev/null || echo 0) bytes"
    echo "Limit: $(cat /sys/fs/cgroup/$cgroup_path/memory.max 2>/dev/null || echo max)"
    echo ""
    
    # IO
    echo "IO:"
    cat "/sys/fs/cgroup/$cgroup_path/io.stat" 2>/dev/null || echo "N/A"
}

# Main menu
case "${1:-list}" in
    "list")
        list_containers
        ;;
    "inspect")
        inspect_container "$2"
        ;;
    "stats")
        container_stats "$2"
        ;;
    "help")
        echo "Usage: $0 [command]"
        echo "Commands:"
        echo "  list                 - List all containers"
        echo "  inspect <pid>        - Inspect specific container"
        echo "  stats <pid>          - Show container statistics"
        echo "  help                 - Show this help"
        ;;
    *)
        echo "Unknown command: $1"
        echo "Use: $0 help"
        ;;
esac
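
On a cgroup v2 system, /proc/<pid>/cgroup contains a single line of the form 0::<path>, and that path is what container_stats needs to locate the statistics files. A sketch of extracting it for any cgroup, not only those under containers/, follows. cgroup_v2_path is a hypothetical helper and the sample paths are made up.

```shell
# Print the cgroup v2 path from /proc/<pid>/cgroup content read on stdin.
# The v2 entry always has hierarchy ID 0 and an empty controller list: "0::<path>"
cgroup_v2_path() {
    sed -n 's/^0:://p'
}

printf '0::/containers/3f2a9c1e\n' | cgroup_v2_path   # prints /containers/3f2a9c1e
```

Lines belonging to v1 hierarchies on hybrid systems (for example 12:memory:/user.slice) are ignored. On a live system: cgroup_v2_path < /proc/self/cgroup.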

5. Security Considerations

Container Security Best Practices:
1. Use user namespaces: Enable rootless containers with UID mapping
2. Limit capabilities: Drop unnecessary Linux capabilities
3. Apply seccomp profiles: Restrict system calls
4. Use AppArmor/SELinux: Mandatory access controls
5. Set resource limits: Prevent DoS with cgroups
6. Read-only root filesystem: Mount rootfs as read-only
7. Use no-new-privileges: Prevent privilege escalation
8. Network isolation: Use private network namespaces
9. Regular updates: Keep kernel and container tools updated
10. Audit and monitor: Log container activities

Security Hardening Examples

# Drop Linux capabilities
capsh --drop=cap_net_raw,cap_sys_admin -- -c "/bin/bash"
# Keep only: CAP_CHOWN, CAP_DAC_OVERRIDE, CAP_FOWNER, CAP_SETGID, CAP_SETUID, CAP_SETPCAP, CAP_NET_BIND_SERVICE
# Apply seccomp filter
wget https://raw.githubusercontent.com/moby/moby/master/profiles/seccomp/default.json
docker run --security-opt seccomp=default.json nginx
# Run with no-new-privileges
docker run --security-opt no-new-privileges nginx
# Read-only root filesystem
docker run --read-only nginx
# With writable tmpfs for /tmp
docker run --read-only --tmpfs /tmp nginx
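# User namespace mapping (format: <id-inside> <id-outside> <count>)
# Written from the parent namespace by a privileged process; TARGET_PID is a
# placeholder for a process already running inside the new user namespace
echo "0 100000 65536" > /proc/$TARGET_PID/uid_map
echo "0 100000 65536" > /proc/$TARGET_PID/gid_map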
# AppArmor profile
docker run --security-opt apparmor=docker-default nginx
# SELinux context
docker run --security-opt label=type:svirt_lxc_net_t nginx
# Combine security options
docker run \
--read-only \
--tmpfs /tmp \
--security-opt no-new-privileges \
--security-opt seccomp=default.json \
--cap-drop ALL \
--cap-add NET_BIND_SERVICE \
nginx
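
To confirm that a capability was actually dropped, the effective set can be read from the CapEff line of /proc/<pid>/status, which is a hexadecimal bitmask with one bit per capability. The sketch below tests a single bit. has_cap is a hypothetical helper, the mask is only a sample value, and the bit numbers used (13 for CAP_NET_RAW, 21 for CAP_SYS_ADMIN) come from the kernel's capability numbering.

```shell
# Succeed when the given capability bit is set in a hex mask.
# $1 = hex mask without the 0x prefix, $2 = capability bit number
has_cap() {
    [ $(( (0x$1 >> $2) & 1 )) -eq 1 ]
}

mask=00000000a80425fb    # sample CapEff value, for illustration only
has_cap "$mask" 13 && echo "CAP_NET_RAW present"
has_cap "$mask" 21 || echo "CAP_SYS_ADMIN absent"
```

On a live system the mask could be taken from awk '$1 == "CapEff:" { print $2 }' /proc/self/status, and capsh --decode=<mask> gives the full list of names.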

6. Performance Monitoring & Troubleshooting

Monitoring Tools and Commands

Tool | Purpose | Command | Output
lsns | List namespaces | lsns -p $$ | Namespace IDs and types
nsenter | Enter namespace | nsenter -t PID -n ip addr | Network config in namespace
systemd-cgtop | Cgroup monitoring | systemd-cgtop | Real-time cgroup resource usage
bpftrace | Tracing | bpftrace -e 'tracepoint:syscalls:sys_enter_* { @[probe] = count(); }' | System call tracing
perf | Performance analysis | perf stat -a sleep 1 | System performance counters
/proc/$$/status | Process status | grep -i NSpid /proc/$$/status | Namespace PID mapping
findmnt | Mount info | findmnt --kernel | Kernel mount information
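
The NSpid line lists one PID per nested PID namespace: the leftmost value is the PID as seen from the namespace of the procfs mount being read, and the rightmost is the PID inside the innermost namespace. A sketch of pulling out that innermost value follows; innermost_pid is a hypothetical helper and the sample status text is made up.

```shell
# Print the PID a process has inside its innermost PID namespace.
# Reads /proc/<pid>/status content from stdin.
innermost_pid() {
    awk '$1 == "NSpid:" { print $NF }'
}

# A container init seen from the host: PID 48213 outside, PID 1 inside
printf 'Name:\tsh\nNSpid:\t48213\t1\n' | innermost_pid   # prints 1
```

On a live system: innermost_pid < /proc/<pid>/status. For a process that is not in a nested PID namespace, the line holds a single value and the same PID is printed.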

Troubleshooting Common Issues

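  • Container can't start: Check the kernel version with uname -r and confirm namespace support with grep -E 'CONFIG_(NAMESPACES|[A-Z]+_NS)=' /boot/config-$(uname -r)
  • Permission denied: Ensure unprivileged user namespaces are enabled. On Debian and Ubuntu kernels: sysctl kernel.unprivileged_userns_clone=1. Elsewhere, check that sysctl user.max_user_namespaces is non-zero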
  • Out of memory: Check cgroup memory limits and adjust memory.max
  • Network issues: Verify network namespace and veth pairs with ip netns list
  • Mount failures: Check mount namespace and propagation flags
  • PID 1 issues: Ensure proper signal handling in container init process
  • Performance problems: Monitor with systemd-cgtop and adjust cgroup limits
  • Security violations: Check audit logs and adjust seccomp/capabilities
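
The first check in this list can be scripted by looking for the relevant CONFIG_ options in the kernel configuration file. The sketch below is one way to do that; ns_config_report is a hypothetical helper, the option list is a selection rather than an exhaustive set, and the sample file is made up so the example runs anywhere.

```shell
# Report whether common namespace and cgroup options are built into a kernel.
# $1 = path to a kernel config file (for example /boot/config-$(uname -r))
ns_config_report() {
    for opt in NAMESPACES UTS_NS IPC_NS USER_NS PID_NS NET_NS CGROUPS; do
        if grep -q "^CONFIG_${opt}=y" "$1"; then
            echo "$opt: enabled"
        else
            echo "$opt: missing"
        fi
    done
}

# Demonstration with a sample config in which user namespaces are disabled
sample=$(mktemp)
printf '%s\n' 'CONFIG_NAMESPACES=y' 'CONFIG_UTS_NS=y' 'CONFIG_IPC_NS=y' \
    '# CONFIG_USER_NS is not set' 'CONFIG_PID_NS=y' 'CONFIG_NET_NS=y' \
    'CONFIG_CGROUPS=y' > "$sample"
ns_config_report "$sample"    # reports USER_NS as missing, the rest as enabled
rm -f "$sample"
```

Mount namespaces have no option of their own and come with CONFIG_NAMESPACES. Some distributions ship the configuration as /proc/config.gz instead of a file under /boot.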

Master Container Isolation Fundamentals

Linux namespaces and cgroups are the foundational technologies that enable modern containerization. Namespaces provide isolation (process, network, filesystem, etc.) while cgroups provide resource control (CPU, memory, I/O limits). Together, they create the secure, efficient environments that power Docker, Kubernetes, and cloud-native applications.

Key Takeaways: Understand each namespace type and its isolation purpose. Master cgroups v2 for resource management. Combine namespaces and cgroups to build containers from scratch. Implement security best practices with capabilities, seccomp, and user namespaces. Monitor and troubleshoot container performance effectively.

Next Steps: Experiment with manual namespace creation using unshare and nsenter. Build a minimal container from scratch. Explore Docker internals to see how these technologies are used in production. Study Kubernetes pod isolation to understand multi-container orchestration. Continuously monitor container security and performance in your environments.