Linux Namespaces & Cgroups: Complete Guide to Container Isolation

Master Linux container isolation technologies with namespaces and cgroups. Understand the fundamental building blocks of containers, from process isolation to resource limits, with practical examples and deep technical insights.

Complete container isolation architecture showing namespaces and cgroups working together

Why Linux Namespaces & Cgroups?

Namespaces and cgroups are the fundamental Linux kernel technologies that enable modern containerization.

Isolation: Namespaces provide process isolation (PID, network, mount, etc.)
Resource Control: Cgroups limit and monitor resource usage (CPU, memory, I/O)
Security: Combined with capabilities and seccomp for defense in depth
Efficiency: Lightweight compared to full virtualization
Portability: Containers run consistently across different environments
Density: Run multiple isolated environments on a single host
Orchestration: Foundation for Kubernetes, Docker, and container orchestration

1. Linux Namespaces Deep Dive

🔒

PID Namespace

unshare --pid --fork

Process ID isolation. Each namespace has its own PID 1.

🌐

Network Namespace

ip netns add mynet

Network stack isolation. Private interfaces, routing, iptables.

📁

Mount Namespace

unshare --mount --propagation private

Filesystem mount isolation. Private mount points and rootfs.

💬

IPC Namespace

unshare --ipc

Inter-process communication isolation (System V IPC, POSIX message queues).

🏷️

UTS Namespace

unshare --uts

Hostname and NIS domain name isolation.

👤

User Namespace

unshare --user --map-root-user

User and group ID isolation with mapping. Enables rootless containers.

Namespace Types Comparison

Namespace	Flag	Isolates	Introduced	Key Use Case
PID	`CLONE_NEWPID`	Process IDs	Linux 2.6.24	Container process trees
Network	`CLONE_NEWNET`	Network devices, ports, routing	Linux 2.6.24	Container networking
Mount	`CLONE_NEWNS`	Mount points, filesystems	Linux 2.4.19	Container root filesystem
IPC	`CLONE_NEWIPC`	System V IPC, POSIX queues	Linux 2.6.19	Inter-process communication
UTS	`CLONE_NEWUTS`	Hostname and NIS domain	Linux 2.6.19	Container hostname
User	`CLONE_NEWUSER`	User and group IDs	Linux 3.8	Rootless containers, security
Cgroup	`CLONE_NEWCGROUP`	Cgroup root directory	Linux 4.6	Cgroup namespace isolation
Time	`CLONE_NEWTIME`	Boot and monotonic clocks	Linux 5.6	Container time offsets
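The flags in the table correspond one-to-one to long options of the unshare(1) command used throughout this guide. The sketch below shows that correspondence as a small lookup; the helper names flag_to_unshare_opt and unshare_opts are hypothetical and exist only for this illustration.

```shell
# Map a CLONE_NEW* flag name (from the table above) to the matching
# unshare(1) long option. Helper names are hypothetical, for illustration only.
flag_to_unshare_opt() {
    case "$1" in
        CLONE_NEWPID)    echo "--pid" ;;
        CLONE_NEWNET)    echo "--net" ;;
        CLONE_NEWNS)     echo "--mount" ;;
        CLONE_NEWIPC)    echo "--ipc" ;;
        CLONE_NEWUTS)    echo "--uts" ;;
        CLONE_NEWUSER)   echo "--user" ;;
        CLONE_NEWCGROUP) echo "--cgroup" ;;
        CLONE_NEWTIME)   echo "--time" ;;
        *) echo "unknown flag: $1" >&2; return 1 ;;
    esac
}

# Build a space-separated option string for several flags at once.
unshare_opts() {
    opts=""
    for flag in "$@"; do
        opt=$(flag_to_unshare_opt "$flag") || return 1
        opts="${opts:+$opts }$opt"
    done
    echo "$opts"
}

unshare_opts CLONE_NEWPID CLONE_NEWNS CLONE_NEWUTS   # prints: --pid --mount --uts
```

The printed string can be passed to unshare together with --fork when a new PID namespace is requested.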

Host Processes → Namespace Isolation → Container View

2. Working with Namespaces

Namespace API & System Calls

clone() - Create new process with new namespaces

int clone(int (*fn)(void *), void *stack, int flags, void *arg, ...);

unshare() - Move the calling process into new namespaces (with CLONE_NEWPID, only children created afterwards enter the new PID namespace)

int unshare(int flags);

setns() - Join existing namespace

int setns(int fd, int nstype);

ioctl() - Get namespace information

ioctl(fd, NS_GET_USERNS); // Get the user namespace that owns the namespace referred to by fd

ioctl(fd, NS_GET_PARENT); // Get parent namespace (PID and user namespaces only)

Namespace file descriptors in /proc/[pid]/ns/

ls -la /proc/self/ns/

lrwxrwxrwx 1 root root 0 Dec 9 10:00 cgroup -> 'cgroup:[4026531835]'

lrwxrwxrwx 1 root root 0 Dec 9 10:00 ipc -> 'ipc:[4026531839]'

lrwxrwxrwx 1 root root 0 Dec 9 10:00 mnt -> 'mnt:[4026531840]'

lrwxrwxrwx 1 root root 0 Dec 9 10:00 net -> 'net:[4026531992]'

lrwxrwxrwx 1 root root 0 Dec 9 10:00 pid -> 'pid:[4026531836]'

lrwxrwxrwx 1 root root 0 Dec 9 10:00 pid_for_children -> 'pid:[4026531836]'

lrwxrwxrwx 1 root root 0 Dec 9 10:00 user -> 'user:[4026531837]'

lrwxrwxrwx 1 root root 0 Dec 9 10:00 uts -> 'uts:[4026531838]'
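Each link target has the form type:[inode], and two processes share a namespace exactly when their link targets are identical. A minimal sketch of pulling those pieces apart with plain parameter expansion follows; the function names are hypothetical and the sample strings are taken from the listing above.

```shell
# Extract the inode number from a link target such as 'pid:[4026531836]'.
ns_id_from_link() {
    id=${1#*\[}     # drop everything up to and including '['
    id=${id%\]}     # drop the trailing ']'
    echo "$id"
}

# Extract the namespace type (the part before the colon).
ns_type_from_link() {
    echo "${1%%:*}"
}

# Two link targets refer to the same namespace when they are identical.
same_namespace() {
    [ "$1" = "$2" ]
}

ns_id_from_link 'pid:[4026531836]'      # prints: 4026531836
ns_type_from_link 'net:[4026531992]'    # prints: net
```

In practice the arguments would come from readlink, for example same_namespace "$(readlink /proc/1/ns/net)" "$(readlink /proc/self/ns/net)", which usually requires root to read the PID 1 entry.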

Practical Namespace Examples

# Create and enter new PID namespace

sudo unshare --pid --fork --mount-proc /bin/bash

ps aux # Shows only processes in new namespace

echo $$ # PID 1 in this namespace

# Create network namespace

sudo ip netns add mynet

sudo ip netns list

sudo ip netns exec mynet ip addr show

sudo ip netns exec mynet ping 8.8.8.8 # No network connectivity yet

# Create veth pair for network namespace

sudo ip link add veth0 type veth peer name veth1

sudo ip link set veth1 netns mynet

sudo ip addr add 10.0.0.1/24 dev veth0

sudo ip link set veth0 up

sudo ip netns exec mynet ip addr add 10.0.0.2/24 dev veth1

sudo ip netns exec mynet ip link set veth1 up

sudo ip netns exec mynet ip route add default via 10.0.0.1

# Create mount namespace with private propagation

sudo unshare --mount --propagation private /bin/bash

mount -t tmpfs tmpfs /mnt # Private mount not visible to host

mount --make-rshared / # Change propagation type

# Create user namespace (rootless)

unshare --user --map-root-user /bin/bash

id # Shows root in namespace, regular user on host

cat /proc/self/uid_map # Shows ID mapping

# Create UTS namespace (hostname isolation)

sudo unshare --uts /bin/bash

hostname container1 # Change hostname only in namespace

hostname # Shows container1

# Create IPC namespace

sudo unshare --ipc /bin/bash

ipcs -a # Shows empty IPC facilities

# Create multiple namespaces at once

sudo unshare --pid --net --mount --ipc --uts --user --map-root-user --fork /bin/bash

# Inspect namespace IDs

ls -la /proc/$$/ns/ # Current process namespaces

readlink /proc/$$/ns/pid # Get PID namespace ID

sudo lsns -p $$ # List namespaces for process

sudo nsenter -t PID -n /bin/bash # Enter network namespace of process
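The uid_map file printed in the user namespace example holds lines of three numbers: the first ID inside the namespace, the first ID outside it, and the length of the range. The sketch below translates an inside UID to the outside UID for a single map line. The function name map_uid_to_host and the sample values are assumptions for illustration.

```shell
# Translate a UID inside a user namespace to the UID seen by the parent
# namespace, given one uid_map line: "<inside-start> <outside-start> <length>".
map_uid_to_host() {
    uid=$1
    set -- $2                    # split the map line into its three fields
    inside=$1; outside=$2; length=$3
    if [ "$uid" -ge "$inside" ] && [ "$uid" -lt $((inside + length)) ]; then
        echo $((outside + uid - inside))
    else
        echo "unmapped"
        return 1
    fi
}

# A 65536-ID range starting at host UID 100000 (sample values).
map_uid_to_host 1000 "0 100000 65536"    # prints: 101000

# What --map-root-user produces for a user whose host UID is 1000.
map_uid_to_host 0 "0 1000 1"             # prints: 1000
```

An ID outside every mapped range has no equivalent in the parent namespace, which is why files owned by such IDs show up as the overflow user (typically nobody) inside the namespace.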

C Program: Creating a Simple Container

// simple-container.c - Create a minimal container with namespaces

#define _GNU_SOURCE
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sched.h>
#include <signal.h>
#include <sys/wait.h>
#include <sys/mount.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <string.h>

#define STACK_SIZE (1024 * 1024)
static char container_stack[STACK_SIZE];

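// Container process function: runs as PID 1 inside the new namespaces
int container_main(void* arg) {
    (void)arg;
    printf("Container [%d] - Inside container!\n", getpid());
    
    // Set hostname in UTS namespace
    if (sethostname("container", 9) == -1)
        perror("sethostname");
    
    // Make all mounts private so the /proc mount below does not propagate to the host
    if (mount(NULL, "/", NULL, MS_REC | MS_PRIVATE, NULL) == -1)
        perror("mount MS_PRIVATE");
    
    // Remount /proc to get accurate PID view
    if (mount("proc", "/proc", "proc", 0, NULL) == -1)
        perror("mount /proc");
    
    // Replace this process with a shell so the shell itself runs as PID 1
    execl("/bin/bash", "bash", (char *)NULL);
    perror("execl");
    return 1;
}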

int main(int argc, char *argv[]) {
    printf("Host [%d] - Starting container...\n", getpid());
    
    // Flags for namespace creation
    int flags = CLONE_NEWPID | CLONE_NEWNS | CLONE_NEWUTS | CLONE_NEWIPC | CLONE_NEWNET | SIGCHLD;
    
    // Create child process with new namespaces
    pid_t pid = clone(container_main, container_stack + STACK_SIZE, flags, NULL);
    if (pid == -1) {
        perror("clone");
        exit(EXIT_FAILURE);
    }
    
    printf("Host [%d] - Container PID: %d\n", getpid(), pid);
    
    // Wait for container process to exit
    waitpid(pid, NULL, 0);
    
    printf("Host [%d] - Container exited\n", getpid());
    return 0;
}

// Compile: gcc -o simple-container simple-container.c
// Run: sudo ./simple-container

Namespace isolation showing how different namespaces create container view

3. Cgroups (Control Groups) Mastery

Cgroups v2 Architecture

# Cgroups v2 is mounted at /sys/fs/cgroup (unified hierarchy)

mount -t cgroup2 cgroup2 /sys/fs/cgroup

ls -la /sys/fs/cgroup/ # View cgroup hierarchy

# Check cgroup version

grep cgroup /proc/filesystems

stat -fc %T /sys/fs/cgroup/ # Should show cgroup2fs

# Create a new cgroup

sudo mkdir /sys/fs/cgroup/myapp

ls -la /sys/fs/cgroup/myapp/ # Controller files appear

# Available controllers

cat /sys/fs/cgroup/cgroup.controllers

cat /sys/fs/cgroup/cgroup.subtree_control

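# Enable controllers for child cgroups of the root (run as root)

echo "+cpu +memory +io +pids" > /sys/fs/cgroup/cgroup.subtree_control

# Only enable controllers in myapp's own subtree_control if myapp will hold child cgroups instead of processes.
# A non-root cgroup cannot do both ("no internal processes" rule), so this would make the cgroup.procs write below fail:

# echo "+cpu +memory" > /sys/fs/cgroup/myapp/cgroup.subtree_control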

# Move process to cgroup

echo $$ > /sys/fs/cgroup/myapp/cgroup.procs

cat /sys/fs/cgroup/myapp/cgroup.procs # List processes in cgroup

# Set resource limits

# CPU: Limit to 50% of a core (50ms per 100ms period)

echo "50000 100000" > /sys/fs/cgroup/myapp/cpu.max

# Memory: Limit to 100MB

echo "100M" > /sys/fs/cgroup/myapp/memory.max

echo "50M" > /sys/fs/cgroup/myapp/memory.swap.max # Swap limit

# IO: Limit write bandwidth to 1 MiB/s (add rbps= to limit reads as well)

echo "8:16 wbps=1048576" > /sys/fs/cgroup/myapp/io.max

# 8:16 is the major:minor device number (check with lsblk -o NAME,MAJ:MIN)

# PID: Limit number of processes

echo "100" > /sys/fs/cgroup/myapp/pids.max

# Monitor cgroup usage

cat /sys/fs/cgroup/myapp/cpu.stat # CPU statistics

cat /sys/fs/cgroup/myapp/memory.current # Current memory usage

cat /sys/fs/cgroup/myapp/memory.events # Memory events (OOM kills)

cat /sys/fs/cgroup/myapp/io.stat # IO statistics

# Set CPU weight (for shares)

echo "100" > /sys/fs/cgroup/myapp/cpu.weight # Default is 100

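# Read memory pressure stall information (PSI)

cat /sys/fs/cgroup/myapp/memory.pressure # Notifications need a PSI trigger such as "some 150000 1000000" written to this file and polled on the open descriptor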

# Delete cgroup (it must contain no processes and no child cgroups)

rmdir /sys/fs/cgroup/myapp
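The cpu.max value used above is a quota and a period, both in microseconds, so a limit expressed in fractions of a core has to be converted first. A minimal sketch using integer arithmetic only is shown below; the helper name cpu_max_from_millicores and the millicore unit (1000 = one full core) are assumptions made for this example.

```shell
# Convert a CPU limit in millicores (1000 = one full core) into a cpu.max value.
# cpu.max takes "<quota> <period>" in microseconds.
cpu_max_from_millicores() {
    millicores=$1
    period=${2:-100000}          # 100 ms, the period used in the commands above
    echo "$((millicores * period / 1000)) $period"
}

cpu_max_from_millicores 500      # prints: 50000 100000  (half a core)
cpu_max_from_millicores 2000     # prints: 200000 100000 (two full cores)
```

The result can be written directly, for example cpu_max_from_millicores 500 > /sys/fs/cgroup/myapp/cpu.max when run as root.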

Cgroups v1 vs v2 Comparison

Feature	Cgroups v1	Cgroups v2	Benefits
Hierarchy	Multiple hierarchies	Single unified hierarchy	Simpler, consistent
Controllers	Separate mount points	All in one place	Easier management
CPU	cpu and cpuacct as separate controllers	Single cpu controller with built-in accounting (cpuset remains separate)	Simpler CPU accounting
Memory	memory controller only	Memory + swap accounting	Better memory management
IO	blkio controller	io controller	Unified IO control
Write Interface	Various files	Consistent .max files	Predictable API
Pressure	Not available	Pressure stall information	Better monitoring
Status	Legacy, still supported	Stable since kernel 4.5; default on most current distributions	Modern systems
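Scripts that must work on both versions can branch on the filesystem type reported for /sys/fs/cgroup, which is cgroup2fs on a unified hierarchy and tmpfs on a v1 or hybrid layout. The sketch below classifies that string; the function name is hypothetical.

```shell
# Classify the output of: stat -fc %T /sys/fs/cgroup/
cgroup_mode_from_fstype() {
    case "$1" in
        cgroup2fs) echo "v2" ;;        # unified hierarchy
        tmpfs)     echo "v1" ;;        # legacy or hybrid layout
        *)         echo "unknown" ;;
    esac
}

cgroup_mode_from_fstype cgroup2fs   # prints: v2
cgroup_mode_from_fstype tmpfs       # prints: v1
```

On a live system the argument would be "$(stat -fc %T /sys/fs/cgroup/)".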

Practical Cgroup Examples

cgroup-manager.sh - Comprehensive Cgroup Management

#!/bin/bash
# cgroup-manager.sh - Create and manage cgroups with resource limits

set -euo pipefail

CGROUP_ROOT="/sys/fs/cgroup"
CGROUP_NAME="mycontainer"
CGROUP_PATH="$CGROUP_ROOT/$CGROUP_NAME"

# Colors for output
RED='\033[0;31m'
GREEN='\033[0;32m'
YELLOW='\033[1;33m'
BLUE='\033[0;34m'
NC='\033[0m' # No Color

log() {
    local level="$1"
    local message="$2"
    
    case "$level" in
        "INFO") echo -e "${BLUE}[INFO]${NC} $message" ;;
        "SUCCESS") echo -e "${GREEN}[SUCCESS]${NC} $message" ;;
        "WARNING") echo -e "${YELLOW}[WARNING]${NC} $message" ;;
        "ERROR") echo -e "${RED}[ERROR]${NC} $message" ;;
    esac
}

check_cgroup_v2() {
    # The filesystem type of /sys/fs/cgroup is cgroup2fs only on a unified (v2) hierarchy
    if [[ "$(stat -fc %T /sys/fs/cgroup 2>/dev/null)" != "cgroup2fs" ]]; then
        log "ERROR" "Cgroups v2 not mounted at /sys/fs/cgroup"
        log "INFO" "Try: sudo mount -t cgroup2 cgroup2 /sys/fs/cgroup"
        exit 1
    fi
    log "SUCCESS" "Cgroups v2 is mounted"
}

create_cgroup() {
    log "INFO" "Creating cgroup: $CGROUP_NAME"
    
    if [[ -d "$CGROUP_PATH" ]]; then
        log "WARNING" "Cgroup already exists, removing..."
        rmdir "$CGROUP_PATH" 2>/dev/null || true
    fi
    
    mkdir -p "$CGROUP_PATH"
    
    # Enable controllers for this cgroup
    echo "+cpu +memory +io +pids" > "$CGROUP_ROOT/cgroup.subtree_control"
    
    log "SUCCESS" "Created cgroup at $CGROUP_PATH"
}

set_resource_limits() {
    local cpu_limit="${1:-0.5}"    # 50% of a core
    local memory_limit="${2:-100M}" # 100MB memory
    local pid_limit="${3:-100}"    # 100 processes max
    
    log "INFO" "Setting resource limits:"
    log "INFO" "  CPU: ${cpu_limit} cores"
    log "INFO" "  Memory: $memory_limit"
    log "INFO" "  PIDs: $pid_limit"
    
    # CPU limit (format: $MAX $PERIOD)
    # $MAX microseconds per $PERIOD microseconds
    local cpu_period=100000  # 100ms in microseconds
    local cpu_max=$(echo "$cpu_limit * $cpu_period" | bc | cut -d. -f1)
    echo "$cpu_max $cpu_period" > "$CGROUP_PATH/cpu.max"
    
    # Memory limit
    echo "$memory_limit" > "$CGROUP_PATH/memory.max"
    echo "50M" > "$CGROUP_PATH/memory.swap.max"  # Limit swap usage
    
    # PID limit
    echo "$pid_limit" > "$CGROUP_PATH/pids.max"
    
    # IO limits (example: limit to 1MB/s write on sda)
    # Get device major:minor for sda
    if [[ -b /dev/sda ]]; then
        local device_info=$(lsblk -nd -o MAJ:MIN /dev/sda)
        echo "$device_info wbps=1048576" > "$CGROUP_PATH/io.max"
    fi
    
    # CPU weight (for sharing)
    echo "100" > "$CGROUP_PATH/cpu.weight"
    
    log "SUCCESS" "Resource limits configured"
}

add_process_to_cgroup() {
    local pid="${1:-$$}"
    
    log "INFO" "Adding process $pid to cgroup"
    echo "$pid" > "$CGROUP_PATH/cgroup.procs"
    
    # Verify
    if grep -q "^$pid$" "$CGROUP_PATH/cgroup.procs"; then
        log "SUCCESS" "Process $pid added to cgroup"
    else
        log "ERROR" "Failed to add process to cgroup"
    fi
}

run_in_cgroup() {
    local command="${1:-/bin/bash}"
    
    log "INFO" "Running command in cgroup: $command"
    
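    # Fork a subshell and move it to the cgroup before exec.
    # $BASHPID is the subshell's own PID; $$ would still be the parent script.
    (
        echo "$BASHPID" > "$CGROUP_PATH/cgroup.procs"
        exec $command
    ) &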
    
    local child_pid=$!
    log "SUCCESS" "Started process $child_pid in cgroup"
    echo "$child_pid"
}

monitor_cgroup() {
    log "INFO" "Monitoring cgroup $CGROUP_NAME..."
    echo "=== Cgroup Statistics ==="
    
    while true; do
        clear
        echo "Cgroup: $CGROUP_NAME"
        echo "======================"
        
        # CPU
        echo -e "\n${BLUE}CPU Usage:${NC}"
        cat "$CGROUP_PATH/cpu.stat" 2>/dev/null || echo "N/A"
        
        # Memory
        echo -e "\n${GREEN}Memory Usage:${NC}"
        local mem_current=$(cat "$CGROUP_PATH/memory.current" 2>/dev/null || echo "0")
        local mem_max=$(cat "$CGROUP_PATH/memory.max" 2>/dev/null || echo "max")
        echo "Current: $(numfmt --to=iec $mem_current)"
        echo "Limit: $mem_max"
        
        # Memory events
        if [[ -f "$CGROUP_PATH/memory.events" ]]; then
            echo -e "\nMemory Events:"
            grep -E "(oom|pressure)" "$CGROUP_PATH/memory.events" || true
        fi
        
        # PIDs
        echo -e "\n${YELLOW}Process Count:${NC}"
        local pids_current=$(wc -l < "$CGROUP_PATH/cgroup.procs" 2>/dev/null || echo "0")
        local pids_max=$(cat "$CGROUP_PATH/pids.max" 2>/dev/null || echo "max")
        echo "Current: $pids_current"
        echo "Limit: $pids_max"
        
        # IO
        if [[ -f "$CGROUP_PATH/io.stat" ]]; then
            echo -e "\n${RED}IO Statistics:${NC}"
            cat "$CGROUP_PATH/io.stat" 2>/dev/null || true
        fi
        
        echo -e "\nPress Ctrl+C to exit..."
        sleep 2
    done
}

cleanup() {
    log "INFO" "Cleaning up cgroup: $CGROUP_NAME"
    
    # Kill all processes in cgroup
    if [[ -f "$CGROUP_PATH/cgroup.procs" ]]; then
        while read -r pid; do
            kill -9 "$pid" 2>/dev/null || true
        done < "$CGROUP_PATH/cgroup.procs"
    fi
    
    # Remove cgroup
    rmdir "$CGROUP_PATH" 2>/dev/null && log "SUCCESS" "Cgroup removed" || log "WARNING" "Could not remove cgroup (may not be empty)"
}

show_usage() {
    echo "Usage: $0 [command]"
    echo "Commands:"
    echo "  create       - Create cgroup with default limits"
    echo "  limits       - Set resource limits"
    echo "  add <pid>    - Add process to cgroup"
    echo "  run <cmd>    - Run command in cgroup"
    echo "  monitor      - Monitor cgroup statistics"
    echo "  cleanup      - Remove cgroup"
    echo "  help         - Show this help"
}

main() {
    local command="${1:-help}"
    
    case "$command" in
        "create")
            check_cgroup_v2
            create_cgroup
            set_resource_limits
            ;;
        "limits")
            set_resource_limits "${2:-}" "${3:-}" "${4:-}"
            ;;
        "add")
            add_process_to_cgroup "${2:-}"
            ;;
        "run")
            run_in_cgroup "${2:-}"
            ;;
        "monitor")
            monitor_cgroup
            ;;
        "cleanup")
            cleanup
            ;;
        "help"|*)
            show_usage
            ;;
    esac
}

# Run main function
main "$@"
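The monitor function greps memory.events, a file of "key value" lines such as oom and oom_kill counters. When a script needs one counter as a number (for alerting, say), a small reader like the sketch below may be easier to work with. The function name and the sample data are assumptions for illustration.

```shell
# Print the value of one counter from memory.events-style "key value"
# lines read on stdin; prints 0 when the key is absent.
memory_event_count() {
    key=$1
    while read -r name value; do
        if [ "$name" = "$key" ]; then
            echo "$value"
            return 0
        fi
    done
    echo 0
}

# Sample content in the format of memory.events (values are made up).
sample_events='low 0
high 12
max 3
oom 1
oom_kill 1'

printf '%s\n' "$sample_events" | memory_event_count oom_kill   # prints: 1
printf '%s\n' "$sample_events" | memory_event_count high       # prints: 12
```

Against a real cgroup the input would be redirected from the file, for example memory_event_count oom_kill < /sys/fs/cgroup/mycontainer/memory.events.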

4. Building a Container from Scratch

Minimal Container Implementation

1 Container Setup Script

#!/bin/bash
# minimal-container.sh - Build a container from scratch using namespaces and cgroups

set -euo pipefail

CONTAINER_NAME="mycontainer"
CONTAINER_ID=$(uuidgen | cut -d- -f1)
CONTAINER_ROOT="/var/containers/$CONTAINER_ID"
CONTAINER_CGROUP="containers/$CONTAINER_ID"

# Container image (busybox)
IMAGE_URL="https://busybox.net/downloads/binaries/1.35.0-x86_64-linux-musl/busybox"
IMAGE_FILE="/tmp/busybox"

# Colors
RED='\033[0;31m'
GREEN='\033[0;32m'
YELLOW='\033[1;33m'
NC='\033[0m'

log() {
    echo -e "${GREEN}[+]${NC} $1"
}

error() {
    echo -e "${RED}[!]${NC} $1" >&2
    exit 1
}

# Download busybox if not present
download_busybox() {
    if [[ ! -f "$IMAGE_FILE" ]]; then
        log "Downloading busybox..."
        curl -s -o "$IMAGE_FILE" "$IMAGE_URL" || error "Failed to download busybox"
        chmod +x "$IMAGE_FILE"
    fi
}

# Setup container root filesystem
setup_rootfs() {
    log "Setting up container root filesystem..."
    
    mkdir -p "$CONTAINER_ROOT"
    mkdir -p "$CONTAINER_ROOT/bin"
    mkdir -p "$CONTAINER_ROOT/etc"
    mkdir -p "$CONTAINER_ROOT/proc"
    mkdir -p "$CONTAINER_ROOT/sys"
    mkdir -p "$CONTAINER_ROOT/tmp"
    mkdir -p "$CONTAINER_ROOT/dev"
    mkdir -p "$CONTAINER_ROOT/home"
    
    # Copy busybox
    cp "$IMAGE_FILE" "$CONTAINER_ROOT/bin/busybox"
    
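    # Create symlinks for busybox applets
    "$CONTAINER_ROOT/bin/busybox" --list | while read -r applet; do
        # Never replace the binary itself with a link, in case the build lists a busybox applet
        [[ "$applet" == "busybox" ]] && continue
        ln -sf /bin/busybox "$CONTAINER_ROOT/bin/$applet"
    done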
    
    # Create minimal /etc files
    cat > "$CONTAINER_ROOT/etc/passwd" << EOF
root:x:0:0:root:/root:/bin/sh
EOF
    
    cat > "$CONTAINER_ROOT/etc/group" << EOF
root:x:0:
EOF
    
    cat > "$CONTAINER_ROOT/etc/hostname" << EOF
$CONTAINER_NAME
EOF
    
    cat > "$CONTAINER_ROOT/etc/hosts" << EOF
127.0.0.1   localhost $CONTAINER_NAME
::1         localhost ip6-localhost ip6-loopback
EOF
    
    # Create container init script
    cat > "$CONTAINER_ROOT/init.sh" << 'EOF'
#!/bin/sh
# Container init script

# Mount proc and sys
mount -t proc proc /proc
mount -t sysfs sysfs /sys
mount -t tmpfs tmpfs /tmp
mount -t devtmpfs devtmpfs /dev

# Set hostname
hostname $(cat /etc/hostname)

# Drop to shell
echo "Container started as PID $$"
exec /bin/sh
EOF
    
    chmod +x "$CONTAINER_ROOT/init.sh"
}

# Create cgroup for container
setup_cgroup() {
    log "Setting up cgroup..."
    
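    CGROUP_PATH="/sys/fs/cgroup/$CONTAINER_CGROUP"
    mkdir -p "$CGROUP_PATH"
    
    # Enable controllers at each parent level; without this the limit files below do not exist
    echo "+cpu +memory +pids" > /sys/fs/cgroup/cgroup.subtree_control
    echo "+cpu +memory +pids" > "$(dirname "$CGROUP_PATH")/cgroup.subtree_control"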
    
    # Set resource limits
    echo "50000 100000" > "$CGROUP_PATH/cpu.max"      # 50% CPU
    echo "100M" > "$CGROUP_PATH/memory.max"           # 100MB memory
    echo "50M" > "$CGROUP_PATH/memory.swap.max"       # 50MB swap
    echo "100" > "$CGROUP_PATH/pids.max"              # 100 processes max
}

# Start container
start_container() {
    log "Starting container $CONTAINER_NAME..."
    
    # Create network namespace
    ip netns add "$CONTAINER_NAME" 2>/dev/null || true
    
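    # Start container process with namespaces.
    # The subshell joins the cgroup first so that unshare and the forked init inherit it.
    # nsenter joins the network namespace created above (unshare --net=FILE would create a new one).
    (
        echo "$BASHPID" > "/sys/fs/cgroup/$CONTAINER_CGROUP/cgroup.procs"
        exec nsenter --net="/var/run/netns/$CONTAINER_NAME" \
            unshare \
            --pid --fork \
            --mount \
            --uts \
            --ipc \
            --user --map-root-user \
            --cgroup \
            chroot "$CONTAINER_ROOT" /init.sh
    ) &
    
    # $! is the PID of the unshare wrapper; the container's PID 1 is its child
    CONTAINER_PID=$!
    log "Container started with PID: $CONTAINER_PID"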
    
    # Setup network in namespace
    setup_network
    
    log "Container is running!"
    log "Attach with: nsenter -t $CONTAINER_PID -a"
}

# Setup container network
setup_network() {
    log "Setting up container network..."
    
    # Create veth pair
    ip link add veth0 type veth peer name veth1
    
    # Move veth1 to container namespace
    ip link set veth1 netns "$CONTAINER_NAME"
    
    # Configure host side
    ip addr add 10.0.0.1/24 dev veth0
    ip link set veth0 up
    
    # Configure container side
    ip netns exec "$CONTAINER_NAME" ip addr add 10.0.0.2/24 dev veth1
    ip netns exec "$CONTAINER_NAME" ip link set veth1 up
    ip netns exec "$CONTAINER_NAME" ip link set lo up
    ip netns exec "$CONTAINER_NAME" ip route add default via 10.0.0.1
    
    # Enable NAT
    iptables -t nat -A POSTROUTING -s 10.0.0.0/24 -j MASQUERADE
    iptables -A FORWARD -i veth0 -j ACCEPT
    iptables -A FORWARD -o veth0 -j ACCEPT
    
    # Enable IP forwarding
    echo 1 > /proc/sys/net/ipv4/ip_forward
}

# Stop container
stop_container() {
    log "Stopping container..."
    
    # Kill container process
    kill -9 "$CONTAINER_PID" 2>/dev/null || true
    
    # Cleanup network
    ip netns del "$CONTAINER_NAME" 2>/dev/null || true
    ip link del veth0 2>/dev/null || true
    
    # Cleanup cgroup
    rmdir "/sys/fs/cgroup/$CONTAINER_CGROUP" 2>/dev/null || true
    
    log "Container stopped"
}

# Cleanup container filesystem
cleanup() {
    log "Cleaning up..."
    rm -rf "$CONTAINER_ROOT"
}

# Main execution
main() {
    download_busybox
    setup_rootfs
    setup_cgroup
    start_container
    
    # Wait for user input
    echo "Press Enter to stop container..."
    read
    
    stop_container
    cleanup
}

# Run main function
main "$@"
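The script above hard-codes veth0/veth1 and the 10.0.0.0/24 subnet, so a second container would collide with the first. One way to reason about per-container wiring is to generate the command list from parameters and review it before running anything. The sketch below only prints the commands; the function name, the interface naming scheme, and the sample arguments are all assumptions.

```shell
# Print (without executing) the ip commands that wire a network namespace
# to the host. Arguments: namespace name, first three octets of a /24
# subnet, and a short suffix used to make the interface names unique.
print_veth_plan() {
    ns=$1; prefix=$2; suffix=$3
    host_if="veth${suffix}h"
    cont_if="veth${suffix}c"
    echo "ip link add $host_if type veth peer name $cont_if"
    echo "ip link set $cont_if netns $ns"
    echo "ip addr add $prefix.1/24 dev $host_if"
    echo "ip link set $host_if up"
    echo "ip netns exec $ns ip addr add $prefix.2/24 dev $cont_if"
    echo "ip netns exec $ns ip link set $cont_if up"
    echo "ip netns exec $ns ip link set lo up"
    echo "ip netns exec $ns ip route add default via $prefix.1"
}

print_veth_plan mycontainer 10.0.1 1
```

Once the printed plan looks right, it can be piped to a root shell. Interface names are limited to 15 characters, so the suffix should stay short.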

2 Container Lifecycle Management

#!/bin/bash
# container-manager.sh - Complete container lifecycle management

CONTAINERS_DIR="/var/containers"
CGROUP_ROOT="/sys/fs/cgroup"

list_containers() {
    echo "Running containers:"
    echo "=================="
    
    # Show only processes whose PID namespace differs from the host's (PID 1); needs root
    local host_ns
    host_ns=$(readlink /proc/1/ns/pid)
    for pid in /proc/[0-9]*; do
        ns=$(readlink "$pid/ns/pid" 2>/dev/null) || continue
        if [[ "$ns" != "$host_ns" ]]; then
            ns_id=${ns#*\[}
            ns_id=${ns_id%\]}
            echo "PID: $(basename "$pid") | Namespace: $ns_id"
        fi
    done
    
    echo ""
    echo "Cgroups:"
    echo "========"
    # minimal-container.sh creates its cgroups under containers/<id>
    find "$CGROUP_ROOT/containers" -mindepth 1 -type d 2>/dev/null | while read -r cg; do
        echo "Cgroup: $(basename "$cg")"
        [[ -f "$cg/cgroup.procs" ]] && echo "  PIDs: $(tr '\n' ' ' < "$cg/cgroup.procs")"
    done
}

inspect_container() {
    local container_pid="$1"
    
    if [[ ! -d "/proc/$container_pid" ]]; then
        echo "Error: Process $container_pid does not exist"
        return 1
    fi
    
    echo "=== Container Inspection: PID $container_pid ==="
    echo ""
    
    # Namespaces
    echo "Namespaces:"
    ls -la "/proc/$container_pid/ns/" | tail -n +2
    echo ""
    
    # Cgroup
    echo "Cgroup:"
    cat "/proc/$container_pid/cgroup"
    echo ""
    
    # Process tree
    echo "Process Tree:"
    pstree -p "$container_pid"
    echo ""
    
    # Resource usage
    echo "Resource Usage:"
    echo "CPU: $(ps -p $container_pid -o %cpu --no-headers)%"
    echo "Memory: $(ps -p $container_pid -o %mem --no-headers)%"
    echo ""
    
    # Network
    echo "Network Namespace:"
    ip netns identify "$container_pid" 2>/dev/null || echo "Not in network namespace"
}

container_stats() {
    local container_pid="$1"
    local cgroup_path=$(grep -o "containers/.*" "/proc/$container_pid/cgroup")
    
    if [[ -z "$cgroup_path" ]]; then
        echo "No cgroup found for container"
        return 1
    fi
    
    echo "=== Container Statistics ==="
    echo ""
    
    # CPU
    echo "CPU:"
    cat "/sys/fs/cgroup/$cgroup_path/cpu.stat" 2>/dev/null || echo "N/A"
    echo ""
    
    # Memory
    echo "Memory:"
    echo "Current: $(cat /sys/fs/cgroup/$cgroup_path/memory.current 2>/dev/null || echo 0) bytes"
    echo "Limit: $(cat /sys/fs/cgroup/$cgroup_path/memory.max 2>/dev/null || echo max)"
    echo ""
    
    # IO
    echo "IO:"
    cat "/sys/fs/cgroup/$cgroup_path/io.stat" 2>/dev/null || echo "N/A"
}

# Main menu
case "${1:-list}" in
    "list")
        list_containers
        ;;
    "inspect")
        inspect_container "$2"
        ;;
    "stats")
        container_stats "$2"
        ;;
    "help")
        echo "Usage: $0 [command]"
        echo "Commands:"
        echo "  list                 - List all containers"
        echo "  inspect <pid>        - Inspect specific container"
        echo "  stats <pid>          - Show container statistics"
        echo "  help                 - Show this help"
        ;;
    *)
        echo "Unknown command: $1"
        echo "Use: $0 help"
        ;;
esac
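The stats command depends on finding the container's cgroup path in /proc/PID/cgroup. On cgroups v2 that file contains a single entry of the form 0::path, and on hybrid systems it sits alongside numbered v1 entries. A minimal sketch of extracting just the v2 path follows; the function name and sample paths are hypothetical.

```shell
# Extract the cgroup v2 path from /proc/<pid>/cgroup content on stdin.
# The v2 entry always has the form "0::<path>".
cgroup_v2_path() {
    while IFS= read -r line; do
        case "$line" in
            0::*) echo "${line#0::}"; return 0 ;;
        esac
    done
    return 1
}

printf '0::/containers/3f2a9c1e\n' | cgroup_v2_path   # prints: /containers/3f2a9c1e
```

The result can be appended to /sys/fs/cgroup to locate files such as cpu.stat and memory.current for that process.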

5. Security Considerations

Container Security Best Practices:
1. Use user namespaces: Enable rootless containers with UID mapping
2. Limit capabilities: Drop unnecessary Linux capabilities
3. Apply seccomp profiles: Restrict system calls
4. Use AppArmor/SELinux: Mandatory access controls
5. Set resource limits: Prevent DoS with cgroups
6. Read-only root filesystem: Mount rootfs as read-only
7. Use no-new-privileges: Prevent privilege escalation
8. Network isolation: Use private network namespaces
9. Regular updates: Keep kernel and container tools updated
10. Audit and monitor: Log container activities

Security Hardening Examples

# Drop Linux capabilities

capsh --drop=cap_net_raw,cap_sys_admin -- -c "/bin/bash"

# Keep only: CAP_CHOWN, CAP_DAC_OVERRIDE, CAP_FOWNER, CAP_SETGID, CAP_SETUID, CAP_SETPCAP, CAP_NET_BIND_SERVICE

# Apply seccomp filter

wget https://raw.githubusercontent.com/moby/moby/master/profiles/seccomp/default.json

docker run --security-opt seccomp=default.json nginx

# Run with no-new-privileges

docker run --security-opt no-new-privileges nginx

# Read-only root filesystem

docker run --read-only nginx

# With writable tmpfs for /tmp

docker run --read-only --tmpfs /tmp nginx

# User namespace mapping (written as root from the parent namespace; PID is a process inside the new user namespace)

echo "0 100000 65536" > /proc/PID/uid_map

echo "0 100000 65536" > /proc/PID/gid_map

# AppArmor profile

docker run --security-opt apparmor=docker-default nginx

# SELinux context

docker run --security-opt label=type:svirt_lxc_net_t nginx

# Combine security options

docker run \
  --read-only \
  --tmpfs /tmp \
  --security-opt no-new-privileges \
  --security-opt seccomp=default.json \
  --cap-drop ALL \
  --cap-add NET_BIND_SERVICE \
  nginx
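To confirm that capabilities were actually dropped, the CapEff line in /proc/PID/status can be inspected. It is a hexadecimal bitmask in which bit N corresponds to capability number N (CAP_NET_BIND_SERVICE is 10, CAP_NET_RAW is 13, CAP_SYS_ADMIN is 21). The sketch below tests single bits; the function name is hypothetical and the mask is sample input, not output captured from a real container.

```shell
# Succeed if capability number $2 is set in the hex mask $1
# (the format of the CapEff line in /proc/<pid>/status).
has_cap() {
    mask=$((0x$1))
    [ $(( (mask >> $2) & 1 )) -eq 1 ]
}

sample_mask=00000000a80425fb    # sample input only

if has_cap "$sample_mask" 13; then
    echo "CAP_NET_RAW present"
fi
if ! has_cap "$sample_mask" 21; then
    echo "CAP_SYS_ADMIN absent"
fi
```

On a live system the mask would come from something like grep CapEff /proc/PID/status, and capsh --decode=MASK gives the full list of names.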

6. Performance Monitoring & Troubleshooting

Monitoring Tools and Commands

Tool	Purpose	Command	Output
`lsns`	List namespaces	`lsns -p $$`	Namespace IDs and types
`nsenter`	Enter namespace	`nsenter -t PID -n ip addr`	Network config in namespace
`systemd-cgtop`	Cgroup monitoring	`systemd-cgtop`	Real-time cgroup resource usage
`bpftrace`	Tracing	`bpftrace -e 'tracepoint:syscalls:sys_enter_* { @[probe] = count(); }'`	System call tracing
`perf`	Performance analysis	`perf stat -a sleep 1`	System performance counters
`cat /proc/$$/status`	Process status	`grep -i NSpid /proc/$$/status`	Namespace PID mapping
`findmnt`	Mount info	`findmnt --kernel`	Kernel mount information

Troubleshooting Common Issues

Container can't start: Check the kernel version with uname -r and namespace support with grep -E 'NAMESPACES|_NS=' /boot/config-$(uname -r)

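Permission denied: Ensure unprivileged user namespaces are enabled. On kernels that carry this sysctl (for example Debian and Ubuntu): sysctl kernel.unprivileged_userns_clone=1. Elsewhere, check that sysctl user.max_user_namespaces is greater than 0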

Out of memory: Check cgroup memory limits and adjust memory.max

Network issues: Verify network namespace and veth pairs with ip netns list

Mount failures: Check mount namespace and propagation flags

PID 1 issues: Ensure proper signal handling in container init process

Performance problems: Monitor with systemd-cgtop and adjust cgroup limits

Security violations: Check audit logs and adjust seccomp/capabilities

Master Container Isolation Fundamentals

Linux namespaces and cgroups are the foundational technologies that enable modern containerization. Namespaces provide isolation (process, network, filesystem, etc.) while cgroups provide resource control (CPU, memory, I/O limits). Together, they create the secure, efficient environments that power Docker, Kubernetes, and cloud-native applications.

Key Takeaways: Understand each namespace type and its isolation purpose. Master cgroups v2 for resource management. Combine namespaces and cgroups to build containers from scratch. Implement security best practices with capabilities, seccomp, and user namespaces. Monitor and troubleshoot container performance effectively.

Next Steps: Experiment with manual namespace creation using unshare and nsenter. Build a minimal container from scratch. Explore Docker internals to see how these technologies are used in production. Study Kubernetes pod isolation to understand multi-container orchestration. Continuously monitor container security and performance in your environments.