GitHub Actions Troubleshooting
GitHub Actions workflows can fail for many reasons. This guide teaches you how to debug failures effectively—reading logs, re-running jobs, using SSH debugging, and fixing common issues.
When a GitHub Actions workflow fails, the first step is understanding what failed and why. Workflow runs can fail at different levels: the entire workflow might be misconfigured, a job might fail to start, or a specific step within a job might fail. Each type of failure requires a different debugging approach.
GitHub provides a detailed run log for every workflow execution. The log shows each step, its output, and any error messages. The log also indicates which step failed, making it easy to pinpoint the problem. Failed steps are marked with a red X, and you can expand them to see the full error output.
To view workflow logs, go to the Actions tab in your repository, click on the failed workflow run, then click on the failed job. Each step in the job can be expanded to show its output. The log shows the exact commands that were executed and their results.
When analyzing logs, look for error messages first. They're usually highlighted in red. Common error messages include "command not found," "permission denied," "connection refused," and "Process completed with exit code 1." Every log line is numbered, so you can note or link to the exact line where the failing command reported its error.
You can also search within logs using your browser's find feature (Ctrl+F or Cmd+F). This is helpful for finding specific error patterns or locating where a particular command ran. For long logs, consider downloading the raw log (using the "Download log archive" button) and searching locally.
# Example log error message
Run npm test
> my-app@1.0.0 test
> jest
FAIL src/app.test.js
● Login test › should authenticate user
TypeError: Cannot read property 'token' of undefined
Error: Process completed with exit code 1.
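Once a log has been downloaded, the search for common error patterns can be scripted. The following is a minimal sketch, not a GitHub tool: `find_errors` is a hypothetical helper name, the pattern list is an assumption based on the messages mentioned above, and the sample log is modeled on the excerpt above.

```shell
#!/bin/sh
# find_errors is a hypothetical helper: it prints matching lines, prefixed
# with their line numbers, for a few common failure patterns.
find_errors() {
  grep -n -i -E 'error|command not found|permission denied|connection refused|exit code [1-9]' "$1"
}

# Build a small sample log for demonstration purposes.
sample_log="$(mktemp)"
cat > "$sample_log" <<'EOF'
Run npm test
> my-app@1.0.0 test
> jest
FAIL src/app.test.js
TypeError: Cannot read property 'token' of undefined
Error: Process completed with exit code 1.
EOF

find_errors "$sample_log"
```

If the GitHub CLI is installed, `gh run view <run-id> --log-failed` prints only the logs of failed steps, and its output can be piped through the same kind of filter.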
GitHub allows you to re-run an entire workflow run, only its failed jobs, or a single job. Because a re-run uses the same commit as the original run, it is most useful for flaky failures or after you've fixed something outside the code, such as a secret, a runner, or an external service.
To re-run a failed workflow, go to the Actions tab, click on the failed run, then open the "Re-run jobs" menu. You can choose to re-run all jobs or only the failed ones. Re-running uses the same code version as the original run; it doesn't pull new commits, so a fix in the code itself still needs a new push.
You can also re-run from a specific failed job. This is helpful when a workflow has multiple jobs and only one failed. Re-running just that job saves time and resources compared to re-running the entire workflow.
# Via GitHub CLI
$ gh run rerun <run-id>
# Re-run only failed jobs
$ gh run rerun <run-id> --failed
# Re-run a specific job (the job ID alone identifies it; no run ID is needed)
$ gh run rerun --job <job-id>
For complex failures that are hard to debug from logs alone, you can open an SSH session on the runner. GitHub Actions has no built-in SSH access, but a third-party action such as `mxschmitt/action-tmate` pauses the job and lets you connect via SSH to inspect the environment, run commands, and diagnose issues interactively.
To enable SSH debugging, add the `mxschmitt/action-tmate` action as a step in your workflow. When the workflow reaches that step, it prints an SSH connection string to the log, which you can use to connect. The job stays paused at that step until you end the session or the job times out, giving you time to investigate. Because anyone who can read the log can see the connection string, be careful when using this on public repositories.
SSH debugging is invaluable for issues like missing files, environment variable problems, or incorrect permissions. You can browse the file system, check installed software, test commands manually, and see exactly what's happening on the runner.
# Add SSH debugging to your workflow
name: Debug Workflow
on: [push]
jobs:
  debug:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Setup SSH Debug
        uses: mxschmitt/action-tmate@v3
      - name: Build
        run: npm run build
YAML syntax errors: The workflow fails to parse. Check indentation (use spaces, not tabs), quotes, and colons. Use a YAML validator online or in your editor.
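Secret not found: The workflow references a secret that doesn't exist. An undefined secret evaluates to an empty string, so the failure usually shows up later as an authentication error in the step that uses it. Verify the secret name in repository Settings → Secrets and variables → Actions, and remember that secrets are not passed to workflows triggered by pull requests from forks.
Command not found: The command isn't installed on the runner. Install it using `apt-get` or `npm install -g`, or use a setup action such as `actions/setup-node`.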
Permission denied: The runner doesn't have permissions to run a command or access a file. Use `chmod +x` for scripts, or check file ownership.
Network timeout: The runner can't reach an external service. Check firewall rules, or use retry logic in your workflow.
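Disk space full: The runner ran out of space. Remove preinstalled toolchains you don't need, prune unused Docker data, or clean up temporary files. Note that `actions/cache` speeds up dependency installation but does not free disk space.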
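# Add retry logic for network operations
- name: Download with retry
  run: |
    for i in 1 2 3; do
      # -f makes curl return a non-zero status on HTTP errors
      curl -fsSLO https://example.com/file.tar.gz && exit 0
      echo "Attempt $i failed; retrying in 5 seconds" >&2
      sleep 5
    done
    exit 1  # Fail the step if every attempt failed
# Clean up disk space
- name: Free disk space
  run: |
    sudo rm -rf /usr/share/dotnet
    sudo rm -rf /opt/ghc
    sudo docker system prune -af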
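The "command not found" and "secret not found" cases listed above can be caught early with a preflight check at the start of a step's script. This is a minimal sketch under stated assumptions: `require_cmd` and `require_env` are hypothetical helper names, and the secret is assumed to reach the script as an environment variable set through the step's `env` mapping.

```shell
#!/bin/sh
# Hypothetical preflight helpers; the names are illustrative only.

# Succeeds if the named command is on PATH, otherwise prints a message and fails.
require_cmd() {
  command -v "$1" >/dev/null 2>&1 || {
    echo "missing command: $1" >&2
    return 1
  }
}

# Succeeds if the named variable is set and non-empty.
# An undefined secret arrives as an empty string, so emptiness is checked too.
require_env() {
  eval "val=\${$1:-}"
  [ -n "$val" ] || {
    echo "missing or empty variable: $1" >&2
    return 1
  }
}

# Example: verify a tool that every POSIX system provides.
require_cmd sh && echo "sh is available"
```

In a workflow step, calls such as `require_cmd node` or `require_env API_TOKEN` (a hypothetical variable name) before the real work make the log state the actual cause instead of a later, less obvious error.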
When debugging, it's often helpful to trigger workflows on demand rather than waiting for a push or pull request. The `workflow_dispatch` event allows you to manually trigger workflows from the GitHub UI.
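Add `workflow_dispatch` to your workflow's `on` section. You can also define inputs to pass parameters when triggering. This is well suited to running a debug workflow on a specific branch without pushing a new commit just to start a run. Note that the "Run workflow" button appears only once the workflow file exists on the default branch.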
# Add manual trigger with debug inputs
on:
  workflow_dispatch:
    inputs:
      debug_enabled:
        type: boolean
        description: 'Enable SSH debug'
        default: false
jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - name: Setup SSH if debug enabled
        if: ${{ github.event.inputs.debug_enabled == 'true' }}
        uses: mxschmitt/action-tmate@v3
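A manually triggered workflow can also be started from the command line with the GitHub CLI. The sketch below wraps `gh workflow run` in a hypothetical helper, `trigger_debug_run`. The workflow file name `ci.yml` and the branch name are assumptions for illustration, and the `DRY_RUN` switch is not a `gh` feature; it only prints the command so the helper can be tried without contacting GitHub.

```shell
#!/bin/sh
# trigger_debug_run WORKFLOW REF
# Hypothetical helper: dispatches WORKFLOW on REF with debug_enabled=true.
# With DRY_RUN=1 it prints the command instead of running it.
trigger_debug_run() {
  workflow="$1"
  ref="$2"
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "gh workflow run $workflow --ref $ref -f debug_enabled=true"
  else
    gh workflow run "$workflow" --ref "$ref" -f debug_enabled=true
  fi
}

# Print the command that would be run for a hypothetical workflow and branch.
DRY_RUN=1 trigger_debug_run ci.yml my-fix-branch
```

The `-f` flag passes an input by name, so `debug_enabled` here must match the input declared under `workflow_dispatch`.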
Matrix builds run the same job across multiple combinations. When a matrix job fails, only some combinations may fail. The log shows which matrix combination failed, including the specific values for each variable.
To debug matrix failures, first identify which combination failed. Then reproduce that combination locally or in a dedicated debug workflow. Set `fail-fast: false` so the remaining matrix jobs keep running after one fails, which lets you see all failures at once. Use `continue-on-error` for combinations that are allowed to fail, so that their failure does not fail the whole workflow run.
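# Matrix with fail-fast disabled and continue-on-error
jobs:
  test:
    runs-on: ${{ matrix.os }}
    strategy:
      fail-fast: false  # Don't cancel other jobs on failure
      matrix:
        node: [16, 18, 20]
        os: [ubuntu-latest, windows-latest]
    continue-on-error: ${{ matrix.node == 16 }}  # A Node 16 failure doesn't fail the run
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: ${{ matrix.node }}
      - run: npm test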
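To reproduce a failing combination, you first need its matrix values. The sketch below assumes the default job naming, where a matrix job is shown as the job ID followed by its values in parentheses, for example `test (16, ubuntu-latest)`. `matrix_values` is a hypothetical helper name, and it will not work for jobs that set a custom `name`.

```shell
#!/bin/sh
# matrix_values JOB_NAME
# Hypothetical helper: extracts the values between the parentheses of a
# default matrix job name and prints them separated by spaces.
matrix_values() {
  printf '%s\n' "$1" | sed -n 's/^[^(]*(\(.*\))$/\1/p' | tr -d ','
}

# Example with the first combination from the matrix above.
matrix_values 'test (16, ubuntu-latest)'
```

The extracted values can then be fed into a local test run or passed as inputs to a dedicated debug workflow.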
Self-hosted runners add complexity to debugging. Common issues include missing dependencies, incorrect environment variables, and permission problems. The runner logs on the machine itself provide additional debugging information.
Check the runner's diagnostic logs in the `_diag` directory. These logs show the runner's internal operations, including job pickup, environment setup, and step execution. You can also run the runner in interactive mode to see real-time output.
# Run runner interactively for debugging
$ ./run.sh --once # Run one job and exit
# Check runner logs
$ cat ~/actions-runner/_diag/*.log
# Enable runner diagnostic logging for subsequent workflow runs
# (set the repository secret ACTIONS_RUNNER_DEBUG to true)
$ gh secret set ACTIONS_RUNNER_DEBUG --body true
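The `_diag` directory accumulates many files, so it helps to jump straight to the most recent one. The sketch below assumes that per-job logs are named `Worker_<timestamp>-utc.log`. `latest_worker_log` is a hypothetical helper name, and the demo directory stands in for a real `_diag` directory.

```shell
#!/bin/sh
# latest_worker_log DIAG_DIR
# Hypothetical helper: prints the most recently modified Worker_*.log file.
latest_worker_log() {
  # ls -t sorts by modification time, newest first.
  ls -t "$1"/Worker_*.log 2>/dev/null | head -n 1
}

# Demo directory with two empty log files and explicit modification times.
demo_diag="$(mktemp -d)"
touch -t 202401010000 "$demo_diag/Worker_20240101-000000-utc.log"
touch -t 202401020000 "$demo_diag/Worker_20240102-000000-utc.log"

latest_worker_log "$demo_diag"
```

On a real runner, point the helper at the `_diag` directory of the installation (for example `~/actions-runner/_diag`) and pass the result to `grep` or `less`.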
Debugging is a skill. With practice, you'll identify the root cause of most workflow failures much faster. Use logs, re-runs, and SSH debugging to investigate systematically.