Code Scanning with CodeQL
CodeQL is GitHub's powerful static analysis engine that finds security vulnerabilities in your code. It treats code as data, letting you write queries to identify patterns that lead to bugs and security issues—all integrated into your CI/CD pipeline.
Unlike static analysis tools that rely on textual pattern matching, CodeQL models the structure of your code: its syntax, data flow, and control flow. This lets it find complex vulnerabilities that simple pattern matching would miss.
At its core, CodeQL works by extracting a relational database from your codebase. This database records the code's syntax trees, data flow paths, and control flow graphs. You then query it with QL, a declarative language that resembles SQL. Queries can find anything from simple SQL injection vulnerabilities to logic bugs that span multiple functions and files.
GitHub provides hundreds of pre-written security queries covering the most common vulnerability types: SQL injection, cross-site scripting (XSS), path traversal, command injection, insecure deserialization, and many more. These queries are maintained by GitHub's security research team and are constantly updated as new vulnerability patterns emerge.
CodeQL supports a wide range of programming languages, with different levels of maturity:
Comprehensive support (all security queries available): C/C++, C#, Go, Java, JavaScript/TypeScript, Python, Ruby, and Kotlin.
Beta support: Swift and Rust. These languages are actively being developed and have growing query coverage.
For each supported language, CodeQL understands the language's unique semantics, including type systems, package structures, and common frameworks. This enables it to find framework-specific vulnerabilities—like finding XSS in React applications or SQL injection in Django ORM.
# Supported language identifiers for codeql.yml
language: javascript # JavaScript/TypeScript
language: python # Python
language: java # Java/Kotlin
language: csharp # C#
language: cpp # C/C++
language: go # Go
language: ruby # Ruby
language: swift # Swift (beta)
language: rust # Rust (beta)
CodeQL's approach is fundamentally different from traditional static analysis. First, it builds a database of your code through a process called extraction. During extraction, CodeQL parses your source code and builds a relational database that captures every element—functions, classes, variables, expressions, and their relationships.
Once the database is built, CodeQL runs queries against it. These queries are written in QL, a declarative language that resembles SQL. A query might look for data flows from user input to a dangerous sink (like an SQL query) without passing through a sanitizer. Because CodeQL understands data flow across function boundaries and through objects, it can find vulnerabilities that span hundreds of lines of code.
Finally, CodeQL produces results with precise locations, showing exactly where the vulnerability originates, where it flows, and where it's used unsafely. These results appear as alerts in GitHub's code scanning interface, directly in the code view, and as pull request checks.
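// Example QL query (simplified sketch)
// Find SQL injection vulnerabilities in JavaScript.
// Assumes the standard JavaScript query library; a complete query
// also needs "@kind path-problem" metadata in a header comment.
import javascript
import semmle.javascript.security.dataflow.SqlInjectionQuery
import SqlInjectionFlow::PathGraph
from SqlInjectionFlow::PathNode source, SqlInjectionFlow::PathNode sink
where SqlInjectionFlow::flowPath(source, sink)
select sink.getNode(), source, sink, "This query depends on a user-provided value."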
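The extract-then-query idea described above can be illustrated with a toy model in Python. This is only a sketch of the concept, not how CodeQL is implemented: the edge list, node names, and the `find_unsanitized_paths` helper are all hypothetical stand-ins for the facts a real extractor would store and the query a real QL library would run.

```python
from collections import deque

# Hypothetical "extracted" facts: each pair is one data-flow edge
# (a value flows from the first node to the second).
FLOW_EDGES = [
    ("req.query.id", "user_id"),
    ("user_id", "build_query()"),
    ("build_query()", "db.execute()"),
    ("req.query.name", "escape()"),
    ("escape()", "db.execute()"),
]
SOURCES = {"req.query.id", "req.query.name"}  # untrusted input
SINKS = {"db.execute()"}                      # dangerous operations
SANITIZERS = {"escape()"}                     # nodes that make data safe


def find_unsanitized_paths(edges, sources, sinks, sanitizers):
    """Return every source-to-sink path that avoids all sanitizers."""
    successors = {}
    for src, dst in edges:
        successors.setdefault(src, []).append(dst)

    paths = []
    for source in sorted(sources):
        queue = deque([[source]])
        while queue:
            path = queue.popleft()
            node = path[-1]
            if node in sinks:
                paths.append(path)
                continue
            for nxt in successors.get(node, []):
                # Skip sanitizers and nodes already on the path (avoids cycles).
                if nxt in sanitizers or nxt in path:
                    continue
                queue.append(path + [nxt])
    return paths


if __name__ == "__main__":
    for path in find_unsanitized_paths(FLOW_EDGES, SOURCES, SINKS, SANITIZERS):
        print(" -> ".join(path))
```

In this toy data only the flow from `req.query.id` is reported, because the flow from `req.query.name` passes through the sanitizer. A real analysis answers the same kind of question over a database extracted from actual source code.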
Setting up CodeQL code scanning is straightforward. Go to your repository's Security tab, click "Set up code scanning," and choose CodeQL. Default setup configures scanning without adding a file to your repository. Advanced setup generates a workflow file at .github/workflows/codeql.yml that you can edit.
The default configuration works for most projects. It automatically detects your languages, builds your code where needed, and runs the default security query suite. For compiled languages (C/C++, C#, Java, Go), CodeQL needs to understand your build process. The default configuration uses an autobuild script that works for many common project structures. For complex builds, you can replace autobuild with your own build steps.
You can also configure CodeQL to run only on specific languages, use a custom query suite, or run on a schedule. The workflow runs on every push to the main branch and on every pull request, providing continuous security feedback.
# .github/workflows/codeql.yml
name: "CodeQL"
on:
  push:
    branches: [ "main" ]
  pull_request:
    branches: [ "main" ]
  schedule:
    - cron: '0 2 * * 0' # Weekly on Sunday at 02:00 UTC
jobs:
  analyze:
    name: Analyze
    runs-on: ubuntu-latest
    permissions:
      security-events: write # Needed to upload results to code scanning
      contents: read
    strategy:
      matrix:
        language: [javascript, python, java]
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-java@v4
        with:
          distribution: 'temurin'
          java-version: '17'
      - uses: github/codeql-action/init@v3
        with:
          languages: ${{ matrix.language }}
          queries: security-extended
      - uses: github/codeql-action/autobuild@v3
      - uses: github/codeql-action/analyze@v3
CodeQL organizes security queries into suites. The default suite runs a balanced set of queries that produce few false positives. The extended suite runs more aggressive queries that may produce more results, including potential false positives, but catches more vulnerabilities. You can also create custom query suites for your specific needs.
You can also add custom queries. If your organization has specific security requirements, you can write QL queries and include them in your workflow. These queries can be checked into your repository, making your custom security rules version-controlled and shared with the team.
Results are categorized by severity: error, warning, and note. Errors are the most serious findings and should be fixed first. Warnings are potential issues that deserve investigation. Notes are informational: they might highlight code patterns that are unusual but not necessarily problematic. Alerts from security queries also carry a security severity level (critical, high, medium, or low).
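# Using a custom configuration file
- uses: github/codeql-action/init@v3
  with:
    languages: javascript
    # The query suites are listed in the config file. A `queries` input here
    # would replace the config file's list unless its value starts with "+".
    config-file: .github/codeql/custom-config.yml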
# .github/codeql/custom-config.yml
queries:
  - uses: security-extended
  - uses: ./custom-queries/ # Local custom queries
paths-ignore:
  - '**/test/**'
  - '**/vendor/**'
When CodeQL finds a vulnerability, it appears in the Security tab and as a check on pull requests. Each alert includes a description of the vulnerability, a severity rating, and exact locations in your code. Clicking on an alert shows the data flow path—how untrusted data enters your application, how it flows through the code, and where it reaches a dangerous sink without sanitization.
Understanding this path is crucial for fixing the vulnerability. For example, an SQL injection alert might show that user input from a query parameter flows through a function call and ends up concatenated into an SQL query. The fix is to use parameterized queries instead of string concatenation.
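The difference between the vulnerable pattern and the fix can be shown with a small, self-contained sketch. It uses Python's built-in `sqlite3` module and an in-memory database; the `users` table and the function names are made up for illustration.

```python
import sqlite3


def find_user_unsafe(conn, name):
    # Vulnerable: user input is concatenated into the SQL text.
    query = "SELECT id, name FROM users WHERE name = '" + name + "'"
    return conn.execute(query).fetchall()


def find_user_safe(conn, name):
    # Fixed: the driver binds the value, so it is never parsed as SQL.
    return conn.execute(
        "SELECT id, name FROM users WHERE name = ?", (name,)
    ).fetchall()


def make_demo_db():
    # Hypothetical demo data: an in-memory table with two users.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
    conn.executemany("INSERT INTO users (name) VALUES (?)", [("alice",), ("bob",)])
    return conn


if __name__ == "__main__":
    conn = make_demo_db()
    payload = "nobody' OR '1'='1"
    print(find_user_unsafe(conn, payload))  # the injected OR clause matches every row
    print(find_user_safe(conn, payload))    # no user has this literal name
```

With the concatenated query, the payload rewrites the WHERE clause and returns every row. With the parameterized query, the same payload is treated as a plain string and matches nothing.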
Not every alert requires immediate action. Some may be false positives, where CodeQL couldn't determine that input was safely sanitized. Others may be in test code or legacy code that's being phased out. You can dismiss alerts with a reason (false positive, won't fix, used in tests) and optionally add a comment explaining your decision.
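Dismissals can also be scripted through GitHub's REST API, which updates an alert with a PATCH request to the code scanning alerts endpoint. The sketch below only builds the request and does not send it; the `build_dismiss_request` helper and the example owner, repository, and token values are hypothetical.

```python
import json
import urllib.request

# Dismissal reasons named in the text above.
DISMISSAL_REASONS = {"false positive", "won't fix", "used in tests"}


def build_dismiss_request(owner, repo, alert_number, reason, comment, token):
    """Build (but do not send) a PATCH request that dismisses one alert."""
    if reason not in DISMISSAL_REASONS:
        raise ValueError(f"unsupported dismissal reason: {reason!r}")
    body = {
        "state": "dismissed",
        "dismissed_reason": reason,
        "dismissed_comment": comment,
    }
    url = (
        f"https://api.github.com/repos/{owner}/{repo}"
        f"/code-scanning/alerts/{alert_number}"
    )
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        method="PATCH",
        headers={
            "Accept": "application/vnd.github+json",
            "Authorization": f"Bearer {token}",
        },
    )
```

Passing the returned object to `urllib.request.urlopen` would send it, assuming the token is allowed to write security events for the repository. Rejecting unknown reasons up front keeps every scripted dismissal tied to one of the documented categories.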
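CodeQL integrates with GitHub's pull request workflow. When you enable code scanning, CodeQL runs on every pull request. If it finds new alerts at or above the configured severity, the check fails. By default this happens for alerts with severity error or with security severity critical or high, and you can change the threshold in the repository's code scanning settings. To actually prevent merging, require the check in a branch protection rule or ruleset.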
You can also run CodeQL on a schedule. A weekly scan of the main branch catches vulnerabilities that reached it without a pull request scan, and ones that only newly added security queries can detect in code that was already merged.
Analysis time depends on the language and the size of the codebase, but runs typically take 10 to 30 minutes. The matrix strategy analyzes multiple languages in parallel, reducing total time.
# Enforce code scanning on pull requests
# 1. In branch protection rules (or a ruleset), require the "CodeQL" status check.
# 2. Set the severity threshold that fails the check in the repository's
#    code scanning settings; the analyze step has no input for it.
- uses: github/codeql-action/analyze@v3
  with:
    category: "/language:${{ matrix.language }}"
Enable the extended query suite for critical applications. It catches more vulnerabilities, which is especially useful for security-sensitive codebases.
Run on every pull request. Catching vulnerabilities before they reach main is much cheaper than fixing them later.
Review alerts promptly. Don't let alerts accumulate. Set a goal to review all new alerts within 24-48 hours.
Document dismissals. When you dismiss an alert, add a comment explaining why. This helps future reviewers understand the reasoning.
Build your own custom queries. If your application has domain-specific security patterns, writing custom QL queries can catch bugs unique to your codebase.
Use path filters. Exclude test files, generated code, and third-party libraries from analysis to reduce noise and speed up analysis.
Combine with other security tools. CodeQL is powerful, but no tool catches everything. Combine it with dependency scanning, secret scanning, and manual reviews.
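The "review alerts promptly" practice can be checked automatically. The sketch below assumes alert records shaped like those returned by GitHub's code scanning alerts API, using only the `number`, `state`, and `created_at` fields. The `overdue_alerts` helper and the sample data are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def overdue_alerts(alerts, now, max_age=timedelta(hours=48)):
    """Return the numbers of open alerts created more than max_age before now."""
    overdue = []
    for alert in alerts:
        if alert["state"] != "open":
            continue
        # Timestamps look like "2024-05-01T09:00:00Z"; normalize the trailing Z.
        created = datetime.fromisoformat(alert["created_at"].replace("Z", "+00:00"))
        if now - created > max_age:
            overdue.append(alert["number"])
    return overdue


# Hypothetical sample records for illustration.
SAMPLE_ALERTS = [
    {"number": 1, "state": "open", "created_at": "2024-05-01T09:00:00Z"},
    {"number": 2, "state": "open", "created_at": "2024-05-03T08:00:00Z"},
    {"number": 3, "state": "dismissed", "created_at": "2024-04-20T12:00:00Z"},
]

if __name__ == "__main__":
    now = datetime(2024, 5, 3, 12, 0, tzinfo=timezone.utc)
    print(overdue_alerts(SAMPLE_ALERTS, now))
```

A script like this could run on a schedule and post the overdue alert numbers to a team channel, so unreviewed alerts do not pile up silently.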
CodeQL transforms security testing from a periodic audit to a continuous, automated part of your development workflow. Find vulnerabilities before they find you.