Git Internals: Blobs, Trees, Commits, and References
What Lies Beneath Git?
Most developers use Git daily, but few understand its internal architecture. Git is fundamentally a content-addressable filesystem with a VCS interface on top. At its core are four object types: blobs, trees, commits, and tags. Understanding these gives you superpowers for debugging and advanced workflows.
Key insight: Git doesn't think in terms of files; it thinks in terms of objects identified by SHA-1 hashes. Every object is stored under the .git/objects directory, either as a loose file or inside a packfile.
The Object Database
Every Git object has:
- A type (blob, tree, commit, tag)
- A size
- Content
- A SHA-1 hash computed over a short header (type, size, and a NUL byte) followed by the content
$ git init
$ echo 'hello world' > test.txt
$ git add test.txt
$ git commit -m "first commit"
$ find .git/objects -type f
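# .git/objects/3b/18e512dba79e4c8300dd08aeb37f8e728b8dad   <- the blob for test.txt
# ...plus one tree object and one commit object (the commit hash depends on author and timestamp)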
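The blob hash in the listing above can be reproduced without Git. The following is a minimal sketch, assuming a SHA-1 repository; `git_object_id` is a name chosen here for illustration and is not part of Git.

```python
import hashlib


def git_object_id(obj_type: str, content: bytes) -> str:
    """Return the SHA-1 ID Git would assign to `content` stored as `obj_type`."""
    # Header is "<type> <size>" followed by a NUL byte, then the raw content.
    header = f"{obj_type} {len(content)}".encode("ascii") + b"\x00"
    return hashlib.sha1(header + content).hexdigest()


# `echo 'hello world'` writes a trailing newline, so the blob is 12 bytes long.
print(git_object_id("blob", b"hello world\n"))
# 3b18e512dba79e4c8300dd08aeb37f8e728b8dad
```

The first two hex characters become the directory name under .git/objects and the remaining 38 become the file name.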
Blob Objects
A blob (binary large object) stores file content but not the filename. On disk it is the file data, prefixed with a small header and zlib-compressed. Identical content always produces the same blob hash, regardless of where it appears in the repository.
$ git cat-file -p 3b18e51 # shows "hello world"
$ git cat-file -t 3b18e51 # shows "blob"
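To make the "content only, no filename" point concrete, here is a rough sketch of how a loose object could be written and read back. It works in a temporary directory rather than a real .git/objects, and the helper names `write_loose_object` and `read_loose_object` are hypothetical.

```python
import hashlib
import tempfile
import zlib
from pathlib import Path


def write_loose_object(objects_dir: Path, obj_type: str, content: bytes) -> str:
    """Store an object in loose-object layout and return its ID."""
    store = f"{obj_type} {len(content)}".encode("ascii") + b"\x00" + content
    oid = hashlib.sha1(store).hexdigest()
    path = objects_dir / oid[:2] / oid[2:]
    path.parent.mkdir(parents=True, exist_ok=True)
    if not path.exists():  # same content -> same path, so nothing new to write
        path.write_bytes(zlib.compress(store))
    return oid


def read_loose_object(objects_dir: Path, oid: str) -> tuple[str, bytes]:
    """Return (type, content) for a loose object ID."""
    raw = zlib.decompress((objects_dir / oid[:2] / oid[2:]).read_bytes())
    header, _, content = raw.partition(b"\x00")
    obj_type, size = header.decode("ascii").split(" ")
    if int(size) != len(content):
        raise ValueError("size in header does not match content length")
    return obj_type, content


with tempfile.TemporaryDirectory() as tmp:
    objects = Path(tmp)
    first = write_loose_object(objects, "blob", b"hello world\n")
    second = write_loose_object(objects, "blob", b"hello world\n")  # e.g. a copy under another name
    print(first == second)                    # True
    print(read_loose_object(objects, first))  # ('blob', b'hello world\n')
```

Nothing in the stored bytes records a path, which is why two files with the same content share one blob.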
Tree Objects
Trees represent directories. They map filenames to blobs (or subtrees) and include file modes. A tree object stores:
- File mode (e.g., 100644 for regular file)
- Type (blob or tree)
- SHA-1 hash
- Filename
$ git cat-file -p HEAD^{tree}
040000 tree 2c3f4d2... src
100644 blob 3b18e51... test.txt
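The listing above is the human-readable form. Inside the object, each entry is stored as `<mode> <name>`, a NUL byte, and the 20 raw bytes of the hash, and a directory's mode is stored as `40000` (the leading zero is added for display). The sketch below encodes and decodes that layout; the function names are hypothetical, and the `ab` * 20 ID is a made-up placeholder, not a real object.

```python
import hashlib


def serialize_tree(entries: list[tuple[str, str, str]]) -> bytes:
    """Encode (mode, name, hex_id) entries in Git's binary tree layout."""
    def sort_key(entry):
        mode, name, _ = entry
        # Git orders entries by name, comparing directories as if they ended in "/".
        return (name + "/" if mode == "40000" else name).encode()

    body = b""
    for mode, name, hex_id in sorted(entries, key=sort_key):
        body += mode.encode() + b" " + name.encode() + b"\x00" + bytes.fromhex(hex_id)
    return body


def parse_tree(body: bytes) -> list[tuple[str, str, str]]:
    """Decode a binary tree body back into (mode, name, hex_id) entries."""
    entries, pos = [], 0
    while pos < len(body):
        space = body.index(b" ", pos)
        nul = body.index(b"\x00", space)
        entries.append((body[pos:space].decode(),
                        body[space + 1:nul].decode(),
                        body[nul + 1:nul + 21].hex()))
        pos = nul + 21  # skip the 20-byte raw hash
    return entries


def tree_id(body: bytes) -> str:
    return hashlib.sha1(f"tree {len(body)}".encode() + b"\x00" + body).hexdigest()


body = serialize_tree([
    ("100644", "test.txt", "3b18e512dba79e4c8300dd08aeb37f8e728b8dad"),
    ("40000", "src", "ab" * 20),  # placeholder ID for illustration only
])
for entry in parse_tree(body):
    print(entry)
```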
Commit Objects
A commit is a snapshot of your repository at a point in time. It contains:
- Tree hash (root directory snapshot)
- Parent commit hash(es)
- Author (name, email, timestamp)
- Committer (name, email, timestamp)
- Commit message
$ git cat-file -p HEAD
tree 5c2b3f1a8b7d9c4e2f1a3b5c7d9e1f3a5b7c9d1e
parent 9d8e7f6a5b4c3d2e1f0a9b8c7d6e5f4a3b2c1d0e
author John Doe <john@example.com> 1709123456 -0500
committer John Doe <john@example.com> 1709123456 -0500

Update test.txt
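A commit object is plain text: header lines, a blank line, then the message. As a small sketch of that layout, the hypothetical `parse_commit` helper below splits `git cat-file -p` output into its fields. It assumes an unsigned commit and ignores any headers beyond the ones listed above.

```python
def parse_commit(text: str) -> dict:
    """Split the text of a commit object into headers and message."""
    head, _, message = text.partition("\n\n")
    commit = {"tree": None, "parents": [], "author": None,
              "committer": None, "message": message}
    for line in head.splitlines():
        key, _, value = line.partition(" ")
        if key == "parent":
            commit["parents"].append(value)  # merge commits have several
        elif key in ("tree", "author", "committer"):
            commit[key] = value
    return commit


sample = (
    "tree 5c2b3f1a8b7d9c4e2f1a3b5c7d9e1f3a5b7c9d1e\n"
    "parent 9d8e7f6a5b4c3d2e1f0a9b8c7d6e5f4a3b2c1d0e\n"
    "author John Doe <john@example.com> 1709123456 -0500\n"
    "committer John Doe <john@example.com> 1709123456 -0500\n"
    "\n"
    "Update test.txt\n"
)
commit = parse_commit(sample)
print(commit["tree"])
print(commit["parents"])
```

A root commit simply has no parent line, so `parents` stays empty.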
References (Refs)
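References are human-readable names for commit hashes. Branches and tags live under .git/refs/ (Git may also consolidate them into .git/packed-refs), while HEAD sits at the top level of .git:
- Branches – refs/heads/main points to the latest commit on main
- Tags – refs/tags/v1.0 points to a commit or tag object
- HEAD – .git/HEAD points to the current branch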
$ cat .git/refs/heads/main
a1b2c3d4e5f678901234567890abcdef12345678
$ cat .git/HEAD
ref: refs/heads/main
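Resolving HEAD therefore means following `ref:` lines until a hash is reached. The sketch below does this against a throwaway directory. It assumes loose ref files only (no .git/packed-refs lookup), and `resolve_ref` is a hypothetical helper, not a Git API.

```python
import tempfile
from pathlib import Path


def resolve_ref(git_dir: Path, name: str = "HEAD", max_depth: int = 5) -> str:
    """Follow symbolic refs ("ref: ...") until a commit hash is found."""
    for _ in range(max_depth):
        value = (git_dir / name).read_text().strip()
        if not value.startswith("ref: "):
            return value  # a plain hash: resolution is done
        name = value[len("ref: "):]
    raise ValueError("too many levels of symbolic refs")


with tempfile.TemporaryDirectory() as tmp:
    git_dir = Path(tmp)
    (git_dir / "refs" / "heads").mkdir(parents=True)
    (git_dir / "refs" / "heads" / "main").write_text(
        "a1b2c3d4e5f678901234567890abcdef12345678\n")
    (git_dir / "HEAD").write_text("ref: refs/heads/main\n")
    print(resolve_ref(git_dir))  # a1b2c3d4e5f678901234567890abcdef12345678
```

In a real repository, `git rev-parse HEAD` performs this lookup and also handles packed refs.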
The Object Graph
Commits form a directed acyclic graph (DAG):
- Each commit points to its parent(s)
- Merge commits have multiple parents
- The entire history is a graph of immutable objects
Visualizing the DAG
git log --graph --oneline draws the commit graph in the terminal. Each commit is a node, and each edge points from a commit to its parent(s).
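Because every commit only knows its parents, questions such as "which commits are reachable from here?" become graph walks. Here is a toy sketch with a made-up five-commit history, where single letters stand in for hashes.

```python
from collections import deque

# Hypothetical history: commit -> list of parents. "M" merges "C" and "D".
parents = {"A": [], "B": ["A"], "C": ["B"], "D": ["B"], "M": ["C", "D"]}


def ancestors(commit: str, parents: dict[str, list[str]]) -> set[str]:
    """Return every commit reachable from `commit`, including itself."""
    seen = {commit}
    queue = deque([commit])
    while queue:
        for parent in parents[queue.popleft()]:
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return seen


print(sorted(ancestors("M", parents)))  # ['A', 'B', 'C', 'D', 'M']
print(sorted(ancestors("C", parents)))  # ['A', 'B', 'C']
```

This reachability is also what decides which objects survive garbage collection: anything not reachable from a ref (or the reflog) is a candidate for pruning.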
Tag Objects
Tags can be lightweight (just a ref pointing to a commit) or annotated (a full object with tagger, message, and optionally a signature).
$ git tag -a v1.0 -m "Release version 1.0"
$ git cat-file -p v1.0
object a1b2c3d4e5f678901234567890abcdef12345678
type commit
tag v1.0
tagger John Doe <john@example.com> 1709123456 -0500

Release version 1.0
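An annotated tag adds one level of indirection: the ref points at the tag object, and the tag object's `object` field points at the commit. "Peeling" follows that chain. The sketch below uses an in-memory dictionary as a stand-in object store; the key `tag-v1.0` is a placeholder for the tag object's real hash.

```python
# Toy object store: ID -> (type, fields). IDs are illustrative placeholders.
objects = {
    "tag-v1.0": ("tag", {"object": "a1b2c3d4e5f678901234567890abcdef12345678",
                         "type": "commit", "tag": "v1.0"}),
    "a1b2c3d4e5f678901234567890abcdef12345678": ("commit", {}),
}


def peel(oid: str, objects: dict) -> tuple[str, str]:
    """Follow tag objects until a non-tag object is reached."""
    obj_type, fields = objects[oid]
    while obj_type == "tag":
        oid = fields["object"]
        obj_type, fields = objects[oid]
    return oid, obj_type


print(peel("tag-v1.0", objects))
# ('a1b2c3d4e5f678901234567890abcdef12345678', 'commit')
```

On the command line, `git rev-parse v1.0^{commit}` asks Git to do the same peeling.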
Packfiles
To save space, Git packs loose objects into packfiles (`.pack`) with indexes (`.idx`). Objects are delta-compressed against similar objects.
$ git gc
# Now objects are in .git/objects/pack/
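A `.pack` file starts with a 12-byte header: the signature `PACK`, a version number, and the number of objects, each stored as a big-endian 32-bit integer. The sketch below reads only that header (not the delta-compressed entries that follow) and feeds it synthetic bytes; `parse_pack_header` is a hypothetical helper.

```python
import struct


def parse_pack_header(data: bytes) -> tuple[int, int]:
    """Return (version, object_count) from the first 12 bytes of a packfile."""
    signature, version, count = struct.unpack(">4sII", data[:12])
    if signature != b"PACK":
        raise ValueError("not a packfile")
    return version, count


# Synthetic header: version 2, three objects.
header = b"PACK" + struct.pack(">II", 2, 3)
print(parse_pack_header(header))  # (2, 3)
```

For a real repository, `git verify-pack -v` on the matching `.idx` file lists the packed objects and their delta chains.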
Plumbing vs Porcelain
Git commands are divided into:
- Plumbing – low-level commands (e.g., git cat-file, git hash-object, git update-index)
- Porcelain – user-friendly commands (e.g., git add, git commit, git status)
Understanding plumbing lets you build custom Git tools and debug repository issues.
Practical Internals: Manual Commit
You can create a commit without using porcelain:
# Step 1: Create blob
$ echo "Hello" | git hash-object -w --stdin
e965047ad7c57865823c7d992b1d046ea66edf78
# Step 2: Create tree (requires staging area; --cacheinfo needs the full hash)
$ git update-index --add --cacheinfo 100644,e965047ad7c57865823c7d992b1d046ea66edf78,file.txt
$ git write-tree
8d1a2b3c4d5e6f78901234567890abcdef123456
# Step 3: Create commit
$ echo "Manual commit" | git commit-tree 8d1a2b3
f0e1d2c3b4a5968778695a4b3c2d1e0f1a2b3c4d
# Step 4: Update branch
$ git update-ref refs/heads/main f0e1d2c
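The same blob, tree, and commit chain can be reproduced in memory to see what each plumbing command hashes. This is a sketch, not a replacement for the commands above: it writes nothing to disk, and the author identity and timestamp are borrowed from the earlier sample output, whereas Git would use your configuration and clock.

```python
import hashlib


def object_id(obj_type: str, body: bytes) -> str:
    header = f"{obj_type} {len(body)}".encode("ascii") + b"\x00"
    return hashlib.sha1(header + body).hexdigest()


# Step 1: the blob holds the file content only ("Hello" plus echo's newline).
blob_id = object_id("blob", b"Hello\n")

# Step 2: the tree maps the name file.txt (mode 100644) to that blob.
tree_body = b"100644 file.txt\x00" + bytes.fromhex(blob_id)
tree_id = object_id("tree", tree_body)

# Step 3: the commit names the tree; with no parent line it is a root commit.
# Assumed identity and timestamp, for illustration only.
ident = "John Doe <john@example.com> 1709123456 -0500"
commit_body = (
    f"tree {tree_id}\n"
    f"author {ident}\n"
    f"committer {ident}\n"
    "\n"
    "Manual commit\n"
).encode()
commit_id = object_id("commit", commit_body)

print(blob_id, tree_id, commit_id, sep="\n")
```

Step 4 has no object equivalent: updating a branch only rewrites the hash stored in the ref.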
Git Objects Summary
| Object | Stores | Example |
|---|---|---|
| Blob | File content | git cat-file -p <blob-hash> |
| Tree | Directory listing | git ls-tree <tree-hash> |
| Commit | Snapshot + metadata | git show <commit-hash> |
| Tag | Annotated tag data (target, tagger, message) | git cat-file -p <tag-hash> |
Frequently Asked Questions
Git stores objects in .git/objects/. They start out as loose objects (one file per object) and are later packed into packfiles in .git/objects/pack/ when garbage collection runs.
A blob stores only the content, not the filename or metadata. The filename is stored in tree objects. This is why moving/renaming a file doesn't create a new blob if content is identical.
Git prepends the object type and size to the content: "blob <size>\0<content>", then computes SHA-1. For a tree, it's "tree <size>\0<entries>".
A lost commit can usually be recovered. If the object still exists (it has not been garbage collected), use git checkout <hash> or create a branch with git branch recover <hash>. Use git reflog to find the hash of a lost commit.