Git Internals: Blobs, Trees, Commits, and References

What Lies Beneath Git?

Most developers use Git daily, but few understand its internal architecture. Git is fundamentally a content-addressable filesystem with a version-control interface on top. At its core are four object types: blobs, trees, commits, and tags. Understanding them makes it much easier to debug repository problems and to reason about advanced workflows.

Key insight: Git doesn't think in terms of files; it thinks in terms of objects, each identified by a SHA-1 hash (newer Git versions can optionally use SHA-256 instead). All of these objects are stored under the .git/objects directory.

The Object Database

Every Git object has:

  • A type (blob, tree, commit, tag)
  • A size
  • Content
  • A SHA-1 hash computed over a header (type and size) followed by the content

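# Explore objects manually (in a new, empty directory)
$ git init
$ echo 'hello world' > test.txt
$ git add test.txt
$ git commit -m "first commit"
$ find .git/objects -type f
.git/objects/3b/18e512dba79e4c8300dd08aeb37f8e728b8dad
# find also lists one tree object and one commit object; the commit's
# hash depends on the author and timestamp, so it differs on every machine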
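
Each loose object file is named after its hash: the first two hex digits become a subdirectory and the remaining 38 become the filename. The sketch below assumes a POSIX shell with Git and mktemp available; object_path is a hypothetical helper name, and the demo runs in a throwaway repository.

```shell
set -e

# Hypothetical helper: map an object ID to its loose-object path.
object_path() {
  printf '.git/objects/%s/%s\n' \
    "$(printf %s "$1" | cut -c1-2)" \
    "$(printf %s "$1" | cut -c3-)"
}

# Work in a throwaway repository.
cd "$(mktemp -d)"
git init -q

# Write a blob into the object database (-w) and locate its file.
oid=$(echo 'hello world' | git hash-object -w --stdin)
path=$(object_path "$oid")
echo "$path"
ls -l "$path"
```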

Blob Objects

A blob (binary large object) stores file content, but not the filename. It is just the file's bytes, stored zlib-compressed on disk behind a small header. Identical content always produces the same blob hash, regardless of the file's name or location.

# View a blob
$ git cat-file -p 3b18e51 # shows "hello world"
$ git cat-file -t 3b18e51 # shows "blob"
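
To see that a blob ignores names and locations, this sketch writes the same content to two different paths and hashes both. It assumes a POSIX shell and Git; the file names are made up for the demo.

```shell
set -e
cd "$(mktemp -d)"
git init -q

# Same content, different names and directories.
mkdir -p docs
echo 'hello world' > test.txt
echo 'hello world' > docs/copy.txt

# Hash both files without writing them to the object database.
a=$(git hash-object test.txt)
b=$(git hash-object docs/copy.txt)
echo "$a"
echo "$b"

# Stage both: the index records two paths that share one blob.
git add test.txt docs/copy.txt
git ls-files --stage
```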

Tree Objects

Trees represent directories. They map filenames to blobs (or subtrees) and include file modes. A tree object stores:

  • File mode (e.g., 100644 for a regular file, 100755 for an executable, 040000 for a subdirectory)
  • Type (blob or tree)
  • SHA-1 hash
  • Filename

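# View a tree (this example repository also contains a src/ directory)
# Entries are sorted by name, so src comes before test.txt
$ git cat-file -p HEAD^{tree}
040000 tree 2c3f4d2... src
100644 blob 3b18e51... test.txt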
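
Trees can also be built directly from ls-tree-style lines with the git mktree plumbing command, which normalizes entry order itself. This is a minimal sketch assuming a POSIX shell and Git; the layout (test.txt plus a src/ directory holding main.c) is invented for illustration, and the root entries are deliberately given out of order.

```shell
set -e
cd "$(mktemp -d)"
git init -q

# Two blobs: one for test.txt, one for a file inside src/.
top=$(echo 'hello world' | git hash-object -w --stdin)
inner=$(echo 'int main(void) { return 0; }' | git hash-object -w --stdin)

# Subtree for src/: "mode SP type SP hash TAB name", one entry per line.
src=$(printf '100644 blob %s\tmain.c\n' "$inner" | git mktree)

# Root tree, input intentionally unsorted; mktree sorts the entries.
root=$(printf '100644 blob %s\ttest.txt\n040000 tree %s\tsrc\n' "$top" "$src" | git mktree)

git cat-file -p "$root"
```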

Commit Objects

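A commit records a snapshot of your repository at a point in time: it points to the tree for the top-level directory and adds metadata. It contains:

  • Tree hash (root directory snapshot)
  • Parent commit hash(es): none for the first (root) commit, two or more for a merge
  • Author (name, email, timestamp)
  • Committer (name, email, timestamp)
  • Commit message
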
# Inspect a commit (a root commit, so there is no parent line;
# later commits list one "parent <hash>" line per parent after the tree line)
$ git cat-file -p HEAD
tree 5c2b3f1a8b7d9c4e2f1a3b5c7d9e1f3a5b7c9d1e
author John Doe <john@example.com> 1709123456 -0500
committer John Doe <john@example.com> 1709123456 -0500

Initial commit with test.txt
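
Whether a commit is a root commit shows up directly in its raw text. This sketch assumes a POSIX shell and Git; the author identity and file contents are placeholders, and the commits are made with git commit for brevity.

```shell
set -e
cd "$(mktemp -d)"
git init -q
# Placeholder identity for this throwaway repository.
git config user.name 'Example User'
git config user.email 'user@example.com'

echo 'hello world' > test.txt
git add test.txt
git commit -q -m 'first commit'

echo 'hello again' >> test.txt
git commit -q -am 'second commit'

# Count "parent" header lines in each commit's raw content.
# grep -c exits non-zero on zero matches, hence "|| true".
root_parents=$(git cat-file -p HEAD~1 | grep -c '^parent ' || true)
head_parents=$(git cat-file -p HEAD | grep -c '^parent ')
echo "first: $root_parents parent(s), second: $head_parents parent(s)"

# The first header line always names the root tree.
git cat-file -p HEAD | head -n 1
```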

References (Refs)

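References are human-readable names for object hashes, usually commits. Most live under .git/refs/ (they can also be packed into a single .git/packed-refs file), while HEAD lives in .git/HEAD:

  • Branches: refs/heads/main points to the latest commit on main
  • Tags: refs/tags/v1.0 points to a commit or to an annotated tag object
  • HEAD: .git/HEAD points to the current branch (or directly to a commit in detached-HEAD state)
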
# Loose references are just small text files
$ cat .git/refs/heads/main
a1b2c3d4e5f678901234567890abcdef12345678
$ cat .git/HEAD
ref: refs/heads/main
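
Reading ref files directly stops working once refs are packed, so scripts usually resolve them through plumbing instead. A sketch assuming a POSIX shell and Git with the default files ref backend; the branch is set to main with git symbolic-ref so the demo does not depend on the init.defaultBranch setting, and the identity is a placeholder.

```shell
set -e
cd "$(mktemp -d)"
git init -q
git symbolic-ref HEAD refs/heads/main   # start on "main" regardless of defaults
git config user.name 'Example User'
git config user.email 'user@example.com'

echo 'hello world' > test.txt
git add test.txt
git commit -q -m 'first commit'

# Which branch does HEAD point to?
git symbolic-ref HEAD

# Which commit does the branch point to?
git rev-parse refs/heads/main

# Pack all refs: the loose file is removed, but rev-parse still resolves it.
git pack-refs --all
ls .git/refs/heads/
git rev-parse refs/heads/main
```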

The Object Graph

Commits form a directed acyclic graph (DAG):

  • Each commit points to its parent(s)
  • Merge commits have multiple parents
  • The entire history is a graph of immutable objects
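
A merge commit is simply a commit with more than one parent line. This sketch, assuming a POSIX shell and Git, builds a tiny diverging history with made-up file names and a placeholder identity, merges it, and lists the merge commit's parents.

```shell
set -e
cd "$(mktemp -d)"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name 'Example User'
git config user.email 'user@example.com'

echo base > base.txt
git add base.txt
git commit -q -m 'base'

# Diverge: one commit on a side branch, one on main.
git checkout -q -b feature
echo feature > feature.txt
git add feature.txt
git commit -q -m 'feature work'

git checkout -q main
echo main > main.txt
git add main.txt
git commit -q -m 'main work'

# --no-ff guarantees a merge commit even if a fast-forward were possible.
git merge -q --no-ff -m 'merge feature' feature

# rev-list --parents prints the commit followed by all of its parents.
git rev-list --parents -n 1 HEAD
git --no-pager log --graph --oneline
```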

Visualizing the DAG

git log --graph --oneline shows the commit graph: each commit is a node, and the lines drawn to its left connect it to its parent(s). Add --all to include every branch, not just the current one.

Tag Objects

Tags can be lightweight (just a ref pointing to a commit) or annotated (a full tag object with a tagger, a message, and optionally a signature).

# Annotated tag object
$ git tag -a v1.0 -m "Release version 1.0"
$ git cat-file -p v1.0
object a1b2c3d4e5f678901234567890abcdef12345678
type commit
tag v1.0
tagger John Doe <john@example.com> 1709123456 -0500

Release version 1.0
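
The difference between the two kinds of tag is visible in what their refs point to. A sketch assuming a POSIX shell and Git; v1.0-light is a made-up tag name and the identity is a placeholder.

```shell
set -e
cd "$(mktemp -d)"
git init -q
git config user.name 'Example User'
git config user.email 'user@example.com'
echo 'hello world' > test.txt
git add test.txt
git commit -q -m 'first commit'

git tag v1.0-light                         # lightweight: only a ref
git tag -a v1.0 -m 'Release version 1.0'   # annotated: a tag object plus a ref

# The lightweight tag's ref points straight at the commit...
git cat-file -t refs/tags/v1.0-light
# ...while the annotated tag's ref points at a tag object.
git cat-file -t refs/tags/v1.0
# Peeling with ^{commit} follows the tag object to the commit it names.
git rev-parse 'v1.0^{commit}'
```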

Packfiles

To save space, Git packs loose objects into packfiles (`.pack`) with accompanying index files (`.idx`). Inside a pack, objects can be stored as deltas against similar objects.

# Run garbage collection, which packs loose objects
$ git gc
# Reachable objects now live in .git/objects/pack/
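
git count-objects -v reports loose objects on its "count" line and packed objects on its "in-pack" line, which makes the effect of gc easy to observe. A sketch assuming a POSIX shell and Git, with a placeholder identity.

```shell
set -e
cd "$(mktemp -d)"
git init -q
git config user.name 'Example User'
git config user.email 'user@example.com'
echo 'hello world' > test.txt
git add test.txt
git commit -q -m 'first commit'

# Before gc: three loose objects (blob, tree, commit), nothing packed.
git count-objects -v

git gc -q

# After gc: the reachable objects sit in a pack next to its index.
git count-objects -v
ls .git/objects/pack/
```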

Plumbing vs Porcelain

Git commands are divided into:

  • Plumbing – low-level commands (e.g., git cat-file, git hash-object, git update-index)
  • Porcelain – user-friendly commands (e.g., git add, git commit, git status)

Understanding plumbing lets you build custom Git tools and debug repository issues.
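
As a small example of a custom tool built on plumbing, the hypothetical branch_report function below lists local branches with their commit hashes and subjects using git for-each-ref, whose --format output is intended for scripts. A sketch assuming a POSIX shell and Git; the branch names and identity are placeholders.

```shell
set -e

# Hypothetical tool: print "<branch> <commit> <subject>" per local branch.
branch_report() {
  git for-each-ref \
    --format='%(refname:short) %(objectname) %(contents:subject)' \
    refs/heads/
}

cd "$(mktemp -d)"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name 'Example User'
git config user.email 'user@example.com'
echo 'hello world' > test.txt
git add test.txt
git commit -q -m 'first commit'
git branch topic

branch_report
```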

Practical Internals: Manual Commit

You can create a commit using only plumbing commands:

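# Step 1: Create a blob (prints its full 40-character hash)
$ echo 'hello world' | git hash-object -w --stdin
3b18e512dba79e4c8300dd08aeb37f8e728b8dad

# Step 2: Create a tree (requires staging the blob in the index first;
# --cacheinfo needs the full hash, and the tree/commit hashes below are illustrative)
$ git update-index --add --cacheinfo 100644 3b18e512dba79e4c8300dd08aeb37f8e728b8dad file.txt
$ git write-tree
8d1a2b3c4d5e6f78901234567890abcdef123456

# Step 3: Create a commit (add -p <parent-hash> to link it to existing history;
# without -p it is a root commit)
$ echo "Manual commit" | git commit-tree 8d1a2b3
f0e1d2c3b4a5968778695a4b3c2d1e0f1a2b3c4d

# Step 4: Point the branch at the new commit
$ git update-ref refs/heads/main f0e1d2c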
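
The four steps can be run end to end as a script that captures each hash in a variable instead of copying it by hand. A minimal sketch, assuming a POSIX shell and Git; the throwaway repository, file name, and identity are placeholders.

```shell
set -e
cd "$(mktemp -d)"
git init -q
# Placeholder identity; commit-tree reads it from the config.
git config user.name 'Example User'
git config user.email 'user@example.com'

# Step 1: write the blob and keep its full hash.
blob=$(echo 'hello world' | git hash-object -w --stdin)

# Step 2: stage it under a path (mode,hash,path form) and write the tree.
git update-index --add --cacheinfo "100644,$blob,file.txt"
tree=$(git write-tree)

# Step 3: create a root commit from that tree.
commit=$(echo 'Manual commit' | git commit-tree "$tree")

# Step 4: point main at the commit and make HEAD follow main.
git update-ref refs/heads/main "$commit"
git symbolic-ref HEAD refs/heads/main

git --no-pager log --oneline main
git cat-file -p main:file.txt
```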

Git Objects Summary

Object   Stores                Example
Blob     File content          git cat-file -p <blob-hash>
Tree     Directory listing     git ls-tree <tree-hash>
Commit   Snapshot + metadata   git show <commit-hash>
Tag      Tag annotation        git cat-file -p <tag-hash>

Frequently Asked Questions

Where are Git objects physically stored?

In .git/objects/. New objects start out as loose objects: one zlib-compressed file per object, stored at .git/objects/<first two hex digits>/<remaining 38 hex digits>. Garbage collection later packs them into packfiles in .git/objects/pack/.

What's the difference between a blob and a file?

A blob stores only the content, not the filename or metadata. The filename is stored in tree objects. This is why moving/renaming a file doesn't create a new blob if content is identical.
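
A rename can be checked with the <rev>:<path> syntax, which resolves to the blob stored at a path in a given commit. This sketch assumes a POSIX shell and Git; old.txt and new.txt are made-up names and the identity is a placeholder.

```shell
set -e
cd "$(mktemp -d)"
git init -q
git config user.name 'Example User'
git config user.email 'user@example.com'

echo 'hello world' > old.txt
git add old.txt
git commit -q -m 'add old.txt'

git mv old.txt new.txt
git commit -q -m 'rename to new.txt'

# Same blob before and after the rename.
before=$(git rev-parse HEAD~1:old.txt)
after=$(git rev-parse HEAD:new.txt)
echo "$before"
echo "$after"

# The root trees differ, because the filename lives in the tree.
git rev-parse 'HEAD~1^{tree}' 'HEAD^{tree}'
```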

How does Git compute the SHA-1 hash?

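Git prepends a header made of the object type, a space, the content size in bytes (written in decimal), and a NUL byte: "blob <size>\0<content>". It then computes the SHA-1 of that whole byte string. Trees are hashed the same way: "tree <size>\0<entries>".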
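
The blob rule can be reproduced without Git's help and then checked against git hash-object. A sketch assuming a POSIX shell with GNU coreutils sha1sum (macOS users would substitute shasum); blob_sha1 is a hypothetical helper name.

```shell
set -e
cd "$(mktemp -d)"
git init -q

# Hypothetical helper: hash "blob <size>\0" followed by the file's bytes.
blob_sha1() {
  size=$(wc -c < "$1" | tr -d ' ')
  { printf 'blob %s\0' "$size"; cat "$1"; } | sha1sum | cut -d' ' -f1
}

echo 'hello world' > test.txt   # 12 bytes, including the trailing newline
manual=$(blob_sha1 test.txt)
official=$(git hash-object test.txt)
echo "$manual"
echo "$official"
```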

Can I recover a deleted commit if I know its hash?

Yes! If the object still exists (not garbage collected), use git checkout <hash> or create a branch: git branch recover <hash>. Use git reflog to find lost commits.
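
The reflog makes this concrete: after a hard reset, HEAD@{1} still names the commit HEAD pointed to before the reset. A sketch assuming a POSIX shell and Git with reflogs enabled (the default in non-bare repositories); the branch name recover matches the one above, and the identity is a placeholder.

```shell
set -e
cd "$(mktemp -d)"
git init -q
git config user.name 'Example User'
git config user.email 'user@example.com'

echo one > file.txt
git add file.txt
git commit -q -m 'one'
echo two >> file.txt
git commit -q -am 'two'

lost=$(git rev-parse HEAD)
git reset -q --hard HEAD~1   # "two" is no longer on any branch

# The reflog still records where HEAD was before the reset.
git --no-pager reflog -n 2
git branch recover 'HEAD@{1}'
git --no-pager log --oneline -1 recover
```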
