Git Submodule · Subtree · LFS — repos inside repos
When a repo needs to carry another repo's code, or when large binary files have to live inside a repo, the options that come to mind are Submodule, Subtree, Sparse Checkout, and Git LFS. Each has different strengths and traps. This article covers their origins, behavior, common pitfalls, and why monorepos tend to avoid submodules.
1. About the four tools
| Tool | Origin | Problem solved |
|---|---|---|
| Git Submodule | Git 1.5.3 (2007) | Include an external repo pinned at a specific commit |
| Git Subtree | Written by Avery Pennarun in 2009; merged into Git's contrib/ in 1.7.11 (2012) | Merge an external repo into a directory; its history blends into the parent |
| Sparse Checkout | Git 1.7 (2010), big improvements in v2.25 | Check out only part of a large repo |
| Git LFS | GitHub, 2015 | Store large binary files in a separate store |
They each address different problems. There's almost no situation where all four apply.
2. Submodule
The parent repo holds only a reference to the child repo. The child's actual files live in the child repo, and the parent only carries metadata that says "this directory is the child repo at SHA xxx".
# Add
git submodule add https://github.com/foo/lib.git vendor/lib
# Creates .gitmodules and clones the child into vendor/lib
# Pull on clone
git clone --recurse-submodules https://github.com/me/parent.git
# Pull submodules into an already-cloned repo
git submodule update --init --recursive
# Bump the child
git submodule update --remote vendor/lib
git add vendor/lib && git commit -m "bump lib"
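.gitmodules is the single source of truth (SSOT) for each submodule's path and URL. The pinned commit itself is stored in the parent's tree as a gitlink entry, not in this file: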
[submodule "vendor/lib"]
path = vendor/lib
url = https://github.com/foo/lib.git
branch = main
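To make the "this directory is the child repo at SHA xxx" metadata concrete, here is a minimal sketch that builds two throwaway repos under a temp directory and inspects the entry the parent records. The local `child` and `parent` repos are stand-ins invented for the demo, and `protocol.file.allow=always` is set only because the demo clones from a local path instead of a URL.

```shell
#!/bin/sh
set -eu
# Identity for the throwaway commits below (demo values).
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

work=$(mktemp -d)

# Hypothetical child repo, standing in for https://github.com/foo/lib.git
git init -q "$work/child"
git -C "$work/child" commit -q --allow-empty -m "child: initial commit"
child_sha=$(git -C "$work/child" rev-parse HEAD)

# Hypothetical parent repo that adds the child as a submodule.
git init -q "$work/parent"
cd "$work/parent"
# protocol.file.allow=always is needed only because this demo uses a local path.
git -c protocol.file.allow=always submodule --quiet add "$work/child" vendor/lib
git commit -q -m "add vendor/lib submodule"

# The parent's tree stores a gitlink: mode 160000, type "commit", and the pinned SHA.
entry=$(git ls-tree HEAD vendor/lib)
pinned_sha=$(git rev-parse HEAD:vendor/lib)
echo "$entry"
echo "pinned: $pinned_sha"
```

The printed entry has mode `160000` and type `commit`, and the pinned SHA equals the child's HEAD at the moment it was added. Bumping the child later changes only this one entry in the parent.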
| Strengths | Weaknesses |
|---|---|
| Child repo's history stays separate | Clone and checkout flow is involved |
| Pin to an exact commit | New collaborators get confused often |
| Permission separation (mix private and public) | Extra CI configuration required |
3. Subtree
The child repo's contents are physically merged into one directory of the parent. The child's commits land in the parent's history, or, with --squash, each import is collapsed into a single squashed commit.
# Add
git subtree add --prefix=vendor/lib https://github.com/foo/lib.git main --squash
# Pull child changes
git subtree pull --prefix=vendor/lib https://github.com/foo/lib.git main --squash
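# Push parent changes back to the child
# (origin-lib is assumed to be a remote pointing at the child repo)
git remote add origin-lib https://github.com/foo/lib.git
git subtree push --prefix=vendor/lib origin-lib main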
| Strengths | Weaknesses |
|---|---|
| Cloners get all the code without extra commands | Parent history grows large |
| Plain clone and pull just work | Pushing back to the child is finicky |
| No separate metadata like .gitmodules | Child history mingles into the parent |
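The "plain clone just works" strength can be checked with a small local experiment. This is a sketch under assumptions: the `child`, `parent`, and `fresh` repos and the `lib.txt` file are hypothetical names made up for the demo, and the script skips itself when git-subtree is not installed, since it ships in Git's contrib/ and some distributions package it separately.

```shell
#!/bin/sh
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# git-subtree is a contrib script and is not installed everywhere.
if [ ! -e "$(git --exec-path)/git-subtree" ] && ! command -v git-subtree >/dev/null 2>&1; then
  echo "git-subtree is not installed; skipping the demo"
  exit 0
fi

work=$(mktemp -d)

# Hypothetical child repo with a single file.
git init -q "$work/child"
echo "hello from lib" > "$work/child/lib.txt"
git -C "$work/child" add lib.txt
git -C "$work/child" commit -q -m "child: add lib.txt"
child_branch=$(git -C "$work/child" symbolic-ref --short HEAD)

# The parent needs at least one commit before a subtree can be added.
git init -q "$work/parent"
cd "$work/parent"
git commit -q --allow-empty -m "parent: initial commit"
git subtree add --prefix=vendor/lib "$work/child" "$child_branch" --squash

# A plain clone, with no extra flags, already contains the child's files.
git clone -q "$work/parent" "$work/fresh"
tree_entry=$(git -C "$work/fresh" ls-tree HEAD vendor/lib)
echo "$tree_entry"
ls "$work/fresh/vendor/lib"
```

Here `vendor/lib` is an ordinary tree (mode `040000`) in the parent, and no `.gitmodules` file exists. That is the trade-off against submodules: nothing extra to initialize, but the child's content is now part of the parent's own objects.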
4. Sparse Checkout
Checks out only some directories of a large monorepo. The git sparse-checkout command, added in Git v2.25, made this far easier to use:
git clone --filter=blob:none --no-checkout https://github.com/big/mono.git
cd mono
git sparse-checkout init --cone
git sparse-checkout set apps/web packages/ui
git checkout main
--cone mode operates only at directory granularity — fast and safe. The older non-cone mode supports glob patterns but suffers from performance and stability issues.
This is not about including external repos like submodules; it pulls part of the same repo.
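The cone-mode behavior can be tried without a large remote. The sketch below builds a tiny local monorepo and repeats the same sequence of commands against it. The `apps/api` directory and the file names are hypothetical additions for the demo, and `--filter=blob:none` is left out to keep the local example simple. It assumes Git 2.25 or newer.

```shell
#!/bin/sh
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

work=$(mktemp -d)

# Hypothetical monorepo layout; apps/api is the directory we do NOT want.
git init -q "$work/mono"
cd "$work/mono"
mkdir -p apps/web apps/api packages/ui
echo "root"   > README.md
echo "web"    > apps/web/index.txt
echo "api"    > apps/api/index.txt
echo "button" > packages/ui/button.txt
git add .
git commit -q -m "monorepo layout"
branch=$(git symbolic-ref --short HEAD)

# Same flow as in the article, against the local repo.
git clone -q --no-checkout "$work/mono" "$work/sparse"
cd "$work/sparse"
git sparse-checkout init --cone
git sparse-checkout set apps/web packages/ui
git checkout -q "$branch"

# List what actually landed in the working tree.
find . -path ./.git -prune -o -type f -print | sort
```

In cone mode, files at the repository root are always included along with the selected directories, so `README.md` shows up while `apps/api` stays absent from the working tree.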
5. Git LFS
Git keeps every revision of a file in history, and large binaries (images, video, ML weights) compress and diff poorly, so a repo that commits them balloons into the GB range fast. LFS keeps those files on a separate LFS server, leaving only small pointer files in the repo:
# One-time install
git lfs install
# Add tracking
git lfs track "*.psd"
git lfs track "models/*.safetensors"
# .gitattributes is updated
git add .gitattributes
# add · commit · push as usual
git add design.psd
git commit -m "add design"
git push
.gitattributes is the single source of truth (SSOT) for which paths go through LFS:
*.psd filter=lfs diff=lfs merge=lfs -text
models/*.safetensors filter=lfs diff=lfs merge=lfs -text
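To show what the "small pointer file" looks like, the sketch below writes one by hand for a stand-in file. It only mimics the pointer format from the Git LFS v1 spec (a version line, a SHA-256 object ID, and a byte size); in real use `git lfs` generates these files itself, and the `design.psd` contents here are made up.

```shell
#!/bin/sh
set -eu

work=$(mktemp -d)
file="$work/design.psd"

# Stand-in for a large binary (hypothetical content).
printf 'pretend this is a large binary\n' > "$file"

# SHA-256 of the content; fall back to shasum where sha256sum is missing.
if command -v sha256sum >/dev/null 2>&1; then
  oid=$(sha256sum "$file" | cut -d' ' -f1)
else
  oid=$(shasum -a 256 "$file" | cut -d' ' -f1)
fi
size=$(wc -c < "$file" | tr -d ' ')

# What the repo stores in place of the real file: three short text lines.
pointer=$(printf 'version https://git-lfs.github.com/spec/v1\noid sha256:%s\nsize %s\n' "$oid" "$size")
echo "$pointer"
```

However large the real file is, the pointer stays at three lines, which is why history remains small while the actual bytes live on the LFS server.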
GitHub gives 1 GB of free LFS storage and 1 GB per month of bandwidth; anything beyond that requires purchasing a data pack.
6. At a glance
| Scenario | Recommendation |
|---|---|
| Pin an external library at an exact commit | Submodule (or just a package manager if that's enough) |
| Fork-style: merge external code into your own repo | Subtree |
| Pull only part of one giant monorepo | Sparse checkout |
| Large binaries — images, video, model weights | LFS |
7. Other paths
Beyond the four:
- Package managers — npm, pnpm, Cargo, and Maven are usually the more natural place for dependency management. Question whether a submodule is really the answer.
- Monorepo tools — Nx, Turborepo, Bazel, Buck. Many packages in one repo, with build and cache management.
- Vendoring — Wholesale-copying external code as if it were your own. You track child updates by hand.
- Workspaces — pnpm / npm / Yarn workspaces. Auto-link dependencies between packages in the same repo.
8. Common pitfalls
Submodule
- New collaborator runs `git clone` and stops there: `vendor/lib` is empty. The README must mention `--recurse-submodules` or `git submodule update --init`.
- Pushing the child without bumping the parent: teammates' machines see the child at an old SHA. Treat push child → `git add vendor/lib` in parent → push parent as one unit.
- CI submodule fetch: GitHub Actions' `actions/checkout` defaults to `submodules: false`. Set `submodules: recursive` explicitly.
- Detached HEAD: submodule directories sit at detached HEAD by default. Check out a branch deliberately before doing work inside the child.
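The empty-directory pitfall is easy to detect from `git submodule status`, which prefixes an uninitialized submodule with `-`. The sketch below reproduces the situation with throwaway local repos; `child`, `parent`, and `teammate` are hypothetical names, and `protocol.file.allow=always` is needed only because the demo uses local paths.

```shell
#!/bin/sh
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

work=$(mktemp -d)

# Hypothetical child and parent repos.
git init -q "$work/child"
git -C "$work/child" commit -q --allow-empty -m "child: initial commit"
git init -q "$work/parent"
cd "$work/parent"
git -c protocol.file.allow=always submodule --quiet add "$work/child" vendor/lib
git commit -q -m "add vendor/lib submodule"

# A teammate clones WITHOUT --recurse-submodules.
git clone -q "$work/parent" "$work/teammate"
cd "$work/teammate"
before=$(git submodule status)
echo "before: $before"   # leading "-" means not initialized

# The fix the README should mention.
git -c protocol.file.allow=always submodule --quiet update --init --recursive
after=$(git submodule status)
echo "after:  $after"    # leading space means checked out at the pinned commit
```

A check such as `git submodule status --recursive | grep '^-'` could serve as an early CI or setup-script guard, though whether to fail the build on it is a project decision.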
Subtree
- Bloated parent history: frequent pulls without `--squash` grow history fast.
- Pushing to the child takes practice: having one person own that flow is safer.
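The effect of `--squash` on parent history can be measured directly. This sketch imports the same five-commit child into two throwaway parents, once with and once without `--squash`, and counts the commits reachable from each parent's HEAD. All repo names are hypothetical, and the script skips itself if git-subtree is not installed.

```shell
#!/bin/sh
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

if [ ! -e "$(git --exec-path)/git-subtree" ] && ! command -v git-subtree >/dev/null 2>&1; then
  echo "git-subtree is not installed; skipping the demo"
  exit 0
fi

work=$(mktemp -d)

# Hypothetical child repo with five commits.
git init -q "$work/child"
for i in 1 2 3 4 5; do
  echo "change $i" >> "$work/child/lib.txt"
  git -C "$work/child" add lib.txt
  git -C "$work/child" commit -q -m "child: change $i"
done
child_branch=$(git -C "$work/child" symbolic-ref --short HEAD)

# Import the child into two separate parents, with and without --squash.
for mode in full squash; do
  git init -q "$work/parent-$mode"
  cd "$work/parent-$mode"
  git commit -q --allow-empty -m "parent: initial commit"
  if [ "$mode" = squash ]; then
    git subtree add --prefix=vendor/lib "$work/child" "$child_branch" --squash
  else
    git subtree add --prefix=vendor/lib "$work/child" "$child_branch"
  fi
done

full_count=$(git -C "$work/parent-full" rev-list --count HEAD)
squash_count=$(git -C "$work/parent-squash" rev-list --count HEAD)
echo "commits without --squash: $full_count"
echo "commits with --squash:    $squash_count"
```

Without `--squash`, every child commit becomes reachable from the parent, so the gap widens with each pull from an active upstream.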
Sparse checkout
- Non-cone mode pitfalls: patterns can mis-match and only show some files. Stick to `--cone` mode.
- CI checking out everything: builds tend to be heavier on CI than local.
LFS
- Existing large files after first tracking: `git lfs track` alone doesn't move past commits. `git lfs migrate import` is needed.
- LFS objects on fork: some hosts don't carry LFS objects when forking.
- Free quota: GitHub's 1 GB / month bandwidth is gone after one video. Plan ahead.
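Since `git lfs migrate import` rewrites history, it helps to know beforehand which blobs in past commits are actually large. The sketch below uses plain Git plumbing for that and does not need git-lfs installed. The repo, the file names, and the 100000-byte threshold are all assumptions chosen for the demo.

```shell
#!/bin/sh
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

work=$(mktemp -d)
git init -q "$work/repo"
cd "$work/repo"

# A 200000-byte stand-in for a large binary, plus a small text file.
head -c 200000 /dev/zero > big.psd
echo "small" > notes.txt
git add .
git commit -q -m "add files"

# Deleting the file later does not remove it from history.
git rm -q big.psd
git commit -q -m "remove big file"

threshold=100000   # bytes; arbitrary demo value

# Walk every object reachable from any ref and keep blobs at or above the threshold.
# (Paths containing spaces would be truncated by this simple awk print.)
large=$(git rev-list --objects --all |
  git cat-file --batch-check='%(objecttype) %(objectsize) %(rest)' |
  awk -v t="$threshold" '$1 == "blob" && $2 >= t { print $2, $3 }')
echo "$large"
```

The deleted `big.psd` still shows up because it remains reachable from an earlier commit. Those are exactly the objects that tracking alone leaves in place.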
9. Why monorepos avoid submodules
In a monorepo holding many packages from the same company / team, submodules cause friction:
- A single PR ends up touching two repos (parent and child).
- CI cache and dependency graph fragment.
- The clone, init, update steps create a steep entry barrier for new people.
Workspaces in a package manager, or Nx / Turborepo, are typically recommended instead. Submodules fit better when pinning external OSS or where permissions / licensing must stay separate.
Closing thoughts
Submodule, Subtree, Sparse Checkout, and LFS are often mentioned together, but each solves a different problem. In a monorepo, workspaces (pnpm, Yarn) or monorepo tools (Turborepo, Nx) cause less friction than submodules. For large binaries, LFS is close to the de facto standard. Before picking a tool, ask whether it is actually needed for the case at hand.