Dramatically improve cloning speed for contributors #4329
Labels

- 🤖 aspect: dx (Concerns developers' experience with the codebase)
- 🛠 goal: fix (Bug fix)
- 🔍 ov: meta (Issue spans multiple repos and has sub-issues)
- 🟧 priority: high (Stalls work on the project or its dependents)
- 🧱 stack: documentation (Related to Sphinx documentation)
Description
I did some analysis of the repository's size, originally intended to address concerns about the size of our frontend snapshots, and it led to some insights. First, a breakdown of the current repo sizing:
`.git` directory: 1.3 GB

It also took over 2 minutes to download on my 120 Mbps wireless connection.
Clearly, the history and metadata of the repository are the main contributors to the download size. Additionally, there aren't really any large blobs in particular that we benefit from removing. We simply have a lot of history with a lot of files.
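The kind of size analysis described above can be reproduced with git's own accounting tools. A minimal sketch follows; it uses a throwaway repository so it runs anywhere, but in practice you would run just the last two commands from the root of an existing clone:

```shell
# Throwaway repository so the commands below are self-contained;
# in a real clone, run only the last two commands at the repo root.
tmp=$(mktemp -d)
cd "$tmp"
git init -q demo
cd demo
printf 'hello\n' > file.txt
git add file.txt
git -c user.email=dev@example.com -c user.name=dev commit -qm 'initial commit'

# Object counts and on-disk size of the object database
git count-objects -vH

# Total on-disk size of the .git directory (history + metadata)
du -sh .git
```

On the Openverse repository, `du -sh .git` is what reports the 1.3 GB figure above.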
I think for most contributors we should recommend doing a "partial blobless clone" of the repository using the `--filter=blob:none` flag, like so:

```shell
git clone --filter=blob:none https://github.com/wordpress/openverse.git
# or
gh repo clone wordpress/openverse -- --filter=blob:none
```
This results in the following sizing:

`.git` directory: 79 MB, with a 15 second download time on my 120 Mbps wireless connection.
You can learn more about blobless clones here: https://gist.github.com/leereilly/1f4ea46a01618b6e34ead76f75d0784b#blobless-clones
It basically means that all the metadata of past commits is present, but not the actual files (blobs). Those are downloaded on demand when running `git blame` or `git checkout` on a previous commit.

I think we can recommend this strategy to most contributors in our documentation and significantly improve their experience.
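The on-demand behavior can be seen end to end with a small local experiment. This is only a sketch with a throwaway repository; local transports must opt in to partial clones via `uploadpack.allowFilter`, whereas the GitHub remote recommended above already supports filters:

```shell
tmp=$(mktemp -d)
cd "$tmp"

# Source repository with two commits
git init -q src
cd src
printf 'v1\n' > file.txt
git add file.txt
git -c user.email=dev@example.com -c user.name=dev commit -qm 'first'
printf 'v2\n' > file.txt
git add file.txt
git -c user.email=dev@example.com -c user.name=dev commit -qm 'second'
# Local transports must opt in to serving partial clones
git config uploadpack.allowFilter true
cd ..

# Blobless clone: commit and tree history is complete, blobs are lazy
git clone -q --filter=blob:none "file://$tmp/src" clone
cd clone
git log --oneline       # works offline: all commit metadata is present
git checkout -q HEAD~1  # fetches the missing old blob on demand
cat file.txt            # prints "v1"
```

The `git log` call touches no blobs, so it needs no network round trip; only the checkout of the older commit triggers a fetch from the origin.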