Dramatically improve cloning speed for contributors #4329
Labels

- 🤖 aspect: dx (Concerns developers' experience with the codebase)
- 🛠 goal: fix (Bug fix)
- 🔍 ov: meta (Issue spans multiple repos and has sub-issues)
- 🟧 priority: high (Stalls work on the project or its dependents)
- 🧱 stack: documentation (Related to Sphinx documentation)
Description
I did some analysis of the repository's size, originally intended to address concerns about the size of our frontend snapshots, and it led to some insights. First, a breakdown of the current repo sizing:
`.git` directory: 1.3 GB

It also took over 2 minutes to download on my 120 Mbps wireless connection.
Clearly, the history and metadata of the repository are the main contributors to the download size. Additionally, there aren't really any large blobs in particular that we benefit from removing. We simply have a lot of history with a lot of files.
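The kind of size analysis described above can be reproduced with git's own accounting tools. A minimal sketch follows; it uses a throwaway repository so it runs anywhere, but in practice you would run just the last two commands from the root of an existing clone:

```shell
# Throwaway repository so the commands below are self-contained;
# in a real clone, run only the last two commands at the repo root.
tmp=$(mktemp -d)
cd "$tmp"
git init -q demo
cd demo
printf 'hello\n' > file.txt
git add file.txt
git -c user.email=dev@example.com -c user.name=dev commit -qm 'initial commit'

# Object counts and on-disk size of the object database
git count-objects -vH

# Total on-disk size of the .git directory (history + metadata)
du -sh .git
```

On the Openverse repository, `du -sh .git` is what reports the 1.3 GB figure above.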
I think for most contributors we should recommend doing a "partial blobless clone" of the repository using the `--filter=blob:none` flag, like so:

```shell
git clone --filter=blob:none https://github.com/wordpress/openverse.git
# or
gh repo clone wordpress/openverse -- --filter=blob:none
```
This results in the following sizing:

`.git` directory: 79 MB, with a 15 second download time on my 120 Mbps wireless connection.
You can learn more about blobless clones here: https://gist.github.com/leereilly/1f4ea46a01618b6e34ead76f75d0784b#blobless-clones
It basically means that all the metadata of past commits is present, but not the actual files (blobs). Those are downloaded on demand when running `git blame` or `git checkout` on a previous commit.

I think we can recommend this strategy to most contributors in our documentation and significantly improve their experience.
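The on-demand behavior can be seen end to end with a small local experiment. This is only a sketch with a throwaway repository; local transports must opt in to partial clones via `uploadpack.allowFilter`, whereas the GitHub remote recommended above already supports filters:

```shell
tmp=$(mktemp -d)
cd "$tmp"

# Source repository with two commits
git init -q src
cd src
printf 'v1\n' > file.txt
git add file.txt
git -c user.email=dev@example.com -c user.name=dev commit -qm 'first'
printf 'v2\n' > file.txt
git add file.txt
git -c user.email=dev@example.com -c user.name=dev commit -qm 'second'
# Local transports must opt in to serving partial clones
git config uploadpack.allowFilter true
cd ..

# Blobless clone: commit and tree history is complete, blobs are lazy
git clone -q --filter=blob:none "file://$tmp/src" clone
cd clone
git log --oneline       # works offline: all commit metadata is present
git checkout -q HEAD~1  # fetches the missing old blob on demand
cat file.txt            # prints "v1"
```

The `git log` call touches no blobs, so it needs no network round trip; only the checkout of the older commit triggers a fetch from the origin.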