Support operating on sparse-checkout of cdnjs #75

Closed
klausenbusk opened this issue Jul 6, 2020 · 2 comments
Comments

@klausenbusk
Contributor

klausenbusk commented Jul 6, 2020

Cloning the cdnjs repository is a very expensive operation, as the repository is very big (11GB compressed at the time of writing, 100GB+ checked out) and contains a lot of files. Often we only need the package.json files and the directory tree, not the actual file content of all the stored versions. If we ever want to switch to, e.g., GitHub Actions (cdnjs/bot-ansible#11), we need this.

Combining partial clone and sparse checkout is much faster:

$ time git clone --filter=blob:none --sparse git@github.com:cdnjs/cdnjs.git
[...]
real	1m18,731s
$ cd cdnjs
$ time git sparse-checkout add '**/package.json'
[..]
real	0m24,713s
$ find | head -n 10
.
./ajax
./ajax/libs
./ajax/libs/zxcvbn
./ajax/libs/zxcvbn/package.json
./ajax/libs/zurb-ink
./ajax/libs/zurb-ink/package.json
./ajax/libs/zumper-angular-payments
./ajax/libs/zumper-angular-payments/package.json
$ du -hs
1,1G	

Looking at the code, the following check needs to change so that it also consults the tree (git ls-tree), similar to #67:

tools/cmd/autoupdate/git.go

Lines 113 to 116 in 1928a3c

if _, err := os.Stat(pckgpath); !os.IsNotExist(err) {
	util.Debugf(ctx, "%s already exists; aborting\n", pckgpath)
	continue
}
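
A rough sketch of what a tree-based check could look like, instead of relying on os.Stat alone. It assumes the path is given relative to the repository root and that HEAD is the revision to look at. existsInTree and lsTreeHasEntry are hypothetical helpers for illustration, not functions that exist in cdnjs/tools:

```go
package main

import (
	"context"
	"os/exec"
	"strings"
)

// existsInTree reports whether path is recorded in the HEAD tree of the
// repository at repoDir, whether or not it is checked out on disk.
// Assumption: path is relative to repoDir (the repository root).
func existsInTree(ctx context.Context, repoDir, path string) (bool, error) {
	cmd := exec.CommandContext(ctx, "git", "ls-tree", "--name-only", "HEAD", "--", path)
	cmd.Dir = repoDir
	out, err := cmd.Output()
	if err != nil {
		return false, err
	}
	return lsTreeHasEntry(string(out)), nil
}

// lsTreeHasEntry interprets ls-tree output: git prints one line per
// matching entry and nothing when the path is not in the tree.
func lsTreeHasEntry(output string) bool {
	return strings.TrimSpace(output) != ""
}
```

With something like this, the "already exists" branch could fire when either os.Stat or the tree lookup finds the path, so a sparse checkout would not hide existing versions.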

This code needs to run git sparse-checkout add <PATH> first (if core.sparseCheckout is true):

// CalculateVersionSRIs calculates SRIs for the files in
// a particular package version.
func (p *Package) CalculateVersionSRIs(version string) map[string]string {
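
A minimal sketch of that pre-step, assuming it is run from the repository root before the files are read. sparseCheckoutEnabled, parseGitBool and ensureCheckedOut are hypothetical names, and how the path is interpreted depends on the sparse-checkout mode (pattern vs. cone) the clone uses:

```go
package main

import (
	"context"
	"errors"
	"os/exec"
	"strings"
)

// sparseCheckoutEnabled reads core.sparseCheckout for the repository at repoDir.
func sparseCheckoutEnabled(ctx context.Context, repoDir string) (bool, error) {
	cmd := exec.CommandContext(ctx, "git", "config", "--bool", "--get", "core.sparseCheckout")
	cmd.Dir = repoDir
	out, err := cmd.Output()
	if err != nil {
		// git config exits with status 1 when the key is not set.
		var exitErr *exec.ExitError
		if errors.As(err, &exitErr) && exitErr.ExitCode() == 1 {
			return false, nil
		}
		return false, err
	}
	return parseGitBool(string(out)), nil
}

// parseGitBool interprets the normalized output of `git config --bool`.
func parseGitBool(output string) bool {
	return strings.TrimSpace(output) == "true"
}

// ensureCheckedOut adds path to the sparse-checkout set, but only when
// sparse checkout is enabled; on a full checkout it does nothing.
func ensureCheckedOut(ctx context.Context, repoDir, path string) error {
	enabled, err := sparseCheckoutEnabled(ctx, repoDir)
	if err != nil || !enabled {
		return err
	}
	cmd := exec.CommandContext(ctx, "git", "sparse-checkout", "add", path)
	cmd.Dir = repoDir
	return cmd.Run()
}
```

On a partial clone, adding the path also makes git fetch the missing blobs, so the SRI calculation would see the real file content afterwards.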

This code needs to be changed so that it gets the existing versions from the tree (git ls-tree) rather than from the filesystem:

tools/packages/git.go

Lines 15 to 18 in 1928a3c

// GitListPackageVersions first lists all the versions (and top-level package.json)
// in the package and passes the list to git ls-tree which filters out
// those not in the tree.
func GitListPackageVersions(ctx context.Context, basePath string) []string {

^ @xtuc was it implemented this way on purpose? (6138172#diff-2538e4c3ee0b09db855e7569a1865cc2R16) Why does it stat the filesystem? Wouldn't it be easier to rely on git ls-tree alone?
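
To illustrate what relying on git ls-tree alone might look like, here is a sketch that lists the version directories of a package straight from the HEAD tree. It assumes pkgDir is relative to the repository root and that every directory directly under it is a version. listVersionsFromTree and treeEntries are hypothetical names, and unlike the current function this does not return the top-level package.json:

```go
package main

import (
	"context"
	"os/exec"
	"path"
	"strings"
)

// listVersionsFromTree lists the directories directly under pkgDir as
// recorded in HEAD, without touching the working tree.
func listVersionsFromTree(ctx context.Context, repoDir, pkgDir string) ([]string, error) {
	// The trailing slash makes ls-tree list the children of pkgDir
	// rather than the pkgDir entry itself.
	dir := strings.TrimSuffix(pkgDir, "/") + "/"
	cmd := exec.CommandContext(ctx, "git", "ls-tree", "-z", "HEAD", "--", dir)
	cmd.Dir = repoDir
	out, err := cmd.Output()
	if err != nil {
		return nil, err
	}
	return treeEntries(string(out)), nil
}

// treeEntries extracts the base names of directory ("tree") entries from
// `git ls-tree -z` output, where each NUL-terminated record has the form
// "<mode> <type> <object>\t<path>".
func treeEntries(output string) []string {
	var names []string
	for _, record := range strings.Split(output, "\x00") {
		parts := strings.SplitN(record, "\t", 2)
		if len(parts) != 2 {
			continue // empty trailing record or malformed line
		}
		fields := strings.Fields(parts[0])
		if len(fields) == 3 && fields[1] == "tree" {
			names = append(names, path.Base(parts[1]))
		}
	}
	return names
}
```

Because this only reads tree objects, it should work on a blob-less partial clone without triggering any blob downloads.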

/refs cdnjs/bot-ansible#11

@xtuc
Member

xtuc commented Jul 7, 2020

The bot wasn't designed by me/us; I rewrote it.
To be clear, we are not planning to optimize git because we are working to move away from it eventually.

Would you mind sending me an email at: sven at cloudflare com, so we can discuss the plans?

@klausenbusk
Contributor Author

To be clear, we are not planning to optimize git because we are working to move away from it eventually.

Fair enough :)
