New S3 object storage implementation / BlobStore #1030

Merged (8 commits) on Aug 26, 2024

bryanlb (Contributor) commented Aug 12, 2024

Summary

Refactors the existing object storage implementation. The current blobfs implementation is overly abstracted and, as a result, makes many unnecessary list API calls to S3 (#850). These calls cause significant performance problems as the number of items in the bucket grows.

This PR implements a clean-sheet design that reduces the abstractions and sets things up so that we can take advantage of https://github.com/awslabs/aws-java-nio-spi-for-s3 in a future PR.

In initial performance benchmarks against large S3 buckets (large in this context meaning roughly 1 PB / 1.5M items), downloads are up to 40% faster. Delete operations and chunk expirations are also significantly faster.

This PR also removes the snapshot path and indexType fields: the former because all data is now accessed via the snapshot ID, and the latter because it is unused and unnecessary.
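
To illustrate why a stored snapshot path becomes redundant once data is addressed by snapshot ID, here is a minimal in-memory sketch. It is not the BlobStore code from this PR: the class name `SnapshotKeyedStore`, its methods, and the `<snapshotId>/<fileName>` key layout are all assumptions made for illustration, and a `TreeMap` stands in for the bucket.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

/**
 * Hypothetical in-memory stand-in for a bucket (not the PR's BlobStore).
 * Assumed key layout: "<snapshotId>/<fileName>", so the snapshot ID alone
 * is enough to find, list, or delete everything belonging to a snapshot.
 */
class SnapshotKeyedStore {
  // Sorted map: keys sharing a prefix are adjacent, much like a prefix listing.
  private final TreeMap<String, byte[]> objects = new TreeMap<>();

  private static String prefixFor(String snapshotId) {
    return snapshotId + "/";
  }

  /** Stores one file under the snapshot's prefix. */
  void upload(String snapshotId, String fileName, byte[] data) {
    objects.put(prefixFor(snapshotId) + fileName, data.clone());
  }

  /** Returns every key under the snapshot's prefix in a single pass. */
  List<String> listFiles(String snapshotId) {
    String prefix = prefixFor(snapshotId);
    List<String> keys = new ArrayList<>();
    for (String key : objects.tailMap(prefix, true).keySet()) {
      if (!key.startsWith(prefix)) {
        break; // past the end of this snapshot's contiguous key range
      }
      keys.add(key);
    }
    return keys;
  }

  /** Deletes all files of a snapshot; returns true if anything was removed. */
  boolean delete(String snapshotId) {
    List<String> keys = listFiles(snapshotId);
    keys.forEach(objects::remove);
    return !keys.isEmpty();
  }
}
```

In this sketch, callers never pass or persist a path. The prefix is derived from the ID on every call, which is the property that would make a separate snapshot path field unnecessary.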

bryanlb force-pushed the bburkholder/new-object-storage branch from 664aa34 to 5a396a8 (August 12, 2024 21:29)
bryanlb force-pushed the bburkholder/new-object-storage branch 2 times, most recently from 8085869 to 89b7d25 (August 14, 2024 20:44)
bryanlb force-pushed the bburkholder/new-object-storage branch from 89b7d25 to 04090c5 (August 14, 2024 21:10)
bryanlb linked an issue that may be closed by this pull request (Aug 14, 2024)
bryanlb changed the title from "Bburkholder/new object storage" to "New S3 object storage implementation / ChunkStore" (Aug 14, 2024)
bryanlb force-pushed the bburkholder/new-object-storage branch from 1c5205c to 3d7fae1 (August 20, 2024 17:06)
bryanlb marked this pull request as ready for review (August 20, 2024 18:50)
bryanlb requested a review from kyle-sammons (August 20, 2024 19:17)
bryanlb changed the title from "New S3 object storage implementation / ChunkStore" to "New S3 object storage implementation / BlobStore" (Aug 21, 2024)
bryanlb merged commit ac95ab5 into master (Aug 26, 2024)
2 checks passed
bryanlb deleted the bburkholder/new-object-storage branch (August 26, 2024 17:17)
zarna1parekh pushed a commit to airbnb/kaldb that referenced this pull request Sep 5, 2024
* Initial chunk store implementation

* Remove unnecessary object storage code, part 1

* Add listFiles, delete to chunkStore implementation, remove from blobfs

* Remove use of blobfsutils copyFromS3 test helper

* Remove remaining blobfs implementations

* Remove snapshotpath, index type from snapshots

* Refactoring, add documentation, add tests

---------

Co-authored-by: Bryan Burkholder <[email protected]>
Development

Successfully merging this pull request may close these issues.

Snapshot delete failing to delete snapshots for very large buckets
2 participants