
[META] Generate large data corpora (1 to 10 TB) for the big5 workload #490

Open
5 of 6 tasks
gkamat opened this issue Mar 26, 2024 · 0 comments
Assignees
Labels
enhancement New feature or request

Comments

gkamat (Collaborator) commented Mar 26, 2024

Description

The big5 workload is available with two sizes of data corpora, 60 GB and 100 GB. The latter features a more representative timestamp sequence. Larger data corpora would be appropriate for performance testing at scale. This issue tracks the generation of such larger corpora.

Initially, a 1 TB corpus will be generated and tested out. OSB scaling and stability will also be relevant in this context. Once a corpus of this size can be used effectively, larger corpora, up to 10 TB in size, perhaps spanning multiple indices, will be tackled.
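One way to build a larger corpus while preserving a representative timestamp sequence is to cycle over a smaller seed corpus and shift each copy's timestamp forward so the sequence stays monotonically increasing. The sketch below is a minimal illustration of that idea only; `inflate_corpus`, the `@timestamp` field name, and the one-second step are assumptions for illustration, not the actual tooling used for the big5 corpora.

```python
from datetime import datetime, timedelta

def inflate_corpus(docs, target_count, ts_field="@timestamp",
                   step=timedelta(seconds=1)):
    """Produce `target_count` documents by cycling over `docs`,
    shifting each copy's timestamp forward so the resulting
    sequence is monotonically non-decreasing."""
    out = []
    n = len(docs)
    for i in range(target_count):
        doc = dict(docs[i % n])  # shallow copy of the seed document
        base = datetime.fromisoformat(doc[ts_field])
        doc[ts_field] = (base + step * i).isoformat()
        out.append(doc)
    return out

# Tiny seed corpus; a real run would stream documents to disk
# until the target byte size (e.g. 1 TB) is reached.
seed = [
    {"@timestamp": "2024-03-26T00:00:00", "message": "login"},
    {"@timestamp": "2024-03-26T00:00:00", "message": "logout"},
]
corpus = inflate_corpus(seed, 5)
```

In practice the loop would write newline-delimited JSON in chunks and track cumulative bytes rather than a document count, but the timestamp-shifting scheme is the same.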

Task Breakdown

@gkamat gkamat added the enhancement New feature or request label Mar 26, 2024
@gkamat gkamat removed the untriaged label Mar 27, 2024
@gkamat gkamat self-assigned this Apr 10, 2024
@gkamat gkamat changed the title Generate 500 GB and 1 TB data corpora for the big5 workload [META] Generate large data corpora (1 to 10 TB) for the big5 workload May 24, 2024
Projects
Status: This Quarter
Status: In Progress
Development

No branches or pull requests

1 participant