Make datasets more accessible #241

MrPowers · 2021-12-29T11:53:54Z

Thanks for the excellent work on this project.

I'd like to experiment with the datasets but would rather not generate them myself. I've never used R and don't really want to learn it at the moment. I'm more interested in questions like whether using broadcast joins would materially impact the Spark benchmarks.

Can you provide downloadable data files? Or can you make the files accessible on S3? I'm making important data files accessible to the community in an S3 bucket, so I'd also be happy to upload them there if that would help.

Thanks again for building / maintaining this project. Hope I'll be able to contribute!

ncclementi · 2022-03-23T23:06:24Z

Hi there, checking in here: is there any update on making the data files available in an S3 bucket? I'd really appreciate it, especially for the 1e9 case, which seems to be problematic to generate (see #110).

Thank you
cc: @jangorecki

MrPowers · 2022-03-24T09:39:28Z

We could make the 50 GB dataset accessible in S3 as multiple gzipped files that users could download and reassemble on their local machines. That would let users download the parts from S3 in parallel and avoid the problems of handling one massive file. Thoughts @jangorecki / @ncclementi?
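To make the split-and-reassemble idea concrete, here is a minimal sketch using only the Python standard library. It assumes the dataset is a single local file; the function names `split_and_gzip` and `reassemble`, the part naming scheme, and the part size are all hypothetical, and the S3 upload/download step is not shown. Because each part is an independent gzip file, the parts could be fetched concurrently and then decompressed in order.

```python
from __future__ import annotations

import gzip
import shutil
from pathlib import Path

CHUNK_BYTES = 64 * 1024  # buffer size used when streaming data


def split_and_gzip(src: Path, out_dir: Path, part_size: int) -> list[Path]:
    """Split `src` into consecutive byte ranges of at most `part_size`
    bytes and write each range as its own gzip file (hypothetical helper)."""
    out_dir.mkdir(parents=True, exist_ok=True)
    parts: list[Path] = []
    with open(src, "rb") as f:
        index = 0
        while True:
            # Zero-padded index so lexicographic order matches byte order.
            part_path = out_dir / f"{src.name}.part{index:04d}.gz"
            remaining = part_size
            wrote_any = False
            with gzip.open(part_path, "wb") as out:
                while remaining > 0:
                    buf = f.read(min(CHUNK_BYTES, remaining))
                    if not buf:
                        break
                    out.write(buf)
                    remaining -= len(buf)
                    wrote_any = True
            if not wrote_any:
                # Reached end of input: drop the empty trailing part.
                part_path.unlink()
                break
            parts.append(part_path)
            index += 1
    return parts


def reassemble(parts: list[Path], dest: Path) -> None:
    """Decompress the parts in index order and concatenate them into `dest`."""
    with open(dest, "wb") as out:
        for part in sorted(parts):
            with gzip.open(part, "rb") as inp:
                shutil.copyfileobj(inp, out, CHUNK_BYTES)
```

Splitting on raw byte boundaries keeps the sketch simple but means a part can end mid-row; only the reassembled file is meant to be read as CSV.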

jangorecki · 2022-03-24T12:37:51Z

Hi, you need to contact h2o support. I am no longer the maintainer of the project.

MrPowers · 2022-03-24T13:51:39Z

ok @jangorecki, will do. Thanks for your great contributions on this project.
