Make datasets more accessible #241
Comments
Hi there, checking in here. Is there any update on making the data files available in an S3 bucket? I'd really appreciate it, especially for the 1e9 case, which seems to be problematic to create (see #110). Thank you. |
We could make the 50 GB file accessible in S3 as multiple gzipped parts that users could download and reassemble on their local machines. That'd let users download the parts in parallel from S3 and limit the massive-file problem. Thoughts @jangorecki / @ncclementi? |
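To make the split-and-reassemble idea concrete, here is a minimal sketch in Python using only the standard library. The function names (`split_gzip`, `reassemble`), the part-naming scheme, and the chunk size are hypothetical choices for illustration; nothing in this thread specifies how the files would actually be split or hosted, and the upload to and download from S3 are left out.

```python
import gzip
import shutil
from pathlib import Path


def split_gzip(src: Path, out_dir: Path, chunk_bytes: int) -> list[Path]:
    """Split src into independently gzipped parts of at most chunk_bytes
    (uncompressed) each. The part names are a hypothetical convention."""
    out_dir.mkdir(parents=True, exist_ok=True)
    parts = []
    with src.open("rb") as f:
        index = 0
        while True:
            # Each chunk is held in memory, so pick chunk_bytes accordingly.
            chunk = f.read(chunk_bytes)
            if not chunk:
                break
            # Zero-padded index keeps lexicographic order equal to part order.
            part = out_dir / f"{src.name}.part{index:05d}.gz"
            with gzip.open(part, "wb") as g:
                g.write(chunk)
            parts.append(part)
            index += 1
    return parts


def reassemble(parts: list[Path], dest: Path) -> None:
    """Decompress the parts in name order and concatenate them into dest."""
    with dest.open("wb") as out:
        for part in sorted(parts):
            with gzip.open(part, "rb") as g:
                shutil.copyfileobj(g, out)
```

Because each part is a complete gzip file on its own, the parts can be fetched in any order (or concurrently) and only need to be sorted by name before reassembly. Note that splitting at byte offsets can cut a CSV row across two parts, so the parts are only usable after being reassembled.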
Hi, you need to contact h2o support. I am no longer a maintainer of the project. |
ok @jangorecki, will do. Thanks for your great contributions on this project. |
Thanks for the excellent work on this project.
I'd like to experiment with the datasets and would rather not have to generate them myself. I've never used R and don't really want to learn it at the moment. I'm more interested in questions such as whether using broadcast joins would materially affect the Spark benchmarks.
Can you provide downloadable data files? Or can you make the files accessible on S3? I'm making important data files accessible to the community in an S3 bucket, so I'd also be happy to upload them there if that'd help.
Thanks again for building / maintaining this project. Hope I'll be able to contribute!