Download all datasets contained in all R-packages #185
I asked on Twitter whether there are ways to do this without having to install the packages. This is the best answer I got: Seems pretty promising 😃
Thanks Heidi!

```r
devtools::install_github("gaborcsardi/gh")
library(gh)
library(BBmisc)  # provides catf()
# gh() appends named arguments of a GET request to the query string
repos = gh("GET /search/code", q = "user:cran extension:rda")
# note: total_count is the number of matching files, not repositories
catf("#Repos: %i", repos$total_count)
```

This way we can download only the rda files, e.g., via

Another point: why should the crawler avoid downloading all packages? Is it because of time and memory? We can simply download each package, extract the data sets, upload them to OpenML and remove the package afterwards. Time is not an issue: the crawler does not need to be fast.
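A minimal sketch of the "download each package, extract the data sets, remove the package afterwards" idea, assuming CRAN source tarballs whose data sets are stored as `.rda`/`.RData` files under `data/` (packages can also ship `.R`, `.csv` or `.txt` data files, which this ignores). `extract_package_datasets()` is a hypothetical helper, not part of any existing package; it uses only base R (`utils::untar()`, `load()`) and deletes the unpacked files via `on.exit()`.

```r
# Hypothetical helper: unpack a package source tarball into a temporary
# directory, load every .rda/.RData file from its data/ folder, and delete
# the unpacked files again when the function exits.
extract_package_datasets <- function(tarball) {
  exdir <- tempfile("pkg_")
  dir.create(exdir)
  on.exit(unlink(exdir, recursive = TRUE), add = TRUE)  # remove the package afterwards
  utils::untar(tarball, exdir = exdir)
  # source tarballs unpack to <pkgname>/data/...
  rda_files <- list.files(exdir, pattern = "\\.(rda|rdata)$", recursive = TRUE,
                          full.names = TRUE, ignore.case = TRUE)
  rda_files <- rda_files[basename(dirname(rda_files)) == "data"]
  datasets <- list()
  for (f in rda_files) {
    env <- new.env()
    objs <- load(f, envir = env)  # load() returns the names of the restored objects
    for (nm in objs) datasets[[nm]] <- get(nm, envir = env)
  }
  datasets
}

# Usage (needs network access; the CRAN mirror URL is an assumption):
# tb <- utils::download.packages("MASS", destdir = tempdir(), type = "source",
#                                repos = "https://cloud.r-project.org")[1, 2]
# names(extract_package_datasets(tb))
```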
Started working on a crawler that operates on the GitHub CRAN mirror repositories and reads 1) the data itself and 2) metadata from the corresponding Rd file. Works well so far; I just need to parallelize things and handle potential errors.
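For the metadata part, one way to read an Rd file is `tools::parse_Rd()`, which returns a list whose top-level elements carry an `Rd_tag` attribute such as `\name` or `\title`. The helpers `rd_field()` and `rd_metadata()` below are hypothetical names for this sketch; they flatten nested markup (e.g. `\emph{}`) to plain text and assume the Rd file has already been extracted to disk.

```r
# Hypothetical helper: return the plain text of the first top-level Rd
# section with the given tag (e.g. "\\title"), or NA if it is missing.
rd_field <- function(rd, tag) {
  for (el in rd) {
    if (identical(attr(el, "Rd_tag"), tag)) {
      txt <- paste(unlist(el), collapse = "")  # drop nested markup, keep text
      return(trimws(gsub("\\s+", " ", txt)))   # normalise whitespace
    }
  }
  NA_character_
}

# Hypothetical helper: collect the fields a data set upload might need.
rd_metadata <- function(rd_file) {
  rd <- tools::parse_Rd(rd_file)
  list(name        = rd_field(rd, "\\name"),
       title       = rd_field(rd, "\\title"),
       description = rd_field(rd, "\\description"),
       format      = rd_field(rd, "\\format"))
}
```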
@jakobbossek I'd love to see the results of your crawler/experiment. Did you publish it?
A huge collection of R datasets can be found here: http://vincentarelbundock.github.io/Rdatasets/datasets.html
Closing, as this should not be part of the R package. It's a separate project, i.e., writing a bot that crawls data sets and uploads them to OpenML.
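Such a bot mostly needs a driver loop that tolerates failures, since some packages will fail to download or parse. A sketch, assuming a hypothetical `process_package()` callback that does the per-package work (download, extract, upload to OpenML, clean up); nothing here talks to OpenML itself. Errors are caught per package with `tryCatch()`, and packages run in parallel via `parallel::mclapply()`, which relies on forking and therefore falls back to one core on Windows.

```r
library(parallel)

# Hypothetical driver: apply process_package() to every package, record
# failures instead of aborting the whole run, and keep results by name.
crawl_packages <- function(pkgs, process_package,
                           cores = if (.Platform$OS.type == "windows") 1L else 2L) {
  results <- mclapply(pkgs, function(p) {
    tryCatch(
      list(package = p, ok = TRUE, value = process_package(p), error = NA_character_),
      error = function(e)
        list(package = p, ok = FALSE, value = NULL, error = conditionMessage(e))
    )
  }, mc.cores = cores)
  names(results) <- pkgs
  results
}
```

Failed packages can then be collected with `Filter(function(r) !r$ok, results)` and retried in a later run.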
We can do something like (ugly code) and then upload everything