Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Downloads refactor - use a common downloader across all APIs #87

Merged
merged 1 commit into from
May 1, 2016

Conversation

tarrow
Copy link
Contributor

@tarrow tarrow commented Mar 22, 2016

Refactor code to put common downloader code all together.

Downloads are still async but now have 10 workers rather than as many workers as there are downloads.

Also fixed an area where I could see a possible bug occuring (but didn't see in the wild) where the ids for arxiv files were found independent of the pdf and supplementary files urls which means that if a pdf (or supp files) url was missing then paper would be put in a folder with the wrong id.

We handle retries within each download not in bulk at the end. This could be adapted if we want.

@blahah blahah added the WIP label Mar 29, 2016
@blahah blahah changed the title Updateurldownloader Update url downloader Mar 29, 2016
@blahah blahah changed the title Update url downloader Downloads refactor - use a common downloader across all APIs Mar 29, 2016
@blahah
Copy link
Member

blahah commented Apr 8, 2016

@tarrow how close is this to being ready to review?

@tarrow tarrow force-pushed the updateurldownloader branch from 6328361 to d976126 Compare April 8, 2016 18:14
@blahah blahah added RTR and removed WIP labels Apr 8, 2016
@tarrow
Copy link
Contributor Author

tarrow commented Apr 12, 2016

Any thoughts on this yet? I think the ability to download an arbitrary number of papers without it timing out is quite a good thing to get working. Please let me know if I can improve any bits of it :)

This was referenced Apr 24, 2016
@blahah blahah merged commit d6052ba into ContentMine:master May 1, 2016
@blahah
Copy link
Member

blahah commented May 1, 2016

All looks good @tarrow! We can refine later, but for now let's just :shipit:

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants