Support multiple SDF files #16

Open
stanstrup opened this issue Nov 4, 2017 · 4 comments

Comments

@stanstrup
Collaborator

For example, for PubChem, whose compound data is distributed as many separate SDF files.

Processing the files in parallel (e.g. with pbapply) would be nice.

See also #1 (comment)

@stanstrup
Collaborator Author

If compound_tbl_sdf were internal to createCompDb (so you'd always call createCompDb directly), you could append each file's compounds to the SQLite file as you go, instead of holding the compounds of all files in memory at once. This is what I did in my approach for PubChem.
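A minimal sketch of the append-as-you-go idea, written in Python with the standard sqlite3 module as a stand-in for R/RSQLite (it is not CompoundDb code). The SDF reader only handles "> <TAG>" data items, and the function names, the compound table, and the default tags ID and NAME are hypothetical; for PubChem files one would pass e.g. id_tag="PUBCHEM_COMPOUND_CID".

```python
import sqlite3
from typing import Dict, Iterable, Iterator


def iter_sdf_records(path: str) -> Iterator[Dict[str, str]]:
    """Yield one dict per SDF record, mapping each '> <TAG>' data item to its value.

    Minimal reader (hypothetical helper): the molfile block is skipped, and a
    record is only emitted when its closing '$$$$' line is reached.
    """
    fields: Dict[str, str] = {}
    tag = None
    with open(path) as fh:
        for raw in fh:
            line = raw.rstrip("\r\n")
            if line.strip() == "$$$$":  # end of the current record
                yield fields
                fields, tag = {}, None
            elif line.startswith(">") and "<" in line:  # data header, e.g. "> <NAME>"
                start = line.index("<") + 1
                tag = line[start:line.index(">", start)]
                fields[tag] = ""
            elif tag is not None:
                if line == "":  # a blank line ends the data item
                    tag = None
                else:  # multi-line values are joined with newlines
                    fields[tag] = (fields[tag] + "\n" + line) if fields[tag] else line


def append_sdf_files(sdf_paths: Iterable[str], db_path: str,
                     id_tag: str = "ID", name_tag: str = "NAME") -> None:
    """Append the compounds of each SDF file to one SQLite table, file by file.

    Records are streamed from disk straight into INSERT statements, so the
    compounds of all files are never held in memory at the same time.
    """
    con = sqlite3.connect(db_path)
    try:
        con.execute("CREATE TABLE IF NOT EXISTS compound (compound_id TEXT, name TEXT)")
        for path in sdf_paths:
            rows = ((rec.get(id_tag), rec.get(name_tag)) for rec in iter_sdf_records(path))
            with con:  # one transaction per input file, committed before the next file
                con.executemany("INSERT INTO compound VALUES (?, ?)", rows)
    finally:
        con.close()
```

Committing once per file keeps each transaction bounded while still avoiding a commit per compound.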

@jorainer
Member

jorainer commented Nov 6, 2017

Note: createCompDb already supports creating a CompDb from multiple input files, and the man page says that you can provide the name(s) of the file(s). I will make this clearer in the help page.
So far I have used lapply to process multiple files; I'll switch to bplapply (from BiocParallel).

jorainer added a commit that referenced this issue Nov 6, 2017
- Mention multi-file support in createCompDb documentation (issue #16).
@jorainer
Member

jorainer commented Nov 6, 2017

OK, I have extended the documentation a little. I also tried to enable parallel processing, but that isn't possible because SQLite (and therefore RSQLite) does not support concurrent writes to the same database file.
I also tried the approaches from https://stackoverflow.com/questions/36831302/parallel-query-of-sqlite-database-in-r and https://www.r-bloggers.com/synchronization-for-r-with-the-flock-package/, but neither helped. So, for now, it's not possible.
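The limitation can be reproduced with a small sketch, assuming Python's standard sqlite3 module behaves like RSQLite here (both sit on the same SQLite library). SQLite allows only one writer per database file at a time, so while one connection holds a write transaction, a second connection asking to write gets "database is locked". The function name and table are hypothetical.

```python
import sqlite3


def second_writer_is_blocked(db_path: str) -> bool:
    """Return True if a second connection cannot start writing while a first
    connection holds an open write transaction on the same SQLite file."""
    # isolation_level=None: we issue BEGIN/COMMIT ourselves.
    # timeout=0: fail immediately instead of waiting for the lock.
    writer_a = sqlite3.connect(db_path, timeout=0, isolation_level=None)
    writer_b = sqlite3.connect(db_path, timeout=0, isolation_level=None)
    try:
        writer_a.execute("CREATE TABLE IF NOT EXISTS compound (compound_id TEXT)")
        writer_a.execute("BEGIN IMMEDIATE")  # A takes the database's write lock
        writer_a.execute("INSERT INTO compound VALUES ('from A')")
        try:
            writer_b.execute("BEGIN IMMEDIATE")  # B asks for the write lock too
        except sqlite3.OperationalError as err:  # expected: "database is locked"
            return "locked" in str(err)
        return False  # only reached if SQLite let B start writing as well
    finally:
        writer_b.close()  # release anything B might hold before A commits
        if writer_a.in_transaction:
            writer_a.execute("COMMIT")  # A finishes its write normally
        writer_a.close()
```

A busy timeout only makes the second writer wait for the lock; the writes are still serialized, which is why parallel workers writing into one file gain nothing.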

@stanstrup
Collaborator Author

Ah yes, I tried exactly the same things. That's why I ended up creating a separate SQLite database for each SDF file in the parallel runs and then building the final SQLite database from those afterwards.
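A minimal sketch of that two-step workaround, in Python with the standard sqlite3 module: a ThreadPoolExecutor stands in for bplapply's workers, and the hypothetical load_rows callable stands in for parsing one SDF file. The names build_part and build_compdb and the compound schema are illustrative, not CompoundDb's API. Each worker writes only to its own file, and a single connection then merges the parts with ATTACH DATABASE.

```python
import os
import sqlite3
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, List, Sequence, Tuple

Row = Tuple[str, str]
LoadRows = Callable[[str], Iterable[Row]]  # hypothetical: parses one input file


def build_part(source: str, part_path: str, load_rows: LoadRows) -> str:
    """Worker step: write the compounds of ONE input into its OWN SQLite file."""
    if os.path.exists(part_path):
        os.remove(part_path)  # start from a clean part file
    con = sqlite3.connect(part_path)  # each worker opens its own connection
    try:
        con.execute("CREATE TABLE compound (compound_id TEXT, name TEXT)")
        with con:
            con.executemany("INSERT INTO compound VALUES (?, ?)", load_rows(source))
    finally:
        con.close()
    return part_path


def build_compdb(sources: Sequence[str], db_path: str, load_rows: LoadRows,
                 workers: int = 4) -> None:
    """Build per-input databases in parallel, then merge them with one writer."""
    part_paths: List[str] = [f"{db_path}.part{i}" for i in range(len(sources))]

    # Step 1 (parallel): no two workers write to the same file, so SQLite's
    # single-writer lock is never contended. list() re-raises worker errors.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        list(pool.map(build_part, sources, part_paths, [load_rows] * len(sources)))

    # Step 2 (serial): one connection copies every part into the final file.
    # Autocommit mode, so no transaction is left open when ATTACH/DETACH run.
    con = sqlite3.connect(db_path, isolation_level=None)
    try:
        con.execute("CREATE TABLE IF NOT EXISTS compound (compound_id TEXT, name TEXT)")
        for part in part_paths:
            con.execute("ATTACH DATABASE ? AS src", (part,))
            con.execute("INSERT INTO compound SELECT compound_id, name FROM src.compound")
            con.execute("DETACH DATABASE src")
            os.remove(part)  # the part file is no longer needed
    finally:
        con.close()
```

The INSERT ... SELECT copies rows inside SQLite, so the merge step does not need to load the compounds into memory either.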


2 participants