Add code/description how to create a CompDb from MassBank #66

Open
jorainer opened this issue Oct 21, 2020 · 7 comments
Comments

@jorainer
Member

MassBank releases their databases at regular intervals and shares the data with a rather open license, which makes them an ideal candidate for annotation databases that could be distributed via Bioconductor's AnnotationHub.

Explanation: I'm building so-called EnsDb databases for all species for each release of Ensembl. These databases are self-contained SQLite files with gene, transcript, exon and protein annotations that can be fetched from AnnotationHub. This is very convenient for the user.

CompDb databases could be distributed in a similar fashion.

Next, I will try to write simple scripts that import data from the MassBank MySQL database into a CompDb database.

@stanstrup
Collaborator

Is there an advantage to this compared to using the SDF from MoNA?

@jorainer
Member Author

I cannot speak to the content. What I like about MassBank is that a) the license is pretty clear, so data can be (re)shared, b) MassBank makes releases, which allows one to "freeze" the data (important for reproducible research), and c) extracting the data directly from their database is easier than importing from text files (SDF and/or JSON).

@jorainer
Member Author

OMG - did not expect that. So, MassBank has one compound entry for each spectrum. Far from being a normalized database :(

@michaelwitting
Collaborator

Yes, and the IDs differ between the different labs. The only common identifier for cross-mapping could be the InChIKey, but I have never tried that so far.

@jorainer
Member Author

The problem is that not all compounds have an InChIKey, which makes it really tricky. Well, for now I will import the data as is.
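
The cross-mapping idea discussed here can be sketched roughly as follows. This is only an illustration in Python, not CompoundDb or MassBank code: the record layout and the field names (`accession`, `name`, `inchikey`) are assumptions made for the example. Spectra that share an InChIKey are collapsed onto one compound, while records without an InChIKey each stay a separate compound, which mirrors importing them as is.

```python
from collections import OrderedDict


def normalize(records):
    """Split per-spectrum records into a compound table and a spectrum table.

    `records` is a list of dicts with the hypothetical keys
    'accession', 'name' and 'inchikey' (None when missing).
    """
    compounds = OrderedDict()  # compound key -> compound row
    spectra = []               # one row per spectrum, pointing at a compound
    for rec in records:
        key = rec.get("inchikey")
        if not key:
            # No InChIKey: cannot cross-map, so keep one compound per spectrum.
            key = "NOKEY:" + rec["accession"]
        if key not in compounds:
            compounds[key] = {
                "compound_id": "CMP%05d" % (len(compounds) + 1),
                "name": rec["name"],
                "inchikey": rec.get("inchikey"),
            }
        spectra.append({
            "spectrum_id": rec["accession"],
            "compound_id": compounds[key]["compound_id"],
        })
    return list(compounds.values()), spectra


# Made-up example records, not real MassBank entries.
records = [
    {"accession": "LAB1-0001", "name": "Compound A", "inchikey": "KEY-A"},
    {"accession": "LAB2-0042", "name": "compound a", "inchikey": "KEY-A"},
    {"accession": "LAB2-0043", "name": "Compound B", "inchikey": None},
]
compounds, spectra = normalize(records)
print(len(compounds), "compounds for", len(spectra), "spectra")
```

The first name seen for a key is kept as the compound name; a real import would probably need a more deliberate rule for reconciling names and other fields that differ between labs.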

@michaelwitting
Collaborator

Do all of them have a SMILES? Then the InChIKey could be calculated with this one: https://github.com/CDK-R/rinchi
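
A rough sketch of that idea, in Python for illustration only: missing InChIKeys are filled in from SMILES before any cross-mapping. The converter is passed in as a function, because the actual calculation would be delegated to a cheminformatics tool such as the rinchi package linked above. `fake_converter` below is a placeholder and not a real InChIKey algorithm, and the field names are assumptions.

```python
def fill_missing_inchikeys(records, smiles_to_inchikey):
    """Return copies of `records` with missing InChIKeys derived from SMILES.

    `smiles_to_inchikey` is any callable that takes a SMILES string and
    returns an InChIKey string (or None if the conversion fails).
    """
    filled = []
    for rec in records:
        rec = dict(rec)  # copy, so the caller's data is not modified
        if not rec.get("inchikey") and rec.get("smiles"):
            rec["inchikey"] = smiles_to_inchikey(rec["smiles"])
        filled.append(rec)
    return filled


def fake_converter(smiles):
    # Placeholder only: a real converter would compute a proper InChIKey.
    return "FAKEKEY-" + smiles


# Made-up example records, not real MassBank entries.
records = [
    {"accession": "X0001", "smiles": "CCO", "inchikey": None},
    {"accession": "X0002", "smiles": "CCO", "inchikey": "EXISTING-KEY"},
]
filled = fill_missing_inchikeys(records, fake_converter)
print([r["inchikey"] for r in filled])
```

Existing keys are left untouched, so only records that lack an InChIKey depend on the quality of the SMILES and of the converter.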

@jorainer
Member Author

Indeed - it seems that all of them have SMILES. Good point - maybe you could chime in here too: MassBank/MassBank-web#266

3 participants