NOT and other search operators in queries #44
Not at this time.
Right now the best way to accomplish that is to hit the get_all_values_for_tag endpoint (example) and then apply your own string filter against the results.
The results from the above endpoint will have the full list of values currently present in entries.
You can accomplish that with the above query as well. Just check that all the values of '_Entity.Polymer_type' in the list for a given ID are all of type 'polyribonucleotide'.
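A minimal sketch of that client-side filter, assuming the get_all_values_for_tag response has already been fetched and decoded into a dict that maps each entry ID to the list of values of the tag in that entry. The response shape, the entry IDs, and the helper names below are assumptions made for illustration, not something confirmed by the API documentation.

```python
from typing import Dict, List


def entries_where_all_match(values_by_entry: Dict[str, List[str]], wanted: str) -> List[str]:
    """Entry IDs whose every value for the tag equals `wanted`."""
    return sorted(
        entry_id
        for entry_id, values in values_by_entry.items()
        if values and all(value == wanted for value in values)
    )


def entries_without(values_by_entry: Dict[str, List[str]], unwanted: str) -> List[str]:
    """Entry IDs with no value equal to `unwanted` (a client-side NOT)."""
    return sorted(
        entry_id
        for entry_id, values in values_by_entry.items()
        if unwanted not in values
    )


# Hypothetical data standing in for the decoded endpoint response.
sample = {
    "1001": ["polyribonucleotide"],
    "1002": ["polyribonucleotide", "polypeptide(L)"],
    "1003": ["polypeptide(L)"],
    "1004": ["polyribonucleotide", "polyribonucleotide"],
}

only_rna = entries_where_all_match(sample, "polyribonucleotide")
no_rna = entries_without(sample, "polyribonucleotide")
```

Entries that have no value at all for the tag would presumably be missing from the response, so they would not show up in either result.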
The enumeration query returns the allowed values we have in the dictionary for that tag. You are correct that for this tag that method would give you all the possible values. For other tags that have non-mandatory enumerations, the tag could potentially have other additional user-specified values, in which case it would be best to get the full list from the
Your questions were very well-researched. I hope my response is helpful; feel free to reach out in the future with other questions or suggestions. Cheers, |
Many thanks, Jon. This helped to clarify the situation. I do have a few more questions. I'll state them here, but please tell me if you prefer one topic per opened issue (labelled as a question) in the future.
Earlier code relied on that information to sync a certain subset of entries to a local repo, but I have found the API to be far quicker and superior in comparison to the other alternatives.
Excuse me if some of these questions are a bit confused. After reading some of the recommended "publications describing the star format" and after browsing the NMR-STAR documentation/schema, I feel that I still haven't fully grasped the NMR-STAR format. I do have an OK understanding of the syntax/structure of a STAR file in general. It is with the introduction of NMR data that I start to have trouble. Do you have any recommended reading? Anyway, once again, thank you for doing this work, Jon. We are benefitting a lot from the BMRB, and hopefully, now also the API! Cheers, Noah |
They can get modified without changing the ID, however that rarely happens to core data (if ever). Database links, for example, get modified periodically by BLAST searches.
In theory: molecular dynamics is currently modeled with multiple assemblies in the same entry. We don't have any data, though, except for one or two handmade test-case entries.
Good question. In the previous version of NMR-STAR assembly was called "molecular system", if that helps.
Multiple experiments on the same assembly are common; the assembly is not modified.
Yes. I'd say it works bottom-up: you have ligands like, say, zinc, and monomers e.g. alanine. Those link into entities, which typically would have a residue sequence. One or more entities form an assembly, e.g. a dimer. OTOH you could describe a mixture of isomers as an assembly, too. |
Okay, that clears up the confusion about Entries and Assemblies.
Good to know. I need the DB links, although the ones found through BLAST might not be of the highest importance. In case I need the last modified info in the end, I assume the best way of accessing it (if it is not accessible through the API) would be to send a request to http://www.bmrb.wisc.edu/ftp/pub/bmrb/entry_lists/nmr-star3.1/? Side note: I get 500: Internal Server Error seemingly randomly right now. Every other request seems to fail. Both from within Python, with or without headers, and in the web browser, on multiple computers. |
Apologies for my delay -
Either is fine.
No. For bulk access of the most recent version of all our entries, it is probably best to use the FTP interface to keep an up-to-date local mirror rather than using the API to see which entries have changed. Running
rsync -av --include='*_3.str' --include='*/' --exclude='*' www.bmrb.wisc.edu::bmrb_entries /tmp/res
will download the NMR-STAR files for all of our entries. Remove the 'include' and 'exclude' arguments to get the full entry directories. Running the command again will synchronize your local directory with any changes that have happened on the server. Does your workflow require that you be notified when certain entries change, or do you just want to stay up to date? I can see that we don't have a great way to support the former, and that is something I could potentially add to the API if needed. Also something to note is that if we do make major changes to an entry (author submits corrections after release, for example), we will add a row to the
We have a publication in process that is an overview of NMR-STAR. I'll update you when it's available.
Thanks for the report! We were doing some maintenance yesterday that caused this issue. It is now resolved. Cheers, *edited to suggest using rsync as per @dmaziuk's suggestion rather than using wget. |
We've been going back and forth on BLAST DB links to some extent: e.g. we used to update the "last queried" tag, but that messes up bulk downloads because it updates every file's timestamp... Anyway, full BLAST results are available in Entry directories are also available via |
Thanks again for your replies.
The latter is fine in my case, no need to be updated on each modification. I only need to re-run and update every week or so, and I only need at most a few hundred entries each time (but often far fewer than that, maybe 2-10 entries). It is important that the DB links are up to date, however. If a previously downloaded entry has been updated I would want to fetch it again on the next update.
Initially I wanted to go the rsync route, but I am also working with something which is supposed to be as platform independent as possible. Trying to make rsync portable, lightweight and platform independent wasn't something I wanted to jump into. As of right now, an "inventory list" with file size and last mod information is created from a request to http://www.bmrb.wisc.edu/ftp/pub/bmrb/entry_lists/nmr-star3.1/, while the rest is done through the API (and locally). |
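As a rough sketch of how such an inventory list could drive the weekly update: compare the snapshot saved on the previous run with a freshly built one, and re-fetch only the wanted entries whose size or last-modified value differs. The FileInfo record, the file names, and the timestamp strings are hypothetical; parsing the directory listing itself is left out because its exact layout is not shown here.

```python
from typing import Dict, Iterable, NamedTuple, Set


class FileInfo(NamedTuple):
    size: int            # file size in bytes, as shown in the listing
    last_modified: str   # timestamp string copied verbatim from the listing


def entries_to_refetch(
    previous: Dict[str, FileInfo],
    current: Dict[str, FileInfo],
    wanted: Iterable[str],
) -> Set[str]:
    """Wanted names that are new locally or changed on the server."""
    stale = set()
    for name in wanted:
        if name not in current:
            continue  # no longer listed (e.g. withdrawn); nothing to fetch
        if previous.get(name) != current[name]:
            stale.add(name)
    return stale


# Hypothetical snapshots from two consecutive runs.
last_week = {
    "bmr1001_3.str": FileInfo(2048, "2019-01-01 10:00"),
    "bmr1002_3.str": FileInfo(4096, "2019-01-01 10:00"),
}
this_week = {
    "bmr1001_3.str": FileInfo(2048, "2019-01-01 10:00"),
    "bmr1002_3.str": FileInfo(4100, "2019-01-08 09:30"),
    "bmr1003_3.str": FileInfo(1024, "2019-01-08 09:30"),
}

stale = entries_to_refetch(
    last_week,
    this_week,
    ["bmr1001_3.str", "bmr1002_3.str", "bmr1003_3.str", "bmr9999_3.str"],
)
```

Only the entries in the returned set would then need to be requested again through the API.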
Hi again, @dmaziuk and @jonwedell. Another question has popped up in our lab. Namely, does the BMRB keep a log of which entries are added, removed and modified? We're specifically looking for that information with regards to the time period 2017-present. If not, do you know personally if there's been a lot of added as well as deleted or updated RNA entries in that period? |
Hey @lucubrator - For our macromolecule entries, we do track when entries are released, withdrawn, and updated with new information from the author in an entry-tracking database. In addition, minor updates, usually related to typo fixes or minor internal changes, can also happen without a record in the release database. Except when trivial, those changes are tracked in the _Release loop present in each entry, which is publicly available. (And to go even further, we track every single change that ever happens to our entries using version control software, though we don't currently offer a way for users to access anything but the most recent version of an entry.) I can say that very few entries are withdrawn (our term for what you refer to as "deleted") and major updates after release are relatively rare. You can see which entries have been withdrawn here: https://bmrb.io/data_library/withdrawn.shtml It is possible to calculate specific figures if that would further your research. If so, please reach out to us at [email protected] with the specific data you want, and one of us can write the appropriate query to provide the data you are looking for. |
Is there a way of putting a NOT operator in a query?
For example, let's say I want to retrieve a list of all entries which do not have any instance of '_Entity.Polymer_type'='polyribonucleotide'. Is it possible to do something like:
search/get_id_by_tag_value/_Entity.Polymer_type/[!]polyribonucleotide
Also, what would be the recommended way of retrieving a list of ALL entries where ALL entities have, let's say, '_Entity.Polymer_type'='polyribonucleotide'? Is there some kind of search operator/switch for that?
If not, how would one get a list of all possible values for the _Entity.Polymer_type tag? I assume you would use the Get tag enumerations (GET) query. I guess one could then take the union of all the sets of entries where _Entity.Polymer_type is what you don't want it to be, and then take the asymmetric difference between (a) this union set and (b) the set of entries which contain at least one entity with 'polyribonucleotide'. Or is there a faster and more efficient way?
I think I have read through the README/docs here, but I might have missed something on http://www.bmrb.wisc.edu. Apologies if that is the case.
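The set-based idea described in the question can be sketched as follows, assuming one get_id_by_tag_value result per tag value has already been collected into a dict that maps each value to the set of entry IDs containing it. The dict layout, the entry IDs, and the function name are hypothetical and only illustrate the union and difference steps.

```python
from typing import Dict, Set


def entries_with_only(ids_by_value: Dict[str, Set[str]], wanted: str) -> Set[str]:
    """Entries that contain `wanted` and no other value of the tag."""
    with_wanted = set(ids_by_value.get(wanted, ()))
    # Union of every entry set that belongs to some other value.
    with_other = set().union(
        *(ids for value, ids in ids_by_value.items() if value != wanted)
    )
    # Set difference: keep entries in (b) that never appear in (a).
    return with_wanted - with_other


# Hypothetical per-value results.
ids_by_value = {
    "polyribonucleotide": {"1001", "1002", "1004"},
    "polypeptide(L)": {"1002", "1003"},
}

pure_rna = entries_with_only(ids_by_value, "polyribonucleotide")

# The NOT case: every known entry minus those with the unwanted value.
all_ids = set().union(*ids_by_value.values())
not_rna = all_ids - ids_by_value["polyribonucleotide"]
```

This needs one request per tag value, so for tags with many distinct values a single request for all values followed by local filtering is likely to be cheaper.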