Refactor metaproteomics aggregation script #27
@aclum - I'd appreciate any early feedback you have on the general approach; feel free to tag anyone else who should have eyes on this. I'm not sure exactly how we'll be able to set the API bearer tokens as environment variables, but that's how I've been doing development (I updated the README to describe which environment variables we'll need).
@aclum, thanks for the input. I've rewritten the script to include a call that gets a bearer token using the API username and password (set as environment variables). I've also pulled the functions we can reuse across the classes out into an abstract class. I'll leave this in draft until the next release, since it depends on the migrated database and the next schema release.
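For reference, a minimal sketch of the token step described above, not the script's actual code. It assumes a password-grant form post to a `/token` path that returns JSON with an `access_token` field; the environment variable names `NMDC_API_USERNAME` and `NMDC_API_PASSWORD` and the function names are hypothetical.

```python
import json
import os
import urllib.parse
import urllib.request


def credentials_from_env(environ=os.environ):
    """Read the API username and password from (hypothetical) variable names."""
    try:
        return environ["NMDC_API_USERNAME"], environ["NMDC_API_PASSWORD"]
    except KeyError as missing:
        raise RuntimeError(f"Missing environment variable: {missing}") from missing


def get_bearer_token(base_url, username, password, opener=urllib.request.urlopen):
    """Exchange a username and password for a bearer token.

    The "/token" path, the form fields, and the "access_token" response key
    are assumptions made for this sketch.
    """
    form = urllib.parse.urlencode(
        {"grant_type": "password", "username": username, "password": password}
    ).encode("utf-8")
    request = urllib.request.Request(
        base_url.rstrip("/") + "/token", data=form, method="POST"
    )
    # The opener is injectable so the function can be exercised without a network.
    with opener(request) as response:
        payload = json.loads(response.read().decode("utf-8"))
    return payload["access_token"]
```

Keeping the credentials in the environment and requesting the token at run time means no token has to be stored in the repository or pasted in by hand before each run.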
@picowatt I'll let you know when this is ready for review - I'm going to incorporate Alicia's comments and test this after the new release first.
With my updated permissions (thanks @eecavanna), I checked that the json:submit endpoint is working as expected. I loaded a single metaP's annotations to dev mongo's @aclum - is there a server/data portal issue to make sure the functional searches/ingests are expecting MetaProteomics records in the
No, there is not a corresponding nmdc-server ticket yet. Would you please make one, @kheal? @eecavanna @dwinston, what is the maximum payload json:submit can handle? @kheal, what is the maximum expected length of the aggregation results?
I wrote the script so that json:submit only submits one workflow's aggregation results at a time to avoid payload issues, though I haven't tested the full lot yet. I can run the whole script locally overnight to write into dev mongo as a test.
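A rough sketch of the one-workflow-per-submission idea, under stated assumptions: the record key `workflow_id`, the collection name `aggregation_collection`, and the `submit` callable are all placeholders for illustration, not names taken from the script or the schema.

```python
from collections import defaultdict


def group_by_workflow(records, key="workflow_id"):
    """Group aggregation records by the workflow that produced them."""
    grouped = defaultdict(list)
    for record in records:
        grouped[record[key]].append(record)
    return dict(grouped)


def submit_per_workflow(records, submit, collection="aggregation_collection"):
    """Call `submit` once per workflow so each payload stays small.

    `submit` is any callable that accepts a dict of the form
    {collection_name: [records]}; here it stands in for the json:submit call.
    Returns the workflow ids in the order they were submitted.
    """
    submitted = []
    for workflow_id, batch in group_by_workflow(records).items():
        submit({collection: batch})
        submitted.append(workflow_id)
    return submitted
```

With this shape, the largest payload is bounded by the largest single workflow rather than by the size of the whole run, at the cost of one request per workflow.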
Associated ticket filed here: microbiomedata/nmdc-server#1468
Moving this back into draft. Testing revealed that the first API call is exceptionally slow; I'm attempting to fix that now.
This is a temporary fix
I've implemented a partial fix for this that will likely not work for future subclasses of the
I tested the full run of the |
I think this looks reasonable (my comments are minor and can be ignored if they don't apply).
Also - Is there a way to test this code to make sure it works?
@shreddd I'll add your suggested logging options tomorrow, thanks for that helpful feedback. I've tested the script locally and it successfully loaded the records into the dev mongo. When I reran it, the script did nothing (as expected).
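One way the "second run does nothing" behavior could be checked in a unit test is sketched below. This is an illustration only: `run_aggregation`, the `existing` set, and the `aggregate` callable are hypothetical stand-ins, and the real script may decide what to skip differently.

```python
def run_aggregation(workflow_ids, existing, aggregate):
    """Aggregate only the workflows that have no stored results yet.

    `existing` is a set of workflow ids that already have aggregation
    records; it is updated in place. `aggregate` is a callable that does
    the work for one workflow id. Returns the ids processed in this run.
    """
    processed = []
    for workflow_id in workflow_ids:
        if workflow_id in existing:
            continue  # already aggregated, so a rerun skips it
        aggregate(workflow_id)
        existing.add(workflow_id)
        processed.append(workflow_id)
    return processed
```

Calling the function twice with the same inputs and asserting that the second call returns an empty list gives a repeatable version of the manual rerun check.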
From <https://www.mongodb.com/docs/v6.0/reference/operator/query/regex/#index-use>:
> Further optimization can occur if the regular expression is a "prefix expression", which means that all potential matches start with the same string. This allows MongoDB to construct a "range" from that prefix and only match against those values from the index that fall within that range.
>
> A regular expression is a "prefix expression" if it starts with a caret (^) or a left anchor (\A), followed by a string of simple symbols. For example, the regex /^abc.*/ will be optimized by matching only against the values from the index that start with abc.
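As an illustration of the quoted guidance, a filter document with an anchored prefix pattern might be built like this. The helper name `prefix_filter`, the field name `id`, and the example prefix `nmdc:wfmp` are assumptions for the sketch; it only constructs the filter and does not verify how MongoDB plans the query.

```python
import re


def prefix_filter(field, prefix):
    """Build a MongoDB-style filter matching values that start with `prefix`.

    The pattern is anchored with "^" so it has the "prefix expression" shape
    described in the MongoDB documentation. The prefix is escaped so that
    any regex metacharacters in it are matched literally.
    """
    return {field: {"$regex": "^" + re.escape(prefix)}}


# Hypothetical field name and id prefix, used only for illustration.
query = prefix_filter("id", "nmdc:wfmp")
```

The same dictionary can be passed as the filter argument of a find call. An unanchored pattern such as `wfmp` would not have the prefix-expression shape, so the range optimization described in the quote would not apply to it.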
This PR will refactor the `generate_metap_agg.py` script to address #26.
Overall, the `generate_metap_agg.py` has been refactored to
Will not be ready for release until microbiomedata/nmdc-schema#2203 has been merged in (done).