API endpoint /nmdcschema/{collection_name}'s next_page_token doesn't yield next page's results
#806
Comments
@dwinston @eecavanna - not sure if I've assigned the right folks, just wanted to make sure this was going to get the attention of the appropriate people. |
Testing this with a different collection and I'm seeing the same results. Homing in a bit, I don't think the next_page_token is actually giving the next page's results in this endpoint.
The first request and the follow-up request that passes the returned token return the exact same results. Hopefully this helps narrow down the bug. I've updated the title to match. |
Thanks for reporting this, and for doing so this thoroughly. This does seem like a bug to me. I think you have assigned the right people, given our current procedures. In the future, the procedure might change to: assign it to me and I'd do an initial assessment and pull in others as needed, so they can otherwise focus on whatever they've committed to for the sprint; but I'm not aware of that being the standard procedure yet (just a potential one, based upon some recent conversations). Two things:
I'll remove @dwinston as an assignee for now and will reassign him with commentary after I do some digging. |
Hi @kheal, I think the issue is that the endpoint is expecting the URL query parameter to be named page_token. I see it being named next_page_token in the reproduction script:
while True:
    i = i + max_page_size
    print(str(i) + " records processed")
    url = f"https://api-dev.microbiomedata.org/nmdcschema/{collection}?&filter={filter}&max_page_size={max_page_size}&next_page_token={next_page_token}&projection={fields}"
I assume the endpoint is not seeing that token and so is always returning the first page of results (and the same next_page_token). Will you retry with the parameter being named page_token?
- url = f"https://api-dev.microbiomedata.org/nmdcschema/{collection}?&filter={filter}&max_page_size={max_page_size}&next_page_token={next_page_token}&projection={fields}"
+ url = f"https://api-dev.microbiomedata.org/nmdcschema/{collection}?&filter={filter}&max_page_size={max_page_size}&page_token={next_page_token}&projection={fields}" |
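As an aside for readers hitting the same problem, here is a minimal sketch of building the request URL so that the token from the previous response is sent as page_token. The helper name build_page_url and its arguments are hypothetical and not part of the reporter's script; the base URL and parameter names are the ones quoted in this thread.

```python
import json
from urllib.parse import urlencode

# Base URL as quoted in this thread (dev instance).
BASE_URL = "https://api-dev.microbiomedata.org/nmdcschema"


def build_page_url(collection, filter_doc, max_page_size, page_token=None, projection=""):
    """Build the URL for one page of results (hypothetical helper).

    The token taken from the previous response's "next_page_token" field
    is sent back under the query parameter name "page_token".
    """
    params = {
        # Compact JSON so the encoded filter has no stray spaces.
        "filter": json.dumps(filter_doc, separators=(",", ":")),
        "max_page_size": max_page_size,
        "projection": projection,
    }
    if page_token:
        params["page_token"] = page_token
    # urlencode percent-encodes the values, so no manual escaping is needed.
    return f"{BASE_URL}/{collection}?{urlencode(params)}"
```

Letting urlencode handle the escaping also avoids hand-encoding the JSON filter, which is easy to get wrong in an f-string.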
I confirmed that, when I name the query parameter page_token, the endpoint returns a different page of results for each token:
$ curl -X GET 'https://api-dev.microbiomedata.org/nmdcschema/workflow_execution_set?&filter=%7B%22type%22%3A%22nmdc%3AMetaproteomicsAnalysis%22%7D&max_page_size=10&projection='
{"resources":[{"id":"nmdc:wfmp-11-1ky3j817.1", ...}, ...],"next_page_token":"nmdc:sys0dyggry73"}%
$ curl -X GET 'https://api-dev.microbiomedata.org/nmdcschema/workflow_execution_set?&filter=%7B%22type%22%3A%22nmdc%3AMetaproteomicsAnalysis%22%7D&max_page_size=10&projection=&page_token=nmdc:sys0dyggry73'
{"resources":[{"id":"nmdc:wfmp-11-6n1xme37.1", ...}, ...],"next_page_token":"nmdc:sys0jjz9ym68"}%
$ curl -X GET 'https://api-dev.microbiomedata.org/nmdcschema/workflow_execution_set?&filter=%7B%22type%22%3A%22nmdc%3AMetaproteomicsAnalysis%22%7D&max_page_size=10&projection=&page_token=nmdc:sys0jjz9ym68'
{"resources":[{"id":"nmdc:wfmp-11-f6sfn088.1", ...}, ...],"next_page_token":"nmdc:sys06714m695"}% |
User error is my favorite type of bug - sorry for the unnecessary chatter. I'll test and close the issue after fixing the query filter. Maybe the real issue is that there is no feedback for sending incorrect parameter keys. |
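To illustrate the kind of feedback mentioned above, here is a minimal sketch of a check that flags unrecognized query parameter keys. It is not how the API behaves or is implemented; ALLOWED_PARAMS is an assumption based only on the parameters that appear in this thread, and unknown_query_params is a hypothetical helper.

```python
from urllib.parse import parse_qs, urlsplit

# Assumption: only the parameters seen in this thread; the real endpoint
# may accept others.
ALLOWED_PARAMS = {"filter", "max_page_size", "page_token", "projection"}


def unknown_query_params(url, allowed=ALLOWED_PARAMS):
    """Return the sorted query parameter names in `url` that are not in `allowed`."""
    query = urlsplit(url).query
    # keep_blank_values=True so that keys like "projection=" are still seen.
    names = parse_qs(query, keep_blank_values=True).keys()
    return sorted(set(names) - set(allowed))
```

A client-side script could call this before sending a request, or a server could run a similar check and respond with an error instead of silently ignoring the unknown key.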
Thanks for including the reproduction info! I think that really reduced the time involved in identifying the root cause.
I'm on my way to a meeting from 2-3pm. I'll file a ticket about that after the meeting in case one doesn't exist by then. |
Describe the bug
When using pagination in the "/nmdcschema/{collection_name}" endpoint (with the functional_annotation_agg collection), the results always contain a next_page_token, causing an infinite loop in fetching scripts.
To Reproduce
Expected behavior
With max_page_size = 0, we fetch 25118 records found in 0.51 seconds. While pagination may take longer, I expect to retrieve the same number of records before the next_page_token is null and the loop breaks. Instead, if max_page_size = 5000, the loop never seems to break (I got it up to >100,000 records before recording this bug).
While this isn't an immediate blocker, we need some mechanism to fetch these records with pagination to sustain the aggregator.
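The expected loop behavior described above can be sketched as follows. This is not the original reproduction script: fetch_page is a hypothetical callable that performs one request and returns the parsed JSON body, and the repeated-token guard is an added safety measure so that a stuck pagination raises an error instead of looping forever.

```python
def fetch_all(fetch_page):
    """Collect the records from every page.

    `fetch_page(token)` is a hypothetical callable: it requests one page
    (token is None for the first page) and returns the parsed JSON body,
    a dict with "resources" and, when more pages exist, "next_page_token".
    """
    records = []
    seen_tokens = set()
    token = None
    while True:
        body = fetch_page(token)
        records.extend(body.get("resources", []))
        token = body.get("next_page_token")
        if not token:
            break  # no token (missing or null): this was the last page
        if token in seen_tokens:
            # The same token came back twice, so the pages are not advancing.
            raise RuntimeError(f"page token {token!r} repeated; pagination is not advancing")
        seen_tokens.add(token)
    return records
```

With a guard like this, the behavior reported in this issue would surface on the second request rather than after more than 100,000 duplicate records.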
Acceptance Criteria
functional_annotation_agg endpoint should result in the same number of records as non-paginated results.
Additional context
We use this API call for gathering the IDs of the previously-aggregated workflows (see microbiomedata/nmdc-aggregator#27)