Transcripts end point broken when using a replica set #557

Open
julie-sullivan opened this issue Jun 14, 2021 · 2 comments

@julie-sullivan
Contributor

CellBase (for transcripts only) sorts after pagination. The sort must happen before SKIP and LIMIT are applied. If a replica set is present, the query results will be incorrect.
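
To see why the order matters, here is a minimal sketch in plain Python (no MongoDB involved). It assumes two replica set members return the same matching documents in a different natural order; the transcript IDs and the helper names are made up for illustration.

```python
def paginate_then_sort(docs, skip, limit):
    """Buggy order: slice the page first, then sort only that page."""
    page = docs[skip:skip + limit]
    return sorted(page, key=lambda d: d["id"])


def sort_then_paginate(docs, skip, limit):
    """Intended order: sort the full result set, then slice the page."""
    ordered = sorted(docs, key=lambda d: d["id"])
    return ordered[skip:skip + limit]


# Hypothetical data: same documents, different natural order per member.
member_a = [{"id": "T3"}, {"id": "T1"}, {"id": "T4"}, {"id": "T2"}]
member_b = [{"id": "T2"}, {"id": "T4"}, {"id": "T1"}, {"id": "T3"}]

# Sorting after pagination: the page depends on which member answered.
buggy_a = paginate_then_sort(member_a, skip=0, limit=2)
buggy_b = paginate_then_sort(member_b, skip=0, limit=2)

# Sorting before pagination: both members yield the same page.
fixed_a = sort_then_paginate(member_a, skip=0, limit=2)
fixed_b = sort_then_paginate(member_b, skip=0, limit=2)
```

In this toy case the buggy variant returns different first pages for the two members, while the sort-first variant returns the same page from both.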

In the database adapter I did this:

// Get the SORT from the query options.
Bson sort = MongoDBQueryUtils.getSort(options);
// Remove SORT from the options so we don't sort twice.
options.remove(QueryOptions.SORT);

aggregateList.add(match);
aggregateList.add(sort);    // add SORT here, right after the match on genes
aggregateList.add(unwind);
aggregateList.add(match2);
aggregateList.add(excludeAndInclude);
aggregateList.add(project);

I also tried sorting after the projection. The output files were the same, but they were truncated because the SORT failed:

Sort exceeded memory limit of 104857600 bytes, but did not opt in to external sorting.

You can opt in to external sorting: https://docs.mongodb.com/manual/reference/command/aggregate/#std-label-aggregate-cmd-allowDiskUse
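
For reference, a hedged sketch of what opting in looks like at the command level. It builds the `aggregate` command document as a plain Python dict and does not talk to a server. The collection name `gene`, the biotype value, and the page sizes are assumptions for illustration, and the stages are a simplified stand-in for the adapter's real pipeline. With PyMongo, the equivalent is passing `allowDiskUse=True` to `Collection.aggregate`.

```python
def build_aggregate_command(collection, biotypes, skip, limit):
    """Build an `aggregate` command that sorts before paginating."""
    biotype_filter = {"transcripts.biotype": {"$in": biotypes}}
    pipeline = [
        {"$match": biotype_filter},   # gene-level match
        {"$sort": {"id": 1}},         # sort before $skip/$limit
        {"$unwind": "$transcripts"},
        {"$match": biotype_filter},   # transcript-level match
        {"$skip": skip},
        {"$limit": limit},
    ]
    return {
        "aggregate": collection,
        "pipeline": pipeline,
        "allowDiskUse": True,  # let a large sort spill to disk
        "cursor": {},
    }


# Hypothetical arguments, for illustration only.
command = build_aggregate_command("gene", ["protein_coding"], skip=0, limit=10)
stage_names = [next(iter(stage)) for stage in command["pipeline"]]
```

Note that `allowDiskUse` only avoids the memory-limit error; it does not make the sort cheap.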

Going to test this.

julie-sullivan self-assigned this Jun 25, 2021
@julie-sullivan
Contributor Author

cellbase_transcript_client.search(
    biotype=self._relevant_biotypes,
    include=self.CELLBASE_TRANSCRIPT_QUERY_INCLUDE,
    assembly=self._assembly,
    annotationFlags=InterpretationProcess.GENE_CODE_BASIC_TRANSCRIPT_SET,
    sort='id',
)

If you run this script more than once, you will get different results.
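
A small, hedged sketch of how the non-determinism could be checked automatically. `is_deterministic` takes any zero-argument callable that returns a list of transcript IDs; in practice that callable would wrap the `search` call above and extract the IDs from its response. The fake fetchers below are hypothetical stand-ins so the example runs without a CellBase server.

```python
from itertools import cycle


def is_deterministic(fetch_ids, runs=3):
    """Return True if repeated calls to fetch_ids() give identical lists."""
    first = fetch_ids()
    return all(fetch_ids() == first for _ in range(runs - 1))


def make_fake_fetch(responses):
    """Hypothetical stand-in for the client: cycles through canned results."""
    canned = cycle(responses)
    return lambda: list(next(canned))


# A server that always answers the same way.
stable_fetch = make_fake_fetch([["T1", "T2"]])
# A server whose answer depends on which replica member responded.
unstable_fetch = make_fake_fetch([["T1", "T2"], ["T2", "T4"]])

stable_ok = is_deterministic(stable_fetch)
unstable_ok = is_deterministic(unstable_fetch)
```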

@julie-sullivan
Contributor Author

For the above to work, you need an additional index: {"transcripts.biotype": 1, "id": 1}
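
As a sketch, the index can be expressed as a `createIndexes` command document. The code below only builds the document as a Python dict; the collection name `gene` is an assumption, and the index name follows MongoDB's default `field_direction` naming. With PyMongo, the equivalent is `Collection.create_index` with the same list of key pairs.

```python
def build_create_index_command(collection, keys):
    """Build a `createIndexes` command for a compound index."""
    # Default-style name: each field and direction joined by underscores.
    name = "_".join(f"{field}_{direction}" for field, direction in keys)
    return {
        "createIndexes": collection,
        "indexes": [{"key": dict(keys), "name": name}],
    }


# Key order matters in a compound index: biotype first, then id.
index_keys = [("transcripts.biotype", 1), ("id", 1)]
index_command = build_create_index_command("gene", index_keys)
```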
