Transcripts end point broken when using a replica set #557

julie-sullivan · 2021-06-14T08:55:00Z

For the Transcripts endpoint ONLY, CellBase is sorting after the pagination. The sort must be applied before SKIP and LIMIT. If a replica set is present, the query results will be incorrect.
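To illustrate why the order matters, here is a minimal pure-Python sketch (no MongoDB involved). It assumes that two replica set members can return the same unsorted documents in a different order; the ids and the helper names `sort_then_page` and `page_then_sort` are made up for this example.

```python
def sort_then_page(docs, skip, limit):
    """Sort the full result set first, then apply skip/limit."""
    ordered = sorted(docs, key=lambda d: d["id"])
    return [d["id"] for d in ordered[skip:skip + limit]]


def page_then_sort(docs, skip, limit):
    """Apply skip/limit to the unsorted stream, then sort only that page."""
    page = docs[skip:skip + limit]
    return [d["id"] for d in sorted(page, key=lambda d: d["id"])]


# Hypothetical documents; the ids are invented for illustration.
docs = [{"id": f"ENST{n:04d}"} for n in range(10)]

# Assumption: two members hold the same documents in a different order.
member_a = docs[:]
member_b = docs[::-1]

# Sorting first gives the same page regardless of which member answers.
print(sort_then_page(member_a, 2, 3) == sort_then_page(member_b, 2, 3))  # True

# Paginating first gives a page that depends on the member's order.
print(page_then_sort(member_a, 2, 3) == page_then_sort(member_b, 2, 3))  # False
```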

In the database adapter I did this:

Bson sort = MongoDBQueryUtils.getSort(options); // get SORT from the query options
options.remove(QueryOptions.SORT); // remove SORT so we don't sort twice!

aggregateList.add(match);
aggregateList.add(sort); // add SORT here, right after the gene match
aggregateList.add(unwind);
aggregateList.add(match2);
aggregateList.add(excludeAndInclude);
aggregateList.add(project);
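The same stage order can be sketched as a plain aggregation pipeline document. This is only an approximation of the adapter code above: the field names, the `$project` contents, and the function name `build_transcript_pipeline` are assumptions, and the `excludeAndInclude` stage is omitted because its contents are not shown here.

```python
def build_transcript_pipeline(biotypes, sort_field="id", skip=0, limit=10):
    """Build an aggregation pipeline that sorts before paginating.

    All field names are assumptions made for illustration.
    """
    return [
        # First match on the gene documents.
        {"$match": {"transcripts.biotype": {"$in": biotypes}}},
        # Sort right after the first match, before any pagination.
        {"$sort": {sort_field: 1}},
        # One output document per transcript.
        {"$unwind": "$transcripts"},
        # Second match, now on individual transcripts.
        {"$match": {"transcripts.biotype": {"$in": biotypes}}},
        # Hypothetical projection; the real one is not shown in this issue.
        {"$project": {"_id": 0, "transcripts": 1}},
        # Pagination comes last, so it is applied to the sorted stream.
        {"$skip": skip},
        {"$limit": limit},
    ]


pipeline = build_transcript_pipeline(["protein_coding"], skip=20, limit=10)
stage_names = [next(iter(stage)) for stage in pipeline]
print(stage_names)
```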

I also tried sorting after the projection. The output files were the same, but they were truncated because the SORT failed:

Sort exceeded memory limit of 104857600 bytes, but did not opt in to external sorting.

You can opt in to external sorting: https://docs.mongodb.com/manual/reference/command/aggregate/#std-label-aggregate-cmd-allowDiskUse
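As a rough sketch of what opting in looks like, the `aggregate` database command accepts an `allowDiskUse` flag. The helper name `aggregate_command` and the collection name `gene` below are assumptions for illustration; with PyMongo the equivalent would be passing `allowDiskUse=True` to `collection.aggregate(...)`.

```python
def aggregate_command(collection, pipeline, allow_disk_use=True):
    """Build an `aggregate` command document that opts in to external sorting."""
    return {
        "aggregate": collection,         # collection name (assumed here)
        "pipeline": pipeline,            # list of aggregation stages
        "cursor": {},                    # request a cursor with default batch size
        "allowDiskUse": allow_disk_use,  # let stages such as $sort spill to disk
    }


command = aggregate_command("gene", [{"$sort": {"id": 1}}])
print(command["allowDiskUse"])
```

Note that this only lifts the memory limit for the sort; it does not make an unindexed sort fast.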

Going to test this.


julie-sullivan · 2021-06-25T10:19:21Z

cellbase_transcript_client.search(
    biotype=self._relevant_biotypes,
    include=self.CELLBASE_TRANSCRIPT_QUERY_INCLUDE,
    assembly=self._assembly,
    annotationFlags=InterpretationProcess.GENE_CODE_BASIC_TRANSCRIPT_SET,
    sort='id',
)

If you run this script more than once, you will get different results.
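One way to check for this is to run the same query several times and compare the ordered ids. The sketch below uses a stand-in callable instead of the real client; `is_reproducible` and both stub queries are hypothetical names introduced for this example.

```python
import itertools


def is_reproducible(run_query, runs=3):
    """Return True if every run yields the same ids in the same order."""
    first = list(run_query())
    return all(list(run_query()) == first for _ in range(runs - 1))


def stable_query():
    # Stand-in for a query that sorts before paginating.
    return ["ENST0001", "ENST0002", "ENST0003"]


# Stand-in for a query whose result alternates between two orders.
_orders = itertools.cycle([["ENST0001", "ENST0002"], ["ENST0002", "ENST0001"]])


def unstable_query():
    return next(_orders)


print(is_reproducible(stable_query))    # True
print(is_reproducible(unstable_query))  # False
```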

julie-sullivan · 2021-06-25T10:19:53Z

For the above to work, you need an additional index: {"transcripts.biotype": 1, "id": 1}
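A minimal sketch of how that index could be declared, written as a `createIndexes` command document. The collection name `gene` and the helper name `transcript_index_command` are assumptions; with PyMongo the equivalent call would be `collection.create_index([("transcripts.biotype", 1), ("id", 1)])`.

```python
def transcript_index_command(collection="gene"):
    """Build a `createIndexes` command for the compound index.

    Key order matters for a compound index; Python dicts keep
    insertion order, so biotype comes first and id second.
    """
    key = {"transcripts.biotype": 1, "id": 1}
    return {
        "createIndexes": collection,  # collection name is an assumption
        "indexes": [
            {
                "key": key,
                # Follows MongoDB's default naming pattern for index names.
                "name": "transcripts.biotype_1_id_1",
            }
        ],
    }


index_command = transcript_index_command()
print(list(index_command["indexes"][0]["key"]))
```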

julie-sullivan self-assigned this Jun 25, 2021

julie-sullivan added the bug label Jun 25, 2021
