Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Sorting by frequency in disease to phenotype table not working correctly #910

Closed
ptgolden opened this issue Nov 22, 2024 · 6 comments · Fixed by #911
Closed

Sorting by frequency in disease to phenotype table not working correctly #910

ptgolden opened this issue Nov 22, 2024 · 6 comments · Fixed by #911

Comments

@ptgolden
Copy link
Member

Sorting by frequency in a disease to phenotype table can occasionally cause confusing and incorrect results. For an example, show 100 results at a time in the association table for MONDO:0007523 and try sorting by frequency. Neither ascending nor descending sort seems to do anything.

For an example with more associations with frequency values attached, try MONDO:0958235. By default, it is sorted by frequency. But after explicitly sorting by frequency by click on the table header, things get wonky.

@ptgolden
Copy link
Member Author

The issue is that the Solr query sorts by frequency_qualifier, while the frequency bar is populated by has_count and has_total. This makes sense-- the former is an HPO qualifier, the latter HMIM-- but visually, it is not intuitive.

@ptgolden
Copy link
Member Author

ptgolden commented Nov 22, 2024

After a conversation with @amc-corey-cox, I see what's going on. Disease-phenotype associations from HPOA have either a categorical frequency_qualifier OR discrete frequency values (either count/total or percentage%). See https://pmc.ncbi.nlm.nih.gov/articles/PMC6324074/table/tbl2/). When sorting associations by frequency in the API, only the categorical frequency is used. More precise measurements are not taken into account.

...Additionally, there is a bug when rendering the frequencies in the association table that was introduced in 400ab72. There's code to deal with all the different cases of frequency measurements, but this is how it checks them:

if (row.has_count !== undefined && row.has_total !== undefined) {
return `${row.has_count} of ${row.has_total} cases`;
}

In the row object, the absence is a value is marked by null. All of those checks should be changed either to if (row.has_count !== null) or if (row.has count != null) (note this is using loose equality which is probably only ever properly used if you expect a value to be either undefined or null).

Since null !== undefined is true, the first branch is taken in all of the comparisons. Hence a bunch of tools tips saying that frequency is "null of null cases".

@ptgolden
Copy link
Member Author

Aaaand that derived frequency is already added to the index: https://github.com/monarch-initiative/monarch-ingest/blob/a4894f496633ed800b779f1b2df669b41838bd50/scripts/frequency_update_processor.js! The bug then is that the table should call a sort to the derived frequency rather than frequency_qualifier.

@ptgolden
Copy link
Member Author

LAST BUG and then this will be working fine. @kevinschaper, it seems that frequency_qualifier_label is always null, any idea why this is? See https://monarchinitiative.org/v3/api/entity/MONDO:0007523/biolink:DiseaseToPhenotypicFeatureAssociation?direct=false&limit=100&offset=0&query=&sort=frequency_computed_sortable_float+asc&traverse_orthologs=false for an example.

@ptgolden
Copy link
Member Author

I imagine it's something to be added in the HPO ingest

@ptgolden
Copy link
Member Author

Or, maybe not, since frequency_qualifier_label is not a part of biolink:DiseaseToPhenotypicFeatureAssociation

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
1 participant