Make gene search a little fuzzier #94

Closed
adf-ncgr opened this issue Oct 26, 2017 · 8 comments

@adf-ncgr
Contributor

Most people will probably not think to query using our extended naming conventions, so it would be nice to allow a search for Glyma.02G107500 to yield the same result as a search for glyma.Glyma.02G107500. We will of course then be forced to deal with the potential for ambiguity, i.e. not just "none found" or "one found" but also ">1 found". One could argue that the latter would be an opportunity to shuttle users off to the multi view, but I'm not sure that would be the most intuitive approach. I'm not suggesting we go much further than `LIKE %Glyma.02G107500%`, although we may someday want to deal on the backend with the fact that there are a few ways to reference a gene (e.g. by name or uniquename aka ID, maybe also "symbols" like Dt1).
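
To make the `LIKE %...%` idea and its three outcomes concrete, here is a minimal Python sketch that works on an in-memory list of IDs rather than a database. The names `MatchStatus` and `fuzzy_gene_search` are hypothetical, and the case-insensitive comparison is an assumption (whether SQL `LIKE` is case-sensitive depends on the database and collation).

```python
from enum import Enum


class MatchStatus(Enum):
    """The three outcomes the search has to handle."""
    NONE = "none found"
    ONE = "one found"
    MANY = ">1 found"


def fuzzy_gene_search(query: str, gene_ids: list[str]) -> tuple[MatchStatus, list[str]]:
    """Return every ID containing `query`, like a case-insensitive LIKE '%query%'."""
    needle = query.lower()
    hits = [gene_id for gene_id in gene_ids if needle in gene_id.lower()]
    if not hits:
        return MatchStatus.NONE, hits
    if len(hits) == 1:
        return MatchStatus.ONE, hits
    return MatchStatus.MANY, hits


if __name__ == "__main__":
    # Illustrative IDs; only the first one comes from the discussion.
    ids = ["glyma.Glyma.02G107500", "glyma.Glyma.02G107600"]
    print(fuzzy_gene_search("Glyma.02G107500", ids))  # one hit
    print(fuzzy_gene_search("Glyma.02G107", ids))     # two hits
```

The unprefixed query matches the prefixed ID because plain substring containment ignores the `glyma.` prefix; the price is that shorter queries quickly fall into the ">1 found" case.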

self-assigning this, though not committing to an epic or milestone just yet

@adf-ncgr adf-ncgr self-assigned this Oct 26, 2017
@adf-ncgr
Contributor Author

adf-ncgr commented Oct 26, 2017

follow-up thought to self: in the lorax context, the current practice is to prefix USR to sequences that have come in from outside (i.e. userland). For cases where we are augmenting pre-built trees in legumeinfo with other sequences in a resource that has GCV capabilities, it would be nice to find a way to include them in the set of child nodes to be multiviewed (e.g. if the info about the data source could be attached and preserved). Think about this one in connection with LegumeFederation/Interface-and-Usability#5, where it would be most relevant.

@alancleary
Contributor

Perhaps we could add fuzziness while addressing the issue of possible ambiguity by having a dropdown of potential matching genes appear and refine as you type in the gene search widget. This could even handle the unlikely case where multiple providers have a gene with the same name, by indicating which providers each potential match corresponds to! Seriously though, this would put the task of resolving ambiguity on the user rather than the providers, making the overall search behavior more consistent.

@adf-ncgr
Contributor Author

adf-ncgr commented Dec 13, 2017

federated auto-completion sounds like a good PhD project for a distributed-computing-AI student!
or were you suggesting instead that the initial fuzzy search be done with whatever they entered, and that the fuzziness resolution then be done client-side, using auto-complete against the restricted subset of aggregated multi-results? The latter sounds more reasonable; is that what you were suggesting in the first place?

@alancleary
Contributor

I actually meant the former. I don't think implementing auto-completion would be too difficult (famous last words); the crux would be handling responses from multiple servers, which we've already done elsewhere in the app. The distinction here is that the results would be "streamed" to the auto-completion box as they come in, rather than aggregated before updating the UI, as our current multi-provider methods do. I don't think there's an issue for this, but I believe we've fantasized about switching our existing multi-provider code to such a "streaming" method.
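
The difference between "streaming" and aggregating can be illustrated with a small, language-agnostic sketch in Python using `asyncio.as_completed`, which hands back results in completion order rather than request order. The provider interface, `stream_suggestions`, and the simulated `fast`/`slow` providers are all hypothetical stand-ins for real federated services.

```python
import asyncio
from collections.abc import AsyncIterator, Awaitable, Callable

# Hypothetical provider interface: a coroutine mapping a query to gene IDs.
Provider = Callable[[str], Awaitable[list[str]]]


async def stream_suggestions(
    query: str, providers: dict[str, Provider]
) -> AsyncIterator[tuple[str, list[str]]]:
    """Yield (provider_name, matches) as each provider answers, fastest first."""

    async def ask(name: str, provider: Provider) -> tuple[str, list[str]]:
        try:
            return name, await provider(query)
        except Exception:
            # One failing provider should not block the others' suggestions.
            return name, []

    tasks = [asyncio.create_task(ask(name, p)) for name, p in providers.items()]
    for next_done in asyncio.as_completed(tasks):
        yield await next_done


async def demo() -> list[str]:
    """Simulate two providers with different latencies."""

    async def fast(query: str) -> list[str]:
        await asyncio.sleep(0.01)
        return [f"fast:{query}"]

    async def slow(query: str) -> list[str]:
        await asyncio.sleep(0.1)
        return [f"slow:{query}"]

    order = []
    providers = {"slow": slow, "fast": fast}
    async for name, matches in stream_suggestions("Glyma.02G107500", providers):
        # In a UI, this is where the dropdown would be refreshed.
        print(name, matches)
        order.append(name)
    return order


if __name__ == "__main__":
    asyncio.run(demo())  # "fast" is printed before "slow"
```

An aggregating version would instead `await asyncio.gather(...)` and update the UI once, so the user would wait for the slowest provider before seeing anything.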

@adf-ncgr
Contributor Author

well, I'm not sure auto-completion of gene search is high on my wishlist in the first place; I was more concerned with the fact that people are likely to come to GCV with gene names that are unprefixed substrings of the LIS naming scheme (e.g. Glyma.02G107500 instead of glyma.Glyma.02G107500), which I'm not sure "classical autocomplete" would even help with.

In addition to unprefixed names, there is also the potential for dealing with "synonyms" (like GmTfl1 instead of Glyma19g37890, per http://www.pnas.org/content/107/19/8563.full) or "old version naming" (like Glyma.19g194300 in Williams82.gnm2.ann1 instead of Glyma19g37890 in Williams82.gnm1.ann1, per https://soybase.org/sbt/search/search_results.php?category=FeatureName&version=Glyma2.0&search_term=Glyma.19g194300).

Obviously, the ability to do the more complex types of identifier translation will depend on the service implementation. I guess we might want to think about whether any of this merits a change to the service API before we dive in deeper, though.
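
One way a service could handle synonyms and old-version names is an alias table consulted after an exact lookup fails. The Python sketch below is only an illustration: the table layout and `resolve_identifier` are hypothetical, and its two entries simply restate the GmTfl1 / Glyma19g37890 / Glyma.19g194300 correspondences cited above.

```python
# Hypothetical alias table: keys are lowercased alternative names, values are
# the identifiers the service actually stores. Values are lists because a gene
# model in one annotation version can correspond to several in another.
ALIASES: dict[str, list[str]] = {
    "gmtfl1": ["Glyma.19g194300"],         # published symbol -> gnm2.ann1 name
    "glyma19g37890": ["Glyma.19g194300"],  # gnm1.ann1 name -> gnm2.ann1 name
}


def resolve_identifier(query: str, known_ids: set[str]) -> list[str]:
    """Return stored IDs for `query`: exact match first, then alias lookup."""
    if query in known_ids:
        return [query]
    # Only report aliases that this particular service actually holds.
    return [gene_id for gene_id in ALIASES.get(query.lower(), []) if gene_id in known_ids]


if __name__ == "__main__":
    stored = {"Glyma.19g194300"}
    for q in ["Glyma.19g194300", "GmTfl1", "Glyma19g37890", "Dt1"]:
        print(q, resolve_identifier(q, stored))
```

Because each service would populate its own table, this fits the point that identifier translation is implementation dependent; the API would only need to allow a query to return more than one stored ID.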

@alancleary
Contributor

Ah, I see. Perhaps it's time to add a new view, specifically one that users get redirected to when the provided gene is invalid. This could have a list of fuzzy matches.

@abretaud
Contributor

There is also the use case where you come to the context viewer following a link with a gene ID (from a Tripal gene page, for example). It would be nice to have a landing page that looks for matching genes and redirects to the right one if there's a perfect match, or shows a list of possibilities if there are multiple.
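
The landing-page decision itself is small once the candidate list is in hand. The Python sketch below assumes all candidates have already been collected from every provider, and its notion of a "perfect match" (identical ID, or identical after stripping the leading `<prefix>.` of an LIS-style ID) is an assumption; `Redirect`, `ShowCandidates`, and `land` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class Redirect:
    gene_id: str


@dataclass
class ShowCandidates:
    gene_ids: list[str]


def is_perfect_match(query: str, gene_id: str) -> bool:
    # Assumed rule: exact match, or exact once the "<prefix>." part is removed.
    unprefixed = gene_id.split(".", 1)[1] if "." in gene_id else gene_id
    return query in (gene_id, unprefixed)


def land(query: str, candidates: list[str]) -> Union[Redirect, ShowCandidates]:
    perfect = [g for g in candidates if is_perfect_match(query, g)]
    if len(perfect) == 1:
        return Redirect(perfect[0])
    # Zero or several perfect matches: let the user choose, perfect ones first.
    rest = [g for g in candidates if g not in perfect]
    return ShowCandidates(perfect + rest)


if __name__ == "__main__":
    found = ["glyma.Glyma.02G107500", "glyma.Glyma.02G107600"]  # illustrative
    print(land("Glyma.02G107500", found))  # a Redirect
    print(land("Glyma.02G107", found))     # a ShowCandidates list
```

Deciding requires knowing that no other provider will still return a perfect match, which is why this sketch waits for the full candidate list instead of acting on partial results.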

@alancleary
Contributor

Fuzzy search was introduced in epic #212. While you still need to provide the gene view (called search/multi when this issue was first opened) with exact gene identifiers, you can now use the search widget (or the search URL) to provide one or more genes and/or regions to be searched for fuzzily.

Redirecting to a context view when there are exact hits hasn't been implemented because of the complexity introduced by the asynchronous federation of the search. This may be implemented in the future if there's enough demand for it.
