Make gene search a little fuzzier #94
Comments
Follow-up thought to self: in the lorax context, the current practice is to prefix USR to sequences that have come in from outside (i.e. userland). For cases where we are augmenting pre-built trees in legumeinfo with other sequences in a resource that has GCV capabilities, it would be nice to find a way to include them in the set of child nodes to be multiviewed (e.g. if the info about the data source could be attached and preserved; think about this one in connection with LegumeFederation/Interface-and-Usability#5, where it would be most relevant).
Perhaps we could add fuzziness while addressing the issue of possible ambiguity by having a dropdown of potential matching genes appear and refine as you type in the gene search widget. This could even handle the unlikely case where multiple providers have a gene with the same name by indicating which providers each potential match corresponds to! Seriously though, this would put the task of resolving ambiguity on the user rather than the providers, making the overall search behavior more consistent.
federated auto-completion sounds like a good PhD project for a distributed-computing-AI student!
I actually meant the prior; I don't think implementing auto-completion would be too difficult (famous last words). The crux would be handling responses from multiple servers, which we've already done elsewhere in the app. The distinction here is that the results would be "streamed" to the auto-completion box as they come in, rather than aggregating them all before updating the UI, as our current multi-provider methods do. I don't think there's an issue for this, but I believe we've fantasized about switching our existing multi-provider code to such a "streaming" method.
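To make the "streamed" versus "aggregated" distinction concrete, here is a minimal sketch in Python, not GCV's actual code. The provider names, delays, gene lists, and the `query_provider` and `stream_matches` helpers are all made up for illustration; a real client would issue HTTP requests instead of sleeping.

```python
import asyncio

async def query_provider(name, delay, genes, term):
    # Hypothetical stand-in for a request to one provider's search service.
    await asyncio.sleep(delay)
    needle = term.lower()
    return name, [g for g in genes if needle in g.lower()]

async def stream_matches(providers, term, on_result):
    # Start every provider query at once, then hand each result to the
    # callback as soon as it arrives instead of waiting for all of them.
    tasks = [
        asyncio.create_task(query_provider(name, delay, genes, term))
        for name, delay, genes in providers
    ]
    for finished in asyncio.as_completed(tasks):
        name, matches = await finished
        on_result(name, matches)

# Illustrative data only: (provider name, simulated latency, gene names).
PROVIDERS = [
    ("slow-provider", 0.05, ["glyma.Glyma.02G107500"]),
    ("fast-provider", 0.01, ["glyma.Glyma.02G107500", "glyma.Glyma.19g194300"]),
]

received = []
asyncio.run(
    stream_matches(PROVIDERS, "02G107500", lambda n, m: received.append((n, m)))
)
```

With an aggregating approach the UI would update once, after the slowest provider answers; here the callback fires once per provider, in order of arrival.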
Well, I am not sure auto-completion of gene search is high on my wishlist in the first place; I was more concerned with the fact that people are likely to come to GCV with gene names that are unprefixed substrings of the LIS naming scheme (i.e. Glyma.02G107500 instead of glyma.Glyma.02G107500), which I am not sure "classical autocomplete" would even help with. In addition to unprefixed names, there is also the potential for dealing with "synonyms" (like GmTfl1 instead of Glyma19g37890, per http://www.pnas.org/content/107/19/8563.full) or "old version naming" (like Glyma.19g194300 in Williams82.gnm2.ann1 instead of Glyma19g37890 in Williams82.gnm1.ann1, per https://soybase.org/sbt/search/search_results.php?category=FeatureName&version=Glyma2.0&search_term=Glyma.19g194300). Obviously, the ability to do the more complex types of identifier translations will be service implementation dependent. I guess we might want to think about whether any of this will merit
Ah, I see. Perhaps it's time to add a new view, specifically, one that users get redirected to when the provided gene is invalid. This could have a list of fuzzy matches.
There is also the use case where you come to the context viewer following a link with a gene ID (from a Tripal gene page, for example). It would be nice to have a landing page that would look for matching genes, and redirect to the right one if there's a perfect match, or show a list of possibilities if there are multiple.
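The landing-page decision described here could look roughly like the sketch below. This is only an illustration under assumptions: `resolve_gene`, the action labels, and the gene list are hypothetical, matching is done case-insensitively, and a single provider's gene list is assumed to be available locally.

```python
def resolve_gene(query, known_genes):
    # Decide what a landing page should do with an incoming gene ID.
    q = query.lower()
    exact = [g for g in known_genes if g.lower() == q]
    if len(exact) == 1:
        # A perfect match: go straight to the context view.
        return ("redirect", exact)
    # Otherwise offer every gene whose name contains the query.
    candidates = [g for g in known_genes if q in g.lower()]
    return ("choose", candidates)

# Illustrative gene names only.
KNOWN_GENES = ["glyma.Glyma.02G107500", "glyma.Glyma.19g194300"]

perfect = resolve_gene("glyma.Glyma.02G107500", KNOWN_GENES)
partial = resolve_gene("Glyma.02G107500", KNOWN_GENES)
```

In this sketch a lone partial match still lands on the list of possibilities; whether it should redirect instead is a design choice the thread leaves open.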
Fuzzy search was introduced in epic #212. While you still need to provide the gene view (called search/multi when this issue was first opened) with exact gene identifiers, you can now use the search widget (or the search url) to provide one or more genes and/or regions to be fuzzily searched for. Redirecting to a context view in the case of getting exact hits hasn't been implemented due to the complexity introduced by the asynchronous federation of the search. This may be implemented in the future if there's enough demand for it.
Most people will probably not think to query using our extended naming conventions, so it would be nice to allow a search for Glyma.02G107500 to yield the same result as a search for glyma.Glyma.02G107500. We will of course then be forced to deal with the potential for ambiguity, ie not just "none found" or "one found" but also ">1 found"; one could argue that the latter would be an opportunity to shuttle them off to the multi view, but I'm not sure that would be the most intuitive approach. I'm not suggesting we go much further than LIKE %Glyma.02G107500% although we may someday want to deal on the backend with the fact that there are a few ways to reference a gene (e.g. by name or uniquename aka ID, maybe also "symbols" like Dt1).
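As a rough illustration of the `LIKE %Glyma.02G107500%` idea and the three outcomes it produces, here is a small Python sketch. The `like_contains` and `describe` helpers and the name list are hypothetical, and matching is assumed to be case-insensitive (whether SQL `LIKE` ignores case depends on the database).

```python
def like_contains(term, names):
    # Rough equivalent of SQL: name LIKE '%term%' (case-insensitive here).
    needle = term.lower()
    return [n for n in names if needle in n.lower()]

def describe(matches):
    # The three cases the search has to handle.
    if not matches:
        return "none found"
    if len(matches) == 1:
        return "one found"
    return ">1 found"

# Illustrative names only.
NAMES = ["glyma.Glyma.02G107500", "glyma.Glyma.19g194300"]

unprefixed = like_contains("Glyma.02G107500", NAMES)
prefixed = like_contains("glyma.Glyma.02G107500", NAMES)
```

Both the unprefixed and the prefixed query resolve to the same single gene, while a looser term such as `Glyma` falls into the ">1 found" case.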
self-assigning this, though not committing to an epic or milestone just yet