Natural-language search #84
We don't necessarily need to have embeddings or entity-recognition to try this. We could start with a simple NLP pipeline that would:
We could then derive facet selections from the output (identifying via traversal
Ultimately, we could do away with the facet configuration by creating facets on the fly, based upon the parent (or orphan) dimensions entailed by the query. You could then edit the results with the matched parent dimensions, or expand the range of parent dimensions with a further query. In the meantime, it would be useful to track which matches didn't fit into facets, not least for improving the property hierarchy.
Indeed, we might be able to implement that pipeline as an Elasticsearch (ES) analyser. The search would then be just a single match query against the codes index.
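As a rough illustration of the analyser idea, here is a minimal sketch of the two request bodies involved: index settings that define a custom analyser, and a single match query that uses it. The analyser name `nl_search_analyzer`, the field name `label`, and the choice of token filters are all assumptions made for the example; the actual pipeline steps are not settled in this thread.

```python
import json

# Hypothetical analyser definition for the codes index.
# The filter chain (lowercase, stop, porter_stem) is an assumed stand-in
# for whatever the simple NLP pipeline ends up doing.
CODES_INDEX_SETTINGS = {
    "settings": {
        "analysis": {
            "analyzer": {
                "nl_search_analyzer": {
                    "type": "custom",
                    "tokenizer": "standard",
                    "filter": ["lowercase", "stop", "porter_stem"],
                }
            }
        }
    }
}


def build_codes_query(user_text, field="label", analyzer="nl_search_analyzer", size=20):
    """Return a search body: one match query against the codes index."""
    return {
        "size": size,
        "query": {
            "match": {
                field: {
                    "query": user_text,
                    # Analyse the query text with the same custom analyser.
                    "analyzer": analyzer,
                }
            }
        },
    }


if __name__ == "__main__":
    print(json.dumps(build_codes_query("exports of cars to Germany"), indent=2))
```

The bodies are plain dictionaries so they can be sent with whichever Elasticsearch client the project already uses.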
I've created https://github.com/Swirrl/cogs-issues/issues/289 for moving this forward.
We've now agreed to go with the Google-style UI. We might return to a faceted comparison later. I suggest we create a new view for now (instead of deleting lots of code).

We basically want a single search box with one cube per result. The rich snippet beneath each result shows the dimensions and a sample of the values that match. We could extend this to show all dimensions present in the cube (which could play a role in the decision between cubes), using a visual cue (like a background colour) to highlight those that matched. Alternatively, we could show only those dimensions that matched. We could extend highlighting even further to show matches within code labels, using the highlighting feature from Elasticsearch.

Note that this UI doesn't necessarily need facets. Options:
We might also look to extend the search to match against non-cube-structural elements like the dataset description etc.
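The "one cube per result" layout described above could be sketched as a grouping step over the matched codes. This is only an illustration: the hit keys (`cube`, `dimension`, `label`, `score`) are hypothetical, and ranking cubes by the sum of their hit scores is an assumption rather than an agreed design.

```python
from collections import defaultdict


def build_snippets(hits, sample_size=3):
    """Group matched codes into one result per cube.

    Each hit is a dict with the hypothetical keys
    'cube', 'dimension', 'label' and 'score'.
    """
    cubes = defaultdict(lambda: {"score": 0.0, "dimensions": defaultdict(list)})
    for hit in hits:
        entry = cubes[hit["cube"]]
        entry["score"] += hit["score"]
        labels = entry["dimensions"][hit["dimension"]]
        # Keep only a small sample of distinct matching values per dimension.
        if len(labels) < sample_size and hit["label"] not in labels:
            labels.append(hit["label"])
    # Assumed ranking: cubes with the highest total match score come first.
    ranked = sorted(cubes.items(), key=lambda item: item[1]["score"], reverse=True)
    return [
        {"cube": cube, "score": data["score"], "dimensions": dict(data["dimensions"])}
        for cube, data in ranked
    ]
```

If Elasticsearch highlighting were requested in the search body, the highlighted fragment could be stored in place of the plain label so that the snippet shows the match within the code label.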
I've begun work on a
At the moment this only goes
We will need to revise the logic to go:
I think we need to tackle the following before we can release it:
This extends the ideas from the original mockups.
#25 would give us semantic search, i.e. code search within facets via sentence embeddings.
We could build upon these embeddings to do entity linking across all codes. This would ignore the dimension-property configuration we currently have for each facet. Instead, the recognised entities would define the set of dimensions involved for a given search.
The UI would start by presenting an open-text search, much like Google.
Any entities we're able to recognise (and link to resources in our database) would form columns (like the facets we have now), labelled as per the user's query.
The above shows each column also describing the dimension (i.e. that Germany has been interpreted with the Partner Geography dimension). We've since included the dimension in each cell. In any case, we might want to allow the user to see or customise the interpretation in the edit dialogue.
This would require an advanced natural-language-understanding pipeline, but would obviate the need to curate Q&A forms (#82) or a facet configuration (thus it would automatically work across all families).
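To make the column idea concrete, here is a minimal sketch of how recognised entities could become columns labelled with the user's own wording. It swaps the embedding-based linking proposed above for a naive exact, case-insensitive label match, purely to keep the example self-contained. The `Code` and `Column` types and the example URIs are hypothetical.

```python
from typing import List, NamedTuple


class Code(NamedTuple):
    uri: str
    label: str
    dimension: str


class Column(NamedTuple):
    query_text: str  # the user's wording, used as the column label
    code: Code       # the linked resource and its dimension


def link_entities(query: str, codes: List[Code]) -> List[Column]:
    """Link spans of the query to codes by exact, case-insensitive label match.

    A naive stand-in for embedding-based entity linking. Longer labels are
    tried first so that multi-word labels win over their parts.
    """
    lowered = query.lower()
    taken = [False] * len(query)
    found = []
    for code in sorted(codes, key=lambda c: len(c.label), reverse=True):
        if not code.label:
            continue
        start = lowered.find(code.label.lower())
        if start == -1:
            continue
        end = start + len(code.label)
        if any(taken[start:end]):
            continue  # this span already belongs to another entity
        for i in range(start, end):
            taken[i] = True
        found.append((start, Column(query[start:end], code)))
    # Present columns in the order the entities appear in the query.
    return [column for _, column in sorted(found, key=lambda pair: pair[0])]


if __name__ == "__main__":
    codes = [
        Code("http://example.org/code/germany", "Germany", "Partner Geography"),
        Code("http://example.org/code/cars", "cars", "Product"),
    ]
    for column in link_entities("Exports of cars to Germany", codes):
        print(column.query_text, "->", column.code.dimension)
```

Because each column carries its linked code and dimension, the edit dialogue could show the interpretation and let the user change it.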