Natural-language search #84

Robsteranium · 2021-05-05T14:29:57Z

This extends the ideas from the original mockups.

#25 would give us semantic search - i.e. code search within facets via sentence embeddings.

We could build upon these embeddings to do entity linking across all codes. This would ignore the dimension-property configuration we currently have for each facet - instead the recognised entities would define the set of dimensions involved in a given search.

The UI would start by presenting an open-text search, much like Google.

Any entities we're able to recognise (and link to resources in our database) would form columns (like the facets we have now), labelled as per the user's query.

The above shows each column also describing the dimension (i.e. that Germany has been interpreted with the Partner Geography dimension). We've since included the dimension in each cell. In any case we might want to allow the user to see and customise the interpretation in the edit dialogue.

This would require an advanced natural-language-understanding pipeline, but would obviate the need to curate Q&A forms (#82) or a facet configuration (thus it would automatically work across all families).

Robsteranium · 2021-05-21T13:55:10Z

We don't necessarily need to have embeddings or entity-recognition to try this.

We could start with a simple NLP pipeline that would:

receive open text query as input e.g. "import cars from Germany 2019"
tokenise it e.g. ["import" "cars" "from" "Germany" "2019"] (possibly with downcasing)
remove stop words e.g. ["import" "cars" "Germany" "2019"]
for each token, find codes whose labels match
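
The steps above could be sketched roughly as follows. This is only an illustration: the stop-word list, the code URIs and the `CODE_LABELS` lookup are made up for the example, and a real implementation would query the codes index rather than an in-memory dict.

```python
# Hypothetical stop words and code labels, for illustration only.
STOP_WORDS = {"from", "to", "of", "the", "in", "and", "a", "an", "for"}

CODE_LABELS = {
    "code/flow/import": "Import",
    "code/product/cars": "Cars",
    "code/geography/germany": "Germany",
    "code/year/2019": "2019",
}


def tokenise(query):
    """Split on whitespace and downcase each token."""
    return query.lower().split()


def remove_stop_words(tokens):
    """Drop tokens that carry no search meaning."""
    return [t for t in tokens if t not in STOP_WORDS]


def match_codes(tokens, code_labels):
    """For each token, find the codes whose label contains that word."""
    return {
        token: [
            uri
            for uri, label in code_labels.items()
            if token in label.lower().split()
        ]
        for token in tokens
    }


def search(query, code_labels=CODE_LABELS):
    """Run the whole pipeline: tokenise, remove stop words, match labels."""
    return match_codes(remove_stop_words(tokenise(query)), code_labels)


if __name__ == "__main__":
    print(search("import cars from Germany 2019"))
```

Matching here is exact on whole words; stemming or fuzzy matching would be a later refinement.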

We could then derive facet-selections from the output (identifying via traversal code --> scheme --> dimension --> parent --> facet).

Ultimately we could do away with the facet configuration by creating facets on the fly based upon the parent (or orphan) dimensions entailed by the query (so you could edit the results with the matched parent-dimensions or expand the range of parent-dimensions with a further query).

In the meantime it would be useful to track which matches didn't fit into facets - not least for improving the property hierarchy.
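
A minimal sketch of the traversal (code --> scheme --> dimension --> parent --> facet) that also collects the matches that don't land in any facet. All identifiers and lookup tables here are hypothetical stand-ins for whatever the database actually provides.

```python
# Hypothetical lookup tables, for illustration only.
CODE_TO_SCHEME = {
    "code/germany": "scheme/countries",
    "code/cars": "scheme/products",
    "code/widget": "scheme/misc",
}
SCHEME_TO_DIMENSION = {
    "scheme/countries": "dim/partner-geography",
    "scheme/products": "dim/product",
    "scheme/misc": "dim/misc",
}
DIMENSION_TO_PARENT = {
    "dim/partner-geography": "dim/geography",
    "dim/product": "dim/commodity",
}
PARENT_TO_FACET = {
    "dim/geography": "Location",
    "dim/commodity": "Product",
}


def facet_for_code(code):
    """Walk code --> scheme --> dimension --> parent --> facet.

    Returns None if any step of the traversal has no entry.
    """
    scheme = CODE_TO_SCHEME.get(code)
    dimension = SCHEME_TO_DIMENSION.get(scheme)
    # An orphan dimension has no parent, so it stands in for itself.
    parent = DIMENSION_TO_PARENT.get(dimension, dimension)
    return PARENT_TO_FACET.get(parent)


def derive_selections(codes):
    """Group matched codes by facet and list those that fit no facet."""
    selections = {}
    unplaced = []
    for code in codes:
        facet = facet_for_code(code)
        if facet is None:
            unplaced.append(code)
        else:
            selections.setdefault(facet, []).append(code)
    return selections, unplaced
```

The `unplaced` list is the part worth logging: it points at codes whose dimensions are missing from the property hierarchy or the facet configuration.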

Robsteranium · 2021-05-21T14:12:07Z

Indeed we might be able to implement that pipeline as an ES Analyser - then it'd just be a single match query against the codes index.
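
For illustration, the index settings and query bodies might look something like the dicts below. The analyser name `code_label`, the filter name `query_stop`, the `label` field and the stop-word list are assumptions for the example, not our actual mapping. An explicit stop-word list is used because the built-in English list doesn't include "from".

```python
# Index body: a custom analyser that tokenises, downcases and removes
# stop words. All names here are hypothetical.
INDEX_BODY = {
    "settings": {
        "analysis": {
            "filter": {
                "query_stop": {
                    "type": "stop",
                    "stopwords": ["from", "to", "of", "the", "in", "and"],
                }
            },
            "analyzer": {
                "code_label": {
                    "type": "custom",
                    "tokenizer": "standard",
                    "filter": ["lowercase", "query_stop"],
                }
            },
        }
    },
    "mappings": {
        "properties": {
            "label": {"type": "text", "analyzer": "code_label"}
        }
    },
}


def build_match_query(text):
    """Build a single match query against the (assumed) label field.

    The field's analyser is applied to the query text too, so the raw
    user input can be passed straight through.
    """
    return {"query": {"match": {"label": {"query": text}}}}


if __name__ == "__main__":
    print(build_match_query("import cars from Germany 2019"))
```

A match query defaults to OR between terms, so a code would be returned if its label matches any one of the remaining tokens.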

Robsteranium · 2021-07-06T10:39:39Z

I've created https://github.com/Swirrl/cogs-issues/issues/289 for moving this forward.

Robsteranium · 2021-11-29T15:49:05Z

We've now agreed to go with the Google-style UI. We might return to a faceted comparison later.

I suggest we create a new view for now (instead of deleting lots of code).

We basically want a single search box with one cube per result. The rich snippet beneath each shows the dimensions and a sample of values that match.
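
As a rough sketch of how the snippets could be assembled, assuming each code match comes back as a record carrying its cube, dimension and label (the record shape and the `sample_size` parameter are hypothetical):

```python
from collections import defaultdict


def build_snippets(matches, sample_size=3):
    """Group code matches into one snippet per cube.

    Each snippet maps a dimension to a sample of at most `sample_size`
    distinct matching labels.
    """
    by_cube = defaultdict(lambda: defaultdict(list))
    for match in matches:
        values = by_cube[match["cube"]][match["dimension"]]
        if match["label"] not in values and len(values) < sample_size:
            values.append(match["label"])
    # Convert nested defaultdicts to plain dicts for the caller.
    return {cube: dict(dims) for cube, dims in by_cube.items()}


if __name__ == "__main__":
    example = [
        {"cube": "trade", "dimension": "Partner Geography", "label": "Germany"},
        {"cube": "trade", "dimension": "Year", "label": "2019"},
        {"cube": "vehicles", "dimension": "Year", "label": "2019"},
    ]
    print(build_snippets(example))
```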

We could extend this to show all dimensions present in the cube (which could play a role in the decision between cubes). We could use a visual cue (like a background colour) to highlight those that matched. Alternatively we could only show those dimensions that matched.

We could extend highlighting even further to show matches within code labels using the highlighting feature from Elasticsearch.
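
A sketch of what that might involve, again assuming a `label` field on the codes index: the request adds a `highlight` section, and each hit in the response then carries the matching fragments (wrapped in `<em>` tags by default).

```python
def build_highlight_query(text, field="label"):
    """Match query plus a highlight request for the same field.

    The field name is an assumption for this example.
    """
    return {
        "query": {"match": {field: text}},
        "highlight": {"fields": {field: {}}},
    }


def highlighted_fragments(response, field="label"):
    """Collect the highlighted fragments from a search response dict."""
    fragments = []
    for hit in response.get("hits", {}).get("hits", []):
        fragments.extend(hit.get("highlight", {}).get(field, []))
    return fragments
```

The fragments could be rendered directly in the rich snippet in place of the plain code label.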

Note that this UI doesn't need facets necessarily. Options:

Retain facets and facet configuration, show facets on rich snippets. This would mean more consistency across cubes: each would show the same set of dimensions with the same facet label (instead of dataset-specific variations).
Drop facets and show parent dimensions on snippets. This would mean cubes could show a different set of dimensions - being less consistent but allowing us to include any cubes and not just those for which facets had been configured. The labelling would at least be consistent (as we're looking at parent dimensions).
Drop facets and show dataset-specific dimensions. This requires no harmonisation (of parent dimensions) or facet configuration. The UI would show all the different labels used in different datasets.

We might also look to extend the search to match against non-cube-structural elements like the dataset description etc.

Robsteranium · 2021-12-07T16:56:16Z

Robsteranium mentioned this issue Dec 10, 2021

Global search #112

Merged

Robsteranium mentioned this issue Feb 22, 2022

Highlight closed levels with selections #106

Open
