feat: Improve search #1022

bprusinowski · 2023-04-14T14:55:03Z

Goals / Scope

The goal of this PR is to improve the search, especially when it comes to finding exact matches.

Description

I've noticed that it's sometimes hard to find a particular dataset, especially when there are many datasets with similar properties, like the NFI ones. I was developing some features using the NFI: All target values by Lower/higher altitudinal zones dataset, so I searched for it several times and was surprised that even when I typed almost the full name of the dataset, I needed to scroll down (a lot) to find it, while the datasets on top were much worse matches (or at least seemed to be).

I started to dig in and realised that there are several possible improvements to help with such cases:

do not remove stop words from the query before we compute scores,
once the above is true, we are able to test whether the cube title, description, etc. include the full query (and we can add bonus points when that happens),
we still remove stop words, but only to prevent giving scores for them,
I've noticed that we fetch metadata for cubes in all languages, which is fine, as some properties can be missing in one language and we should still show the cube in the search results. However, we were overwriting the score for a given cube with the score of the last language row we iterated over, which led to slightly inconsistent results. Now we overwrite the score for a given cube only if the new score is higher than the current one, and we add bonus points for results in the user's current language.
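
To illustrate the first three points, here is a minimal sketch of how a single field could be scored. It is not the code from this PR: the stop-word list, the weights and the `scoreField` name are assumptions made for the example.

```typescript
// Hypothetical stop-word list and weights; the real values live in the project.
const STOP_WORDS = new Set(["a", "an", "and", "by", "of", "the", "to"]);
const WORD_SCORE = 1;
const EXACT_MATCH_BONUS = 10;

/** Score one text field (e.g. a cube title) against the raw user query. */
function scoreField(query: string, field: string): number {
  const normalizedQuery = query.trim().toLowerCase();
  const normalizedField = field.toLowerCase();
  if (normalizedQuery === "") {
    return 0;
  }

  // Stop words are skipped only here, so they never earn points on their own.
  const words = normalizedQuery
    .split(/\s+/)
    .filter((word) => !STOP_WORDS.has(word));

  let score = 0;
  for (const word of new Set(words)) {
    if (normalizedField.includes(word)) {
      score += WORD_SCORE;
    }
  }

  // The query still contains its stop words, so a full-query match can be tested.
  if (normalizedField.includes(normalizedQuery)) {
    score += EXACT_MATCH_BONUS;
  }

  return score;
}
```

With these assumed weights, a title that contains the whole query ends up well ahead of a title that only shares most of its words, which is the effect the exact-match bonus is meant to have.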

How to test

Exact match

Set the page language to en and search for the string: all target values by lower/higher (drafts on, INT).
See that the NFI: All target values by Lower/higher altitudinal zones cube appears on top.
Follow the same approach on TEST to see that the cube appears far down the list (~100/161 places).

Language bonus points

Set the page language to en and search for the string: with hierarchie (drafts on, INT).
See that the English result appears on top.
Follow the same approach on TEST to see that the German result appears first.
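
The difference between INT and TEST here comes from how per-language rows are merged. A rough sketch of the idea, assuming a hypothetical `Row` shape, a hypothetical `LANGUAGE_BONUS` value and a caller-supplied scoring function (none of these names are taken from the PR):

```typescript
// Hypothetical shapes; the real rows carry more metadata.
type Row = { iri: string; lang: string; title: string };
type Scored = { score: number; lang: string };

const LANGUAGE_BONUS = 1; // assumed value, for illustration only

/** Keep the best-scoring language row per cube instead of the last one seen. */
function bestScores(
  rows: Row[],
  locale: string,
  scoreRow: (row: Row) => number
): Map<string, Scored> {
  const best = new Map<string, Scored>();

  for (const row of rows) {
    // Rows in the user's current language get a small head start.
    const bonus = row.lang === locale ? LANGUAGE_BONUS : 0;
    const score = scoreRow(row) + bonus;

    // Overwrite only when the new score beats the one already stored.
    const current = best.get(row.iri);
    if (current === undefined || score > current.score) {
      best.set(row.iri, { score, lang: row.lang });
    }
  }

  return best;
}
```

Because a later row can no longer replace a better earlier one, the result does not depend on the order in which languages are iterated.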

vercel · 2023-04-14T14:56:59Z

The latest updates on your projects.

Name	Status	Preview	Comments	Updated (UTC)
visualization-tool	✅ Ready (Inspect)	Visit Preview	💬 Add feedback	Apr 14, 2023 3:10pm

ptbrowne · 2023-04-14T15:17:53Z

LGTM 👏 Thanks for the clear explanation!

feat: Improve search

daeb901

bprusinowski requested a review from ptbrowne as a code owner April 14, 2023 14:55

vercel bot deployed to Preview April 14, 2023 15:00 View deployment

fix: Search test

e176adf

vercel bot deployed to Preview April 14, 2023 15:10 View deployment

bprusinowski merged commit c6b54b5 into main Apr 14, 2023

bprusinowski deleted the feat/improve-search branch April 14, 2023 15:48
