feat: Improve search #1022

Merged
bprusinowski merged 2 commits into main from feat/improve-search
Apr 14, 2023

Collaborator

@bprusinowski bprusinowski commented Apr 14, 2023

Goals / Scope

The goal of this PR is to improve the search, especially when it comes to finding exact matches.

Description

I've noticed that it is sometimes hard to find a particular dataset, especially when there are many datasets with similar properties, like the NFI ones. I was developing some features using the NFI: All target values by Lower/higher altitudinal zones dataset and searched for it several times. I was surprised that even when I typed almost the full name of the dataset, I had to scroll down (a lot) to find it, while the datasets at the top were much worse matches (or at least seemed to be).

I started to dig in and realised that there are several possible improvements to help with such cases:

  • do not remove stop words from the query before we compute scores,
  • once the above holds, we can test whether a cube's title, description, etc. include the full query (and add bonus points when they do),
  • we still remove stop words, but only to avoid awarding points for them,
  • we fetch metadata for cubes in all languages, which is fine: a cube can be missing some properties in one language and should still appear in the search results. However, we were overwriting a cube's score with the score of the last language row we iterated over, which led to slightly inconsistent results. Now we overwrite a cube's score only if the new score is higher than the current one, and we add bonus points for results in the user's current language.
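For readers who have not opened the diff, the first three points could be sketched roughly as below. This is a minimal illustration and not the code from this PR: `scoreCube`, `CubeText`, the stop-word list and both point values are hypothetical names and numbers chosen for the example.

```typescript
// Hypothetical stop-word list and weights; the real values are not stated in the PR.
const STOP_WORDS = new Set(["by", "the", "of", "and", "in", "for"]);
const TOKEN_POINTS = 1;
const EXACT_MATCH_BONUS = 10;

type CubeText = { title: string; description: string };

// Split on whitespace and drop empty tokens.
const tokenize = (s: string): string[] =>
  s.toLowerCase().split(/\s+/).filter((t) => t.length > 0);

function scoreCube(cube: CubeText, rawQuery: string): number {
  // The query keeps its stop words; it is only trimmed and lower-cased.
  const query = rawQuery.trim().toLowerCase();
  if (query === "") return 0;

  const fields = [cube.title.toLowerCase(), cube.description.toLowerCase()];
  let score = 0;

  // Stop words are skipped here only so that they do not earn points.
  for (const token of tokenize(query)) {
    if (STOP_WORDS.has(token)) continue;
    if (fields.some((f) => f.includes(token))) score += TOKEN_POINTS;
  }

  // Because stop words were kept, the full query can be matched as one string.
  if (fields.some((f) => f.includes(query))) score += EXACT_MATCH_BONUS;

  return score;
}
```

With these assumed weights, a cube whose title contains the whole query always outranks a cube that only shares individual words with it, which is the behaviour the exact-match test below checks.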

How to test

Exact match

  1. Set page language to en and search for a given string: all target values by lower/higher (drafts on, INT).
  2. See that the NFI: All target values by Lower/higher altitudinal zones cube appears on top.
  3. Follow the same approach on TEST to see that the cube appears far down the list (around position 100 of 161).

Language bonus points

  1. Set page language to en and search for a given string: with hierarchie (drafts on, INT).
  2. See that the English result appears on top.
  3. Follow the same approach on TEST to see that the German result appears first.
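The per-language merging that these steps exercise could look something like the sketch below. It is an assumption-laden illustration rather than the merged implementation: `Row`, `mergeScores` and `LANGUAGE_BONUS` are hypothetical, and the bonus value is arbitrary.

```typescript
// One metadata row per cube and language, with a score already computed for it.
type Row = { iri: string; lang: string; score: number };

// Hypothetical bonus for rows in the user's current language.
const LANGUAGE_BONUS = 5;

function mergeScores(rows: Row[], userLang: string): Map<string, number> {
  const best = new Map<string, number>();
  for (const row of rows) {
    const candidate = row.score + (row.lang === userLang ? LANGUAGE_BONUS : 0);
    const current = best.get(row.iri);
    // Keep the highest score seen so far instead of the last row's score,
    // so the result no longer depends on iteration order.
    if (current === undefined || candidate > current) {
      best.set(row.iri, candidate);
    }
  }
  return best;
}
```

A cube with rows in only one language still gets a score, so it stays in the results; it simply does not receive the bonus unless that language is the user's.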

@bprusinowski bprusinowski requested a review from ptbrowne as a code owner April 14, 2023 14:55

Collaborator

ptbrowne commented Apr 14, 2023

LGTM 👏 Thanks for the clear explanation!

@bprusinowski bprusinowski merged commit c6b54b5 into main Apr 14, 2023
@bprusinowski bprusinowski deleted the feat/improve-search branch April 14, 2023 15:48