Prompt/use similarity dataset in search #1829

Open
ajparsons opened this issue Sep 26, 2024 · 0 comments

This is implied for alerts by #1824, but this is a separate ticket to talk through the implications of using it in the main search.

This ports part of the CAPE search approach to TWFY. Basic dataset here

To recap: we've calculated similar ngrams, which can be used to identify probably related searches that stemming wouldn't catch, e.g.

  • foreign aid: overseas aid, international aid
  • Freedom of information: right to information, information rights
  • air pollution: air quality, air pollutants
  • offshore wind: offshore renewable energy, offshore renewable, offshore energy
  • gambling harm: gambling-related harm, gambling addiction
  • contaminated blood: blood contamination, infected blood
  • Gay rights: lgbt rights, rights of lgbt, transgender rights
  • free school meals: free school dinners, free school lunches, school meals, school meal provisions

In CAPE, we hide this behind the scenes: a level of fuzzy matching is specified, and the server then modifies the search query to include related terms (which are then shown as examples of what might be included).
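
As a rough illustration of the behind-the-scenes approach, here is a minimal Python sketch of server-side query expansion. Everything in it is an assumption for discussion: the `SIMILAR_TERMS` structure, the similarity scores, the `min_score` threshold standing in for the "level of fuzzy matching", and the quoted-phrase `OR` syntax. It is not how CAPE or TWFY actually implements this.

```python
# Hypothetical similarity data: term -> list of (related term, score).
# The scores are invented for illustration, not taken from the real dataset.
SIMILAR_TERMS = {
    "foreign aid": [("overseas aid", 0.9), ("international aid", 0.8)],
    "air pollution": [("air quality", 0.85), ("air pollutants", 0.7)],
}


def expand_query(query, min_score=0.75):
    """Return (expanded_query, added_terms) for a search phrase.

    min_score stands in for the fuzzy-matching level: a higher value
    admits fewer related terms.
    """
    cleaned = query.strip()
    related = SIMILAR_TERMS.get(cleaned.lower(), [])
    added = [term for term, score in related if score >= min_score]
    # Join the original phrase and the related phrases as quoted alternatives.
    expanded = " OR ".join(f'"{phrase}"' for phrase in [cleaned] + added)
    return expanded, added


expanded, added = expand_query("foreign aid")
print(expanded)  # the query the server would actually run
print(added)     # the terms to show as examples of what was included
```

Returning the added terms alongside the expanded query is what would let the interface show them as examples of what might be included.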

In the alerts flow, I've suggested a step where these can be opted into as extra tickboxes.
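
To make the opt-in variant concrete, here is a minimal Python sketch. The function names, the tickbox option format, and the quoted-phrase `OR` syntax are all hypothetical, chosen only to show the shape of the flow: offer related terms unticked, then build the final query from whichever ones the user selected.

```python
# Hypothetical lookup: term -> related terms (a small sample from the list above).
RELATED = {
    "contaminated blood": ["blood contamination", "infected blood"],
    "gambling harm": ["gambling-related harm", "gambling addiction"],
}


def suggestion_options(query, related=RELATED):
    """Return tickbox options for a query, all unticked by default."""
    terms = related.get(query.strip().lower(), [])
    return [{"value": term, "label": term, "checked": False} for term in terms]


def build_query(query, selected, related=RELATED):
    """Combine the query with the related terms the user ticked.

    Submitted values that were never offered as suggestions are ignored.
    """
    cleaned = query.strip()
    allowed = set(related.get(cleaned.lower(), []))
    chosen = [term for term in selected if term in allowed]
    return " OR ".join(f'"{phrase}"' for phrase in [cleaned] + chosen)


options = suggestion_options("contaminated blood")
print([option["label"] for option in options])
print(build_query("contaminated blood", ["infected blood"]))
```

The same two steps could in principle back both the alerts flow and the search form, which is part of what the question below is weighing up.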

So the question is which of these is the easiest/right approach if we want to do this in search as well as alerts (which might make sense, as they generally share the same interface).

How would we expose suggestions as part of the search flow, or give an option for automatically adding them or not? Or do we want to make this alerts only for now? (complexity/time/etc)
