Improvements to search #258

Closed
ptbrowne opened this issue Dec 8, 2021 · 18 comments · Fixed by #256

@ptbrowne
Collaborator

ptbrowne commented Dec 8, 2021

Meta issue:

Search terms:

"Anzahl": #180
"Bade": #96
"OFPP"
"PRTR": #105
"MFM-U"

@ptbrowne ptbrowne added the search label Dec 8, 2021
@ptbrowne
Collaborator Author

ptbrowne commented Dec 8, 2021

@ortnever @fabiancresson @PDumasBAR I've created this meta issue to keep all the search issues in one place. Search is pretty hard to get right. After trying to tweak the original code, I tried another technique, which you can test here: https://visualize-ad-search-rfimjjlwpq.herokuapp.com/en/browse. I have the feeling that the results are better. What do you think?

@ptbrowne
Collaborator Author

ptbrowne commented Dec 8, 2021

If you get weird results, do not hesitate to post the search query, so that I can list all the queries you want to be able to make.

@ptbrowne
Collaborator Author

ptbrowne commented Dec 9, 2021

Thanks for the feedback Jeremy, that's true. I will look into it a bit more.

@ptbrowne
Collaborator Author

ptbrowne commented Dec 9, 2021

This should be better now :)

@jstcki
Contributor

jstcki commented Dec 9, 2021

Nice!

A very small thing: now the whole word (incl. punctuation) is highlighted instead of the string I typed. I would expect an exact highlighting but it doesn't really matter much 🙂

image

@ortnever

ortnever commented Dec 9, 2021

I tried to search for the MFM-U dataset (draft).
With Keyword "MFM" no Result. Why?

image

I do find the dataset with the keyword "Air":
https://visualize-ad-search-rfimjjlwpq.herokuapp.com/en/browse?includeDrafts=true&order=TITLE_ASC&search=Air

image

@PDumasBAR

From what I see, it works only on complete words.
Noise -> OK
Nois -> not OK

@ptbrowne
Collaborator Author

ptbrowne commented Dec 9, 2021

With Keyword "MFM" no Result. Why?

@ortnever This is due to how the search engine splits text into words. I will look into it, thanks for the example.
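
To illustrate how word splitting alone can hide a result, here is a minimal Python sketch. It is not the code used by the search engine in this project; both tokenizer functions are hypothetical and only show that whether "MFM-U" is kept as one token or split at the hyphen decides if an exact-token lookup for "MFM" succeeds.

```python
import re


def tokenize_whitespace(text: str) -> list[str]:
    # Hypothetical tokenizer: split on whitespace only,
    # so "MFM-U" stays a single token "mfm-u".
    return text.lower().split()


def tokenize_words(text: str) -> list[str]:
    # Hypothetical tokenizer: split on every non-word character,
    # so "MFM-U" becomes the two tokens "mfm" and "u".
    return re.findall(r"\w+", text.lower())


title = "MFM-U Dataset"
query = "MFM".lower()

# Exact-token lookup: only the second tokenizer yields a token equal to "mfm".
print(query in tokenize_whitespace(title))  # False
print(query in tokenize_words(title))       # True
```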

@PDumasBAR When I search for "Nois", I get the correct results (also with "noi", as shown below). Are you sure you are using the deployment preview? https://visualize-ad-search-rfimjjlwpq.herokuapp.com/en/browse?includeDrafts=true&order=TITLE_ASC&search=noi

image

@PDumasBAR

OK
I just tested with the link from an email and did not verify the URL :(

I think the "search" is not finished and we must wait a little before testing.

If I search "noi", it works, but the next search with "noise" doesn't work. Maybe my previous comment was about that, first "noise" -> ok, then "noi" -> not ok, without reloading the page.

@ptbrowne
Collaborator Author

ptbrowne commented Dec 9, 2021

Thanks for the feedback Pierre.

I think the "search" is not finished and we must wait a little before testing.

For now, the search is technically very crude and I am exploring options to improve it. Search depends a lot on what you expect the tool to be able to find. This is why I try to involve you as early as possible and collect the search terms you use at the top of this issue. For example, I have learned that BAFU searches for datasets using acronyms. This might differ from generic users, who would search with common words.

If I search "noi", it works, but the next search with "noise" doesn't work. Maybe my previous comment was about that, first "noise" -> ok, then "noi" -> not ok, without reloading the page.

This is weird, it works for me and for @bprusinowski. I tested in Firefox and in Chrome. Do you have more information?
When you say "doesn't work", do you mean that there are no results? Just to be sure, I am adding the URL to test again here: https://visualize-ad-search-rfimjjlwpq.herokuapp.com/en/browse.

@PDumasBAR

Hi Patrick,

I'm using EDGE, Version 95.0.1020.44 (Official build) (64-bit)

  1. Open the link and search "noise" -> 2 datasets found
  2. Delete "se", it remains "noi" -> nothing displayed, at least the last time :(
    but now it works :)

That's what I meant. Not that the functionality is not good, only that it is a work in progress, and testing does not make much sense until it reaches a somewhat more stable state.

If it can help, I think the search should behave the same way as on opendata.swiss, where the search looks in all displayed fields: title and description of course, but also category, organization or tag (which makes less sense in visualize, where the tags are not displayed).

The opendata search is also "multi-word" and coupled with an auto-complete list, giving the user the opportunity to go directly to a matching dataset. I think this is the right direction. But it is also only a nice-to-have :)

@FabianCretton

about "key word search - strange ordering of the results" #251
See my new comment here: #251 (comment)

@markusb-ch

We (Vero and I) did some testing this morning and I want to summarize the behaviour we would like to see:

  1. Searching for a word should find the whole word or words that contain this word, but it should not find words with fewer letters.
    Example: if I search for "bath", only datasets with the word "bath" or "bathing" should be in the result; datasets containing only "ba" should not be displayed.

  2. Search should include metadata like category data or keywords which you enter for the cube.
    Example: datasets flagged with "Raum und Umwelt" should show up when searching for "Umwelt".

  3. When two words are entered, both should be handled individually and only datasets which contain both words should come up (the logic is AND, not OR).
    Example: "schweizer bakterien" currently shows no results, but we have one dataset containing the words "schweizer" and "bakterien", which should be displayed.
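
The three rules above can be sketched in a few lines of Python. This is only an illustration of the requested behaviour, not the implementation used in the project; the function names and the dataset field names (`title`, `description`, `themes`) are assumptions made for the example.

```python
import re


def split_words(text: str) -> list[str]:
    # Lowercase and split on non-word characters.
    return re.findall(r"\w+", text.lower())


def dataset_matches(query: str, dataset: dict[str, str]) -> bool:
    # Rule 2: search every field, including metadata such as themes.
    haystack = split_words(" ".join(dataset.values()))
    terms = split_words(query)
    # Rule 1: a term matches a word that contains it ("bath" in "bathing"),
    # never a shorter fragment of the term.
    # Rule 3: AND logic, every term must match somewhere.
    return bool(terms) and all(
        any(term in word for word in haystack) for term in terms
    )


# Hypothetical dataset, for illustration only.
bathing = {
    "title": "Bathing water quality",
    "description": "Schweizer Messungen zu Bakterien in Badegewässern",
    "themes": "Raum und Umwelt",
}

print(dataset_matches("bath", bathing))                 # True
print(dataset_matches("Umwelt", bathing))               # True
print(dataset_matches("schweizer bakterien", bathing))  # True
print(dataset_matches("schweizer noise", bathing))      # False
```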

@sosiology
Contributor

@Mbch @ortnever

Thank you very much for taking the time to test the search and share your valuable feedback.

The search functionality we have right now is implemented at a very high level: it loads all the datasets, and the filtering/search is done on the server, which is why the results shown may not be 100% relevant to the search query/keyword.

We agree that there is potential to improve this feature. To apply the behaviour you described above, and make the search scalable and more performant, we would have to implement a full text search functionality from the database.

@ptbrowne did some testing last year, and based on what we know at the moment, we assume that implementing a full text search functionality that aligns with the expected behaviour is technically feasible, but we would need to first invest some more time exploring this option to validate this assumption.

We estimate that we could further explore the possibilities and implement a full text search in the course of one Sprint. If this is a priority, it is something we could consider for the final sprint (3.5).

@FabianCretton

@sosiology do you mean that the search is not done with a SPARQL query ?

@ptbrowne
Collaborator Author

Yep, the search is not implemented with a SPARQL query at the moment. In the future, it should be done via the full-text search functionality of Stardog.

@FabianCretton

And one more question @ptbrowne about point 1 of @Mbch above: would it not be easy to "disable" the fact that the search is done with the letters of the input word instead of the word itself? I guess there is a kind of "fuzzy" search feature that could be disabled? Or maybe one could set the smallest number of letters for the grouping?
I mean, when looking for "OFPP", I do not want to find results with "o", "f", "p", "p" individually. This is the current main problem. For instance, I don't want to get "perfect store". But it could be interesting to say that 3 letters are OK (for instance "ofp" or "fpp").
So, to summarize: can that "fuzzy search" be customized, without waiting for a bigger piece of work on the search feature?
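
As an illustration of the "smallest number of letters" idea, here is a small Python sketch. It does not reflect how the current search library works or is configured; `ngrams`, `ngram_match` and the `min_len` parameter are hypothetical names. The query is broken into contiguous fragments of at least `min_len` characters, and a text matches only if it contains one of these fragments.

```python
def ngrams(term: str, min_len: int) -> set[str]:
    # All contiguous substrings of `term` with at least `min_len` characters.
    term = term.lower()
    n = len(term)
    return {term[i:j] for i in range(n) for j in range(i + min_len, n + 1)}


def ngram_match(query: str, text: str, min_len: int = 3) -> bool:
    # A text matches if it contains at least one allowed fragment.
    # Note: a query shorter than `min_len` produces no fragments and no match.
    text = text.lower()
    return any(fragment in text for fragment in ngrams(query, min_len))


print(sorted(ngrams("OFPP", 3)))                        # ['fpp', 'ofp', 'ofpp']
print(ngram_match("OFPP", "perfect store"))             # False
print(ngram_match("OFPP", "perfect store", min_len=1))  # True, single letters match
print(ngram_match("OFPP", "Data published by OFPP"))    # True
```

With `min_len=1` the sketch reproduces the unwanted behaviour described above, where single letters are enough to produce a hit.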

@ptbrowne ptbrowne mentioned this issue Feb 25, 2022
4 tasks

7 participants