Quoted searches with underscores return non-exact matches #10505

Closed
murdo-moj opened this issue May 15, 2024 · 3 comments
Labels
bug Bug report stale

Comments


murdo-moj commented May 15, 2024

Hello, the DataHub demo instance's search appears to be broken. I am looking at an example from the docs:

If you want to:

  • Exact match on term or phrase
    • "datahub_schema" Sample results
    • datahub_schema Sample results
    • Enclosing one or more terms with double quotes will enforce exact matching on these terms, preventing further tokenization.

Both searches return the same 393 results, so the quotes aren't doing anything. Perhaps they are being stripped somewhere before the query is passed to Elasticsearch?
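To make the documented expectation concrete, here is a minimal sketch of a query parser that keeps quoted phrases intact and only tokenizes bare terms. It is a toy model for reasoning about the report, not DataHub's or Elasticsearch's implementation; `parse_query` and the choice to split bare terms on underscores are assumptions made for illustration.

```python
import re

# Hypothetical toy parser, NOT DataHub's actual code.
# Matches either a double-quoted phrase or a run of non-space characters.
_TOKEN = re.compile(r'"([^"]*)"|(\S+)')


def parse_query(query):
    """Return (text, is_exact) pairs for a search string."""
    parts = []
    for quoted, bare in _TOKEN.findall(query):
        if quoted:
            # Quoted phrase: keep it verbatim, no further tokenization.
            parts.append((quoted, True))
        else:
            # Bare term: split on underscores (an assumed analyzer behaviour).
            parts.extend((tok, False) for tok in bare.split("_") if tok)
    return parts


print(parse_query('"datahub_schema"'))
print(parse_query('datahub_schema'))
```

Under this model the quoted form stays a single exact term while the bare form splits into two, so identical result counts for both forms would suggest the quotes are lost or ignored somewhere upstream.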

@tom-webber

Further context from some experimentation:

The underscore character appears to add some wildcard-like behaviour that a space does not.
Here are some example searches in the demo instance and the number of results returned:

Search term         Demo link   Number of results
"datahub_schema"    (demo)      393
datahub_schema      (demo)      393
"datahub schema"    (demo)      2
datahub schema      (demo)      42
datahub | schema    (demo)      393
datahub             (demo)      51
schema              (demo)      384

It appears that the underscore character forces an 'OR' search for the words it separates, regardless of the presence of quotes, whereas a space character leads to an 'AND' search without quotes and an 'EXACT' search with quotes.
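The counts in the table above can be cross-checked with inclusion-exclusion. This is a minimal sketch that assumes the unquoted `datahub schema` search (42 results) is exactly the set of entities matching both words; the helper name `expected_or_count` is made up for illustration.

```python
# Result counts copied from the table above (DataHub demo instance).
counts = {
    "datahub": 51,
    "schema": 384,
    "datahub schema": 42,       # unquoted, space-separated: assumed to be AND
    '"datahub_schema"': 393,
    "datahub_schema": 393,
    "datahub | schema": 393,
}


def expected_or_count(a, b, a_and_b):
    """Inclusion-exclusion: |A or B| = |A| + |B| - |A and B|."""
    return a + b - a_and_b


or_count = expected_or_count(
    counts["datahub"], counts["schema"], counts["datahub schema"]
)
print(or_count)  # 393
```

51 + 384 - 42 = 393, which equals the count for both underscore searches and for the explicit `datahub | schema` search. That is consistent with the underscore being treated as an 'OR' between the two words.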


This issue is stale because it has been open for 30 days with no activity. If you believe this is still an issue on the latest DataHub release please leave a comment with the version that you tested it with. If this is a question/discussion please head to https://slack.datahubproject.io. For feature requests please use https://feature-requests.datahubproject.io

@github-actions github-actions bot added the stale label Jun 15, 2024
@david-leifker
Collaborator

A better example was added. Later changes interpret both datahub_schema and "datahub_schema" as quoted, which avoids tokenization in this case. The new example compares pet profiles with "pet profiles".
