Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: support finding data with typos #153

Merged
merged 1 commit into from
Oct 16, 2024

Conversation

overengineered
Copy link
Contributor

What: Increases tolerance for typos in data and/or query text

Why: match sorter currently finds data with some typos, but others. E.g. "canceled" would find "cancelled", but not "cacneled".

How: getClosenessRanking calculator allows skipping one of the characters from the query. This allows finding data that contains the most common typos: swapped letters, missing letters and diverging spelling (localisation/localization). Matches that have missing letters have reduced ranking.

Checklist:

  • Documentation N/A
  • Tests
  • Ready to be merged

Since this change only affects searches where no good match (like full word, or exact start or word) gets found, I think there's no need to have add to options some flag to disable/enable this feature. If searching ua and finding United States of America is good enough right now, then my changes that improve tolerance for typos should be also acceptable.

But I can see that there could be some value in providing configuration for this new feature - and I am open to do it.

Copy link
Owner

@kentcdodds kentcdodds left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I like this, but worry about the impact on filtering. I worry that it may end up matching far too much and I think it may be worth adding another ranking level for this. What do you think?

@overengineered
Copy link
Contributor Author

I think ranking level is not really appropriate for controlling this functionality. As an example for query defer with proposed typo tolerance the following matches are ordered like this:

lawyer defends victims
United States of America
indestructible dog crate

All those results are not good matches for the query, but when ordering for similarity to defer in my mind this order seems correct. Only the second match would be findable without typo tolerance. Essentially, typo tolerance adds both "better quality" results and "lower quality" results. So it seems that it would be best for typo tolerance to be in MATCHES ranking and let scoring function do its work.

In my view the MATCHES ranking is for results where no good matches were found and we are scraping the bottom of the barrel to see if something might be similar. Even current approach for MATCHES ranking can find lots of weird matches - and proposed changes do not drastically change behaviour. If spurious matches are not appropriate, increasing threshold seems like the best way to turn it off.

But if this new matching should be controlled separately, I would do it like this:

  interface MatchSorterOptions<ItemType = unknown> {
    <..>
    sorter?: Sorter<ItemType>
+   fuzzy?: 'scattered' | 'partial'
  }

I think true / false do work here as current MATCHES behaviour is an already little bit fuzzy. I think these constants fittingly describe behaviour differences.

@overengineered
Copy link
Contributor Author

overengineered commented Oct 11, 2024

I just had an idea to allow typo tolerance with ranking like this:

- MATCHES
+ TIGHT_PARTIAL
+ SCATTERED

Basically typo tolerant matching would be higher than the current MATCHES matching. However my proposed code change would have to be modified so that it allowed skipping letters only if matching letters are packed tightly. I would imagine restricting finding all the letters except one within a span that is just a few letters longer than initial query would almost always be higher quality match than finding all letters, but widely scattered.

@kentcdodds
Copy link
Owner

I think you're right. This is fine. If people find it's problematic then we can ship a breaking change with new levels.

@kentcdodds kentcdodds merged commit 8fc0645 into kentcdodds:main Oct 16, 2024
4 checks passed
Copy link

🎉 This PR is included in version 6.4.0 🎉

The release is available on:

Your semantic-release bot 📦🚀

@diegohaz
Copy link

FYI, this change caused one of our automated tests to fail: https://github.com/ariakit/ariakit/pull/4218/files#diff-ffb64734a719f82219f5f82d3ee8c52e111c4a560940b6bb14b60ab6d3e74de0

This isn't a problem. I'll update the test. The UI works well, and I believe it's a good feature. I'm sharing this so you might consider making it optional to avoid any friction until the next major.

@kentcdodds
Copy link
Owner

@diegohaz, I think I'll just do a major version bump and deprecate 6.4.0

kentcdodds added a commit that referenced this pull request Oct 17, 2024
#153 (comment)

BREAKING CHANGE: The last release was arguably a breaking change so we're going to trigger this major version and deprecate the 6.4.0.
@kentcdodds
Copy link
Owner

I've deprecated 6.4.0 and released 7.0.0: https://github.com/kentcdodds/match-sorter/releases/tag/v7.0.0

renovate bot added a commit to ariakit/ariakit that referenced this pull request Oct 17, 2024
This PR contains the following updates:

| Package | Change | Age | Adoption | Passing | Confidence |
|---|---|---|---|---|---|
| [match-sorter](https://redirect.github.com/kentcdodds/match-sorter) |
[`6.3.4` ->
`7.0.0`](https://renovatebot.com/diffs/npm/match-sorter/6.3.4/7.0.0) |
[![age](https://developer.mend.io/api/mc/badges/age/npm/match-sorter/7.0.0?slim=true)](https://docs.renovatebot.com/merge-confidence/)
|
[![adoption](https://developer.mend.io/api/mc/badges/adoption/npm/match-sorter/7.0.0?slim=true)](https://docs.renovatebot.com/merge-confidence/)
|
[![passing](https://developer.mend.io/api/mc/badges/compatibility/npm/match-sorter/6.3.4/7.0.0?slim=true)](https://docs.renovatebot.com/merge-confidence/)
|
[![confidence](https://developer.mend.io/api/mc/badges/confidence/npm/match-sorter/6.3.4/7.0.0?slim=true)](https://docs.renovatebot.com/merge-confidence/)
|

---

### Release Notes

<details>
<summary>kentcdodds/match-sorter (match-sorter)</summary>

###
[`v7.0.0`](https://redirect.github.com/kentcdodds/match-sorter/releases/tag/v7.0.0)

[Compare
Source](https://redirect.github.com/kentcdodds/match-sorter/compare/v6.4.0...v7.0.0)

This has the same contents as v6.4.0:

##### Features

- support finding data with typos
([#&#8203;153](https://redirect.github.com/kentcdodds/match-sorter/issues/153))
([8fc0645](https://redirect.github.com/kentcdodds/match-sorter/commit/8fc0645af4b2dfdbb53dd9c1c088ab52cd997f5f))

##### Bug Fixes

- **release:** manually release a major version
([d9b7dab](https://redirect.github.com/kentcdodds/match-sorter/commit/d9b7dab7d10f65db0dbc7df5788ee6e81eb26377)),
closes
[/github.com/kentcdodds/match-sorter/pull/153#issuecomment-2417996730](https://redirect.github.com//github.com/kentcdodds/match-sorter/pull/153/issues/issuecomment-2417996730)

##### BREAKING CHANGES

- **release:** The last release was arguably a breaking change so we're
going to trigger this major version and deprecate the 6.4.0.

###
[`v6.4.0`](https://redirect.github.com/kentcdodds/match-sorter/releases/tag/v6.4.0)

[Compare
Source](https://redirect.github.com/kentcdodds/match-sorter/compare/v6.3.4...v6.4.0)

##### Features

- support finding data with typos
([#&#8203;153](https://redirect.github.com/kentcdodds/match-sorter/issues/153))
([8fc0645](https://redirect.github.com/kentcdodds/match-sorter/commit/8fc0645af4b2dfdbb53dd9c1c088ab52cd997f5f))

</details>

---

### Configuration

📅 **Schedule**: Branch creation - At any time (no schedule defined),
Automerge - At any time (no schedule defined).

🚦 **Automerge**: Enabled.

♻ **Rebasing**: Whenever PR becomes conflicted, or you tick the
rebase/retry checkbox.

🔕 **Ignore**: Close this PR and you won't be reminded about this update
again.

---

- [ ] <!-- rebase-check -->If you want to rebase/retry this PR, check
this box

---

This PR was generated by [Mend Renovate](https://mend.io/renovate/).
View the [repository job
log](https://developer.mend.io/github/ariakit/ariakit).

<!--renovate-debug:eyJjcmVhdGVkSW5WZXIiOiIzOC4xMjAuMSIsInVwZGF0ZWRJblZlciI6IjM4LjEyMC4xIiwidGFyZ2V0QnJhbmNoIjoibWFpbiIsImxhYmVscyI6W119-->

---------

Co-authored-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>
Co-authored-by: Haz <[email protected]>
kodiakhq bot pushed a commit to timelessco/recollect that referenced this pull request Oct 20, 2024
This PR contains the following updates:

| Package | Change | Age | Adoption | Passing | Confidence |
|---|---|---|---|---|---|
| [match-sorter](https://redirect.github.com/kentcdodds/match-sorter) | [`^6.3.1` -> `^7.0.0`](https://renovatebot.com/diffs/npm/match-sorter/6.3.4/7.0.0) | [![age](https://developer.mend.io/api/mc/badges/age/npm/match-sorter/7.0.0?slim=true)](https://docs.renovatebot.com/merge-confidence/) | [![adoption](https://developer.mend.io/api/mc/badges/adoption/npm/match-sorter/7.0.0?slim=true)](https://docs.renovatebot.com/merge-confidence/) | [![passing](https://developer.mend.io/api/mc/badges/compatibility/npm/match-sorter/6.3.4/7.0.0?slim=true)](https://docs.renovatebot.com/merge-confidence/) | [![confidence](https://developer.mend.io/api/mc/badges/confidence/npm/match-sorter/6.3.4/7.0.0?slim=true)](https://docs.renovatebot.com/merge-confidence/) |

---

### Release Notes

<details>
<summary>kentcdodds/match-sorter (match-sorter)</summary>

### [`v7.0.0`](https://redirect.github.com/kentcdodds/match-sorter/releases/tag/v7.0.0)

[Compare Source](https://redirect.github.com/kentcdodds/match-sorter/compare/v6.4.0...v7.0.0)

This has the same contents as v6.4.0:

##### Features

-   support finding data with typos ([#&#8203;153](https://redirect.github.com/kentcdodds/match-sorter/issues/153)) ([8fc0645](https://redirect.github.com/kentcdodds/match-sorter/commit/8fc0645af4b2dfdbb53dd9c1c088ab52cd997f5f))

##### Bug Fixes

-   **release:** manually release a major version ([d9b7dab](https://redirect.github.com/kentcdodds/match-sorter/commit/d9b7dab7d10f65db0dbc7df5788ee6e81eb26377)), closes [/github.com/kentcdodds/match-sorter/pull/153#issuecomment-2417996730](https://redirect.github.com//github.com/kentcdodds/match-sorter/pull/153/issues/issuecomment-2417996730)

##### BREAKING CHANGES

-   **release:** The last release was arguably a breaking change so we're going to trigger this major version and deprecate the 6.4.0.

### [`v6.4.0`](https://redirect.github.com/kentcdodds/match-sorter/releases/tag/v6.4.0)

[Compare Source](https://redirect.github.com/kentcdodds/match-sorter/compare/v6.3.4...v6.4.0)

**DEPRECATED**: This was arguably a breaking change so we've deprecated this version and released a major version.

##### Features

-   support finding data with typos ([#&#8203;153](https://redirect.github.com/kentcdodds/match-sorter/issues/153)) ([8fc0645](https://redirect.github.com/kentcdodds/match-sorter/commit/8fc0645af4b2dfdbb53dd9c1c088ab52cd997f5f))

</details>

---

### Configuration

📅 **Schedule**: Branch creation - "before 4am on Monday" in timezone Asia/Kolkata, Automerge - At any time (no schedule defined).

🚦 **Automerge**: Disabled by config. Please merge this manually once you are satisfied.

♻ **Rebasing**: Whenever PR is behind base branch, or you tick the rebase/retry checkbox.

🔕 **Ignore**: Close this PR and you won't be reminded about this update again.

---

 - [ ] If you want to rebase/retry this PR, check this box

---

This PR was generated by [Mend Renovate](https://mend.io/renovate/). View the [repository job log](https://developer.mend.io/github/timelessco/recollect).
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants