Add null byte as hard context separator #295

Merged on Jul 2, 2024 (1 commit)

LukasKalbertodt
Pull Request

Related issue

Fixes https://github.com/orgs/meilisearch/discussions/744

What does this PR do?

Adds \0 as a context separator.

This allows \0 to be used as an artificial separator, for example when concatenating many small strings into one large string. See this discussion for context: https://github.com/orgs/meilisearch/discussions/744
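
To illustrate the use case, here is a minimal sketch of concatenating several small strings into one large string with \0 between them. The helper name `join_with_nul` is hypothetical and not part of this PR, and the sketch assumes the input strings do not themselves contain \0.

```rust
/// Hypothetical helper (not part of this PR): joins small strings into one
/// large string, placing a null byte between neighbouring parts so that the
/// tokenizer can treat each boundary as a hard context separator.
/// Assumes none of the parts already contains '\0'.
fn join_with_nul(parts: &[&str]) -> String {
    parts.join("\0")
}
```

Calling `join_with_nul(&["red car", "blue bike"])` produces a single string in which the two original strings are separated by exactly one null byte.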

PR checklist

Please check if your PR fulfills the following requirements:

  • Does this PR fix an existing issue, or have you listed the changes applied in the PR description (and why they are needed)?
  • Have you read the contributing guidelines?
  • Have you made sure that the title is accurate and descriptive of the changes?

Thank you so much for contributing to Meilisearch!

ManyTheFish commented Jun 27, 2024

Hello @LukasKalbertodt,
Thank you for your PR.
Just so you know, the CONTEXT_SEPARATOR list is a promotion list: a token is treated as a hard separator only if it is first categorized as a separator AND it is also part of the CONTEXT_SEPARATOR list.
If the token is not categorized as a separator in the first place, it will never be promoted, even if it is part of the CONTEXT_SEPARATOR list.
So if you want \0 to be treated as a separator by default, you have to add it to the DEFAULT_SEPARATOR list as well!
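
The promotion rule described above can be sketched roughly as follows. This is not the tokenizer's actual code: the `Kind` enum, the `classify` function, and the abbreviated list contents are all assumptions made for illustration; only the two-step logic (separator first, then promotion) comes from the comment.

```rust
/// Hypothetical token classification used only for this sketch.
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
enum Kind {
    Word,
    SoftSeparator,
    HardSeparator,
}

// Abbreviated, assumed list contents; the real lists are much longer.
const DEFAULT_SEPARATORS: &[&str] = &[" ", ",", ".", "\0"];
const CONTEXT_SEPARATORS: &[&str] = &[".", "\0"];

/// Step 1: a token must be in the default list to count as a separator at all.
/// Step 2: only then can membership in the context list promote it to "hard".
fn classify(token: &str, default_separators: &[&str], context_separators: &[&str]) -> Kind {
    if !default_separators.contains(&token) {
        // Never promoted, even if it appears in the context list.
        return Kind::Word;
    }
    if context_separators.contains(&token) {
        Kind::HardSeparator
    } else {
        Kind::SoftSeparator
    }
}
```

With both lists containing \0, `classify("\0", DEFAULT_SEPARATORS, CONTEXT_SEPARATORS)` yields `Kind::HardSeparator`. If \0 is dropped from the default list, the same call falls through to `Kind::Word`, which is the pitfall the comment warns about.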


ManyTheFish left a comment

Approved, thank you for the contribution!

bors merge

meili-bors bot commented Jul 2, 2024

Build succeeded:

meili-bors bot merged commit 4cadf24 into meilisearch:main on Jul 2, 2024
4 checks passed
LukasKalbertodt deleted the patch-1 branch on July 2, 2024 13:51