Preserve spaces when replacing token #21

Closed
wants to merge 4 commits into from

@culturedniichan
Contributor

Commented Jan 21, 2024

When another token is chosen from the probabilities, the leading space in the original token's text (content), which marks the start of a new word, is not preserved. For example, take the sentence

The maid cafe was located in Tokyo

Suppose I want to replace "located" (assuming it is its own token) with "situated". What happens is as follows:

  • The text of the token being replaced is actually " located", with a leading space.
  • The replacement is inserted as "situated", with no leading space, so the sentence becomes

The maid cafe wassituated in Tokyo

What the updated code does is:

  1. Checks whether the token being replaced at position i starts with a space. If it does not, nothing is done, because the token being replaced is part of a word (e.g. joi-ned, if we replace the token 'ned').
  2. Prepends a space to the new token, unless the new token is an English punctuation mark. For example, if ' went' in 'the man went' is replaced and the new token is the apostrophe ' (as in "the man's luggage"), no space is prepended.
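
The two steps above can be sketched roughly as follows. This is a minimal Python illustration, not the code from this PR: the function name `replacement_text` and the exact punctuation set in `NO_SPACE_PUNCTUATION` are assumptions made for the example.

```python
# Hypothetical helper illustrating the two steps; names are not from the PR.
# Assumed set of English punctuation marks that attach to the previous word.
NO_SPACE_PUNCTUATION = set(".,;:!?'\")]}")


def replacement_text(old_token: str, new_token: str) -> str:
    """Return new_token, carrying over old_token's leading space if needed."""
    # Step 1: the replaced token did not start a new word, so change nothing.
    if not old_token.startswith(" "):
        return new_token
    # Nothing to do if the new token is empty or already has its space.
    if not new_token or new_token.startswith(" "):
        return new_token
    # Step 2: punctuation attaches directly to the previous word.
    if new_token[0] in NO_SPACE_PUNCTUATION:
        return new_token
    return " " + new_token


tokens = ["The", " maid", " cafe", " was", " located", " in", " Tokyo"]
tokens[4] = replacement_text(tokens[4], "situated")
sentence = "".join(tokens)
print(sentence)
```

With the example sentence, the replaced token keeps its word boundary, while a mid-word token such as 'ned' or a punctuation token such as the apostrophe is inserted unchanged.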

…oken has a preceding space, and the new token does not start with a punctuation mark that does not require a preceding space after the word in English
@lmg-anon
Owner

The token's text should have that space; if it doesn't, then it's a bug in the backend. I assume you are using oobabooga?

@culturedniichan
Contributor Author

The token's text should have that space; if it doesn't, then it's a bug in the backend. I assume you are using oobabooga?

Yes, I'm using oobabooga

@lmg-anon
Owner

I made a PR in oobabooga's repository to fix this issue there, hopefully it will get merged.
oobabooga/text-generation-webui#5339

@culturedniichan
Contributor Author

Great. I didn't know it was an ooba issue. I don't have an OpenAI key, and I don't usually use Kobold, so all my tests and use are based on oobabooga. I'll just keep this branch on my computer until (hopefully) the fix gets merged into ooba.

@culturedniichan
Contributor Author

Latest Ooba commit seems to have solved the issue. Thanks

@culturedniichan culturedniichan deleted the preserve-spaces-when-replacing-token branch January 22, 2024 12:54