Tokenize Regex as Parameter #268
What's the use case here?
The default regex is It's been a while since I worked on this, but it looks like I wanted to add
I second this functionality. I have a use case for diffing two HTML strings, and this would enable me to adjust the tokenizer to meet my needs.
* `options` : An object with options. Currently, only `context` is supported and describes how many lines of context should be included.
* `options` : An object with options.
* `context` : describes how many lines of context should be included.
* `tokenizer` : Overrides the default regex used to split text into words. supported by `diffWords` and `diffWordsWithSpace`
This is in the wrong place; you've documented it as a parameter of `createTwoFilesPatch` instead of `diffWords`.
Worth thinking about before I merge this - for Chinese and Japanese support, we might need tokenization logic too complicated to be encompassed in a regex, either built in to jsdiff or as something you can plug in yourself: #328 (comment). I want to carefully think through what I ultimately want the API to look like before merging this PR and make sure it's not gonna commit us to an API that's fundamentally incompatible with supporting Chinese and Japanese.
This PR adds support for a `tokenizer` parameter, which gives the user more control over what constitutes a "token". If the `tokenizer` parameter is not set, the default regex is used.
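For illustration only, here is a rough sketch of how a regex-based tokenizer with a fallback default might behave. It does not call jsdiff. The `tokenize` helper, `DEFAULT_TOKENIZER`, and `HTML_TOKENIZER` are all hypothetical names, and the regexes are assumed stand-ins rather than the library's actual default.

```javascript
// Hypothetical helper, not part of jsdiff: splits text into tokens with a regex.
// DEFAULT_TOKENIZER is an assumed stand-in, NOT the library's real default regex.
const DEFAULT_TOKENIZER = /(\s+)/;

function tokenize(text, tokenizer = DEFAULT_TOKENIZER) {
  // split() with a capturing group keeps the separators as tokens;
  // filter out the empty strings it can produce at the edges.
  return text.split(tokenizer).filter((token) => token !== '');
}

// Assumed custom tokenizer for HTML-like input: each tag becomes one token.
const HTML_TOKENIZER = /(<[^>]+>|\s+)/;

const words = tokenize('foo bar');
const html = tokenize('<b>hi there</b>', HTML_TOKENIZER);

console.log(words); // [ 'foo', ' ', 'bar' ]
console.log(html);  // [ '<b>', 'hi', ' ', 'there', '</b>' ]
```

With the assumed HTML regex, a tag such as `<b>` is compared as a single unit instead of being broken up at word boundaries, which is roughly the kind of control the HTML-diffing use case above asks for.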