Python splitters optionally provide chunk char offsets #135
It can be helpful to know where a given chunk falls within the entire text. On the Rust side, you can get each chunk along with its corresponding byte offset, but the Python package had no comparable method.
Because Rust byte offsets aren't useful in Python, each one is mapped to the character index at which the chunk begins. Python strings are indexed by character, so this offset can be used directly for slicing, comparison, and matching against the original text.
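To illustrate why the mapping matters, here is a minimal sketch in plain Python of converting a UTF-8 byte offset into a character index. The helper name `byte_to_char_offset` is hypothetical and not part of the package, and the conversion in this PR may well be implemented differently.

```python
def byte_to_char_offset(text: str, byte_offset: int) -> int:
    """Map a UTF-8 byte offset in `text` to a character (code point) index.

    Hypothetical helper for illustration only; not the package's API.
    """
    encoded = text.encode("utf-8")
    if not 0 <= byte_offset <= len(encoded):
        raise ValueError(f"byte offset {byte_offset} is out of range")
    # Decoding the prefix counts the characters that precede the offset.
    # This raises UnicodeDecodeError if the offset splits a multi-byte character.
    return len(encoded[:byte_offset].decode("utf-8"))


text = "héllo wörld"
# "é" takes two bytes in UTF-8, so the chunk "wörld" starts at byte 7
# but at character index 6.
char_offset = byte_to_char_offset(text, 7)
chunk = text[char_offset:]
```

With the character offset, `text[char_offset:char_offset + len(chunk)]` recovers the chunk, which would not hold for the raw byte offset whenever non-ASCII characters precede it.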
Closes #133