Release v0.9.1 · benbrandt/text-splitter

What's Changed

The Python TextSplitter and MarkdownSplitter both gain a new chunk_indices method. It returns a list of tuples, each pairing a chunk's character offset in the original text with the chunk itself. This makes it possible to run string comparison and matching operations between the chunks and the source text.

def chunk_indices(
    self, text: str, chunk_capacity: Union[int, Tuple[int, int]]
) -> List[Tuple[int, str]]:
    ...

A similar method already existed on the Rust side. The key difference is that the Python offsets are character offsets, not byte offsets. For Rust strings it is usually more helpful to have the byte offset, but in Python most string methods and operations deal with character indices.
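To illustrate the character versus byte distinction, here is a minimal sketch in plain Python. It does not call the library: the sample list is hand-written in the same (offset, chunk) shape that chunk_indices returns, and the helper names char_to_byte_offset and offsets_match are hypothetical, introduced only for this example.

```python
from typing import List, Tuple


def char_to_byte_offset(text: str, char_offset: int) -> int:
    """Convert a character offset into the UTF-8 byte offset of the same position."""
    return len(text[:char_offset].encode("utf-8"))


def offsets_match(text: str, indexed_chunks: List[Tuple[int, str]]) -> bool:
    """Check that slicing text at each character offset recovers the chunk."""
    return all(
        text[offset:offset + len(chunk)] == chunk
        for offset, chunk in indexed_chunks
    )


# Hand-written sample in the (offset, chunk) shape; NOT real splitter output.
text = "Cost: 5€. Tax: 1€."
sample = [(0, "Cost: 5€."), (10, "Tax: 1€.")]

# Character offsets can be used directly with Python slicing.
assert offsets_match(text, sample)

# The euro sign takes 3 bytes in UTF-8, so the byte offset of the second
# chunk is larger than its character offset.
second_char_offset = sample[1][0]
second_byte_offset = char_to_byte_offset(text, second_char_offset)
print(second_char_offset, second_byte_offset)
```

With ASCII-only text the two offsets coincide. They diverge as soon as a multi-byte character appears before the chunk, which is why a byte offset cannot be used directly to slice a Python str.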

by @benbrandt in #135

Full Changelog: v0.9.0...v0.9.1
