From ab252d8a2b646d4c40e0f6874e7ac7cde574e9f4 Mon Sep 17 00:00:00 2001
From: Ben Brandt
Date: Sat, 23 Mar 2024 15:13:52 +0100
Subject: [PATCH] Add Markdown example to the Python readme

---
 bindings/python/README.md | 13 +++++++++++++
 1 file changed, 13 insertions(+)

diff --git a/bindings/python/README.md b/bindings/python/README.md
index 46ac0bfc..1699cd2d 100644
--- a/bindings/python/README.md
+++ b/bindings/python/README.md
@@ -79,6 +79,19 @@ splitter = TextSplitter.from_callback(lambda text: len(text))
 chunks = splitter.chunks("your document text", chunk_capacity=(200,1000))
 ```
 
+### Markdown
+
+All of the above examples can also work with Markdown text. You can use the `MarkdownSplitter` in the same way as the `TextSplitter`.
+
+```python
+from text_splitter import MarkdownSplitter
+# The default implementation uses character count for chunk size.
+# It can also use all of the same tokenizer implementations as `TextSplitter`.
+splitter = MarkdownSplitter()
+
+splitter.chunks("# Header\n\nyour document text", 1000)
+```
+
 ## Method
 
 To preserve as much semantic meaning within a chunk as possible, each chunk is composed of the largest semantic units that can fit in the next given chunk. For each splitter type, there is a defined set of semantic levels. Here is an example of the steps used: