Add CustomTextSplitter for the Python package (#80)
* Add CustomTextSplitter for the Python package

Allows custom chunk sizing from the Python side. It isn't as
ergonomic as implementing the `ChunkSizer` trait on the Rust side, but it
does make the Python package more flexible.

* Use a common `v*` release tag for the Rust crate and Python package

* Update changelog
benbrandt authored Jan 13, 2024
1 parent 320ef56 commit 53cc041
Showing 10 changed files with 310 additions and 84 deletions.
17 changes: 5 additions & 12 deletions .github/workflows/python.yml
@@ -9,16 +9,9 @@ on:
   push:
     branches:
       - main
-      - master
     tags:
-      - "python-v*"
-    paths:
-      - "bindings/python/**"
-      - ".github/workflows/python.yml"
+      - "v*"
   pull_request:
-    paths:
-      - "bindings/python/**"
-      - ".github/workflows/python.yml"
   workflow_dispatch:
 
 concurrency:
@@ -57,7 +50,7 @@ jobs:
       - uses: actions/checkout@v4
       - uses: actions/setup-python@v5
         with:
-          python-version: "3.10"
+          python-version: "3.11"
       - uses: dtolnay/rust-toolchain@stable
       - uses: Swatinem/rust-cache@v2
         with:
@@ -94,7 +87,7 @@ jobs:
       - uses: actions/checkout@v4
       - uses: actions/setup-python@v5
         with:
-          python-version: "3.10"
+          python-version: "3.11"
           architecture: ${{ matrix.target }}
       - uses: dtolnay/rust-toolchain@stable
       - uses: Swatinem/rust-cache@v2
@@ -131,7 +124,7 @@ jobs:
       - uses: actions/checkout@v4
       - uses: actions/setup-python@v5
         with:
-          python-version: "3.10"
+          python-version: "3.11"
       - uses: dtolnay/rust-toolchain@stable
       - uses: Swatinem/rust-cache@v2
         with:
@@ -182,7 +175,7 @@ jobs:
   release:
     name: Release
     runs-on: ubuntu-latest
-    if: "startsWith(github.ref, 'refs/tags/python-v')"
+    if: "startsWith(github.ref, 'refs/tags/v')"
     needs: [lints, linux, windows, macos, sdist]
     steps:
       - uses: actions/download-artifact@v4
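The release job's `if:` condition changes its tag prefix from `refs/tags/python-v` to `refs/tags/v`. The sketch below models only that expression in Python to show which refs pass the old and new gates. The ref names are illustrative examples, and `github_starts_with` is a hypothetical helper that mimics GitHub's `startsWith()`, which GitHub documents as case-insensitive.

```python
# Hypothetical helper mimicking GitHub Actions' startsWith(), which GitHub
# documents as case-insensitive, so both sides are case-folded first.
def github_starts_with(search_string: str, search_value: str) -> bool:
    return search_string.casefold().startswith(search_value.casefold())


OLD_PREFIX = "refs/tags/python-v"  # release gate before this commit
NEW_PREFIX = "refs/tags/v"         # release gate after this commit

# Illustrative refs only; they are not taken from the repository.
EXAMPLE_REFS = ["refs/tags/v0.5.1", "refs/tags/python-v0.3.1", "refs/heads/main"]

if __name__ == "__main__":
    for ref in EXAMPLE_REFS:
        old = github_starts_with(ref, OLD_PREFIX)
        new = github_starts_with(ref, NEW_PREFIX)
        print(f"{ref}: old={old} new={new}")
    # refs/tags/v0.5.1: old=False new=True
    # refs/tags/python-v0.3.1: old=True new=False
    # refs/heads/main: old=False new=False
```

This models only the `if:` expression; the `on.push.tags` filter, which now lists `"v*"`, separately decides whether the workflow starts at all on a tag push.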
14 changes: 14 additions & 0 deletions CHANGELOG.md
@@ -1,5 +1,19 @@
# Changelog

## v0.5.1

### What's New

- Python bindings and Rust crate now have the same version number.

#### Rust

- Constructors for `ChunkSize` are now public, so you can more easily create your own `ChunkSize` structs for your own custom `ChunkSizer` implementation.

#### Python

- New `CustomTextSplitter` that accepts a custom callback with the signature of `(str) -> int`. Allows for custom chunk sizing on the Python side.

## v0.5.0

### What's New
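The `CustomTextSplitter` entry above describes sizing chunks with a Python callback of type `(str) -> int`. The following is a minimal pure-Python sketch of that idea only, assuming nothing about the package's internals: `split_with_callback` and `word_count` are hypothetical names, and the greedy word packing stands in for the library's semantic splitting, which it does not reproduce.

```python
from typing import Callable, List


def split_with_callback(text: str, size_fn: Callable[[str], int], capacity: int) -> List[str]:
    """Greedily pack whitespace-separated words into chunks whose size,
    as measured by the caller's size_fn, does not exceed capacity.

    A single word that is already too large becomes a chunk of its own.
    Runs of whitespace are collapsed to single spaces in the output.
    """
    chunks: List[str] = []
    current = ""
    for word in text.split():
        candidate = f"{current} {word}" if current else word
        if size_fn(candidate) <= capacity:
            # The callback says the extended chunk still fits.
            current = candidate
        else:
            # Close the current chunk and start a new one with this word.
            if current:
                chunks.append(current)
            current = word
    if current:
        chunks.append(current)
    return chunks


def word_count(chunk: str) -> int:
    """Example (str) -> int sizer: counts words instead of characters."""
    return len(chunk.split())


if __name__ == "__main__":
    text = "one two three four five"
    print(split_with_callback(text, len, 9))
    # ['one two', 'three', 'four five']
    print(split_with_callback(text, word_count, 2))
    # ['one two', 'three four', 'five']
```

Any callable that takes a string and returns an integer can serve as the sizer, which is the flexibility the changelog entry refers to; a tokenizer's token count would be a typical choice.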
2 changes: 1 addition & 1 deletion Cargo.toml
@@ -1,6 +1,6 @@
 [package]
 name = "text-splitter"
-version = "0.5.0"
+version = "0.5.1"
 authors = ["Ben Brandt <[email protected]>"]
 edition = "2021"
 description = "Split text into semantic chunks, up to a desired chunk size. Supports calculating length by characters and tokens (when used with large language models)."
4 changes: 4 additions & 0 deletions bindings/python/CHANGELOG.md
@@ -1,5 +1,9 @@
# Changelog

## v0.5.1 and beyond

See main CHANGELOG.md for the repo.

## v0.3.1

Fix broken Github release