
Releases: google/budoux

v0.6.3

22 Oct 07:09
27fd3bb

What's Changed

New Contributors

Full Changelog: v0.6.2...v0.6.3

v0.6.2

12 Jan 01:36
4b0f8c5

Thai is now supported! 🎉

What's Changed

New Contributors

Full Changelog: v0.6.1...v0.6.2

v0.6.1

17 Nov 06:34
d02254f

What's Changed

  • Bump @typescript-eslint/eslint-plugin from 6.9.1 to 6.10.0 in /javascript by @dependabot in #353
  • Bump org.apache.maven.plugins:maven-surefire-plugin from 3.2.1 to 3.2.2 in /java by @dependabot in #354
  • Bump actions/dependency-review-action from 3.1.1 to 3.1.2 by @dependabot in #357
  • Bump @types/node from 20.8.3 to 20.9.0 in /javascript by @dependabot in #356
  • Support weighted samples by @tushuhei in #358
  • Fix unpaired close tags and self-closing tags by @kojiishi in #360
  • [Java] Stop emitting close tags if self-closing by @kojiishi in #362
  • Update Google Java Format action by @tushuhei in #363
  • Bump actions/dependency-review-action from 3.1.2 to 3.1.3 by @dependabot in #364
  • Bump @typescript-eslint/eslint-plugin from 6.10.0 to 6.11.0 in /javascript by @dependabot in #365
  • [java] Fix errors by collapsed white spaces and <br> by @kojiishi in #367
  • Bump github/codeql-action from 2.22.5 to 2.22.6 by @dependabot in #368
  • [java] Replace wholeText() with NodeVisitor by @kojiishi in #369
  • Implement tail for node visitor by @tushuhei in #370
  • Update jsoup to 1.16.2 by @tushuhei in #371
  • Version up to 0.6.1 by @tushuhei in #372

Full Changelog: v0.6.0...v0.6.1

v0.6.0

06 Nov 22:02
93ac23c

Noteworthy changes

  • BudouX Web Components no longer use Shadow DOM. The segmentation results are now reflected in their Light DOM, where global styles can apply to them. #291
  • Phrases are now separated by ZWSP (U+200B) instead of <wbr> for a better screen reader experience. #346
  • You can insert non-breaking markup (<nobr> and white-space: nowrap) around a phrase you don't want to break. #240
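To make the ZWSP change concrete, here is a minimal string-level sketch in Python. It assumes a hypothetical, already-segmented phrase list (real phrases would come from a BudouX parser) and only contrasts the two separators; it does not reproduce how the Web Components rewrite the DOM. The helper names join_with_zwsp and join_with_wbr are illustrative, not part of the BudouX API.

```python
ZWSP = "\u200b"  # U+200B ZERO WIDTH SPACE


def join_with_zwsp(phrases: list[str]) -> str:
    """v0.6.0-style output: phrases separated by an invisible ZWSP character."""
    return ZWSP.join(phrases)


def join_with_wbr(phrases: list[str]) -> str:
    """Pre-v0.6.0-style output: phrases separated by a <wbr> element."""
    return "<wbr>".join(phrases)


# Hypothetical, already-segmented phrases (not actual parser output).
phrases = ["今日は", "天気です。"]
print(join_with_wbr(phrases))         # 今日は<wbr>天気です。
print(repr(join_with_zwsp(phrases)))  # '今日は\u200b天気です。'
```

With ZWSP the break opportunity is a character inside the text node itself rather than a separate element between text nodes.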

What's Changed

Full Changelog: v0.5.2...v0.6.0

v0.5.2

03 Jul 05:46
66f13b6

What's Changed

New Contributors

Full Changelog: v0.5.1...v0.5.2

v0.5.1

20 Apr 02:56
50178f6

What's Changed

Full Changelog: v0.5.0...v0.5.1

v0.5.0

01 Mar 03:05
638b82b

Highlights

  • No major changes if you use the default parsers.
  • If you use a custom model, you need to update it. See the "Updating Models" section.
  • The defineClassAs method in javascript/src/html_processor.ts has been removed.

Updating Models

As described in #112, the model file structure has been updated to improve performance and reduce file size. The change is simple: it adds one level of nesting by grouping entries under their feature, as the following example shows.

Before:

{"UW1:a": 123, "UW3:b": 271}

After:

{"UW1": {"a": 123}, "UW3": {"b": 271}}

You can update your custom model to the latest format by running scripts/translate_model.py.

$ python translate_model.py --format=json old-model.json > new-model.json
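The regrouping itself is mechanical. The following Python sketch reproduces the before/after shapes above; nest_model is a hypothetical helper, not the code in scripts/translate_model.py, and it assumes feature names (UW1, UW3, ...) never contain a colon, so each key is split at its first ":".

```python
import json


def nest_model(flat_model: dict[str, int]) -> dict[str, dict[str, int]]:
    """Group flat "FEATURE:value" keys into {"FEATURE": {"value": score}}.

    Hypothetical illustration of the v0.5.0 format change; use
    scripts/translate_model.py for real model conversion.
    """
    nested: dict[str, dict[str, int]] = {}
    for key, score in flat_model.items():
        # Split on the first colon only, so a value that itself
        # contains ":" stays intact.
        feature, value = key.split(":", 1)
        nested.setdefault(feature, {})[value] = score
    return nested


old = {"UW1:a": 123, "UW3:b": 271}
print(json.dumps(nest_model(old)))  # {"UW1": {"a": 123}, "UW3": {"b": 271}}
```

With the nested layout, a score lookup becomes two dictionary accesses (feature first, then value) instead of building a combined "FEATURE:value" string key.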

What's Changed

Full Changelog: v0.4.1...v0.5.0

v0.4.1

12 Jan 23:02
c1d8199

⚠️ Breaking Change ⚠️

We made a significant change to the model training script scripts/train.py.

  • The --chunk-size option has been removed because the overhaul shifted the memory consumption bottleneck elsewhere.
  • The script no longer shuffles the input data. If needed, shuffle the data yourself with a tool such as shuf.
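If shuf is not available, a line-level shuffle is easy to do in Python. This is a minimal sketch, assuming the training data is a UTF-8 text file with one sample per line; shuffle_text_lines and shuffle_file are hypothetical helpers, not part of scripts/train.py. Like shuf, it reads the whole file into memory.

```python
import random
from typing import Optional


def shuffle_text_lines(text: str, seed: Optional[int] = None) -> str:
    """Return the lines of `text` in random order, one per line.

    A fixed `seed` makes the shuffle reproducible across runs.
    """
    lines = text.splitlines()
    random.Random(seed).shuffle(lines)
    # Always end with a newline so a missing final newline in the input
    # cannot glue two samples together.
    return "\n".join(lines) + "\n" if lines else ""


def shuffle_file(src_path: str, dst_path: str, seed: Optional[int] = None) -> None:
    """Roughly equivalent to `shuf src > dst`."""
    with open(src_path, encoding="utf-8") as src:
        text = src.read()
    with open(dst_path, "w", encoding="utf-8") as dst:
        dst.write(shuffle_text_lines(text, seed))
```

For example, shuffle_file("source.txt", "shuffled.txt", seed=0) writes a shuffled copy (file names are placeholders).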

What's Changed

Full Changelog: v0.4.0...v0.4.1

v0.4.0

14 Dec 03:52

What's Changed

Full Changelog: v0.3.0...v0.4.0

v0.3.0

05 Dec 01:29
b22226f

What's Changed

Faster model training

We made model training faster by applying JAX's JIT compilation, pooling file writes, and other optimizations.

  • Faster training data encoding by @tushuhei in #89
  • Add out_span option for better GPU utilization by @tushuhei in #90
  • Apply JAX JIT compiling for faster training by @tushuhei in #95
  • Check in updated Simplified Chinese model by @tushuhei in #99

Smaller models

We made models smaller by removing less important features, disabling ASCII encoding, and other changes.

Misc

Full Changelog: v0.2.1...v0.3.0