Hi, thanks for writing this library, it's really useful!
I'm seeing a crash with certain emoji input on the latest version installed from PyPI. Here's a test case:
```python
from lingua import Language, LanguageDetectorBuilder
langdetector = LanguageDetectorBuilder.from_all_languages().build()
langdetector.detect_multiple_languages_of('test 🙈')
```
```
thread '<unnamed>' panicked at 'byte index 6 is not a char boundary; it is inside '🙈' (bytes 5..9) of `test 🙈`', src/lib.rs:436:27
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
Traceback (most recent call last):
  File "[...]/crash_repro.py", line 4, in <module>
    langdetector.detect_multiple_languages_of('test 🙈')
pyo3_runtime.PanicException: byte index 6 is not a char boundary; it is inside '🙈' (bytes 5..9) of `test 🙈`
```
> Hi, thanks for writing this library, it's really useful!
Nice of you to say that, thank you. :) That motivates me to keep maintaining and improving the library.
The cause of your exception is that whenever `detect_multiple_languages_of()` returns exactly one `DetectionResult`, the end index is erroneously calculated as a character offset on the Rust side. It should be a byte offset instead, which then gets converted to a character offset for the Python bindings. I'm going to release version 2.0.2 shortly, which will fix it.
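The mismatch between the two kinds of offset can be reproduced in plain Python. This is a minimal sketch, not lingua's actual code; `char_to_byte_offset` and `byte_to_char_offset` are hypothetical helper names. It assumes the emoji is the single code point U+1F648, which matches the `bytes 5..9` range in the panic message: `'test 🙈'` is 6 characters but 9 UTF-8 bytes, so an end index of 6 used as a byte index falls inside the emoji.

```python
# Hypothetical helpers (not part of lingua) showing how code point indices
# and UTF-8 byte offsets diverge for text containing emoji.

def char_to_byte_offset(text: str, char_offset: int) -> int:
    """Return the UTF-8 byte offset corresponding to a code point index."""
    return len(text[:char_offset].encode("utf-8"))


def byte_to_char_offset(text: str, byte_offset: int) -> int:
    """Return the code point index corresponding to a UTF-8 byte offset.

    Raises ValueError when byte_offset falls inside a multi-byte character,
    the same condition Rust reports as "not a char boundary".
    """
    encoded = text.encode("utf-8")
    if not 0 <= byte_offset <= len(encoded):
        raise ValueError(f"byte offset {byte_offset} is out of range")
    try:
        # A prefix cut in the middle of a character fails strict decoding.
        return len(encoded[:byte_offset].decode("utf-8"))
    except UnicodeDecodeError:
        raise ValueError(
            f"byte index {byte_offset} is not a char boundary"
        ) from None


if __name__ == "__main__":
    text = "test 🙈"
    print(len(text))                     # 6 code points
    print(len(text.encode("utf-8")))     # 9 bytes: "test " is 5, the emoji is 4
    print(char_to_byte_offset(text, 6))  # 9, the correct byte end index
    print(byte_to_char_offset(text, 9))  # 6, back to a Python index
    try:
        # Treating the character count 6 as a byte index lands inside the emoji.
        byte_to_char_offset(text, 6)
    except ValueError as exc:
        print(exc)
```

Rust string slicing requires byte offsets that sit on character boundaries, whereas Python `str` indices count code points, so any offset crossing the binding boundary needs a conversion like this in one direction or the other.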