
[Bug] Davlan/bert fails in demo #116

Closed
rg3h opened this issue May 18, 2023 · 5 comments
Labels
bug Something isn't working

Comments

rg3h commented May 18, 2023

Describe the bug

An uncaught promise error occurs on https://xenova.github.io/transformers.js/ when selecting Token classification w/ Davlan/bert-base-multilingual-cased-ner-hrl.

How to reproduce

1. Go to https://xenova.github.io/transformers.js/
2. Select the "Token classification w/ Davlan/bert-base-multilingual-cased-ner-hrl" option from the drop-down.
3. Click the "Generate" button.

Nothing appears to happen, and the developer console reports:
Uncaught (in promise) Error: token_ids must be a non-empty array of integers.
at Function.decode (worker-ed2ef37e.js:1790:46739)
at worker-ed2ef37e.js:1791:944
at Array.map ()
at token_classification (worker-ed2ef37e.js:1791:906)
at async worker-ed2ef37e.js:1790:127237
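From the message, `decode` appears to have received something other than a non-empty integer array (for example, an empty output from the model). A minimal, hypothetical guard that reproduces the check — not the library's actual code — could look like:

```javascript
// Hypothetical validation mirroring the error message above;
// the actual transformers.js implementation may differ.
function assertTokenIds(token_ids) {
  const ok = Array.isArray(token_ids)
    && token_ids.length > 0
    && token_ids.every(Number.isInteger);
  if (!ok) {
    throw new Error("token_ids must be a non-empty array of integers.");
  }
  return token_ids;
}
```

A check like this fails fast at the decode boundary instead of surfacing as an opaque rejected promise deep inside the worker.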

Expected behavior

I was hoping to see the resulting token IDs for the demo.

Logs/screenshots

Environment

  • Transformers.js version: not sure; whatever the demo page was running on 18 May 2023
  • Browser (if applicable): I was running Brave
  • Operating system (if applicable): Windows 10
  • Other:

Additional context

The translation demo does work, so other aspects of this awesome library are working.

@rg3h rg3h added the bug Something isn't working label May 18, 2023

xenova commented May 19, 2023

Good catch - thanks! I thought I tested all the demos after migrating the demo-site, but looks like I missed this one :) Fixing now.

@xenova xenova closed this as completed in 4a6b8cc May 19, 2023

xenova commented May 19, 2023

Demo back up and running - just need to fix the decoding for the numbers (should be 2016, not 2 0 1 6; unrelated to this issue)
[screenshot]


xenova commented May 19, 2023

Fixed the number tokenization (which will be updated in v2.0.1; release coming soon)
[screenshot]


rg3h commented May 19, 2023

Thanks xenova -- looking forward to it!


xenova commented May 20, 2023

DimQ1 added a commit to DimQ1/transformers.js that referenced this issue Jun 1, 2023
* [demo] Fix token-classification (Closes huggingface#116)

* Fix Bert tokenizer regex for numbers

* Update Bert pretokenizer regex

Should match the rust implementation: https://github.com/huggingface/tokenizers/blob/b4fcc9ce6e4ad5806e82826f816acfdfdc4fcc67/tokenizers/src/pre_tokenizers/bert.rs#L11

- Removes whitespace
- Splits on Unicode punctuation and certain ASCII characters

* Add tokenizer test cases with numbers

* Build demo website after release

* [package.json] Update keywords

* [version] Update to 2.0.1

* Freeze onnxruntime dependencies (huggingface#124)

Their latest version has a few issues, particularly with webgpu, and also uses .wasm files which are incompatible with their previous versions.

So, while those issues are sorted out, it's best to freeze their packages to the latest stable version.

* Use versioned links (Closes huggingface#114)

Prevents issues where users copy-paste the import code, and then a future update breaks it.

Also ensures that the default wasm files match the target version

* [version] Update to 2.0.2

* Update package-lock.json

* Update README.md

* Replace `Math.max` with custom `max` function

* Add `sentence-transformers` models to supported models/tasks

* Correctly use default module if present

* Use error mapping instead of switch block

* [docs] Fix numbering

---------

Co-authored-by: Joshua Lochner <[email protected]>
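The pre-tokenizer change described in the commit message above can be sketched in JavaScript. This is an illustrative approximation of BERT-style pre-tokenization — whitespace removed, each Unicode punctuation character split out on its own, digit runs like "2016" left intact — not the code actually shipped in the fix:

```javascript
// Illustrative BERT-style pre-tokenizer: strips whitespace and isolates
// each Unicode punctuation character as its own token, while keeping
// runs of digits (e.g. "2016") together. Not the actual library code.
function bertPreTokenize(text) {
  const pieces = [];
  for (const chunk of text.split(/\s+/).filter(Boolean)) {
    // The capturing group keeps the punctuation characters in the output.
    for (const part of chunk.split(/(\p{P})/u).filter(Boolean)) {
      pieces.push(part);
    }
  }
  return pieces;
}

console.log(bertPreTokenize("It's 2016."));
// → [ "It", "'", "s", "2016", "." ]
```

Note that, per the commit note, the real implementation also splits on certain ASCII characters that `\p{P}` alone does not cover; this sketch only handles the Unicode-punctuation case.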