Bad Encoding Output for Chinese Pretrained Model using JavaScript Bindings #3572

Open
hunterwebapps opened this issue Mar 24, 2021 · 7 comments

Comments

@hunterwebapps

Both versions are installed from the npm package.
TensorFlow: v2.3.0-6-g23ad988
DeepSpeech: v0.9.3-0-gf2e9c85

Tried in both these environments:

  • Windows 10.0.19042 / Python 3.9.0
  • WSL2 Debian 10 / Python 3.7.3

I'll include my code here. I am going to work around this by not using the npm bindings; I haven't tried any other bindings. I just wanted to report it here so that the team is aware. When I followed the Getting Started instructions on https://deepspeech.readthedocs.io/, it does in fact work on the command line using pip install, just not with the deepspeech npm package. The pretrained English model/scorer works perfectly; the problem occurs only with the pretrained Chinese model provided on the release page.

Thank you!

const DeepSpeech = require('deepspeech');
const MemoryStream = require('memory-stream');
const { readFileSync, writeFileSync } = require('fs');
const { Duplex } = require('stream');

const chineseModelName = 'deepspeech-0.9.3-models-zh-CN.pbmm';
const chineseScorerName = 'deepspeech-0.9.3-models-zh-CN.scorer';

const model = new DeepSpeech.Model(`models/${chineseModelName}`);

model.enableExternalScorer(`models/${chineseScorerName}`);

const buffer = readFileSync('test-data/chinese.wav');
const audioStream = new MemoryStream();

const stream = new Duplex();
stream.push(buffer);
stream.push(null);
stream.pipe(audioStream);

audioStream.on('finish', () => {
  const audioBuffer = audioStream.toBuffer();

  // Second argument: maximum number of candidate transcripts to return.
  const metadata = model.sttWithMetadata(audioBuffer, 10);

  const output = metadata.transcripts
    .map((transcript) => {
      return transcript.tokens
        .map((token) => token.text)
        .join('')
    })
    .join('\n');

  writeFileSync('test-data/chinese.txt', output);
  console.log(output);

  DeepSpeech.FreeMetadata(metadata);
  DeepSpeech.FreeModel(model);
  process.exit(0);
});
@lissyx
Collaborator

lissyx commented Mar 24, 2021

Thanks, but you should be more explicit about the expected and actual output, or link your Discourse thread ...

@hunterwebapps
Author

hunterwebapps commented Mar 24, 2021

Here's the Discourse thread.
https://discourse.mozilla.org/t/pretrained-chinese-model-invalid-inference-output/77439/5

It seems to be specific to the npm package. It outputs badly encoded text rather than the expected, validly encoded text: ����� instead of 我会说中文。 But when I used the Python command-line tool, the output was as expected.
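The thread does not establish the cause. As a way to narrow it down, the sketch below shows one mechanism that produces exactly this symptom: decoding a UTF-8 string one byte at a time instead of as a whole. Whether the npm bindings actually do this is an assumption, not something confirmed here; the helper names (`countReplacementChars`, `decodeBytewise`, `decodeWhole`) are hypothetical and not part of the deepspeech package.

```javascript
// Hypothetical diagnostic helpers; none of these come from the deepspeech package.
const REPLACEMENT_CHAR = '\uFFFD'; // rendered as "�"

// Count how many U+FFFD replacement characters a string contains.
function countReplacementChars(text) {
  let count = 0;
  for (const ch of text) {
    if (ch === REPLACEMENT_CHAR) count++;
  }
  return count;
}

// Decode every byte on its own, as would happen if each fragment of a
// multi-byte character were converted to a string separately (an assumption).
function decodeBytewise(bytes) {
  return Array.from(bytes, (b) => Buffer.from([b]).toString('utf8')).join('');
}

// Decode the full byte sequence in one go.
function decodeWhole(bytes) {
  return Buffer.from(bytes).toString('utf8');
}

const expected = '我会说中文';
const bytes = Buffer.from(expected, 'utf8');

const whole = decodeWhole(bytes);
const bytewise = decodeBytewise(bytes);

console.log(whole === expected);              // whole-buffer decoding round-trips
console.log(countReplacementChars(bytewise)); // per-byte decoding yields only U+FFFD
```

Running `countReplacementChars` on the text written to `test-data/chinese.txt` would show whether the output is made up entirely of replacement characters, which is the kind of detail that makes the expected-versus-actual comparison explicit.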

I can work without the JavaScript bindings, but I just wanted to report it here so the team is aware, and so anybody else who is looking (like I was) can at least find some info. Thanks!

@lissyx
Collaborator

lissyx commented Mar 24, 2021

Right, so if you can give the link above a try, it might help us: these are the current master bindings, and they are built with a newer SWIG version, which has (properly) fixed the NodeJS incompatibilities that we had patched on our SWIG fork.

So hopefully the issue was in our patches. If that's the case, this newer npm package would fix it.

@hunterwebapps
Author

hunterwebapps commented Mar 24, 2021

Ah! I misunderstood. I just ran the updated version and am actually getting no output. It's telling me that my audio files are 0 seconds in length. I am still using the 0.9.3 models, and I tried both English (which was working with the corresponding release version) and Chinese. Neither worked (both saying audio files are 0 sec long). I tried to look for updated models for the new alpha versions, but there don't appear to be any available. I just tried modifying the URL below. No surprise there, but I figured I'd try.

https://github.com/mozilla/DeepSpeech/releases/download/v0.10.0-alpha.3/deepspeech-0.10.0-alpha.3-models-zh-CN.pbmm
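To check the "0 seconds" report independently of the bindings, the duration can be estimated from the size of the audio buffer. This is a minimal sketch that assumes raw 16-bit mono PCM at 16 kHz and ignores any WAV header bytes; `pcmDurationSeconds` is a hypothetical helper, and the sample rate here is an assumption rather than a value taken from this thread.

```javascript
// Hypothetical helper: estimate the duration of raw PCM audio from its byte length.
// Assumes 16-bit (2-byte) samples and a single channel unless told otherwise.
function pcmDurationSeconds(pcmBuffer, sampleRate, bytesPerSample = 2, channels = 1) {
  return pcmBuffer.length / (bytesPerSample * channels * sampleRate);
}

const ASSUMED_SAMPLE_RATE = 16000; // assumption: 16 kHz input audio

// One second of silence: 16000 samples * 2 bytes each.
const oneSecond = Buffer.alloc(ASSUMED_SAMPLE_RATE * 2);
console.log(pcmDurationSeconds(oneSecond, ASSUMED_SAMPLE_RATE));

// An empty buffer has zero duration, which would point at the audio never
// reaching the model rather than at the model itself.
console.log(pcmDurationSeconds(Buffer.alloc(0), ASSUMED_SAMPLE_RATE));
```

If the buffer passed to the model has a non-zero length but the reported duration is still 0, the problem is more likely in how the audio is handed over than in the file on disk.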

@lissyx
Collaborator

lissyx commented Mar 25, 2021

Neither worked (both saying audio files are 0 sec long).

Those NPM packages were green on CI, so I'd suspect something weird on your side, but I can't tell for sure.

@lissyx
Collaborator

lissyx commented Mar 31, 2021

@hunterwebapps As you can see in #3317 and on https://github.com/mozilla/DeepSpeech/projects/13, we are in the process of moving to GitHub Actions. The current status is that we have a mostly end-to-end pipeline on macOS, but it does not cover the Mandarin work; if you are interested, adding test coverage there would be welcome.

Getting feedback from people on the new GitHub Actions flow is also super important to us, so it would be a perfect case.
