This repository has been archived by the owner on Sep 4, 2023. It is now read-only.

Still seeing a lot of regressions on pt-br #231

Closed
andrenatal opened this issue Apr 7, 2022 · 8 comments
Labels
bug Something isn't working

Comments

@andrenatal
Contributor

andrenatal commented Apr 7, 2022

It's clear that there's a regression here.

@eu9ene and I met to evaluate this, and it seems that Marian is being automatically updated in bergamot-translator, which is probably causing many of these newly introduced issues in pt-br and ru. We need to identify what's happening and probably use Mozilla's bergamot-translator fork to freeze versions, since we have no control over whether changes landing in browsermt's repo will break our project.

Notice in the screenshots below all the repeated commas, and the "(em inglês)" annotations (meaning "in English") appended after untranslated sentences, which appear all over. This is critical and needs to be investigated during next week's task force on bergamot-translator @abhi-agg

Screen Shot 2022-04-07 at 11 57 26 AM
Screen Shot 2022-04-07 at 12 08 44 PM

@andrenatal andrenatal added the bug Something isn't working label Apr 7, 2022
@eu9ene
Collaborator

eu9ene commented Apr 7, 2022

I agree it's critical to investigate this because if there are issues in the engine, they will be extremely hard to catch.

I see continuous version bumping of third-party components in bergamot-translator (https://github.com/browsermt/bergamot-translator/pulls?q=is%3Apr+is%3Aclosed), which makes it less stable. I would rather freeze components and update them all together periodically, followed by a QA pass. We're trying to limit this by using the Mozilla fork, but we can't avoid it when we need to land the latest improvements from bergamot-translator.

We should start by comparing translations of the same page using different versions of the extension and the bergamot-translator module, making sure that the en-pt model version is the same.
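A minimal sketch of such a comparison, assuming each build's translations of the page have been dumped to line-aligned text files (the helper name and labels are hypothetical, not part of the extension):

```python
import difflib

def diff_translations(old_lines, new_lines):
    """Return unified-diff lines between two line-aligned translation outputs."""
    return list(difflib.unified_diff(
        old_lines, new_lines,
        fromfile="engine_old", tofile="engine_new", lineterm=""))

# With the same en-pt model on both sides, a genuine engine regression
# shows up as a non-empty diff.
old = ["Olá, mundo.", "Tudo bem."]
new = ["Olá, mundo.", "Tudo bem , , ,"]
changes = diff_translations(old, new)
```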

@jerinphilip

Notice in the screenshots below all the repeated commas, and the (em inglês) annotations (meaning "in English") after untranslated sentences, which appear all over.

Neural MT systems do produce the repeated-comma artifact; the proper fix is harder. Is pt-br a production model? We may be able to work around this (by means of repetition detection in the decoding code), but the change is not going to be pleasant from a software-correctness perspective. (Side story: a colleague was recently trying to do natural-language-to-SQL translation. The software had repeated-trigram detection, a constraint that prevented it from generating parentheses nested more than 3 deep. The point being, such a fix for the comma could break something elsewhere.)
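As an illustration of the kind of repetition detection mentioned above (the function name and threshold are my own sketch, not bergamot-translator's decoder):

```python
from collections import Counter

def has_repeated_ngram(tokens, n=3, max_repeats=2):
    """Flag a token sequence in which some n-gram occurs more than
    max_repeats times, e.g. degenerate decoder output like ", , , , ,"."""
    grams = Counter(tuple(tokens[i:i + n])
                    for i in range(len(tokens) - n + 1))
    return any(count > max_repeats for count in grams.values())
```

A decoder could penalize or prune hypotheses that trip such a check, but, as the SQL anecdote shows, hard constraints like this can also block legitimate outputs.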

"Em Inglês" looks legible enough not to be counted as a bug from a software-development perspective. The fix is probably in the data or ML-model space (the issue arising from unfiltered named entities in the training data).

I believe we're missing Nightly updates as a source of regression from time to time: browsermt/marian-dev#81 is unaddressed, and there's not much we can do about it.

Marian is not automatically updated; commits land after review in browsermt/marian-dev. I'm the one pressing the merge button on dependabot's PRs at bergamot-translator. Sometimes these include improvements upstreamed by clients like translateLocally, which go through bergamot-translator, and it makes sense to have them in main. There are tests on small and large samples of translation data that address stability. Please feel free to add more: it's as simple as adding input files and expected outputs for more complex corner cases, similar to bergamot.html/input.txt.
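A sketch of that file-based pattern: run the translator over an input file and compare against an expected-output file line by line (the helper name and file layout are illustrative, not the actual bergamot-translator-tests harness):

```python
from pathlib import Path

def check_expected(translate, input_path, expected_path):
    """Translate each input line and return (line_no, actual, expected)
    tuples for every line that does not match the expected output."""
    inputs = Path(input_path).read_text().splitlines()
    expected = Path(expected_path).read_text().splitlines()
    actual = [translate(line) for line in inputs]
    return [(i, a, e)
            for i, (a, e) in enumerate(zip(actual, expected))
            if a != e]
```

Adding a corner case then amounts to dropping a new input/expected file pair into the test data.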

@eu9ene
Collaborator

eu9ene commented Apr 7, 2022

I'm close to finalizing the retraining of en->pt with all the latest improvements in the pipeline, but the problem here is not the model: it looks like quality regressed without a model update. We should investigate this. If true, it's a software problem, not a training problem. Ideally, software should produce deterministic results for the same model; if not exactly the same, then at least at the same level of quality.

@andrenatal
Contributor Author

andrenatal commented Apr 7, 2022

This is the first time since this started that I've seen these indicators of the language that couldn't be translated ("(Em Inglês)" in this case) show up, and adding arbitrary content to the page is not good, and not only from the UX perspective. The pt-br model in use has been the same for a long time, and I also noticed differences in the translation itself, for the worse, not to mention those sequences of commas.

So if we can't guarantee stability and compatibility between our project and the models we train against what's coming from upstream, we should use our own fork instead of browsermt's. Otherwise this will never reach a stable, deployable state. It's not practical to request a retrain of all models whenever the engine is updated, as @eu9ene pointed out, and since we receive neither support nor notifications from upstream about possible incompatibilities (and obviously can't count on that), we need a way to proceed ourselves without breaking our own software.

@abhi-agg and I will develop that further next week after the holidays.

@eu9ene
Collaborator

eu9ene commented Apr 7, 2022

@andrenatal I tried some older versions of the extension, and also https://github.com/mozilla/translate, which uses a fairly old version of the engine ("v0.3.1+c7b626d"), and I see the issues you pointed out there too. So I guess they were always there, but they are rare enough that you didn't see them before.
Screen Shot 2022-04-07 at 2 49 50 PM
Screen Shot 2022-04-07 at 2 49 44 PM

@jerinphilip

@eu9ene FWIW, there is a quality-regression test that checks that BLEU scores on translating 1M English sentences to German remain high enough; it has undergone very minimal changes (often due to batching and floating-point rounding variations [1]).

https://github.com/browsermt/bergamot-translator-tests/commits/main/speed-tests/intgemm_8bit.avx512_vnni.expected
https://github.com/browsermt/bergamot-translator-tests/blob/main/speed-tests/intgemm_8bit.avx512_vnni.expected

I will also point out that viewport-prioritized translation is subject to mouse/scroll movements, which create differences in batching, causing floating-point errors that give slightly different outputs in rare cases. As you noticed, these are rare, and there is value in a product that does maybe 80 or 90% valid translations rather than not having it while trying to cover the long tail. Translation is also a one-to-many task by nature, so some of the variations (which appear to be interpreted as instability in this thread) also happen to be acceptable.
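The batching effect is easy to reproduce in principle: floating-point addition is not associative, so summing the same values grouped into different batches can yield different totals. A minimal Python illustration (unrelated to the actual engine code):

```python
def sum_in_batches(values, batch_size):
    """Sum values batch by batch; the batch size changes how additions group."""
    total = 0.0
    for i in range(0, len(values), batch_size):
        total += sum(values[i:i + batch_size])
    return total

# Mixing very large and very small magnitudes makes rounding differences
# visible: different batch sizes can produce different totals.
vals = [1e16, 1.0, -1e16, 1.0] * 1000
```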

[1] https://docs.oracle.com/cd/E19957-01/806-3568/ncg_goldberg.html

@eu9ene
Collaborator

eu9ene commented Apr 7, 2022

We expect some variation, for sure; the question was whether there is a sudden, drastic regression in quality. It seems there isn't; quality just fluctuates for different texts. It's great that we have some CI for this.

@eu9ene
Collaborator

eu9ene commented Apr 7, 2022

Closing then. Let's reevaluate quality when a new en->pt model is ready.

@eu9ene eu9ene closed this as completed Apr 7, 2022
@andrenatal andrenatal changed the title Stil seeing a lot of regressions on pt-br Still seeing a lot of regressions on pt-br Apr 12, 2022