Still seeing a lot of regressions on pt-br #231
I agree it's critical to investigate this, because if there are issues in the engine they will be extremely hard to catch. I see continuous version bumping of third-party components in bergamot-translator (https://github.com/browsermt/bergamot-translator/pulls?q=is%3Apr+is%3Aclosed), which makes it less stable. I would rather suggest freezing the components and updating them all together periodically, followed by a QA pass. We're trying to limit this by using the Mozilla fork, but we can't avoid it if we need to land the latest improvements from bergamot-translator. We should start by comparing translations of the same page using different versions of the extension and of the bergamot-translator module, making sure the en-pt model version stays the same.
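The comparison proposed above can be largely automated. A minimal sketch, assuming one sentence per line in each captured output (the function name and inputs are illustrative, not part of any existing tooling):

```python
# Hypothetical sketch: diff the translations of the same page produced by two
# engine/extension versions, one sentence per line, to spot regressions.
from difflib import unified_diff

def compare_translations(old_lines, new_lines, label_old="old", label_new="new"):
    """Return the unified diff between two translation outputs as a list of lines."""
    return list(unified_diff(old_lines, new_lines,
                             fromfile=label_old, tofile=label_new, lineterm=""))

# Example: the second sentence picked up a spurious comma in the new version.
diff = compare_translations(["Olá mundo", "Tudo bem"],
                            ["Olá mundo", "Tudo, bem"])
```

A non-empty diff flags exactly which sentences changed between versions, which is cheaper to review than rereading whole pages.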
Neural MT systems do produce the repeated comma; the proper fix is harder.
I believe Nightly updates are a source of regression we miss from time to time: browsermt/marian-dev#81 is unaddressed, and there's not much we can do about it. Marian is not automatically updated; commits get in after review in browsermt/marian-dev. I'm the one pressing the merge buttons on dependabot's PRs at bergamot-translator. Sometimes these include improvements upstreamed by clients like translateLocally, which go through bergamot-translator and make sense to have in main. There are tests on small and large samples of data for translation which address stability. Please feel free to add more: it's as simple as adding files and expected outputs for more complex corner cases, similar to bergamot.html/input.txt.
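A file-based test like the one described could be checked with very little code. This is a sketch under assumptions: the directory layout, file names, and the `translate` callable are illustrative stand-ins, not the actual bergamot-translator-tests structure.

```python
# Hypothetical sketch of a file-based corner-case test: each case directory
# holds an input file and its expected translation, compared line by line.
import tempfile
from pathlib import Path

def check_case(case_dir: Path, translate) -> bool:
    """Translate input.txt line by line and compare against expected.txt."""
    source = (case_dir / "input.txt").read_text().splitlines()
    expected = (case_dir / "expected.txt").read_text().splitlines()
    return [translate(line) for line in source] == expected

# Tiny demo with an uppercasing "translator" standing in for the real engine:
demo = Path(tempfile.mkdtemp())
(demo / "input.txt").write_text("hello\nworld\n")
(demo / "expected.txt").write_text("HELLO\nWORLD\n")
ok = check_case(demo, str.upper)
```

The appeal of this shape is that contributors only add data files, never test code, to cover a new corner case.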
I'm close to finalizing the retraining of en->pt with all the latest improvements in the pipeline, but the problem here is not the model: it looks like quality regressed without a model update. We should investigate this. If it's true, it is a software problem, not a training problem. Ideally, software should produce deterministic results for the same model; if not exactly the same, then at least at the same level of quality.
It's the first time since this started that I see these indicators of language that couldn't be translated. So if we can't guarantee stability and compatibility between our project, the models we train, and what's coming from upstream, we should use our own fork instead of browsermt's. Otherwise this will never reach a stable and deployable state. It's not practical to request a retrain of all models whenever the engine is updated, as @eu9ene pointed out, and since we receive neither support nor notifications from upstream about possible incompatibilities (and obviously can't count on that), we should have a way to proceed ourselves without breaking our own software. @abhi-agg and I will develop that further next week, after the holidays.
@andrenatal I tried some older versions of the extension, and also https://github.com/mozilla/translate, which uses a pretty old version of the engine ("v0.3.1+c7b626d"), and I see the issues you pointed out there too. So I guess they were always there, but they are rare enough that you didn't notice them before.
@eu9ene FWIW, there is a quality regression test that checks whether BLEU scores for translating 1M English sentences to German remain high enough; its expected outputs have undergone very minimal changes, usually due to batching and floating-point rounding variations[1]: https://github.com/browsermt/bergamot-translator-tests/commits/main/speed-tests/intgemm_8bit.avx512_vnni.expected

I will also point out that viewport-prioritized translation is sensitive to mouse/scroll movements, which change batching; floating-point rounding then gives slightly different outputs in some rare cases. As you note, these are rare, and there is more value in a product that produces maybe 80 or 90% valid translations than in having no product while trying to cover the long tail. The translation task also has one-to-many correspondences by nature, so some of the variations (which appear to be interpreted as instability in this thread) also happen to be acceptable.

[1] https://docs.oracle.com/cd/E19957-01/806-3568/ncg_goldberg.html
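The batching point above comes down to a basic property of floating-point arithmetic. A minimal, engine-independent illustration:

```python
# Floating-point addition is not associative, so summing the same values in a
# different order (as different batchings effectively do) can change the result.
vals = [1e16, 1.0, -1e16]
one_order = (vals[0] + vals[1]) + vals[2]    # the 1.0 is lost to rounding
other_order = (vals[0] + vals[2]) + vals[1]  # cancellation happens first
# one_order == 0.0, other_order == 1.0
```

In a neural decoder, tiny differences like this can occasionally flip a token choice, which is why identical input can produce slightly different translations across runs.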
We expect some variation for sure; the question was whether there is some sudden, drastic regression in quality or not. It seems there isn't, it just fluctuates for different texts. It's great that we have some CI for this.
Closing then. Let's reevaluate quality when a new en->pt model is ready. |
It's clear that there's a regression here.
@eu9ene and I met to evaluate that, and it seems that marian is being automatically updated in bergamot-translator, which is probably causing a lot of these newly introduced issues in pt-br and ru. We need to identify what's happening and probably use Mozilla's bergamot-translator repo to freeze versions, since we have no control over whether what lands on browsermt's can break our project.
Notice in the screenshot below all these repeated `,` and the `(em inglês)` annotations (which mean "in English") after non-translated sentences, which are all over the page. This is critical and needs to be investigated during next week's task force on bergamot-translator. @abhi-agg