Step6_zero_phrase_filtering problem #18

Open
Tamali6 opened this issue Jan 29, 2021 · 2 comments


Tamali6 commented Jan 29, 2021

While training monoses, I got the following error in Step 7:

Traceback (most recent call last):
  File "/home/xyz/monoses/training/tuning/tune.py", line 335, in <module>
    main()
  File "/home/xyz/monoses/training/tuning/tune.py", line 322, in main
    extract_zmert_params(tmp + '/dcfg.txt.ZMERT.final'))
  File "/home/xyz/monoses/training/tuning/tune.py", line 73, in extract_zmert_params
    with open(path, encoding='utf-8', errors='surrogateescape') as f:
FileNotFoundError: [Errno 2] No such file or directory: '/tmp/tmpv1m8y_i1/dcfg.txt.ZMERT.final'
clean-corpus.perl: processing /home/xyz/models/monoses/src-tgt/tmpzbtcque6/train.bt & .trg to /home/xyz/models/monoses/src-tgt/tmpzbtcque6/train-supervised/clean, cutoff 3-80, ratio 9

From the log file and intermediate results, I found the following:

  • It successfully generated the phrase tables.
  • However, in Step 6 it filtered out 0% of the phrase pairs, which looks suspicious to me:

P(f|e) filter limit: 100
Filtering using P(e|f) only. n=100

..................................................[n:500000]
..................................................[n:1000000]
..................................................[n:1500000]
..................................................[n:2000000]
..................................................[n:2500000]
..................................................[n:3000000]
..................................................[n:3500000]
..................................................[n:4000000]
..................................................[n:4500000]
..................................................[n:5000000]
..................................................[n:5500000]
..................................................[n:6000000]
..................................................[n:6500000]
..................................................[n:7000000]
..................................................[n:7500000]
..................................................[n:8000000]
..................................................[n:8500000]
..................................................[n:9000000]
..................................................[n:9500000]
..................................................[n:10000000]

unfiltered phrases pairs: 10000000

 P(f|e) filter [first]: 0   (0%)
   significance filter: 0   (0%)
        TOTAL FILTERED: 0   (0%)

FILTERED phrase pairs: 10000000   (100%)

  • Then, in Step 7, while running the decoder, it printed:

Call to decoder returned 1; was expecting 0.
Z-MERT exiting prematurely (MertCore returned 30)...


kellymarchisio commented Sep 14, 2021

I can confirm that I have run into this issue many times. The temporary directory is deleted before extract_zmert_params is called, so dcfg.txt.ZMERT.final no longer exists when the script tries to open it. The failure is intermittent.
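
To illustrate the failure mode described above, here is a minimal sketch. It is not the monoses code: `read_final_config` and `demo` are hypothetical names, and the file contents are made up. It only shows that a file inside a `tempfile.TemporaryDirectory` disappears once the `with` block exits, and how an explicit existence check can give a clearer message than a bare FileNotFoundError.

```python
import os
import tempfile


def read_final_config(path):
    """Read a final config file, with a clearer error if it is missing.

    Hypothetical helper for illustration; not part of monoses.
    """
    if not os.path.exists(path):
        raise RuntimeError(
            f"{path} is missing: the tuner may have failed earlier, "
            "or its temporary directory was already removed"
        )
    with open(path, encoding="utf-8", errors="surrogateescape") as f:
        return f.read()


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        final = os.path.join(tmp, "dcfg.txt.ZMERT.final")
        with open(final, "w", encoding="utf-8") as f:
            f.write("lm 0.5\n")  # made-up content
        inside = read_final_config(final)  # works: the directory still exists

    # The directory and everything in it are removed when the with-block exits.
    try:
        read_final_config(final)
        readable_after_exit = True
    except RuntimeError:
        readable_after_exit = False
    return inside, readable_after_exit
```

Whatever reads the file has to run while the temporary directory still exists. In this thread the missing file may also simply mean that Z-MERT exited before writing it.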


kellymarchisio commented Sep 25, 2021

Ok, I finally figured out my related issue.

I was getting this message: Z-MERT exiting prematurely (MertCore returned 1)...

This was caused by moses2 segfaulting under the hood: one of the lines in the dev file I was passing to it was too long. I truncated each line in the dev set to 200 characters, and the segfault went away. If you're doing unsupervised tuning, I recommend truncating the dev file you pass to moses2.
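
A minimal sketch of that truncation step, assuming a plain-text dev file with one sentence per line. The function name `truncate_lines` and the file names in the usage line are hypothetical; the 200-character limit is the one mentioned above. Cutting by characters can split the last token of a long line, which is usually acceptable for a tuning set.

```python
def truncate_lines(src_path, dst_path, max_chars=200):
    """Copy src_path to dst_path, cutting each line to at most max_chars characters.

    Returns the number of lines that were shortened.
    """
    truncated = 0
    with open(src_path, encoding="utf-8", errors="surrogateescape") as src, \
         open(dst_path, "w", encoding="utf-8", errors="surrogateescape") as dst:
        for line in src:
            text = line.rstrip("\n")
            if len(text) > max_chars:
                # Drop the overflow and any trailing space left at the cut point.
                text = text[:max_chars].rstrip()
                truncated += 1
            dst.write(text + "\n")
    return truncated
```

Usage would look like `truncate_lines("dev.txt", "dev.trunc.txt")`, after which the truncated file is the one passed to tuning.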

Note: This also happened when I accidentally passed two files of different lengths to --supervised-tuning.
Note 2: Failing to use the Moses tokenizer or the escape-special-chars.perl script can also cause moses2 to segfault within Z-MERT (https://github.com/moses-smt/mosesdecoder/tree/master/scripts/tokenizer)
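
For the first note, a quick sanity check before tuning can catch mismatched files. This is a sketch under the assumption that "different lengths" means different numbers of lines. `count_lines` and `check_parallel` are hypothetical helper names, not part of monoses or Moses.

```python
def count_lines(path):
    """Count lines by reading in binary mode, so odd encodings cannot break the count."""
    with open(path, "rb") as f:
        return sum(1 for _ in f)


def check_parallel(src_path, trg_path):
    """Raise ValueError if the two files do not have the same number of lines."""
    n_src = count_lines(src_path)
    n_trg = count_lines(trg_path)
    if n_src != n_trg:
        raise ValueError(
            f"line count mismatch: {src_path} has {n_src} lines, "
            f"{trg_path} has {n_trg}"
        )
    return n_src
```

Running `check_parallel` on the two dev files before launching tuning turns a silent decoder crash into an explicit error.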
