Process Killed #46

Closed
bayesrule opened this issue Apr 2, 2022 · 2 comments

bayesrule commented Apr 2, 2022

Hi,

  1. I'm running the following step and have tried it twice. Both attempts ended with the process being "killed" (the second attempt already had the downloaded files, so no download was needed). Any suspected reason? I have 32 GB of RAM; is that not enough memory?
  2. Why does https://opus.nlpl.eu/ParaCrawl.php show v9 in the title when I can't get the v9 version? The latest version I can actually download is v8.

Thanks!

common:
  output_directory: CCMatrix_de-en

steps:
  - type: opus_read
    parameters:
      corpus_name: CCMatrix
      source_language: de
      target_language: en
      preprocessing: raw
      src_output: sents.de.gz
      tgt_output: sents.en.gz

svirpioj (Member) commented Apr 5, 2022

  1. This is possibly the same issue as "opus_read fails to extract CCMatrix" (OpusTools#32). I tried to run the step, and it does indeed use a lot of memory. (I killed the process at 15 GB, before it started swapping.)
  2. I cannot replicate this; downloading ParaCrawl v9 works fine for me with both OpusFilter and OpusTools.
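
For anyone wanting to confirm that memory is the cause, here is a minimal sketch (not from the thread) that runs a command and reports its exit code and the peak resident memory of the child process. It assumes a Unix-like system, since the `resource` module is not available on Windows. The helper name `run_and_report` is made up for illustration, and the command shown is a stand-in for the real OpusFilter invocation.

```python
import resource
import subprocess
import sys


def run_and_report(cmd):
    """Run cmd and return (exit code, peak RSS in MiB of waited-for children)."""
    proc = subprocess.run(cmd)
    usage = resource.getrusage(resource.RUSAGE_CHILDREN)
    # ru_maxrss is reported in kilobytes on Linux but in bytes on macOS.
    divisor = 1024 * 1024 if sys.platform == "darwin" else 1024
    return proc.returncode, usage.ru_maxrss / divisor


if __name__ == "__main__":
    # Stand-in command so the sketch runs anywhere; substitute the real
    # OpusFilter invocation for your setup.
    code, peak_mib = run_and_report([sys.executable, "-c", "pass"])
    print(f"exit code: {code}, peak child RSS: {peak_mib:.0f} MiB")
    if code == -9:
        # subprocess reports death by signal N as a return code of -N.
        print("terminated by SIGKILL, which is what the Linux OOM killer sends")
```

A return code of -9 together with a peak close to the machine's RAM would be consistent with the kernel killing the process for running out of memory, though it does not prove it; the kernel log is the authoritative place to check.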

bayesrule (Author) commented

@svirpioj many thanks for answering! For 1, it looks like there is no other choice, so I've switched to ParaCrawl. For 2, v9 can be downloaded now, thanks!
