Opusfilter fails to compress data when it is downloaded via moses #75

Closed
thfrkielikone opened this issue Jul 9, 2024 · 3 comments

Comments

@thfrkielikone

Running this:

steps:
  - type: opus_read
    parameters:
      corpus_name: OpenSubtitles
      source_language: fi
      target_language: en
      release: v2018
      preprocessing: moses
      src_output: opensubtitles.fi.gz
      tgt_output: opensubtitles.en.gz
      suppress_prompts: true

This produces files named opensubtitles.fi.gz and opensubtitles.en.gz that are in fact uncompressed plain text.
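For anyone wanting to confirm the symptom, here is a minimal sketch (not part of OpusFilter; the helper names `is_gzip` and `compress_in_place` are made up for illustration). It checks whether a file starts with the two-byte gzip magic number, and, as a possible stopgap, compresses a plain-text file under the same name.

```python
import gzip
import shutil
from pathlib import Path

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream


def is_gzip(path):
    """Return True if the file starts with the gzip magic number."""
    with open(path, "rb") as handle:
        return handle.read(2) == GZIP_MAGIC


def compress_in_place(path):
    """Gzip-compress a plain-text file, keeping the same file name.

    Hypothetical workaround helper: writes to a temporary sibling file
    first, then replaces the original.
    """
    path = Path(path)
    tmp = path.with_name(path.name + ".tmp")
    with open(path, "rb") as src, gzip.open(tmp, "wb") as dst:
        shutil.copyfileobj(src, dst)
    tmp.replace(path)
```

Calling `is_gzip("opensubtitles.fi.gz")` on an affected output should return `False`; running `compress_in_place` on it only if that check fails avoids compressing an already valid file twice.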

@svirpioj
Member

There also seem to be other problems in the integration with the latest OpusTools when using moses preprocessing; for example, setting output_directory makes the process fail entirely. I'll look into this, but I think the problems are on the OpusTools side (ping @miau1).

@svirpioj
Member

I suggest using the raw or xml options for preprocessing until we get this fixed.

@svirpioj
Member

Fixed in 3.2.0. It is now recommended to download corpora using the moses preprocessing.
