Memory Issue: opus_read fails to extract MultiCCAligned #21

aflueckiger · 2021-03-17T11:00:29Z

Using v1.2.1, the following command successfully downloads the MultiCCAligned resources. After the download, however, the conversion to Moses format fails without any error message because the process runs out of memory (RAM).

opus_read --directory MultiCCAligned -r v1 --source en --target de --write en-de.en en-de.de --write_mode moses

opus_read seems to read the whole dataset into memory: memory usage climbs above 60 GB before the process dies.
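For comparison, here is a minimal sketch of a streaming conversion to Moses format, where line i of each output file holds the i-th aligned sentence. It is not opus_read's code. It assumes the aligned pairs can be produced lazily by some generator, and the function name `write_moses_streaming` is hypothetical. Because only the current pair is held in memory, memory use should stay roughly flat regardless of corpus size, unlike collecting every pair in a list before writing.

```python
from typing import Iterable, Tuple


def write_moses_streaming(
    pairs: Iterable[Tuple[str, str]], src_path: str, trg_path: str
) -> int:
    """Hypothetical helper: write aligned (source, target) pairs to two
    Moses-style files one pair at a time and return the number of pairs.

    Assumes `pairs` is a lazy iterator (e.g. a generator reading the
    alignment file incrementally); nothing is accumulated in memory here.
    """
    count = 0
    with open(src_path, "w", encoding="utf-8") as src_out, \
         open(trg_path, "w", encoding="utf-8") as trg_out:
        for src, trg in pairs:
            # Flatten embedded line breaks so both files stay line-aligned.
            src_out.write(" ".join(src.splitlines()) + "\n")
            trg_out.write(" ".join(trg.splitlines()) + "\n")
            count += 1
    return count
```

Called as `write_moses_streaming(pairs, "en-de.en", "en-de.de")`, this produces the same two-file layout that `--write en-de.en en-de.de --write_mode moses` is meant to produce.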

The same operation on the WMT-News dataset works:

opus_read --directory WMT-News -r v2019 --source en --target de --write en-de.en en-de.de --write_mode moses

Thanks for this library. A tool for collecting and filtering the ever-growing number of datasets is extremely useful.


aflueckiger · 2022-04-26T15:33:30Z

The issue still exists, but I am closing this in favor of #32.

aflueckiger closed this as completed Apr 26, 2022
