Batch upload with emx2-pyclient times out for large datasets #4271

Open
dtroelofsprins opened this issue Sep 27, 2024 · 0 comments

@dtroelofsprins (Contributor)

What version of EMX2 are you using (see footer)?
v11.8.0

Describe the bug
The PALGA database consists of over 54 million records. To upload all of this data, the file is split into 54 files of about 1 million records each. A Python script uploads these files; however, after about 17 batches the script fails repeatedly with the following error:
raise PyclientException(f"Error uploading file: {task.get('description')}") molgenis_emx2_pyclient.exceptions.PyclientException: Message: Error uploading file: Import failed: Transaction failed: Upsert into table 'Samples' failed: canceling statement due to timeout or by user request. in 317867ms

The script uses the schema/api/zip?async=true endpoint.

Uploading the same zip file via the UI still works, but doing this manually for the more than 30 remaining files is impractical.
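The batching step described above (one large file split into zip batches of about 1 million rows) could look roughly like the following minimal sketch. It assumes the source data is a single CSV with a header row, and that the zip importer reads a CSV named after the target table (Samples.csv for the 'Samples' table); both are assumptions, and the function names are hypothetical, not part of emx2-pyclient.

```python
import csv
import io
import zipfile
from itertools import islice
from typing import IO, Iterable, Iterator, List


def chunked(rows: Iterable[List[str]], size: int) -> Iterator[List[List[str]]]:
    """Yield successive lists of at most `size` rows."""
    it = iter(rows)
    while batch := list(islice(it, size)):
        yield batch


def batch_to_zip(header: List[str], rows: List[List[str]], table: str = "Samples") -> bytes:
    """Pack one batch into an in-memory zip holding a single <table>.csv.

    Assumption: the zip importer maps 'Samples.csv' onto the 'Samples' table.
    """
    text = io.StringIO()
    writer = csv.writer(text)
    writer.writerow(header)  # repeat the header in every batch
    writer.writerows(rows)
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        zf.writestr(f"{table}.csv", text.getvalue())
    return buf.getvalue()


def split_csv_to_zips(src: IO[str], size: int, table: str = "Samples") -> Iterator[bytes]:
    """Stream a large CSV and yield one zip (as bytes) per `size` data rows."""
    reader = csv.reader(src)
    header = next(reader)
    for batch in chunked(reader, size):
        yield batch_to_zip(header, batch, table)
```

Because the zips are yielded one at a time, only a single batch is held in memory. For data of this size it would be called with size=1_000_000 on a file opened with newline="", writing each yielded zip to disk under a hypothetical name such as batch_01.zip.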

To Reproduce
Steps to reproduce the behavior:

  1. SSH into 'https://palga-emx2.molgenis.net' (emx2-az-backend128)
  2. Go to /usr/local/share/molgenis/tools
  3. Run python3 batch_upload.py
  4. See the error
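batch_upload.py itself is not included in this issue, so the following is only a hypothetical outline of such a loop, not the actual script. The upload callable stands in for whatever per-file call the script makes; it is assumed to raise on failure. Timing each batch makes it easier to see whether upload times grow in the run-up to the failure at around batch 17.

```python
import time
from pathlib import Path
from typing import Callable, Dict, List, Sequence, Tuple


def upload_batches(
    paths: Sequence[Path],
    upload: Callable[[Path], None],
    retries: int = 1,
    backoff_s: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[List[Path], Dict[str, float]]:
    """Upload batch files in order, timing each one.

    `upload` is a hypothetical stand-in for the per-file upload call.
    Returns the batches that still failed after `retries` extra attempts,
    and the wall-clock seconds of every successful upload, keyed by file name.
    """
    failed: List[Path] = []
    durations: Dict[str, float] = {}
    for i, path in enumerate(paths, start=1):
        for attempt in range(retries + 1):
            start = time.monotonic()
            try:
                upload(path)
            except Exception as exc:  # e.g. a PyclientException wrapping the timeout
                print(f"[{i}/{len(paths)}] {path.name} attempt {attempt + 1} failed: {exc}")
                if attempt < retries:
                    sleep(backoff_s * (attempt + 1))  # linear backoff before retrying
                continue
            durations[path.name] = time.monotonic() - start
            print(f"[{i}/{len(paths)}] {path.name} ok in {durations[path.name]:.0f}s")
            break
        else:
            # The inner loop never reached `break`: every attempt failed.
            failed.append(path)
    return failed, durations
```

If the logged durations climbed steadily from batch to batch, that would suggest the upsert slows as the 'Samples' table grows, rather than a problem with any single file.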

Expected behavior
Uploading more than 50 files of about 1 million records each using the emx2-pyclient (api/zip) should succeed.

