
Cannot setup environment on server to run code #1

Closed
ryabhmd opened this issue Aug 6, 2024 · 5 comments

@ryabhmd
Collaborator

ryabhmd commented Aug 6, 2024

To run scilons_pipeline.py, I've been trying to build an image on Slurm and install the datatrove[all] package (as per the instructions in the README).
I've tried to re-use several images from /netscratch/enroot (e.g. python+3.10.4-bullseye.sqsh, ubuntu20+conda.sqsh) and then install the packages on top, but I always end up with incompatibilities among the installed packages, which make the image unusable.

For example, when I build on ubuntu20+conda.sqsh and install the datatrove library, I get:

ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
conda-repo-cli 1.0.75 requires requests==2.31.0, but you have requests 2.32.3 which is incompatible.

However, the required version of requests is incompatible with the datasets package.
And once I save the image and use it to run the code, it cannot find any of the modules.

Any ideas on how to build an image to run the code? Maybe I need to use another image to install the package in?

@malteos

malteos commented Aug 6, 2024

Do you need requests for anything in the pipeline? My best guess is that you can simply ignore this error message.

You can also use one of my images: /netscratch/mostendorff/enroot/malteos_eulm_podman.sqsh

It has datatrove==0.2.0 installed.
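Since the earlier symptom was that a saved image "cannot find any of the modules", a quick stdlib-only sanity check run inside the container can confirm whether a package was actually installed into the interpreter being used. This is a generic sketch (only the package name datatrove comes from this thread):

```python
from importlib import metadata, util

def installed_version(pkg: str):
    """Return the installed version string for pkg, or None if it is absent."""
    try:
        return metadata.version(pkg)
    except metadata.PackageNotFoundError:
        return None

def importable(module: str) -> bool:
    """True if `import module` would succeed from the current interpreter."""
    return util.find_spec(module) is not None

# Inside the container, importable("datatrove") returning False while pip
# reported a successful install usually means pip installed into a different
# Python environment than the one running the pipeline.
```

If `importable("datatrove")` is False but `pip show datatrove` succeeds, comparing `sys.executable` with the interpreter pip belongs to is the usual next step.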

@ryabhmd
Collaborator Author

ryabhmd commented Aug 7, 2024

Thanks! Your image works. :)
However, now when I run the pipeline and it gets to the slurm execution part to launch a job from within the script, I get the following error:

FileNotFoundError: [Errno 2] No such file or directory: 'sbatch'
srun: error: serv-3317: task 0: Exited with exit code 1
I tried to look at similar issues (e.g. this one) but they didn't solve the issue.
Any ideas?

@malteos

malteos commented Aug 7, 2024

Slurm commands are not available within a containerized compute job. See https://github.com/scilons/datatrove/blob/main/src/datatrove/executor/slurm.py#L35

You need to start the Slurm pipeline from a login node or rewrite it to use a local execution pipeline.
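The failure mode above can be detected up front: `sbatch` is only on PATH where the Slurm client tools are installed (e.g. a login node), not inside the container. A minimal stdlib sketch of that check, with the executor branching from datatrove's README shown only as comments (pipeline contents omitted, as they depend on the actual job):

```python
import shutil

def slurm_available() -> bool:
    """True only if the `sbatch` binary is on PATH, i.e. we are on a host
    with Slurm client tools (login node), not inside a compute container."""
    return shutil.which("sbatch") is not None

# Sketch of the branching; LocalPipelineExecutor / SlurmPipelineExecutor are
# the executor classes named in datatrove's documentation:
#
# if slurm_available():
#     executor = SlurmPipelineExecutor(pipeline=..., ...)  # submits via sbatch
# else:
#     executor = LocalPipelineExecutor(pipeline=..., ...)  # runs in-process
# executor.run()
```

Failing fast on this check gives a clearer message than the `FileNotFoundError: 'sbatch'` raised mid-run.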

@lfoppiano

The local pipeline works fine and can be run within an interactive job. If we want to use Slurm, should I create the environment directly on the login machine and run it there?

@lfoppiano

I've installed mamba, set up a virtual environment, and ran the pipeline from there. I'm closing this; feel free to let me know if you have further questions.
