Determine / document which systems we intend to support training / executing models on #154

Open
hughes036 opened this issue Apr 13, 2023 · 1 comment

Comments


hughes036 commented Apr 13, 2023

We have already had considerable trouble running the example training on the systems we have access to (our MacBooks and Ubuntu workstations). Here is what we have attempted so far, with the result and blocker for each:

| System | Result | Blocker | Details | Solution | Related Issue |
| --- | --- | --- | --- | --- | --- |
| macOS Monterey 12.6, Intel i7 | Fails | C++ compile error (at runtime) | fatal error: 'omp.h' file not found. | | AllenCellModeling/cyto-dl#184 |
| macOS Ventura 13.3.1, Intel i7 | Fails | C++ compile error (at runtime) | libomp.dylib not found. | `brew install libomp`, then `export DYLD_LIBRARY_PATH=/usr/local/opt/libomp/lib:/usr/local/lib` | |
| macOS Monterey 12.4, Apple M1 | Fails | C++ compile error (at runtime) | fatal error: 'omp.h' file not found. | | AllenCellModeling/cyto-dl#184 |
| Ubuntu 16 | Fails | GPU driver runtime error | RuntimeError: The NVIDIA driver on your system is too old (found version 9010). Please update your GPU driver. | Update GPU driver from nvidia.com OR install PyTorch version compiled with current CUDA driver. | |
| Ubuntu 20 (EC2) | Succeeds | . | . | . | . |
| Slurm (CPU) | . | . | . | . | . |
| Slurm (GPU) | . | . | . | . | . |
| AWS cluster (GPU) | . | . | . | . | . |
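
To make it easier to fill in the remaining rows, the error strings recorded above can be matched against a failing run's output. The following is only a sketch: the `Blocker` type, `KNOWN_BLOCKERS`, and `classify_failure` are hypothetical names, not part of cyto-dl, and the substrings and suggestions are copied from the Details and Solution columns of this issue.

```python
from typing import List, NamedTuple, Optional, Tuple


class Blocker(NamedTuple):
    name: str        # value for the "Blocker" column
    suggestion: str  # value for the "Solution" / "Related Issue" column


# Substring of the error output -> blocker, as recorded in this issue.
# (Hypothetical helper table; extend it as new systems are tested.)
KNOWN_BLOCKERS: List[Tuple[str, Blocker]] = [
    (
        "'omp.h' file not found",
        Blocker("C++ compile error (at runtime)",
                "see AllenCellModeling/cyto-dl#184"),
    ),
    (
        "libomp.dylib not found",
        Blocker("C++ compile error (at runtime)",
                "brew install libomp; export DYLD_LIBRARY_PATH="
                "/usr/local/opt/libomp/lib:/usr/local/lib"),
    ),
    (
        "NVIDIA driver on your system is too old",
        Blocker("GPU driver runtime error",
                "update the GPU driver, or install a PyTorch version "
                "compiled with the current CUDA driver"),
    ),
]


def classify_failure(log_text: str) -> Optional[Blocker]:
    """Return the first known blocker whose error text appears in the log."""
    for needle, blocker in KNOWN_BLOCKERS:
        if needle in log_text:
            return blocker
    return None  # unknown failure: needs a new row in the table
```

Passing the captured stderr of a failed run to `classify_failure` returns the matching blocker, or `None` when the failure is not one of those listed so far.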

In all cases, the setup steps were:

  • Create a fresh venv based on Python 3.8 or 3.9
  • Upgrade pip
  • pip install wheel
  • pip install boto3
  • pip install -e .
  • pip install -r requirements/requirements.txt
  • python scripts/download_test_data.py
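
To keep the setup identical across systems, the steps above could be captured in a small script. This is a sketch under assumptions: `setup_commands` and `run_setup` are hypothetical helpers, the venv directory name is arbitrary, and it assumes the requirements file is installed with `-r` and that the download script lives at `scripts/download_test_data.py`. It should be launched with a Python 3.8 or 3.9 interpreter so the venv is based on that version.

```python
import subprocess
import sys
from pathlib import Path
from typing import List


def setup_commands(venv_dir: str = ".venv") -> List[List[str]]:
    """Build the setup steps as argument lists, in the order listed above."""
    venv = Path(venv_dir)
    # venv puts executables in Scripts/ on Windows and bin/ elsewhere.
    bin_dir = venv / ("Scripts" if sys.platform == "win32" else "bin")
    python = str(bin_dir / "python")
    pip = [python, "-m", "pip"]
    return [
        # The venv inherits the version of the interpreter running this script.
        [sys.executable, "-m", "venv", str(venv)],
        pip + ["install", "--upgrade", "pip"],
        pip + ["install", "wheel"],
        pip + ["install", "boto3"],
        pip + ["install", "-e", "."],
        # Assumption: the requirements file is meant to be installed with -r.
        pip + ["install", "-r", "requirements/requirements.txt"],
        # Assumption: the script path is scripts/download_test_data.py.
        [python, "scripts/download_test_data.py"],
    ]


def run_setup(commands: List[List[str]]) -> None:
    """Run each step from the repository root, stopping at the first failure."""
    for cmd in commands:
        print("+", " ".join(cmd))
        subprocess.run(cmd, check=True)
```

Calling `run_setup(setup_commands())` from the repository root would execute the steps. Building the list separately from running it lets the exact commands be printed into the issue for each system tested.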

The experiment was then run with: python aics_im2im/train.py experiment=im2im/segmentation.yaml trainer=cpu
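
For repeating the same run on each system, the command can be assembled programmatically. A minimal sketch: `train_command` is a hypothetical helper, the `key=value` arguments are passed through unchanged, and only the `experiment` and `trainer` values shown in this issue are known to exist. Any other value is an assumption that would need checking against the repository's configs.

```python
import shlex
import sys
from typing import List


def train_command(
    experiment: str = "im2im/segmentation.yaml",
    trainer: str = "cpu",
    python: str = sys.executable,
) -> List[str]:
    """Build the training invocation used in this issue as an argument list."""
    return [
        python,
        "aics_im2im/train.py",
        f"experiment={experiment}",  # key=value override, passed as-is
        f"trainer={trainer}",        # only "cpu" is used in this issue
    ]


# Printable form, e.g. for pasting into a bug report.
command_line = shlex.join(train_command(python="python"))
```

The resulting list can be handed to `subprocess.run(..., capture_output=True, text=True)` so that stderr from a failing run is available for the Details column.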

@hughes036 hughes036 self-assigned this Apr 13, 2023
@hughes036
Author

We also have problems running napari and our plugin on the EC2 instances where we run cyto-dl, so we do not currently have a single system that runs both.
