Fixes for test_training.py failures #245

GenevieveBuckley · 2023-10-25T07:55:21Z

We've been seeing CI failures related to the test_training.py script.

I've investigated, and found:

The major cause was that micro-sam genrally expects the device variable to be a string (eg: "cpu", "cuda", or "mps"), but the training script had it set as a PyTorch object device = torch.device(type="cpu").
There was another problem, not noticeable on the CI, where we did not correctly set the backend device for everything. Having some things on the cpu, and some on a different pytorch backend means we can't perform the computations. This problem was not noticeable on the CI since it is cpu only, but it was very noticeable when I tried to run the tests locally (where I have another backend available that is higher-priority than the cpu).

This PR addresses both issues. Most lines of code relate to problem 2.

pytest test/test_training.py now passes locally on my machine. The CI job should run the tests for this branch once I submit the PR.

GenevieveBuckley · 2023-10-25T07:59:27Z

Here is an example of a CI job which fails partly because of the test_training.py problem this PR hopes to fix.
https://github.com/GenevieveBuckley/micro-sam/actions/runs/6636342678/job/18028669022?pr=3

 FAILED test/test_training.py::TestTraining::test_training - RuntimeError: Unsupported device: cpu
Please choose from 'cpu', 'cuda', or 'mps'.

Full details (click to expand)


Run aganders3/headless-gui@v1
running on linux
install linux_pkgs
starting XVfb
xvfb pid=5591, display=0
running linux_setup
/usr/bin/bash -l -c pytest
============================= test session starts ==============================
platform linux -- Python 3.11.6, pytest-7.4.3, pluggy-1.3.0 -- /home/runner/micromamba/envs/sam/bin/python
cachedir: .pytest_cache
PyQt5 5.15.10 -- Qt runtime 5.15.2 -- Qt compiled 5.15.2
rootdir: /home/runner/work/micro-sam/micro-sam
configfile: pyproject.toml
plugins: napari-0.4.18, qt-4.2.0, cov-4.1.0, npe2-0.7.2, napari-plugin-engine-0.2.0
collecting ... collected 22 items

  /home/runner/micromamba/envs/sam/lib/python3.11/site-packages/timm/models/layers/__init__.py:49: DeprecationWarning: Importing from timm.models.layers is deprecated, please import via timm.layers
    warnings.warn(f"Importing from {__name__} is deprecated, please import via timm.layers", DeprecationWarning)

../../../micromamba/envs/sam/lib/python3.11/site-packages/timm/models/registry.py:4
  /home/runner/micromamba/envs/sam/lib/python3.11/site-packages/timm/models/registry.py:4: DeprecationWarning: Importing from timm.models.registry is deprecated, please import via timm.models
    warnings.warn(f"Importing from {__name__} is deprecated, please import via timm.models", DeprecationWarning)

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html

---------- coverage: platform linux, python 3.11.6-final-0 -----------
Coverage XML written to file coverage.xml

============================= slowest 10 durations =============================
49.07s call     test/test_instance_segmentation.py::TestInstanceSegmentation::test_tiled_automatic_mask_generator
15.54s call     test/test_instance_segmentation.py::TestInstanceSegmentation::test_automatic_mask_generator
11.16s setup    test/test_instance_segmentation.py::TestInstanceSegmentation::test_automatic_mask_generator
8.15s call     test/test_gui.py::test_annotator_2d
5.46s call     test/test_util.py::TestUtil::test_tiled_prediction
2.34s call     test/test_sam_annotator/test_widgets.py::test_embedding_widget
2.03s call     test/test_prompt_based_segmentation.py::TestPromptBasedSegmentation::test_segment_from_mask_non_square
1.45s setup    test/test_prompt_based_segmentation.py::TestPromptBasedSegmentation::test_segment_from_box
0.69s call     test/test_prompt_generators.py::TestPromptGenerators::test_iterative_prompt_generator
0.55s call     test/test_prompt_based_segmentation.py::TestPromptBasedSegmentation::test_segment_from_mask
=========================== short test summary info ============================
FAILED test/test_gui.py::test_annotator_2d - TypeError: 'module' object is not callable
FAILED test/test_training.py::TestTraining::test_training - RuntimeError: Unsupported device: cpu
Please choose from 'cpu', 'cuda', or 'mps'.
======= 2 failed, 16 passed, 4 skipped, 2 warnings in 108.94s (0:01:48) ========
Error: The process '/usr/bin/bash' failed with exit code 1
The process '/usr/bin/bash' failed with exit code 1

codecov · 2023-10-25T08:05:13Z

Codecov Report

❗ No coverage uploaded for pull request base (dev@4fd9ce2). Click here to learn what that means.
The diff coverage is n/a.

@@          Coverage Diff           @@
##             dev     #245   +/-   ##
======================================
  Coverage       ?   38.34%           
======================================
  Files          ?       30           
  Lines          ?     3987           
  Branches       ?        0           
======================================
  Hits           ?     1529           
  Misses         ?     2458           
  Partials       ?        0

📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more

GenevieveBuckley · 2023-10-25T08:09:45Z

The CI passes now, hooray! 🎉 🎉

constantinpape

Thanks @GenevieveBuckley.
This looks good and I will merge this now so that we can go ahead with the rest.

Eventually we should add a few more unit-tests for the device handling and update the type annotations so that both strs and torch devices are allowed. I will take care of this later.

Fixes for test_training.py failures

ec7adad

constantinpape approved these changes Oct 25, 2023

View reviewed changes

constantinpape merged commit 4561969 into computational-cell-analytics:dev Oct 25, 2023
3 checks passed

constantinpape mentioned this pull request Oct 25, 2023

Accept str and torch.device in device handling; add unittests #246

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fixes for test_training.py failures #245

Fixes for test_training.py failures #245

GenevieveBuckley commented Oct 25, 2023

GenevieveBuckley commented Oct 25, 2023

codecov bot commented Oct 25, 2023

GenevieveBuckley commented Oct 25, 2023

constantinpape left a comment

Fixes for test_training.py failures #245

Fixes for test_training.py failures #245

Conversation

GenevieveBuckley commented Oct 25, 2023

GenevieveBuckley commented Oct 25, 2023

codecov bot commented Oct 25, 2023

Codecov Report

GenevieveBuckley commented Oct 25, 2023

constantinpape left a comment

Choose a reason for hiding this comment