Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Speed up the test with satpy #2826

Closed
ClementLaplace opened this issue Jun 14, 2024 · 4 comments · Fixed by #2850
Closed

Speed up the test with satpy #2826

ClementLaplace opened this issue Jun 14, 2024 · 4 comments · Fixed by #2850

Comments

@ClementLaplace
Copy link
Contributor

Right now the test related to satpy are run sequentially with pystest which takes arround 20 minutes to run. I think we can use the pytest-xdist which enables to parallelize the tests and speed up significantly the test.

We can install it on conda through this command bellow

conda install conda-forge::pytest-xdist

Then to run pytest in parrallel you just need to write this command

pytest -n number_of_worker

where number_of_worker is a number. This solution could be integrated into the CI/CD to makes the pipeline faster.

@djhoese
Copy link
Member

djhoese commented Jun 17, 2024

Things that get in the way of this working:

  1. Dask workers being shared or not shared. It looks like pytest-xdist uses processes (instead of threads) so this would likely result in num_xdist_workers * num_dask_workers being run at the same time which could hurt the running machine a little more than it would have to if these values (num workers for xdist and for dask) were more carefully chosen.
  2. Similar to 1, some tests use dask distributed schedulers that might not be controlled by pytest fixtures and therefore might end up duplicating, or even worse, reusing schedulers that could cause conflicts...maybe.
  3. Satpy configuration (satpy.config) and possibly the associated environment variables are reset with every test. If the satpy.config ends up getting shared between xdist workers at all then this would likely (eventually) cause race conditions between tests. Similarly tests modifying dask.config or pyresample.config. These config objects should be held in memory though so separate processes should be fine...I think.
  4. Poorly written tests not using tmp_path: some old tests create fake input files or write temporary output files in the current working directory. This could cause race conditions between tests/workers...although I'm tempted to say there aren't that many tests that don't use tmp_path.

I guess it is worth a shot to try it. It'd be nice to have CI finish faster.

@mraspaud
Copy link
Member

I just tried this in my laptop, and I think it should work. I don't have all the dependencies installed, so I get some failed tests, but they do not seem to differ with and without optimisation.
With -n 6 on my laptop, I got down from 16 minutes to 4.5 minutes for running all the unit tests.
@ClementLaplace if you feel like making a small PR to the satpy CI files to activate this, we could see if it works in github also?

@ClementLaplace
Copy link
Contributor Author

@mraspaud I have created a pull request here, it seems that some tests have failed. I have not fully checked what happened with the tests. I'm going to work on it.

@ameraner ameraner linked a pull request Jul 19, 2024 that will close this issue
@ameraner
Copy link
Member

I'll close this as the PR #2850 was merged, thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants