Enable sampling in chunks with external jax samplers #7465
base: main
Conversation
The test failure seems a bit random; I haven't been able to trigger a failure locally. I get some acceptance_rates pretty far from 0.5, so I'm not sure how stable it's expected to be.
tests/sampling/test_jax.py
Outdated
```diff
@@ -229,7 +229,7 @@ def test_get_log_likelihood():
     b_true = trace.log_likelihood.b.values
     a = np.array(trace.posterior.a)
     sigma_log_ = np.log(np.array(trace.posterior.sigma))
-    b_jax = _get_log_likelihood(model, [a, sigma_log_])["b"]
+    b_jax = jax.vmap(_get_log_likelihood_fn(model))([a, sigma_log_])["b"]
```
Why did the behavior change (or have to change)?
For postprocessing I needed to be able to calculate the log_likelihood without the final wrapping vmap. It's possible to have it just calculate the likelihood instead of returning a function; however, the extra vmap will still be necessary.
Changed to calculate the likelihood instead of using a returned likelihood calculator function.
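The refactor described above can be sketched with a toy stand-in (this is not PyMC's actual `_get_log_likelihood_fn`; the model and helper below are invented for illustration): the helper returns a per-draw log-likelihood function, and the caller applies `jax.vmap` over the draw axis.

```python
import jax
import jax.numpy as jnp

# Hypothetical stand-in for a _get_log_likelihood_fn-style helper: it returns
# a function computing the pointwise log-likelihood for ONE posterior draw.
def get_log_likelihood_fn(obs):
    def loglik(params):
        mu, log_sigma = params
        sigma = jnp.exp(log_sigma)
        # Normal log-density of each observation under this draw's parameters
        return (
            -0.5 * ((obs - mu) / sigma) ** 2
            - jnp.log(sigma)
            - 0.5 * jnp.log(2 * jnp.pi)
        )
    return loglik

obs = jnp.array([0.0, 1.0, 2.0])
# Four posterior draws of (mu, log_sigma); vmap maps over the leading axis.
mus = jnp.zeros(4)
log_sigmas = jnp.zeros(4)
ll = jax.vmap(get_log_likelihood_fn(obs))((mus, log_sigmas))
print(ll.shape)  # (4, 3): (draws, observations)
```

Returning the inner function (rather than the vmapped result) lets postprocessing code decide whether to wrap it in `vmap`, which is what the chunked path needs.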
I don't think this was failing before, so it might be related to the changes.
@ferrine any opinion on the removal of postprocessing_backend?
```python
import warnings

warnings.warn(
    "postprocessing_backend={'cpu', 'gpu'} will be removed in a future release, "
```
We should deprecate before rendering the argument useless, or raise already. Also, can the message mention that the alternative is now num_chunks?
Ok, makes sense. I can add back that functionality; it's just a few extra branches to keep track of.
Added back logic for postprocessing_backend. It doesn't have any integration with chunked sampling, but it restores postprocessing all at once on a different backend.
It was splitting the random key one extra time compared to the current behavior. Removing the extra split fixes the failure, and I believe the numpyro samples generated will now be identical to those currently generated. However, this means that choosing the wrong key can still trigger the test failure. Based on pyro-ppl/numpyro#1786, it seems like acceptance_rates won't be super stable.
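The key-splitting issue can be illustrated with a small sketch (toy code, not the PR's implementation): one extra `jax.random.split` yields a different key, so every downstream draw changes.

```python
import jax
import jax.numpy as jnp

key = jax.random.PRNGKey(0)
k_once, _ = jax.random.split(key)
k_twice, _ = jax.random.split(k_once)  # the "extra" split

# Different keys drive different sample streams, so an accidental extra
# split silently changes which draws a sampler generates.
draw_a = jax.random.normal(k_once, (3,))
draw_b = jax.random.normal(k_twice, (3,))  # almost surely != draw_a
print(jnp.array_equal(k_once, k_twice))
```

This is why removing the extra split restores bit-identical samples: the draws are a deterministic function of the key.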
Okay, if it's not stable feel free to choose the best code and pick a seed that happens to work.
What will be different there, and what will the memory consumption be? What overhead is put on the gpu/ram?
What if a single sample does not compile on the gpu? Is that realistic? What about a num_samples_in_chunk parameter?
Codecov Report
Attention: Patch coverage is

Additional details and impacted files

```
@@           Coverage Diff            @@
##            main    #7465     +/-  ##
=========================================
- Coverage   92.44%   92.40%    -0.04%
=========================================
  Files         103      103
  Lines       17119    17153      +34
=========================================
+ Hits        15825    15850      +25
- Misses       1294     1303       +9
```
I don't know in general what the memory complexity of the postprocessing transformations can be. Practically speaking, the postprocessing is jit compiled together with the sampling step, so if sampling starts then memory is sufficient (for numpyro, if tuning starts then memory is sufficient). I'm not sure I can see a case where the postprocessing memory requirements are very high and cpu memory so dominates gpu memory that forcing the postprocessing backend would help.
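The chunked approach discussed here can be sketched in a few lines (a toy stand-in, not the PR's code; `sample_chunk` is an invented placeholder for one block of sampler draws): each finished chunk is copied to host memory, so only one chunk needs to fit on the sampling device at a time.

```python
import jax
import numpy as np

def sample_chunk(key, n):
    # Placeholder for one chunk of sampler output living on the device.
    return jax.random.normal(key, (n,))

def sample_in_chunks(key, num_chunks, chunk_size):
    chunks = []
    for _ in range(num_chunks):
        key, subkey = jax.random.split(key)
        draws = sample_chunk(subkey, chunk_size)
        # np.asarray forces a device-to-host copy, freeing device memory
        # for the next chunk.
        chunks.append(np.asarray(draws))
    return np.concatenate(chunks)

samples = sample_in_chunks(jax.random.PRNGKey(0), num_chunks=4, chunk_size=250)
print(samples.shape)  # (1000,)
```

Device memory then scales with `chunk_size` rather than the total number of draws, which is the point of the feature.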
I'm not sure if that happens or what the current resolution would be. The parameterization is with
Is there a proper way to run tests with a gpu backend enabled? My test for
No, GitHub Actions doesn't include gpu in the free plan.
What's the status here, can we merge?
Let me know if anything else needs to be done on my end.
I am leaning a bit towards "this is too much complexity on our side".
I believe you're talking about higher-level complexity. But iirc for blackjax the
Either way, let me know what you decide.
@andrewdipper wanna give a try at that simpler approach?
Sure, I'll give it a go.
Apologies for the delay, I got caught up. Switched to using blackjax.util.run_inference_algorithm and tried to clarify things a bit. Let me know if you think it's viable. I removed the postprocessing test as it doesn't get run, and blackjax chunked sampling will no longer be identical to single-chunk sampling. I plan to swap in some other sampling tests so the code has test coverage.
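For context, a helper in the spirit of blackjax's `run_inference_algorithm` can be sketched generically (this is a rough sketch of the pattern, not blackjax's actual API or signature): scan a sampler's step function over a stream of rng keys, collecting one state per step.

```python
import jax
import jax.numpy as jnp

def run_inference(rng_key, initial_state, step_fn, num_steps):
    """Scan step_fn over num_steps fresh keys, stacking the visited states."""
    keys = jax.random.split(rng_key, num_steps)

    def one_step(state, key):
        state = step_fn(key, state)
        return state, state  # carry the state, also record it

    final_state, states = jax.lax.scan(one_step, initial_state, keys)
    return final_state, states

# Toy "sampler": a Gaussian random-walk step (invented for illustration).
step = lambda key, x: x + 0.1 * jax.random.normal(key)
final, trail = run_inference(jax.random.PRNGKey(0), jnp.float32(0.0), step, 100)
print(trail.shape)  # (100,)
```

Driving the sampler through one generic loop like this is what makes the chunked variant simpler: the chunking only has to wrap this single entry point.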
Initial take on extending the blackjax and numpyro samplers to be able to sequentially sample multiple chunks. This eliminates the requirement that the gpu have sufficient memory to store all samples at once; they just need to fit in cpu memory.
Changes / features:
- With num_chunks==1, samples are stored on the sampling device, consistent with current behavior. With multiple chunks they are transferred to cpu memory.

Some question marks:
- The postprocessing_backend option is removed. I think this is reasonable as any postprocessing memory requirements should be dominated by the already necessary transpose of the chains and samples dimensions (this is due to vmap(scan) materializing the scan dimension first and subsequently transposing). Unless I'm missing another reason to force the postprocessing backend?

Checklist
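The vmap(scan) layout point from the description can be seen in a small toy example (illustration only, not the PyMC implementation):

```python
import jax
import jax.numpy as jnp

# Inside each vmapped call, lax.scan stacks its per-step outputs along the
# leading axis (draws); vmap then adds the chain axis on the outside, so the
# stacked result is (chains, draws). Reordering to (draws, chains) afterwards
# materializes the full array, which is the transpose cost referred to above.
def chain_draws(init):
    def step(carry, _):
        carry = carry + 1.0
        return carry, carry  # emit one draw per step
    _, draws = jax.lax.scan(step, init, None, length=5)
    return draws  # shape (5,) = (draws,)

out = jax.vmap(chain_draws)(jnp.zeros(3))  # shape (3, 5) = (chains, draws)
swapped = jnp.swapaxes(out, 0, 1)          # shape (5, 3) = (draws, chains)
print(out.shape, swapped.shape)
```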
Type of change
📚 Documentation preview 📚: https://pymc--7465.org.readthedocs.build/en/7465/