Below is my view on what AD in Turing 1.0 ought to look like. Please feel free to comment / add your own thoughts -- I'll update this statement in light of new items.
This issue should make clear what the context / background is for the problem, detail what steps need to be taken to make progress, and make it clear what it will take for this issue to be closed. The content of this issue at any given point in time reflects what we currently believe to be true, and is subject to change.
Note: this issue is only half done -- I still need to discuss performance.
Summary
There are two main questions to ask about a given AD on a given Turing.jl model:
does it run (correctly)?
is it performant?
In 1.0, we want to be able to make fairly confident statements about the kinds of models that AD works on -- this must be achieved through testing.
Similarly, we want to be able to make quantitative statements about the performance a user should expect from a given AD, and give them advice for debugging if it appears to be slow.
Testing: does it run?
In order to be confident that we have reasonable support in a large range of cases, we need to
define roughly what it is that we want to support,
know what we currently do / do not test + fill in the gaps, and
ensure that the test cases get run in the right places.
1. Rough Support Requirements:
This is the thing that I have the least strong opinions on. Certainly, we want to test all of the varinfos, every Distributions.Distribution that we care about in at least one model, and all of the various bits of syntax which DynamicPPL.jl exposes to the user.
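To make the first point concrete, here is the flavour of test-case model I have in mind (purely illustrative, not an existing test case): each such model would pin down a particular combination of constrained / unconstrained variables, distributions, and DynamicPPL syntax.

```julia
using LinearAlgebra: I
using Turing  # re-exports Distributions

# Hypothetical test-case model: a constrained (positive) scalar, a multivariate
# latent, and univariate observations assigned in a loop.
@model function demo_model(y)
    σ² ~ InverseGamma(2, 3)
    μ ~ MvNormal(zeros(length(y)), I)
    for i in eachindex(y)
        y[i] ~ Normal(μ[i], sqrt(σ²))
    end
end

model = demo_model(randn(5))
```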
2. Existing test cases for AD and where we run them:
We have some of these in DynamicPPL.TestUtils.DemoModels, and my understanding is that they're quite good at checking that you can differentiate a very simple Turing.jl model (specifically, one comprising a single distribution, but implemented in a range of ways).
AD backends are tested in DynamicPPL here.
This loops over the combination of each AD backend, each element of DemoModels, and each varinfo.
So I get the impression that we have moderate coverage of DynamicPPL features and good coverage of the various varinfos, for each AD backend tested. This testing happens inside Turing.jl.
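For concreteness, the loop described above looks roughly like this (a sketch only, not the actual test code; I'm assuming DEMO_MODELS is the demo-model collection referred to above, and showing just two backends and a single varinfo type rather than the full set of combinations):

```julia
using ADTypes, DynamicPPL, ForwardDiff, ReverseDiff
using LogDensityProblems, LogDensityProblemsAD
using Test

adtypes = (AutoForwardDiff(), AutoReverseDiff())
for model in DynamicPPL.TestUtils.DEMO_MODELS, adtype in adtypes
    # The real tests loop over several varinfo types here, not just VarInfo.
    vi = DynamicPPL.VarInfo(model)
    ldf = DynamicPPL.LogDensityFunction(model, vi)
    ∇ldf = LogDensityProblemsAD.ADgradient(adtype, ldf)
    x = vi[:]
    lp, grad = LogDensityProblems.logdensity_and_gradient(∇ldf, x)
    @test lp isa Real && length(grad) == length(x)
end
```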
3. Ensuring Testing Happens
There are three things which should happen in order to give us a reasonable degree of confidence in AD:
we define a collection of models which we want to be able to differentiate,
we run the tests for these in one of the TuringLang repos, making sure to test the thing that users actually call, and
we derive from this collection of models a collection of (f, args...), which we can pass to AD backends and say "hey, this is our current best guess at what you need to be able to differentiate if you want to support Turing.jl. If you want to ensure support for Turing.jl, just run these as part of your integration tests in your CI, and make sure that you can differentiate them correctly and quickly".
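As a rough sketch of what point 3 might look like on our side (the helper name ad_test_cases and the representation of a test case as a plain tuple are illustrative assumptions, not an existing API):

```julia
using DynamicPPL, LogDensityProblems

# Hypothetical helper: derive an (f, args...) pair from each model, where f is a
# closure computing the log density and args is a parameter vector to evaluate at.
function ad_test_cases(models)
    return map(models) do model
        vi = DynamicPPL.VarInfo(model)
        ldf = DynamicPPL.LogDensityFunction(model, vi)
        f = x -> LogDensityProblems.logdensity(ldf, x)
        (f, vi[:])
    end
end
```

An AD backend could then differentiate each f at its args as part of its CI.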
Note that it is not sufficient to only do one of 2 or 3, as they each serve slightly different purposes.
2 is necessary because ultimately we are the ones who want to be sure that AD works for our users, and to know what currently does not work. In particular, if we change something in Turing.jl which causes AD-related problems, we want to know about them before merging the change. Knowing about them, we can either change our implementation to play nicely with the AD backend that is having problems, or open an upstream issue if an AD fails to differentiate something that we think it really ought to be able to differentiate.
3 is necessary because AD authors will often change internals in their packages. Hopefully their unit tests will catch most problems before they release changes, but there is really no substitute for having a very large array of real test cases to provide something like fuzzing / property testing for your AD. From Turing.jl's perspective, having our test cases run as part of the CI for the ADs that we care about ensures a better experience for our users.
@penelopeysm has made a start on a more general package https://github.com/penelopeysm/ModelTests.jl/, which aims to systematise testing a bit more thoroughly and provide test cases for use by external packages (correct me if I'm wrong, Penny). From my perspective, it goes about this in exactly the right way. In particular:
DynamicPPL.jl can just use the ad_ldp or the ad_di function to turn models into test cases, while
AD backends, such as Mooncake, can hook into make_function and make_params.
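To illustrate how the AD-backend side might look (the exact signatures of make_function and make_params are assumptions on my part; the rest uses DifferentiationInterface, which Mooncake already supports):

```julia
using ADTypes: AutoMooncake
using DifferentiationInterface: value_and_gradient
import Mooncake
using Test

# Sketch of an integration test an AD backend could run over our test cases.
function test_turing_cases(test_cases)
    for case in test_cases
        f = make_function(case)  # assumed: returns a function of a parameter vector
        x = make_params(case)    # assumed: returns the vector to differentiate at
        _, grad = value_and_gradient(f, AutoMooncake(; config=nothing), x)
        @test length(grad) == length(x)
    end
end
```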
Performance
I will finish this section off another day.
Concrete Todo items:
decide where we want to keep this testing infrastructure. In particular, do we keep it in ModelTests and move that package into the Turing org, or move the functionality from ModelTests into DPPL.jl itself? Discussion here
extend testing functionality to permit us to manually flag test cases as "broken" on a particular backend
decide what additional test cases we want to add, and add them.
detail the "Performance" section of this issue (me)
make use of testing infrastructure in the DynamicPPL test suite (if it stays in DPPL, there may be nothing to do here)
make use of testing infrastructure in the Mooncake test suite (for me to do)
start discussions with other AD backends about incorporating our test suite in their integration tests
Questions:
Regarding the first concrete todo item (do we keep the testing infrastructure in ModelTests and move that package into the Turing org, or move the functionality from ModelTests into DPPL.jl itself): what is the answer?
I've started a discussion on this single point here: #2412