Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Promote state and time-span in schroedinger to the type from H(t)*psi #356

Merged
merged 42 commits into from
Jan 25, 2023

Conversation

AmitRotem
Copy link
Contributor

@AmitRotem AmitRotem commented Dec 28, 2022

For optimizing cost wrt par, it is sometimes useful to get the gradient of cost wrt par.
Promoting the state and time-span to the type from Hamiltonian * state makes it possible to use ForwardDiff.jl on cost - if you happen to define the Hamiltonian in a compatible manner to ForwardDiff.jl.

For example;

using QuantumOptics
import Preferences, UUIDs
Preferences.set_preferences!(UUIDs.UUID("f6369f11-7733-5829-9624-2563aa707210"), "nansafe_mode" => true)
import ForwardDiff as FD

# 3 level kerr transmon with drive
ba = FockBasis(2)
T2 = 1e4
function get_Ht(p::Vector{<:Tp}) where Tp
    ω0, α = Tp(5.0), Tp(-0.2)
    A, freq, ϕ, T = p
    op = 2π*([number(ba), 2\create(ba)*number(ba)*destroy(ba), im*(create(ba)-destroy(ba))])
    fω(t) = ω0
    fα(t) = α
    fΩ(t) = A*cospi(2t*freq + 2ϕ)*sinpi(t/T)^2
    H_at_t = LazySum(zeros(Tp,length(op)), op)
    function Ht(t,_)
        H_at_t.factors.= (fω(t), fα(t), fΩ(t))
        return H_at_t
    end
    return Ht
end

ψ0 = Operator(SpinBasis(1/2), basisstate(ba, 1), basisstate(ba, 2))
target = ψ0*exp(im*0.5π*dense(sigmax(SpinBasis(1/2)))) # x gate
function cost(par)
    T = last(par)
    Ht = get_Ht(par)
    ts = (0.0, T)
    _, ψT = timeevolution.schroedinger_dynamic(ts, ψ0, Ht)
    1-abs2(tr(target'last(ψT))/2)*exp(-T/T2)
end

p0 = [0.03, 5.0, 0.25, 100.0]
cost(p0) |> display
FD.gradient(cost, p0)

@codecov
Copy link

codecov bot commented Dec 28, 2022

Codecov Report

Merging #356 (c9ad058) into master (64108ba) will increase coverage by 0.02%.
The diff coverage is 100.00%.

@@            Coverage Diff             @@
##           master     #356      +/-   ##
==========================================
+ Coverage   98.03%   98.05%   +0.02%     
==========================================
  Files          16       16              
  Lines        1422     1438      +16     
==========================================
+ Hits         1394     1410      +16     
  Misses         28       28              
Impacted Files Coverage Δ
src/timeevolution_base.jl 100.00% <ø> (ø)
src/schroedinger.jl 93.75% <100.00%> (+3.12%) ⬆️

📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more

@Krastanov
Copy link
Collaborator

Krastanov commented Dec 28, 2022

You probably already knew most of this (this being marked as a draft) but:

  • Could you remove Manifest.toml from this pull request?
  • I do not think the creation of the example folder is very appropriate. Nothing else in this library uses that style. Maybe this should go to a dedicated section in the documentation?
  • This probably needs tests being added
  • What is the performance cost of the changes you made? Does it still work as fast as before in cases where you do not need differentiation?

It would be quite wonderful to have autodiff working with this library.

@AmitRotem
Copy link
Contributor Author

AmitRotem commented Dec 28, 2022

For this small example;

p0 = [0.03, 5.0, 0.25, 100.0]
cost(p0);
@benchmark cost(p0)

In this branch, benchmarking gives run time of 21.429 ms ± 797.528 μs with 289 allocations.
In master branch, benchmarking gives run time of 22.229 ms ± 1.182 ms with 277 allocations.
So mostly a small allocation overhead ?

I'm guessing this will be true for the general case of working with ComplexF64.
Do you think there is some edge cases where the promotion will results in some abstract type, resulting lots of allocations ?
This happens in this example when taking the gradient if I don't promote ω0 or α.

# general case is Ts<:Complex, Tt<:Real
Ts = eltype(H)
Tt = real(Ts)
(isconcretetype(Ts) && isconcretetype(Tt)) || @warn "For using `ForwardDiff` on `schroedinger` the element type of `real(H(t,psi)*psi)` must be concrete !!\nGot elements of type $Tt \nTry promoting the Hamiltonian elements based on the parameters you are differentiating by."
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In what situations would this warning message be seen? I feel it might be confusing as it will show up in front of users that have not touched ForwardDiff or schroedinger.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll remove it.
This was trying to warn about something like

ω0, α = 5.0, -0.2 # not promoting
#...
fun = [fω, fα, fΩ]
H_at_t = LazySum([f(0.0) for f=fun], op)
@show H_at_t.factors #  type is abstract !! Complex{Real}

in the definition of get_Ht.
But It doesn't cover all possible "bad" definitions of get_Ht.

Ts = eltype(H)
Tt = real(Ts)
(isconcretetype(Ts) && isconcretetype(Tt)) || @warn "For using `ForwardDiff` on `schroedinger` the element type of `real(H(t,psi)*psi)` must be concrete !!\nGot elements of type $Tt \nTry promoting the Hamiltonian elements based on the parameters you are differentiating by."
tspan = Tt.(tspan)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I imagine there are situations where no promotion is necessary. Can we shortcircuit those?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll declare type Dual and only promote that case, probably would be safer.

@Krastanov
Copy link
Collaborator

The extra allocation does not seem particularly problematic on first sight (as long as it is just a constant non-growing overhead), but it can probably be removed with a bit of extra logic in the promotion function.

But I am a bit confused, why is promotion even necessary. You are not deriving with respect to state or tspan, so why is it necessary to make them dual numbers? I thought ForwardDiff can multiply by a constant? This works for instance without explicitly promoting the vector [1,2,3]:

julia> cost(H) = sum(H*[1.,2.,3.])

julia> ForwardDiff.gradient(cost, rand(3,3))
...

@Krastanov
Copy link
Collaborator

I guess the answer to my question about why promotion is necessary is because there are in-place operations done by the diff eq solver?

How about we just shortcircuit on the isconcrete checks. If the types are concrete and if they are different from the types of tspan and state, then we modify them. If not, we do not touch them?

@Krastanov
Copy link
Collaborator

I am also a bit worried: how does this work when another autodiff tool is used? Is this making Zygote and Enzyme happier or is it breaking things for them.

@AmitRotem
Copy link
Contributor Author

s the answer to my question about why promotion is necessary is because there are in-place operations done by the diff eq solver?

I think it's a QO.jl issue, meaning in-place operations on states and operators. Using DiffEq.jl directly with matrix and vector there no need for the user to promote anything.

@Krastanov
Copy link
Collaborator

s the answer to my question about why promotion is necessary is because there are in-place operations done by the diff eq solver?

I think it's a QO.jl issue, meaning in-place operations on states and operators. Using DiffEq.jl directly with matrix and vector there no need for the user to promote anything.

I think the correct solution would be to fix that issue in QO.jl instead of working around it. My worry is that workarounds that do not follow the DiffEq.jl style would just become more and more difficult to keep functional as the autodiff capabilities in julia evolve.

Any idea why DiffEq does not have this problem, but QO has it? Anything in particular that QO does that makes things difficult?

@AmitRotem
Copy link
Contributor Author

I am also a bit worried: how does this work when another autodiff tool is used? Is this making Zygote and Enzyme happier or is it breaking things for them.

Last time I tryed Zygote didn't work with QO, and I didn't check if the promoting changed anything.
Anyway, If we promote only if H is Dual it shouldn't affect other packages right?

@Krastanov
Copy link
Collaborator

I am also a bit worried: how does this work when another autodiff tool is used? Is this making Zygote and Enzyme happier or is it breaking things for them.

Last time I tryed Zygote didn't work with QO, and I didn't check if the promoting changed anything. Anyway, If we promote only if H is Dual it shouldn't affect other packages right?

That is true, but similarly to my comment above, this would be a fragile fix around some other underlying issue. I would be hesitant to use a library that has such special cases in its code, and such a library would be more difficult for others to continue developing. Moreover, that approach would need to add ForwardDiff as a dependency.

Figuring out why QO does not support something that is supported by the underlying DiffEq would probably lead to simpler code and support for other unrelated autodiff tools. Could you give a minimum example of a working autodiff with DiffEq?

@AmitRotem
Copy link
Contributor Author

s the answer to my question about why promotion is necessary is because there are in-place operations done by the diff eq solver?

I think it's a QO.jl issue, meaning in-place operations on states and operators. Using DiffEq.jl directly with matrix and vector there no need for the user to promote anything.

I think the correct solution would be to fix that issue in QO.jl instead of working around it. My worry is that workarounds that do not follow the DiffEq.jl style would just become more and more difficult to keep functional as the autodiff capabilities in julia evolve.

Any idea why DiffEq does not have this problem, but QO has it? Anything in particular that QO does that makes things difficult?

I think it's the preallocation of dpsi, than mul! is failing with ForwardDiff because it try to convert Dual from H*psi to ComplexF64 in dpsi - related to qojulia/QuantumOpticsBase.jl#76 .

DiffEq.jl Also had this issue in the past, meaning the user had to promote time and state for ForwardDiff to work. Not sure what solution they implemented internally.

@Krastanov
Copy link
Collaborator

I think the fix should be in integrate or recast! where we are interacting with the low level DiffEq library.

https://github.com/qojulia/QuantumOptics.jl/blob/master/src/timeevolution_base.jl#L14

https://github.com/qojulia/QuantumOptics.jl/blob/master/src/schroedinger.jl#L57

Maybe we should try to expose some of SciMLSensitivity https://docs.sciml.ai/SciMLSensitivity/stable/

@Krastanov
Copy link
Collaborator

This might be of use: https://github.com/SciML/PreallocationTools.jl

@AmitRotem
Copy link
Contributor Author

AmitRotem commented Dec 29, 2022

I am also a bit worried: how does this work when another autodiff tool is used? Is this making Zygote and Enzyme happier or is it breaking things for them.

Last time I tryed Zygote didn't work with QO, and I didn't check if the promoting changed anything. Anyway, If we promote only if H is Dual it shouldn't affect other packages right?

That is true, but similarly to my comment above, this would be a fragile fix around some other underlying issue. I would be hesitant to use a library that has such special cases in its code, and such a library would be more difficult for others to continue developing. Moreover, that approach would need to add ForwardDiff as a dependency.

Figuring out why QO does not support something that is supported by the underlying DiffEq would probably lead to simpler code and support for other unrelated autodiff tools. Could you give a minimum example of a working autodiff with DiffEq?

Cheating a bit, because there is an issue with complex numbers... "Dual on Complex is ambiguous".

using LinearAlgebra, DifferentialEquations, SciMLSensitivity
import ForwardDiff as FD
import Zygote as Zy

H0 = randn(5,5)
H0-=H0'
A = randn(5,5)
A-=A'
u0 = hcat(normalize(randn(5)), normalize(randn(5)))

function Ht!(du,u,p,t)
    a,b,c = p
    du .= A*u
    du.*= a*cos(b*t+c)
    du.+= H0*u
    nothing
end

function loss(p)
    prob = ODEProblem(Ht!,u0,(0.0,1.0),p)
    sol = solve(prob)
    abs2(tr(u0'last(sol.u)))
end

loss(rand(3))
FD.gradient(loss, rand(3))
Zy.gradient(loss, rand(3))

@AmitRotem
Copy link
Contributor Author

Not sure if there is a nice way to deal with complex number here.
https://docs.sciml.ai/SciMLSensitivity/dev/optimal_control/SDE_control/ here they split the eq to real and imag parts.

If everything is nice and analytic, I think it's safe to define Dual on Complex;

function FD.Dual{T, V, N}(Z::Complex) where {T, V, N}
    complex(FD.Dual{T, V, N}(real(Z)), FD.Dual{T, V, N}(imag(Z)))
end

then we only need to promote the time-span, and the above DiffEq.jl example works with complex numbers.

I'm not sure how or why this issue do not show up with QO.jl

@AmitRotem
Copy link
Contributor Author

I will look into using PreallocationTools.jl to better handle edge cases

@Krastanov
Copy link
Collaborator

I tried to read through the DiffEq code, and it seems they do the promotions themselves already:

https://github.com/SciML/DiffEqBase.jl/blob/82a69aecf6f20062658f6a554695b56821f4f25d/src/solve.jl#L936

And they do the promotion in a way that is very similar to what you have created here:

https://github.com/SciML/DiffEqBase.jl/blob/aad604cc44d9ae2584a94936b8c8eddc29293662/src/forwarddiff.jl#L207

It would be fantastic if we can figure out why the promotion does not work by default for us. That way we would not need to add extra code to QO.jl and we will have it for free for all dynamical equations, not just the shroedinger one.

@Krastanov
Copy link
Collaborator

The DiffEq promotion does not automatically work, because their promotion happens only if there is a dual in the p parameters argument. We never use the parameters argument, so it never gets triggered. Here is an example of this not working that does not involve QuantumOptics:

function make_f(param)
    function fun!(du,u,p,t)
        du .+= param .* u
    end
end
function cost(param)
    fun! = make_f(param)
    prob = ODEProblem(fun!,[1.0,0.0],(0,.1))
    state = solve(prob,Tsit5(),save_on=false,save_end=true,save_start=false,wrap=false)[end]
    sum(state)
end

cost(1.0)
ForwardDiff.derivative(cost,1.0)  # breaks

@Krastanov
Copy link
Collaborator

My personal belief is that the "proper" solution is to rework a lot of the internals of QO.jl in order to use DiffEq.jl in a more idiomatic manner. That would significantly simplify the code.

However, that is a gigantic amount of work that I do not think should be blocking important features like the one you are currently suggesting.

Therefore, I am now in favor of the original approach you had started with, even if it re-implements features that supposedly we should get from DiffEq for free. I would suggest using a check similar to eltype(...) <: ForwardDiff.Dual so that promotion is veeeeeeeeery limited.

@AmitRotem
Copy link
Contributor Author

AmitRotem commented Dec 29, 2022

The DiffEq promotion does not automatically work, because their promotion happens only if there is a dual in the p parameters argument.
....

This reminds me of how to get ForwardDiff.jl working on an ode with complex numbers;

function loss(p)
    Tp = eltype(p)
    Ts = promote_type(Tp, eltype(u0))
    function f(u,qwe,t)
        Ht(u,p,t)
    end
    prob = ODEProblem( f, Ts.(u0), Tp.((0.0,1.0)) )
    sol = solve(prob)
    abs2(tr(first(sol.u)'last(sol.u)))
end

This bypass the promotion in DiffEqBase.jl using the type from p, as you mentioned.
I'm not sure why DiffEqBase.promote_u0 fails on complex.

I've opened an issue SciML/DiffEqBase.jl#858

FDgrad(cost1_with_dt, p0)
### test vs finite difference
#### there somtimes an issue with the gradient at p0, which is alot less accurate
@test all([test_vs_fin_diff(cost1_with_dt, p; atol=1e-6) for p=rp.(range(1e-9, 0.1, tests_repetition))])
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is a small unstable region around p0, where AD and finite diff differ by 1e-2. Not sure why that is.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would just removing that test make sense? Is it really testing anything that the previous set of much simpler tests do not? This is about testing the autodiff, not about a study of numerical stability.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes.
I didn't check this example vs DiffEq, but given that the other examples I compared showed less than 1e-12 difference I think we can remove this test.
And also the 1'st test?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure which one you refer to as 1st. Testing both the static and dynamic Hamiltonians seems valuable to me.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Keeping the two tests you specifically cited below seems reasonable to me. You can also guide yourself by the coverage report - we should aim to have all new lines covered by tests.

@AmitRotem AmitRotem marked this pull request as ready for review January 8, 2023 17:07
## static
function get_H(p)
op = create(ba0)+destroy(ba0)
return sin(p)*op
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I mean keep this static test

target2 = randstate(ba2)
function cost2(par)
a,b = par
Ht(t,u) = A + a*cos(b*t)*B/10
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And this dynamic test

@AmitRotem
Copy link
Contributor Author

Are we happy with this?

@Krastanov
Copy link
Collaborator

I think the way it is done is quite clean, or at least as clean as possible.

I am not the biggest fan of the style of the tests: my personal philosophy is to not include tests that are sensitive to numerical instabilities (if the code being tested is unrelated to numerical instability control). But being practical, I do not think it is necessary to redo the tests or block the merge. Maybe in a couple of years there will be a test failure due to some change to default DiffEq / ForwardDiff algorithms, but that is fine.

I am a bit worried though that this adds whole 3 minutes to the test run. I would have expected testing this to take 10-20 seconds. Wouldn't both these issues (instabilities and long test runs) be fixed if we just test fewer examples. Testing fewer examples would not drop the test coverage percentage or really change much.

Minor notes:

  • There is not much point in seeding the random number generator: the generator algorithm is not stable between versions and architectures. If you insist on it, you can use StableRNG.jl but that seems like overkill
  • The comments in the test file were a bit disorganized, e.g. it jumps from Example 0 to Example 2.

Anyway, I do not see blocking issues. @david-pl , is there a policy/philosophy about what the tests need to look like. These seem good enough even if unpolished and a bit long.

More importantly, thanks for fixing some of the ergonimics with autodiff in this package. This would be much more pleasant to use for me now.

@david-pl
Copy link
Member

@AmitRotem thanks for all the work here! I have to admit that my knowledge of AD is a bit limited, so I trust @Krastanov here - thanks for the detailed review!

Regarding the tests: they are a bit of a mess throughout the package already. Also, time is a bit of an issue as the CI already takes quite long, but we'll have to look at that in a separate PR. There's no real philosophy in how to do things, except maybe that we have one test per file. So long as there are tests for new code, I'm happy!

Any mess we have in the tests can be cleaned up in the future. For that I'd simply stick to the way SciML tests are structured.

So all good from my side. @Krastanov just give me one last go ahead once you're happy with the PR and I'll merge it.

@Krastanov
Copy link
Collaborator

I will do one last pass and rerun the tests tonight and will give that go-ahead then.

@Krastanov
Copy link
Collaborator

@AmitRotem , if I understand correctly, this depends on a change you contributed to DiffEqBase in version 6.113.0. Could you add this to the Project dependencies and compat? You can also then directly do using DiffEqBase instead of getting it through the OrdinaryDiffEq namespace.

@david-pl good to merge in my opinion after adding the dependency. Details below.

The comments about the tests stand, but as you mentioned that requires an overall cleanup, probably following SafeTestsets as done in SciML.

This also introduces a dependence on a private DiffEqBase function (the promotion one), however that function has existed since 2018 and very early versions of DiffEqBase. We can not really rely on semver to detect issues with this, but it seems relatively safe (and testing will detect any possible issues in the future). Amit has added another method to that function upstream to make sure it does not break on complex numbers.

As I already mentioned. The actual changes look neat and clean. Once this is merged I will create issues to track the addition of the same features to the rest of the solvers.

@AmitRotem
Copy link
Contributor Author

Most of the time is compilation time for ForwardDiff.jl.
Even if I keep only 2 test (not repeating over randoms), for checking no error during promotion and comparing AD on QO.jl vs AD on DiffEq.jl (which i think is the bare minimum to test for), the run time (localy) goes from 77 sec to 34 sec.

add DiffEqBase to Project and compat
@Krastanov
Copy link
Collaborator

Looks good to merge.

I do not actually think there is that much need for the tests against DiffEq. We are using DiffEq internally anyway, so just checking for correctness with a quick finite differences approximation on a numerically stable example would also have been enough.

@AmitRotem
Copy link
Contributor Author

If you think this minimal version of the test is enough, I'll remove the longer version.

@Krastanov
Copy link
Collaborator

Thanks for the test cleanup! Looks good to merge to me (the only changes from the previous time I said that is that tests were streamlined).

@david-pl david-pl merged commit 1abc40f into qojulia:master Jan 25, 2023
@AmitRotem AmitRotem deleted the promote-state-and-time branch January 26, 2023 00:14
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants