Skip to content

Commit

Permalink
Merge pull request #36
Browse files Browse the repository at this point in the history
support GPU (`CUDA.jl`) acceleration for `evolution` and `spectrum`
  • Loading branch information
ytdHuang authored Nov 1, 2023
2 parents d2239f4 + 94826f7 commit b7da100
Show file tree
Hide file tree
Showing 19 changed files with 383 additions and 46 deletions.
18 changes: 18 additions & 0 deletions .buildkite/pipeline.yml
Original file line number Diff line number Diff line change
@@ -0,0 +1,18 @@
steps:
- label: "HierarchicalEOM_CUDAExt"
plugins:
- JuliaCI/julia#v1:
version: "1"
- JuliaCI/julia-test#v1:
test_args: "--quickfail"
coverage: false # 1000x slowdown
agents:
queue: "juliagpu"
cuda: "*"
env:
GROUP: "HierarchicalEOM_CUDAExt"
JULIA_PKG_SERVER: "" # it often struggles with our large artifacts
# SECRET_CODECOV_TOKEN: "..."
timeout_in_minutes: 30
# Don't run Buildkite if the commit message includes the text [skip tests]
if: build.message !~ /\[skip tests\]/
5 changes: 4 additions & 1 deletion Project.toml
Original file line number Diff line number Diff line change
Expand Up @@ -19,9 +19,11 @@ SparseArrays = "2f01184e-e22b-5df5-ae63-d93ebab69eaf"
SteadyStateDiffEq = "9672c7b4-1e72-59bd-8a11-6ac3964bc41f"

[weakdeps]
CUDA = "052768ef-5323-5732-b1bb-66c8b64840ba"
QuantumOptics = "6e0679c1-51ea-5a7c-ac74-d61b76210b0c"

[extensions]
HierarchicalEOM_CUDAExt = "CUDA"
HierarchicalEOM_QOExt = "QuantumOptics"

[compat]
Expand All @@ -43,9 +45,10 @@ Documenter = "e30172f5-a6a5-5a46-863b-614d45cd2de4"
LaTeXStrings = "b964fa9f-0449-5b57-a5c2-d3ea65f4040f"
Literate = "98b081ad-f1c9-55d3-8b20-4c87d4299306"
Pardiso = "46dd5b70-b6fb-5a00-ae2d-e8fea33afaf2"
Pkg = "44cfe95a-1eb2-52ea-b672-e2afdf69b78f"
Plots = "91a5bcdd-55d7-5caf-9e0b-520d859cae80"
QuantumOptics = "6e0679c1-51ea-5a7c-ac74-d61b76210b0c"
Test = "8dfed614-e22c-5e08-85e1-65c5234f0b40"

[targets]
test = ["BenchmarkTools", "Documenter", "LaTeXStrings", "Literate", "Pardiso", "Plots", "QuantumOptics", "Test"]
test = ["BenchmarkTools", "Documenter", "LaTeXStrings", "Literate", "Pardiso", "Pkg", "Plots", "QuantumOptics", "Test"]
3 changes: 2 additions & 1 deletion docs/make.jl
Original file line number Diff line number Diff line change
Expand Up @@ -77,7 +77,8 @@ const PAGES = Any[
"Examples" => EX_output_files,
"Benchmark Solvers" => BM_output_files,
"Extensions" => Any[
"QuantumOptics.jl" => "extensions/QuantumOptics.md"
"QuantumOptics.jl" => "extensions/QuantumOptics.md",
"CUDA.jl" => "extensions/CUDA.md"
]
],
"Library API" => "libraryAPI.md"
Expand Down
95 changes: 95 additions & 0 deletions docs/src/extensions/CUDA.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,95 @@
# [Extension for CUDA.jl](@id doc-ext-CUDA)

This is an extension to support GPU ([`CUDA.jl`](https://github.com/JuliaGPU/CUDA.jl)) acceleration for solving the [time evolution](@ref doc-Time-Evolution) and [spectrum](@ref doc-Spectrum). This improves the execution time and memory usage especially when the HEOMLS matrix is super large.

!!! compat "Compat"
The described feature requires `Julia 1.9+`.

The functions [`evolution`](@ref doc-Time-Evolution) (only supports ODE method with time-independent system Hamiltonian) and [`spectrum`](@ref doc-Spectrum) will automatically choose to solve on CPU or GPU depend on the type of the sparse matrix in `M::AbstractHEOMLSMatrix` objects (i.e., the type of the field `M.data`).

```julia
typeof(M.data) <: SparseMatrixCSC # solve on CPU
typeof(M.data) <: CuSparseMatrixCSC # solve on GPU
```

Therefore, we wrapped several functions in `CUDA` and `CUDA.CUSPARSE` in order to return a new HEOMLS-matrix-type object with `M.data` is in the type of `CuSparseMatrix`, and also change the element type into `ComplexF32` and `Int32` (since GPU performs better in this type). The functions are listed as follows:
- `cu(M::AbstractHEOMLSMatrix)` : Translate `M.data` into the type `CuSparseMatrixCSC{ComplexF32, Int32}`
- `CuSparseMatrixCSC(M::AbstractHEOMLSMatrix)` : Translate `M.data` into the type `CuSparseMatrixCSC{ComplexF32, Int32}`

### Demonstration

The extension will be automatically loaded if user imports the package `CUDA.jl` :

```julia
using CUDA
using HierarchicalEOM
using LinearSolve # to change the solver for better GPU performance
```

### Setup

Here, we demonstrate this extension by using the example of [the single-impurity Anderson model](@ref exp-SIAM).

```julia
ϵ = -5
U = 10
Γ = 2
μ = 0
W = 10
kT = 0.5
N = 5
tier = 3

tlist = 0f0:1f-1:10f0 # same as 0:0.1:10 but in the type of `Float32`
ωlist = -10f0:1f0:10f0 # same as -10:1:10 but in the type of `Float32`

σm = [0 1; 0 0]
σz = [1 0; 0 -1]
II = [1 0; 0 1]
d_up = kron( σm, II)
d_dn = kron(-1 * σz, σm)
ρ0 = kron([1 0; 0 0], [1 0; 0 0])
Hsys = ϵ * (d_up' * d_up + d_dn' * d_dn) + U * (d_up' * d_up * d_dn' * d_dn)

bath_up = Fermion_Lorentz_Pade(d_up, Γ, μ, W, kT, N)
bath_dn = Fermion_Lorentz_Pade(d_dn, Γ, μ, W, kT, N)
bath_list = [bath_up, bath_dn]

# even HEOMLS matrix
M_even_cpu = M_Fermion(Hsys, tier, bath_list; verbose=false)
M_even_gpu = cu(M_even_cpu)

# odd HEOMLS matrix
M_odd_cpu = M_Fermion(Hsys, tier, bath_list, ODD; verbose=false)
M_odd_gpu = cu(M_odd_cpu)

# solve steady state with CPU
ados_ss = SteadyState(M_even_cpu);
```

!!! note "Note"
This extension does not support for solving [`SteadyState`](@ref doc-Stationary-State) on GPU since it is not efficient and might get wrong solutions. If you really want to obtain the stationary state with GPU, you can repeatedly solve the [`evolution`](@ref doc-Time-Evolution) until you find it.

### Solving time evolution with CPU

```julia
ados_list_cpu = evolution(M_even_cpu, ρ0, tlist; verbose=false)
```

### Solving time evolution with GPU

```julia
ados_list_gpu = evolution(M_even_gpu, ρ0, tlist; verbose=false)
```

### Solving Spectrum with CPU

```julia
dos_cpu = spectrum(M_odd_cpu, ados_ss, d_up, ωlist; verbose=false)
```

### Solving Spectrum with GPU

```julia
dos_gpu = spectrum(M_odd_gpu, ados_ss, d_up, ωlist; solver=KrylovJL_BICGSTAB(rtol=1f-10, atol=1f-12), verbose=false)
```
3 changes: 3 additions & 0 deletions docs/src/spectrum.md
Original file line number Diff line number Diff line change
Expand Up @@ -13,6 +13,9 @@ The function [`spectrum`](@ref) will automatically detect the [parity](@ref doc-

`HierarchicalEOM.jl` wraps some of the functions in [LinearSolve.jl](http://linearsolve.sciml.ai/stable/), which is a very rich numerical library for solving the linear problems and provides many solvers. It offers quite a few options for the user to tailor the solver to their specific needs. The default solver (and its corresponding settings) are chosen to suit commonly encountered problems and should work fine for most of the cases. If you require more specialized methods, such as the choice of algorithm, please refer to [benchmark for LinearSolve solvers](@ref benchmark-LS-solvers) and also the documentation of [LinearSolve.jl](http://linearsolve.sciml.ai/stable/).

!!! compat "Extension for CUDA.jl"
`HierarchicalEOM.jl` provides an extension to support GPU ([`CUDA.jl`](https://github.com/JuliaGPU/CUDA.jl)) acceleration for solving the spectrum, but this feature requires `Julia 1.9+` and `HierarchicalEOM 1.1+`. See [here](@ref doc-ext-CUDA) for more details.

## [Power Spectral Density](@id doc-PSD)
Start from the spectrum for bosonic systems (power spectral density) in the time-domain. We write the system two-time correlation function in terms of the propagator ``\hat{\mathcal{G}}(t)=\exp(\hat{\mathcal{M}} t)`` for ``t>0``. The power spectral density ``S(\omega)`` can be obtained as
```math
Expand Down
4 changes: 4 additions & 0 deletions docs/src/time_evolution.md
Original file line number Diff line number Diff line change
Expand Up @@ -53,6 +53,10 @@ end
## Ordinary Differential Equation Method
The first method is implemented by solving the ordinary differential equation (ODE) as shown above. `HierarchicalEOM.jl` wraps some of the functions in [`DifferentialEquations.jl`](https://diffeq.sciml.ai/stable/), which is a very rich numerical library for solving the differential equations and provides many ODE solvers. It offers quite a few options for the user to tailor the solver to their specific needs. The default solver (and its corresponding settings) are chosen to suit commonly encountered problems and should work fine for most of the cases. If you require more specialized methods, such as the choice of algorithm, please refer to [benchmarks for DifferentialEquations solvers](@ref benchmark-ODE-solvers) and also the documentation of [`DifferentialEquations.jl`](https://diffeq.sciml.ai/stable/).

!!! compat "Extension for CUDA.jl"
`HierarchicalEOM.jl` provides an extension to support GPU ([`CUDA.jl`](https://github.com/JuliaGPU/CUDA.jl)) acceleration for solving the time evolution (only for ODE method with time-independent system Hamiltonian), but this feature requires `Julia 1.9+` and `HierarchicalEOM 1.1+`. See [here](@ref doc-ext-CUDA) for more details.


### Given the initial state as Density Operator (`AbstractMatrix` type)

See the docstring of this method:
Expand Down
73 changes: 73 additions & 0 deletions ext/HierarchicalEOM_CUDAExt.jl
Original file line number Diff line number Diff line change
@@ -0,0 +1,73 @@
module HierarchicalEOM_CUDAExt

using HierarchicalEOM
import HierarchicalEOM.HeomAPI: _HandleVectorType, _HandleSteadyStateMatrix
import HierarchicalEOM.Spectrum: _HandleIdentityType
import CUDA
import CUDA: cu, CuArray
import CUDA.CUSPARSE: CuSparseMatrixCSC
import SparseArrays: sparse, SparseVector, SparseMatrixCSC

@doc raw"""
cu(M::AbstractHEOMLSMatrix)
Return a new HEOMLS-matrix-type object with `M.data` is in the type of `CuSparseMatrixCSC{ComplexF32, Int32}` for gpu calculations.
"""
cu(M::AbstractHEOMLSMatrix) = CuSparseMatrixCSC(M)

@doc raw"""
CuSparseMatrixCSC(M::AbstractHEOMLSMatrix)
Return a new HEOMLS-matrix-type object with `M.data` is in the type of `CuSparseMatrixCSC{ComplexF32, Int32}` for gpu calculations.
"""
function CuSparseMatrixCSC(M::T) where T <: AbstractHEOMLSMatrix
A = M.data
if typeof(A) <: CuSparseMatrixCSC
return M
else
colptr = CuArray{Int32}(A.colptr)
rowval = CuArray{Int32}(A.rowval)
nzval = CuArray{ComplexF32}(A.nzval)
A_gpu = CuSparseMatrixCSC{ComplexF32, Int32}(colptr, rowval, nzval, size(A))
if T <: M_S
return M_S(A_gpu, M.tier, M.dim, M.N, M.sup_dim, M.parity)
elseif T <: M_Boson
return M_Boson(A_gpu, M.tier, M.dim, M.N, M.sup_dim, M.parity, M.bath, M.hierarchy)
elseif T <: M_Fermion
return M_Fermion(A_gpu, M.tier, M.dim, M.N, M.sup_dim, M.parity, M.bath, M.hierarchy)
else
return M_Boson_Fermion(A_gpu, M.Btier, M.Ftier, M.dim, M.N, M.sup_dim, M.parity, M.Bbath, M.Fbath, M.hierarchy)
end
end
end

# for changing a `CuArray` back to `ADOs`
function _HandleVectorType(V::T, cp::Bool=false) where T <: CuArray
return Vector{ComplexF64}(V)
end

# for changing the type of `ADOs` to match the type of HEOMLS matrix
function _HandleVectorType(MatrixType::Type{TM}, V::SparseVector) where TM <: CuSparseMatrixCSC
TE = eltype(MatrixType)
return CuArray{TE}(V)
end

##### We first remove this part because there are errors when solveing steady states using GPU
# function _HandleSteadyStateMatrix(MatrixType::Type{TM}, M::AbstractHEOMLSMatrix, S::Int) where TM <: CuSparseMatrixCSC
# colptr = Vector{Int32}(M.data.colPtr)
# rowval = Vector{Int32}(M.data.rowVal)
# nzval = Vector{ComplexF32}(M.data.nzVal)
# A = SparseMatrixCSC{ComplexF32, Int32}(S, S, colptr, rowval, nzval)
# A[1,1:S] .= 0f0
#
# # sparse(row_idx, col_idx, values, row_dims, col_dims)
# A += sparse(ones(Int32, M.dim), [Int32((n - 1) * (M.dim + 1) + 1) for n in 1:(M.dim)], ones(ComplexF32, M.dim), S, S)
# return CuSparseMatrixCSC(A)
# end

function _HandleIdentityType(MatrixType::Type{TM}, S::Int) where TM <: CuSparseMatrixCSC
colptr = CuArray{Int32}(Int32(1):Int32(S+1))
rowval = CuArray{Int32}(Int32(1):Int32(S))
nzval = CUDA.ones(ComplexF32, S)
return CuSparseMatrixCSC{ComplexF32, Int32}(colptr, rowval, nzval, (S, S))
end

end
17 changes: 16 additions & 1 deletion src/ADOs.jl
Original file line number Diff line number Diff line change
Expand Up @@ -106,7 +106,7 @@ iterate(A::ADOs, ::Nothing) = nothing

function show(io::IO, A::ADOs)
print(io,
"Auxiliary Density Operators with $(A.parity), (system) dim = $(A.dim), number N = $(A.N)\n"
"$(A.N) Auxiliary Density Operators with $(A.parity) and (system) dim = $(A.dim)\n"
)
end
function show(io::IO, m::MIME"text/plain", A::ADOs) show(io, A) end
Expand Down Expand Up @@ -206,4 +206,19 @@ function Expect(op, ados_list::Vector{ADOs}; take_real=true)
else
return exp_val
end
end

# for changing a `Vector` back to `ADOs`
function _HandleVectorType(V::T, cp::Bool=true) where T <: Vector
if cp
return Vector{ComplexF64}(V)
else
return V
end
end

# for changing the type of `ADOs` to match the type of HEOMLS matrix
function _HandleVectorType(MatrixType::Type{TM}, V::SparseVector) where TM <: SparseMatrixCSC
TE = eltype(MatrixType)
return Vector{TE}(V)
end
4 changes: 2 additions & 2 deletions src/HierarchicalEOM.jl
Original file line number Diff line number Diff line change
Expand Up @@ -80,9 +80,9 @@ module HierarchicalEOM

# sub-module Spectrum for HierarchicalEOM
module Spectrum
import ..HeomAPI: AbstractHEOMLSMatrix, OddParity, ADOs, spre
import ..HeomAPI: AbstractHEOMLSMatrix, OddParity, ADOs, spre, _HandleVectorType
import LinearAlgebra: I, kron
import SparseArrays: sparse, sparsevec
import SparseArrays: sparse, sparsevec, SparseMatrixCSC
import LinearSolve: LinearProblem, init, solve!, UMFPACKFactorization
import ProgressMeter: Progress, next!
import ..HeomBase: PROGBAR_OPTIONS, HandleMatrixType
Expand Down
29 changes: 17 additions & 12 deletions src/Spectrum.jl
Original file line number Diff line number Diff line change
Expand Up @@ -107,16 +107,16 @@ end
end

Size = size(M, 1)
I_total = sparse(I, Size, Size)
I_heom = sparse(I, M.N, M.N)
I_heom = sparse(one(ComplexF64) * I, M.N, M.N)
I_total = _HandleIdentityType(typeof(M.data), Size)

# equal to : transpose(sparse(vec(system_identity_matrix)))
I_dual_vec = transpose(sparsevec([1 + n * (M.dim + 1) for n in 0:(M.dim - 1)], ones(M.dim), M.sup_dim))
I_dual_vec = transpose(sparsevec([1 + n * (M.dim + 1) for n in 0:(M.dim - 1)], ones(ComplexF64, M.dim), M.sup_dim))

# operator for calculating two-time correlation functions in frequency domain
a_normal = kron(I_heom, spre(op))
a_dagger = kron(I_heom, spre(op'))
local X::Vector{ComplexF64} = a_normal * ados_vec
X = _HandleVectorType(typeof(M.data), a_normal * ados_vec)

Length = length(ω_list)
= Vector{Float64}(undef, Length)
Expand All @@ -137,7 +137,7 @@ end
end

# trace over the Hilbert space of system (expectation value)
Sω[i] = -1 * real(I_dual_vec * (a_dagger * sol.u)[1:(M.sup_dim)])
Sω[i] = -1 * real(I_dual_vec * (a_dagger * _HandleVectorType(sol.u, false))[1:(M.sup_dim)])

if SAVE
open(FILENAME, "a") do file
Expand Down Expand Up @@ -176,17 +176,17 @@ end
end

Size = size(M, 1)
I_total = sparse(I, Size, Size)
I_heom = sparse(I, M.N, M.N)
I_heom = sparse(one(ComplexF64) * I, M.N, M.N)
I_total = _HandleIdentityType(typeof(M.data), Size)

# equal to : transpose(sparse(vec(system_identity_matrix)))
I_dual_vec = transpose(sparsevec([1 + n * (M.dim + 1) for n in 0:(M.dim - 1)], ones(M.dim), M.sup_dim))
I_dual_vec = transpose(sparsevec([1 + n * (M.dim + 1) for n in 0:(M.dim - 1)], ones(ComplexF64, M.dim), M.sup_dim))

# operators for calculating two-time correlation functions in frequency domain
d_normal = kron(I_heom, spre(op))
d_dagger = kron(I_heom, spre(op'))
local X_m::Vector{ComplexF64} = d_normal * ados_vec
local X_p::Vector{ComplexF64} = d_dagger * ados_vec
X_m = _HandleVectorType(typeof(M.data), d_normal * ados_vec)
X_p = _HandleVectorType(typeof(M.data), d_dagger * ados_vec)

Length = length(ω_list)
= Vector{Float64}(undef, Length)
Expand All @@ -211,8 +211,8 @@ end
cache_p.A = M.data +
sol_p = solve!(cache_p)
end
Cω_m = d_dagger * sol_m.u
Cω_p = d_normal * sol_p.u
Cω_m = d_dagger * _HandleVectorType(sol_m.u, false)
Cω_p = d_normal * _HandleVectorType(sol_p.u, false)

# trace over the Hilbert space of system (expectation value)
Aω[i] = -1 * (
Expand All @@ -235,4 +235,9 @@ end
end

return
end

function _HandleIdentityType(MatrixType::Type{TM}, S::Int) where TM <: SparseMatrixCSC
ElType = eltype(MatrixType)
return sparse(one(ElType) * I, S, S)
end
Loading

0 comments on commit b7da100

Please sign in to comment.