Update dpnp.linalg.matrix_power() implementation #1748

Merged · 12 commits from impl_matrix_power into master on Mar 22, 2024

Conversation

vlad-perevezentsev (Collaborator)

This PR updates the dpnp.linalg.matrix_power() function to raise a square input matrix to the power n using dpnp.linalg.inv() and dpnp.matmul().
The changes add support for all data types.
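For illustration only, here is a minimal sketch of the approach described above for a single square matrix. It is not the actual dpnp implementation, which also handles batched input and uses binary decomposition of n (see the commit list at the end):

import dpnp

def matrix_power_sketch(a, n):
    # Illustrative only: raise a square matrix a to the integer power n.
    if n == 0:
        # n == 0 returns the identity matrix of the same size and dtype
        return dpnp.eye(a.shape[-1], dtype=a.dtype)
    if n < 0:
        # negative exponent: invert once with dpnp.linalg.inv, then use |n|
        a = dpnp.linalg.inv(a)
        n = -n
    result = a
    for _ in range(n - 1):
        # repeated dpnp.matmul; the real code needs only O(log n) products
        result = dpnp.matmul(result, a)
    return result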

  • Have you provided a meaningful PR description?
  • Have you added a test, reproducer or referred to issue with a reproducer?
  • Have you tested your changes locally for CPU and GPU devices?
  • Have you made sure that new changes do not introduce compiler warnings?
  • Have you checked performance impact of proposed changes?
  • If this PR is a work in progress, are you filing the PR as a draft?

@vlad-perevezentsev vlad-perevezentsev self-assigned this Mar 15, 2024
@vlad-perevezentsev vlad-perevezentsev changed the title Impl matrix power Update dpnp.linalg.matrix_power() implementation Mar 15, 2024
Contributor

github-actions bot commented Mar 15, 2024

View rendered docs @ https://intelpython.github.io/dpnp/index.html

Review threads (all resolved): dpnp/linalg/dpnp_utils_linalg.py, tests/test_linalg.py
@vlad-perevezentsev (Collaborator, Author)

This table shows the performance of the dpnp.linalg.matrix_power() function on Iris Xe:
[image: performance comparison table]

Results for floating and complex floating types when n > 0 (which call dpnp.matmul()) are worse than NumPy's on CPU.

@vtavana do you know about this or is it a regression?

In [1]: import dpnp
 
In [2]: import numpy
 
In [3]: na = numpy.random.randint(-10**4, 10**4, size=(4096,4096))
 
In [4]: a = numpy.array(na,dtype='f4')
 
In [5]: a_dp = dpnp.array(a, device='cpu')
 
In [6]: %timeit res = numpy.matmul(a,a)
334 ms ± 4.88 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
 
In [7]: %timeit res = dpnp.matmul(a_dp,a_dp)
380 ms ± 18.9 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

@vtavana (Collaborator)

vtavana commented Mar 19, 2024

@vtavana do you know about this or is it a regression?

There is always some variation in timing on Iris Xe. However, in this case NumPy dispatches to SGEMM while dpnp dispatches to SGEMM_64, and their timings differ.
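The traces below appear to come from MKL's verbose mode. A minimal sketch of how they could be reproduced, assuming MKL-backed builds of NumPy and dpnp (the variable is set before the libraries are loaded):

import os
os.environ["MKL_VERBOSE"] = "1"  # ask MKL to print one line per BLAS call

import numpy
import dpnp

a = numpy.random.rand(4096, 4096).astype("f4")
a_dp = dpnp.array(a, device="cpu")

numpy.matmul(a, a)         # expected to log SGEMM(...), as in the trace below
dpnp.matmul(a_dp, a_dp)    # expected to log SGEMM_64(...), as in the trace below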

In [9]: %timeit numpy.matmul(a,a)
MKL_VERBOSE SGEMM(N,N,4096,4096,4096,0x7ffc219a3a68,0x7fac93ada010,4096,0x7fac93ada010,4096,0x7ffc219a3a70,0x7fabaffff010,4096) 512.61ms CNR:OFF Dyn:1 FastMM:1 TID:0  NThr:6
MKL_VERBOSE SGEMM(N,N,4096,4096,4096,0x7ffc219a38a8,0x7fac93ada010,4096,0x7fac93ada010,4096,0x7ffc219a38b0,0x7fabaffff010,4096) 509.24ms CNR:OFF Dyn:1 FastMM:1 TID:0  NThr:6
MKL_VERBOSE SGEMM(N,N,4096,4096,4096,0x7ffc219a38a8,0x7fac93ada010,4096,0x7fac93ada010,4096,0x7ffc219a38b0,0x7fabaffff010,4096) 501.30ms CNR:OFF Dyn:1 FastMM:1 TID:0  NThr:6
MKL_VERBOSE SGEMM(N,N,4096,4096,4096,0x7ffc219a38a8,0x7fac93ada010,4096,0x7fac93ada010,4096,0x7ffc219a38b0,0x7fabaffff010,4096) 514.48ms CNR:OFF Dyn:1 FastMM:1 TID:0  NThr:6
MKL_VERBOSE SGEMM(N,N,4096,4096,4096,0x7ffc219a38a8,0x7fac93ada010,4096,0x7fac93ada010,4096,0x7ffc219a38b0,0x7fabaffff010,4096) 517.97ms CNR:OFF Dyn:1 FastMM:1 TID:0  NThr:6
MKL_VERBOSE SGEMM(N,N,4096,4096,4096,0x7ffc219a38a8,0x7fac93ada010,4096,0x7fac93ada010,4096,0x7ffc219a38b0,0x7fabaffff010,4096) 486.50ms CNR:OFF Dyn:1 FastMM:1 TID:0  NThr:6
MKL_VERBOSE SGEMM(N,N,4096,4096,4096,0x7ffc219a38a8,0x7fac93ada010,4096,0x7fac93ada010,4096,0x7ffc219a38b0,0x7fabaffff010,4096) 496.99ms CNR:OFF Dyn:1 FastMM:1 TID:0  NThr:6
MKL_VERBOSE SGEMM(N,N,4096,4096,4096,0x7ffc219a38a8,0x7fac93ada010,4096,0x7fac93ada010,4096,0x7ffc219a38b0,0x7fabaffff010,4096) 512.76ms CNR:OFF Dyn:1 FastMM:1 TID:0  NThr:6
506 ms ± 10.4 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

In [10]: %timeit dpnp.matmul(a_dp,a_dp)
MKL_VERBOSE SGEMM_64(N,N,4096,4096,4096,0x7fac2fffdcd8,0x7fac40000000,4096,0x7fac40000000,4096,0x7fac2fffdce0,0x7fabe4000000,4096) 499.49ms CNR:OFF Dyn:1 FastMM:1 TID:0  NThr:6
MKL_VERBOSE SGEMM_64(N,N,4096,4096,4096,0x7fac3a41acd8,0x7fac40000000,4096,0x7fac40000000,4096,0x7fac3a41ace0,0x7fabe4000000,4096) 501.18ms CNR:OFF Dyn:1 FastMM:1 TID:0  NThr:6
MKL_VERBOSE SGEMM_64(N,N,4096,4096,4096,0x7fac39c19cd8,0x7fac40000000,4096,0x7fac40000000,4096,0x7fac39c19ce0,0x7fabe4000000,4096) 508.51ms CNR:OFF Dyn:1 FastMM:1 TID:0  NThr:6
MKL_VERBOSE SGEMM_64(N,N,4096,4096,4096,0x7fac2fffdcd8,0x7fac40000000,4096,0x7fac40000000,4096,0x7fac2fffdce0,0x7fabe4000000,4096) 504.49ms CNR:OFF Dyn:1 FastMM:1 TID:0  NThr:6
MKL_VERBOSE SGEMM_64(N,N,4096,4096,4096,0x7fac3a41acd8,0x7fac40000000,4096,0x7fac40000000,4096,0x7fac3a41ace0,0x7fabe4000000,4096) 536.80ms CNR:OFF Dyn:1 FastMM:1 TID:0  NThr:6
MKL_VERBOSE SGEMM_64(N,N,4096,4096,4096,0x7fac39c19cd8,0x7fac40000000,4096,0x7fac40000000,4096,0x7fac39c19ce0,0x7fabe4000000,4096) 532.84ms CNR:OFF Dyn:1 FastMM:1 TID:0  NThr:6
MKL_VERBOSE SGEMM_64(N,N,4096,4096,4096,0x7fac2fffdcd8,0x7fac40000000,4096,0x7fac40000000,4096,0x7fac2fffdce0,0x7fabe4000000,4096) 508.24ms CNR:OFF Dyn:1 FastMM:1 TID:0  NThr:6
MKL_VERBOSE SGEMM_64(N,N,4096,4096,4096,0x7fac3a41acd8,0x7fac40000000,4096,0x7fac40000000,4096,0x7fac3a41ace0,0x7fabe4000000,4096) 519.75ms CNR:OFF Dyn:1 FastMM:1 TID:0  NThr:6
517 ms ± 13.1 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

@antonwolfy antonwolfy merged commit e44469c into master Mar 22, 2024
45 checks passed
@antonwolfy antonwolfy deleted the impl_matrix_power branch March 22, 2024 13:19
github-actions bot added a commit that referenced this pull request Mar 22, 2024
* Add an implementation of dpnp.linalg.matrix_power

* Update cupy tests for matrix_power

* Add dpnp tests for matrix_power

* Add no_bool in tests to avoid singular input matrices

* Address remarks

* Improve performance for _stacked_identity functions

* Add TestMatrixPowerBatched to cupy tests

* Update dpnp tests for matrix_power

* Efficient use of binary decomposition (see the sketch after this commit list)

---------

Co-authored-by: Anton <[email protected]> (commit e44469c)
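The "Efficient use of binary decomposition" commit refers to exponentiation by squaring, which needs only about log2(n) matrix products instead of n - 1. A hedged sketch of the idea, not the exact dpnp code:

import dpnp

def matrix_power_by_squaring(a, n):
    # Illustrative sketch: compute the n-th power (n >= 1) of a square matrix
    # with roughly log2(n) calls to dpnp.matmul.
    result = None
    z = a
    while n > 0:
        if n % 2 == 1:
            # fold the current square into the result when this bit of n is set
            result = z if result is None else dpnp.matmul(result, z)
        n //= 2
        if n > 0:
            z = dpnp.matmul(z, z)
    return result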