Releases: nv-legate/cunumeric
v24.06.01
This is a patch release, and includes the following fixes:
- Fix for nv-legate/legate#947
- Fix package dependencies (cuda and openblas)
x86 conda packages with multi-node support (based on UCX) are available at https://anaconda.org/legate/cunumeric.
Documentation for this release can be found at https://docs.nvidia.com/cunumeric/24.06/.
v24.06.00
This release ports cuNumeric to the C++-based Legate-Core. Additionally, it includes the following new features:
- np.linalg.qr, np.linalg.svd (single-GPU support only)
- "where" argument for unary operations
- np.select
- np.flipud, np.fliplr
- np.cov
- np.load (initial, unoptimized implementation)
- np.average
- np.logical_and/or.reduce
- np.digitize
- np.diff
- np.linalg.cholesky, np.linalg.solve (multi-GPU support, based on cuSolverMp; not included in conda packages, requires a manual build)
- C++-based ndarray class (experimental support)
x86 conda packages with multi-node support (based on UCX) are available at https://anaconda.org/legate/cunumeric.
Documentation for this release can be found at https://docs.nvidia.com/cunumeric/24.06/.
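A few of the APIs listed for this release can be illustrated with a short sketch. It is written against plain NumPy so that it runs anywhere; the assumption (not verified here) is that replacing the import with `import cunumeric as np` gives the same results on a Legate installation. The array values are made up for illustration.

```python
import numpy as np  # assumption: `import cunumeric as np` behaves the same for these calls

x = np.array([1.0, 4.0, 9.0, 16.0])

# "where" argument on a unary operation: only the masked-in elements are
# computed; the others keep whatever `out` already contained.
mask = np.array([True, False, True, False])
roots = np.sqrt(x, out=np.zeros_like(x), where=mask)

# np.select: per element, take the choice for the first condition that holds.
labels = np.select([x < 4, x < 10], [0, 1], default=2)

# np.diff: consecutive differences; np.digitize: index of the bin per value.
steps = np.diff(x)
bins = np.digitize(x, [0.0, 5.0, 10.0])

# np.flipud reverses axis 0, np.fliplr reverses axis 1.
m = np.arange(6).reshape(2, 3)
flipped_rows = np.flipud(m)
flipped_cols = np.fliplr(m)

# np.linalg.qr: q has orthonormal columns and q @ r reconstructs a.
a = np.array([[2.0, 1.0], [1.0, 3.0], [0.0, 1.0]])
q, r = np.linalg.qr(a)
```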
Known issues
Including the nvidia conda channel in an environment with cunumeric may end up pulling cutensor 2.0, even though the cunumeric packages explicitly request cutensor 1.7. This can cause error messages like this:
OSError: libcutensor.so.1: cannot open shared object file: No such file or directory
This is not an issue with cuNumeric, but with incorrect constraints on the cutensor packages on the nvidia channel. Please avoid including the nvidia conda channel in any conda environment including cunumeric.
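One way to check an environment for the cutensor mismatch described above is to inspect the package list. The sketch below is not part of cuNumeric: `cutensor_mismatches` and `installed_packages` are hypothetical helper names, and the sketch assumes that `conda list --json` is available and reports a `name` and `version` for each package.

```python
import json
import subprocess


def cutensor_mismatches(packages, expected_major=1):
    """Return the cutensor entries whose major version is not expected_major."""
    bad = []
    for pkg in packages:
        if pkg.get("name") == "cutensor":
            major = int(pkg.get("version", "0").split(".")[0])
            if major != expected_major:
                bad.append(pkg)
    return bad


def installed_packages():
    """Read the active environment's package list (needs conda on PATH)."""
    result = subprocess.run(
        ["conda", "list", "--json"],
        check=True, capture_output=True, text=True,
    )
    return json.loads(result.stdout)


# Illustrative data only; in a real environment pass installed_packages().
sample = [
    {"name": "cunumeric", "version": "24.06.01", "channel": "legate"},
    {"name": "cutensor", "version": "2.0.1.2", "channel": "nvidia"},
]
conflicts = cutensor_mismatches(sample)
```

A non-empty result suggests the environment has picked up a cutensor build other than the 1.x series that the cunumeric packages request.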
v23.11.00
This release contains performance improvements to the variance operation, and a multi-dimensional Cholesky implementation.
Conda packages for this release are available at https://anaconda.org/legate/cunumeric.
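As an illustration of the two operations named in this release's summary, the sketch below computes a variance reduction and factors a stack of matrices in one call. It uses plain NumPy so it runs anywhere; swapping the import for `import cunumeric as np` is an assumption, and the input matrices are made up.

```python
import numpy as np  # assumption: `import cunumeric as np` behaves the same here

data = np.array([[1.0, 2.0, 3.0],
                 [4.0, 5.0, 6.0]])

# Variance as a reduction: over everything, or along one axis.
total_var = np.var(data)
col_var = np.var(data, axis=0)

# Batched Cholesky: the leading dimension indexes independent
# symmetric positive-definite matrices, factored in a single call.
stack = np.array([[[4.0, 2.0], [2.0, 3.0]],
                  [[9.0, 3.0], [3.0, 5.0]]])
factors = np.linalg.cholesky(stack)

# Each lower-triangular factor L satisfies L @ L.T == original matrix.
rebuilt = factors @ np.swapaxes(factors, -1, -2)
```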
What's Changed
🚀 New Features
- Added variance as a unary reduction by @jjwilke in #593
- Add batched cholesky implementation and tests by @jjwilke in #1029
🐛 Bug Fixes
- Replacing set with OrderedSet to avoid control-replication violations by @ipdemes in #1054
- Inline boolean operators in NumPy are bitwise, not logical by @manopapad in #1057
- Fix #1065 ("where" fails with IndexError) by @manopapad in #1067
- Fixes #1069, #1070 (minor einsum bugs) by @manopapad in #1072
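The fix about inline boolean operators refers to a NumPy convention that is easy to overlook: `&`, `|`, `^` and `~` are bitwise, so on integer arrays they differ from the logical functions. The sketch below shows the NumPy behavior that the fix is described as matching; it runs on plain NumPy, and using `import cunumeric as np` instead is an assumption.

```python
import numpy as np  # assumption: `import cunumeric as np` behaves the same here

a = np.array([1, 2, 4])
b = np.array([2, 2, 3])

# Inline operator: bit-by-bit AND of the integer values.
bitwise = a & b

# Logical function: treats every non-zero value as True.
logical = np.logical_and(a, b)

# On boolean arrays ~ flips truth values; on integers it flips every bit.
inverted_bool = ~np.array([True, False])
inverted_int = ~np.array([0, 1])
```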
📖 Documentation
- Suggest using mamba over conda by @manopapad in #1068
Full Changelog: v23.09.00...v23.11.00
v23.09.00
This release adds support for the quantile API, and includes some performance and documentation improvements (notably a "Best Practices" guide).
Conda packages for this release are available at https://anaconda.org/legate/cunumeric.
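A minimal sketch of the quantile API, using NumPy's default linear interpolation between neighboring samples. It is written against plain NumPy; using `import cunumeric as np` instead is an assumption, as is the set of keyword arguments cuNumeric accepts beyond the ones shown.

```python
import numpy as np  # assumption: `import cunumeric as np` behaves the same here

samples = np.array([1.0, 2.0, 3.0, 4.0])

# A single quantile: the median falls halfway between 2 and 3.
median = np.quantile(samples, 0.5)

# Several quantiles at once; each is interpolated at position q * (n - 1).
quartiles = np.quantile(samples, [0.25, 0.75])

# Quantiles along one axis of a 2-D array.
grid = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
column_medians = np.quantile(grid, 0.5, axis=0)
```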
What's Changed
🚀 New Features
- Quantile Implementation by @aschaffer in #664
🛠️ Improvements
- Add missing openmp variants to BitGenerator and UniqueReduce by @rohany in #1010
- Histogram refactor by @aschaffer in #1003
📖 Documentation
🐛 Bug Fixes
- Missing alignment on histogram call by @manopapad in #999
- Fix for control replication violation in test by @ipdemes in #1005
- Fix build instructions link by @bryevdv in #1014
- Add back None as an accepted value for axis on some type sigs by @manopapad in #1017
- If a scalar ufunc arg is cn.ndarray use its type directly by @manopapad in #1011
- Skip the docstrings for functions pulled from cloned modules by @manopapad in #1024
- Fix random test failures in CPU-only runs by @manopapad in #1025
- Don't cast histogram to int64 when density=True by @manopapad in #1042
- Explicitly cast result of shift binary operators by @manopapad in #1046
- Remove use of deprecated np.find_common_type by @manopapad in #1045
New Contributors
- @ajschmidt8 made their first contribution in #1035
Full Changelog: v23.07.00...v23.09.00
v23.07.00
This release adds support for histogram, broadcast* and various nan* APIs. It also includes performance improvements to the FFT functions and cleanups in ufunc support.
Conda packages for this release are available at https://anaconda.org/legate/cunumeric.
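The three API families named in this release's summary can be sketched as follows. The example uses plain NumPy; substituting `import cunumeric as np` is an assumption, and `nansum` and `nanmax` are assumed to be among the nan* functions covered, which the notes do not enumerate.

```python
import numpy as np  # assumption: `import cunumeric as np` behaves the same here

# histogram: count values per bin; the last bin includes its right edge.
counts, edges = np.histogram(np.array([1, 2, 2, 3, 5]), bins=[0, 2, 4, 6])

# broadcast*: expand arrays to a common shape.
row = np.array([1, 2, 3])
tiled = np.broadcast_to(row, (2, 3))
left, right = np.broadcast_arrays(row, np.array([[10], [20]]))

# nan*: reductions that skip NaN entries instead of propagating them.
noisy = np.array([1.0, np.nan, 3.0])
total = np.nansum(noisy)
largest = np.nanmax(noisy)
```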
What's Changed
🚀 New Features
- Implement broadcast routines by @bryevdv in #759
- Sanitize unary reductions that have NaNs by @shriram-jagan in #925
- Histogram Functionality by @aschaffer in #983
🛠️ Improvements
- Add ufunc methods by @bryevdv in #834
- Support of the shape argument in empty_like() & Co. by @madsbk in #845
- Add support for Python 3.11 (#830) by @marcinz in #837
- Ensure ufunc/function dispatching is narrow by @seberg in #977
- Fft improvements by @mfoerste4 in #732
📖 Documentation
- Note new minimum CUDA requirements for conda packages by @manopapad in #875
🐛 Bug Fixes
- Fix bugs in concatenate and stack APIs. by @robinwnv in #844
- Fixes #858 by @manopapad in #859
- Fix concatenate and *stack APIs to support scalars(#818, #839) by @robinwnv in #866
- Avoid following compiler symlinks by @manopapad in #880
- Fix for some binary operators on float16 by @magnatelee in #889
- WAR for TBLIS compiler detection while upstream PR is pending by @manopapad in #890
- Also build CPU-only packages for haswell (#869) by @marcinz in #882
- Fix array API(#885). by @robinwnv in #910
- Fix unit tests by @magnatelee in #920
- Fix an incorrect type by @marcinz in #931
- Use correct type, to avoid int narrowing by @manopapad in #941
- Fix cunumeric.arange issues by @yimoj in #940
- Use the right type for scalar arguments by @magnatelee in #942
- Fall back to NumPy eagerly on RandomState methods by @manopapad in #959
- Fix bugs in random integer functions by @manopapad in #966
- Resolve numpy 1.25 issues by @bryevdv in #973
- Set lib_dir explicitly to lib/, even on RHEL by @manopapad in #971
- fixing putmask logic for scalar inputs by @ipdemes in #980
- fixing cuda error by @ipdemes in #978
- Change arg to LLONG_MIN to make it consistent with python. by @shriram-jagan in #986
- Missing alignment on histogram call by @manopapad in #1000
New Contributors
- @madsbk made their first contribution in #845
- @sandeepd-nv made their first contribution in #899
- @seberg made their first contribution in #977
- @shriram-jagan made their first contribution in #988
- @aschaffer made their first contribution in #983
Full Changelog: v23.03.00...v23.07.00
v23.03.00
This is the beta release of cuNumeric.
This release is focused on bug fixes, code clean-up and documentation updates, in preparation for entering beta status.
Conda packages for this release are available at https://anaconda.org/legate/cunumeric.
What's Changed
🐛 Bug Fixes
- Do reductions properly in tensor contraction tasks by @magnatelee in #803
- Seed the NumPy RNG at the start of every test by @manopapad in #792
- Fix handling of negative axis in np.repeat by @manopapad in #821
- Fix for #720 (by @lightsighter) by @manopapad in #721
- Ensure unary_func seeding is deterministic across processes by @manopapad in #825
🛠️ Improvements
- Update the architectures built in conda package by @marcinz in #770
- Use thrust::cuda::par_nosync if available by @magnatelee in #780
- Preemptively convert to np.ndarray on NumPy fallback by @manopapad in #802
- Removing all Legion references from the code by @magnatelee in #811
- Remove exception throwing from RNG code by @manopapad in #815
- Pin legate to a specific commit by @trxcllnt in #824
- Add support for Python 3.11 by @m3vaz in #830
📖 Documentation
Full Changelog: v23.01.00...v23.03.00
v23.01.00
This release introduces support for the put and putmask operations, adds an optimized implementation for the common case of advanced indexing using a single (possibly broadcasted) boolean array, includes more information in the tags of unary/binary operations on profiles (for easier cross-referencing with the source script), and adds some small improvements to OpenMP execution.
Conda packages for this release are available at https://anaconda.org/legate/cunumeric.
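The put and putmask operations and the boolean-array indexing case mentioned in the summary look like this in NumPy terms. The sketch runs on plain NumPy; using `import cunumeric as np` instead is an assumption.

```python
import numpy as np  # assumption: `import cunumeric as np` behaves the same here

a = np.arange(6)

# put: write values at flat indices, in place.
np.put(a, [0, 2], [10, 20])

# putmask: overwrite every element where the mask is True, in place.
np.putmask(a, a > 4, -1)

# Advanced indexing with a single boolean array, for reads and writes.
b = np.arange(6).reshape(2, 3)
is_even = b % 2 == 0
evens = b[is_even]
b[is_even] = -1
```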
What's Changed
🐛 Bug Fixes
- Make the code compile with bounds checks by @magnatelee in #648
- MatVec & MatVecMul use reduction stores, not outputs by @manopapad in #646
- Set default generator based on whether ninja is available by @jjwilke in #602
- Allow args to be passed by position and name in auto_convert by @manopapad in #640
- Force positive values for log and sqrt tests by @jjwilke in #580
- Eliminate empty kernel launch in cunumeric.unique by @magnatelee in #675
- Make install.py reconfigure editable installs when build type changes by @trxcllnt in #670
- Fix for #684 by @magnatelee in #686
- Follow up on PR #671 by @ipdemes in #677
- More argument checks for bincount by @magnatelee in #711
- Fix a typo in unique.cu indexing by @manopapad in #713
- guard all2all from empty transfer by @mfoerste4 in #727
- src/cunumeric/item: add openmp variants for write/read tasks by @rohany in #740
- Fix CI failures due to numpy 1.24 upgrade by @manopapad in #745
- Fix timing for CuPy tests by @manopapad in #747
- Don't turn on cuNumeric debug checks on debug-rel builds by @manopapad in #753
- Move pip uninstall step before CMake is run instead of after. by @trxcllnt in #760
- Force conda version of cutensor by @marcinz in #765
- handle numpy 'builtins' properly for coverage by @bryevdv in #766
🚀 New Features
🛠️ Improvements
- Move test driver code to legate.core by @bryevdv in #627
- Remove --install-dir option by @bryevdv in #656
- Updates for new script-based conda env generation by @manopapad in #651
- Log operator names of unary and binary operations using annotations by @magnatelee in #679
- Regenerate install_info.py on every build by @trxcllnt in #705
- Fixes for buffer allocations by @magnatelee in #706
- Clean up the basic build instructions by @manopapad in #741
- Refactor benchmarks by @manopapad in #567
- Improving performance for some special cases of advanced indexing by @ipdemes in #731
- Pass CMAKE_GENERATOR to scikit-build by @trxcllnt in #750
- Change the default CPU architecture to haswell by @marcinz in #762
Full Changelog: v22.10.00...v23.01.00
v22.10.00
The biggest change in Release 22.10 is a new build infrastructure using CMake and scikit-build. The new build system brings several benefits including robust build dependency tracking and compliance with Python site-packages. This release includes several new search and indexing operators, fixes for several performance and correctness bugs, and provenance tracking for top-level and ndarray routines in execution profiles.
Conda packages for this release are available at https://anaconda.org/legate/cunumeric.
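The search and indexing operators added in this release (argwhere, flatnonzero, extract, place, fill_diagonal) and linalg.solve can be sketched as below. The example uses plain NumPy with made-up inputs; swapping in `import cunumeric as np` is an assumption.

```python
import numpy as np  # assumption: `import cunumeric as np` behaves the same here

x = np.array([[0, 3],
              [5, 0]])

coords = np.argwhere(x)        # one row of indices per non-zero element
flat = np.flatnonzero(x)       # positions of non-zeros in the flattened array
picked = np.extract(x > 0, x)  # values where the condition holds

# place: overwrite the masked elements in place.
np.place(x, x == 0, [-1])

# fill_diagonal: set the main diagonal in place.
m = np.zeros((3, 3))
np.fill_diagonal(m, 7.0)

# linalg.solve: solve A @ sol == rhs for sol.
A = np.array([[3.0, 1.0], [1.0, 2.0]])
rhs = np.array([9.0, 8.0])
sol = np.linalg.solve(A, rhs)
```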
What's Changed
🚀 New Features
- Argwhere and flatnonzero by @mfoerste4 in #525
- added extract and place via advanced indexing by @mfoerste4 in #536
- Fill diagonal by @ipdemes in #473
- Single processor implementation for linalg.solve by @magnatelee in #568
🛠️ Improvements
- adding support for array shape () passed as an index argument in advanced indexing by @ipdemes in #486
- Refactor test driver for cpu/gpu sharding by @bryevdv in #451
- Collate test output to allow workers > 1 with verbose output by @bryevdv in #507
- Ensure test.py --use flag fully overrides USE_* envvars by @manopapad in #524
- Enhance two integration tests by @robinw0928 in #511
- Add typing to array.py by @bryevdv in #478
- Update test runner for osx by @bryevdv in #529
- Don't blindly trust user-supplied bincount.minlength by @manopapad in #523
- Make reduced-precision cuBLAS mode opt-in by @manopapad in #519
- Fix reciprocal tests for zero values and improve test value customization (#467) by @marcinz in #537
- Refactor test runner to support more pinning options by @bryevdv in #535
- Remove dead code in bincount by @magnatelee in #546
- Make the validation condition for random distributions lenient by @magnatelee in #550
- src/cunumeric: handle high number of bins in GPU bincount by @rohany in #526
- Construct NumPy arrays correctly from 0D deferred arrays backed by region fields by @magnatelee in #551
- Collect test failure details at the end by @bryevdv in #556
- Simplify some thunk conversion helpers by @manopapad in #553
- Fix a compiler warning by @magnatelee in #555
- Add option to disable CPU pinning in tests by @bryevdv in #558
- Use the new mapper registration to enable detailed mapper logging by @magnatelee in #570
- src/cunumeric/search: make nonzero not always allocate SYS_MEM buffers by @rohany in #572
- add negative test case in test_array_split.py by @xialu00 in #545
- add some test cases for test_arg_reduce.py by @xialu00 in #575
- Testcase-add test cases for test_flip and test_indices by @xialu00 in #579
- Refactor scalar reductions to use common execution policy by @jjwilke in #573
- Sanitize k for the eye operator by @magnatelee in #586
- Add CMake build for C++ and scikit-build infrastructure for Python package installation by @jjwilke in #514
- Enhance test_block.py and test_eye.py by @robinw0928 in #578
- Testcase add test cases for test_fill.py and test_ndim.py by @xialu00 in #588
- Remove run dependency on curand by @marcinz in #520
- Use Legion Fills when possible by @manopapad in #604
- Support building with GASNet-Ex and MPI backends by @manopapad in #610
- Provenance tracking for cuNumeric operators by @magnatelee in #596
- Fix tests utils to make --directory work correctly. by @robinw0928 in #592
- Fix a compiler warning by @magnatelee in #594
- Enhance test_diag_indices.py and test_flatten.py. by @robinw0928 in #609
- cuNumeric doesn't need nested provenance tracking by @magnatelee in #617
- Add RuntimeError exception to legate.time by @robinw0928 in #618
- Stop instantiating min and max reduction ops for complex types by @magnatelee in #621
- Mark temporary conversion outputs as linear for eager storage recycling by @magnatelee in #608
- Make the negative test on fill robust across Python versions by @magnatelee in #619
- Enhance mask_indices and move_axis by @robinw0928 in #622
- src/cunumeric/matrix: stop including coll.h in solve_template.inl by @rohany in #620
🐛 Bug Fixes
- Fix performance bugs in scalar reductions by @magnatelee in #509
- Don't use internal LAPACK function names by @manopapad in #522
- Bug fixes for advanced indexing by @magnatelee in #532
- Handle the case where LAPACK_*potrf is a macro, not a function by @manopapad in #527
- fix mypy issue w/ np methods by @bryevdv in #542
- Fix buggy complex-to-bool conversions and add correctness tests for astype by @magnatelee in #549
- fixing advanced indexing operation for empty arrays by @ipdemes in #504
- Do not link curand by @marcinz in #541
- Fixing issues with advanced_indexing_kernel by @ipdemes in #557
- fixing another corner case for advanced indexing by @ipdemes in #554
- Fix OSX test shard generation by @bryevdv in #563
- fix error print in test_unary_ufunc by @jjwilke in #566
- Add NAN handling to convert() needed for some prefix routines with integer outputs. by @rkarim2 in #502
- Fixing logic for slicing by @ipdemes in #574
- Fix linalg.solve when inputs are scalars by @magnatelee in #585
- Allow casting in cn.dot, to match numpy's behavior by @manopapad in #598
- Add linalg.solve to the cmake build by @magnatelee in #603
- Invoke eye with read-write privilege, not write-discard by @manopapad in #616
- Fix a bug in scalar reduction launching kernels with empty domains by @magnatelee in #606
📖 Documentation
v22.08.00
Release 22.08.00 features a variety of random distribution implementations (backed by cuRAND), distributed prefix scan operators, and a complete implementation of sorting for multi-node multi-CPU execution. This release also includes several quality-of-life changes and bug fixes, including type annotations for all but one Python module, improvements to the parallel test driver, fixes for several operators when inputs are empty, and proper handling of ndarrays passed as array sizes or indices.
Conda packages for this release are available at https://anaconda.org/legate/cunumeric.
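Several of the operators in this release (prefix scans, sorting, searchsorted, packbits/unpackbits, atleast_2d) can be illustrated with a short sketch. It is written against plain NumPy; replacing the import with `import cunumeric as np` is an assumption. The random distributions are left out because their exact cuNumeric entry points are not spelled out in these notes.

```python
import numpy as np  # assumption: `import cunumeric as np` behaves the same here

values = np.array([3, 1, 4, 1, 5])

# Prefix scan: running sum over the array.
running = np.cumsum(values)

# Sort, then find where a new value would be inserted to keep the order.
ordered = np.sort(values)
slot = np.searchsorted(ordered, 4)

# packbits / unpackbits: eight 0/1 flags <-> one uint8 (most significant bit first).
flags = np.array([1, 0, 1, 0, 0, 0, 0, 1], dtype=np.uint8)
packed = np.packbits(flags)
unpacked = np.unpackbits(packed)

# atleast_2d: promote a 1-D array to a single-row matrix.
as_matrix = np.atleast_2d(values)
```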
New Features
- Adding support for ND output regions in Advanced Indexing task by @ipdemes in #370
- added support for 'searchsorted' by @mfoerste4 in #414
- np.packbits and np.unpackbits by @magnatelee in #427
- Implementation of atleast_{1,2,3}d by @sbak5 in #404
- Implementing cunumeric.random.BitGenerator by @fduguet-nv in #254
- Adding support for some simple _indices routines by @ipdemes in #417
- adding mask_indices routine by @ipdemes in #426
- Random advanced distributions by @fduguet-nv in #470
- Distributed nd sort for cpu/omp by @mfoerste4 in #437
- Initial implementation of scan routines. by @rkarim2 in #425
- Adding support for take_along_axis and put_along_axis by @ipdemes in #436
- cunumeric.ndim by @magnatelee in #495
- Add support for curand conda package build (cherry pick #510) by @marcinz in #512
Improvements
- Don't run the resolution logic if the arrays have the same dtype by @magnatelee in #389
- Set cuda virtual package as hard run requirement for gpu conda package by @m3vaz in #398
- First pass mypy typing by @bryevdv in #387
- Generalize Dict to Mapping for newer versions of mypy by @jjwilke in #405
- Add support for using cupy in sort.py by @robinw0928 in #395
- Refactor test.py by @bryevdv in #378
- Use Numpy axis normalizations where possible by @bryevdv in #419
- More mypy by @bryevdv in #413
- adding bounds check for advanced indexing by @ipdemes in #397
- Report Elapsed Time in cholesky's output by @SeyedMir in #423
- Support -vv for more verbose test output by @bryevdv in #432
- Add typing to runtime.py by @bryevdv in #428
- Update compress/take tests for pytest by @bryevdv in #435
- Project down to a 1D store for the scalar reduction output by @magnatelee in #455
- Fallback to self = np.ndarray when necessary by @bryevdv in #431
- Add types to thunk modules by @bryevdv in #438
- allclose detail + misc tests improvements by @bryevdv in #457
- cunumeric.random - Adding Module-scoped functions by @fduguet-nv in #481
- Activate the NumPy fallback for cunumeric.random in CPU build by @magnatelee in #485
- Legacy generators for cpu build by @magnatelee in #487
- Allow CPU build to optionally use cuRAND by @magnatelee in #498
- Sanitize shapes in ndarray's constructor by @magnatelee in #496
- src/cunumeric/sort: stop using std::{inclusive, exclusive}_scan by @rohany in #499
- Update conda requirements by @manopapad in #383
- Handle dtype/casting/out properly in contractions by @manopapad in #402
- Missing / overzealous check_eager_args calls by @manopapad in #465
- Strengthen some types by @manopapad in #468
Bug Fixes
- Add missing includes to aid intellisense providers by @trxcllnt in #382
- Proper exception handling for cholesky by @magnatelee in #391
- Fixes for building with setup.py outside conda, primarily Mac by @jjwilke in #394
- Use the right API to check if the store is unbound by @magnatelee in #399
- Fix nargs for report:dump-csv by @bryevdv in #400
- Handle empty outputs correctly in advanced indexing task by @magnatelee in #396
- Fall back to NumPy in array_function and array_ufunc by @magnatelee in #424
- Fix for legate data interface by @magnatelee in #429
- Fix test_floating.py test to call sys.exit by @marcinz in #433
- Make missing pynvml an error for GPU tests by @bryevdv in #441
- Make the NumPy fallback work correctly in randint by @magnatelee in #450
- Squeeze fix by @magnatelee in #448
- Correctly prune out empty tasks in binary reduction by @magnatelee in #453
- Minor fix for indexing routines by @magnatelee in #452
- Make DeferredArray.reshape always return a deferred array by @magnatelee in #454
- Re-freezing conda compiler versions (#415) by @m3vaz in #449
- Fix for floating point predicates by @magnatelee in #466
- markdown version fix by @ipdemes in #459
- Fixup typing regressions by @bryevdv in #471
- Remove ill-defined advanced indexing test case by @magnatelee in #484
- Handle empty inputs correctly in local scan tasks by @magnatelee in #491
- Handle an unknown in a tuple correctly in reshape by @magnatelee in #490
- fix mismatched size_t/uint64_t types by @jjwilke in #475
- Allow scalar cunumeric ndarrays as array indices by @manopapad in #479
Documentation
- adding new version for documentations by @ipdemes in #447
- Updates to api_compare.py by @bryevdv in #456
- Be stricter applying CuWrapperMetadata by @bryevdv in #463
- Add custom nitpicky ref checks for cunumeric APIs by @bryevdv in #462
- Docs coverage check by @bryevdv in #469
- Fix the API reference for random functions and scan operators by @magnatelee in #497
New Contributors
- @jjwilke made their first contribution in #394
- @SeyedMir made their first contribution in #423
- @fduguet-nv made their first contribution in #254
- @rkarim2 made their first contribution in #425
- @rohany made their first contribution in #499
Full Changelog: v22.05.02...v22.08.00
v22.05.02
This hotfix release fixes issues in conda recipes.
What's Changed
- Cherry pick: Update conda requirements (#383) by @marcinz in #406
- Cherry pick: Set cuda virtual package as hard run requirement for conda gpu package (#398) by @marcinz in #407
- Cherry pick: Fix nargs for report:dump-csv (#400) by @marcinz in #408
- Re-freezing conda compiler versions by @m3vaz in #415
Full Changelog: v22.05.01...v22.05.02