Version 0.29.0
--------------
This release extends support for recursive functions to include direct and
indirect recursion without explicit function type annotations. See the new
example in `examples/mergesort.py`. Newly supported numpy features include
array stacking functions, the np.linalg.eig* functions, np.linalg.matrix_power,
np.roots and array-to-array broadcasting in assignments.
This release depends on llvmlite 0.14.0 and supports CUDA 8, although CUDA 8
is not required.
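As an illustration (not part of the original notes; ``njit`` and the function
name are just examples), a directly recursive function can now be compiled
without spelling out its signature::

    from numba import njit

    @njit
    def fib(n):
        # direct recursion; the return type is inferred, no explicit signature
        if n < 2:
            return n
        return fib(n - 1) + fib(n - 2)

    print(fib(10))  # 55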
Improvements:
* PR #2130, #2137: Add type-inferred recursion with docs and examples.
* PR #2134: Add ``np.linalg.matrix_power``.
* PR #2125: Add ``np.roots``.
* PR #2129: Add ``np.linalg.{eigvals,eigh,eigvalsh}``.
* PR #2126: Add array-to-array broadcasting.
* PR #2069: Add hstack and related functions.
* PR #2128: Allow for vectorizing a jitted function. (thanks to @dhirschfeld)
* PR #2117: Update examples and make them test-able.
* PR #2127: Refactor interpreter class and its results.
Fixes:
* PR #2149: Workaround MSVC9.0 SP1 fmod bug kb982107.
* PR #2145, Issue #2009: Fixes kwargs for jitclass ``__init__`` method.
* PR #2150: Fix slowdown in objmode fallback.
* PR #2050, Issue #1259: Fix liveness problem with some generator loops.
* PR #2072, Issue #1995: Right shift of unsigned LHS should be logical.
* PR #2115, Issue #1466: Fix inspect_types() error due to mangled variable name.
* PR #2119, Issue #2118: Fix array type created from record-dtype.
* PR #2122, Issue #1808: Fix returning a generator due to datamodel error.
Version 0.28.1
--------------
This is a bug-fix release to resolve packaging issues with setuptools
dependency.
Version 0.28.0
--------------
Amongst other improvements, this version again raises the level of support
for linear algebra -- functions from the :mod:`numpy.linalg` module. Also, our
random generator is now guaranteed to be thread-safe and fork-safe.
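For example (a minimal, illustrative sketch; ``residual_norm`` is a made-up
name), the covered ``numpy.linalg`` routines can be called directly from
nopython code::

    import numpy as np
    from numba import njit

    @njit
    def residual_norm(a, b):
        # both np.linalg.solve and np.linalg.norm work in nopython mode
        x = np.linalg.solve(a, b)
        return np.linalg.norm(np.dot(a, x) - b)

    a = np.random.rand(3, 3)
    b = np.random.rand(3)
    print(residual_norm(a, b))  # ~0.0 up to rounding error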
Improvements:
* PR #2019: Add the ``@intrinsic`` decorator to define low-level
subroutines callable from JIT functions (this is considered
a private API for now).
* PR #2059: Implement ``np.concatenate`` and ``np.stack``.
* PR #2048: Make random generation fork-safe and thread-safe, producing
independent streams of random numbers for each thread or process.
* PR #2031: Add documentation of floating-point pitfalls.
* Issue #2053: Avoid polling in parallel CPU target (fixes severe performance
regression on Windows).
* Issue #2029: Make default arguments fast.
* PR #2052: Add logging to the CUDA driver.
* PR #2049: Implement the built-in ``divmod()`` function.
* PR #2036: Implement the ``argsort()`` method on arrays.
* PR #2046: Improve CUDA memory management by deferring deallocations
until certain thresholds are reached, so as to avoid breaking asynchronous
execution.
* PR #2040: Switch the CUDA driver implementation to use CUDA's
"primary context" API.
* PR #2017: Allow ``min(tuple)`` and ``max(tuple)``.
* PR #2039: Reduce fork() detection overhead in CUDA.
* PR #2021: Handle structured dtypes with titles.
* PR #1996: Rewrite looplifting as a transformation on Numba IR.
* PR #2014: Implement ``np.linalg.matrix_rank``.
* PR #2012: Implement ``np.linalg.cond``.
* PR #1985: Rewrite even trivial array expressions, which opens the door
for other optimizations (for example, ``array ** 2`` can be converted
into ``array * array``).
* PR #1950: Have ``typeof()`` always raise ValueError on failure.
Previously, it would either raise or return None, depending on the input.
* PR #1994: Implement ``np.linalg.norm``.
* PR #1987: Implement ``np.linalg.det`` and ``np.linalg.slogdet``.
* Issue #1979: Document integer width inference and how to workaround.
* PR #1938: Numba is now compatible with LLVM 3.8.
* PR #1967: Restrict ``np.linalg`` functions to homogeneous dtypes. Users
wanting to pass mixed-typed inputs have to convert explicitly, which
makes the performance implications more obvious.
Fixes:
* PR #2006: ``array(float32) ** int`` should return ``array(float32)``.
* PR #2044: Allow reshaping empty arrays.
* Issue #2051: Fix refcounting issue when concatenating tuples.
* Issue #2000: Make Numpy optional for setup.py, to allow ``pip install``
to work without Numpy pre-installed.
* PR #1989: Fix assertion in ``Dispatcher.disable_compile()``.
* Issue #2028: Ignore filesystem errors when caching from multiple processes.
* Issue #2003: Allow unicode variable and function names (on Python 3).
* Issue #1998: Fix deadlock in parallel ufuncs that reacquire the GIL.
* PR #1997: Fix random crashes when AOT compiling on certain Windows platforms.
* Issue #1988: Propagate jitclass docstring.
* Issue #1933: Ensure array constants are emitted with the right alignment.
Version 0.27.0
--------------
Improvements:
* Issue #1976: Improve the error message when non-integral dimensions are
given to a CUDA kernel.
* PR #1970: Optimize the power operator with a static exponent.
* PR #1710: Improve contextual information for compiler errors.
* PR #1961: Support printing constant strings.
* PR #1959: Support more types in the print() function.
* PR #1823: Support ``compute_50`` in CUDA backend.
* PR #1955: Support ``np.linalg.pinv``.
* PR #1896: Improve the ``SmartArray`` API.
* PR #1947: Support ``np.linalg.solve``.
* Issue #1943: Improve error message when an argument fails typing.
* PR #1927: Support ``np.linalg.lstsq``.
* PR #1934: Use system functions for hypot() where possible, instead of our
own implementation.
* PR #1929: Add cffi support to ``@cfunc`` objects.
* PR #1932: Add user-controllable thread pool limits for parallel CPU target.
* PR #1928: Support self-recursion when the signature is explicit.
* PR #1890: List all lowering implementations in the developer docs.
* Issue #1884: Support ``np.lib.stride_tricks.as_strided()``.
Fixes:
* Issue #1960: Fix sliced assignment when source and destination areas are
overlapping.
* PR #1963: Make CUDA print() atomic.
* PR #1956: Allow 0d array constants.
* Issue #1945: Allow using Numpy ufuncs in AOT compiled code.
* Issue #1916: Fix documentation example for ``@generated_jit``.
* Issue #1926: Fix regression when caching functions in an IPython session.
* Issue #1923: Allow non-intp integer arguments to carray() and farray().
* Issue #1908: Accept non-ASCII unicode docstrings on Python 2.
* Issue #1874: Allow ``del container[key]`` in object mode.
* Issue #1913: Fix set insertion bug when the lookup chain contains deleted
entries.
* Issue #1911: Allow function annotations on jitclass methods.
Version 0.26.0
--------------
This release adds the ``@cfunc`` decorator for exporting Numba-jitted
functions to third-party APIs that take C callbacks. Most of the overhead of
using jitclasses inside the interpreter is eliminated. Support for
decompositions in ``numpy.linalg`` is added. Finally, Numpy 1.11 is
supported.
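A minimal sketch of the ``@cfunc`` decorator (illustrative only; ``add`` is a
made-up example)::

    from numba import cfunc, types

    @cfunc(types.float64(types.float64, types.float64))
    def add(x, y):
        return x + y

    # the compiled code is exposed as a ctypes function pointer, suitable
    # for passing to third-party C APIs that expect a callback
    print(add.ctypes(1.0, 2.5))  # 3.5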
Improvements:
* PR #1889: Export BLAS and LAPACK wrappers for pycc.
* PR #1888: Faster array power.
* Issue #1867: Allow "out" keyword arg for dufuncs.
* PR #1871: ``carray()`` and ``farray()`` for creating arrays from pointers.
* PR #1855: ``@cfunc`` decorator for exporting as ctypes function.
* PR #1862: Add support for ``numpy.linalg.qr``.
* PR #1851: jitclass support for '_' and '__' prefixed attributes.
* PR #1842: Optimize jitclass in Python interpreter.
* Issue #1837: Fix CUDA simulator issues with device function.
* PR #1839: Add support for decompositions from ``numpy.linalg``.
* PR #1829: Support Python enums.
* PR #1828: Add support for ``numpy.random.rand()`` and
``numpy.random.randn()``.
* Issue #1825: Use of a 0-d array in place of a scalar index.
* Issue #1824: Scalar arguments to object mode gufuncs.
* Issue #1813: Let bitwise bool operators return booleans, not integers.
* Issue #1760: Optional arguments in generators.
* PR #1780: Numpy 1.11 support.
Version 0.25.0
--------------
This release adds support for ``set`` objects in nopython mode. It also
adds support for many missing Numpy features and functions. It improves
Numba's compatibility and performance when using a distributed execution
framework such as dask, distributed or Spark. Finally, it removes
compatibility with Python 2.6, Python 3.3 and Numpy 1.6.
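As a small illustrative sketch (``count_unique`` is a made-up name),
homogeneous sets can now be built and queried inside nopython functions::

    import numpy as np
    from numba import njit

    @njit
    def count_unique(values):
        seen = set()  # homogeneous set, supported in nopython mode
        for v in values:
            seen.add(v)
        return len(seen)

    print(count_unique(np.array([1, 2, 2, 3, 3, 3])))  # 3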
Improvements:
* Issue #1800: Add erf(), erfc(), gamma() and lgamma() to CUDA targets.
* PR #1793: Implement more Numpy functions: np.bincount(), np.diff(),
np.digitize(), np.histogram(), np.searchsorted() as well as NaN-aware
reduction functions (np.nansum(), np.nanmedian(), etc.)
* PR #1789: Optimize some reduction functions such as np.sum(), np.prod(),
np.median(), etc.
* PR #1752: Make CUDA features work in dask, distributed and Spark.
* PR #1787: Support np.nditer() for fast multi-array indexing with
broadcasting.
* PR #1799: Report JIT-compiled functions as regular Python functions
when profiling (allowing one to see the filename and line number where a
function is defined).
* PR #1782: Support np.any() and np.all().
* Issue #1788: Support the iter() and next() built-in functions.
* PR #1778: Support array.astype().
* Issue #1775: Allow the user to set the target CPU model for AOT compilation.
* PR #1758: Support creating random arrays using the ``size`` parameter
to the np.random APIs.
* PR #1757: Support len() on array.flat objects.
* PR #1749: Remove Numpy 1.6 compatibility.
* PR #1748: Remove Python 2.6 and 3.3 compatibility.
* PR #1735: Support the ``not in`` operator as well as operator.contains().
* PR #1724: Support homogeneous sets in nopython mode.
* Issue #875: Make compilation of array constants faster.
Fixes:
* PR #1795: Fix a massive performance issue when calling Numba functions
with distributed, Spark or a similar mechanism using serialization.
* Issue #1784: Make jitclasses usable with NUMBA_DISABLE_JIT=1.
* Issue #1786: Allow using linear algebra functions when profiling.
* Issue #1796: Fix np.dot() memory leak on non-contiguous inputs.
* PR #1792: Fix static negative indexing of tuples.
* Issue #1771: Use fallback cache directory when __pycache__ isn't writable,
such as when user code is installed in a system location.
* Issue #1223: Use Numpy error model in array expressions (e.g. division
by zero returns ``inf`` or ``nan`` instead of raising an error).
* Issue #1640: Fix np.random.binomial() for large n values.
* Issue #1643: Improve error reporting when passing an invalid spec to
``@jitclass``.
* PR #1756: Fix slicing with a negative step and an omitted start.
Version 0.24.0
--------------
This release introduces several major changes, including the ``@generated_jit``
decorator for flexible specializations (as with Julia's ``@generated`` macro)
and the SmartArray array wrapper type, which allows seamless transfer of array
data between the CPU and the GPU.
This will be the last version to support Python 2.6, Python 3.3 and Numpy 1.6.
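A minimal sketch of the ``@generated_jit`` pattern (illustrative; the
decorated function is called with the Numba *types* of its arguments and
returns the implementation to compile for them)::

    import numpy as np
    from numba import generated_jit, types

    @generated_jit(nopython=True)
    def is_missing(x):
        # dispatch on the argument type at compile time
        if isinstance(x, types.Float):
            return lambda x: np.isnan(x)
        return lambda x: False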
Improvements:
* PR #1723: Improve compatibility of JIT functions with the Python profiler.
* PR #1509: Support array.ravel() and array.flatten().
* PR #1676: Add SmartArray type to support transparent data management in
multiple address spaces (host & GPU).
* PR #1689: Reduce startup overhead of importing Numba.
* PR #1705: Support registration of CFFI types as corresponding to known
Numba types.
* PR #1686: Document the extension API.
* PR #1698: Improve warnings raised during type inference.
* PR #1697: Support np.dot() and friends on non-contiguous arrays.
* PR #1692: cffi.from_buffer() improvements (allow more pointer types,
allow non-Numpy buffer objects).
* PR #1648: Add the ``@generated_jit`` decorator.
* PR #1651: Implementation of np.linalg.inv using LAPACK. Thanks to
Matthieu Dartiailh.
* PR #1674: Support np.diag().
* PR #1673: Improve error message when looking up an attribute on an
unknown global.
* Issue #1569: Implement runtime check for the LLVM locale bug.
* PR #1612: Switch to LLVM 3.7 in sync with llvmlite.
* PR #1624: Allow slice assignment of sequence to array.
* PR #1622: Support slicing tuples with a constant slice.
Fixes:
* Issue #1722: Fix returning an optional boolean (bool or None).
* Issue #1734: NRT decref bug when variable is del'ed before being defined,
leading to a possible memory leak.
* PR #1732: Fix tuple getitem regression for CUDA target.
* PR #1718: Mishandling of optional to optional casting.
* PR #1714: Fix .compile() on a JIT function not respecting ._can_compile.
* Issue #1667: Fix np.angle() on arrays.
* Issue #1690: Fix slicing with an omitted stop and a negative step value.
* PR #1693: Fix gufunc bug in handling scalar formal arg with non-scalar
input value.
* PR #1683: Fix parallel testing under Windows.
* Issue #1616: Use system-provided versions of C99 math where possible.
* Issue #1652: Reductions of bool arrays (e.g. sum() or mean()) should
return integers or floats, not bools.
* Issue #1664: Fix regression when indexing a record array with a constant
index.
* PR #1661: Disable AVX on old Linux kernels.
* Issue #1636: Allow raising an exception looked up on a module.
Version 0.23.1
--------------
This is a bug-fix release to address several regressions introduced
in the 0.23.0 release, and a couple other issues.
Fixes:
* Issue #1645: CUDA ufuncs were broken in 0.23.0.
* Issue #1638: Check tuple sizes when passing a list of tuples.
* Issue #1630: Parallel ufunc would keep eating CPU even after finishing
under Windows.
* Issue #1628: Fix ctypes and cffi tests under Windows with Python 3.5.
* Issue #1627: Fix xrange() support.
* PR #1611: Rewrite variable liveness analysis.
* Issue #1610: Allow nested calls between explicitly-typed ufuncs.
* Issue #1593: Fix `*args` in object mode.
Version 0.23.0
--------------
This release introduces JIT classes using the new ``@jitclass`` decorator,
allowing user-defined structures for nopython mode. Other improvements
and bug fixes are listed below.
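A minimal sketch of a JIT class (illustrative; at the time ``jitclass`` was
importable directly from the ``numba`` namespace)::

    from numba import jitclass, float64

    spec = [('total', float64), ('count', float64)]

    @jitclass(spec)
    class RunningMean(object):
        def __init__(self):
            self.total = 0.0
            self.count = 0.0

        def add(self, value):
            self.total += value
            self.count += 1.0
            return self.total / self.count

    m = RunningMean()
    print(m.add(1.0), m.add(3.0))  # 1.0 2.0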
Improvements:
* PR #1609: Speed up some simple math functions by inlining them
in their caller
* PR #1571: Implement JIT classes
* PR #1584: Improve typing of array indexing
* PR #1583: Allow printing booleans
* PR #1542: Allow negative values in np.reshape()
* PR #1560: Support vector and matrix dot product, including ``np.dot()``
and the ``@`` operator in Python 3.5
* PR #1546: Support field lookup on record arrays and scalars (i.e.
``array['field']`` in addition to ``array.field``)
* PR #1440: Support the HSA wavebarrier() and activelanepermute_wavewidth()
intrinsics
* PR #1540: Support np.angle()
* PR #1543: Implement CPU multithreaded gufuncs (target="parallel")
* PR #1551: Allow scalar arguments in np.where(), np.empty_like().
* PR #1516: Add some more examples from NumbaPro
* PR #1517: Support np.sinc()
Fixes:
* Issue #1603: Fix calling a non-cached function from a cached function
* Issue #1594: Ensure a list is homogeneous when unboxing
* Issue #1595: Replace deprecated use of get_pointer_to_function()
* Issue #1586: Allow tests to be run by different users on the same machine
* Issue #1587: Make CudaAPIError picklable
* Issue #1568: Fix using Numba from inside Visual Studio 2015
* Issue #1559: Fix serializing a jit function referring to a renamed module
* PR #1508: Let reshape() accept integer argument(s), not just a tuple
* Issue #1545: Improve error checking when unboxing list objects
* Issue #1538: Fix array broadcasting in CUDA gufuncs
* Issue #1526: Fix a reference count handling bug
Version 0.22.1
--------------
This is a bug-fix release to resolve some packaging issues and other
problems found in the 0.22.0 release.
Fixes:
* PR #1515: List MANIFEST.in in MANIFEST.in itself so that sdist still works
from source tar files.
* PR #1518: Fix reference counting bug caused by hidden alias
* PR #1519: Fix erroneous assert when passing nopython=True to guvectorize.
* PR #1521: Fix cuda.test()
Version 0.22.0
--------------
This release features several highlights: Python 3.5 support, Numpy 1.10
support, Ahead-of-Time compilation of extension modules, additional
vectorization features that were previously only available with the
proprietary extension NumbaPro, and improvements in array indexing.
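An illustrative sketch of the new Ahead-of-Time workflow, assuming the
``numba.pycc.CC`` entry point; the module and function names are made up::

    from numba.pycc import CC

    cc = CC('compiled_helpers')       # name of the generated extension module

    @cc.export('square', 'f8(f8)')    # exported symbol name and signature
    def square(x):
        return x * x

    if __name__ == '__main__':
        cc.compile()                  # builds the extension module

Importing ``compiled_helpers`` afterwards gives access to ``square`` without
needing Numba at run time.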
Improvements:
* PR #1497: Allow scalar input type instead of size-1 array to @guvectorize
* PR #1480: Add distutils support for AOT compilation
* PR #1460: Create a new API for Ahead-of-Time (AOT) compilation
* PR #1451: Allow passing Python lists to JIT-compiled functions, and
reflect mutations on function return
* PR #1387: Numpy 1.10 support
* PR #1464: Support cffi.FFI.from_buffer()
* PR #1437: Propagate errors raised from Numba-compiled ufuncs; also,
let "division by zero" and other math errors produce a warning instead
of exiting the function early
* PR #1445: Support a subset of fancy indexing
* PR #1454: Support "out-of-line" CFFI modules
* PR #1442: Improve array indexing to support more kinds of basic slicing
* PR #1409: Support explicit CUDA memory fences
* PR #1435: Add support for vectorize() and guvectorize() with HSA
* PR #1432: Implement numpy.nonzero() and numpy.where()
* PR #1416: Add support for vectorize() and guvectorize() with CUDA,
as originally provided in NumbaPro
* PR #1424: Support in-place array operators
* PR #1414: Python 3.5 support
* PR #1404: Add the parallel ufunc functionality originally provided in
NumbaPro
* PR #1393: Implement sorting on arrays and lists
* PR #1415: Add functions to estimate the occupancy of a CUDA kernel
* PR #1360: The JIT cache now stores the compiled object code, yielding
even larger speedups.
* PR #1402: Fixes for the ARMv7 (armv7l) architecture under Linux
* PR #1400: Add the cuda.reduce() decorator originally provided in NumbaPro
Fixes:
* PR #1483: Allow np.empty_like() and friends on non-contiguous arrays
* Issue #1471: Allow caching JIT functions defined in IPython
* PR #1457: Fix flat indexing of boolean arrays
* PR #1421: Allow calling Numpy ufuncs, without an explicit output, on
non-contiguous arrays
* Issue #1411: Fix crash when unpacking a tuple containing a Numba-allocated array
* Issue #1394: Allow unifying range_state32 and range_state64
* Issue #1373: Fix code generation error on lists of bools
Version 0.21.0
--------------
This release introduces support for AMD's Heterogeneous System Architecture,
which allows memory to be shared directly between the CPU and the GPU.
Other major enhancements are support for lists and the introduction of
an opt-in compilation cache.
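The compilation cache is opt-in and per-function; a minimal sketch (the
function itself is a made-up example)::

    from numba import jit

    @jit(nopython=True, cache=True)   # compiled code is cached on disk
    def dot(a, b):
        acc = 0.0
        for i in range(a.shape[0]):
            acc += a[i] * b[i]
        return acc

Subsequent runs of the same program can then reuse the cached compilation
instead of recompiling from scratch.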
Improvements:
* PR #1391: Implement print() for CUDA code
* PR #1366: Implement integer typing enhancement proposal (NBEP 1)
* PR #1380: Support the one-argument type() builtin
* PR #1375: Allow boolean evaluation of lists and tuples
* PR #1371: Support array.view() in CUDA mode
* PR #1369: Support named tuples in nopython mode
* PR #1250: Implement numpy.median().
* PR #1289: Make dispatching faster when calling a JIT-compiled function
from regular Python
* Issue #1226: Improve performance of integer power
* PR #1321: Document features supported with CUDA
* PR #1345: HSA support
* PR #1343: Support lists in nopython mode
* PR #1356: Make Numba-allocated memory visible to tracemalloc
* PR #1363: Add an environment variable NUMBA_DEBUG_TYPEINFER
* PR #1051: Add an opt-in, per-function compilation cache
Fixes:
* Issue #1372: Some array expressions would fail rewriting when they involved
the same variable more than once, or a unary operator
* Issue #1385: Allow CUDA local arrays to be declared anywhere in a function
* Issue #1285: Support datetime64 and timedelta64 in Numpy reduction functions
* Issue #1332: Handle the EXTENDED_ARG opcode.
* PR #1329: Handle the ``in`` operator in object mode
* Issue #1322: Fix augmented slice assignment on Python 2
* PR #1357: Fix slicing with some negative bounds or step values.
Version 0.20.0
--------------
This release updates Numba to use LLVM 3.6 and CUDA 7 for CUDA support.
Following the platform deprecation in CUDA 7, Numba's CUDA feature is no
longer supported on 32-bit platforms. The oldest supported version of
Windows is Windows 7.
Improvements:
* Issue #1203: Support indexing ndarray.flat
* PR #1200: Migrate cgutils to llvmlite
* PR #1190: Support more array methods: .transpose(), .T, .copy(), .reshape(), .view()
* PR #1214: Simplify setup.py and avoid manual maintenance
* PR #1217: Support datetime64 and timedelta64 constants
* PR #1236: Reload environment variables when compiling
* PR #1225: Various speed improvements in generated code
* PR #1252: Support cmath module in CUDA
* PR #1238: Use 32-byte aligned allocator to optimize for AVX
* PR #1258: Support numpy.frombuffer()
* PR #1274: Use TravisCI container infrastructure for lower wait time
* PR #1279: Micro-optimize overload resolution in call dispatch
* Issue #1248: Improve error message when return type unification fails
Fixes:
* Issue #1131: Handling of negative zeros in np.conjugate() and np.arccos()
* Issue #1188: Fix slow array return
* Issue #1164: Avoid warnings from CUDA context at shutdown
* Issue #1229: Respect the writeable flag in arrays
* Issue #1244: Fix bug in refcount pruning pass
* Issue #1251: Fix partial left-indexing of Fortran contiguous array
* Issue #1264: Fix compilation error in array expression
* Issue #1254: Fix error when yielding array objects
* Issue #1276: Fix nested generator use
Version 0.19.2
--------------
This release fixes the source distribution on pypi. The only change is in the
setup.py file. We do not plan to provide a conda package as this release is
essentially the same as 0.19.1 for conda users.
Version 0.19.1
--------------
* Issue #1196:
* fix double-free segfault due to redundant variable deletion in the
Numba IR (#1195)
* fix use-after-delete in array expression rewrite pass
Version 0.19.0
--------------
This version introduces memory management in the Numba runtime, allowing new
arrays to be allocated inside Numba-compiled functions. There is also a rework
of the ufunc infrastructure, and an optimization pass to collapse cascading
array operations into a single efficient loop.
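As an illustrative sketch (``normalized`` is a made-up name), arrays can now
be allocated inside a ``nopython`` function and returned to the caller, with
the runtime managing their memory::

    import numpy as np
    from numba import njit

    @njit
    def normalized(values):
        out = np.zeros_like(values)   # allocation inside nopython code
        total = values.sum()
        for i in range(values.shape[0]):
            out[i] = values[i] / total
        return out

    print(normalized(np.array([1.0, 3.0, 4.0])))  # [0.125 0.375 0.5]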
.. warning::
Support for Windows XP and Vista with all compiler targets and support
for 32-bit platforms (Win/Mac/Linux) with the CUDA compiler target are
deprecated. In the next release of Numba, the oldest version of Windows
supported will be Windows 7. CPU compilation will remain supported
on 32-bit Linux and Windows platforms.
Known issues:
* There are some performance regressions in very short running ``nopython``
functions due to the additional overhead incurred by memory management.
We will work to reduce this overhead in future releases.
Features:
* Issue #1181: Add a Frequently Asked Questions section to the documentation.
* Issue #1162: Support the ``cumsum()`` and ``cumprod()`` methods on Numpy
arrays.
* Issue #1152: Support the ``*args`` argument-passing style.
* Issue #1147: Allow passing character sequences as arguments to
JIT-compiled functions.
* Issue #1110: Shortcut deforestation and loop fusion for array expressions.
* Issue #1136: Support various Numpy array constructors, for example
numpy.zeros() and numpy.zeros_like().
* Issue #1127: Add a CUDA simulator running on the CPU, enabled with the
NUMBA_ENABLE_CUDASIM environment variable.
* Issue #1086: Allow calling standard Numpy ufuncs without an explicit
output array from ``nopython`` functions.
* Issue #1113: Support keyword arguments when calling numpy.empty()
and related functions.
* Issue #1108: Support the ``ctypes.data`` attribute of Numpy arrays.
* Issue #1077: Memory management for array allocations in ``nopython`` mode.
* Issue #1105: Support calling a ctypes function that takes ctypes.py_object
parameters.
* Issue #1084: The environment variable NUMBA_DISABLE_JIT disables compilation
of ``@jit`` functions, calling into the Python interpreter instead when they
are invoked. This allows easier debugging of multiple jitted functions.
* Issue #927: Allow gufuncs with no output array.
* Issue #1097: Support comparisons between tuples.
* Issue #1075: Numba-generated ufuncs can now be called from ``nopython``
functions.
* Issue #1062: ``@vectorize`` now allows omitting the signatures, and will
compile the required specializations on the fly (like ``@jit`` does).
* Issue #1027: Support numpy.round().
* Issue #1085: Allow returning a character sequence (as fetched from a
structured array) from a JIT-compiled function.
Fixes:
* Issue #1170: Ensure ``ndindex()``, ``ndenumerate()`` and ``ndarray.flat``
work properly inside generators.
* Issue #1151: Disallow unpacking of tuples with the wrong size.
* Issue #1141: Specify install dependencies in setup.py.
* Issue #1106: Loop-lifting would fail when the lifted loop does not
produce any output values for the function tail.
* Issue #1103: Fix mishandling of some inputs when a JIT-compiled function
is called with multiple array layouts.
* Issue #1089: Fix range() with large unsigned integers.
* Issue #1088: Install entry-point scripts (numba, pycc) from the conda
build recipe.
* Issue #1081: Constant structured scalars now work properly.
* Issue #1080: Fix automatic promotion of booleans to integers.
Version 0.18.2
--------------
Bug fixes:
* Issue #1073: Fixes missing template file for HTML annotation
* Issue #1074: Fixes CUDA support on Windows machine due to NVVM API mismatch
Version 0.18.1
--------------
Version 0.18.0 was not officially released.
This version removes the old deprecated and undocumented ``argtypes`` and
``restype`` arguments to the ``@jit`` decorator. Function signatures
should always be passed as the first argument to ``@jit``.
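A minimal sketch of the required style, with an explicit signature (or a list
of signatures, see Issue #901 below) passed as the first argument::

    from numba import jit, float64, int32

    @jit(float64(float64, float64), nopython=True)
    def hypot2(x, y):
        return x * x + y * y

    @jit([int32(int32, int32), float64(float64, float64)])
    def add(x, y):
        return x + y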
Features:
* Issue #960: Add inspect_llvm() and inspect_asm() methods to JIT-compiled
functions: they output the LLVM IR and the native assembler source of the
compiled function, respectively.
* Issue #990: Allow passing tuples as arguments to JIT-compiled functions
in ``nopython`` mode.
* Issue #774: Support two-argument round() in ``nopython`` mode.
* Issue #987: Support missing functions from the math module in nopython
mode: frexp(), ldexp(), gamma(), lgamma(), erf(), erfc().
* Issue #995: Improve code generation for round() on Python 3.
* Issue #981: Support functions from the random and numpy.random modules
in ``nopython`` mode.
* Issue #979: Add cuda.atomic.max().
* Issue #1006: Improve exception raising and reporting. It is now allowed
to raise an exception with an error message in ``nopython`` mode.
* Issue #821: Allow ctypes- and cffi-defined functions as arguments to
``nopython`` functions.
* Issue #901: Allow multiple explicit signatures with ``@jit``. The
signatures must be passed in a list, as with ``@vectorize``.
* Issue #884: Better error message when a JIT-compiled function is called
with the wrong types.
* Issue #1010: Simpler and faster CUDA argument marshalling thanks to a
refactoring of the data model.
* Issue #1018: Support arrays of scalars inside Numpy structured types.
* Issue #808: Reduce Numba import time by half.
* Issue #1021: Support the buffer protocol in ``nopython`` mode.
Buffer-providing objects, such as ``bytearray``, ``array.array`` or
``memoryview`` support array-like operations such as indexing and iterating.
Furthermore, some standard attributes on the ``memoryview`` object are
supported.
* Issue #1030: Support nested arrays in Numpy structured arrays.
* Issue #1033: Implement the inspect_types(), inspect_llvm() and inspect_asm()
methods for CUDA kernels.
* Issue #1029: Support Numpy structured arrays with CUDA as well.
* Issue #1034: Support for generators in nopython and object mode.
* Issue #1044: Support default argument values when calling Numba-compiled
functions.
* Issue #1048: Allow calling Numpy scalar constructors from CUDA functions.
* Issue #1047: Allow indexing a multi-dimensional array with a single integer,
to take a view.
* Issue #1050: Support len() on tuples.
* Issue #1011: Revive HTML annotation.
Fixes:
* Issue #977: Assignment optimization was too aggressive.
* Issue #561: One-argument round() now returns an int on Python 3.
* Issue #1001: Fix an unlikely bug where two closures with the same name
and id() would compile to the same LLVM function name, despite different
closure values.
* Issue #1006: Fix reference leak when a JIT-compiled function is disposed of.
* Issue #1017: Update instructions for CUDA in the README.
* Issue #1008: Generate shorter LLVM type names to avoid segfaults with CUDA.
* Issue #1005: Properly clean up references when raising an exception from
object mode.
* Issue #1041: Fix incompatibility between Numba and the third-party
library "future".
* Issue #1053: Fix the size attribute of CUDA shared arrays.
Version 0.17.0
--------------
The major focus in this release has been a rewrite of the documentation.
The new documentation is better structured and has more detailed coverage
of Numba features and APIs. It can be found online at
http://numba.pydata.org/numba-doc/dev/index.html
Features:
* Issue #895: LLVM can now inline nested function calls in ``nopython`` mode.
* Issue #863: CUDA kernels can now infer the types of their arguments
("autojit"-like).
* Issue #833: Support numpy.{min,max,argmin,argmax,sum,mean,var,std}
in ``nopython`` mode.
* Issue #905: Add a ``nogil`` argument to the ``@jit`` decorator, to
release the GIL in ``nopython`` mode.
* Issue #829: Add an ``identity`` argument to ``@vectorize`` and
``@guvectorize``, to set the identity value of the ufunc.
* Issue #843: Allow indexing 0-d arrays with the empty tuple.
* Issue #933: Allow named arguments, not only positional arguments, when
calling a Numba-compiled function.
* Issue #902: Support numpy.ndenumerate() in ``nopython`` mode.
* Issue #950: AVX is now enabled by default except on Sandy Bridge and
Ivy Bridge CPUs, where it can produce slower code than SSE.
* Issue #956: Support constant arrays of structured type.
* Issue #959: Indexing arrays with floating-point numbers isn't allowed
anymore.
* Issue #955: Add support for 3D CUDA grids and thread blocks.
* Issue #902: Support numpy.ndindex() in ``nopython`` mode.
* Issue #951: Numpy number types (``numpy.int8``, etc.) can be used as
constructors for type conversion in ``nopython`` mode.
Fixes:
* Issue #889: Fix ``NUMBA_DUMP_ASSEMBLY`` for the CUDA backend.
* Issue #903: Fix calling of stdcall functions with ctypes under Windows.
* Issue #908: Allow lazy-compiling from several threads at once.
* Issue #868: Fix the wrong error message emitted when multiplying a scalar
by a non-scalar.
* Issue #917: Allow vectorizing with datetime64 and timedelta64 in the
signature (only with unit-less values, though, because of a Numpy limitation).
* Issue #431: Allow overloading of CUDA device functions.
* Issue #917: Print out errors that occurred in object mode ufuncs.
* Issue #923: Numba-compiled ufuncs now inherit the name and doc of the
original Python function.
* Issue #928: Fix boolean return value in nested calls.
* Issue #915: ``@jit`` called with an explicit signature with a mismatching
type of arguments now raises an error.
* Issue #784: Fix the truth value of NaNs.
* Issue #953: Fix using shared memory in more than one function (kernel or
device).
* Issue #970: Fix an uncommon double to uint64 conversion bug on CentOS5
32-bit (C compiler issue).
Version 0.16.0
--------------
This release contains a major refactor to switch from llvmpy to `llvmlite <https://github.com/numba/llvmlite>`_
as our code generation backend. The switch is necessary to reconcile
different compiler requirements for LLVM 3.5 (needs C++11) and Python
extensions (need specific compiler versions on Windows). As a bonus, we have
found the use of llvmlite speeds up compilation by a factor of 2!
Other Major Changes:
* Faster dispatch for numpy structured arrays
* Optimized array.flat()
* Improved CPU feature selection
* Fix constant tuple regression in macro expansion code
Known Issues:
* AVX code generation is still disabled by default due to performance
regressions when operating on misaligned NumPy arrays. We hope to have a
workaround in the future.
* In *extremely* rare circumstances, a `known issue with LLVM 3.5 <http://llvm.org/bugs/show_bug.cgi?id=21423>`_
code generation can cause an ELF relocation error on 64-bit Linux systems.
Version 0.15.1
--------------
(This was a bug-fix release that superseded version 0.15 before it was
announced.)
Fixes:
* Workaround for missing __ftol2 on Windows XP.
* Do not lift loops for compilation that contain break statements.
* Fix a bug in loop-lifting when multiple values need to be returned to
the enclosing scope.
* Handle the loop-lifting case where an accumulator needs to be updated when
the loop count is zero.
Version 0.15
------------
Features:
* Support for the Python ``cmath`` module. (NumPy complex functions were
already supported.)
* Support for ``.real``, ``.imag``, and ``.conjugate()`` on non-complex
numbers.
* Add support for ``math.isfinite()`` and ``math.copysign()``.
* Compatibility mode: If enabled (off by default), a failure to compile in
object mode will fall back to using the pure Python implementation of the
function.
* *Experimental* support for serializing JIT functions with cloudpickle.
* Loop-jitting in object mode now works with loops that modify scalars that
are accessed after the loop, such as accumulators.
* ``@vectorize`` functions can be compiled in object mode.
* Numba can now be built using the `Visual C++ Compiler for Python 2.7 <http://aka.ms/vcpython27>`_
on Windows platforms.
* CUDA JIT functions can be returned by factory functions with variables in
the closure frozen as constants.
* Support for "optional" types in nopython mode, which allow ``None`` to be a
valid value.
Fixes:
* If nopython mode compilation fails for any reason, automatically fall back
to object mode (unless nopython=True is passed to @jit) rather than raise
an exception.
* Allow function objects to be returned from a function compiled in object
mode.
* Fix a linking problem that caused slower platform math functions (such as
``exp()``) to be used on Windows, leading to performance regressions against
NumPy.
* ``min()`` and ``max()`` no longer accept scalar arguments in nopython mode.
* Fix handling of ambiguous type promotion among several compiled versions of a
JIT function. The dispatcher will now compile a new version to resolve the
problem. (issue #776)
* Fix float32 to uint64 casting bug on 32-bit Linux.
* Fix type inference to allow forced casting of return types.
* Allow the shape of a 1D ``cuda.shared.array`` and ``cuda.local.array`` to be
a one-element tuple.
* More correct handling of signed zeros.
* Add custom implementation of ``atan2()`` on Windows to handle special cases
properly.
* Eliminated race condition in the handling of the pagelocked staging area
used when transferring CUDA arrays.
* Fix non-deterministic type unification leading to varying performance.
(issue #797)
Version 0.14
------------
Features:
* Support for nearly all the Numpy math functions (including comparison,
logical, bitwise and some previously missing float functions) in nopython mode.
* The Numpy datetime64 and timedelta64 dtypes are supported in nopython mode
with Numpy 1.7 and later.
* Support for Numpy math functions on complex numbers in nopython mode.
* ndarray.sum() is supported in nopython mode.
* Better error messages when unsupported types are used in Numpy math functions.
* Set NUMBA_WARNINGS=1 in the environment to see which functions are compiled
in object mode vs. nopython mode.
* Add support for the two-argument pow() builtin function in nopython mode.
* New developer documentation describing how Numba works, and how to
add new types.
* Support for Numpy record arrays on the GPU. (Note: Improper alignment of dtype
fields will cause an exception to be raised.)
* Slices on GPU device arrays.
* GPU objects can be used as Python context managers to select the active
device in a block.
* GPU device arrays can be bound to a CUDA stream. All subsequent operations
(such as memory copies) will be queued on that stream instead of the default.
This can prevent unnecessary synchronization with other streams.
Fixes:
* Generation of AVX instructions has been disabled to avoid performance bugs
when calling external math functions that may use SSE instructions,
especially on OS X.
* JIT functions can be removed by the garbage collector when they are no
longer accessible.
* Various other reference counting fixes to prevent memory leaks.
* Fixed handling of exceptions when an input argument is out of range.
* Prevent autojit functions from making unsafe numeric conversions when
called with different numeric types.
* Fix a compilation error when an unhashable global value is accessed.
* Gracefully handle failure to enable faulthandler in the IPython Notebook.
* Fix a bug that caused loop lifting to fail if the loop was inside an
``else`` block.
* Fixed a problem with selecting CUDA devices in multithreaded programs on
Linux.
* The ``pow()`` function (and ``**`` operation) applied to two integers now
returns an integer rather than a float.
* Numpy arrays using the object dtype no longer cause an exception in the
autojit.
* Attempts to write to a global array will cause compilation to fall back
to object mode, rather than attempting and failing in nopython mode.
* ``range()`` works with all negative arguments (ex: ``range(-10, -12, -1)``)
Version 0.13.4
--------------
Features:
* Setting and deleting attributes in object mode
* Added documentation of supported and currently unsupported numpy ufuncs
* Assignment to 1-D numpy array slices
* Closure variables and functions can be used in object mode
* All numeric global values in modules can be used as constants in JIT
compiled code
* Support for the start argument in enumerate()
* Inplace arithmetic operations (+=, -=, etc.)
* Direct iteration over a 1D numpy array (e.g. "for x in array: ...")
in nopython mode
Fixes:
* Support for NVIDIA compute capability 5.0 devices (such as the GTX 750)
* Vectorize no longer crashes/gives an error when bool\_ is used as the return type
* Return the correct dictionary when globals() is used in JIT functions
* Fix crash bug when creating dictionary literals in object mode
* Report more informative error message on import if llvmpy is too old
* Temporarily disable pycc --header, which generates incorrect function
signatures.
Version 0.13.3
--------------
Features:
* Support for enumerate() and zip() in nopython mode
* Increased LLVM optimization of JIT functions to -O1, enabling automatic
vectorization of compiled code in some cases
* Iteration over tuples and unpacking of tuples in nopython mode
* Support for dict and set (Python >= 2.7) literals in object mode
Fixes:
* JIT functions have the same __name__ and __doc__ as the original function.
* Numerous improvements to better match the data types and behavior of Python
math functions in JIT compiled code on different platforms.
* Importing Numba will no longer throw an exception if the CUDA driver is
present, but cannot be initialized.
* guvectorize now properly supports functions with scalar arguments.
* CUDA driver is lazily initialized
Version 0.13.2
--------------
Features:
* @vectorize ufuncs can now generate a SIMD fast path for unit-strided arrays
* Added cuda.gridsize
* Added preliminary exception handling (raise exception class)
Fixes:
* UNARY_POSITIVE
* Handling of closures and dynamically generated functions
* Global None value
Version 0.13.1
--------------
Features:
* Initial support for CUDA array slicing
Fixes:
* Indirectly fixes numbapro when the system has an incompatible CUDA driver
* Fix numba.cuda.detect
* Export numba.intp and numba.intc
Version 0.13
------------
Features:
* Open-sourcing of NumbaPro CUDA Python support in ``numba.cuda``
* Add support for ufunc array broadcasting
* Add support for mixed input types for ufuncs
* Add support for returning tuple from jitted function
Fixes:
* Fix store slice bytecode handling for Python 2
* Fix inplace subtract
* Fix pycc so that the correct header is emitted
* Allow vectorize to work on functions with jit decorator
Version 0.12.2
--------------
Fixes:
* Improved NumPy ufunc support in nopython mode
* Misc bug fixes
Version 0.12.1
--------------
This version fixes many regressions reported by users for the 0.12 release.
It contains a new loop-lifting mechanism that specializes certain loop
patterns for nopython mode compilation, avoiding the need to directly support
heap allocation and other very dynamic operations.
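An illustrative sketch of the kind of function that benefits (using the
current spelling of the decorator): the function as a whole runs in object
mode, while the tight numerical loop is lifted out and compiled in nopython
mode::

    import numpy as np
    from numba import jit

    @jit   # no explicit nopython=True, so the function may run in object mode
    def sum_of_squares(n):
        data = np.arange(n)                  # created before the loop
        total = 0.0
        for i in range(data.shape[0]):       # this loop is lifted and compiled
            total += data[i] * data[i]
        return total

    print(sum_of_squares(5))  # 30.0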
Improvements:
* Add loop-lifting: JIT-compiling loops in nopython mode inside object mode
code. This allows functions to allocate NumPy arrays and use Python objects,
while the tight loops in the function can still be compiled in nopython mode.
Any arrays that the tight loop uses should be created before the loop is
entered.
Fixes:
* Add support for majority of "math" module functions
* Fix for...else handling
* Add support for builtin round()
* Fix ternary if...else support
* Revive "numba" script
* Fix problems with some boolean expressions
* Add support for more NumPy ufuncs
Version 0.12
------------
Version 0.12 contains a big refactor of the compiler. The main objective for
this refactor was to simplify the code base to create a better foundation for
further work. A secondary objective was to improve the worst case performance
to ensure that compiled functions in object mode never run slower than pure
Python code (this was a problem in several cases with the old code base). This
refactor is still a work in progress and further testing is needed.
Main improvements:
* Major refactor of compiler for performance and maintenance reasons
* Better fallback to object mode when native mode fails