Release v0.8.0 · m4rs-mt/ILGPU

This new stable version significantly improves the performance and code quality of the generated kernel programs.

Added support for on-the-fly specialization of kernels using dynamic partial evaluation.
Added support for dynamic shared memory (CPU & Cuda backends).
Added new KernelConfig structure to specify launch dimensions for explicitly grouped kernels.
Added new Index1 structure to avoid name clashes with the new System.Index structure.
Added additional tuple conversion methods to Index2 and Index3 types.
Added new EntryPointDescription structure to specify an entry point and its index type.
Added RuntimeKernelConfig structure to combine static and dynamic information about a particular kernel launch.
Added support for linear arrays in local memory.
Added support for enum-value interop (#66).
Reworked explicitly grouped kernel launchers to use the new KernelConfig structure instead of GroupedIndex types.
Simplified static Grid and Group properties.
Removed all GroupedIndex types.
Updated the whole compilation pipeline to enable more aggressive optimizations.
Significantly improved performance of emitted PTX and OpenCL code by enabling more aggressive optimizations and clever code generation (#70).
Added support for "unmanaged" C# structures in the scope of buffers and views.
Reworked PTX backend to support all API changes and to fix several critical code-generation issues. This also includes emission of PTX instructions that mimic the output of the Cuda compiler (#68).
Reworked OpenCL backend to support all API changes and to fix several critical code-generation issues (#67, #72, #73, #74, #78, #85, #88, #91, #92).
Added new debug information input module to support the latest PDB format updates.
Considerably improved error messages using debug information (#86).
Reduced memory consumption during the compilation process.
Improved performance of the internal compilation pipeline.
Improved performance of kernel launchers.
Extended CudaAPI to support page-locked host-memory allocation functions.
Extended ExchangeBuffer to use new page-locked memory allocation (if available).
Added new IR-rewriter API to perform more advanced IR transformations.
Adapted all existing transformations to use the new rewriter API.
Reduced memory consumption of all nodes by compressing information.
Redesigned several IR nodes to support global program transformations.
Reworked implementation of GetSubView in the context of generic and multidimensional array views (#19).
Fixed several issues in the scope of address-space inference.
Fixed critical code generation issues that could occur when replacing values.

Special thanks to @MoFtZ for contributing to this release.
