-
Notifications
You must be signed in to change notification settings - Fork 36
/
CHANGELOG
153 lines (125 loc) · 6.04 KB
/
CHANGELOG
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
Next version (work in progress)
- Global and local thread size dividers now perform a ceiled division by default
- Verbose mode now always prints the parameter configuration before compiling and running
- Added additional OpenCL information printing to screen and to JSON
Version 2.7.0
- CLTune now automatically ensures global size is a multiple of the local workgroup size
- Added GetBestResult() to the tuner's API to retrieve the best parameters programmatically
- Changed std::initalizer_list in the AddParameters API to std::vector
- Fixed a bug in the simulated annealing search method
Version 2.6.0
- Changed timing measurements to now also include the (varying) kernel launch overhead
- It is now possible to set OpenCL compiler options through the env variable CLTUNE_BUILD_OPTIONS
- Added support for compilation under Visual Studio 2013 (MSVC++ 12.0)
- Added an option to build a static version of the library
Version 2.5.0
- Updated to version 8.0 of the CLCudaAPI header
- Made it possible to configure the number of times each kernel is run (to average results)
Version 2.4.0
- Made it possible to run the unit-tests independently of the provided OpenCL kernel samples
- Added an option to compile in verbose mode for additional diagnostic messages (-DVERBOSE=ON)
- Now using version 6.0 of the CLCudaAPI header
- Fixed the RPATH settings on OSX
- Added Appveyor continuous integration and increased coverage of the Travis builds
Version 2.3.1
- Fixed a bug where an output buffer could not be used as input at the same time
- Fixed computing the validation error for half-precision fp16 data-types
Version 2.3.0
- Added support for 'short' and 'cl_half' data-types as kernel buffer and scalar arguments
- Fixed a bug where failed results would still show up in the tuning results
- Made MSVC link the run-time libraries statically
Version 2.2.0
- Added two new simpler samples of using the tuner (vector-add and convolution)
- Updated the general documentation
- Added API documentation
- Now using version 5.0 of the CLCudaAPI header
Version 2.1.0
- Added exports to be able to create a DLL on Windows (thanks to Marco Hutter)
- Added command-line OpenCL platform selection in the examples (thanks to William J Shipman)
Version 2.0.0
- Added support for machine learning models. These models can be trained on a small fraction of the
tuning configurations and can be used to predict the remainder. Two models are supported:
* Linear regression
* A 3-layer neural network
- Now using version 4.0 of the CLCudaAPI header (previously known as Claduc)
- Added experimental support for CUDA kernels
- Added support for MSVC (Visual Studio) 2015
- Using Catch instead of GTest for unit-testing
- Various minor fixes
Version 1.7.1
- Added additional device properties to JSON-output
Version 1.7.0
- Now using the Claduc C++11 interface to OpenCL
- Added a method to print all tuning results in JSON-format to file
Version 1.6.4
- Reduced the requirements from GCC 4.8.0 to 4.7.0
- Fixes various warnings on Clang
Version 1.6.3
- Reduced the requirements from GCC 4.9.0 to 4.8.0
- Minor updates to the CMake file
Version 1.6.2
- Fixed another exception-related bug
- Further improved reporting of failed runs
- Updated C++11 OpenCL API
Version 1.6.1
- Fixed a couple of issues related to exceptions
- Improved reporting of failed runs
Version 1.6.0
- Much cleaner API due to Pimpl idiom: only cltune.h header is now required
- Replaced Khronos' cl.hpp with a custom C++11 version tailored for CLTune
- Code clean-up / reorganisation
- Added an option to add fixed defines to reference kernels
- Added an option to load a kernel from string instead of from file
- Added support for size_t OpenCL buffers
Version 1.5.1
- Improved the GEMM example to support the Intel MIC (Xeon Phi) accelerators
- Updated compiler check and compiler flags
- Adds support for multiple OpenCL kernel files at once (e.g. when wanting to include a header file)
- Adds support for the std::complex data-types
- Fixed some compilation warnings regarding size_t conversions
- Updated the FindOpenCL.cmake file
Version 1.5.0
- OpenCL local work size and memory size constraints are now automatically handled
- Greatly improved the new 2D convolution example:
* Filter coefficients are now dynamic
* Added support for local memory padding
* In-lined the convolution header into the kernels and host code
* Fixed various bugs
- Moved the examples to separate subfolders
- Uses chrono timers as seed in favor of random device
- Bugfix for simulated annealing when 2 variables can only change together.
Version 1.4.1
- Added 2D convolution as an example
- Added command-line arguments to the GEMM search-method sample
- Fixed a CUDA 7 related bug in the GEMM kernel
- Fixed a logging bug in the PSO search technique
Version 1.4.0
- Added the particle swarm optimisation (PSO) search technique
- Updated the example GEMM kernel
Version 1.3.2
- Now prints OpenCL version when running on a device
- Added install targets to CMake
- Moved header files around and renamed the main include to "cltune.h"
- Catches OpenCL exceptions and skips those configurations
Version 1.3.1
- Fixed simulated annealing's random number generation
- Added new FindOpenCL CMake script
- Added option to print database-formatted output of best results
Version 1.3.0
- Allow users to select a search strategy through the API
- Added support for the simulated annealing search method
- Added a sample using simulated annealing
- Added an option to output the search process to file
Version 1.2.0
- Added the interface to customize search algorithms
- Initially added full-search and random-search as basic search algorithms
Version 1.1.0
- User-defined parameter constraints are now fully customizable by accepting arbitrary functions on
an arbitrary combination of parameters.
- Re-factored the code to use more C++11 features: auto, smart pointers, constexpr, class enums, ...
Version 1.0.1
- Replaced one more occurrence of a pointer with an std::shared_ptr
- Re-added OpenCL class constructor exception test
- Updated license information
Version 1.0.0
- Initial release to GitHub