Diploma Thesis (Diplomarbeit):
Implementing a Generic Auto-Tuning Framework for OpenCL
Topic
This thesis will propose an application- and code-generic auto-tuning framework for OpenCL programs. The framework enables programmers to mark up any variable in any OpenCL code as a tuning parameter by means of pragmas. The auto-tuner will then process the OpenCL host and kernel code annotated with these pragmas. Tuning parameters can simply be values of program variables, but the user will also be able to declare variable types, or even alternative memory access functions, as tuning parameters.
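To make the notion of a tuning parameter concrete, the sketch below models each pragma-declared parameter as a name plus a list of candidate values, enumerates the full search space as the Cartesian product of those lists, and turns one configuration into an OpenCL compiler options string of `-D NAME=value` definitions. This is only an illustration under stated assumptions: the `TuningParameter` struct, the `enumerate` and `build_options` helpers, and the idea of injecting values through preprocessor definitions are hypothetical and not taken from the framework described here. Storing values as source strings lets the same mechanism express numbers, types such as `float4`, or names of memory access functions.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical representation of a pragma-declared tuning parameter:
// a name and its candidate values, kept as source-code strings so that
// numbers, types (e.g. "float4") or function names all fit the same model.
struct TuningParameter {
    std::string name;
    std::vector<std::string> values;
};

// One concrete assignment of a value to every tuning parameter.
using Configuration = std::map<std::string, std::string>;

// Enumerates the full (unreduced) search space:
// the Cartesian product of all candidate value lists.
std::vector<Configuration> enumerate(const std::vector<TuningParameter>& params) {
    std::vector<Configuration> result{Configuration{}};
    for (const auto& p : params) {
        std::vector<Configuration> next;
        for (const auto& partial : result) {
            for (const auto& v : p.values) {
                Configuration c = partial;
                c[p.name] = v;
                next.push_back(std::move(c));
            }
        }
        result = std::move(next);
    }
    return result;
}

// Turns a configuration into preprocessor definitions such as
// "-D LS=64 -D VEC_T=float4" (in the map's alphabetical key order).
// Such a string could serve as the options argument of clBuildProgram.
std::string build_options(const Configuration& c) {
    std::string opts;
    for (const auto& [name, value] : c) {
        if (!opts.empty()) opts += ' ';
        opts += "-D " + name + "=" + value;
    }
    return opts;
}
```

For two hypothetical parameters `LS` with values `{"32", "64"}` and `VEC_T` with values `{"float", "float4"}`, `enumerate` yields four configurations, the first of which becomes `-D LS=32 -D VEC_T=float`. The product grows multiplicatively with every added parameter, which is why the search-space reduction techniques discussed next matter.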
Moreover, this auto-tuning framework will enable the user to incorporate knowledge about the OpenCL code and about the systems on which the application is auto-tuned. For example, many programmers know that limiting the local size to a multiple of 32 is advantageous on NVIDIA GPUs. The framework will therefore allow value constraints to be attached to each tuning parameter in the form of Boolean C++ expressions, e.g., local_size % 32 == 0, which guarantees that the framework only considers local sizes that are multiples of 32. Value constraints are a concise technique for reducing the search space and thus accelerating the tuning process.

Further, the framework will enable the programmer to arrange tuning parameters in different order groups, which are tuned in a user-defined order; this again contributes to reducing the search space. For example, program code for CPUs should first be tuned for SIMD parallelism and afterwards for thread-level parallelism. Order groups let the programmer indicate that variable types (which determine the use of SIMD hardware) can be tuned independently of, and before, the global size (which represents the number of started threads), by assigning the tuning parameter "variable type" to a lower group than "global size".
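The interplay of value constraints and order groups can be sketched as follows, under the assumption of integer-valued parameters and a user-supplied cost function standing in for running and timing the OpenCL kernel. The `Param` struct, the `tune` function, and the parameter names `VEC_WIDTH` (a hypothetical stand-in for the variable type, e.g. `float` versus `float4`) and `LS` (local size) are illustrative inventions, not the framework's actual interface. Groups are processed in ascending order; within a group every combination of that group's values is tried, configurations violating a constraint are skipped without being measured, and the winning values are fixed before the next group is tuned.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <limits>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical model: a configuration maps parameter names to integer values.
using Config = std::map<std::string, int>;

struct Param {
    std::string name;
    std::vector<int> candidates;
    int group;  // lower groups are tuned first
    // Value constraint as a Boolean C++ expression over the configuration.
    std::function<bool(const Config&)> constraint = [](const Config&) { return true; };
};

// Tunes order groups in ascending order. Parameters of groups not yet tuned
// keep their values from `current`; each group's best values are fixed
// before moving on. `cost` stands in for running and timing the kernel.
Config tune(const std::vector<Param>& params, Config current,
            const std::function<double(const Config&)>& cost) {
    std::set<int> groups;
    for (const auto& p : params) groups.insert(p.group);

    for (int g : groups) {
        std::vector<const Param*> members;
        for (const auto& p : params)
            if (p.group == g) members.push_back(&p);

        Config best = current;
        double best_cost = std::numeric_limits<double>::infinity();
        Config trial = current;

        // Depth-first enumeration of this group's value combinations.
        std::function<void(std::size_t)> visit = [&](std::size_t i) {
            if (i == members.size()) {
                for (const Param* p : members)
                    if (!p->constraint(trial)) return;  // pruned, never measured
                double c = cost(trial);
                if (c < best_cost) { best_cost = c; best = trial; }
                return;
            }
            for (int v : members[i]->candidates) {
                trial[members[i]->name] = v;
                visit(i + 1);
            }
        };
        visit(0);
        current = best;  // fix this group's winners
    }
    return current;
}
```

With `VEC_WIDTH` candidates `{1, 2, 4, 8}` in group 0 and `LS` candidates `{16, 32, 48, 64, 128}` constrained by `LS % 32 == 0` in group 1, this sketch measures only 4 + 3 = 7 configurations instead of the 4 × 5 = 20 of an exhaustive search. The price of tuning groups sequentially is that interactions between groups are ignored, which is exactly the independence assumption the programmer expresses by choosing the groups.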
Scope
Diploma thesis (6-month completion period).
Student
Markus Damerau

