Research Foci
Algorithmic skeletons
SkelCL – A Skeleton Library for Heterogeneous Systems
http://skelcl.uni-muenster.de
GPU computing
Parallel computing
Dr. Michel Steuwer

CV
Academic Education
- Ph.D. studies in computer science
- Computer science graduate program (Diploma degree)
Positions
- Research Associate at the University of Edinburgh
- Visiting researcher at the University of Edinburgh
- Research associate at the University of Münster
- Visiting researcher at the University of Edinburgh
- Visiting researcher at the University of Edinburgh
- Visiting researcher at the University of Edinburgh
- Student assistant at the University of Münster
Publications
Research Articles in Edited Proceedings (Conferences)
- Haidl, M, Steuwer, M, Dirks, H, Humernbrum, T, and Gorlatch, S. . “Towards Composable GPU Programming: Programming GPUs with Eager Actions and Lazy Views.” in Proceedings of the 8th International Workshop on Programming Models and Applications for Multicores and Manycores, edited by Q Chen and Z Huang. New York, NY: ACM Press. doi: 10.1145/3026937.3026942.
Research Article (Book Contributions)
- Kessler, Christoph, Gorlatch, Sergei, Enmyren, Johan, Dastgeer, Usman, Steuwer, Michel, and Kegel, Philipp. . “Skeleton Programming for Portable Many-Core Computing.” in Programming Multicore and Many-core Computing Systems, edited by Sabri Pllana and Fatos Xhafa. John Wiley & Sons.
- Haidl, M, Steuwer, M, Humernbrum, T, and Gorlatch, S. . “Multi-Stage Programming for GPUs in Modern C++ using PACXX.” contribution to the 9th Annual Workshop on General Purpose Processing Using Graphics Processing Unit, GPGPU '16, Barcelona, Spain. New York, NY, USA: ACM Press. doi: 10.1145/2884045.2884049.
- Steuwer, Michel. . “Improving Programmability and Performance Portability on Many-Core Processors.” Dissertation thesis, University of Münster.
Research Articles (Journals)
- Olejnik, Michael, Steuwer, Michel, Dybowski, J. Nikolaj, Gorlatch, Sergei, and Heider, Dominik. . “gCUP: Rapid GPU-based HIV-1 Coreceptor Usage Prediction for Next-Generation Sequencing.” Bioinformatics 30 (22): 3272–3273. doi: 10.1093/bioinformatics/btu535.
- Steuwer, Michel, Haidl, Michael, Breuer, Stefan, and Gorlatch, Sergei. . “High-Level Programming of Stencil Computations on Multi-GPU Systems using the SkelCL Library.” Parallel Processing Letters 24 (03): 1441005. doi: 10.1142/S0129626414410059.
- Steuwer, Michel, and Gorlatch, Sergei. . “SkelCL: a high-level extension of OpenCL for multi-GPU systems.” The Journal of Supercomputing 69 (1): 25–33. doi: 10.1007/s11227-014-1213-y.
- Steuwer, Michel, Friese, Malte, Albers, Sebastian, and Gorlatch, Sergei. . “Introducing and Implementing the Allpairs Skeleton for Programming Multi-GPU Systems.” International Journal of Parallel Programming 42 (4): 601–618. doi: 10.1007/s10766-013-0265-6.
Research Articles in Edited Proceedings (Conferences)
- Fumero, Juan Jose, Steuwer, Michel, and Dubach, Christophe. . “A Composable Array Function Interface for Heterogeneous Computing in Java.” in Proceedings of ACM SIGPLAN International Workshop on Libraries, Languages, and Compilers for Array Programming. New York, NY, USA: ACM Press. doi: 10.1145/2627373.2627381.
- Gorlatch, Sergei, and Steuwer, Michel. . “Towards High-Level Programming for Systems with Many Cores.” in Perspectives of Systems Informatics - 9th International Andrei Ershov Memorial Conference, PSI 2014, Lecture Notes in Computer Science, edited by Alexander Marchuk and Andrey Terekhov. Springer.
- Breuer, Stefan, Steuwer, Michel, and Gorlatch, Sergei. . “Extending the SkelCL Skeleton Library for Stencil Computations on Multi-GPU Systems.” in Proceedings of the 1st International Workshop on High-Performance Stencil Computations, edited by A Größlinger and H Köstler. Wien: International Workshop on High-Performance Stencil Computations.
Research Articles (Journals)
- Kegel, Philipp, Steuwer, Michel, and Gorlatch, Sergei. . “dOpenCL: Towards uniform programming of distributed heterogeneous multi-/many-core systems.” Journal of Parallel and Distributed Computing 73 (12): 1639–1648. doi: 10.1016/j.jpdc.2013.07.021.
- Steuwer, Michel, and Gorlatch, Sergei. . “High-Level Programming for Medical Imaging on Multi-GPU Systems using the SkelCL Library.” Procedia Computer Science 18: 749–758. doi: 10.1016/j.procs.2013.05.239.
Research Articles in Edited Proceedings (Conferences)
- Steuwer, Michel, and Gorlatch, Sergei. . “SkelCL: Enhancing OpenCL for High-Level Programming of Multi-GPU Systems.” in Parallel Computing Technologies - 12th International Conference (PaCT 2013), Vol. 7979 of Lecture Notes in Computer Science, edited by Victor Malyshkin. Springer. doi: 10.1007/978-3-642-39958-9_24.
Research Article (Book Contributions)
- Kegel, Philipp, Steuwer, Michel, and Gorlatch, Sergei. . “Uniform High-Level Programming of Many-Core and Multi-GPU Systems.” in Transition of HPC Towards Exascale Computing, Vol. 24 of Advances in Parallel Computing, edited by Erik D'Hollander, Jack Dongarra, Ian Foster, Lucio Grandinetti and Gerhard Joubert. IOS Press. doi: 10.3233/978-1-61499-324-7-159.
Research Articles in Edited Proceedings (Conferences)
- Kegel, Philipp, Steuwer, Michel, and Gorlatch, Sergei. . “dOpenCL: Towards a uniform programming approach for distributed heterogeneous multi-/many-core systems.” in Proceedings of the 2012 IEEE 26th International Parallel and Distributed Processing Symposium Workshops, IPDPSW 2012. Wiley-IEEE Computer Society Press. doi: 10.1109/IPDPSW.2012.16.
- Steuwer, Michel, Gorlatch, Sergei, Buß, Matthias, and Breuer, Stefan. . “Using the SkelCL Library for High-Level GPU Programming of 2D Applications.” in Euro-Par 2012: Parallel Processing Workshops - BDMC, CGWS, HeteroPar, HiBB, OMHI, Paraphrase, PROPER, Resilience, UCHPC, VHPC, Rhodes Islands, Greece, August 27-31, 2012. Revised Selected Papers, Vol. 7640 of Lecture Notes in Computer Science, edited by Ioannis Caragiannis, Michael Alexander, Rosa M. Badia, Mario Cannataro, Alexandru Costan, Marco Danelutto, Frederic Desprez, Bettina Krammer, Julio Sahuquillo, Stephen L. Scott and Josef Weidendorfer. Rhodes Islands, Greece: Springer. doi: 10.1007/978-3-642-36949-0_41.
- Steuwer, Michel, Kegel, Philipp, and Gorlatch, Sergei. . “A High-Level Programming Approach for Distributed Systems with Accelerators.” in New Trends in Software Methodologies, Tools and Techniques - Proceedings of the Eleventh SoMeT '12, edited by Hamido Fujita and Roberto Revetria. Amsterdam: IOS Press. doi: 10.3233/978-1-61499-125-0-430.
- Steuwer, Michel, Kegel, Philipp, and Gorlatch, Sergei. . “Towards High-Level Programming of Multi-GPU Systems Using the SkelCL Library.” in Proceedings of the 2012 IEEE 26th International Parallel and Distributed Processing Symposium Workshops, IPDPSW 2012, edited by IEEE. Shanghai: Wiley-IEEE Press. doi: 10.1109/IPDPSW.2012.229.
Research Article (Book Contributions)
- Kessler, Christoph, Gorlatch, Sergei, Enmyren, Johan, Dastgeer, Usman, Steuwer, Michel, and Kegel, Philipp. . “Skeleton Programming for Portable Many-Core Computing.” in Programming Multi-core and Many-core Computing Systems, Parallel and Distributed Computing, edited by Sabri Pllana and Fatos Xhafa. Wiley-Blackwell.
- Steuwer, Michel, Kegel, Philipp, and Gorlatch, Sergei. . “SkelCL - A Portable Skeleton Library for High-Level GPU Programming.” in 2011 IEEE International Symposium on Parallel and Distributed Processing Workshops and Phd Forum (IPDPSW). Wiley-IEEE Press. doi: 10.1109/IPDPS.2011.269.
- Steuwer, Michel, Kegel, Philipp, and Gorlatch, Sergei. . “SkelCL - A Portable Multi-GPU Skeleton Library.” Angewandte Mathematik und Informatik, Vol. 04/10 - I. Münster: University of Münster.
Doctoral Thesis
Improving Programmability and Performance Portability on Many-Core Processors
- Supervisor: Prof. Dr. Sergei Gorlatch
- Doctoral Subject: Computer Science
- Doctoral Degree: Dr. rer. nat.
- Awarded by: Department 10 – Mathematics and Computer Science
Computer processors have changed radically over the last 20 years, with multi- and many-core architectures emerging to address the increasing demand for performance and energy efficiency. Multi-core CPUs and Graphics Processing Units (GPUs) are currently widely programmed with low-level, ad-hoc, and unstructured programming models, like multi-threading or OpenCL/CUDA. Developing functionally correct applications using these approaches is challenging as they do not shield programmers from complex issues of parallelism, like deadlocks or non-determinism. Developing optimized parallel programs is an even more demanding task, even for experienced programmers. Optimizations are often applied ad-hoc and exploit specific hardware features, making them non-portable.
In this thesis we address these two challenges of programmability and performance portability for modern parallel processors.
In the first part of the thesis, we present the SkelCL programming model which addresses the programmability challenge. SkelCL introduces three main high-level features which simplify GPU programming:
1) parallel container data types simplify the data management in GPU systems;
2) regular patterns of parallel programming (a.k.a. algorithmic skeletons) simplify the programming by expressing parallel computation in a structured way;
3) data distributions simplify the programming of multi-GPU systems by automatically managing data across all the GPUs in the system.
We present a C++ library implementation of our programming model and we demonstrate in an experimental evaluation that SkelCL greatly simplifies GPU programming without sacrificing performance.
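The idea of expressing a computation through algorithmic skeletons can be illustrated with a minimal sketch in plain standard C++. It is not the SkelCL API: the names `zip_skeleton`, `reduce_skeleton`, and `dot_product` are hypothetical, and everything runs sequentially on the CPU. The sketch only shows how a program can be composed from reusable patterns instead of hand-written loops and explicit data management.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <numeric>
#include <vector>

// Hypothetical helper names for illustration; this is NOT the SkelCL API,
// and everything here runs sequentially on the CPU.

// "Zip" pattern: combine two equally sized containers element by element.
template <typename T, typename F>
std::vector<T> zip_skeleton(const std::vector<T>& a, const std::vector<T>& b, F f) {
    assert(a.size() == b.size());  // precondition: inputs have equal length
    std::vector<T> out;
    out.reserve(a.size());
    for (std::size_t i = 0; i < a.size(); ++i) {
        out.push_back(f(a[i], b[i]));
    }
    return out;
}

// "Reduce" pattern: fold a container into one value with a binary operator.
template <typename T, typename F>
T reduce_skeleton(const std::vector<T>& v, T identity, F f) {
    return std::accumulate(v.begin(), v.end(), identity, f);
}

// Dot product expressed purely as a composition of the two patterns.
int dot_product(const std::vector<int>& a, const std::vector<int>& b) {
    const std::vector<int> products = zip_skeleton(a, b, std::multiplies<int>());
    return reduce_skeleton(products, 0, std::plus<int>());
}
```

For example, `dot_product({1, 2, 3}, {4, 5, 6})` evaluates to 32. In a skeleton library, the pattern implementations would hide the parallel execution and data transfers; here they are ordinary loops so that the structure stays visible.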
In the second part of the thesis, we present a novel compilation technique which addresses the performance portability challenge. We introduce a novel set of high-level and low-level parallel patterns along with a set of rewrite rules which systematically express high-level algorithmic implementation choices as well as low-level, hardware-specific optimizations. By applying the rewrite rules, pattern-based programs are transformed from a single portable high-level representation into hardware-specific low-level expressions from which efficient OpenCL code is generated. We formally prove the soundness of our approach by showing that the rewrite rules do not change the program semantics. Furthermore, we experimentally confirm that our novel compilation technique can transform a single portable expression into highly efficient code for three different parallel processors, thus providing performance portability.
Teaching
- Project seminar: Design and Implementation of a High-Level API for Programming Heterogeneous Cluster Systems [100222]
(in cooperation with Sergei Gorlatch and )
- Seminar: High-Level Programming of Parallel and Distributed Computer Systems [100262]
(in cooperation with Sergei Gorlatch and )
- Project seminar: High-Level Programming of Online Games in Future Generation Networks [100277]
(in cooperation with Sergei Gorlatch, and Frank Glinka)
- Lecture/Exercise: Multi-Core and GPU: Parallel Programming [104279]
(in cooperation with Sergei Gorlatch)
- Lecture/Practical: Introduction to C/C++ [104283]
- Lecture/Exercise: Operating Systems [103810]
(in cooperation with Sergei Gorlatch)
- Seminar: Heterogeneous Parallel Systems [103839]
(in cooperation with Sergei Gorlatch, Sebastian Albers and Philipp Kegel)
- Lecture/Exercise: Multi-Core and GPU: Parallel Programming [102289]
(in cooperation with Sergei Gorlatch)
- Project seminar: High-Level Programming of Heterogeneous Parallel Systems [102274]
(in cooperation with Sergei Gorlatch)
- Lecture/Exercise: Operating Systems [102018]
(in cooperation with Sergei Gorlatch)
- Seminar: Technical Aspects of Cloud Computing [102056]
(in cooperation with Sergei Gorlatch and Dominique Meiländer)
- Lecture/Exercise: Multi-Core and GPU: Parallel Programming [102435]
(in cooperation with Sergei Gorlatch)
- Project seminar: Internet- and GPU-Based Cloud Computing [102454]
(in cooperation with Sergei Gorlatch, Dominique Meiländer and Philipp Kegel)
Supervised Theses
Summer Semester 2014
- Bachelor Thesis: Evaluation of the Skeleton Library FastFlow
- Bachelor Thesis: A parallel Implementation of the T-CUP Software with the SkelCL Library
Winter Semester 2013/14
- Master Thesis: Development of a Divide & Conquer Skeleton for SkelCL
- Master Thesis: A GPU-based Classification Framework for HIV-Resistance Prediction
- Master Thesis: Extending the SkelCL Library with a Skeleton for Stencil Computations
- Bachelor Thesis: Autotuning of the Work-Group Size of OpenCL Programs
Summer Semester 2013
- Master Thesis: A Model for Predicting Work Distribution in Heterogeneous Systems and its Implementation in the SkelCL Library
- Bachelor Thesis: Implementation of the Needleman-Wunsch Algorithm and Breadth-First Search with the SkelCL Library
- Bachelor Thesis: Evaluation of the Skeleton Library SkePU
Winter Semester 2012/13
- Master Thesis: Extending the Skeleton Library SkelCL with a Skeleton for All-Pairs Computations
- Bachelor Thesis: Implementing the LU-Decomposition and the Mersenne-Twister with the SkelCL Library
- Bachelor Thesis: Performance Analysis of SkelCL using B+-Tree Traversal and 3D Jacobi Stencil
- Diploma Thesis: Simulation and Analysis of Two-Dimensional Turbulence on Parallel Computer Architectures
Summer Semester 2012
- Diploma Thesis: Extending the SkelCL Library with Multidimensional Data Types
Summer Semester 2011
- Bachelor Thesis: Analysis of the Usage of GPUs for Implementing Radix Sort
- Bachelor Thesis: Extending the SkelCL Library with Iterators
- Bachelor Thesis: Improving the MapOverlap Skeleton in SkelCL
- Bachelor Thesis: Development of a Library for Manipulating Source Code of C-based Languages