SPP 2363


Artificial intelligence is indisputably among the fastest-developing and most sought-after topics of our time. This technology makes everyday life easier and changes society as well as the workplace. While IT companies and academic groups in computer science and mathematics rapidly adopted the new field, natural sciences such as biochemistry and chemistry are only now beginning to explore the potential of machine learning (ML) methods.

Our goal is to develop the full range of modern ML algorithms and apply them to molecular problems. Current approaches already help, for example, to determine molecular properties and to screen molecules virtually. Future molecular machine learning should go further: it should use generative models to suggest molecules with specific properties and activities, develop and optimize reactions independently, and evaluate and interpret analytical data within seconds. The first step is the design of molecular representations that deepen our understanding of ML models and enable robust and comparable applications. Cleverly combined with state-of-the-art machine learning algorithms, such representations can help overcome problems such as small datasets, highly complex questions and large experimental errors, and can reveal previously unknown molecular relationships. Ultimately, applications that are highly valuable in everyday laboratory work should be turned into easy-to-use software suites, and experimental scientists should be trained to use them. In this way, the priority program will help to modernize an entire subject area. To achieve this, existing innovative efforts in biochemistry, chemistry, computer science, mathematics and pharmacy must be united, both to use all available knowledge and to combine the most modern theoretical and practical methods into advanced machine learning models and methods. This program will help realize the AI strategy of the Bundesregierung (Federal Government) and can establish Germany internationally as the leading location for molecular machine learning.


While machine learning finds many applications in various overlapping fields, this program focuses on molecular machine learning. This excludes the modelling of protein surfaces, of properties of entire materials and of periodic systems, unless these are still predominantly governed by their molecular constituents (e.g. molecular crystals). It also excludes projects that target the development or improvement of heterogeneous catalysts without explicitly describing them by their molecular structure. It does, however, include the development and use of molecular representations, as long as they are readily available and do not require time-consuming calculations. Conversely, the use and development of representations generated by computationally costly or time-consuming methods are excluded, since they would slow down the entire development process and would not be feasible within the timescale of this SPP. Also excluded are projects that aim to gain chemical or pharmacological knowledge without explicitly using molecular structures (e.g. in the form of molecular representations).

Another focus lies on the application of previously developed machine learning algorithms; the mere construction of mathematically new algorithms is beyond the scope of this program. This does not, however, exclude the development of novel methods or improvements derived from these models, especially when they target domain-specific challenges. It does exclude pure optimization of model performance that does not aim at specific chemical or molecular targets. Interaction and cooperation between individual groups and partners from industrial research are possible and welcome within this program, but such collaborations are not the main goal of the funding period.


Figure 1: Envisioned roadmap for the planned program, leading from an understanding of underlying principles and data to the development of toolkits for everyday applications.
© Philipp Pflüger, Uni MS

The first funding period aims to improve the methodologies that guide molecular machine learning and to understand the underlying principles (Figure 1). To this end, new representations need to be developed, datasets generated and methods adapted, based on knowledge from the chemical and computer science domains. Within these topics, projects designed to gain deep insight into chemical and chemoinformatic relationships (explainable AI, ExAI) are particularly welcome. In addition, initial feasibility studies should be carried out to examine state-of-the-art concepts in various applications. Establishing strong interaction and exchange within the SPP and the wider community, and training the talents of the future, will be crucial for success!

The second funding period focuses on building on this prior knowledge to develop these applications further and turn them into software tools usable in scientists' everyday work. These tools should not only be applied in the molecular machine learning domain but should also have an impact on other areas of chemistry as well as on pharmacology. As developments in molecular machine learning continue to accelerate, all topics addressed can be eligible for funding in either period if the state of knowledge requires it.