Projects - 2nd Period

  • Design of Photocatalytic Systems for CO2 Reduction Driven by Synergistic Cooperation of Machine Learning and Automated Labs (Friederich, Jung, Bräse)

    Prof. Dr. Pascal Friederich, Nicole Jung, Prof. Dr. Stefan Bräse

    The project “Design of photocatalytic systems for CO2 reduction driven by synergistic cooperation of machine learning and automated labs” aims to accelerate the discovery of efficient photosensitizers and catalysts for the reduction of CO2 to valuable chemicals such as carbon monoxide and methane, addressing both climate change and sustainable feedstock generation. Conventional approaches to catalyst development rely on time-consuming trial-and-error experimentation, limiting the ability to explore the vast chemical space of potential candidates and limiting the gain of knowledge from the conducted experiments. This project overcomes these limitations by combining machine learning and explainable artificial intelligence methods with the automated experimental platform ChemASAP. Specifically, the project will integrate automated synthesis and testing in a self-driving laboratory with state-of-the-art ML methods, enabling the rapid and systematic exploration of photosensitizer and catalyst candidates.
    The project is structured into three key objectives: (1) development of a machine learning framework to predict essential properties of photosensitizers and catalysts, such as redox potential, photostability, and selectivity, using graph neural networks (GNNs) with built-in explainability features; (2) exploration of a large chemical space to identify promising molecules, using active learning and data- and explanation-driven optimization algorithms to efficiently guide experiments; and (3) integration of the self-driving laboratory to perform closed-loop optimization of compounds and catalytic conditions, where experimental data continuously updates the ML model to refine both molecular structures and reaction conditions.
    The research combines the cheminformatics and chemistry expertise of Nicole Jung and Stefan Bräse with the machine learning know-how of Pascal Friederich. Two jointly supervised PhD students will contribute: one will focus on molecular templates, automated synthesis, and laboratory automation, while the other will handle computational tasks, including high-throughput DFT calculations as well as the development, training, and integration of explainable graph neural network models and optimization algorithms. Both PhD students will be strongly supported by the automation and software development teams established at KIT. By fully automating both experimental and computational aspects, this project will not only accelerate the discovery of catalysts for CO2 reduction but also generate broadly reusable tools and methodologies, benefiting the molecular machine learning and chemistry communities at large.
    The findings are expected to advance the field of AI-driven catalysis, providing insights into efficient CO2 reduction and contributing to broader efforts in carbon capture and sustainable chemical production.
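    The closed-loop idea in objective (3) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: `run_experiment` is a hypothetical stand-in for an automated measurement, the surrogate is a simple nearest-neighbour lookup rather than a graph neural network, and the acquisition rule (prediction plus a distance-based exploration bonus weighted by `kappa`) is just one common choice.

```python
import random


def run_experiment(x):
    """Hypothetical stand-in for an automated measurement of candidate x."""
    return -(x - 0.7) ** 2


def predict(x, measured):
    """Nearest-neighbour surrogate.

    Returns the value of the closest measured candidate and the distance
    to it, which serves as a crude uncertainty estimate.
    """
    nearest_x, nearest_y = min(measured, key=lambda p: abs(p[0] - x))
    return nearest_y, abs(nearest_x - x)


def closed_loop(pool, n_init=3, n_rounds=10, kappa=1.0, seed=0):
    """Alternate between surrogate-guided selection and 'experiments'."""
    rng = random.Random(seed)
    measured = [(x, run_experiment(x)) for x in rng.sample(pool, n_init)]
    for _ in range(n_rounds):
        tested = {x for x, _ in measured}
        untested = [x for x in pool if x not in tested]
        if not untested:
            break

        def score(x):
            mean, dist = predict(x, measured)
            return mean + kappa * dist  # exploit prediction, explore gaps

        x_next = max(untested, key=score)
        # New data immediately updates what the surrogate sees next round.
        measured.append((x_next, run_experiment(x_next)))
    return measured


pool = [i / 20 for i in range(21)]  # hypothetical 1D candidate space
history = closed_loop(pool)
best_x, best_y = max(history, key=lambda p: p[1])
```

    In a real self-driving laboratory, the candidate pool would consist of molecules and reaction conditions, and each call to the experiment function would trigger a synthesis and a test run.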

  • Development and Application of Improved Ligand Descriptors and Representations for Inverse Catalyst Design (Strieth-Kalthoff, Däschlein-Gessner)

    Prof. Dr. Felix Strieth-Kalthoff, Prof. Dr. Viktoria Däschlein-Gessner

    The development of new transition metal catalysts and the corresponding ligands is of crucial importance for synthetic organic chemistry and a decisive factor for the development of new reactions and improved synthesis protocols. The design of tailor-made ligands for the specific requirements of different reactions is an extremely time- and cost-intensive process. The development of structure-activity relationships can accelerate the optimization of catalysts. However, the simple use of individual parameters (descriptors) to quantify steric or electronic ligand properties has proven to be too one-dimensional to reliably and quantitatively describe the complex structure-activity relationships. The use of multivariate regression analyses and the application of machine learning methods have recently enabled first advances in the prediction of improved ligands. The goal of this research project is the inverse design of homogeneous catalysts using machine learning methods. To this end, different strategies will be developed and tested for the design of ylide-substituted phosphines (YPhos) for palladium- and gold-catalyzed reactions. In the first step, we will develop new descriptors for the improved description of ligands and their properties. These descriptors aim to explicitly quantify secondary metal-ligand interactions, thereby overcoming the distinction between monophosphines and bidentate ligands and enabling the screening of a broader ligand space. The new descriptors will be generated by means of quantum chemical calculations and verified on the basis of experimental studies in order to subsequently enable a reliable prediction of the ligand properties that are decisive for the respective reaction. To predict optimal YPhos ligands for different reactions, the next step involves creating a large-scale virtual YPhos library. Searching this library for the optimal ligands will be made possible through the development of various ligand representations.
    Easily interpretable descriptors based on physicochemical properties, as well as powerful deep-learning representations, will be used and compared in terms of their efficiency. Additionally, the models will be combined with a classifier trained on experimental data to predict the synthesizability of the ligands. This approach ensures realistic predictions and thus the applicability of the models for inverse ligand design in experiment. In the final step, this approach will be evaluated using selected reactions in gold and palladium catalysis. Based on experimental data, ideal catalysts will be predicted using machine learning methods and subsequently synthesized and evaluated.

  • Development and Application of ML Tools for Energy Transfer Catalysed Photocycloadditions (Jorner, Glorius)

    Prof. Dr. Kjell Jorner, Prof. Dr. Frank Glorius

    Cycloaddition reactions are pivotal tools in synthetic chemistry for constructing complex molecular architectures. Visible light-mediated energy transfer (EnT) photocatalysis has emerged as a transformative method for cycloadditions, enabling access to the triplet excited state (T1) under mild conditions. By harnessing this excited state, EnT catalysis introduces unique reaction modes that are inaccessible from the ground state, thereby facilitating the formation of biologically relevant, C(sp3)-rich three-dimensional molecular structures. However, the reactivity and selectivity of substrates in EnT-catalysed reactions remain challenging to predict, given the limited mechanistic understanding and scarcity of experimental data.
    This project aims to systematically address these challenges through three main objectives: (i) the curation of a comprehensive and balanced dataset of EnT-catalysed reactions, (ii) the development of physically relevant descriptors to accurately capture triplet state reactivity, and (iii) the development of data-driven models for predicting selectivity. By leveraging these models, we will provide valuable mechanistic insights that enable the generalisation of selectivity trends across a diverse array of reaction conditions and substrates. The predictive models will be packaged into user-friendly tools that equip synthetic chemists with a robust framework for predicting reaction outcomes. This will broaden the applicability of EnT catalysis, facilitating its adoption in complex molecular synthesis and advancing innovations in drug discovery and materials science.

  • DREAM: Developing Robust Evaluation and Analysis Methodologies for Chemical Reactions (Glorius)

    Prof. Dr. Frank Glorius

    The project aims to develop a standardized, data-driven workflow for designing and executing more informative substrate scope studies in synthetic organic chemistry. The workflow will accelerate the assessment of novel reactions while generating reliable, comprehensive datasets for the chemical community. We propose a three-part approach: (1) creating a systematic classification of functional groups with predictive models for reaction compatibility, (2) developing accessible computational tools to analyze reaction centers and optimize substrate selection, and (3) integrating these approaches into a comprehensive workflow for predicting reaction outcomes. The methodology will be validated through case studies across different reaction classes and laboratory settings and packaged as user-friendly, open-source software. Our ultimate goal is to transform how synthetic methodology is evaluated and reported, accelerating the adoption of new synthetic methods in both academic and industrial settings.

  • Efficient Semiempirical Quantum Mechanical Method with Adaptive Learning (Grimme)

    Prof. Dr. Stefan Grimme

    The project focuses on advancing computational chemistry by developing an adaptive-learning-driven semiempirical quantum mechanical (SQM) method tailored to improve efficiency and accuracy in general molecular modeling. This involves creating a new, lightweight tight-binding model, termed ag0-xTB (adaptive g0-xTB), which will leverage machine learning to dynamically adjust its parameterization. The work comprises three main tasks. The first is developing the g0-xTB base Hamiltonian and conventionally optimizing a global parameter set to ensure broad applicability and robust performance across diverse chemical spaces. This g0-xTB method can already be applied to various chemical problems where high speed and robustness are required. The second task is implementing adaptive learning for on-the-fly training, e.g., during MD runs. This will be achieved with automatic differentiation techniques, as in our previous dxtb, to easily enable dynamic parameter adjustments. Finally, the new ag0-xTB method will be applied in various chemical and methodological applications. These include conformer ensemble generation, reaction network exploration, and large-scale screening of catalysts and drug candidates. The project will make key outcomes available via open-source codes on GitHub, ensuring broad accessibility and reproducibility.

  • Foundational Implicit Solvent Machine Learning Potentials for Organic Molecules (Zavadlav)

    Prof. Dr. Julija Zavadlav

    Machine learning (ML), particularly the development of ML potentials, has dramatically advanced molecular modeling by enabling the prediction of potential energy with the accuracy of ab initio methods at a fraction of the computational cost. However, further improvements in computational efficiency are needed to tackle large spatiotemporal scales and high-throughput screening studies. The use of implicit solvent models can provide the necessary speed-ups, especially for simulations where the solvent constitutes a significant portion of the computational domain. To date, implicit solvent ML potentials have been developed for water. This project aims to extend these models to include non-aqueous solvents, opening the door to numerous applications in chemical engineering, as well as in various fields of physical and medicinal chemistry, that were previously inaccessible. In addition, the proposed novel ML architectures and training strategies, incorporating concepts from foundation models, will be of great value to data-driven molecular modeling. The project's software tool development will serve the broader scientific and industrial community by lowering the entry barrier to deploying implicit solvent ML potentials in everyday molecular simulations.

  • GML4Space: Generative Machine Learning Operating on Chemical Fragment Spaces (Rarey)

    Prof. Dr. Mathias Rarey

    In early-phase drug discovery, several methods for the identification of novel, small organic bioactive compounds exist, including chemical similarity searching by topology and shape, pharmacophore matching, molecular docking, and, nowadays, supervised machine learning (ML) models.
    Traditionally, the search process was performed on large catalogues of small molecules, either experimentally or computationally (high-throughput or virtual screening). Due to the sheer size of chemical space, new approaches, especially fragment-based and de novo design, are promising alternatives. Here, small fragment binders are located first and either combined or grown to larger molecules afterwards. Recently, de novo design based on generative ML has received significant attention. The disadvantage of these approaches is that all designed compounds have to be individually synthesized, which is time-consuming and costly.
    In parallel to the rise of ML, the concepts of combinatorial chemistry and chemical fragment spaces emerged. On this basis, compound vendors like Enamine or WuXi created large make-on-demand compound collections. Today, Enamine REAL contains about 50 billion compounds, and other collections contain trillions or more. Since the spaces are too large to be handled molecule by molecule, combinatorial algorithms have emerged to search and navigate these collections. While solutions to handle chemical fragment spaces exist for many search scenarios, the combination of fragment spaces with generative ML is largely unexplored.
    This project aims at combining generative ML of molecules and chemical fragment spaces. A cascade of new methods enabling the efficient use of supervised ML on chemical fragment spaces will be developed. In a first phase, generic optimization algorithms will be combined with ML models to identify bioactive molecules in fragment spaces. Next, new techniques to describe chemical matter as molecules from fragment spaces will be developed. These encodings ensure that all compounds described are indeed contained in the search space. At the same time, they sensitively model molecular similarity aspects. Thereby, generative machine learning can directly operate on chemical fragment spaces, creating only those molecules contained in a predefined search space like Enamine REAL. In a final phase, explainable ML techniques will be used to extract knowledge about the importance of individual fragments in compounds and apply them directly to select optimized bioactive compounds. After careful validation, a series of new ML approaches directly operating on chemical fragment spaces will emerge.  
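    Why such spaces are handled combinatorially, and how an encoding can guarantee membership in the search space, can be sketched as follows. This is a toy illustration only: the reaction names, fragment labels, and the `(reaction, indices)` encoding are hypothetical and do not represent the encodings to be developed in the project or the format of any vendor space.

```python
from math import prod

# Hypothetical fragment space: each reaction combines one fragment
# from every pool into a product.
FRAGMENT_SPACE = {
    "amide_coupling": (
        ["acid_A", "acid_B", "acid_C"],
        ["amine_X", "amine_Y"],
    ),
    "suzuki": (
        ["halide_1", "halide_2"],
        ["boronate_1", "boronate_2", "boronate_3", "boronate_4"],
    ),
}


def space_size(space):
    """Count all products without enumerating a single one."""
    return sum(prod(len(pool) for pool in pools) for pools in space.values())


def decode(space, reaction, indices):
    """Map an encoding (reaction, fragment indices) to its fragments.

    Any valid encoding corresponds to a product inside the space, so a
    generator that emits encodings cannot leave the search space.
    """
    pools = space[reaction]
    if len(indices) != len(pools):
        raise ValueError("one index per fragment pool is required")
    for pool, i in zip(pools, indices):
        if not 0 <= i < len(pool):
            raise ValueError("index outside the fragment pool")
    return tuple(pool[i] for pool, i in zip(pools, indices))


n_products = space_size(FRAGMENT_SPACE)
example = decode(FRAGMENT_SPACE, "suzuki", (1, 3))
```

    The size grows as a product of the pool sizes, which is why pools of a few thousand fragments already yield billions of products while the description of the space itself stays small.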

  • Highlighting Molecular Similarity Using Explainable AI (Koch, Risse)

    Prof. Dr. Oliver Koch, Prof. Dr. Benjamin Risse

    This project aims to advance molecular machine learning by developing explainable AI tools for analyzing molecular similarity. Building on previous work, we propose to (1) enhance neural fingerprint methods using improved graph neural network (GNN) architectures, (2) adapt and benchmark explainable AI (XAI) techniques for highlighting key structural features responsible for molecular similarity, and (3) integrate these methods into a user-friendly software platform for medicinal chemists and drug designers. The resulting tool will enable users to train custom neural fingerprints, benchmark their performance, and perform similarity-based virtual screening with interpretable results.
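    As a point of reference for what "highlighting" similarity means, the following sketch computes the classical Tanimoto coefficient on set-based fingerprints and lists the shared features that drive it. This is a conventional baseline shown under stated assumptions, not the neural fingerprint or XAI method of the project; the integer feature identifiers are hypothetical.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient of two fingerprints given as feature-ID sets."""
    a, b = set(fp_a), set(fp_b)
    union = a | b
    if not union:
        return 0.0  # convention chosen here for two empty fingerprints
    return len(a & b) / len(union)


def shared_features(fp_a, fp_b):
    """Features present in both molecules, i.e. what makes them similar."""
    return sorted(set(fp_a) & set(fp_b))


# Hypothetical feature IDs standing in for substructure hashes.
mol_a = {1, 2, 3}
mol_b = {2, 3, 4}
similarity = tanimoto(mol_a, mol_b)
common = shared_features(mol_a, mol_b)
```

    For learned, continuous fingerprints there is no such direct readout of shared bits, which is where attribution techniques that map similarity back onto atoms and bonds become necessary.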

  • High Throughput Enabled Optimization of Machine Learning for Property Targeted Spiropyran Design (Reuter, Hecht)

    Prof. Dr. Karsten Reuter, Prof. Dr. Stefan Hecht

    This project proposes a machine learning (ML) assisted approach to systematically advance the targeted design of spiropyran photoswitches, aiming to optimize their functional properties for diverse applications. Building on promising findings in the first funding period, we will develop a fully automated high-throughput synthesis and analysis platform that facilitates the rapid generation and evaluation of a comprehensive library of spiropyran derivatives. Central to this platform is our newly developed dynamic spiropyran exchange reaction, which will allow us to create a substantially enlarged mixed library of photoswitches, thus enabling the efficient synthesis and characterization of structurally diverse photoswitches without isolating individual derivatives. This high-throughput approach will provide a rich dataset of molecular structures and their corresponding switching properties, facilitating the development of advanced ML models for precise prediction and targeted optimization in photoswitch design. We will explore known and new substitution patterns on the spiropyran scaffold, expanding into previously inaccessible regions of chemical space to create photoswitches with enhanced and possibly unique properties. To deepen theoretical insights into spiropyran and merocyanine behavior, we will employ machine learning force fields (MLFFs) to develop new computational descriptors, which offer a more cost-effective alternative to ab initio methods. This will enable detailed mapping of potential energy surfaces and precise estimation of thermal half-lives of the metastable merocyanine isomers, providing a refined theoretical foundation for the design of optimized photoswitches. In addition, we aim to extend ML models to incorporate environmental factors such as viscosity and pH, which are essential for designing molecular photoswitches tailored to complex settings, including biological environments and 3D printing matrices. 
    By combining high-throughput experimental methods with cutting-edge ML-based theoretical modeling, this project will generate an integrated understanding of structure-property relationships, which will guide the rational design and synthesis of next-generation photoswitches specifically engineered for real-world applications across technology and science domains.

  • Machine Learning Approaches for Faster Discovery and Adaptation of Enzymes for Difficult Chemical Reactions. Phase II: Predicting and Expanding the Enzymatic Reaction Scope to Include New-to-nature Reactions (MacBioSyn) (Davari)

    Dr. Mehdi Davari

    The biocatalytic synthesis of chemicals is essential for green, sustainable chemistry but remains underutilized due to limited enzyme activity and diversity. Expanding the catalytic repertoire of enzymes to perform transformations that are challenging through traditional synthetic methods — such as site-selective modifications of unactivated molecules — holds vast potential for the production of valuable compounds, including those relevant to pharmaceuticals and fine chemicals.
    A major obstacle in this endeavor lies in accurately predicting enzyme function, substrate scope, and reaction specificity. Traditional experimental approaches require labor-intensive biodiversity screening, often limiting discovery throughput. To overcome this, our project explores advanced machine learning (ML) strategies to systematically identify and characterize enzymes capable of catalyzing non-natural and synthetically valuable reactions.
    During Phase I of the MacBioSyn project, we developed an ML-powered discovery platform incorporating a high-throughput (HT) screening workflow and a curated enzyme–substrate dataset comprising enzymes and chemically diverse substrates. This allowed us to establish predictive models for enzyme activity and specificity, supporting the targeted discovery of promising biocatalysts.
    Building on these foundations, Phase II will focus on expanding the capabilities of this ML framework to predict entirely novel enzymatic transformations, including non-native reactions using unconventional substrates. Our goal is to identify and engineer highly versatile enzymes capable of performing previously inaccessible chemical modifications.
    This integrated computational–experimental approach will accelerate the identification of new-to-nature biocatalysts and enable the development of environmentally friendly methods for synthesizing complex molecules. Ultimately, the insights and tools generated will help redefine the role of enzymes in synthetic chemistry and broaden their application across industrial and pharmaceutical sectors.

  • Machine Learning for Asymmetric and Electrochemical 3d Transition Metal-catalyzed C–H Activations (Ackermann)

    Prof. Dr. Lutz Ackermann

    Transition metal-catalyzed C–H activation has emerged as a powerful strategy for molecular synthesis, with applications in drug discovery, late-stage diversification, and materials science, among others. However, the development of challenging asymmetric 3d metal-catalyzed C–H activations remains underrepresented, and multi-parameter reaction optimization represents a major bottleneck. During the last few years, data-driven strategies have emerged as an increasingly viable toolbox in the molecular sciences. Within SPP 2363, we seek to create and implement predictive machine learning models to propel the field of asymmetric 3d transition metal-catalyzed C–H activation for the construction of C-centered as well as axial chirality in biologically relevant compounds. In this regard, a reliable and robust dataset is of utmost importance. High-throughput experimentation (HTE) will therefore be used to assist in the rapid creation and expansion of an experimental database.
    Thereafter, state-of-the-art (SOTA) machine learning algorithms and graph neural networks (GNNs) will be employed to generate predictive models for the reactivity and selectivity of electrocatalytic 3d transition metal-catalyzed C–H activations. However, their success relies on, among other factors, the precise definition of three-dimensional descriptors and accurate molecular representations. Hence, within SPP 2363, the synergy between experimental chemistry and computer science will be explored for the benchmarking and development of accurate predictive models for the synthesis of medicinally relevant compounds through asymmetric catalysis. Such breakthroughs will help to establish Germany at the international vanguard of artificial intelligence applied to asymmetric catalysis.

  • Machine Learning for Molecular-Precision Design of Multifunctional Materials (Bocklitz, Presselt)

    Prof. Dr. Thomas Bocklitz, Dr. Martin Presselt

    The behavior of molecules at interfaces and within membranes is critical to many fields, from photoenergy conversion to cell membrane biochemistry and pharmacology. Although synthetic chemistry has developed sophisticated methods for molecular functionalization, predicting molecular behavior at interfaces and in complex or biological environments remains a challenge. Here, machine learning (ML) can make a significant impact, offering potential for predictive models of molecular behavior in these settings — yet it requires extensive, diverse datasets that are often challenging to generate.
    Our project addresses this need by investigating the behavior of functionalized molecules, both in pure form and in mixtures, within molecular monolayers at the air-water interface. These two-dimensional layers serve as ideal model systems, allowing us to integrate molecular modeling with ML to gain insights into molecular behavior in complex environments. Collaborating within SPP 2363, we aim to develop modular ML methods that systematically analyze supramolecular structure formation, ultimately creating new ML models based on these insights. This project will generate reusable ML tools and methodologies that bridge molecular chemistry and ML. Expected outcomes include a software suite for the automated production of molecular layers and innovative molecular representations. The necessary large, independent datasets will be produced through the combined use of roll-to-roll and Langmuir-Blodgett techniques, both of which offer precise molecular layer formation at the air-water interface. A novel suite of diverse in-situ characterization techniques will provide essential spectral and microscopic data.
    Through this project, we will establish a new platform for the fabrication, characterization, and data-driven modeling of molecular ensembles at interfaces. The resulting data will provide critical insights into the behavior of functionalized molecules in complex, dynamic environments and support the advancement of AI models in molecular prediction.

  • Machine Learning for Organocatalysis in the Small Data Regime (Schreiner)

    Prof. Dr. Peter R. Schreiner

    The proposal focuses on advancing machine learning (ML) applications in organocatalysis using curated, high-quality small experimental datasets. Key goals include refining enantioselective catalyst design, especially for challenging reactions like the Corey-Bakshi-Shibata (CBS) reduction and the Dakin-West reaction, using a "key-intermediate-graph" approach. The project will further extend ML applications to ketimine reductions and meso-anhydride desymmetrization, addressing limitations in the current state-of-the-art. We plan on creating a user-interactive ML system for real-time predictions and enhancing dataset quality through modular peptide design. Our efforts also include (inter)national collaborations and making our ML platform available to other groups via bench-top implementations.

  • Molecular Descriptors in Matrix Completion Methods (Jirasek, Leitte)

    Prof. Dr. Fabian Jirasek, Prof. Dr. Heike Leitte

    Matrix completion methods (MCMs), which are established in recommender systems, are also promising for the prediction of fluid properties of mixtures. These MCMs can be trained in a completely data-driven way on sparse mixture data, whereby they uncover and exploit structure and similarities among the components. In this project, we will extend purely data-driven MCMs to hybrid models by incorporating molecular descriptors in their training, specifically molecular graphs and molecular class affiliations, which we will learn based on a systematic analysis of mixture data using visual data analytics. Furthermore, we will extend the hybrid MCMs to predict fundamental pair interactions and develop XAI methods to create an understanding of the ML models and of what matters on the molecular level for describing mixture behavior.
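    The purely data-driven starting point can be illustrated with a small sketch: a rank-1 matrix factorization fitted by alternating least squares to the observed entries of a sparse component-by-component property matrix, then used to predict the missing ones. The numbers, the rank-1 assumption, and the function name `als_rank1` are illustrative assumptions; the hybrid models of the project additionally incorporate molecular descriptors, which this sketch omits.

```python
def als_rank1(entries, n_rows, n_cols, n_iter=200):
    """Fit M[i][j] ~ u[i] * v[j] to the observed entries only.

    entries: dict mapping (row, col) to an observed value.
    Each half-step solves an exact least-squares problem for one factor.
    """
    u = [1.0] * n_rows
    v = [1.0] * n_cols
    for _ in range(n_iter):
        for i in range(n_rows):
            obs = [(j, m) for (r, j), m in entries.items() if r == i]
            denom = sum(v[j] ** 2 for j, _ in obs)
            if denom > 0:
                u[i] = sum(m * v[j] for j, m in obs) / denom
        for j in range(n_cols):
            obs = [(i, m) for (i, c), m in entries.items() if c == j]
            denom = sum(u[i] ** 2 for i, _ in obs)
            if denom > 0:
                v[j] = sum(m * u[i] for i, m in obs) / denom
    return u, v


# Hypothetical 3x3 mixture-property matrix with two unmeasured entries,
# (0, 2) and (2, 0); the hidden truth is the outer product of
# [1, 2, 3] and [1, 2, 4].
observed = {
    (0, 0): 1.0, (0, 1): 2.0,
    (1, 0): 2.0, (1, 1): 4.0, (1, 2): 8.0,
    (2, 1): 6.0, (2, 2): 12.0,
}
u, v = als_rank1(observed, n_rows=3, n_cols=3)
pred_02 = u[0] * v[2]  # prediction for the unmeasured pair (0, 2)
pred_20 = u[2] * v[0]  # prediction for the unmeasured pair (2, 0)
```

    The factors `u` and `v` play the role of learned component features; a hybrid model would tie them to molecular graphs or class affiliations instead of learning them freely for each component.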

  • Multi-fidelity, Active Learning Strategies for Exciton Transfer in Cryptophyte Antenna Complexes (Zaspel, Kleinekathöfer)

    Prof. Dr. Peter Zaspel, Prof. Dr. Ulrich Kleinekathöfer

    Multiscale simulation of light-harvesting complexes is key to fundamental research in photosynthesis and solar cell design. Among the biological species performing light harvesting are cryptophyte algae. 
    The absorption of sunlight in these algae takes place in phycobiliproteins by pigment molecules termed bilins. Due to the flexible nature of the proteins, the simulation of the light-harvesting process requires accurately calculating excitonic properties (excitation energies, couplings, transition dipole moments, etc.) for hundreds of thousands to millions of bilin conformations. Nowadays, large parallel computers allow for initial studies at full scale. Still, such studies are only computationally feasible if excitonic properties are evaluated at rather low, i.e., cheap-to-compute, levels of quantum chemical theory. This strongly limits the expressiveness of the results.
    The overarching goal of this project is to enable highly accurate multiscale simulations of light-harvesting complexes by replacing computationally expensive quantum chemical calculations at a high level of theory with cheap-to-evaluate machine learning (ML) models. To ensure true efficiency of the approach, all ML models have to be constructed such that minimal computational effort is required to build training data that yield models with low prediction error. The aim is to move away from investing arbitrary amounts of computing time in training data generation while reporting only fast model predictions, toward an approach that is efficient in both model construction and model evaluation.
    The main objective of the second funding phase is to bring multi-fidelity machine learning (MFML) into “production”; that is, the approach will be generalized to more diverse chemical properties, made more efficient in the presence of less clear data hierarchies, and automated in an active learning (AL) based, target-error-adaptive construction. MFML techniques will then be provided as a community-wide available software package for further exploration beyond the originally intended application. In addition, for bimolecular learning, the aim is to go beyond an explorative study to show true impact on challenging data with massive cost reductions and better generalizability of models for, e.g., coupling energies. Within this project, both MFML and bimolecular learning will be applied and tested for three phycobiliproteins from cryptophyte algae, namely PC612, PC645, and PE566.
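    The basic multi-fidelity idea can be sketched with two levels: a cheap low-fidelity calculation is available everywhere, and only the difference to the expensive high-fidelity result is learned from a handful of training points. The functions `low_fidelity` and `high_fidelity` are hypothetical toy stand-ins for quantum chemical methods, and the linear correction model is an assumption for illustration; MFML as described above works with hierarchies of several fidelities and more expressive models.

```python
def low_fidelity(x):
    """Hypothetical cheap method, evaluated for every conformation."""
    return x ** 2


def high_fidelity(x):
    """Hypothetical expensive method, affordable only for a few points."""
    return x ** 2 + 0.5 * x + 0.1


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Only three expensive calculations are spent on training data.
train_x = [0.0, 1.0, 3.0]
deltas = [high_fidelity(x) - low_fidelity(x) for x in train_x]
slope, intercept = fit_line(train_x, deltas)


def multi_fidelity(x):
    """Cheap baseline plus the learned correction."""
    return low_fidelity(x) + slope * x + intercept


error_low = abs(low_fidelity(2.0) - high_fidelity(2.0))
error_mf = abs(multi_fidelity(2.0) - high_fidelity(2.0))
```

    Because the difference between fidelities is usually smoother than the property itself, it can be learned from far fewer expensive samples than a direct high-fidelity model would need.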

  • Overcoming the Limits of Remote Functionalization Through Machine Learning Guided Catalyst Identification (Schoenebeck)

    Prof. Dr. Franziska Schoenebeck

    Remote functionalization is an important strategy in homogeneous catalysis and synthesis that allows for the functionalization of a molecule at a site distant from its initial activation. A so-called chain-walking process facilitates the migration of the metal from the initiation site to the remote site. Although significant progress has been made in the field, several challenges remain with regard to the scope, selectivity, and generality of the transformation and catalysts. In this context, we have previously shown an unprecedented speed of a Pd(I) dimer catalyst in remote cross-coupling (arylation). While most catalysts require reaction times of at least 12 h and often elevated temperatures, the dinuclear Pd(I) catalyst accomplished this feat in 10 min at room temperature with exclusive selectivity. The current limitation of this method is the requirement of an ortho-fluorinated substrate, without which the selectivity for remote coupling cannot be reached. Here, we propose a machine learning (ML) approach to find suitable candidates within a large ligand library that are not only able to form Pd(I) dimers (which will allow for a fast reaction) but also trigger high selectivity for the product of remote functionalization, independent of any substrate restrictions.

  • Reliable Scoring and Synthesis of Bioactive Molecules Beyond Combinatorial Chemical Space (Meiler, Stadler)

    Prof. Dr. Jens Meiler, Prof. Dr. Peter F. Stadler

    Computer-aided drug design faces three intertwined computational challenges: the ligand must be functional in its interaction with the designated target, it must be synthetically accessible, and it must be sufficiently different from known compounds to avoid, e.g., competing patents. In the first funding phase, we demonstrated that these goals can be achieved in principle by means of an evolutionary search algorithm. In the second phase, we will further strengthen the search abilities in vast chemical spaces, going beyond the limits of simple combinatorial make-on-demand libraries. To address the need for a more reliable scoring regime for molecules, we propose a novel machine learning model based on electron density predictions, with which we intend to divide protein-ligand complexes into distinctive interactions. These interactions should form the basis of a robust scoring function rooted in physicochemical rules. In addition, we found that target-specific chemical spaces improve docking scores tremendously. This allows reduced sampling and therefore faster runtimes without lowering hit rates. However, it does reintroduce the need for chemical synthesis. The search process itself will therefore be restricted to enforce synthesizability by modifying the cut-and-join crossover developed in the first phase to respect the transformation patterns of chemical reactions, invoking graph transformation grammars. Moreover, search trees leading to good candidates will be systematically simplified using a data-driven approach to shorten synthetic pathways and thereby reduce the cost and complexity of synthesis.