Molecular Machine Learning (MML)

During the last decade, modern machine learning has found its way into all areas of chemistry. Some long‐standing challenges, such as computer‐aided synthesis planning (CASP), have been successfully addressed, while other issues have barely been touched.

Machine learning (ML) is a method for automatically building statistical models that recognize patterns in data and apply them to unseen inputs. What sounds simple at first has developed rapidly over the past decade, with increasingly complex models competing for higher predictive power. These developments have allowed deep learning to find its way into almost all areas of daily life and have brought major changes to society. Current developments suggest that ML will have an immense influence on, for example, health care, crime prevention, and traffic.
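As a minimal sketch of "recognizing patterns in data and applying them to unseen inputs", the following Python snippet uses a k-nearest-neighbour classifier. This model is chosen here only for illustration and is not taken from the article. The training points and the labels "A" and "B" are hypothetical toy data, and the function name `knn_predict` is likewise an illustrative assumption.

```python
import math
from collections import Counter


def knn_predict(training_data, query, k=3):
    """Predict a label for `query` by majority vote among its k nearest training points."""
    # Sort training examples by Euclidean distance to the query and keep the k closest.
    nearest = sorted(training_data, key=lambda item: math.dist(item[0], query))[:k]
    # Each of the k neighbours votes with its label; the most common label wins.
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical toy data: two-dimensional feature vectors with class labels.
training_data = [
    ((0.0, 0.0), "A"), ((0.2, 0.1), "A"), ((0.1, 0.3), "A"),
    ((1.0, 1.0), "B"), ((0.9, 1.2), "B"), ((1.1, 0.8), "B"),
]

# Apply the model to inputs that were not part of the training data.
print(knn_predict(training_data, (0.15, 0.2)))  # → A
print(knn_predict(training_data, (1.0, 0.9)))   # → B
```

Here the "model" is simply the stored training set plus a distance-based voting rule. More complex models, such as neural networks, replace this fixed rule with parameters learned from the data.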

However, as a result of their high complexity, modern ML models also entail specific risks and pitfalls. Both algorithms and data can be shaped by human choices, leading to systematically skewed models (algorithmic bias). Models can also treat noise as a relevant signal and fail to generalize the underlying problem, which results in poor performance on new data (overfitting). In addition, most models do not allow the path from the input to the generated output to be traced and therefore cannot be interpreted (black box). Because of these risks, results must be verified thoroughly, and models and methods must be selected and scrutinized carefully.
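Overfitting can be illustrated with a small, self-contained Python sketch. This illustration is not from the article. It compares a degree-9 polynomial that passes exactly through ten noisy training points with a least-squares straight line. The underlying relation y = 2x + 1, the alternating noise of ±0.3, and all function names are hypothetical choices made for this example.

```python
def true_function(x):
    # Hypothetical noise-free relationship that both models should learn.
    return 2.0 * x + 1.0


# Ten equally spaced training inputs with deterministic alternating noise of +/-0.3.
train_x = [float(i) for i in range(10)]
train_y = [true_function(x) + 0.3 * (-1) ** i for i, x in enumerate(train_x)]

# Unseen inputs: the midpoints between neighbouring training inputs.
test_x = [i + 0.5 for i in range(9)]
test_y = [true_function(x) for x in test_x]


def interpolate(xs, ys, x):
    """Evaluate the Lagrange polynomial passing through every (xs[i], ys[i])."""
    total = 0.0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        term = yi
        for j, xj in enumerate(xs):
            if j != i:
                term *= (x - xj) / (xi - xj)
        total += term
    return total


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def mse(predict, xs, ys):
    """Mean squared error of `predict` on the points (xs, ys)."""
    return sum((predict(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


slope, intercept = fit_line(train_x, train_y)


def line_predict(x):
    return slope * x + intercept


def poly_predict(x):
    return interpolate(train_x, train_y, x)


# Training error versus error on unseen inputs (measured against the true function).
results = {
    "line": (mse(line_predict, train_x, train_y), mse(line_predict, test_x, test_y)),
    "poly": (mse(poly_predict, train_x, train_y), mse(poly_predict, test_x, test_y)),
}
for name, (train_err, test_err) in results.items():
    print(f"{name}: train MSE = {train_err:.3f}, test MSE = {test_err:.3f}")
```

The interpolating polynomial reproduces the training data perfectly, with a training error of zero. However, it oscillates strongly between the points, especially near the edges, so its error on the unseen midpoints is far larger than that of the straight line. The line does not try to follow the noise.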

Since problems in chemistry and drug design are often complex pattern‐recognition tasks, it is not surprising that ML has recently received increasing attention. However, contrary to the expectation that developments in this field have fundamentally changed areas such as synthesis, current approaches are often limited or leave significant room for improvement. The reasons for this are manifold. ML models cannot interpret molecular structures directly, so computer‐readable representations are required. In addition, meaningfully constructing and understanding modern neural networks (NNs) requires a high level of expertise in both chemistry and computer science. This combination is rarely found within a single research group. All this leaves enormous potential for new approaches and future developments.
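To make the need for computer-readable representations concrete, the sketch below turns a SMILES string into a fixed-length vector of heavy-atom counts. This toy encoding is an assumption for illustration only. It handles only organic-subset atoms written outside brackets, and it ignores hydrogens, charges, bonds, and stereochemistry. It is far simpler than the representations used in practice. The names `atom_count_vector`, `ELEMENTS`, and `ATOM_TOKEN` are hypothetical.

```python
import re

# Element vocabulary that fixes the position of each count in the vector (illustrative choice).
ELEMENTS = ("C", "N", "O", "S", "P", "F", "Cl", "Br", "I", "B")

# Organic-subset atom symbols; "Cl" and "Br" are tried first so "Cl" is not read as "C".
# Lowercase letters denote aromatic atoms (e.g. "c" in benzene).
ATOM_TOKEN = re.compile(r"Cl|Br|[BCNOPSFI]|[bcnops]")


def atom_count_vector(smiles):
    """Map a simple SMILES string to a fixed-length vector of heavy-atom counts."""
    if "[" in smiles:
        raise ValueError("bracket atoms such as [nH] or [Na+] are not handled in this sketch")
    counts = dict.fromkeys(ELEMENTS, 0)
    for token in ATOM_TOKEN.findall(smiles):
        # Aromatic lowercase symbols are counted under their uppercase element.
        symbol = token if token in counts else token.upper()
        counts[symbol] += 1
    return [counts[element] for element in ELEMENTS]


print(atom_count_vector("CCO"))         # ethanol → [2, 0, 1, 0, 0, 0, 0, 0, 0, 0]
print(atom_count_vector("COC"))         # dimethyl ether → [2, 0, 1, 0, 0, 0, 0, 0, 0, 0]
print(atom_count_vector("Clc1ccccc1"))  # chlorobenzene → [6, 0, 0, 0, 0, 0, 1, 0, 0, 0]
```

Ethanol and dimethyl ether receive identical vectors because atom counts discard connectivity. This shows how strongly the choice of representation affects what a model can learn.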

P. M. Pflüger, F. Glorius, Molecular Machine Learning: The Future of Synthetic Chemistry?, Angew. Chem. Int. Ed. 2020, 59, 18860-18865; Angew. Chem. 2020, 132, 19020-19025. DOI