“We must learn to understand chemistry as data science”
“Molecular Machine Learning” (MML) is a new branch of research with the potential to change chemical research. Prof. Frank Glorius, coordinator of the new Priority Programme “Molecular Machine Learning” (SPP 2363), funded by the German Research Foundation (DFG), and Philipp Pflüger, who is working on his PhD in Chemistry and helped to develop the programme, explain in this interview with Christina Hoppenbrock what MML means, what opportunities and challenges this new field of research presents, and what working in chemistry will be like in tomorrow’s world.
What overall role does machine learning play in chemistry?
Frank Glorius: Most of the issues we are faced with in chemistry, and in synthetic chemistry in particular, require the ability to recognise complex patterns. We have to recognise connections between the structures of different molecules, reactions or analytical data. Although chemists are already doing a lot, we’re reaching our limits when dealing with large quantities of data – for example thousands of chemical reactions. In such cases, machine learning can help in recognising important patterns and, based on that, in drawing up generally valid models. The predictions made by these models can then be used by chemists in the laboratory to gauge the yields from reactions.
Philipp Pflüger: Chemistry, and industry in particular, are undergoing a real transformation here. We see appropriate algorithms and models providing us with opportunities to work more efficiently and solve problems faster. Machine learning can take over certain tasks for us, or can make them easier. As a result, chemists can focus more on creative, innovative tasks.
And what does the word “molecular” in “molecular machine learning” refer to?
Frank Glorius: The concept of machine learning has existed since the 1950s, and as computers have become ever more powerful, various disciplines have been making use of these “self-learning algorithms”. This is relatively easy in text recognition or in processing numerical values, as programs can process these types of data directly. It’s different for molecules and reactions, though. There is no standard, machine-readable format in which we can feed our material into a program. Although there are one or two approaches to doing so, this so-called encoding of molecules still presents us with enormous challenges.
Philipp Pflüger: MML can help us to understand and use chemical data – but first we have to explore these technical methods, adapt them to chemical problems and sometimes develop them anew. Only then can we see what self-learning algorithms can do for chemistry and, in particular, see where any limitations are.
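To make the encoding problem described above more concrete, here is a minimal sketch in Python. It is an illustration only, not a method used in the SPP 2363. It assumes that molecules are written as SMILES strings (one existing text notation for molecules) and uses a hypothetical helper, `atom_count_vector`, that turns such a string into a fixed-length list of atom counts. The simplified token pattern and the chosen element order are assumptions made for this example; bracket atoms, charges and stereochemistry are ignored.

```python
import re
from collections import Counter
from typing import List

# Simplified, hypothetical token pattern: two-letter halogens first, then
# single-letter atoms (uppercase = aliphatic, lowercase = aromatic).
ATOM_PATTERN = re.compile(r"Cl|Br|[BCNOPSFI]|[bcnops]")

# Fixed element order so that every molecule maps to a vector of equal length.
ELEMENTS = ["C", "N", "O", "S", "F", "Cl", "Br"]


def atom_count_vector(smiles: str) -> List[int]:
    """Count heavy atoms per element in a simple SMILES string."""
    tokens = ATOM_PATTERN.findall(smiles)
    # Aromatic atoms are lowercase in SMILES; capitalize() maps "c" to "C"
    # and leaves "Cl" and "Br" unchanged.
    counts = Counter(token.capitalize() for token in tokens)
    return [counts[element] for element in ELEMENTS]


ethanol = atom_count_vector("CCO")
dimethyl_ether = atom_count_vector("COC")
chlorobenzene = atom_count_vector("Clc1ccccc1")

# Ethanol and dimethyl ether are different molecules, yet this naive
# encoding gives them the same vector: the connectivity is lost.
print(ethanol, dimethyl_ether, ethanol == dimethyl_ether)
print(chlorobenzene)
```

Ethanol and dimethyl ether end up with identical vectors although they are different compounds. This is one simple way of seeing why a numerical representation that a learning algorithm can use, and that still captures the structure of a molecule, is hard to design.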
As a rule, chemists have no training in IT. Who is to develop the necessary computer programmes?
Frank Glorius: That’s true – our curricula don’t provide for much teaching on how to use programming languages, which means that PhD students have to learn a lot at a relatively late stage. But it can’t, and shouldn’t, be the aim for all chemists to become software developers. What our experience teaches us here is that only collaboration can provide the key to success in such an interdisciplinary field.
But how is the necessary expertise pooled?
Frank Glorius: It was clear to us that a platform would be needed to bring together IT specialists, chemists, pharmacists, mathematicians and many others besides. We took a systematic look at the people working in this field in Germany and were able to set up an initial network. The feedback which we received was overwhelmingly positive. The SPP 2363 now offers experts from all fields an opportunity to exchange views and ideas on MML, discover synergies and, ultimately, actively shape what is an important area.
To what extent do students benefit from it?
Philipp Pflüger: I hope that we students can learn a lot from one another. In my experience, many young PhD students want to take a detailed look at topics from other fields in order to learn things that are fundamentally new to them. In research groups, which are often homogeneous, this is not easy to do, and it requires students to show a great deal of initiative of their own. I’m convinced that programmes such as the SPP will support this interest and broaden the horizons of budding researchers.
What can young people studying chemistry today expect? Will a chemist’s job be different in future from today?
Frank Glorius: Just as a chemist’s job was very different 20 years ago from what it is today, it will be just as different again in ten years’ time. Of course, technical developments will play a decisive role here. As we can already see in some areas, in future we will have methods that support chemists at every step of their everyday working lives. We need to be open to all these tools and allow change to happen. Technology always carries a risk of failure, and we should establish a culture of error tolerance. We need to learn to see chemistry as a data science so as to be able to look at our results systematically. This data awareness is slowly developing in industry, and we as a scientific community are also beginning to consider the value of our data as a whole.
Philipp Pflüger, you are a young chemist. This ‘tomorrow’s world’ is going to be your everyday working life …
Philipp Pflüger: I have the good fortune to be able to get to know both sides of the same area: chemistry work in the lab, in all its many facets, and cheminformatics with new perspectives and approaches. I see incredible potential in a combination of the two for the development of new tools. I’d very much like to be a part of this change that’s happening now, and I hope to be able to support my colleagues in the labs with creative, user-friendly solutions. My hope is that we can work together to realise the full potential of MML. In my opinion, this task also includes developing future applications in such a way that chemists do not need any IT knowledge – the software developed should be an intuitive tool.
What role does SPP 2363 play in preparing young people?
Philipp Pflüger: Excellent research is sustained by well-trained young people. So it’s incredibly important to me that this priority programme – which is geared primarily to research – should also bring about change in teaching. I hope that the researchers involved will develop an interest in integrating topics such as data science or cheminformatics into curricula, and I would like to see us PhD students contributing to this. In this respect, the SPP will serve as a platform for exchanging ideas, sharing teaching experiences and reducing inhibitions. In computer science, as well as in chemistry, we have already given some initial seminars on subjects such as cheminformatics and machine learning. Again and again, we were delighted to see how much interest the students showed in interdisciplinary research. My hope is that the SPP will provide us with an opportunity to build up long-term collaborations which will then sustain joint teaching across the various disciplines.
Frank Glorius: What’s important here, in addition to teaching, is training PhD students and future professors. Our SPP gives junior researchers a chance to pursue their own development in this modern field.
Specialists in synthetic chemistry often say that there is an aesthetic inherent in molecular reactions. This beauty or elegance – which is not immediately apparent to non-specialists – is, for many chemists, a spur to creating and optimising synthetic pathways …
Philipp Pflüger: … Of course aesthetics and elegance are central elements in organic chemistry, and sometimes we see ourselves as molecular architects. But synthetic chemistry, as an application-oriented science, has to be elegant per se in order to be efficient and usable. Here we could describe many a quest for elegance more as a necessity and less as an intrinsic spur – although designing methods and syntheses always requires a special creativity. The same is true for chess: the game and some of its moves are said to have a certain aesthetic, and with the arrival of the chess computer we were able to see that artificial intelligence (AI) does indeed play differently from the way people do. These algorithms develop entirely new strategies and moves with their own elegance, and as a result we are now at a point where we train people by means of AI. In the long term, this trend – in which we can learn from computers and combine our “creativities” – can also be expected to happen in chemistry.
Frank Glorius: At the moment we can’t yet say when that will happen because – in chemistry especially – this particular field is only just being developed. As always, whenever a hype or a trend emerges, there are some people who overestimate the technology – and others who underestimate it. The truth often lies somewhere in between, and we can only do our best to explore these new tools responsibly and then develop them further. I take a pragmatic view: humans and machines have different strengths and weaknesses. We humans have to make sure that we combine artificial intelligence sensibly with what already exists. With this aim in mind, a symbiosis of human skills and artificial intelligence would be very elegant indeed.
The Priority Programme “Molecular Machine Learning” starts in spring 2022. Even before it kicks off, the organisers are looking for students who would like to become involved. Anyone interested should contact Prof. Frank Glorius (firstname.lastname@example.org).
Save the date:
13 January 2022, 3 pm (online via Zoom): 4th International Mini-Symposium on Molecular Machine Learning (MML) 2022
The international series of symposia on Molecular Machine Learning brings together leading international scientists from fields of research such as automation, reaction prediction, computer-assisted reaction planning, pharmaceuticals and data-based molecular design. The conferences, organised by Prof. Frank Glorius, aim to present the most up-to-date research and ideas, thus contributing towards a strong, transparent machine learning community in chemistry. Anyone interested is very welcome to attend.