Neural Fingerprints

Although machine learning methods have been applied in drug design for decades, artificial intelligence is expected to have an ever-increasing impact on drug discovery projects. In particular, the increasing success of new deep learning methods and the growing availability of bioactivity data raise hopes for a fundamental change in the way drugs are developed. Computers are expected to be able to design and predict new molecules with desired properties, and at least a partial transfer of decision-making power to machine intelligence could become possible.¹

For a long time, molecular fingerprints have been used as input for training artificial neural networks on molecular data. Molecular fingerprints convert molecules into numeric vectors of fixed length. Rather than encoding the complete molecule, this representation captures structural characteristics and chemical properties. Graph Neural Networks (GNNs) have attracted much attention lately because of their ability to learn to encode molecules without requiring precomputed features. GNNs were introduced as a neural network-based alternative to traditional fingerprints. Until now, only some basic properties have been investigated, and these graph-convolution-based neural fingerprints have not been evaluated in similarity search.
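To make the idea of a fixed-length fingerprint concrete, the following minimal Python sketch hashes short substrings of a SMILES string into a bit vector and compares two vectors with the Tanimoto coefficient. It is a toy stand-in, not ECFP4 or any fingerprint used in the work described here: the function names, the 64-bit length, and the substring scheme are assumptions chosen purely for illustration, and real fingerprints operate on the molecular graph rather than on raw text.

```python
import zlib


def toy_fingerprint(smiles, n_bits=64, max_len=3):
    """Hash every substring of up to max_len characters into n_bits bits.

    Hypothetical toy scheme for illustration only; it is not ECFP4.
    """
    bits = [0] * n_bits
    for size in range(1, max_len + 1):
        for start in range(len(smiles) - size + 1):
            fragment = smiles[start:start + size]
            # crc32 is deterministic across runs, unlike the built-in hash()
            bits[zlib.crc32(fragment.encode()) % n_bits] = 1
    return bits


def tanimoto(a, b):
    """Tanimoto coefficient: shared on-bits divided by the union of on-bits."""
    both = sum(x & y for x, y in zip(a, b))
    either = sum(x | y for x, y in zip(a, b))
    return both / either if either else 0.0


ethanol = toy_fingerprint("CCO")
propanol = toy_fingerprint("CCCO")
benzene = toy_fingerprint("c1ccccc1")

print(len(ethanol), len(propanol), len(benzene))  # every vector has the same length
print(tanimoto(ethanol, propanol))
print(tanimoto(ethanol, benzene))
```

Whatever the size of the molecule, the resulting vector has the same length, which is what allows fingerprints to serve directly as input to a neural network or to a similarity measure.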

We addressed this deficit by training neural networks and subsequently evaluating the performance of their fingerprints in similarity search.² We expect neural networks to combine information about the molecular space of already known bioactive compounds with information on the molecular structure of the query, and thereby to enrich the fingerprint. A graph convolutional network (GCN) and a much simpler multilayer perceptron (MLP), which uses a traditional precomputed fingerprint as input, were trained for target prediction on an exhaustive kinase bioactivity dataset [4]. We show that the novel idea of using the activations of trained neural networks as neural fingerprints for similarity search clearly outperforms traditional fingerprints such as ECFP4, even when specific kinase targets are not included in the training data. Surprisingly, the GCN-based fingerprint performed significantly worse in similarity search than the MLP-based fingerprint.
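The MLP-based approach described above can be sketched in a few lines of plain Python: a precomputed fingerprint is passed through the hidden layers of a network, the resulting activations are taken as the neural fingerprint, and library compounds are ranked by their similarity to the query. This is only a sketch under stated assumptions. The weights below are random placeholders rather than the result of training on kinase data, the layer sizes are arbitrary, and the choice of the last hidden layer and of cosine similarity is an assumption, since neither is specified in this text.

```python
import math
import random


def make_layer(n_in, n_out, rng):
    """Random placeholder weights; a real model would load trained weights."""
    scale = 1.0 / math.sqrt(n_in)
    weights = [[rng.gauss(0.0, scale) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def dense_relu(x, weights, biases):
    """One fully connected layer followed by a ReLU activation."""
    return [max(0.0, sum(w * v for w, v in zip(row, x)) + b)
            for row, b in zip(weights, biases)]


def neural_fingerprint(input_fp, hidden_layers):
    """Return the activations of the last hidden layer (assumed choice)."""
    activations = input_fp
    for weights, biases in hidden_layers:
        activations = dense_relu(activations, weights, biases)
    return activations


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def similarity_search(query_fp, library, hidden_layers):
    """Rank library compounds by neural-fingerprint similarity to the query."""
    query_nfp = neural_fingerprint(query_fp, hidden_layers)
    scored = [(name, cosine(query_nfp, neural_fingerprint(fp, hidden_layers)))
              for name, fp in library.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


rng = random.Random(0)
hidden_layers = [make_layer(16, 8, rng), make_layer(8, 4, rng)]
# Random 16-bit vectors stand in for precomputed fingerprints of real compounds.
library = {f"mol_{i}": [rng.randint(0, 1) for _ in range(16)] for i in range(5)}
ranking = similarity_search(library["mol_2"], library, hidden_layers)
for name, score in ranking:
    print(name, round(score, 3))
```

In a trained network the hidden layers would have learned which input features matter for target prediction, which is the sense in which the activations can carry domain knowledge that the input fingerprint alone does not.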

The main aim is to develop a domain-specific neural fingerprint that does not depend on a precalculated fingerprint and shows superior performance in similarity search. This requires an alternative to the graph convolutional network. From the application perspective, domain-specific neural fingerprints should be developed for different applicability domains. Janosch Menke is working on this research topic.

References

Schneider, P., et al. Rethinking drug design in the artificial intelligence era. Nat. Rev. Drug Discov. 2020, 19, 353-364.
Menke, J.; Koch, O. Using Domain-specific Fingerprints Generated Through Neural Networks to Enhance Ligand-based Virtual Screening. ChemRxiv 2020, preprint. https://doi.org/10.26434/chemrxiv.12894800