This is a very interesting paper in which the authors train a message passing neural network on a carefully selected data set to predict antibacterial activity against E. coli. They then apply the model to other chemical libraries, prioritizing molecules that are structurally different from existing antibiotics (low Tanimoto similarity to known compounds), to find candidates for new antibiotics. In the authors' words:

First, we trained a deep neural network model to predict growth inhibition of Escherichia coli using a collection of 2,335 molecules. Second, we applied the resulting model to several discrete chemical libraries, comprising >107 million molecules, to identify potential lead compounds with activity against E. coli. After ranking the compounds according to the model’s predicted score, we lastly selected a list of candidates based on a pre-specified prediction score threshold, chemical structure, and availability.

Next, all 2,335 compounds from the primary training dataset were binarized as hit or non-hit. After binarization, we used these data to train a binary classification model that predicts the probability of whether a new compound will inhibit the growth of E. coli based on its structure. For this purpose, we utilized a directed-message passing deep neural network model (Yang et al., 2019b), which translates the graph representation of a molecule into a continuous vector via a directed bond-based message passing approach. This builds a molecular representation by iteratively aggregating the features of individual atoms and bonds. The model operates by passing “messages” along bonds that encode information about neighboring atoms and bonds. By applying this message passing operation multiple times, the model constructs higher-level bond messages that contain information about larger chemical substructures. The highest-level bond messages are then combined into a single continuous vector representing the entire molecule. Given the limited amount of data available for training the model, it was important to ensure that the model could generalize without overfitting. Therefore, we augmented the learned representation with molecular features computed by RDKit (Landrum, 2006) (Table S2A), yielding a hybrid molecular representation. We further increased the algorithm’s robustness by utilizing an ensemble of classifiers and estimating hyperparameters with Bayesian optimization. The resulting model achieved a receiver operating characteristic curve-area under the curve (ROC-AUC) of 0.896 on the test data.
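To make the message passing description above more concrete, here is a toy sketch in plain Python. It is not the Chemprop implementation: there are no learned weights, the initial message on each directed bond is simply the source atom's feature vector, and the readout is a plain sum. The function name `directed_message_passing` and the feature encoding are assumptions made for illustration only.

```python
from collections import defaultdict


def directed_message_passing(atom_feats, bonds, steps=3):
    """Toy directed bond-based message passing (no learned weights).

    atom_feats: list of per-atom feature vectors (lists of numbers).
    bonds: list of undirected (u, v) atom-index pairs.
    Returns a single vector representing the whole molecule.
    """
    dim = len(atom_feats[0])
    # Each undirected bond becomes two directed edges, u->v and v->u.
    edges = [(u, v) for u, v in bonds] + [(v, u) for u, v in bonds]
    incoming = defaultdict(list)  # atom index -> directed edges pointing into it
    for u, v in edges:
        incoming[v].append((u, v))

    # Assumption: the initial message on edge u->v is the feature vector of u.
    h0 = {e: [float(x) for x in atom_feats[e[0]]] for e in edges}
    h = dict(h0)

    for _ in range(steps):
        new_h = {}
        for u, v in edges:
            # Aggregate messages arriving at u, skipping the reverse edge v->u
            # so information does not bounce straight back along the same bond.
            agg = [0.0] * dim
            for w, _ in incoming[u]:
                if w == v:
                    continue
                for i in range(dim):
                    agg[i] += h[(w, u)][i]
            # Skip connection to the initial message, followed by ReLU.
            new_h[(u, v)] = [max(0.0, h0[(u, v)][i] + agg[i]) for i in range(dim)]
        h = new_h

    # Readout: sum the final messages over all directed bonds.
    mol = [0.0] * dim
    for e in edges:
        for i in range(dim):
            mol[i] += h[e][i]
    return mol


# A three-atom chain 0-1-2 with one-hot features for two atom types.
chain_feats = [[1, 0], [0, 1], [1, 0]]
chain_bonds = [(0, 1), (1, 2)]
molecule_vector = directed_message_passing(chain_feats, chain_bonds, steps=1)
```

Each additional step lets a bond message see one bond further into the molecule, which is the sense in which repeated passes capture larger substructures. A real model would replace the fixed sums with learned linear layers and feed the final vector, together with the RDKit features, into a classifier.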

While there has been much controversy about actual drug discovery via generative neural network models, it is interesting that these researchers take a different approach: they build a predictive model and then use it to screen large databases of existing molecules. In this sense, machine learning is one more tool in the molecular screening toolkit.
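The screen-then-filter workflow can be sketched in a few lines of Python. This is only an illustration, not the authors' code: the thresholds (`score_threshold`, `max_similarity`), the function names, and the representation of fingerprints as plain sets of bit indices are all assumptions made for the example. In practice the scores would come from the trained model and the fingerprints from a cheminformatics toolkit such as RDKit.

```python
def tanimoto(a, b):
    """Tanimoto (Jaccard) similarity between two sets of fingerprint bits."""
    a, b = set(a), set(b)
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def select_candidates(library, known_antibiotics,
                      score_threshold=0.8, max_similarity=0.4):
    """Keep high-scoring molecules that are dissimilar to known antibiotics.

    library: list of (name, predicted_score, fingerprint_bits) tuples.
    known_antibiotics: list of fingerprint bit sets for reference compounds.
    Both thresholds are hypothetical values chosen for this example.
    """
    hits = []
    for name, score, fp in library:
        if score < score_threshold:
            continue  # the model does not predict enough activity
        # Similarity to the closest known antibiotic.
        nearest = max((tanimoto(fp, ref) for ref in known_antibiotics),
                      default=0.0)
        if nearest <= max_similarity:
            hits.append((name, score, nearest))
    # Rank the surviving candidates by predicted score, best first.
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


# Made-up library entries: (name, predicted score, fingerprint bits).
library = [
    ("a", 0.95, {1, 2, 3}),   # high score, but close to a known antibiotic
    ("b", 0.90, {7, 8, 9}),   # high score and structurally distant
    ("c", 0.50, {10, 11}),    # score below the threshold
    ("d", 0.99, {20, 21}),    # high score and structurally distant
]
known = [{1, 2, 3, 4}]
candidates = select_candidates(library, known)
```

With these made-up inputs, only "d" and "b" survive both filters, in that order. The final selection in the paper also considered availability, which this sketch leaves out.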

Code is available at https://github.com/chemprop/chemprop

The abstract of "A Deep Learning Approach to Antibiotic Discovery" is below.

Due to the rapid emergence of antibiotic-resistant bacteria, there is a growing need to discover new antibiotics. To address this challenge, we trained a deep neural network capable of predicting molecules with antibacterial activity. We performed predictions on multiple chemical libraries and discovered a molecule from the Drug Repurposing Hub—halicin—that is structurally divergent from conventional antibiotics and displays bactericidal activity against a wide phylogenetic spectrum of pathogens including Mycobacterium tuberculosis and carbapenem-resistant Enterobacteriaceae. Halicin also effectively treated Clostridioides difficile and pan-resistant Acinetobacter baumannii infections in murine models. Additionally, from a discrete set of 23 empirically tested predictions from >107 million molecules curated from the ZINC15 database, our model identified eight antibacterial compounds that are structurally distant from known antibiotics. This work highlights the utility of deep learning approaches to expand our antibiotic arsenal through the discovery of structurally distinct antibacterial molecules.

H/T Chicago Ji