Taking SMILES representations of small molecules, converting them to 2D drawings, and adding domain information as image channels, the authors use an Inception-ResNet deep convolutional neural network (CNN) to solve various cheminformatics problems. This work builds on the earlier image-only approach of Chemception: A Deep Neural Network with Minimal Chemistry Knowledge Matches the Performance of Expert-developed QSAR/QSPR Models. The domain information includes atomic number, bond order, partial charge, hybridization, and valence.
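To make the channel-augmentation idea concrete, here is a minimal sketch of rasterizing a 2D molecular layout into a multi-channel image, with atomic number, partial charge, valence, and bond order as separate channels. This is an illustration of the general technique, not the paper's pipeline: the coordinates, charges, and valences for ethanol below are hardcoded illustrative values (in practice one would compute them with a toolkit such as RDKit), and the channel set is a simplified subset of the paper's.

```python
import numpy as np

def rasterize_molecule(atoms, bonds, size=48, scale=8):
    """Rasterize a 2D molecular layout into a multi-channel image.

    Channels (a simplified subset of the paper's augmentation):
      0: atomic number, 1: partial charge, 2: valence, 3: bond order.
    `atoms` is a list of (x, y, atomic_num, charge, valence);
    `bonds` is a list of (i, j, order) atom-index pairs.
    """
    img = np.zeros((4, size, size), dtype=np.float32)

    # Center the molecule on the grid.
    xs = [a[0] for a in atoms]
    ys = [a[1] for a in atoms]
    cx, cy = (min(xs) + max(xs)) / 2, (min(ys) + max(ys)) / 2

    def to_px(x, y):
        return (int(round((x - cx) * scale + size / 2)),
                int(round((y - cy) * scale + size / 2)))

    # Paint per-atom properties at each atom's pixel.
    for x, y, z, q, v in atoms:
        px, py = to_px(x, y)
        img[0, py, px] = z
        img[1, py, px] = q
        img[2, py, px] = v

    # Paint bond order at each bond's midpoint (midpoint only, for brevity;
    # a full implementation would draw the whole bond line).
    for i, j, order in bonds:
        (x1, y1), (x2, y2) = atoms[i][:2], atoms[j][:2]
        mx, my = to_px((x1 + x2) / 2, (y1 + y2) / 2)
        img[3, my, mx] = order
    return img

# Hardcoded toy layout for ethanol (C-C-O); coordinates, charges,
# and valences are illustrative values, not toolkit output.
atoms = [(0.0, 0.0, 6, -0.04, 4),   # C
         (1.5, 0.0, 6,  0.06, 4),   # C
         (2.25, 1.3, 8, -0.40, 2)]  # O
bonds = [(0, 1, 1.0), (1, 2, 1.0)]
image = rasterize_molecule(atoms, bonds)
print(image.shape)  # (4, 48, 48)
```

The resulting array can be fed to a CNN exactly like an RGB image, except each channel now carries chemistry-aware information rather than color.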
Compared to multilayer perceptrons trained on Morgan fingerprints and a graph convolutional network, the augmented Chemception model showed mixed performance. This is typical for cheminformatics problems: various flavors of graph networks tend to perform best, but not always, and no single type of graph network outperforms the others on all tasks. Such results suggest that ensembling would be the preferred approach. Even so, the method presented here is both interesting and useful.
The authors did not provide code, but Esben Jannik Bjerrum did in his post Learn how to teach your computer to “See” Chemistry: Free Chemception models with RDKit and Keras.
Below is the abstract of How Much Chemistry Does a Deep Neural Network Need to Know to Make Accurate Predictions?
The meteoric rise of deep learning models in computer vision research, having achieved human-level accuracy in image recognition tasks is firm evidence of the impact of representation learning of deep neural networks. In the chemistry domain, recent advances have also led to the development of similar CNN models, such as Chemception, that is trained to predict chemical properties using images of molecular drawings. In this work, we investigate the effects of systematically removing and adding localized domain-specific information to the image channels of the training data. By augmenting images with only 3 additional basic information, and without introducing any architectural changes, we demonstrate that an augmented Chemception (AugChemception) outperforms the original model in the prediction of toxicity, activity, and solvation free energy. Then, by altering the information content in the images, and examining the resulting model’s performance, we also identify two distinct learning patterns in predicting toxicity/activity as compared to solvation free energy. These patterns suggest that Chemception is learning about its tasks in the manner that is consistent with established knowledge. Thus, our work demonstrates that advanced chemical knowledge is not a pre-requisite for deep learning models to accurately predict complex chemical properties.