Applying generative neural network models to explore the vast space of possible small molecules has been the focus of much activity in recent years. Early work represented molecules as SMILES strings and generated novel molecules with variational autoencoders (VAEs) or generative adversarial networks (GANs), sometimes combined with reinforcement learning (RL). In this work, the authors also combine a GAN with RL, but they represent molecules as graphs rather than strings. Accordingly, the networks that operate on these graphs are Relational Graph Convolutional Networks (Modeling Relational Data with Graph Convolutional Networks – Schlichtkrull et al. 2017).
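To make the graph representation concrete, here is a minimal sketch of how a small molecule might be encoded as a one-hot node feature matrix plus a one-hot adjacency tensor with one channel per bond type. The atom and bond vocabularies and the helper name `molecule_to_graph` are illustrative assumptions, not taken from the paper's code, and a real implementation would likely also need padding and a "no bond" channel.

```python
# Assumed vocabularies for illustration only.
ATOM_TYPES = ["C", "N", "O", "F"]
BOND_TYPES = ["single", "double", "triple"]


def molecule_to_graph(atoms, bonds):
    """Encode a molecule as (X, A).

    X is an N x T one-hot matrix of atom types.
    A is an N x N x Y one-hot tensor of bond types (symmetric in its first two axes).
    """
    n = len(atoms)
    X = [[1 if t == a else 0 for t in ATOM_TYPES] for a in atoms]
    A = [[[0] * len(BOND_TYPES) for _ in range(n)] for _ in range(n)]
    for i, j, kind in bonds:
        k = BOND_TYPES.index(kind)
        A[i][j][k] = 1  # bonds are undirected, so fill both directions
        A[j][i][k] = 1
    return X, A


# Heavy atoms of acetaldehyde (SMILES: CC=O): C-C single bond, C=O double bond.
atoms = ["C", "C", "O"]
bonds = [(0, 1, "single"), (1, 2, "double")]
X, A = molecule_to_graph(atoms, bonds)
```

A relational graph convolution can then treat each bond-type channel of `A` as a separate relation when aggregating information from neighbouring atoms.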
This approach appears to work well (see Tables 2 & 3 in the paper). However, it comes with the following caveat, which the authors state as follows.
"A central limitation of our current formulation of MolGANs is their susceptibility to mode collapse: both the GAN and the RL objective do not encourage generation of diverse and non-unique outputs whereby the model tends to be pulled towards a solution that only involves little sample variability. This ultimately results in the generation of only a handful of different molecules if training is not stopped early."
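One simple way to watch for this kind of collapse during training is to track how many of the valid generated molecules are distinct. The sketch below is an illustration rather than the authors' evaluation code: it computes a uniqueness ratio (distinct valid samples divided by all valid samples) and assumes the SMILES strings have already been validated and canonicalized, so that identical molecules compare equal as strings. The function name and the sample lists are hypothetical.

```python
def uniqueness(valid_smiles):
    """Fraction of valid generated molecules that are distinct.

    Assumes every string is a valid, canonical SMILES, so two copies of the
    same molecule are identical strings.
    """
    if not valid_smiles:
        return 0.0
    return len(set(valid_smiles)) / len(valid_smiles)


# Hypothetical generator outputs at two points in training.
healthy = ["CCO", "CC=O", "CCN", "C1CC1"]    # four distinct molecules
collapsed = ["CCO", "CCO", "CCO", "CC=O"]    # mostly the same molecule

print(uniqueness(healthy))    # 1.0
print(uniqueness(collapsed))  # 0.5
```

A ratio that keeps falling while validity stays high is the pattern the quoted caveat describes, and it is one possible signal for stopping training early.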
We note that the paper includes many useful references to VAE and GAN approaches to generating novel molecules.
Below is the abstract of MolGAN: An implicit generative model for small molecular graphs.
Deep generative models for graph-structured data offer a new angle on the problem of chemical synthesis: by optimizing differentiable models that directly generate molecular graphs, it is possible to side-step expensive search procedures in the discrete and vast space of chemical structures. We introduce MolGAN, an implicit, likelihood-free generative model for small molecular graphs that circumvents the need for expensive graph matching procedures or node ordering heuristics of previous likelihood-based methods. Our method adapts generative adversarial networks (GANs) to operate directly on graph-structured data. We combine our approach with a reinforcement learning objective to encourage the generation of molecules with specific desired chemical properties. In experiments on the QM9 chemical database, we demonstrate that our model is capable of generating close to 100% valid compounds. MolGAN compares favorably both to recent proposals that use string-based (SMILES) representations of molecules and to a likelihood-based method that directly generates graphs, albeit being susceptible to mode collapse.