In this paper, the authors use a genetic algorithm (GA) operating on the SELFIES (SELF-referencIng Embedded Strings) representation of molecules to explore the vast space of small molecules. A neural network guides the exploration, and fitness functions are constructed to steer generation toward molecules with specific properties.
The generator is a genetic algorithm with a population of molecules m. In each generation, the fitness of every molecule is evaluated as a linear combination of the molecular properties J(m) and the discriminator score D(m), where the coefficient beta weights the discriminator term:
F(m) = J(m) + beta*D(m)
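As a rough illustration of the fitness expression above, the following Python sketch evaluates F(m) for a whole population. The names `property_score` and `discriminator_score` are hypothetical stand-ins for J and D; the toy scorers in the usage note are placeholders, not the property objectives or the discriminator used in the paper.

```python
from typing import Callable, List, Sequence


def fitness(
    molecules: Sequence[str],
    property_score: Callable[[str], float],       # stand-in for J(m), assumed
    discriminator_score: Callable[[str], float],  # stand-in for D(m), assumed
    beta: float,
) -> List[float]:
    """Return F(m) = J(m) + beta * D(m) for each molecule string."""
    return [
        property_score(m) + beta * discriminator_score(m)
        for m in molecules
    ]
```

For example, with string length as a toy property score and a constant discriminator score of 0.5, `fitness(["C", "CC"], len, lambda m: 0.5, beta=2.0)` evaluates 1 + 2.0 * 0.5 and 2 + 2.0 * 0.5. Setting beta to 0 removes the discriminator's influence entirely.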
Random mutations of high-fitness (best-performing) molecules replace inferior members, while the best-performing molecules carry over to the next generation. The probability of replacing a molecule is given by a smooth logistic function of its fitness rank within the generation. At the end of each generation, a neural-network-based discriminator is trained jointly on molecules generated by the GA and on a reference data set. Because the fitness evaluation includes the discriminator's prediction for each molecule, the discriminator also influences the selection of the subsequent population.
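The rank-based replacement step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the exact logistic form and the parameters `cutoff` and `steepness` are hypothetical, `mutate` is a placeholder for a random SELFIES mutation, and discriminator training is left out.

```python
import math
import random
from typing import Callable, List, Optional, Sequence


def replacement_probabilities(
    fitnesses: Sequence[float], cutoff: float, steepness: float = 1.0
) -> List[float]:
    """Map each molecule's fitness rank (0 = best) to a replacement probability.

    Assumed form: a logistic curve centred on `cutoff`, so well-ranked
    molecules are rarely replaced and poorly ranked ones almost always are.
    """
    order = sorted(range(len(fitnesses)), key=lambda i: fitnesses[i], reverse=True)
    probs = [0.0] * len(fitnesses)
    for rank, idx in enumerate(order):
        probs[idx] = 1.0 / (1.0 + math.exp(-steepness * (rank - cutoff)))
    return probs


def next_generation(
    population: Sequence[str],
    fitnesses: Sequence[float],
    mutate: Callable[[str], str],  # placeholder for a random mutation
    cutoff: float,
    steepness: float = 1.0,
    rng: Optional[random.Random] = None,
) -> List[str]:
    """Replace molecules at random according to their rank-based probability.

    Replaced members become mutated copies of the top-ranked molecules;
    the others are carried over unchanged.
    """
    rng = rng or random.Random()
    probs = replacement_probabilities(fitnesses, cutoff, steepness)
    order = sorted(range(len(population)), key=lambda i: fitnesses[i], reverse=True)
    n_parents = max(1, int(cutoff))
    parents = [population[i] for i in order[:n_parents]]
    new_population = []
    for molecule, p in zip(population, probs):
        if rng.random() < p:
            new_population.append(mutate(rng.choice(parents)))
        else:
            new_population.append(molecule)
    return new_population
```

In a full loop, the fitness values passed to `next_generation` would already include the discriminator term, which is how the discriminator affects which molecules survive.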
Below is the abstract of "Augmenting Genetic Algorithms with Deep Neural Networks for Exploring the Chemical Space":
Challenges in natural sciences can often be phrased as optimization problems. Machine learning techniques have recently been applied to solve such problems. One example in chemistry is the design of tailor-made organic materials and molecules, which requires efficient methods to explore the chemical space. We present a genetic algorithm (GA) that is enhanced with a neural network (DNN) based discriminator model to improve the diversity of generated molecules and at the same time steer the GA. We show that our algorithm outperforms other generative models in optimization tasks. We furthermore present a way to increase interpretability of genetic algorithms, which helped us to derive design principles.