Quantitative And Machine Learning Asset Analysis

Value 4 Value

Click on the Bitcoin icon to send Sats via the lightning network.

Finding Similar Stocks Via Fast GPU Based Nearest Neighbors with Faiss

There are many ways to find stocks with similar behavior based on how one defines similarity and the data used. In this article we use a 12 period channel where, for each period, we have (current adjusted close price – minimum value)/(maximum value – minimum value). Maximum and minimum values are computed for the adjusted close prices for the past 21 trading days (representing a trading month), then 42 days, …, 252 days. Our channel will then be normalized so that all values are in the interval [0, 1]. We use the Euclidean distance measure as our similarity.

After transforming our data into normalized channels, our task then becomes finding the K nearest neighbors. We will use the Faiss Python library.

Fast GPU Based Nearest Neighbors with Faiss

Faiss is a library for efficient similarity search and clustering of dense vectors. It contains algorithms that search in sets of vectors of any size, up to ones that possibly do not fit in RAM. It also contains supporting code for evaluation and parameter tuning. Faiss is written in C++ with complete wrappers for Python/numpy. Some of the most useful algorithms are implemented on the GPU. It is developed by Facebook AI Research.

Paper: Seq2seq Fingerprint: An Unsupervised Deep Molecular Embedding for Drug Discovery – Xu et al 2017

In this paper, we propose a novel unsupervised molecular embedding method, providing a continuous feature vector for each molecule to perform further tasks, e.g., solubility classification. In the proposed method, a multi-layered Gated Recurrent Unit (GRU) network is used to map the input molecule into a continuous feature vector of fixed dimensionality, and then another deep GRU network is employed to decode the continuous vector back to the original molecule.

Paper: A self-attention based message passing neural network for predicting molecular lipophilicity and aqueous solubility – Tang et al 2020

In this paper, we describe a self-attention-based message-passing neural network (SAMPN) model, which is a modification of Deepchem’s MPN [16] and is state-of-the-art in deep learning. It directly learns the most relevant features of each QSAR/QSAPR task in the learning process and assigns the degree of importance for substructures to improve the interpretability of prediction.

Paper: A Deep Learning Approach to Antibiotic Discovery – Stokes et al 2020

This is a very interesting paper in which the authors use a message passing neural network on a carefully selected data set to predict antibacterial activity against E. coli. Then they apply their model to other data sets, while also prioritizing molecules that are different (via minimum Tanimoto similarity) from existing antibiotics, to find candidates for new antibiotics.

At no cost to you, Machine Learning Applied earns a commission from qualified purchases when you click on the links below.