Finding Similar Stocks Via Fast GPU Based Nearest Neighbors with Faiss
There are many ways to find stocks with similar behavior based on how one defines similarity and the data used. In this article we use a 12 period channel where, for each period, we have (current adjusted close price – minimum value)/(maximum value – minimum value). Maximum and minimum values are computed for the adjusted close prices for the past 21 trading days (representing a trading month), then 42 days, …, 252 days. Our channel will then be normalized so that all values are in the interval [0, 1]. We use the Euclidean distance measure as our similarity.
After transforming our data into normalized channels, our task then becomes finding the K nearest neighbors. We will use the Faiss Python library.
Fast GPU Based Nearest Neighbors with Faiss
Faiss is a library for efficient similarity search and clustering of dense vectors. It contains algorithms that search in sets of vectors of any size, up to ones that possibly do not fit in RAM. It also contains supporting code for evaluation and parameter tuning. Faiss is written in C++ with complete wrappers for Python/numpy. Some of the most useful algorithms are implemented on the GPU. It is developed by Facebook AI Research.
Hyperparameter Search With GPyOpt: Part 3 – Keras (CNN) Classification and Ensembling
GPyOpt is a Python open-source library for Bayesian Optimization developed by the Machine Learning group of the University of Sheffield. It is based on GPy, a Python framework for Gaussian process modelling.
In this article, we demonstrate how to use this package to perform hyperparameter search for a classification problem with Keras.
Ebook: Deep Learning for Toxicity and Disease Prediction – Gong, Zhang, Chen (eds) 2020
Deep Learning for Toxicity and Disease Prediction is a collection of 12 articles published as a free ebook.
Hyperparameter Search With GPyOpt: Part 2 – XGBoost Classification and Ensembling
GPyOpt is a Python open-source library for Bayesian Optimization developed by the Machine Learning group of the University of Sheffield. It is based on GPy, a Python framework for Gaussian process modelling.
In this article, we demonstrate how to use this package to perform hyperparameter search for a classification problem with XGBoost.
Paper: Seq2seq Fingerprint: An Unsupervised Deep Molecular Embedding for Drug Discovery – Xu et al 2017
In this paper, we propose a novel unsupervised molecular embedding method, providing a continuous feature vector for each molecule to perform further tasks, e.g., solubility classification. In the proposed method, a multi-layered Gated Recurrent Unit (GRU) network is used to map the input molecule into a continuous feature vector of fixed dimensionality, and then another deep GRU network is employed to decode the continuous vector back to the original molecule.
Paper: CheMixNet: Mixed DNN Architectures for Predicting Chemical Properties using Multiple Molecular Representations – Paul et al 2018
In this interesting paper, the authors use a multi input neural network to predict various small molecule properties in which one branch is a multilayer perceptron with MACCS fingerprints and the other is one of a RNN, 1D CNN, 1D CNN-RNN with SMILES.
Paper: A Tutorial on Bayesian Optimization – Frazier 2018
In this tutorial, we describe how Bayesian optimization works, including Gaussian process regression and three common acquisition functions: expected improvement, entropy search, and knowledge gradient. We conclude with a discussion of Bayesian optimization software and future research directions in the field.
Hyperparameter Search With GPyOpt: Part 1 – Scikit-learn Classification and Ensembling
GPyOpt is a Python open-source library for Bayesian Optimization developed by the Machine Learning group of the University of Sheffield. It is based on GPy, a Python framework for Gaussian process modelling.
In this article, we demonstrate how to use this package to do hyperparameter search for a classification problem with Scikit-learn.
Paper: Keep It Simple: Graph Autoencoders Without Graph Convolutional Networks – Salha et al 2019
In this work, the authors present a simpler alternative, “a simple linear model w.r.t. the adjacency matrix of the graph”, to graph convolution encoders for graph autoencoders.
Hyperparameter Search (And Pruning) With Optuna: Part 5 – Keras (CNN) Classification and Ensembling
In addition to using the tree-structured Parzen algorithm via Optuna to find hyperparameters for a CNN with Keras for the the MNIST handwritten digits data set classification problem, we add asynchronous successive halving, a pruning algorithm, to halt training when preliminary results are unpromising.
Paper: A self-attention based message passing neural network for predicting molecular lipophilicity and aqueous solubility – Tang et al 2020
In this paper, we describe a self-attention-based message-passing neural network (SAMPN) model, which is a modification of Deepchem’s MPN [16] and is state-of-the-art in deep learning. It directly learns the most relevant features of each QSAR/QSAPR task in the learning process and assigns the degree of importance for substructures to improve the interpretability of prediction.
Hyperparameter Search (And Pruning) With Optuna: Part 4 – XGBoost Classification and Ensembling
In addition to using the tree-structured Parzen algorithm via Optuna to find hyperparameters for XGBoost for the the MNIST handwritten digits data set classification problem, we add asynchronous successive halving, a pruning algorithm, to halt training when preliminary results are unpromising.
Paper: A Deep Learning Approach to Antibiotic Discovery – Stokes et al 2020
This is a very interesting paper in which the authors use a message passing neural network on a carefully selected data set to predict antibacterial activity against E. coli. Then they apply their model to other data sets, while also prioritizing molecules that are different (via minimum Tanimoto similarity) from existing antibiotics, to find candidates for new antibiotics.
Hyperparameter Search With Optuna: Part 3 – Keras (CNN) Classification and Ensembling
In this article, we use the tree-structured Parzen algorithm via Optuna to find hyperparameters for a convolutional neural network (CNN) with Keras for the the MNIST handwritten digits data set classification problem.
At no cost to you, Machine Learning Applied earns a commission from qualified purchases when you click on the links below.