Quantitative And Machine Learning Asset Analysis

1. Introduction

We present the following:

  • Straightforward quantitative analysis.
  • Machine learning projections of returns and prices.

The analyses cover stocks and ETFs from US exchanges that meet the following requirements: a minimum price of $0.01 and a minimum volume of 100 during the formation periods. Closed-end funds are not included.

2. Calendar Returns

Calendar returns are close-to-close returns analyzed over specific calendar periods: year, quarter, month, and week. For each period we compute the minimum, 25th percentile, median, 75th percentile, and maximum. Results are available as CSV files, and for each asset there is an HTML file that presents all results as interactive bar charts. Coverage includes stocks and ETFs with at least five full years of data. The earliest data point is 1999-12-31.
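As an illustration of the calendar-return statistics described above, here is a minimal sketch for the monthly case using only the Python standard library. The function names are hypothetical, and several details are assumptions rather than our documented method: a period's return is taken as its last close divided by the previous period's last close minus one, consecutive periods are assumed to be present in the data, and quartiles use the "inclusive" interpolation method.

```python
from collections import defaultdict
from statistics import quantiles


def monthly_returns(closes):
    """closes: list of (date, close) sorted by date.

    Returns {(year, month): close-to-close return}, where each return compares
    the last close of a month with the last close of the month before it.
    """
    last_close = {}
    for d, c in closes:
        last_close[(d.year, d.month)] = c  # later dates overwrite earlier ones
    periods = sorted(last_close)
    return {
        cur: last_close[cur] / last_close[prev] - 1.0
        for prev, cur in zip(periods, periods[1:])
    }


def five_number_summary(values):
    """Minimum, 25th percentile, median, 75th percentile, and maximum."""
    q1, med, q3 = quantiles(values, n=4, method="inclusive")
    return {"min": min(values), "p25": q1, "median": med, "p75": q3, "max": max(values)}


def calendar_month_stats(closes):
    """Group monthly returns by calendar month (1-12) and summarize each group."""
    by_month = defaultdict(list)
    for (_year, month), r in monthly_returns(closes).items():
        by_month[month].append(r)
    # quantiles() needs at least two observations per calendar month
    return {m: five_number_summary(rs) for m, rs in by_month.items() if len(rs) >= 2}
```

The same grouping idea would extend to years, quarters, and weeks by changing the key used in `last_close`.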

3. Quantitative Analysis

All files that contain the word ‘Array’ are composed of the following:

  • Single Moving Average (SMA) – (current price – N-day average)/N-day average, where N = 21, 42, 63, …, 231, 252, formed into an array.
  • Dual Moving Average (DMA) – Same as SMA, with the 21-day average substituted for the current price.
  • Bollinger Band (BB) – Same as SMA, with the denominator replaced by the N-day standard deviation.
  • Breakout Shifted (BOS) – (current price – N-day low)/(N-day high – N-day low), where N takes the same values as for the SMA. Note that all prices are closing prices.
  • Percent Return – Percent returns over each of the N-day periods described above.
  • Standard Deviation
  • Slopes
  • Dynamic Time Warping Nearest Neighbors
  • K-Medoids Clusters With Dynamic Time Warping Distances
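The first four array definitions in the list above can be sketched as follows. This is an illustrative sketch, not our production code: the function name is hypothetical, each N-day window is assumed to include the current day, the standard deviation is assumed to be the population form, and the fallback of 0.0 for a zero denominator is an added guard.

```python
from statistics import mean, pstdev

WINDOWS = range(21, 253, 21)  # N = 21, 42, ..., 252


def indicator_arrays(closes):
    """closes: list of closing prices, oldest first, with at least 252 entries.

    Returns one array (list) per indicator, with one value per window length N.
    """
    price = closes[-1]
    avg21 = mean(closes[-21:])
    out = {"SMA": [], "DMA": [], "BB": [], "BOS": []}
    for n in WINDOWS:
        window = closes[-n:]  # assumption: the window includes the current day
        avg, sd = mean(window), pstdev(window)
        lo, hi = min(window), max(window)
        out["SMA"].append((price - avg) / avg)
        out["DMA"].append((avg21 - avg) / avg)
        # guards against division by zero are additions, not part of the definitions
        out["BB"].append((price - avg) / sd if sd else 0.0)
        out["BOS"].append((price - lo) / (hi - lo) if hi > lo else 0.0)
    return out
```

For a steadily rising price series, every BOS value equals 1.0 because the current close is also the N-day high.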

For all files that contain the words ‘Word Letters’, the arrays are converted into letters: a value > 0 becomes ‘A’ and a value <= 0 becomes ‘B’. The resulting arrays of letters are then combined into words. The file Percent Returns.csv includes year-to-date, quarter-to-date, and month-to-date values in addition to the N-day percent returns.
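The letter conversion can be illustrated with a short sketch. The function name `to_word` is hypothetical, and joining the letters in array order (ascending N) is an assumption.

```python
def to_word(values):
    """Map each array value to 'A' (value > 0) or 'B' (value <= 0) and join the letters."""
    return "".join("A" if v > 0 else "B" for v in values)


# Example: a positive, a negative, and a zero value
word = to_word([0.1, -0.2, 0.0])  # "ABB"
```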

3A. Cluster Analysis

Cluster analyses are performed by using the arrays noted above as inputs to the K-Means algorithm. The number of clusters is computed as int(number of data points/100). Results are aggregated into the file ‘K-Means Clusters.csv’. Cluster centroids are presented as matrices in CSV files as well as interactive HTML files. We refer you to our article Portfolio Diversification Via K-means for information about how to interpret cluster analysis results.
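To make the cluster-count rule concrete, the sketch below pairs it with a bare-bones K-Means loop (Lloyd's algorithm) written in plain Python. It is a simplified stand-in for illustration only; it is not the implementation behind our files, and the function names, the random initialization, the fixed iteration count, and the `max(1, ...)` guard are all assumptions.

```python
import random


def n_clusters(n_points):
    """int(number of data points / 100); max(1, ...) is an added guard for small inputs."""
    return max(1, int(n_points / 100))


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def k_means(points, k, iterations=50, seed=0):
    """points: list of equal-length lists of floats. Returns (labels, centroids)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k distinct data points
    labels = [0] * len(points)
    for _ in range(iterations):
        # assignment step: each point joins its nearest centroid
        labels = [min(range(k), key=lambda j: sq_dist(p, centroids[j])) for p in points]
        # update step: each centroid moves to the mean of its members
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centroid if a cluster empties out
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids
```

With 1,234 assets the rule gives 12 clusters; each row of `centroids` then corresponds to one row of a centroid matrix.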

4. Machine Learning Projections

We use machine learning algorithms to match current input data with input data from the recent past. Because the prices that followed each past data point are known, we use those subsequent prices to form percent returns and z-score (standard deviation units) returns. These returns are then divided into percentiles, yielding statistical ranges of possible returns for current assets. Price projections are derived from the percent return projections.

Additionally, we present % up and % down, median gain and loss, and expectancy. We define expectancy as (up fraction*median gain) - (down fraction*abs(median loss)). Thus we replace vague notions such as outperform, hold, and bull/bear indicators with statistically defined projections over precisely defined time periods. We also show how past projections actually performed, thus establishing a statistical track record.

We project returns for 21 and 63 days forward. The calibration data uses 60 formation days for the 21-day projections and 120 formation days for the 63-day projections.

We use the unsupervised k-nearest neighbors (KNN) algorithm, with Euclidean distance, for SMA, DMA, and BB inputs with appropriate scaling. Outliers are defined as distances more than two standard deviations from the mean distance and are noted in the results.

The supervised random forest (RF) algorithm is used for the following combinations of inputs: SMA-DMA-BOS and BB-BOS. Note that we do not employ RF as a regression method to predict a return value; rather, we convert returns into discrete classes and use RF as a classification algorithm. Doing so allows us to balance the calibration data and to explicitly incorporate uncertainty. Rather than predicting a single class, we use the predicted class probabilities to obtain historical numerical returns in proportion to those probabilities.
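The nearest-neighbor matching, outlier rule, and expectancy definition can be sketched in a few lines of standard-library Python. This is a simplified illustration under stated assumptions, not our production pipeline: the function names are hypothetical, inputs are assumed to be already scaled, the outlier rule is applied to whatever list of distances is passed in, quartiles stand in for the reported percentiles, and a return of exactly zero is counted as neither up nor down.

```python
from math import dist
from statistics import mean, median, pstdev, quantiles


def nearest_neighbors(query, history, k):
    """history: list of (feature_vector, forward_return) pairs.

    Returns the k closest matches as (euclidean_distance, forward_return), nearest first.
    """
    ranked = sorted((dist(query, feats), ret) for feats, ret in history)
    return ranked[:k]


def flag_outliers(distances):
    """True where a distance lies more than two standard deviations from the mean distance."""
    mu, sd = mean(distances), pstdev(distances)
    return [abs(d - mu) > 2 * sd for d in distances]


def summarize(returns):
    """Quartiles, up/down fractions, median gain/loss, and expectancy of matched returns."""
    gains = [r for r in returns if r > 0]
    losses = [r for r in returns if r < 0]
    up, down = len(gains) / len(returns), len(losses) / len(returns)
    med_gain = median(gains) if gains else 0.0
    med_loss = median(losses) if losses else 0.0
    return {
        "quartiles": quantiles(returns, n=4, method="inclusive"),
        "pct_up": up,
        "pct_down": down,
        "median_gain": med_gain,
        "median_loss": med_loss,
        # expectancy = (up fraction * median gain) - (down fraction * abs(median loss))
        "expectancy": up * med_gain - down * abs(med_loss),
    }
```

For example, matched returns of 10%, 20%, -5%, and -15% give an up fraction of 0.5, a median gain of 15%, a median loss of -10%, and an expectancy of 0.5*0.15 - 0.5*0.10 = 2.5%.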

4A. Assessment Of Machine Learning Projections

We tested the quality of the machine learning projections for the following dates: ‘2018-06-04’, ‘2018-10-09’, ‘2019-02-19’, ‘2019-06-26’, ‘2019-10-31’, ‘2020-03-11’, ‘2020-07-17’, ‘2020-11-20’, ‘2021-04-01’, ‘2021-08-09’, ‘2021-12-14’, ‘2022-04-22’, ‘2022-08-30’, and ‘2023-01-06’. The assessment used the projected return percentiles as thresholds and measured what percent of projections met each threshold criterion. The assessment CSV file is available in a zip file.

5. Data Availability, Data Use, And Value 4 Value

NOTE: Due to lack of interest, we have suspended updating the analyses noted below. If you want updated data, contact us by email: contact AT gcbcventures DOT co.

Quantitative and projection data are updated each market day. Calendar return data is updated at the end of each noted period. Data is available for download from our Proton Drive. New analytics will be announced on the homepage with a link to a description. Snapshots of analytics can be found at our Substack page: Quantitative And Machine Learning Asset Analysis. All data presented here is available for free with no restrictions. If you find such information useful, we ask that you provide value in return via the Value 4 Value links on the homepage. We would also appreciate you spreading the word about our work.