Dynamic Time Warping (DTW)
We apply Dynamic Time Warping (DTW) to transformed time series of prices, as explained below, to find nearest neighbors for current stocks and ETFs.
DTW can be defined as:
In time series analysis, dynamic time warping (DTW) is an algorithm for measuring similarity between two temporal sequences, which may vary in speed. For instance, similarities in walking could be detected using DTW, even if one person was walking faster than the other, or if there were accelerations and decelerations during the course of an observation. DTW has been applied to temporal sequences of video, audio, and graphics data — indeed, any data that can be turned into a one-dimensional sequence can be analyzed with DTW. A well-known application has been automatic speech recognition, to cope with different speaking speeds. Other applications include speaker recognition and online signature recognition. It can also be used in partial shape matching applications.
In general, DTW is a method that calculates an optimal match between two given sequences (e.g. time series) with certain restriction and rules:
- Every index from the first sequence must be matched with one or more indices from the other sequence, and vice versa
- The first index from the first sequence must be matched with the first index from the other sequence (but it does not have to be its only match)
- The last index from the first sequence must be matched with the last index from the other sequence (but it does not have to be its only match)
- The mapping of the indices from the first sequence to indices from the other sequence must be monotonically increasing, and vice versa, i.e. if j > i are indices from the first sequence, then there must not be two indices l > k in the other sequence, such that index i s matched with index l and index j is matched with index k, and vice versa
We can plot each match between the sequences 1 : M and 1 : N as a path in a M x N matrix from (1,1) to (M,N), such that each step is one of (0,1), (1,0), (1,1). In this formulation, we see that the number of possible matches is the Delannoy number.
The optimal match is denoted by the match that satisfies all the restrictions and the rules and that has the minimal cost, where the cost is computed as the sum of absolute differences, for each matched pair of indices, between their values.
The sequences are “warped” non-linearly in the time dimension to determine a measure of their similarity independent of certain non-linear variations in the time dimension. This sequence alignment method is often used in time series classification. Although DTW measures a distance-like quantity between two given sequences, it doesn’t guarantee the triangle inequality to hold.
From An introduction to Dynamic Time Warping by Romain Tavenard:
Comparison between DTW and Euclidean distance. Note that, for the sake of visualization, time series are shifted vertically, but one should imagine that feature value ranges (y-axis values) match.
Here, we are computing similarity between two time series using either Euclidean distance (left) or Dynamic Time Warping (DTW, right) … In both cases, the returned similarity is the sum of distances between matched features. Note how DTW matches distinctive patterns of the time series, which is likely to result in a more sound similarity assessment than when using Euclidean distance that matches timestamps regardless of the feature values.
We use the Python package: Time series distances: Dynamic Time Warping (fast DTW implementation in C) created by the DTAI group at KU Leuven.
Input Data
For each of the number of days 21, 63, 126, 189, 252, we compute:
- zscore: (price – mean)/standard deviation
- fractional return series: (current price – previous price)/previous price
then apply the piecewise aggregate approximation (PAA). The PAA algorithm takes a time series and slices it into P contiguous, non-overlapping pieces, each of length L. Then each piece is averaged, thus reducing dimensionality and smoothing the data.
Formation Days | Number of Pieces | Length of Each Piece |
---|---|---|
21 | 7 | 3 |
63 | 9 | 7 |
126 | 9 | 14 |
189 | 9 | 21 |
252 | 12 | 21 |
The number of pieces were chosen to keep the dimensions low enough to avoid well known issues regarding interpreting distances in high dimensions.
Results
50 nearest neighbors are computed for stocks only, ETFs only, and stocks and ETFs combined. They are computed each market day and are available as CSV files in the file Dynamic Time Warping Nearest Neighbors YYYY-MM-DD.zip that can be downloaded from our Proton Drive,
Below is a sample.
Name | Close | Volume | Symbol 1 | Distance 1 | Symbol 2 | Distance 2 | Symbol 3 | Distance 3 | Symbol 4 | Distance 4 | Symbol 5 | Distance 5 | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
A | Agilent Technologies | 124.55 | 2262387 | DEO | 0.4709 | HIBB | 0.4739 | BUD | 0.4836 | MNTS | 0.5014 | NWN | 0.5386 |
AA | Alcoa | 32.68 | 11864982 | SPH | 0.6819 | SBET | 0.6992 | WRN | 0.7226 | BZUN | 0.7374 | CINF | 0.7567 |
AAC | Ares Acquisition Corp. | 10.56 | 173961 | LMB | 0.3217 | USFD | 0.3352 | ESQ | 0.3448 | WEAV | 0.349 | BTWN | 0.3774 |
AACG | ATA | 1.4 | 3006 | HAIN | 0.3733 | TEF | 0.4448 | PRQR | 0.4633 | UTZ | 0.4671 | BUD | 0.5034 |
AADI | Aadi Bioscience | 5.86 | 87703 | LGND | 0.6201 | MHLD | 0.6236 | REVB | 0.8043 | KRMD | 0.8145 | TUYA | 0.8222 |
Value 4 Value
DTW results are available for free with no restrictions. If you find such information useful, we ask that you provide value in return via the Value 4 Value links on the homepage. We would also appreciate you spreading the word about our work.