
Dynamic Time Warping (DTW)

We apply Dynamic Time Warping (DTW) to transformed time series of prices, as explained below, to find nearest neighbors for current stocks and ETFs.

DTW can be defined as:

In time series analysis, dynamic time warping (DTW) is an algorithm for measuring similarity between two temporal sequences, which may vary in speed. For instance, similarities in walking could be detected using DTW, even if one person was walking faster than the other, or if there were accelerations and decelerations during the course of an observation. DTW has been applied to temporal sequences of video, audio, and graphics data; indeed, any data that can be turned into a one-dimensional sequence can be analyzed with DTW. A well-known application has been automatic speech recognition, to cope with different speaking speeds. Other applications include speaker recognition and online signature recognition. It can also be used in partial shape matching applications.

In general, DTW is a method that calculates an optimal match between two given sequences (e.g. time series) with certain restrictions and rules:

• Every index from the first sequence must be matched with one or more indices from the other sequence, and vice versa
• The first index from the first sequence must be matched with the first index from the other sequence (but it does not have to be its only match)
• The last index from the first sequence must be matched with the last index from the other sequence (but it does not have to be its only match)
• The mapping of the indices from the first sequence to indices from the other sequence must be monotonically increasing, and vice versa, i.e. if j > i are indices from the first sequence, then there must not be two indices l > k in the other sequence, such that index i is matched with index l and index j is matched with index k, and vice versa

We can plot each match between the sequences 1 : M and 1 : N as a path in an M × N matrix from (1,1) to (M,N), such that each step is one of (0,1), (1,0), (1,1). In this formulation, we see that the number of possible matches is the Delannoy number D(M−1, N−1).
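As a quick check on the path count, the sketch below evaluates Delannoy numbers with the standard three-term recurrence. The function names `delannoy` and `count_warping_paths` are hypothetical, chosen for illustration. Because a path from (1,1) to (M,N) advances M−1 rows and N−1 columns, the count for an M × N matrix is taken here as D(M−1, N−1).

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def delannoy(m: int, n: int) -> int:
    """Count lattice paths from (0, 0) to (m, n) using steps (1, 0), (0, 1), (1, 1)."""
    if m == 0 or n == 0:
        # Only one path exists along an edge of the grid.
        return 1
    return delannoy(m - 1, n) + delannoy(m, n - 1) + delannoy(m - 1, n - 1)


def count_warping_paths(M: int, N: int) -> int:
    """Number of admissible matches between sequences of lengths M and N."""
    return delannoy(M - 1, N - 1)


if __name__ == "__main__":
    print(count_warping_paths(3, 3))
    print(count_warping_paths(7, 7))
```

Even for short sequences the count grows quickly, which is why the optimal match is found by dynamic programming rather than by enumerating paths.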

The optimal match is the match that satisfies all the restrictions and rules and has the minimal cost, where the cost is the sum, over all matched pairs of indices, of the absolute differences between their values.
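The rules and cost definition above translate into a short dynamic program. The following is a minimal sketch, not the implementation used to produce our results; the function name `dtw_distance` is hypothetical, and it assumes plain Python lists of numbers with no window or other constraint.

```python
import math


def dtw_distance(a, b):
    """Minimal DTW cost using absolute differences and steps (0,1), (1,0), (1,1)."""
    m, n = len(a), len(b)
    if m == 0 or n == 0:
        raise ValueError("both sequences must be non-empty")

    # cost[i][j] holds the minimal cost of matching a[:i] with b[:j].
    cost = [[math.inf] * (n + 1) for _ in range(m + 1)]
    cost[0][0] = 0.0

    for i in range(1, m + 1):
        for j in range(1, n + 1):
            local = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of the three admissible predecessor cells.
            cost[i][j] = local + min(
                cost[i - 1][j],      # a[i-1] matched with one more index of b
                cost[i][j - 1],      # b[j-1] matched with one more index of a
                cost[i - 1][j - 1],  # both sequences advance
            )
    return cost[m][n]


if __name__ == "__main__":
    # The repeated 2 in the second sequence is absorbed by the warping.
    print(dtw_distance([1, 2, 3], [1, 2, 2, 3]))
```

The table has (M+1) × (N+1) cells, so time and memory are both proportional to M × N.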

The sequences are “warped” non-linearly in the time dimension to determine a measure of their similarity independent of certain non-linear variations in the time dimension. This sequence alignment method is often used in time series classification. Although DTW measures a distance-like quantity between two given sequences, it doesn’t guarantee the triangle inequality to hold.
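A small counterexample makes the triangle inequality remark concrete. The sketch below uses a compact recursive DTW with absolute-difference cost (the helper name `dtw` and the three toy sequences are made up for illustration) and compares d(a, c) with d(a, b) + d(b, c).

```python
from functools import lru_cache


def dtw(a, b):
    """DTW cost with absolute differences, written as a memoized recursion."""

    @lru_cache(maxsize=None)
    def best(i, j):
        local = abs(a[i] - b[j])
        if i == 0 and j == 0:
            return local
        options = []
        if i > 0:
            options.append(best(i - 1, j))
        if j > 0:
            options.append(best(i, j - 1))
        if i > 0 and j > 0:
            options.append(best(i - 1, j - 1))
        return local + min(options)

    return best(len(a) - 1, len(b) - 1)


# Toy sequences chosen by hand; they are not market data.
a, b, c = [0], [1, 2], [2, 3, 3]

direct = dtw(a, c)
detour = dtw(a, b) + dtw(b, c)

if __name__ == "__main__":
    print(direct, detour, direct <= detour)
```

Here the direct distance exceeds the detour through b, so DTW is not a metric in the strict sense, even though it is symmetric and non-negative.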

From An introduction to Dynamic Time Warping by Romain Tavenard:

Comparison between DTW and Euclidean distance. Note that, for the sake of visualization, time series are shifted vertically, but one should imagine that feature value ranges (y-axis values) match.

Here, we are computing similarity between two time series using either Euclidean distance (left) or Dynamic Time Warping (DTW, right) … In both cases, the returned similarity is the sum of distances between matched features. Note how DTW matches distinctive patterns of the time series, which is likely to result in a more sound similarity assessment than when using Euclidean distance that matches timestamps regardless of the feature values.

We use the Python package Time series distances: Dynamic Time Warping (dtaidistance, a fast DTW implementation in C), created by the DTAI group at KU Leuven.

Input Data

For each formation period of 21, 63, 126, 189, and 252 days, we compute:

• zscore: (price – mean)/standard deviation
• fractional return series: (current price – previous price)/previous price

then apply the piecewise aggregate approximation (PAA). The PAA algorithm takes a time series and slices it into P contiguous, non-overlapping pieces, each of length L. Then each piece is averaged, thus reducing dimensionality and smoothing the data.
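The two transforms and the PAA step can be sketched in a few lines. This is an illustration under stated assumptions rather than our production code: the function names are hypothetical, the z-score uses the population standard deviation (the text does not say which variant is used), and the series length is assumed to be an exact multiple of the number of pieces.

```python
from statistics import mean, pstdev


def zscore(prices):
    """(price - mean) / standard deviation, using the population std (an assumption)."""
    mu = mean(prices)
    sigma = pstdev(prices)
    if sigma == 0:
        raise ValueError("constant series has no z-score")
    return [(p - mu) / sigma for p in prices]


def fractional_returns(prices):
    """(current price - previous price) / previous price; one value fewer than prices."""
    return [(cur - prev) / prev for prev, cur in zip(prices, prices[1:])]


def paa(series, pieces):
    """Average `pieces` contiguous, non-overlapping slices of equal length."""
    length, remainder = divmod(len(series), pieces)
    if length == 0 or remainder != 0:
        raise ValueError("series length must be a positive multiple of pieces")
    return [mean(series[k * length:(k + 1) * length]) for k in range(pieces)]


if __name__ == "__main__":
    prices = [float(p) for p in range(100, 121)]  # 21 made-up prices
    print(paa(zscore(prices), pieces=7))          # 7 pieces of length 3
```

Note that a return series is one element shorter than its price series, so a window of 22 prices would be needed to obtain 21 returns; how the window is aligned in that case is not specified here.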

| Formation Days | Number of Pieces | Length of Each Piece |
|---|---|---|
| 21 | 7 | 3 |
| 63 | 9 | 7 |
| 126 | 9 | 14 |
| 189 | 9 | 21 |
| 252 | 12 | 21 |

The number of pieces was chosen to keep the dimensionality low enough to avoid the well-known problems with interpreting distances in high dimensions.

Results

We compute 50 nearest neighbors for stocks only, ETFs only, and stocks and ETFs combined. They are computed each market day and are available as CSV files in the file Dynamic Time Warping Nearest Neighbors YYYY-MM-DD.zip, which can be downloaded from our Proton Drive.

Below is a sample.

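| Symbol | Name | Close | Volume | Symbol 1 | Distance 1 | Symbol 2 | Distance 2 | Symbol 3 | Distance 3 | Symbol 4 | Distance 4 | Symbol 5 | Distance 5 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| A | Agilent Technologies | 124.55 | 2262387 | DEO | 0.4709 | HIBB | 0.4739 | BUD | 0.4836 | MNTS | 0.5014 | NWN | 0.5386 |
| AA | Alcoa | 32.68 | 11864982 | SPH | 0.6819 | SBET | 0.6992 | WRN | 0.7226 | BZUN | 0.7374 | CINF | 0.7567 |
| AAC | Ares Acquisition Corp. | 10.56 | 173961 | LMB | 0.3217 | USFD | 0.3352 | ESQ | 0.3448 | WEAV | 0.349 | BTWN | 0.3774 |
| AACG | ATA | 1.4 | 3006 | HAIN | 0.3733 | TEF | 0.4448 | PRQR | 0.4633 | UTZ | 0.4671 | BUD | 0.5034 |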
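The nearest-neighbor search itself can be sketched as a ranking by DTW distance. The code below is a simplified illustration, not our pipeline: `dtw_distance` and `nearest_neighbors` are hypothetical names, the symbols and series in the example are made up, the local cost is the absolute difference, and the input series are assumed to be already transformed and PAA-reduced.

```python
import heapq
import math


def dtw_distance(a, b):
    """DTW cost with absolute differences, keeping only two rows of the cost table."""
    prev = [math.inf] * (len(b) + 1)
    prev[0] = 0.0  # empty-vs-empty alignment costs nothing
    for x in a:
        curr = [math.inf] * (len(b) + 1)
        for j, y in enumerate(b, start=1):
            curr[j] = abs(x - y) + min(prev[j], curr[j - 1], prev[j - 1])
        prev = curr
    return prev[len(b)]


def nearest_neighbors(target, universe, k=50):
    """Return up to k (distance, symbol) pairs for `target`, closest first."""
    scored = (
        (dtw_distance(universe[target], series), symbol)
        for symbol, series in universe.items()
        if symbol != target  # a series is not its own neighbor
    )
    return heapq.nsmallest(k, scored)


if __name__ == "__main__":
    # Made-up symbols and values, for illustration only.
    universe = {
        "AAA": [0, 1, 2],
        "BBB": [0, 1, 1, 2],
        "CCC": [5, 5, 5],
    }
    print(nearest_neighbors("AAA", universe, k=2))
```

Ranking one symbol against a universe of U series costs U distance computations, so a full daily run is quadratic in U; this is one reason to keep the PAA dimensions small and to use a compiled DTW implementation.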