Dynamic Time Warping Nearest Neighbors

Dynamic Time Warping (DTW)

We apply Dynamic Time Warping (DTW) to transformed time series of prices, as explained below, to find nearest neighbors for current stocks and ETFs.

DTW can be defined as:

In time series analysis, dynamic time warping (DTW) is an algorithm for measuring similarity between two temporal sequences, which may vary in speed. For instance, similarities in walking could be detected using DTW, even if one person was walking faster than the other, or if there were accelerations and decelerations during the course of an observation. DTW has been applied to temporal sequences of video, audio, and graphics data — indeed, any data that can be turned into a one-dimensional sequence can be analyzed with DTW. A well-known application has been automatic speech recognition, to cope with different speaking speeds. Other applications include speaker recognition and online signature recognition. It can also be used in partial shape matching applications.

In general, DTW is a method that calculates an optimal match between two given sequences (e.g. time series) with certain restriction and rules:

Every index from the first sequence must be matched with one or more indices from the other sequence, and vice versa

The first index from the first sequence must be matched with the first index from the other sequence (but it does not have to be its only match)

The last index from the first sequence must be matched with the last index from the other sequence (but it does not have to be its only match)

The mapping of the indices from the first sequence to indices from the other sequence must be monotonically increasing, and vice versa, i.e. if j > i are indices from the first sequence, then there must not be two indices l > k in the other sequence, such that index i s matched with index l and index j is matched with index k, and vice versa

We can plot each match between the sequences 1 : M and 1 : N as a path in a M x N matrix from (1,1) to (M,N), such that each step is one of (0,1), (1,0), (1,1). In this formulation, we see that the number of possible matches is the Delannoy number.

The optimal match is denoted by the match that satisfies all the restrictions and the rules and that has the minimal cost, where the cost is computed as the sum of absolute differences, for each matched pair of indices, between their values.

The sequences are “warped” non-linearly in the time dimension to determine a measure of their similarity independent of certain non-linear variations in the time dimension. This sequence alignment method is often used in time series classification. Although DTW measures a distance-like quantity between two given sequences, it doesn’t guarantee the triangle inequality to hold.

From An introduction to Dynamic Time Warping by Romain Tavenard:

Comparison between DTW and Euclidean distance. Note that, for the sake of visualization, time series are shifted vertically, but one should imagine that feature value ranges (y-axis values) match.

Here, we are computing similarity between two time series using either Euclidean distance (left) or Dynamic Time Warping (DTW, right) … In both cases, the returned similarity is the sum of distances between matched features. Note how DTW matches distinctive patterns of the time series, which is likely to result in a more sound similarity assessment than when using Euclidean distance that matches timestamps regardless of the feature values.

We use the Python package: Time series distances: Dynamic Time Warping (fast DTW implementation in C) created by the DTAI group at KU Leuven.

Input Data

For each of the number of days 21, 63, 126, 189, 252, we compute:

zscore: (price – mean)/standard deviation
fractional return series: (current price – previous price)/previous price

then apply the piecewise aggregate approximation (PAA). The PAA algorithm takes a time series and slices it into P contiguous, non-overlapping pieces, each of length L. Then each piece is averaged, thus reducing dimensionality and smoothing the data.

Formation Days	Number of Pieces	Length of Each Piece
21	7	3
63	9	7
126	9	14
189	9	21
252	12	21

The number of pieces were chosen to keep the dimensions low enough to avoid well known issues regarding interpreting distances in high dimensions.

Results

50 nearest neighbors are computed for stocks only, ETFs only, and stocks and ETFs combined. They are computed each market day and are available as CSV files in the file Dynamic Time Warping Nearest Neighbors YYYY-MM-DD.zip that can be downloaded from our Proton Drive,

Below is a sample.

	Name	Close	Volume	Symbol 1	Distance 1	Symbol 2	Distance 2	Symbol 3	Distance 3	Symbol 4	Distance 4	Symbol 5	Distance 5
A	Agilent Technologies	124.55	2262387	DEO	0.4709	HIBB	0.4739	BUD	0.4836	MNTS	0.5014	NWN	0.5386
AA	Alcoa	32.68	11864982	SPH	0.6819	SBET	0.6992	WRN	0.7226	BZUN	0.7374	CINF	0.7567
AAC	Ares Acquisition Corp.	10.56	173961	LMB	0.3217	USFD	0.3352	ESQ	0.3448	WEAV	0.349	BTWN	0.3774
AACG	ATA	1.4	3006	HAIN	0.3733	TEF	0.4448	PRQR	0.4633	UTZ	0.4671	BUD	0.5034
AADI	Aadi Bioscience	5.86	87703	LGND	0.6201	MHLD	0.6236	REVB	0.8043	KRMD	0.8145	TUYA	0.8222

Value 4 Value

DTW results are available for free with no restrictions. If you find such information useful, we ask that you provide value in return via the Value 4 Value links on the homepage. We would also appreciate you spreading the word about our work.

Dynamic Time Warping (DTW)

Input Data

Results

Value 4 Value

Affiliate Disclaimer