Introduction

In this article, we cluster stock price time series using hierarchical clustering with the Euclidean, correlation, and Jensen-Shannon distances to answer two questions about portfolio diversification. How diversified is a given portfolio? How can a diversified portfolio be constructed?

Procedure

For the Euclidean distance, we follow the first three steps described in Portfolio Diversification Via K-means.
Additionally, we use two other distances: correlation (1 minus the correlation coefficient) and Jensen-Shannon (a symmetrized, bounded variant of the Kullback-Leibler divergence). For these distances, we use the raw adjusted close price. Please see Portfolio Diversification Via K-means for all other parameters.
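As a minimal sketch of the two additional distances, the snippet below computes them with SciPy on a small synthetic price matrix. The random-walk data and the variable names are assumptions for illustration only. The article does not say how prices were turned into probability distributions for the Jensen-Shannon distance, so this sketch relies on SciPy's `jensenshannon`, which rescales each vector to sum to 1.

```python
import numpy as np
from scipy.spatial.distance import jensenshannon, pdist, squareform

# Hypothetical stand-in for real data: 5 positive price series, 250 days each.
rng = np.random.default_rng(0)
log_returns = rng.normal(0.0, 0.01, size=(5, 250))
prices = 100.0 * np.exp(np.cumsum(log_returns, axis=1))

# Correlation distance: 1 minus the Pearson correlation of each pair of rows.
d_corr = pdist(prices, metric="correlation")

# Jensen-Shannon distance: jensenshannon normalizes each vector to sum to 1
# and returns the square root of the Jensen-Shannon divergence.
d_js = pdist(prices, metric=jensenshannon)

# Condensed vectors can be expanded to square matrices for inspection.
corr_matrix = squareform(d_corr)
js_matrix = squareform(d_js)
print(corr_matrix.shape, js_matrix.shape)
```

Both results are condensed distance vectors, which is the form `scipy.cluster.hierarchy.linkage` accepts directly.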

K-means is technically a data space partitioning algorithm, while hierarchical clustering is a true clustering algorithm. Specifically, we use agglomerative clustering, a bottom-up approach in which each series starts as its own cluster and the two closest clusters are merged repeatedly. Hierarchical clustering algorithms are distinguished by the type of linkage employed, which defines the distance between clusters and so guides their construction. For the Euclidean distance we use the following linkages:
1. single
2. complete
3. average
4. ward
5. centroid
6. median
Note that the last three (ward, centroid, and median) are only valid with the Euclidean distance, so we use only the first three with the correlation and Jensen-Shannon distances.
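The linkage choices above can be sketched with SciPy as follows. This is an illustration under stated assumptions, not the exact code behind the results: the helper `cluster_sizes`, the synthetic data, the choice of 15 clusters (inferred from the 15 rows in the result tables), and the `maxclust` cut criterion are all assumptions.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import pdist

EUCLIDEAN_ONLY = ("ward", "centroid", "median")

def cluster_sizes(data, method, metric="euclidean", n_clusters=15):
    """Return cluster sizes, largest first, for one linkage/metric pair."""
    if method in EUCLIDEAN_ONLY:
        # These linkages are only well defined for the Euclidean distance.
        if metric != "euclidean":
            raise ValueError(f"{method} linkage requires the Euclidean distance")
        tree = linkage(data, method=method, metric="euclidean")
    else:
        # Other linkages accept any condensed distance vector.
        tree = linkage(pdist(data, metric=metric), method=method)
    # Cut the tree so that at most n_clusters flat clusters remain.
    labels = fcluster(tree, t=n_clusters, criterion="maxclust")
    # Labels start at 1, so drop the empty count for label 0.
    return sorted(np.bincount(labels)[1:].tolist(), reverse=True)

# Hypothetical data: 200 series with 30 observations each.
rng = np.random.default_rng(1)
data = rng.normal(size=(200, 30))

for method in ("single", "complete", "average", "ward", "centroid", "median"):
    print("euclidean", method, cluster_sizes(data, method))
for method in ("single", "complete", "average"):
    print("correlation", method, cluster_sizes(data, method, metric="correlation"))
```

Each printed list is comparable to one column of the result tables: the sizes of the flat clusters, sorted from largest to smallest.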

We used SciPy for all cluster computations.

Results

Cluster number assignments can be downloaded here: Cluster Number Assignments Hierarchical.
 

Number of Points in Cluster - Correlation Distance

Single Linkage  Complete Linkage  Average Linkage
          4423               702             2015
             1               701             1187
             1               461              400
             1               431              319
             1               423              307
             1               368              147
             1               237               19
             1               219               15
             1               203                8
             1               177                8
             1               133                7
             1               130                2
             1                99                1
             1                94                1
             1                59                1

 

Number of Points in Cluster - Euclidean Distance

Single Linkage  Complete Linkage  Average Linkage  Ward Linkage  Centroid Linkage  Median Linkage
          4423               826             1611           628              2019            4410
             1               766              925           627              1330              10
             1               548              572           444               496               3
             1               492              412           435               315               2
             1               445              347           407               255               2
             1               233              181           337                10               1
             1               205              124           281                 3               1
             1               192               82           248                 2               1
             1               142               60           206                 1               1
             1               134               55           169                 1               1
             1               134               28           155                 1               1
             1               109               19           140                 1               1
             1                83               17           139                 1               1
             1                66                2           131                 1               1
             1                62                2            90                 1               1

 

Number of Points in Cluster - Jensen-Shannon Distance

Single Linkage  Complete Linkage  Average Linkage
          4423              2236             4369
             1              1796               23
             1               107               14
             1               103               10
             1                79                5
             1                64                3
             1                21                3
             1                 9                2
             1                 7                2
             1                 6                1
             1                 4                1
             1                 2                1
             1                 1                1
             1                 1                1
             1                 1                1

 
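As can be seen, the linkage method has a decisive impact on how the series are distributed across clusters. Single linkage puts 4423 of the 4437 series into one cluster under every distance, whereas Ward linkage with the Euclidean distance (628 down to 90 series per cluster) and complete linkage with the correlation distance (702 down to 59) spread them far more evenly. Unfortunately, there is no a priori way to choose a linkage method. One must try as many as feasible and choose the one that suits a particular purpose. Here, we would choose the method that yields the most evenly populated clusters.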
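One way to make "most evenly populated" concrete is to score each column of cluster sizes. The sketch below uses the normalized Shannon entropy of the sizes, which equals 1.0 for perfectly equal clusters and approaches 0 when one cluster holds almost everything. This measure and the helper `size_evenness` are our assumptions for illustration, not the criterion used to produce the results above. The sizes are copied from the tables.

```python
import math

def size_evenness(sizes):
    """Normalized Shannon entropy of cluster sizes (1.0 = perfectly even)."""
    total = sum(sizes)
    probs = [s / total for s in sizes if s > 0]
    if len(probs) < 2:
        return 0.0  # a single cluster carries no balance information
    entropy = -sum(p * math.log(p) for p in probs)
    return entropy / math.log(len(probs))

# Cluster sizes taken from the tables above.
columns = {
    "single (any distance)": [4423] + [1] * 14,
    "complete (correlation)": [702, 701, 461, 431, 423, 368, 237, 219,
                               203, 177, 133, 130, 99, 94, 59],
    "ward (Euclidean)": [628, 627, 444, 435, 407, 337, 281, 248,
                         206, 169, 155, 140, 139, 131, 90],
}

# Print the columns from most to least evenly populated.
for name, sizes in sorted(columns.items(),
                          key=lambda item: size_evenness(item[1]),
                          reverse=True):
    print(f"{name}: {size_evenness(sizes):.3f}")
```

Other balance measures, such as the size of the largest cluster or the Gini coefficient, would serve the same purpose and may rank close candidates differently.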
 
For additional remarks about how to interpret cluster results for portfolio diversification, please see Portfolio Diversification Via K-means.