Introduction
In this article, we cluster stock price time series using hierarchical clustering under three distances (Euclidean, correlation, and Jensen-Shannon) to answer two questions about portfolio diversification. How diversified is a given portfolio? How can a diversified portfolio be constructed?
Procedure
For the Euclidean distance, we follow the first three steps described in Portfolio Diversification Via K-means.
Additionally, we use two other distances: correlation (1 – correlation coefficient) and Jensen-Shannon (a symmetrized variant of the Kullback-Leibler divergence). For these distances, we use the raw adjusted close price. Please see Portfolio Diversification Via K-means for all other parameters.
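The two additional distances can be sketched with SciPy's `pdist`. This is a minimal illustration, not the code used for the results: the `prices` array is synthetic, and rescaling each series to sum to 1 before computing the Jensen-Shannon distance is an assumption, since that distance is defined on probability vectors and the text does not say how the raw prices were treated.

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform

rng = np.random.default_rng(0)

# Hypothetical stand-in for adjusted close prices: 5 tickers x 250 trading days,
# generated as positive geometric random walks.
prices = 100.0 * np.exp(np.cumsum(rng.normal(0.0, 0.01, size=(5, 250)), axis=1))

# Correlation distance: 1 - Pearson correlation for every pair of series.
corr_dist = pdist(prices, metric="correlation")

# Jensen-Shannon distance compares probability vectors, so each series is
# rescaled to sum to 1 first (an assumption made for this sketch).
prob = prices / prices.sum(axis=1, keepdims=True)
js_dist = pdist(prob, metric="jensenshannon")

# pdist returns the condensed (upper-triangle) form; squareform expands it
# into a full symmetric matrix with zeros on the diagonal.
corr_matrix = squareform(corr_dist)
js_matrix = squareform(js_dist)
```

The condensed vectors `corr_dist` and `js_dist` are the form that SciPy's hierarchical clustering routines accept directly.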
K-means is technically a data space partitioning algorithm, while hierarchical clustering is a true clustering algorithm. Specifically, we use agglomerative clustering, a bottom-up approach that starts with every data point in its own cluster and repeatedly merges the two closest clusters. Hierarchical clustering algorithms are distinguished by the linkage they employ, which defines the distance between two clusters and therefore determines which clusters are merged at each step. For the Euclidean distance we use the following linkages:
1. single
2. complete
3. average
4. ward
5. centroid
6. median
Note that the last three (ward, centroid, and median) are only well defined for the Euclidean distance, so we use only the first three (single, complete, and average) for the correlation and Jensen-Shannon distances.
We use SciPy for all cluster computations.
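As a rough sketch of how such a computation might look in SciPy, the snippet below runs each linkage and counts the points per cluster. The feature matrix is synthetic, the helper `cluster_sizes` is hypothetical, and cutting each dendrogram into at most 15 flat clusters with the `maxclust` criterion is an assumption chosen to match the 15 rows in the tables below; the text does not state how the dendrograms were cut.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import pdist

rng = np.random.default_rng(1)

# Hypothetical feature matrix: 60 series x 20 features (stand-in for the real data).
features = rng.normal(size=(60, 20))

N_CLUSTERS = 15  # assumption: matches the 15 rows in the results tables


def cluster_sizes(Z, k=N_CLUSTERS):
    """Cut a linkage matrix into at most k flat clusters; return sizes, largest first."""
    labels = fcluster(Z, t=k, criterion="maxclust")
    _, counts = np.unique(labels, return_counts=True)
    return np.sort(counts)[::-1]


# Euclidean distance: all six linkages apply. linkage() computes the pairwise
# distances itself when given the observation matrix.
euclidean_sizes = {
    method: cluster_sizes(linkage(features, method=method, metric="euclidean"))
    for method in ("single", "complete", "average", "ward", "centroid", "median")
}

# Correlation distance: pass a precomputed condensed distance vector and keep
# only the linkages that do not assume Euclidean geometry.
corr = pdist(features, metric="correlation")
correlation_sizes = {
    method: cluster_sizes(linkage(corr, method=method))
    for method in ("single", "complete", "average")
}

for method, sizes in euclidean_sizes.items():
    print(f"{method:>8}: {sizes.tolist()}")
```

Sorting the counts in descending order gives one column per linkage in the same layout as the tables in the Results section.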
Results
Cluster number assignments can be downloaded here: Cluster Number Assignments Hierarchical.
Number of Points in Cluster - Correlation Distance
Single Linkage | Complete Linkage | Average Linkage |
---|---|---|
4423 | 702 | 2015 |
1 | 701 | 1187 |
1 | 461 | 400 |
1 | 431 | 319 |
1 | 423 | 307 |
1 | 368 | 147 |
1 | 237 | 19 |
1 | 219 | 15 |
1 | 203 | 8 |
1 | 177 | 8 |
1 | 133 | 7 |
1 | 130 | 2 |
1 | 99 | 1 |
1 | 94 | 1 |
1 | 59 | 1 |
Number of Points in Cluster - Euclidean Distance
Single Linkage | Complete Linkage | Average Linkage | Ward Linkage | Centroid Linkage | Median Linkage |
---|---|---|---|---|---|
4423 | 826 | 1611 | 628 | 2019 | 4410 |
1 | 766 | 925 | 627 | 1330 | 10 |
1 | 548 | 572 | 444 | 496 | 3 |
1 | 492 | 412 | 435 | 315 | 2 |
1 | 445 | 347 | 407 | 255 | 2 |
1 | 233 | 181 | 337 | 10 | 1 |
1 | 205 | 124 | 281 | 3 | 1 |
1 | 192 | 82 | 248 | 2 | 1 |
1 | 142 | 60 | 206 | 1 | 1 |
1 | 134 | 55 | 169 | 1 | 1 |
1 | 134 | 28 | 155 | 1 | 1 |
1 | 109 | 19 | 140 | 1 | 1 |
1 | 83 | 17 | 139 | 1 | 1 |
1 | 66 | 2 | 131 | 1 | 1 |
1 | 62 | 2 | 90 | 1 | 1 |
Number of Points in Cluster - Jensen-Shannon Distance
Single Linkage | Complete Linkage | Average Linkage |
---|---|---|
4423 | 2236 | 4369 |
1 | 1796 | 23 |
1 | 107 | 14 |
1 | 103 | 10 |
1 | 79 | 5 |
1 | 64 | 3 |
1 | 21 | 3 |
1 | 9 | 2 |
1 | 7 | 2 |
1 | 6 | 1 |
1 | 4 | 1 |
1 | 2 | 1 |
1 | 1 | 1 |
1 | 1 | 1 |
1 | 1 | 1 |
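As can be seen, the linkage method has a decisive impact on cluster quality. Under all three distances, single linkage places 4423 of the 4437 series in one cluster and leaves the remaining 14 as singletons, whereas complete linkage, and Ward linkage for the Euclidean distance, spread the series far more evenly. Unfortunately, there is no a priori way to choose a linkage method. One must try as many as feasible and choose the one that suits a particular purpose. Here, we would choose the method that yields the most evenly populated clusters.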
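One way to make "most evenly populated" concrete is to score each column of cluster sizes by its normalized Shannon entropy, which is 1 when all clusters have the same size and approaches 0 when one cluster holds nearly everything. The sketch below applies this to the Euclidean counts from the table above. The choice of entropy as the evenness score, and the helper name `evenness`, are assumptions made for illustration rather than part of the original analysis.

```python
import numpy as np

# Cluster sizes copied from the Euclidean distance table.
euclidean_counts = {
    "single": [4423] + [1] * 14,
    "complete": [826, 766, 548, 492, 445, 233, 205, 192, 142, 134, 134, 109, 83, 66, 62],
    "average": [1611, 925, 572, 412, 347, 181, 124, 82, 60, 55, 28, 19, 17, 2, 2],
    "ward": [628, 627, 444, 435, 407, 337, 281, 248, 206, 169, 155, 140, 139, 131, 90],
    "centroid": [2019, 1330, 496, 315, 255, 10, 3, 2, 1, 1, 1, 1, 1, 1, 1],
    "median": [4410, 10, 3, 2, 2] + [1] * 10,
}


def evenness(sizes):
    """Normalized Shannon entropy of cluster sizes: 1 = perfectly even, near 0 = one dominant cluster."""
    p = np.asarray(sizes, dtype=float)
    p = p[p > 0] / p.sum()
    if p.size < 2:
        return 0.0
    return float(-(p * np.log(p)).sum() / np.log(p.size))


scores = {method: evenness(sizes) for method, sizes in euclidean_counts.items()}

# Print the linkages from most to least evenly populated.
for method, score in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{method:>8}: {score:.3f}")
```

On these counts the score ranks Ward linkage first and single linkage last, which agrees with a visual reading of the table.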
For additional remarks about how to interpret cluster results for portfolio diversification, please see Portfolio Diversification Via K-means.