The k-medoids problem is a clustering problem similar to k-means. The name was coined by Leonard Kaufman and Peter J. Rousseeuw with their PAM (Partitioning Around Medoids) algorithm. Both the k-means and k-medoids algorithms are partitional (they break the dataset up into groups) and attempt to minimize the distance between the points assigned to a cluster and a point designated as the center of that cluster. In contrast to the k-means algorithm, k-medoids chooses actual data points as centers (medoids or exemplars). This makes the cluster centers easier to interpret than in k-means, where the center of a cluster is the mean of the points in the cluster and is not necessarily one of the input data points. Furthermore, k-medoids can be used with arbitrary dissimilarity measures, whereas k-means generally requires Euclidean distance for efficient solutions. Because k-medoids minimizes a sum of pairwise dissimilarities instead of a sum of squared Euclidean distances, it is more robust to noise and outliers than k-means.
The medoid of a cluster is defined as the object in the cluster whose average dissimilarity to all the objects in the cluster is minimal, that is, it is a most centrally located point in the cluster.
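As a small illustration of this definition, the following sketch picks the medoid of a single cluster from a precomputed dissimilarity matrix. The function name medoid, the toy points, and the use of absolute difference as the dissimilarity are assumptions made for the example; they are not taken from the package used on this site.

```python
def medoid(diss, members):
    """Return the member whose total dissimilarity to all members is smallest.

    diss    -- square matrix (list of lists) of pairwise dissimilarities
    members -- indices of the objects that belong to the cluster
    Minimizing the total is the same as minimizing the average,
    because every member is compared against the same number of objects.
    """
    return min(members, key=lambda i: sum(diss[i][j] for j in members))


# Hypothetical toy cluster: five 1-D points, one of them an outlier.
points = [0.0, 1.0, 2.0, 3.0, 10.0]
diss = [[abs(a - b) for b in points] for a in points]

center = medoid(diss, range(len(points)))
mean = sum(points) / len(points)

print(points[center])  # the medoid is an actual data point: 2.0
print(mean)            # the mean is not one of the input points
```

The outlier at 10.0 pulls the mean away from the bulk of the data, while the medoid stays on a point in the middle of the cluster.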
We use the "Fast k-medoids clustering in Python" package.
This package contains a number of k-medoids implementations. Of particular use are the more recent algorithms, which significantly reduce computing time. We employ FasterPAM k-medoids clustering:
This is an accelerated version of PAM clustering that eagerly performs any improving swap as soon as it is found, and that includes the O(k) improvement for finding the best swaps faster.
Erich Schubert, Peter J. Rousseeuw. Fast and Eager k-Medoids Clustering: O(k) Runtime Improvement of the PAM, CLARA, and CLARANS Algorithms. Information Systems (101), 2021, 101804. https://doi.org/10.1016/j.is.2021.101804 (open access)
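To make the "eager swap" idea concrete, here is a deliberately naive pure-Python sketch. It is not FasterPAM and not the package's code: it re-evaluates the full loss for every candidate swap and so omits the O(k) improvement described in the paper. The names assign and eager_swap_kmedoids, the random initialization, and the toy data are all assumptions made for illustration.

```python
import random


def assign(diss, medoids):
    """Label each point with its nearest medoid; return (labels, total loss)."""
    n = len(diss)
    labels = [min(range(len(medoids)), key=lambda c: diss[i][medoids[c]])
              for i in range(n)]
    loss = sum(diss[i][medoids[labels[i]]] for i in range(n))
    return labels, loss


def eager_swap_kmedoids(diss, k, max_iter=100, seed=0):
    """Naive swap-based k-medoids that accepts every improving swap at once."""
    n = len(diss)
    rng = random.Random(seed)
    medoids = rng.sample(range(n), k)          # random initial medoids
    labels, loss = assign(diss, medoids)
    for _ in range(max_iter):
        improved = False
        for candidate in range(n):
            if candidate in medoids:
                continue
            for slot in range(k):
                # Try replacing the medoid in this slot with the candidate.
                trial = medoids[:slot] + [candidate] + medoids[slot + 1:]
                trial_labels, trial_loss = assign(diss, trial)
                if trial_loss < loss:
                    # Eager: apply the swap immediately instead of
                    # searching for the single best swap first.
                    medoids, labels, loss = trial, trial_labels, trial_loss
                    improved = True
                    break
        if not improved:                       # no swap helps: local optimum
            break
    return medoids, labels, loss


# Hypothetical toy data: two well-separated groups on a line.
points = [0.0, 1.0, 2.0, 20.0, 21.0, 22.0]
diss = [[abs(a - b) for b in points] for a in points]
medoids, labels, loss = eager_swap_kmedoids(diss, k=2)
print(sorted(medoids), labels, loss)
```

On this toy input the search settles on the middle point of each group as medoid. Each trial swap here costs a full reassignment of all points, which is exactly the kind of repeated work the accelerated algorithms are designed to avoid.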
Cluster results are computed for stocks only, for ETFs only, and for stocks and ETFs combined, with 50 data points per cluster. They are computed each market day and are available as CSV files in the file "Dynamic Time Warping Nearest Neighbors And K-Medoids Clusters YYYY-MM-DD.zip", which can be downloaded from our Proton Drive.
If you find such information useful, we ask that you provide value in return via the Value 4 Value links on the homepage. We would also appreciate you spreading the word about our work.