A Nearly Tight Analysis of Greedy k-means++

A Google TechTalk, presented by Václav Rozhoň, 2023-04-13 Abstract: The famous k-means algorithm of Arthur and Vassilvitskii is the most popular practical algorithm for solving the k-means problem. The algorithm is very simple and computes the k output centers as follows: it samples the first center as a uniformly random point in the dataset and each of the following k−1 centers is then always sampled with probability proportional to the squared distance to the currently closest center. Amazingly, the k-means algorithm is known to return a Θ(log k) approximate solution in expectation. In their seminal work, Arthur and Vassilvitskii asked about the guarantees of its following greedy variant: in every step, we sample ℓ candidate centers instead of one and then pick the one that minimizes the new cost. This is also how k-means is implemented in e.g. the popular Scikit-learn library. We analyze greedy k-means : We prove that it is an O(ℓ^3 * log^3 k)-approximation algorithm and provide a near-matching lower bound. Joint work with Christoph Grunau, Ahmet Alper Özüdoğru, Jakub Tětek arxiv: Bio: Vaclav Rozhon is a PhD student at ETH Zurich advised by Mohsen Ghaffari. He works mostly on distributed and parallel algorithms; he also creates YouTube videos about algorithms (channel name: polylog). He has a young child and thus no hobbies. A Google Talk Series on Algorithms, Theory, and Optimization

0 views