Abstract
This paper studies the large-scale subspace clustering (LSSC) problem with
millions of data points. Many popular subspace clustering methods cannot directly
handle the LSSC problem, although they are considered state-of-the-art
on small-scale data. The main reason is that these methods often
use all data points as one large dictionary to build huge coding models, which
results in high time and space complexity. In this paper, we develop a
learnable subspace clustering paradigm to efficiently solve the LSSC problem.
The key idea is to learn a parametric function that partitions the
high-dimensional subspaces into their underlying low-dimensional subspaces,
avoiding the expensive computation of classical coding models. Moreover, we
propose a unified robust predictive coding machine (RPCM) to learn the
parametric function, which can be solved by an alternating minimization
algorithm. In addition, we provide a bounded contraction analysis of the
parametric function. To the best of our knowledge, this is the first subspace
clustering method to efficiently cluster millions of data points.
Experiments on million-scale datasets verify that our paradigm
outperforms the related state-of-the-art methods in both efficiency and
effectiveness.
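
As an informal illustration of the learnable paradigm (not the RPCM model
itself), the following sketch fits a parametric assignment function on a small
sample and then applies it to every point, so no N-by-N coding matrix is ever
formed. It substitutes plain k-subspaces alternating minimization for RPCM; the
synthetic data, the function names fit_k_subspaces and predict, and all sizes
are assumptions made for this example.

```python
import numpy as np

def residuals(X, bases):
    """Squared distance from each row of X to each subspace (orthonormal bases)."""
    sq_norm = (X ** 2).sum(axis=1)
    cols = [sq_norm - ((X @ U) ** 2).sum(axis=1) for U in bases]
    return np.maximum(np.stack(cols, axis=1), 0.0)

def fit_k_subspaces(X, k, dim, n_iter=20, seed=0):
    """Fit k linear subspaces to the rows of X by alternating minimization.

    This is the classical k-subspaces heuristic, used here as a stand-in
    for the paper's RPCM (an assumption of this sketch).
    """
    rng = np.random.default_rng(seed)
    D = X.shape[1]
    # Random orthonormal bases, only used if a cluster is ever empty.
    bases = [np.linalg.qr(rng.standard_normal((D, dim)))[0] for _ in range(k)]
    labels = rng.integers(k, size=X.shape[0])
    costs = []
    for _ in range(n_iter):
        # Update step: best-fit subspace of each current cluster via SVD.
        for j in range(k):
            members = X[labels == j]
            if members.shape[0] > 0:
                _, _, vt = np.linalg.svd(members, full_matrices=False)
                bases[j] = vt[:dim].T
        # Assignment step: move each point to its nearest subspace.
        res = residuals(X, bases)
        labels = res.argmin(axis=1)
        costs.append(float(res.min(axis=1).sum()))
    return bases, costs

def predict(X, bases):
    """Parametric assignment function: index of the nearest learned subspace."""
    return residuals(X, bases).argmin(axis=1)

# Synthetic union of k subspaces (hypothetical data, for illustration only).
rng = np.random.default_rng(0)
N, D, k, dim = 20000, 10, 3, 2
true_bases = np.stack(
    [np.linalg.qr(rng.standard_normal((D, dim)))[0] for _ in range(k)]
)
true_labels = rng.integers(k, size=N)
coeffs = rng.standard_normal((N, dim))
X = np.einsum("nij,nj->ni", true_bases[true_labels], coeffs)
X += 0.01 * rng.standard_normal((N, D))

# Learn the function on a small sample, then label the full dataset.
sample_idx = rng.choice(N, size=500, replace=False)
bases, costs = fit_k_subspaces(X[sample_idx], k, dim)
labels = predict(X, bases)
```

Fitting touches only the 500 sampled points, and labeling the remaining points
costs one matrix product per learned subspace, which is linear in the number of
points. k-subspaces can stall in a poor local minimum, so this sketch makes no
claim about clustering accuracy.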