Abstract
In high-dimensional linear models, the sparsity assumption is typically made,
stating that most of the parameters are equal to zero. Under the sparsity
assumption, estimation and, recently, inference have been well studied.
However, in practice, sparsity assumption is not checkable and more importantly
is often violated; a large number of covariates might be expected to be
associated with the response, indicating that possibly all, rather than just a
few, parameters are non-zero. A natural example is a genome-wide gene
expression profiling, where all genes are believed to affect a common disease
marker. We show that existing inferential methods are sensitive to the sparsity
assumption, and may, in turn, result in the severe lack of control of Type-I
error. In this article, we propose a new inferential method, named CorrT, which
is robust to model misspecification such as heteroscedasticity and lack of
sparsity. CorrT is shown to have Type I error approaching the nominal level for
\textit{any} models and Type II error approaching zero for sparse and many
dense models.
In fact, CorrT is also shown to be optimal in a variety of frameworks:
sparse, non-sparse and hybrid models where sparse and dense signals are mixed.
Numerical experiments show a favorable performance of the CorrT test compared
to the state-of-the-art methods.