Abstract
Active learning strategically selects informative unlabeled data points and
queries their ground-truth labels for model training. The prevailing assumption
underlying this machine learning paradigm is that acquiring these ground-truth
labels will optimally enhance model performance. However, this assumption does
not always hold, and ground-truth labels may not maximize learning capacity,
particularly given the costly human annotation needed to obtain them. In contrast to
traditional ground-truth labeling, this paper proposes salutary labeling, which
automatically assigns the most beneficial labels to the most informative
samples without human annotation. Specifically, we use the influence function,
a tool for estimating how individual training samples affect model performance,
to select newly added samples and assign each its salutary label by choosing
the category that maximizes its positive influence.
Extensive experiments conducted on nine benchmark datasets demonstrate the
superior performance of our salutary labeling approach over traditional active
learning strategies. Additionally, we present several in-depth analyses and
practical applications to large language model (LLM) fine-tuning.
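As a rough illustration of influence-based label assignment, the following Python sketch scores every candidate (sample, label) pair with the classical influence-function approximation for an L2-regularized binary logistic regression. The model, the helper names (fit_logreg, hessian, salutary_labels), the regularization strength lam, and the use of a held-out validation loss as the quantity to improve are all assumptions made for illustration, not details taken from this paper. For a candidate x with tentative label c, the predicted benefit of adding (x, c) is g_val^T H^{-1} grad l(x, c), where g_val is the validation-loss gradient and H is the Hessian of the training objective; a positive value means the validation loss is predicted to decrease.

```python
import numpy as np


def sigmoid(z):
    # Numerically stable logistic function, equal to 1 / (1 + exp(-z)).
    return 0.5 * (1.0 + np.tanh(0.5 * z))


def hessian(X, w, lam):
    # Hessian of the mean logistic loss plus (lam / 2) * ||w||^2.
    p = sigmoid(X @ w)
    return (X.T * (p * (1.0 - p))) @ X / len(X) + lam * np.eye(X.shape[1])


def fit_logreg(X, y, lam=1e-2, iters=50):
    # Plain Newton's method (no line search) on the regularized objective.
    w = np.zeros(X.shape[1])
    for _ in range(iters):
        grad = X.T @ (sigmoid(X @ w) - y) / len(y) + lam * w
        w -= np.linalg.solve(hessian(X, w, lam), grad)
    return w


def salutary_labels(X_tr, y_tr, X_val, y_val, X_pool, k, lam=1e-2):
    """Hypothetical helper: pick k pool samples and assign each a label.

    Returns (indices into X_pool, labels in {0, 1}, predicted benefit),
    ordered by decreasing predicted benefit.
    """
    w = fit_logreg(X_tr, y_tr, lam)
    # Gradient of the mean validation loss at the trained parameters.
    g_val = X_val.T @ (sigmoid(X_val @ w) - y_val) / len(y_val)
    # s = H^{-1} g_val is shared by every candidate, so solve only once.
    s = np.linalg.solve(hessian(X_tr, w, lam), g_val)
    p_pool = sigmoid(X_pool @ w)
    proj = X_pool @ s
    # grad l(x, c) = (p - c) * x, so benefit = s^T grad l(x, c) = (p - c) * (x . s).
    benefit = np.stack([(p_pool - c) * proj for c in (0, 1)], axis=1)
    labels = benefit.argmax(axis=1)  # label with the largest predicted benefit
    best = benefit.max(axis=1)
    chosen = np.argsort(-best)[:k]  # most beneficial samples first
    return chosen, labels[chosen], best[chosen]
```

Because H^{-1} g_val does not depend on the candidate, it is computed once, and scoring each (x, c) pair costs a single dot product. With many classes or a large model, the explicit solve would typically be replaced by an iterative inverse-Hessian-vector approximation.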