Abstract
Fairness is an essential factor for machine learning systems deployed in
high-stakes applications. Among all fairness notions, individual fairness,
following a consensus that `similar individuals should be treated similarly,'
is a vital notion to guarantee fair treatment for individual cases. Previous
methods typically characterize individual fairness as a prediction-invariance
problem under perturbations of sensitive attributes, and solve it by adopting the
Distributionally Robust Optimization (DRO) paradigm. However, adversarial
perturbations along a direction covering sensitive information ignore
inherent feature correlations and innate data constraints, and thus mislead
the model into optimizing on off-manifold, unrealistic samples. In light of
this, we propose a method to learn and generate antidote data that
approximately follows the data distribution to remedy individual unfairness.
These on-manifold antidote data can be combined with the original training
data in a generic optimization procedure, yielding a pure pre-processing
approach to individual unfairness, or can be integrated into the in-processing
DRO paradigm. Through extensive experiments, we demonstrate that our antidote
data resist individual unfairness at minimal or zero cost to the model's
predictive utility.