Scholarship list
Conference paper
Characterizing the Influence of Graph Elements
Date presented 05/03/2023
The International Conference on Learning Representations, 05/01/2023–05/05/2023, Kigali, Rwanda
The influence function, a method from robust statistics, measures how model parameters, or functions of model parameters, change when training instances are removed or modified. It is an efficient and useful post-hoc method for studying the interpretability of machine learning models without expensive model re-training. Recently, graph convolutional networks (GCNs), which operate on graph data, have attracted a great deal of attention. However, no prior research has studied the influence functions of GCNs to shed light on the effects of removing training nodes/edges from an input graph. Since the nodes/edges in a graph are interdependent in GCNs, deriving influence functions for GCNs is challenging. To fill this gap, we started with the simple graph convolution (SGC) model, which operates on an attributed graph, and formulated an influence function to approximate the changes in model parameters when a node or an edge is removed from an attributed graph. Moreover, we theoretically analyzed the error bound of the estimated influence of removing an edge. We experimentally validated the accuracy and effectiveness of our influence estimation function. In addition, we showed that the influence function of an SGC model could be used to estimate the impact of removing training nodes/edges on the test performance of the SGC without re-training the model. Finally, we demonstrated how to use influence functions to effectively guide adversarial attacks on GCNs.
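To make the influence-function idea concrete, here is a minimal pure-Python sketch, not the paper's method. It assumes a ridge-regression (squared-loss) head on top of SGC features S^k X, with S = D^{-1/2}(A + I)D^{-1/2}, so that the Hessian has a closed form. It covers only the classic case of dropping one node from the training loss. The harder effect the paper addresses, where deleting a node or edge also changes the propagated features of its neighbors, is not modeled here. All function names (`sgc_features`, `fit_ridge`, `influence_remove_node`) are hypothetical.

```python
import math
from typing import List, Sequence

Matrix = List[List[float]]
Vector = List[float]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def sgc_features(adj: Matrix, x: Matrix, k: int) -> Matrix:
    """SGC propagation S^k X with S = D^{-1/2} (A + I) D^{-1/2} (assumed normalization)."""
    n = len(adj)
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]
    s = [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]
    out = x
    for _ in range(k):
        out = matmul(s, out)
    return out


def solve(a: Matrix, b: Vector) -> Vector:
    """Solve a @ sol = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    sol = [0.0] * n
    for i in reversed(range(n)):
        sol[i] = (m[i][n] - sum(m[i][j] * sol[j] for j in range(i + 1, n))) / m[i][i]
    return sol


def hessian(z: Matrix, rows: Sequence[int], lam: float) -> Matrix:
    """Hessian of sum_r 0.5*(z_r . theta - y_r)^2 + 0.5*lam*||theta||^2."""
    d = len(z[0])
    return [[sum(z[r][p] * z[r][q] for r in rows) + (lam if p == q else 0.0)
             for q in range(d)] for p in range(d)]


def fit_ridge(z: Matrix, y: Vector, rows: Sequence[int], lam: float) -> Vector:
    """Exact minimizer of the ridge loss over the given training rows."""
    d = len(z[0])
    rhs = [sum(z[r][p] * y[r] for r in rows) for p in range(d)]
    return solve(hessian(z, rows, lam), rhs)


def influence_remove_node(z: Matrix, y: Vector, rows: Sequence[int], lam: float,
                          theta: Vector, i: int) -> Vector:
    """First-order estimate of theta_{-i} - theta, i.e. H^{-1} grad l_i(theta)."""
    resid = sum(zp * tp for zp, tp in zip(z[i], theta)) - y[i]
    grad = [zp * resid for zp in z[i]]
    return solve(hessian(z, rows, lam), grad)


if __name__ == "__main__":
    adj = [[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]]  # 4-node path graph
    x = [[1.0, 0.0], [0.5, 1.0], [0.0, 2.0], [1.5, 0.5]]
    y = [1.0, 0.0, -1.0, 2.0]
    z = sgc_features(adj, x, k=2)
    theta = fit_ridge(z, y, [0, 1, 2, 3], 0.1)
    est = influence_remove_node(z, y, [0, 1, 2, 3], 0.1, theta, 1)
    exact = [a - b for a, b in zip(fit_ridge(z, y, [0, 2, 3], 0.1), theta)]
    print("estimated:", est, "retrained:", exact)
```

For a squared loss, the Sherman-Morrison formula shows that the exactly retrained change equals this estimate divided by (1 - h_i), where h_i = z_i^T H^{-1} z_i. The estimate therefore points in exactly the right direction but slightly underestimates the magnitude. This is why a cheap first-order estimate can stand in for re-training.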
Conference paper
Fairness-Aware Unsupervised Feature Selection
Date presented 11/2021
ACM International Conference on Information and Knowledge Management, 11/01/2021–11/05/2021, Online
Feature selection is a prevalent data preprocessing paradigm for various learning tasks. Due to the high cost of acquiring supervision information, unsupervised feature selection has recently attracted great interest. However, existing unsupervised feature selection algorithms lack fairness considerations and run a high risk of amplifying discrimination by selecting features that are overly associated with protected attributes such as gender, race, and ethnicity. In this paper, we make an initial investigation of the fairness-aware unsupervised feature selection problem and develop a principled framework that leverages kernel alignment to find a subset of high-quality features that best preserves the information in the original feature space while being minimally correlated with protected attributes. Unlike mainstream in-processing debiasing methods, our proposed framework can be regarded as a model-agnostic debiasing strategy that eliminates biases and discrimination before downstream learning algorithms are involved. Experimental results on multiple real-world datasets demonstrate that our framework achieves a good trade-off between utility maximization and fairness promotion.
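The kernel-alignment objective described above can be illustrated with a small sketch. This is an assumption-laden stand-in, not the paper's optimization. It uses linear kernels and centered kernel alignment, then greedily adds the feature whose subset kernel best aligns with the full-feature kernel, minus a penalty `beta` on alignment with a kernel built from the protected attribute. The greedy search, the linear kernel, and the names `fair_greedy_select` and `beta` are all hypothetical choices made for illustration.

```python
import math
from typing import List, Sequence

Matrix = List[List[float]]


def linear_kernel(cols: Sequence[Sequence[float]]) -> Matrix:
    """K[i][j] = sum over the given feature columns of x_i * x_j."""
    n = len(cols[0])
    return [[sum(c[i] * c[j] for c in cols) for j in range(n)] for i in range(n)]


def center(k: Matrix) -> Matrix:
    """Return H K H with H = I - 11^T / n."""
    n = len(k)
    row = [sum(r) / n for r in k]
    col = [sum(k[i][j] for i in range(n)) / n for j in range(n)]
    grand = sum(row) / n
    return [[k[i][j] - row[i] - col[j] + grand for j in range(n)] for i in range(n)]


def alignment(k1: Matrix, k2: Matrix) -> float:
    """Centered kernel alignment <K1c, K2c>_F / (||K1c||_F * ||K2c||_F)."""
    a, b = center(k1), center(k2)
    dot = sum(p * q for ra, rb in zip(a, b) for p, q in zip(ra, rb))
    na = math.sqrt(sum(p * p for r in a for p in r))
    nb = math.sqrt(sum(q * q for r in b for q in r))
    return dot / (na * nb) if na > 0 and nb > 0 else 0.0


def fair_greedy_select(features: Sequence[Sequence[float]], protected: Sequence[float],
                       k: int, beta: float) -> List[int]:
    """Greedily pick k column indices maximizing
    align(K_S, K_all) - beta * align(K_S, K_protected)."""
    k_all = linear_kernel(features)
    k_prot = linear_kernel([protected])
    selected: List[int] = []
    for _ in range(k):
        best, best_score = -1, -math.inf
        for f in range(len(features)):
            if f in selected:
                continue
            k_s = linear_kernel([features[g] for g in selected + [f]])
            score = alignment(k_s, k_all) - beta * alignment(k_s, k_prot)
            if score > best_score:
                best, best_score = f, score
        selected.append(best)
    return selected


if __name__ == "__main__":
    s = [0.0, 0.0, 0.0, 1.0, 1.0, 1.0]          # protected attribute
    cols = [[4.0 * v for v in s],                 # feature that just encodes s
            [1.0, 2.0, 3.0, 1.0, 2.0, 3.0],
            [3.0, 1.0, 2.0, 2.0, 3.0, 1.0]]
    print(fair_greedy_select(cols, s, k=1, beta=0.0))
    print(fair_greedy_select(cols, s, k=2, beta=1.0))
```

With `beta = 0` the selector chases utility alone and picks the high-variance column that merely encodes the protected attribute. With `beta = 1` that column is penalized and the two uncorrelated columns are kept, which mirrors the utility/fairness trade-off the abstract describes.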
Conference paper
Towards Novel Target Discovery Through Open-Set Domain Adaptation
Date presented 10/2021
IEEE/CVF International Conference on Computer Vision (ICCV), 10/11/2021–10/17/2021, Online
Open-set domain adaptation (OSDA) assumes that the target domain contains samples from novel categories unobserved in the external source domain. Unfortunately, existing OSDA methods ignore the need for information about unseen categories and simply label them as an "unknown" set without further explanation. This motivates us to understand the unknown categories more specifically by exploring their underlying structures and recovering their interpretable semantic attributes. In this paper, we propose a novel framework to accurately identify the seen categories in the target domain and effectively recover the semantic attributes of unseen categories. Specifically, structure-preserving partial alignment is developed to recognize the seen categories through domain-invariant feature learning. Attribute propagation over a visual graph is designed to smoothly transfer attributes from seen to unseen categories via visual-semantic mapping. Moreover, two new cross-domain benchmarks are constructed to evaluate the proposed framework on this novel and practical challenge. Experimental results on open-set recognition and semantic recovery demonstrate the superiority of the proposed method over the compared baselines.
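The abstract does not give the propagation rule, so the following is only a hedged sketch of the general idea of spreading attributes from seen to unseen nodes over a visual graph. It uses a standard label-propagation-style iteration F ← αSF + (1 − α)Y with a symmetric-normalized affinity S, assuming a precomputed affinity matrix `w` (for example, from visual feature similarity). The visual-semantic mapping is not modeled, and `propagate_attributes`, `alpha`, and `iters` are hypothetical names.

```python
import math
from typing import Dict, List

Matrix = List[List[float]]


def normalize(w: Matrix) -> Matrix:
    """Symmetric normalization S = D^{-1/2} W D^{-1/2}; isolated nodes get zero rows."""
    n = len(w)
    deg = [sum(row) for row in w]
    return [[w[i][j] / math.sqrt(deg[i] * deg[j]) if deg[i] > 0 and deg[j] > 0 else 0.0
             for j in range(n)] for i in range(n)]


def propagate_attributes(w: Matrix, seeds: Dict[int, List[float]], dim: int,
                         alpha: float = 0.8, iters: int = 100) -> Matrix:
    """Spread attribute vectors from seen nodes (seeds) to all nodes.

    Unseen nodes start at zero; each step mixes neighbor attributes (weight alpha)
    with the original seed attributes (weight 1 - alpha).
    """
    n = len(w)
    s = normalize(w)
    y = [list(seeds.get(i, [0.0] * dim)) for i in range(n)]
    f = [row[:] for row in y]
    for _ in range(iters):
        f = [[alpha * sum(s[i][j] * f[j][d] for j in range(n)) + (1 - alpha) * y[i][d]
              for d in range(dim)] for i in range(n)]
    return f


if __name__ == "__main__":
    # Nodes 0 and 1 are seen categories; node 2 is unseen and visually closer to node 0.
    w = [[0.0, 0.0, 0.9],
         [0.0, 0.0, 0.1],
         [0.9, 0.1, 0.0]]
    attrs = propagate_attributes(w, {0: [1.0, 0.0], 1: [0.0, 1.0]}, dim=2)
    print(attrs[2])
```

Because alpha < 1 and the normalized affinity has spectral radius at most 1, the iteration converges, and the unseen node inherits attributes weighted toward its most similar seen neighbor.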
Conference paper
Deep Clustering-based Fair Outlier Detection
Date presented 08/2021
ACM SIGKDD Conference on Knowledge Discovery & Data Mining, 08/14/2021–08/18/2021, Online
In this paper, we focus on fairness issues in unsupervised outlier detection. Traditional algorithms, lacking a specific design for algorithmic fairness, could implicitly encode and propagate statistical bias in data and raise societal concerns. To correct such unfairness and deliver a fair set of potential outlier candidates, we propose Deep Clustering based Fair Outlier Detection (DCFOD), which learns a good representation for utility maximization while enforcing the learnable representation to be subgroup-invariant on the sensitive attribute. Considering the coupled and reciprocal nature of clustering and outlier detection, we leverage deep clustering to discover the intrinsic cluster structure and out-of-structure instances. Meanwhile, adversarial training erases the sensitive pattern from instances for fairness adaptation. Technically, we propose an instance-level weighted representation learning strategy to enhance joint deep clustering and outlier detection, in which a dynamic weight module re-emphasizes the contributions of likely inliers while mitigating the negative impact of outliers. Experiments on eight datasets, comparing against 17 outlier detection algorithms, demonstrate that DCFOD consistently achieves superior performance on both outlier detection validity and two types of fairness notions in outlier detection.
Conference paper
SelfDoc: Self-Supervised Document Representation Learning
Date presented 06/2021
IEEE Conference on Computer Vision and Pattern Recognition, 06/20/2021–06/25/2021, virtual / Nashville, TN
Conference paper
On Dyadic Fairness: Exploring and Mitigating Bias in Graph Connections
Date presented 05/2021
International Conference on Learning Representations, 05/03/2021–05/07/2021, virtual
Conference paper
Tweet Sentiment Analysis of the 2020 U.S. Presidential Election
WWW '21: The Web Conference 2021, 04/19/2021–04/23/2021, Ljubljana, Slovenia
In this paper, we conducted a tweet sentiment analysis of the 2020 U.S. Presidential Election between Donald Trump and Joe Biden. Specifically, we identified the Multi-Layer Perceptron classifier as the methodology with the best performance on the Sanders Twitter benchmark dataset. We collected a sample of over 260,000 tweets related to the 2020 U.S. Presidential Election via the Twitter API, performed feature extraction, and applied the Multi-Layer Perceptron to classify these tweets as having positive or negative sentiment. From the results, we concluded that (1) contrary to popular poll results, the candidates had very similar negative-to-positive sentiment ratios, (2) negative sentiment is more common and prominent than positive sentiment within the social media domain, (3) some key events can be detected from sentiment trends on social media, and (4) sentiment analysis can be used as a low-cost and easy alternative for gathering political opinion.