Hongfu Liu

Assistant Professor of Computer Science

Data Mining

Machine Learning

Data analytics in terms of cluster analysis

Outlier detection

Transfer learning

Feature selection and fair machine learning

Conference paper

Characterizing the Influence of Graph Elements

by Zizhang Chen, Hongfu Liu and Pengyu Hong

Date presented 05/03/2023

The International Conference on Learning Representations, 05/01/2023–05/05/2023, Kigali, Rwanda

Influence functions, a method from robust statistics, measure how model parameters, or functions of those parameters, change when training instances are removed or modified. They are an efficient and useful post-hoc tool for studying the interpretability of machine learning models without expensive model re-training. Recently, graph convolutional networks (GCNs), which operate on graph data, have attracted a great deal of attention. However, no prior research has studied the influence functions of GCNs to shed light on the effects of removing training nodes/edges from an input graph. Because the nodes/edges of a graph are interdependent in GCNs, deriving influence functions for GCNs is challenging. To fill this gap, we started with the simple graph convolution (SGC) model, which operates on an attributed graph, and formulated an influence function to approximate the changes in model parameters when a node or an edge is removed from an attributed graph. Moreover, we theoretically analyzed the error bound of the estimated influence of removing an edge. We experimentally validated the accuracy and effectiveness of our influence estimation function. In addition, we showed that the influence function of an SGC model can be used to estimate the impact of removing training nodes/edges on the SGC's test performance without re-training the model. Finally, we demonstrated how influence functions can effectively guide adversarial attacks on GCNs.
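The core idea above, estimating a parameter change from a gradient and a Hessian instead of re-training, can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the paper's formulation: it swaps the SGC classifier's loss for an L2-regularized squared loss so the fit has a closed form, uses two feature dimensions so a 2x2 solve suffices, and "removes" a training node only from the loss while keeping the propagated features fixed (the paper's node and edge influence also accounts for changes to the graph itself, which this sketch does not model). The helper names `sgc_features`, `fit_ridge`, and `removal_influence` are hypothetical.

```python
from typing import List

Matrix = List[List[float]]
Vector = List[float]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain-Python matrix product a @ b."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def sgc_features(adj: Matrix, x: Matrix, k: int) -> Matrix:
    """SGC-style propagation: apply S = D^{-1/2}(A + I)D^{-1/2} to the features k times."""
    n = len(adj)
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]
    s = [[a_hat[i][j] / (deg[i] * deg[j]) ** 0.5 for j in range(n)] for i in range(n)]
    for _ in range(k):
        x = matmul(s, x)
    return x


def solve2(m: Matrix, v: Vector) -> Vector:
    """Solve the 2x2 system m @ z = v with Cramer's rule."""
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    return [(v[0] * m[1][1] - m[0][1] * v[1]) / det,
            (m[0][0] * v[1] - m[1][0] * v[0]) / det]


def hessian(feats: Matrix, lam: float) -> Matrix:
    """Hessian of sum_i 0.5*(f_i . theta - y_i)^2 + 0.5*lam*||theta||^2 (constant in theta)."""
    h = [[lam, 0.0], [0.0, lam]]
    for f in feats:
        for r in range(2):
            for c in range(2):
                h[r][c] += f[r] * f[c]
    return h


def fit_ridge(feats: Matrix, ys: Vector, lam: float) -> Vector:
    """Exact minimizer: solve H theta = sum_i y_i f_i."""
    rhs = [sum(y * f[r] for f, y in zip(feats, ys)) for r in range(2)]
    return solve2(hessian(feats, lam), rhs)


def removal_influence(feats: Matrix, ys: Vector, lam: float, i: int) -> Vector:
    """Influence estimate of theta_{-i} - theta as H^{-1} grad_i, with no re-training."""
    theta = fit_ridge(feats, ys, lam)
    f = feats[i]
    residual = f[0] * theta[0] + f[1] * theta[1] - ys[i]
    return solve2(hessian(feats, lam), [residual * f[0], residual * f[1]])


# Toy 4-node path graph with 2-D node attributes (illustrative data only).
adj = [[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]]
x = [[1.0, 0.0], [0.8, 0.3], [0.2, 0.9], [0.0, 1.0]]
ys = [1.0, 1.0, -1.0, -1.0]
feats = sgc_features(adj, x, k=2)

est = removal_influence(feats, ys, lam=0.1, i=0)  # estimate, no re-training
full = fit_ridge(feats, ys, 0.1)
loo = fit_ridge(feats[1:], ys[1:], 0.1)           # actual re-training without node 0
exact = [loo[0] - full[0], loo[1] - full[1]]
print(est, exact)
```

For this quadratic loss the estimate and the re-trained change are related by the Sherman-Morrison identity: the exact leave-one-out change equals the estimate divided by 1 - h_i, where h_i = f_i^T H^{-1} f_i is node i's leverage. The estimate therefore points in the right direction and understates the magnitude by exactly that factor.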

Conference paper

Fairness-Aware Unsupervised Feature Selection

by Xiaoying Xing, Hongfu Liu, Chen Chen and Jundong Li

Date presented 11/2021

ACM International Conference on Information and Knowledge Management, 11/01/2021–11/05/2021, Online

Feature selection is a prevalent data preprocessing paradigm for various learning tasks. Because supervision information is expensive to acquire, unsupervised feature selection has recently attracted great interest. However, existing unsupervised feature selection algorithms lack fairness considerations and run a high risk of amplifying discrimination by selecting features that are overly associated with protected attributes such as gender, race, and ethnicity. In this paper, we make an initial investigation of the fairness-aware unsupervised feature selection problem and develop a principled framework that leverages kernel alignment to find a subset of high-quality features that best preserves the information in the original feature space while being minimally correlated with protected attributes. Specifically, unlike mainstream in-processing debiasing methods, our proposed framework can be regarded as a model-agnostic debiasing strategy that eliminates biases and discrimination before downstream learning algorithms are involved. Experimental results on multiple real-world datasets demonstrate that our framework achieves a good trade-off between utility maximization and fairness promotion.
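The kernel-alignment trade-off described above can be illustrated with a small greedy selector. This is a hypothetical sketch, not the paper's framework (which optimizes its own kernel-alignment objective): it uses linear kernels and centered kernel alignment, scores a candidate subset by its alignment with the full-feature kernel minus `beta` times its alignment with a kernel built from a single numeric protected attribute, and adds one feature at a time. The names `select_features` and `beta` are assumptions for illustration.

```python
from math import sqrt
from typing import List

Matrix = List[List[float]]


def linear_kernel(rows: Matrix, cols: List[int]) -> Matrix:
    """K[i][j] = <x_i, x_j>, using only the chosen feature columns."""
    return [[sum(a[c] * b[c] for c in cols) for b in rows] for a in rows]


def center(k: Matrix) -> Matrix:
    """Double-center a kernel: HKH with H = I - (1/n) 11^T."""
    n = len(k)
    row = [sum(r) / n for r in k]
    col = [sum(k[i][j] for i in range(n)) / n for j in range(n)]
    grand = sum(row) / n
    return [[k[i][j] - row[i] - col[j] + grand for j in range(n)] for i in range(n)]


def alignment(k1: Matrix, k2: Matrix) -> float:
    """Centered kernel alignment <K1c, K2c>_F / (||K1c||_F * ||K2c||_F)."""
    a, b = center(k1), center(k2)
    dot = sum(x * y for ra, rb in zip(a, b) for x, y in zip(ra, rb))
    na = sqrt(sum(x * x for r in a for x in r))
    nb = sqrt(sum(y * y for r in b for y in r))
    return dot / (na * nb) if na > 0 and nb > 0 else 0.0


def select_features(x: Matrix, protected: List[float], m: int, beta: float) -> List[int]:
    """Greedily add the feature whose subset best aligns with the full-feature
    kernel, penalized by beta times its alignment with the protected kernel."""
    d = len(x[0])
    k_full = linear_kernel(x, list(range(d)))
    k_prot = [[p * q for q in protected] for p in protected]
    chosen: List[int] = []
    while len(chosen) < m:
        def score(j: int) -> float:
            k = linear_kernel(x, chosen + [j])
            return alignment(k, k_full) - beta * alignment(k, k_prot)
        chosen.append(max((j for j in range(d) if j not in chosen), key=score))
    return chosen


# Toy data: column 0 duplicates the protected attribute, column 1 is unrelated to it.
x = [[1.0, 1.0], [1.0, 0.0], [0.0, 1.0], [0.0, 0.0]]
protected = [1.0, 1.0, 0.0, 0.0]
print(select_features(x, protected, m=1, beta=1.0))  # [1]
```

Both columns preserve the same amount of the original kernel structure here, so the fairness penalty alone decides the choice and the column that merely copies the protected attribute is skipped. Because selection happens before any downstream model is trained, the selected columns can be handed to any learner, which is the model-agnostic property the abstract emphasizes.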

Conference paper

Towards Novel Target Discovery Through Open-Set Domain Adaptation

by Taotao Jing, Hongfu Liu and Zhengming Ding

Date presented 10/2021

IEEE/CVF International Conference on Computer Vision (ICCV), 10/11/2021–10/17/2021, Online

Open-set domain adaptation (OSDA) assumes that the target domain contains samples from novel categories not observed in the external source domain. Unfortunately, existing OSDA methods ignore the need for information about these unseen categories and simply assign them to an "unknown" set without further explanation. This motivates us to understand the unknown categories more specifically by exploring their underlying structures and recovering their interpretable semantic attributes. In this paper, we propose a novel framework that accurately identifies the seen categories in the target domain and effectively recovers the semantic attributes of unseen categories. Specifically, structure-preserving partial alignment is developed to recognize the seen categories through domain-invariant feature learning. Attribute propagation over a visual graph is designed to smoothly transfer attributes from seen to unseen categories via visual-semantic mapping. Moreover, two new cross-domain benchmarks are constructed to evaluate the proposed framework on this novel and practical challenge. Experimental results on open-set recognition and semantic recovery demonstrate the superiority of the proposed method over the compared baselines.
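The attribute-propagation step can be pictured with a standard label-propagation recurrence, swapped in here purely for illustration: nodes of a visual graph whose category is seen carry known attribute vectors, unseen nodes start at zero, and F <- alpha*S*F + (1 - alpha)*Y spreads attributes along edges until it settles. The graph, `alpha`, the iteration count, and the name `propagate_attributes` are assumptions; the paper's visual-semantic mapping is not reproduced.

```python
from typing import List, Optional


def propagate_attributes(adj: List[List[float]], seen: List[Optional[List[float]]],
                         alpha: float = 0.8, iters: int = 50) -> List[List[float]]:
    """Spread known attribute vectors to unseen nodes: F <- alpha*S*F + (1-alpha)*Y,
    with S = D^{-1/2} A D^{-1/2}. Entries of `seen` are None for unseen nodes."""
    n = len(adj)
    dim = len(next(v for v in seen if v is not None))
    deg = [sum(row) for row in adj]
    # Symmetrically normalized adjacency; zero where there is no edge.
    s = [[adj[i][j] / (deg[i] * deg[j]) ** 0.5 if adj[i][j] else 0.0 for j in range(n)]
         for i in range(n)]
    y = [list(v) if v is not None else [0.0] * dim for v in seen]
    f = [row[:] for row in y]
    for _ in range(iters):
        f = [[alpha * sum(s[i][k] * f[k][d] for k in range(n)) + (1 - alpha) * y[i][d]
              for d in range(dim)] for i in range(n)]
    return f


# Toy path graph: node 1 (unseen) sits between two seen nodes with different attributes.
adj = [[0.0, 1.0, 0.0], [1.0, 0.0, 1.0], [0.0, 1.0, 0.0]]
seen = [[1.0, 0.0], None, [0.0, 1.0]]
f = propagate_attributes(adj, seen)
print(f[1])
```

The unseen node ends up with an equal mix of its two neighbors' attributes, while the `(1 - alpha) * Y` term keeps the seen nodes anchored to their known attributes. Because `alpha < 1` and the normalized adjacency has spectral radius at most 1, the iteration converges.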

Conference paper

Deep Clustering-based Fair Outlier Detection

by Hanyu Song, Peizhao Li and Hongfu Liu

Date presented 08/2021

ACM SIGKDD Conference on Knowledge Discovery & Data Mining, 08/14/2021–08/18/2021, Online

In this paper, we focus on fairness issues in unsupervised outlier detection. Traditional algorithms, lacking a specific design for algorithmic fairness, can implicitly encode and propagate statistical bias in data and raise societal concerns. To correct such unfairness and deliver a fair set of potential outlier candidates, we propose Deep Clustering-based Fair Outlier Detection (DCFOD), which learns a good representation for utility maximization while enforcing the learnable representation to be subgroup-invariant with respect to the sensitive attribute. Considering the coupled and reciprocal nature of clustering and outlier detection, we leverage deep clustering to discover the intrinsic cluster structure and out-of-structure instances. Meanwhile, adversarial training erases the sensitive pattern from instances for fairness adaptation. Technically, we propose an instance-level weighted representation learning strategy to enhance joint deep clustering and outlier detection, in which a dynamic weight module re-emphasizes the contributions of likely inliers while mitigating the negative impact of outliers. In experiments on eight datasets against 17 outlier detection algorithms, DCFOD consistently achieves superior performance in both outlier detection validity and two types of fairness notions for outlier detection.
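DCFOD learns the representation, the clusters, and the instance weights jointly inside a deep network with adversarial training, which a short example cannot reproduce. The sketch below only illustrates the underlying intuition on fixed points with given centroids: an outlier score as distance to the nearest cluster centroid, instance weights that favor likely inliers, and a per-group flag-rate comparison. The softmax weighting, the demographic-parity-style flag-rate check, and all function names are hypothetical choices, not the paper's dynamic weight module or its fairness metrics.

```python
from math import dist, exp
from typing import Dict, List, Sequence

Point = Sequence[float]


def outlier_scores(points: Sequence[Point], centroids: Sequence[Point]) -> List[float]:
    """Cluster-based outlier score: Euclidean distance to the nearest centroid."""
    return [min(dist(p, c) for c in centroids) for p in points]


def inlier_weights(scores: Sequence[float], temperature: float = 1.0) -> List[float]:
    """Softmax over negative scores, so likely inliers get larger weights (sum to 1)."""
    lowest = min(scores)
    raw = [exp(-(s - lowest) / temperature) for s in scores]
    total = sum(raw)
    return [r / total for r in raw]


def flag_rate_by_group(scores: Sequence[float], groups: Sequence[str],
                       top_k: int) -> Dict[str, float]:
    """Fraction of each sensitive group that lands among the top_k flagged outliers."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    flagged = set(ranked[:top_k])
    rates: Dict[str, float] = {}
    for g in set(groups):
        members = [i for i, gi in enumerate(groups) if gi == g]
        rates[g] = sum(i in flagged for i in members) / len(members)
    return rates


# Toy data: two tight clusters plus one far-away point.
points = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0), (10.0, 0.0)]
centroids = [(0.0, 0.0), (5.0, 5.0)]
groups = ["a", "a", "b", "b", "a"]
scores = outlier_scores(points, centroids)
weights = inlier_weights(scores)
rates = flag_rate_by_group(scores, groups, top_k=1)
print(scores, weights, rates)
```

A large gap between the groups' flag rates is one simple signal of the kind of disparity the abstract describes; in DCFOD the adversarial training is what pushes the learned representation toward subgroup invariance in the first place.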

Conference paper

SelfDoc: Self-Supervised Document Representation Learning

by Hongfu Liu, Peizhao Li, Jiuxiang Gu, Jason Kuen, Vlad Morariu, Handong Zhao, Rajiv Jain and Varun Manjunatha

Date presented 06/2021

IEEE Conference on Computer Vision and Pattern Recognition, 06/20/2021–06/25/2021, virtual / Nashville, TN

Conference paper

On Dyadic Fairness: Exploring and Mitigating Bias in Graph Connections

by Hongfu Liu, Peizhao Li, Yifei Wang and Pengyu Hong

Date presented 05/2021

International Conference on Learning Representations, 05/03/2021–05/07/2021, virtual

Conference paper

Tweet Sentiment Analysis of the 2020 U.S. Presidential Election

by Hongfu Liu, Han Yue and Ethan Xia

WWW '21: The Web Conference 2021, 04/19/2021–04/23/2021, Ljubljana, Slovenia

In this paper, we conducted a tweet sentiment analysis of the 2020 U.S. Presidential Election between Donald Trump and Joe Biden. Specifically, we identified the Multi-Layer Perceptron classifier as the methodology with the best performance on the Sanders Twitter benchmark dataset. We collected a sample of over 260,000 tweets related to the 2020 U.S. Presidential Election from the Twitter website via the Twitter API, performed feature extraction, and applied a Multi-Layer Perceptron to classify these tweets as having positive or negative sentiment. From the results, we concluded that (1) contrary to popular poll results, the candidates had a very close negative-to-positive sentiment ratio, (2) negative sentiment is more common and prominent than positive sentiment within the social media domain, (3) some key events can be detected through trends of sentiment on social media, and (4) sentiment analysis can be used as a low-cost and easy alternative for gauging political opinion.
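Conclusion (1) rests on a per-candidate negative-to-positive sentiment ratio. The abstract does not spell out how that ratio was computed, so the following is a hypothetical sketch assuming each classified tweet is reduced to a (candidate, label) pair and the ratio is simply the count of negative tweets divided by the count of positive tweets. The function name and the toy labels are illustrative only and are not the paper's data.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def neg_to_pos_ratio(labeled: Iterable[Tuple[str, str]]) -> Dict[str, float]:
    """Per-candidate ratio of negative to positive tweets.

    `labeled` yields (candidate, label) pairs with label in {"positive", "negative"}.
    A candidate with no positive tweets gets an infinite ratio.
    """
    counts: Dict[str, Counter] = {}
    for candidate, label in labeled:
        counts.setdefault(candidate, Counter())[label] += 1
    return {
        cand: (c["negative"] / c["positive"]) if c["positive"] else float("inf")
        for cand, c in counts.items()
    }


# Toy classifier outputs (made-up labels, for illustration only).
toy = [("candidate_a", "negative"), ("candidate_a", "negative"), ("candidate_a", "positive"),
       ("candidate_b", "negative"), ("candidate_b", "positive")]
print(neg_to_pos_ratio(toy))  # {'candidate_a': 2.0, 'candidate_b': 1.0}
```

A ratio above 1 means negative tweets outnumber positive ones for that candidate, which is how conclusion (2) would show up in this metric.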

Brandeis University