Hongfu Liu

Assistant Professor of Computer Science

Data Mining

Machine Learning

Data analytics, particularly cluster analysis

Outlier detection

Transfer learning

Feature selection and fair machine learning

Conference proceeding

Data-Centric Physics-Informed Graph Neural Networks for Ultra-Fast Power Flow Analysis

by Han Yue, Wentao Zhang, Yuzhang Lin and Hongfu Liu

Published 07/27/2025

IEEE Power & Energy Society General Meeting, 1 - 5

As the planning and operation of modern power systems typically require repetitive power flow analysis on numerous scenarios, deep learning (DL) applications to power flow analysis have gained momentum. However, DL models require a large volume of training data to achieve satisfactory performance. In power systems, it is very challenging to collect enough training samples from the real world, so numerical simulations must be performed to generate them. While much effort has been devoted to improving DL model design, little attention has been paid to the curation of training data. To tackle this challenge and reduce data curation cost, we propose a data-centric approach that effectively selects a small number of beneficial samples to boost DL performance. It adopts and improves the influence function to estimate, with high computational efficiency, the influence of training samples on DL performance, and then uses this information to guide the generation of beneficial data samples. The proposed data-centric learning approach is instantiated on a physics-informed graph neural network (GNN) model for power flow analysis. Simulation results on the IEEE 300-bus test power system demonstrate the effectiveness of the proposed method over the traditional approach.
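A toy problem can illustrate the influence-function idea mentioned above. The sketch below is not the authors' GNN implementation. It assumes a 1-D linear regression with squared loss and computes the exact 2 × 2 Hessian. It then scores each training sample with the classic upweighting influence on validation loss, -∇L_val^T H^{-1} ∇ℓ(z_i). All function names are hypothetical.

```python
"""Toy influence-function sketch (hypothetical; not the paper's GNN method).

Assumes a 1-D linear model y ~ w*x + b with squared loss
l(z, theta) = 0.5 * (w*x + b - y)^2 and theta = (w, b).
Score of training point z_i: -grad L_val(theta)^T H^{-1} grad l(z_i, theta).
Positive score: upweighting z_i is expected to RAISE validation loss.
"""


def fit_least_squares(xs, ys):
    # Closed-form ordinary least squares for y ~ w*x + b.
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    w = sxy / sxx
    return w, my - w * mx


def grad(x, y, w, b):
    # Gradient of 0.5 * (w*x + b - y)^2 with respect to (w, b).
    r = w * x + b - y
    return (r * x, r)


def influence_scores(train, val):
    xs, ys = zip(*train)
    w, b = fit_least_squares(xs, ys)
    n = len(train)
    # Hessian of the mean training loss: (1/n) * sum [[x^2, x], [x, 1]].
    h11 = sum(x * x for x in xs) / n
    h12 = sum(xs) / n
    h22 = 1.0
    det = h11 * h22 - h12 * h12
    inv = ((h22 / det, -h12 / det), (-h12 / det, h11 / det))
    # Mean gradient of the validation loss.
    gv = [0.0, 0.0]
    for x, y in val:
        g = grad(x, y, w, b)
        gv[0] += g[0] / len(val)
        gv[1] += g[1] / len(val)
    # s = H^{-1} grad L_val, computed once; H is symmetric, so
    # gv^T H^{-1} g equals s^T g for every training gradient g.
    s = (inv[0][0] * gv[0] + inv[0][1] * gv[1],
         inv[1][0] * gv[0] + inv[1][1] * gv[1])
    scores = []
    for x, y in train:
        g = grad(x, y, w, b)
        scores.append(-(s[0] * g[0] + s[1] * g[1]))
    return scores
```

Consider clean data from y = 2x + 1 with the single corrupted point (4, 20) in the training set and clean validation points at x = 0.5, 1.5, 2.5, 3.5. The corrupted point scores about 21.78, and the clean points score at most zero. It would therefore be the first candidate to drop. Its positive score means upweighting it would raise validation loss.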

Conference proceeding

Outlier Gradient Analysis: Efficiently Identifying Detrimental Training Samples for Deep Learning Models

by Anshuman Chhabra, Bo Li, Jian Chen, Prasant Mohapatra and Hongfu Liu

Published 2025

Proceedings of Machine Learning Research - International Conference on Machine Learning, ICML 2025, 267, 10334 - 10353

Conference proceeding

Achieving Fairness at No Utility Cost via Data Reweighing with Influence

by Peizhao Li and Hongfu Liu

Published 01/01/2022

International Conference on Machine Learning, Vol. 162

With the fast development of algorithmic governance, fairness has become a compulsory property for machine learning models to suppress unintentional discrimination. In this paper, we focus on the pre-processing aspect of achieving fairness and propose a data reweighing approach that only adjusts the weights of samples in the training phase. Unlike most previous reweighing methods, which usually assign a uniform weight to each (sub)group, we granularly model the influence of each training sample with regard to a fairness-related quantity and predictive utility, and compute individual weights based on influence under constraints from both fairness and utility. Experimental results reveal that previous methods achieve fairness at a non-negligible cost of utility, whereas, as a significant advantage, our approach can empirically relieve the tradeoff and obtain cost-free fairness for equal opportunity. We demonstrate cost-free fairness with vanilla classifiers and standard training processes, compared to baseline methods on multiple real-world tabular datasets. Code available at https://github.com/brandeis-machine-learning/influence-fairness.
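Here is a deliberately simplified sketch of per-sample reweighing. It assumes first-order influence estimates on a fairness gap and on validation loss are already available for each sample; these are hypothetical inputs. It replaces the constrained weight computation described in the abstract with a greedy rule. The rule zeroes out up to `budget` samples whose removal is predicted to shrink the fairness gap without increasing validation loss. For the authors' actual method, see the linked repository.

```python
"""Greedy influence-guided reweighing sketch (hypothetical; NOT the paper's method).

Hypothetical precomputed inputs, one entry per training sample:
  fair_infl[i]: first-order change in the fairness gap if sample i is removed
  util_infl[i]: first-order change in validation loss if sample i is removed
"""


def greedy_reweigh(fair_infl, util_infl, budget):
    # Candidates: removal is predicted to shrink the fairness gap
    # without increasing validation loss (first-order estimate).
    candidates = [i for i in range(len(fair_infl))
                  if fair_infl[i] < 0 and util_infl[i] <= 0]
    # Most fairness-improving candidates first.
    candidates.sort(key=lambda i: fair_infl[i])
    chosen = set(candidates[:budget])
    # Weight 0 drops a sample; every other sample keeps weight 1.
    weights = [0.0 if i in chosen else 1.0 for i in range(len(fair_infl))]
    # Under the linear approximation, effects of removals simply add up.
    d_fair = sum(fair_infl[i] for i in chosen)
    d_util = sum(util_infl[i] for i in chosen)
    return weights, d_fair, d_util
```

With `fair_infl = [-0.05, 0.02, -0.01, -0.08, 0.0]`, `util_infl = [-0.01, -0.02, 0.0, 0.03, -0.01]` and `budget=2`, samples 0 and 2 are dropped. Sample 3 would help fairness the most, but it is skipped because its removal is predicted to hurt utility. This mirrors the "no utility cost" goal only under the first-order approximation.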

Conference proceeding

Towards Novel Target Discovery Through Open-Set Domain Adaptation

by Taotao Jing, Hongfu Liu and Zhengming Ding

Published 10/01/2021

2021 IEEE/CVF International Conference on Computer Vision (ICCV)

Open-set domain adaptation (OSDA) considers the setting in which the target domain contains samples from novel categories not observed in the external source domain. Unfortunately, existing OSDA methods ignore the need for information about these unseen categories and simply recognize them as an "unknown" set without further explanation. This motivates us to understand the unknown categories more specifically by exploring their underlying structures and recovering their interpretable semantic attributes. In this paper, we propose a novel framework to accurately identify the seen categories in the target domain and effectively recover the semantic attributes of unseen categories. Specifically, structure-preserving partial alignment is developed to recognize the seen categories through domain-invariant feature learning. Attribute propagation over a visual graph is designed to smoothly transfer attributes from seen to unseen categories via visual-semantic mapping. Moreover, two new cross-domain benchmarks are constructed to evaluate the proposed framework on this novel and practical challenge. Experimental results on open-set recognition and semantic recovery demonstrate the superiority of the proposed method over the compared baselines.
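Attribute propagation from seen to unseen categories can be pictured as label propagation on a graph. The sketch below assumes each category is a node with a visual prototype vector and uses hypothetical Gaussian edge weights. It clamps the known attributes of seen categories and lets unseen categories iterate toward the weighted average of their neighbors. This is a generic propagation scheme, not the paper's model.

```python
"""Attribute propagation sketch over a visual similarity graph (hypothetical).

Assumptions: one node per category with a visual prototype vector; Gaussian
similarity edges; seen categories keep known attributes (clamped); unseen
categories take the similarity-weighted average of the other nodes.
"""
import math


def gaussian_weight(u, v, sigma=1.0):
    # Hypothetical edge weight: exp(-||u - v||^2 / (2 sigma^2)).
    d2 = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.exp(-d2 / (2 * sigma ** 2))


def propagate_attributes(prototypes, attrs, seen, iters=50, sigma=1.0):
    n = len(prototypes)
    dim = len(next(a for a in attrs if a is not None))
    # Seen nodes start from their known attributes, unseen nodes from zeros.
    cur = [list(attrs[i]) if i in seen else [0.0] * dim for i in range(n)]
    # Dense similarity matrix without self-loops.
    w = [[0.0 if i == j else gaussian_weight(prototypes[i], prototypes[j], sigma)
          for j in range(n)] for i in range(n)]
    for _ in range(iters):
        nxt = []
        for i in range(n):
            if i in seen:
                nxt.append(list(attrs[i]))  # clamp known attributes
                continue
            total = sum(w[i])
            nxt.append([sum(w[i][j] * cur[j][k] for j in range(n)) / total
                        for k in range(dim)])
        cur = nxt
    return cur
```

Consider an unseen category whose prototype lies exactly halfway between two seen categories with attributes `[1, 0]` and `[0, 1]`. It receives `[0.5, 0.5]`, a smooth blend of its visual neighbors.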

Conference proceeding

Fairness-Aware Unsupervised Feature Selection

by Xiaoying Xing, Hongfu Liu, Chen Chen and Jundong Li

Published 06/03/2021

Proceedings of the 30th ACM International Conference on Information & Knowledge Management

Feature selection is a prevalent data preprocessing paradigm for various learning tasks. Due to the high cost of acquiring supervision information, unsupervised feature selection has attracted great interest recently. However, existing unsupervised feature selection algorithms lack fairness considerations and run a high risk of amplifying discrimination by selecting features that are overly associated with protected attributes such as gender, race, and ethnicity. In this paper, we make an initial investigation of the fairness-aware unsupervised feature selection problem and develop a principled framework that leverages kernel alignment to find a subset of high-quality features that best preserve the information in the original feature space while being minimally correlated with protected attributes. Specifically, unlike mainstream in-processing debiasing methods, our proposed framework can be regarded as a model-agnostic debiasing strategy that eliminates biases and discrimination before downstream learning algorithms are involved. Experimental results on multiple real-world datasets demonstrate that our framework achieves a good trade-off between utility maximization and fairness promotion.
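The kernel-alignment trade-off can be shown with a small scoring sketch. It is a simplified stand-in, not the paper's optimization. Each feature is scored independently by its centered kernel alignment with the full-feature linear kernel (information preserved). It is then penalized by `lam` times its alignment with the protected-attribute kernel (association with the protected attribute). The top-k features are kept. The function names and the per-feature (rather than subset) scoring are assumptions.

```python
"""Fairness-aware unsupervised feature scoring via kernel alignment (sketch).

Hypothetical simplification: score(f) = CKA(K_f, K_all) - lam * CKA(K_f, K_prot),
using linear kernels; features are ranked independently, not jointly.
"""
import math


def linear_kernel(rows):
    # Gram matrix K[i][j] = <row_i, row_j>.
    return [[sum(a * b for a, b in zip(r, s)) for s in rows] for r in rows]


def center(k):
    # H K H with H = I - (1/n) 11^T, via row, column and grand means.
    n = len(k)
    row = [sum(r) / n for r in k]
    col = [sum(k[i][j] for i in range(n)) / n for j in range(n)]
    grand = sum(row) / n
    return [[k[i][j] - row[i] - col[j] + grand for j in range(n)] for i in range(n)]


def alignment(k1, k2):
    # Centered kernel alignment: <HK1H, HK2H>_F / (||HK1H||_F ||HK2H||_F).
    c1, c2 = center(k1), center(k2)
    n = len(c1)
    dot = sum(c1[i][j] * c2[i][j] for i in range(n) for j in range(n))
    n1 = math.sqrt(sum(v * v for r in c1 for v in r))
    n2 = math.sqrt(sum(v * v for r in c2 for v in r))
    return dot / (n1 * n2) if n1 > 0 and n2 > 0 else 0.0


def fair_feature_scores(x, protected, lam=1.0):
    k_all = linear_kernel(x)
    k_prot = linear_kernel([[p] for p in protected])
    scores = []
    for f in range(len(x[0])):
        k_f = linear_kernel([[row[f]] for row in x])
        scores.append(alignment(k_f, k_all) - lam * alignment(k_f, k_prot))
    return scores


def select_features(x, protected, k, lam=1.0):
    s = fair_feature_scores(x, protected, lam)
    return sorted(range(len(s)), key=lambda f: -s[f])[:k]
```

Take data where feature 0 duplicates the protected attribute and feature 1 is uncorrelated with it. Feature 0 is fully aligned with the protected kernel, so it gets a negative score, and feature 1 is selected.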

Conference proceeding

Deep Clustering based Fair Outlier Detection

by Hanyu Song, Peizhao Li and Hongfu Liu

Published 01/01/2021

KDD '21: Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining, 1481 - 1489

In this paper, we focus on fairness issues in unsupervised outlier detection. Traditional algorithms, without a specific design for algorithmic fairness, could implicitly encode and propagate statistical bias in data and raise societal concerns. To correct such unfairness and deliver a fair set of potential outlier candidates, we propose Deep Clustering based Fair Outlier Detection (DCFOD), which learns a good representation for utility maximization while enforcing the learnable representation to be subgroup-invariant on the sensitive attribute. Considering the coupled and reciprocal nature of clustering and outlier detection, we leverage deep clustering to discover the intrinsic cluster structure and out-of-structure instances. Meanwhile, adversarial training erases sensitive patterns from instances for fairness adaptation. Technically, we propose an instance-level weighted representation learning strategy to enhance joint deep clustering and outlier detection, where a dynamic weight module re-emphasizes the contributions of likely inliers while mitigating the negative impact of outliers. In experiments on eight datasets against 17 outlier detection algorithms, DCFOD consistently achieves superior performance in both outlier detection validity and two types of fairness notions for outlier detection.
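The interplay between clustering and instance weighting can be sketched without any deep model. This sketch makes several assumptions. Plain k-means on raw 2-D points stands in for deep clustering. The outlier score is the distance to the nearest centroid. A hypothetical weight `exp(-score / tau)` downweights likely outliers when centroids are recomputed. The adversarial fairness component of DCFOD is omitted entirely.

```python
"""Instance-weighted clustering sketch for outlier scoring (hypothetical).

k-means on raw points replaces deep clustering; weight_i = exp(-score_i / tau)
is an assumed dynamic weight; DCFOD's adversarial fairness part is omitted.
"""
import math


def weighted_kmeans_outliers(points, centroids, iters=10, tau=1.0):
    cents = [list(c) for c in centroids]
    scores = [0.0] * len(points)
    weights = [1.0] * len(points)
    for _ in range(iters):
        # Assign each point to its nearest centroid; distance = outlier score.
        assign, scores = [], []
        for p in points:
            d = [math.dist(p, c) for c in cents]
            j = min(range(len(cents)), key=lambda t: d[t])
            assign.append(j)
            scores.append(d[j])
        # Likely inliers (small distance) get larger weights.
        weights = [math.exp(-s / tau) for s in scores]
        # Weighted centroid update re-emphasizes likely inliers.
        for j in range(len(cents)):
            members = [i for i, a in enumerate(assign) if a == j]
            total = sum(weights[i] for i in members)
            if total > 0:
                cents[j] = [sum(weights[i] * points[i][k] for i in members) / total
                            for k in range(len(points[0]))]
    # Scores and weights come from the last assignment step.
    return scores, weights, cents
```

Consider four corner points of a unit square plus a far point at (10, 10). The far point gets the highest score and the smallest weight. The centroid therefore stays near (0.5, 0.5) instead of being dragged toward the outlier.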

Conference proceeding

Implicit Semantic Response Alignment for Partial Domain Adaptation

by Wenxiao Xiao, Zhengming Ding and Hongfu Liu

Published 01/01/2021

Advances in Neural Information Processing Systems 34 (NeurIPS 2021), 34

Partial Domain Adaptation (PDA) addresses the unsupervised domain adaptation problem in which the target label space is a subset of the source label space. Most state-of-the-art PDA methods tackle the inconsistent label space by assigning weights to classes or individual samples, in an attempt to discard source data belonging to irrelevant classes. However, we argue that samples from those extra categories still contain valuable information that can promote positive transfer. In this paper, we propose Implicit Semantic Response Alignment to explore the intrinsic relationships among different categories by applying a weighting scheme at the feature level. Specifically, we design a class2vec module to extract implicit semantic topics from the visual features. With an attention layer, we calculate the semantic response to each implicit semantic topic. The semantic responses of source and target data are then aligned to retain the relevant information contained in multiple categories by weighting the features instead of the samples. Experiments on several cross-domain benchmark datasets demonstrate the effectiveness of our method over state-of-the-art PDA methods. Moreover, we provide in-depth analyses to further explore implicit semantic alignment.
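A semantic response can be illustrated as attention over a set of topic vectors. This sketch is not the authors' code. It assumes a hypothetical fixed set of "implicit topic" vectors in place of a learned class2vec module. The semantic response of a feature is the softmax of its dot products with the topics. Source and target are aligned by the squared distance between their mean responses.

```python
"""Semantic-response alignment sketch (hypothetical; not the authors' code).

Assumed: fixed topic vectors stand in for class2vec output; response =
softmax over dot-product scores; loss = squared distance of domain means.
"""
import math


def softmax(xs):
    # Numerically stable softmax.
    m = max(xs)
    e = [math.exp(x - m) for x in xs]
    s = sum(e)
    return [v / s for v in e]


def semantic_response(feature, topics):
    # Attention of one feature vector over the topic vectors.
    return softmax([sum(f * t for f, t in zip(feature, topic)) for topic in topics])


def mean_response(features, topics):
    resps = [semantic_response(f, topics) for f in features]
    return [sum(r[k] for r in resps) / len(resps) for k in range(len(topics))]


def alignment_loss(source, target, topics):
    # Squared Euclidean distance between mean source and target responses.
    ms, mt = mean_response(source, topics), mean_response(target, topics)
    return sum((a - b) ** 2 for a, b in zip(ms, mt))
```

In a training loop, a loss of this kind would be minimized with respect to the feature extractor. Matching the response distributions encourages target features to draw on the same topics as source features, rather than discarding whole source samples.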

Brandeis University