Shubhranshu Shekhar

Assistant Professor of Data Science in the Brandeis International Business School

Data Mining

Machine Learning

Public Policy

Journal article Peer reviewed

Can Machine Learning Target Health Care Fraud? Evidence From Medicare Hospitalizations

by Shubhranshu Shekhar, Jetson Leder-Luis and Leman Akoglu

Published 2026

Journal of policy analysis and management, 45, 1, n/a

The United States spends more than $4 trillion per year on health care, largely conducted by private providers and reimbursed by insurers. A major concern in this system is overbilling and fraud by hospitals, who face incentives to misreport their claims to receive higher payments. In this work, we develop novel machine learning tools to identify hospitals that overbill insurers, which can be used to guide investigations and auditing of suspicious hospitals for both public and private health insurance systems. Using large‐scale claims data from Medicare, the US federal health insurance program for the elderly and disabled, we identify patterns consistent with fraud among inpatient hospitalizations. Our proposed approach for fraud detection is fully unsupervised, not relying on any labeled training data, and is explainable to end users, providing interpretations for which diagnosis, procedure, and billing codes lead to hospitals being labeled suspicious. Using newly collected data from the Department of Justice on hospitals facing anti‐fraud lawsuits, and case studies of suspicious hospitals, we validate our approach and findings. Our method provides a nearly fivefold lift over random targeting of hospitals. We also perform a postanalysis to understand which hospital characteristics, not used for detection, are associated with suspiciousness.
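As a rough illustration of the "lift over random targeting" figure quoted in the abstract, the sketch below computes lift at k: the hit rate among the k highest-scored hospitals divided by the base rate across all hospitals. The function name `lift_at_k` and the toy scores and labels are hypothetical; they are not code or data from the paper.

```python
from typing import Sequence


def lift_at_k(scores: Sequence[float], labels: Sequence[int], k: int) -> float:
    """Hit rate among the k highest-scored items divided by the overall base rate.

    scores: suspiciousness score per hospital (higher = more suspicious).
    labels: 1 if the hospital is a known positive (e.g. named in a lawsuit), else 0.
    """
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    if not 0 < k <= len(scores):
        raise ValueError("k must be between 1 and the number of items")
    base_rate = sum(labels) / len(labels)
    if base_rate == 0:
        raise ValueError("lift is undefined when there are no positives")
    # Rank item indices from most to least suspicious.
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    top_hits = sum(labels[i] for i in ranked[:k])
    return (top_hits / k) / base_rate


# Toy example (hypothetical values): 10 hospitals, 2 known positives.
toy_scores = [0.9, 0.8, 0.1, 0.2, 0.3, 0.05, 0.4, 0.15, 0.25, 0.35]
toy_labels = [1, 0, 0, 0, 0, 0, 1, 0, 0, 0]
print(lift_at_k(toy_scores, toy_labels, k=2))
```

A lift of 1 means the ranking does no better than picking hospitals at random; larger values mean audits guided by the ranking would find known positives more often.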

Report

Unsupervised Machine Learning for Explainable Health Care Fraud Detection

by Shubhranshu Shekhar, Jetson Leder-Luis and Leman Akoglu

Published 2023

The US spends more than 4 trillion dollars per year on health care, largely conducted by private providers and reimbursed by insurers. A major concern in this system is overbilling, waste and fraud by providers, who face incentives to misreport on their claims in order to receive higher payments. In this work, we develop novel machine learning tools to identify providers that overbill insurers. Using large-scale claims data from Medicare, the US federal health insurance program for elderly adults and the disabled, we identify patterns consistent with fraud or overbilling among inpatient hospitalizations. Our proposed approach for fraud detection is fully unsupervised, not relying on any labeled training data, and is explainable to end users, providing reasoning and interpretable insights into the potentially suspicious behavior of the flagged providers. Data from the Department of Justice on providers facing anti-fraud lawsuits and case studies of suspicious providers validate our approach and findings. We also perform a post-analysis to understand which hospital characteristics, not used for detection, are associated with a high suspiciousness score. Our method provides an 8-fold lift over random targeting, and can be used to guide investigations and auditing of suspicious providers for both public and private health insurance systems.

Conference proceeding

Less is more: SlimG for accurate, robust, and interpretable graph mining

by Jaemin Yoo, Meng-Chieh Lee, Shubhranshu Shekhar and Christos Faloutsos

Published 2023

Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining

ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 08/2023

How can we solve semi-supervised node classification in various graphs possibly with noisy features and structures? Graph neural networks (GNNs) have succeeded in many graph mining tasks, but their generalizability to various graph scenarios is limited due to the difficulty of training, hyperparameter tuning, and the selection of a model itself. Einstein said that we should "make everything as simple as possible, but not simpler." We rephrase it into the careful simplicity principle: a carefully-designed simple model can surpass sophisticated ones in real-world graphs. Based on the principle, we propose SlimG for semi-supervised node classification, which exhibits four desirable properties: It is (a) accurate, winning or tying on 10 out of 13 real-world datasets; (b) robust, being the only one that handles all scenarios of graph data (homophily, heterophily, random structure, noisy features, etc.); (c) fast and scalable, showing up to 18 times faster training in million-scale graphs; and (d) interpretable, thanks to the linearity and sparsity. We explain the success of SlimG through a systematic study of the designs of existing GNNs, sanity checks, and comprehensive ablation studies.

Journal article Peer reviewed

Benefit-aware early prediction of health outcomes on multivariate EEG time series

by Shubhranshu Shekhar, Dhivya Eswaran, Bryan Hooi, Jonathan Elmer, Christos Faloutsos and Leman Akoglu

Published 2023

Journal of biomedical informatics, 139, 104296

Given a cardiac-arrest patient being monitored in the ICU (intensive care unit) for brain activity, how can we predict their health outcomes as early as possible? Early decision-making is critical in many applications, e.g. monitoring patients may assist in early intervention and improved care. On the other hand, early prediction on EEG data poses several challenges: (i) earliness-accuracy trade-off; observing more data often increases accuracy but sacrifices earliness, (ii) large-scale (for training) and streaming (online decision-making) data processing, and (iii) multi-variate (due to multiple electrodes) and multi-length (due to varying length of stay of patients) time series. Motivated by this real-world application, we present BENEFITTER that infuses the incurred savings from an early prediction as well as the cost from misclassification into a unified domain-specific target called benefit. Unifying these two quantities allows us to directly estimate a single target (i.e. benefit), and importantly, (a) is efficient and fast, with training time linear in the number of input sequences, and can operate in real-time for decision-making, (b) can handle multi-variate and variable-length time-series, suitable for patient data, and (c) is effective, providing up to 2× time-savings with equal or better accuracy as compared to competitors.

Conference proceeding

FairOD: Fairness-aware outlier detection

by Shubhranshu Shekhar, Neil Shah and Leman Akoglu

Published 2021

Proceedings of the 2021 AAAI/ACM Conference on AI, Ethics, and Society

AAAI/ACM Conference on AI, Ethics, and Society, 07/2021

Fairness and Outlier Detection (OD) are closely related, as it is exactly the goal of OD to spot rare, minority samples in a given population. However, when being a minority (as defined by protected variables, such as race/ethnicity/sex/age) does not reflect positive-class membership (such as criminal/fraud), OD produces unjust outcomes. Surprisingly, fairness-aware OD has been almost untouched in prior work, as fair machine learning literature mainly focuses on supervised settings. Our work aims to bridge this gap. Specifically, we develop desiderata capturing well-motivated fairness criteria for OD, and systematically formalize the fair OD problem. Further, guided by our desiderata, we propose FairOD, a fairness-aware outlier detector that has the following desirable properties: FairOD (1) exhibits treatment parity at test time, (2) aims to flag equal proportions of samples from all groups (i.e. obtain group fairness, via statistical parity), and (3) strives to flag truly high-risk samples within each group. Extensive experiments on a diverse set of synthetic and real-world datasets show that FairOD produces outcomes that are fair with respect to protected variables, while performing comparably to (and in some cases, even better than) fairness-agnostic detectors in terms of detection performance.
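To make the statistical parity criterion in property (2) concrete, the sketch below measures how far a detector's output is from flagging equal proportions of each group. It only evaluates flags that some detector has already produced; it is not the FairOD method. The function names and the toy data are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Hashable, Sequence


def flag_rates_by_group(flags: Sequence[int], groups: Sequence[Hashable]) -> Dict[Hashable, float]:
    """Fraction of samples flagged as outliers within each protected group."""
    if len(flags) != len(groups):
        raise ValueError("flags and groups must have the same length")
    totals = defaultdict(int)
    flagged = defaultdict(int)
    for flag, group in zip(flags, groups):
        totals[group] += 1
        flagged[group] += int(flag)
    return {group: flagged[group] / totals[group] for group in totals}


def statistical_parity_gap(flags: Sequence[int], groups: Sequence[Hashable]) -> float:
    """Largest difference in flag rate between any two groups (0 means parity)."""
    rates = flag_rates_by_group(flags, groups)
    return max(rates.values()) - min(rates.values())


# Toy example (hypothetical values): group "b" is flagged twice as often as group "a".
toy_flags = [1, 0, 0, 0, 1, 1, 0, 0]
toy_groups = ["a", "a", "a", "a", "b", "b", "b", "b"]
print(flag_rates_by_group(toy_flags, toy_groups))
print(statistical_parity_gap(toy_flags, toy_groups))
```

A gap of zero corresponds to exact statistical parity; the abstract's third property is a reminder that parity alone is not enough, since the flagged samples within each group should also be the truly high-risk ones.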

Conference proceeding

Gen2Out: Detecting and ranking generalized anomalies

by Meng-Chieh Lee, Shubhranshu Shekhar, Christos Faloutsos, T Noah Hutson and Leon Iasemidis

Published 2021

2021 IEEE International Conference on Big Data (Big Data)

IEEE International Conference on Big Data, 2021

In a cloud of m-dimensional data points, how would we spot, as well as rank, both single-point- as well as group-anomalies? We are the first to generalize anomaly detection in two dimensions: The first dimension is that we handle both point-anomalies, as well as group-anomalies, under a unified view - we shall refer to them as generalized anomalies. The second dimension is that Gen2Out not only detects, but also ranks, anomalies in suspiciousness order. Detection, and ranking, of anomalies has numerous applications: For example, in EEG recordings of an epileptic patient, an anomaly may indicate a seizure; in computer network traffic data, it may signify a power failure, or a DoS/DDoS attack. We start by setting some reasonable axioms; surprisingly, none of the earlier methods pass all the axioms. Our main contribution is the Gen2Out algorithm, which has the following desirable properties: (a) Principled and Sound anomaly scoring that obeys the axioms for detectors; (b) Doubly-general, in that it detects, as well as ranks, generalized anomalies, both point- and group-anomalies; (c) Scalable, it is fast and linear on input size; (d) Effective, experiments on real-world epileptic recordings (200GB) demonstrate effectiveness of Gen2Out as confirmed by clinicians. Experiments on 27 real-world benchmark datasets show that Gen2Out detects ground truth groups, matches or outperforms point-anomaly baseline algorithms on accuracy, with no competition for group-anomalies, and requires about 2 minutes for 1 million data points on a stock machine.

Conference proceeding

Entity resolution in dynamic heterogeneous networks

by Shubhranshu Shekhar, Deepak Pai and Sriram Ravindran

Published 2020

Companion Proceedings of the Web Conference 2020

WWW '20

Networks evolve continuously over time not only with the addition and deletion of links and nodes but also with changes in the importance of edges. Even though many networks contain this type of temporal weighting, the vast majority of research in network representation learning and classification has focused on static snapshots of the graph, while largely ignoring the temporal dynamics. In this work, we describe two approaches for incorporating weighted temporal information into network embedding methods such as Graph Convolutional Networks (GCNs). While the first approach aggregates time-weighted edges and nodes, the second approach uses temporal random walks to find relevant convolution nodes. With experiments on public and proprietary datasets, we demonstrate the effectiveness of the proposed TimeSage for link prediction tasks. By applying these predictions, we show improvements in our task of identifying fraudulent actors on a large e-commerce website selling software as subscriptions.
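The abstract does not spell out how its temporal random walks are constructed. One common reading, assumed here, is a walk that only follows edges whose timestamps do not go backwards in time. The sketch below implements that generic idea; the function name, the edge format `(source, target, timestamp)`, and the toy graph are hypothetical and are not taken from TimeSage.

```python
import random
from collections import defaultdict
from typing import Hashable, List, Sequence, Tuple

Edge = Tuple[Hashable, Hashable, float]  # (source, target, timestamp)


def temporal_random_walk(edges: Sequence[Edge], start: Hashable,
                         max_steps: int, rng: random.Random) -> List[Hashable]:
    """Random walk that only follows edges with non-decreasing timestamps."""
    outgoing = defaultdict(list)
    for source, target, timestamp in edges:
        outgoing[source].append((target, timestamp))

    walk = [start]
    node = start
    current_time = float("-inf")  # no time constraint on the first step
    for _ in range(max_steps):
        # Keep only edges that do not travel back in time.
        candidates = [(t, ts) for t, ts in outgoing[node] if ts >= current_time]
        if not candidates:
            break  # dead end: no time-respecting edge leaves this node
        node, current_time = rng.choice(candidates)
        walk.append(node)
    return walk


# Toy graph (hypothetical): the edge b -> d at time 0 can never follow a -> b at time 1.
toy_edges = [("a", "b", 1), ("b", "c", 2), ("b", "d", 0), ("c", "a", 3)]
print(temporal_random_walk(toy_edges, "a", max_steps=5, rng=random.Random(0)))
```

Nodes visited by many such walks from a given start node could then serve as its neighborhood for aggregation, in place of a purely structural neighborhood.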

Conference proceeding

Incorporating privileged information to unsupervised anomaly detection

by Shubhranshu Shekhar and Leman Akoglu

Published 2018

Machine Learning and Knowledge Discovery in Databases: European Conference Part I

Machine Learning and Knowledge Discovery in Databases European Conference, 10/10/2018–10/14/2018, Dublin, Ireland

We introduce a new unsupervised anomaly detection ensemble called SPI which can harness privileged information - data available only for training examples but not for (future) test examples. Our ideas build on the Learning Using Privileged Information (LUPI) paradigm pioneered by Vapnik et al. [19,17], which we extend to unsupervised learning and in particular to anomaly detection. SPI (for Spotting anomalies with Privileged Information) constructs a number of frames/fragments of knowledge (i.e., density estimates) in the privileged space and transfers them to the anomaly scoring space through "imitation" functions that use only the partial information available for test examples. Our generalization of the LUPI paradigm to unsupervised anomaly detection shepherds the field in several key directions, including (i) domain knowledge-augmented detection using expert annotations as PI, (ii) fast detection using computationally-demanding data as PI, and (iii) early detection using "historical future" data as PI. Through extensive experiments on simulated and real datasets, we show that augmenting privileged information to anomaly detection significantly improves detection performance. We also demonstrate the promise of SPI under all three settings (i-iii); with PI capturing expert knowledge, computationally expensive features, and future data on three real world detection tasks.

Conference proceeding

Spreading Activation Way of Knowledge Integration

by Shubhranshu Shekhar, Sutanu Chakraborti and Deepak Khemani

Published 2015

Mining Intelligence and Knowledge Exploration: Third International Conference, MIKE 2015, Hyderabad, India, December 9-11, 2015, Proceedings 3

MIKE 2015: Mining Intelligence and Knowledge Exploration, 2015

Search and recommender systems benefit from effective integration of two different kinds of knowledge. The first is introspective knowledge, typically available in feature-theoretic representations of objects. The second is external knowledge, which could be obtained from how users rate (or annotate) items, or collaborate over a social network. This paper presents a spreading activation model that is aimed at a principled integration of these two sources of knowledge. In order to empirically evaluate our approach, we restrict the scope to text classification tasks, where we use the category knowledge of the labeled set of examples as an external knowledge source. Our experiments show a significantly improved classification effectiveness on hard datasets, where feature value representations, on their own, are inadequate in discriminating between classes.

Conference proceeding

Linking cases up: An extension to the case retrieval network

by Shubhranshu Shekhar, Sutanu Chakraborti and Deepak Khemani

Published 2014

Case-Based Reasoning Research and Development: 22nd International Conference, ICCBR 2014, Cork, Ireland, September 29, 2014-October 1, 2014. Proceedings 22

ICCBR 2014: Case-Based Reasoning Research and Development, 2014

In many domains, cases are associated with each other, though this is not easily explained by the set of features they share. It is hard, for example, to explicitly enumerate features that make a movie romantic. We present an extension to the Case Retrieval Network architecture, a spreading activation model initially proposed by Burkhard and Lenz, by allowing cases to influence each other independently of the features. We show that the architecture holds promise in improving effectiveness of retrieval in two distinct experimental domains.
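The idea of cases influencing each other independently of features can be sketched as a two-stage spreading activation: query features first activate cases through relevance weights, and activated cases then pass part of their activation along direct case-to-case links. This is a simplified illustration under assumed data structures; the function name `retrieve`, the `link_weight` damping parameter, and the toy weights are hypothetical, not the architecture as published.

```python
from typing import Dict, List, Tuple


def retrieve(query: Dict[str, float],
             relevance: Dict[str, Dict[str, float]],
             case_links: Dict[Tuple[str, str], float],
             link_weight: float = 0.5) -> List[Tuple[str, float]]:
    """Rank cases by activation from query features plus one round of case-to-case spreading."""
    # Stage 1: each case sums the activation of the query features it is linked to.
    base = {
        case: sum(weight * query.get(feature, 0.0) for feature, weight in features.items())
        for case, features in relevance.items()
    }
    # Stage 2: activated cases pass a damped share of their activation to linked cases.
    final = dict(base)
    for (source, target), weight in case_links.items():
        final[target] += link_weight * weight * base[source]
    return sorted(final.items(), key=lambda item: item[1], reverse=True)


# Toy example (hypothetical): c3 shares no feature with the query but is linked to c1.
toy_relevance = {"c1": {"love": 1.0}, "c2": {"war": 1.0}, "c3": {}}
toy_links = {("c1", "c3"): 1.0}
print(retrieve({"love": 1.0}, toy_relevance, toy_links))
```

In the toy example, case c3 is retrieved ahead of c2 even though it shares no feature with the query, which is the kind of association the abstract says features alone cannot explain.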
