Interactive active learning for literature screening: finetuning GPT with DeepSeek reasoning for cross-domain generalization
Journal article   Peer reviewed


Yiming Li, Joseph M Plasek, Xinsong Du, Yifei Wang, Zhengyang Zhou, John Lian, Ya-Wen Chuang, Pengyu Hong, Peter C Hou and Li Zhou
Journal of the American Medical Informatics Association : JAMIA
03/09/2026
Handle:
https://hdl.handle.net/10192/79244
PMID: 41801982

Abstract

Keywords: generative pre-trained transformer (GPT); active learning; DeepSeek; supervised learning; reasoning; literature screening
Automated literature screening in biomedical research is often hindered by domain shifts and scarce labeled data, which limit model accuracy and generalizability. While large language models (LLMs) perform well in zero-shot settings, they often fail to capture complex, domain-specific reasoning patterns. To address this limitation, this study investigates whether an interactive, weakly supervised learning framework that combines the fine-tuning adaptability of GPT (generative pre-trained transformer) models with DeepSeek's reasoning capabilities can improve literature screening performance across biomedical domains. We developed an active learning framework that leverages model disagreement between GPT-4o and DeepSeek to improve literature screening performance. The process began with a labeled corpus of 6331 articles on large language models, from which a model disagreement analysis identified cases where GPT-4o misclassified an article and DeepSeek produced the correct prediction. Three GPT variants (GPT-4o, GPT-4o-mini, and GPT-4.1-nano) were fine-tuned under standard supervised learning settings using these disagreement-based samples. Fine-tuning prompts incorporated classification labels and, when available, rationale traces generated by DeepSeek to provide reasoning-augmented weak supervision. Model performance was evaluated on an independent benchmark set of 291 annotated articles across 10 topic queries in cancer immunotherapy and LLMs in medicine, using standard evaluation metrics with recall as the primary measure. Fine-tuning GPT models on disagreement-based examples significantly improved performance. GPT-4o-mini achieved the best overall results after fine-tuning, with the highest F1 score (0.93, P < .001) and recall (0.95, P < .001). Across the biomedical topics, fine-tuned models consistently outperformed their zero-shot counterparts without increasing reviewer workload.
These findings demonstrate the effectiveness of disagreement-driven active learning in enhancing GPT-based biomedical literature screening. Lightweight models such as GPT-4o-mini benefit most from targeted, reasoning-enriched training, highlighting their suitability for scalable deployment. This study introduces an interactive active learning framework that leverages fine-tuned LLMs with reasoning capabilities to enhance literature screening. The approach offers a scalable path to more efficient and reliable information retrieval in systematic reviews.
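The disagreement-based selection step described in the abstract can be illustrated with a small sketch. This is not the authors' code: the `Prediction` record, field names, and output format are assumptions; it only shows the stated idea of keeping cases where GPT-4o's zero-shot label was wrong but DeepSeek's was right, and attaching DeepSeek's rationale (when available) as reasoning-augmented weak supervision for fine-tuning.

```python
from dataclasses import dataclass

@dataclass
class Prediction:
    article_id: str
    gold_label: int          # human-annotated relevance label
    gpt_label: int           # zero-shot GPT-4o prediction
    deepseek_label: int      # DeepSeek prediction
    deepseek_rationale: str  # reasoning trace; empty string if unavailable

def select_disagreement_samples(preds):
    """Keep cases where GPT-4o was wrong and DeepSeek was right,
    and build fine-tuning targets from the label plus, when present,
    DeepSeek's rationale as weak supervision (hypothetical format)."""
    selected = []
    for p in preds:
        if p.gpt_label != p.gold_label and p.deepseek_label == p.gold_label:
            target = str(p.gold_label)
            if p.deepseek_rationale:
                target = f"{p.deepseek_rationale}\nLabel: {p.gold_label}"
            selected.append({"article_id": p.article_id, "completion": target})
    return selected
```

Under this sketch, agreement cases and cases where GPT-4o was already correct contribute nothing, so the fine-tuning set concentrates on exactly the examples where DeepSeek's reasoning adds information.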

