Abstract
This paper explores how to improve information retrieval for Large Language Models (LLMs) in environments with limited computational resources. Motivated by a real-world use case at Brandeis University’s offices, we built a lightweight Retrieval-Augmented Generation (RAG) system to process large archives of documents. To overcome the challenges of limited context size and unstructured data, we implement a graph-based RAG approach (GraphRAG) that organizes document chunks and named entities into a knowledge graph. This method improves retrieval accuracy without increasing computational cost. We also propose an evaluation method based on document relevance, avoiding the need for expensive LLM-based validation. Our experiments focus on RAG-based information retrieval and, notably, do not rely on LLMs due to cost limitations. Although GraphRAG is outperformed by classic RAG techniques in terms of recall, our findings suggest that it can be an effective and practical solution in environments where low cost, speed, and resource efficiency are prioritized over maximum retrieval completeness. In our computationally constrained setting, GraphRAG is more efficient, running in about half the time of the classic RAG baseline.