Ambiguity and Bias: A Departure from Classical LLM Evaluation

Sunny Zhou

doi:10.48617/etd.1441

Thesis

Open access

Master of Science (MS), Brandeis University, Graduate School of Arts & Sciences

2025

DOI: https://doi.org/10.48617/etd.1441

Abstract

Keywords: homonymy, polysemy, prompting

This work investigates whether open-ended prompts can evaluate large language models' (LLMs') understanding of word meaning in context, focusing specifically on polysemy and homonymy. The generation capabilities of LLMs are well known, but it is still unclear whether they understand the choices they make and why one generation is given over another. Traditional evaluations of LLMs often rely on fixed-alternative benchmarks that risk measuring memorization rather than semantic competence. The open-ended prompting approach proposed in this thesis probes LLM knowledge across four tasks designed to test word sense discrimination, definition generation, and contextual substitution. Using a subset of noun senses from WordNet, responses were collected from four instruction-tuned open-weight models (LLaMA 3.2B, Mistral 7B, Gemma 4B, and DeepSeek R1) and evaluated using the same models in an LLM-as-a-Judge framework. Results indicate that while models like Gemma and Mistral outperform others in terms of task performance and consistency, significant biases emerge in judgment behavior, including self-enhancement and format preferences. This suggests that the judgments reflect not only knowledge of word meaning but also artifacts from training. These findings underscore the complexity of evaluating meaning understanding in LLMs and raise important questions about the reliability of LLM-based evaluations. The study concludes with a discussion of methodological implications and outlines future directions, including embedding-based analyses and the exploration of reasoning trajectories in LLM outputs.

Files and links (1)

PDF: Ambiguity_and_Bias__A_Departure_from_Classical_LLM_Evaluation (1.73 MB)

Details

Title: Ambiguity and Bias: A Departure from Classical LLM Evaluation
Creators: Sunny Zhou
Contributors: James Pustejovsky (Advisor); Marc Verhagen (Committee Member)
Awarding Institution: Brandeis University, Graduate School of Arts & Sciences; Master of Science (MS)
Theses and Dissertations: Master of Science (MS), Brandeis University, Graduate School of Arts & Sciences
Number of pages: 58
Identifiers: 9924505333801921
Academic Unit: Michtom School of Computer Science
Language: English
Resource Type: Thesis
