Using Genetic Algorithms for Unsupervised Context-Free Grammar Induction

Ryan James Partlan

doi:10.48617/etd.1063

Thesis

Open access

Brandeis University

Master of Science (MS), Brandeis University, Graduate School of Arts & Sciences

2023

DOI: https://doi.org/10.48617/etd.1063

Abstract

Keywords: context-free grammar; genetic programming; grammar induction; Artificial intelligence; Genetic Algorithms

Unsupervised context-free grammar (CFG) induction is a challenging task. CFGs are useful for several downstream NLP tasks such as parsing, machine translation, language modeling, and disambiguation. There is an acute need for unsupervised methods in this domain since producing corpora for supervised grammar induction is time-intensive and requires expert annotators. Further, even with expert annotators the task is prone to low inter-annotator agreement. Multiple levels of inter-dependent structures are discovered simultaneously. Genetic algorithms (GAs) have been shown to be effective at inducing CFGs for several tasks, but applying these methods to natural language corpora is still a nascent area of research. We introduce a novel method of CFG induction using a GA. We augment a traditional GA approach with operations tailored for CFG induction, and introduce two new methods of evaluating fitness of phrase-structure rules (PSRs) within the CFG. We also experiment with several constraints motivated by results from cognitive science. We show that the present method allows for efficient, parallelized induction of variable-sized CFGs.

Details

Title: Using Genetic Algorithms for Unsupervised Context-Free Grammar Induction
Creators: Ryan James Partlan
Contributors: James Pustejovsky (Advisor); James Pustejovsky (Committee Member); Constantine Lignos (Committee Member)
Awarding Institution: Brandeis University, Graduate School of Arts & Sciences; Master of Science (MS)
Theses and Dissertations: Master of Science (MS), Brandeis University, Graduate School of Arts & Sciences
Publisher: Brandeis University
Number of pages: 45
Identifiers: 9924283369901921
Academic Unit: Interdepartmental Program in Linguistics and Computational Linguistics
Language: English
Resource Type: Thesis
