Abstract
Unsupervised context-free grammar (CFG) induction is a challenging task. CFGs are useful for
several downstream NLP tasks such as parsing, machine translation, language modeling, and
disambiguation. There is an acute need for unsupervised methods in this domain since producing
corpora for supervised grammar induction is time-intensive and requires expert annotators.
Further, even with expert annotators, the task is prone to low inter-annotator agreement. Moreover,
multiple levels of interdependent structure must be discovered simultaneously. Genetic algorithms (GAs) have been
shown to be effective at inducing CFGs for several tasks, but applying these methods to natural
language corpora is still a nascent area of research. We introduce a novel method of CFG induction
using a GA. We augment a traditional GA approach with operations tailored for CFG induction,
and introduce two new methods of evaluating the fitness of phrase-structure rules (PSRs) within the
CFG. We also experiment with several constraints motivated by results from cognitive science. We
show that the present method allows for efficient, parallelized induction of variable-sized CFGs.