Abstract
The political landscape in the US is highly divided today, and people take liberal (left-leaning) vs. conservative (right-leaning) positions on almost every issue. Understanding where the political majority stands is important for decision makers when formulating policy solutions to societal problems. Natural language processing can contribute to this by providing tools that automatically predict the political stance of textual communications. Previous work in this area has focused on carefully worded speech by politicians. In this thesis, we develop a machine learning model to predict the political stance of YouTube comments by ordinary citizens. We first annotate training data by using the Mechanical Turk platform to collect judgments on YouTube comments via crowdsourcing. The YouTube comments are reactions to videos of candidates in the 2020 primaries posted by major US news outlets. We annotated a corpus of 11,817 YouTube comments as left-leaning or right-leaning on eight separate political issues, and trained multiple machine learning models to classify the political stance of those comments¹. An evaluation of these models shows that the approach is promising, although much work remains before such models can be deployed in actual systems. We also analyze the linguistic signals that are most useful for political stance prediction with the help of model visualization tools.