Abstract
Adverse drug event (ADE) detection serves as a primary quality-of-care benchmark in healthcare and plays a major role in pharmacovigilance. Social media data represent a largely untapped source of public clinical narratives that can be used to expand existing ADE tracking systems. Existing studies focus on annotating small amounts of data to handle non-standard language usage. This study presents a new application of semi-supervised lexicon bootstrapping to flag Twitter data for potential ADEs. To this end, a new corpus sixteen times larger than the largest publicly available dataset was constructed and used to generate robust, bootstrapped drug and medical event lexicons. These lexicons were applied to held-out data to flag tweets containing potential ADEs. Compared to recent studies of lexicon-based ADE detection on Twitter, this method achieved competitive F1 scores, and a robust evaluation showed it to be capable of identifying severe ADEs in the social media sphere, which represent important new data points relevant to existing pharmacovigilance systems.