Abstract
We present QueryNER, a manually annotated dataset and accompanying model for
e-commerce query segmentation. Prior work in sequence labeling for e-commerce
has largely addressed aspect-value extraction, which focuses on extracting
portions of a product title or query for narrowly defined aspects. Our work
instead aims to divide a query into meaningful chunks with
broadly applicable types. We report baseline tagging results and conduct
experiments comparing token and entity dropping for null and low-recall query
recovery. We create challenging test sets using automatic transformations and
show how simple data augmentation techniques can make the models more robust to
noise. We make the QueryNER dataset publicly available.
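
The contrast between token dropping and entity dropping can be illustrated with a minimal sketch. It assumes the segmentation is encoded with BIO-style tags; the example query, the type names (`color`, `brand`, `product`), and the function names are hypothetical placeholders, not the dataset's actual label set or the authors' implementation.

```python
from typing import List, Tuple


def bio_spans(tags: List[str]) -> List[Tuple[int, int, str]]:
    """Group BIO tags into (start, end_exclusive, type) spans."""
    spans = []
    start, label = None, None
    for i, tag in enumerate(tags):
        # A tag continues the open span only if it is I-<same type>.
        inside = tag.startswith("I-") and label == tag[2:]
        if not inside:
            if label is not None:
                spans.append((start, i, label))
            # Any non-O tag opens a new span (a stray I- tag is treated as B-).
            start, label = (i, tag[2:]) if tag != "O" else (None, None)
    if label is not None:
        spans.append((start, len(tags), label))
    return spans


def drop_token(tokens: List[str], index: int) -> List[str]:
    """Token dropping: remove one token, ignoring chunk boundaries."""
    return tokens[:index] + tokens[index + 1:]


def drop_entity(tokens: List[str], tags: List[str], span_index: int) -> List[str]:
    """Entity dropping: remove a whole tagged chunk at once."""
    start, end, _ = bio_spans(tags)[span_index]
    return tokens[:start] + tokens[end:]


# Hypothetical query and tags, for illustration only.
tokens = ["red", "nike", "running", "shoes"]
tags = ["B-color", "B-brand", "B-product", "I-product"]

token_dropped = drop_token(tokens, 2)          # splits the "running shoes" chunk
entity_dropped = drop_entity(tokens, tags, 0)  # removes the whole "red" chunk
```

In this sketch, token dropping can leave a fragment of a multi-token chunk behind, whereas entity dropping always removes a complete chunk, so the rewritten query keeps the remaining chunks intact.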