Abstract
We introduce three simple randomized variants of byte pair encoding (BPE) and
explore whether randomizing the selection of merge operations substantially
affects a downstream machine translation task. We focus on translation into
morphologically rich languages, hypothesizing that this task may show
sensitivity to the method of choosing subwords. Analysis using a Bayesian
linear model indicates that two of the variants perform nearly
indistinguishably from standard BPE, while the third degrades performance
less than we anticipated. We conclude that although standard BPE is widely
used, there exists an interesting universe of potential variations on it worth
investigating. Our code is available at: https://github.com/bltlab/random-bpe.