Abstract
We study how time is expressed in Chinese by projecting tense information from a manually aligned English parallel corpus onto Chinese verbs. We construct a detailed mapping procedure that identifies tense in English from combinations of word tokens and part-of-speech tags and then transfers that information to the aligned Chinese verbs. We examine the resulting Chinese data set and discuss the strengths and weaknesses of this mapping technique. Using this tense-annotated data set, we attempt to automatically predict the tense of each Chinese verb with a Conditional Random Field (CRF) model and a suite of linguistic features. We also present an algorithm for extracting time expressions and associating them with verbs, and we integrate its output as a feature in our tense prediction model. We achieve a 34% accuracy gain over our baseline, as well as a much deeper understanding of how tense transfers between English and Chinese in a translation setting.
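The projection step described above can be illustrated with a minimal sketch. It is not the mapping procedure itself, which the abstract does not spell out. It assumes Penn Treebank tags on the English side, Chinese Treebank tags on the Chinese side (verb tags beginning with `V`), and a word alignment given as index pairs. The rules in `english_tense` and the names `english_tense` and `project_tense` are hypothetical and far coarser than a full tense inventory.

```python
from typing import Dict, List, Optional, Tuple


def english_tense(tokens: List[str], tags: List[str], i: int) -> Optional[str]:
    """Return a coarse tense label for the English token at index i, or None.

    Hypothetical rules: only the token's own POS tag and the preceding
    token are inspected, so perfect and progressive forms are ignored.
    """
    tag = tags[i]
    prev = tokens[i - 1].lower() if i > 0 else ""
    if tag == "VBD":
        return "past"
    if tag in ("VBP", "VBZ"):
        return "present"
    if tag == "VB" and prev in ("will", "shall", "'ll"):
        return "future"
    return None


def project_tense(
    en_tokens: List[str],
    en_tags: List[str],
    zh_tags: List[str],
    alignment: List[Tuple[int, int]],
) -> Dict[int, str]:
    """Copy English tense labels onto aligned Chinese verbs.

    alignment holds (english_index, chinese_index) pairs. Only Chinese
    tokens whose tag starts with "V" receive a label, and the first
    label found for a verb is kept.
    """
    labels: Dict[int, str] = {}
    for en_i, zh_i in alignment:
        if not zh_tags[zh_i].startswith("V"):
            continue
        tense = english_tense(en_tokens, en_tags, en_i)
        if tense is not None:
            labels.setdefault(zh_i, tense)
    return labels


# "He bought a book yesterday" / "他 昨天 买 了 一 本 书"
en_tokens = ["He", "bought", "a", "book", "yesterday"]
en_tags = ["PRP", "VBD", "DT", "NN", "NN"]
zh_tags = ["PN", "NT", "VV", "AS", "CD", "M", "NN"]
alignment = [(0, 0), (1, 2), (1, 3), (2, 4), (3, 6), (4, 1)]
labels = project_tense(en_tokens, en_tags, zh_tags, alignment)
```

Here the Chinese verb at index 2 (买) inherits the label of the English verb "bought", while the aspect marker 了 is skipped because its tag is not a verb tag. Labels produced this way could then serve as training targets for a sequence model such as a CRF.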