Abstract
One of the most difficult issues for learners of Chinese is understanding how temporal information is marked in discourse. Like Indo-European languages, Chinese uses explicit temporal expressions, temporal adverbs, the ordering of words and verb phrases, and pragmatics to communicate temporal relationships. However, Chinese lacks temporal inflection on verbs. An aspect marker may follow a verb, but crucially, these markers are considered optional in many contexts, and their usage varies across domains (e.g. written language, spoken language, official broadcast news). While all finite verbs in English are temporally marked in some way, the majority of verbs in most Chinese discourse are not marked aspectually. This scarcity of positive examples makes it extremely hard for learners to understand when aspect markers are licensed.

Here, we explore the viability of using corpus linguistics techniques to create a sort of “discourse grammar” checker for Chinese text, which learners of Chinese can use to find errors in their own usage of aspect markers. We use a corpus-based machine learning approach to train a classifier on the usage of aspect markers, and attempt to use this classifier to correctly posit aspect markers in unseen text. We discuss the capabilities and limits of our system, and how the optional and subjective nature of aspect marker placement blurs the distinction between hits, false positives, and false negatives, making evaluation difficult. We also sketch an annotation schema that would support a Chinese discourse-based aspect marker checking tool.