Abstract
This chapter addresses the fundamental and challenging issue of identifying wordhood in Chinese from both theoretical and computational perspectives. We follow the Lexical Markup Framework (LMF) in defining a word as a lexical entry, that is, a unique form-meaning pair. This definition in turn leads to the finding that semantics is the most robust orthographically relevant level in Chinese, because the language allows non-Chinese phonemes to be borrowed through the limited use of mixed orthography. Based on this understanding of the semantics-based nature of Chinese words, we introduce different approaches to the automatic identification of Chinese words (i.e., word segmentation). The chapter focuses on the two approaches that are currently more successful: character position tagging and word boundary decision.
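The two approaches named above differ mainly in what they label. Character position tagging assigns each character a tag for its position within a word. Word boundary decision assigns a yes/no label to each gap between adjacent characters. The Python sketch below is illustrative only and is not the chapter's implementation. It assumes the common four-tag set B (begin), M (middle), E (end), S (single-character word) and a binary label for each gap. All function names are hypothetical, and the segmentation of the example sentence is likewise only illustrative.

```python
# Illustrative sketch (not the chapter's method): two encodings of one segmentation.
# Assumption: the four-tag set B/M/E/S for character position tagging.

def to_position_tags(words):
    """Character position tagging: one B/M/E/S tag per character."""
    tags = []
    for w in words:
        if len(w) == 1:
            tags.append("S")  # single-character word
        else:
            tags.extend(["B"] + ["M"] * (len(w) - 2) + ["E"])
    return tags

def to_boundary_decisions(words):
    """Word boundary decision: one label per gap between adjacent characters
    (True means a word boundary falls in that gap)."""
    decisions = []
    for w in words:
        decisions.extend([False] * (len(w) - 1))  # gaps inside the word
        decisions.append(True)                    # gap after the word
    return decisions[:-1]  # the sentence-final position is not a gap

def from_position_tags(chars, tags):
    """Rebuild words by closing a word at every E or S tag."""
    words, current = [], ""
    for c, t in zip(chars, tags):
        current += c
        if t in ("E", "S"):
            words.append(current)
            current = ""
    if current:  # tolerate a sequence that ends without E/S
        words.append(current)
    return words

def from_boundary_decisions(chars, decisions):
    """Rebuild words by splitting at every gap labeled True."""
    if not chars:
        return []
    words, current = [], chars[0]
    for c, is_boundary in zip(chars[1:], decisions):
        if is_boundary:
            words.append(current)
            current = c
        else:
            current += c
    words.append(current)
    return words

words = ["他", "喜欢", "计算语言学"]
chars = "".join(words)
print(to_position_tags(words))       # ['S', 'B', 'E', 'B', 'M', 'M', 'M', 'E']
print(to_boundary_decisions(words))  # [True, False, True, False, False, False, False]
```

Both encodings recover the same segmentation, so a segmenter can be trained as a tagger over characters (n labels for n characters) or as a classifier over gaps (n - 1 labels); the choice changes what the model predicts, not what counts as a word.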