Modeling Word Concepts without Convention: Linguistic and Computational Issues in Chinese Word Identification

Chu-Ren Huang; Nianwen Xue

doi:10.1093/oxfordhb/9780199856336.013.0071

Book chapter

The Oxford Handbook of Chinese Linguistics, pp.348-361

Oxford Handbooks, Oxford University Press

04/01/2015

DOI: https://doi.org/10.1093/oxfordhb/9780199856336.013.0071

Keywords: character position tagging; languages by region; linguistics; orthographically relevant level; word boundary decision; word segmentation; wordhood; Chinese Language or Literature; Computational Linguistics

Abstract

This chapter deals with the fundamental and challenging issue of identifying wordhood in Chinese from both theoretical and computational perspectives. We follow the Lexical-Markup Framework definition of a word as a lexical entry, a unique form-meaning pair. This in turn leads to the discovery that the most robust orthographically relevant level in Chinese is semantics, as the language allows borrowing of non-Chinese phonemes through the limited use of mixed orthography. Based on our understanding of the semantic-based nature of Chinese words, we introduce different approaches to the automatic identification of Chinese words (i.e., word segmentation). The chapter focuses on the two currently more successful approaches: character position tagging and word boundary decision.

Details

Title: Modeling Word Concepts without Convention
Creators: Chu-Ren Huang (Author) - Huang, Chu-Ren (黃居仁) is Chair Professor and Dean of Faculty of Humanities at the Hong Kong Polytechnic University. His areas of scholarship are in computational and corpus linguistics, lexical semantics, and ontology. Language resources projects he led at Academia Sinica built the first lexica, corpora, treebanks, and wordnets for Chinese.
Nianwen Xue (Author) - Brandeis University, Michtom School of Computer Science
Publication Details: The Oxford Handbook of Chinese Linguistics, pp.348-361
Series: Oxford Handbooks
Publisher: Oxford University Press
Identifiers: 9924148840001921
Academic Unit: Benjamin and Mae Volen National Center for Complex Systems; Interdepartmental Program in Linguistics and Computational Linguistics; Michtom School of Computer Science
Language: English
Resource Type: Book chapter
