Abstract
This work disentangles the co-constructed nature of meaning in (English) social dialog from a computational linguistic perspective. It contributes (1) an enrichment of current theoretical models of dialog, which represent conversational context symbolically but focus mainly on the informational dimension, with a systematic architecture of meaning interactions between informational and non-informational dimensions, and (2) a hybrid dialog system for social conversation in English (i.e. one combining statistical and symbolic components) that builds on these new insights into how meaning is co-constructed in social dialog and puts explainability at the heart of its meaning modeling. In doing so, this work addresses the gap between two kinds of systems. On one side are earlier symbolic dialog models, which can handle information-exchange transactions in task-oriented applications but not the social dynamics of casual conversation. On the other side are state-of-the-art statistical dialog systems, whose fluency in chit-chat depends on how well they fit pre-existing training datasets rather than on any explainable representation of what happens in the course of a conversation. Despite the artificial fluency showcased in current media hype, the latter easily derail in a new conversational context, as illustrated in the following dialog between me and a chatbot released by Meta Research:
Alex: I just had a walk along the Charles river. The bank is gorgeous.
Chatbot: i ’ ll have to check it out , thanks !
Alex: What do you do at a river bank?
Chatbot: i ’ m a teller at the river bank .
As the exchange shows, the chatbot interprets "bank" in the question "What do you do at a river bank?" as a financial institution, which is the statistically preferred reading given its training data. The appropriate interpretation here, however, is the land alongside a river, as signaled both by "river bank" itself and by the earlier mention of the Charles river.
Departing from the black-box nature of these systems, which remain opaque both to their developers and to their end users, the dialog system developed in this work explicitly represents and continuously updates the conversational context through symbolic components, generating a contextual graph in which "the Charles river", "The bank" and "a river bank" are connected by overt coherence relations. To capture the dynamics of social conversation, the system models meaning across several sociocultural dimensions in the same contextual space, including information exchange, social focus, and emotional expression. To produce an utterance, it first generates a pool of candidate responses, each meaningful in at least one sociocultural dimension, and then computes their salience scores to select the best one. For instance, among the candidate responses to "I just had a walk along the Charles river. The bank is gorgeous.", the most salient would address both "the bank" and the emotional resonance of "gorgeous", an instance of positive sentiment. One such response could be "It seems that you like the bank a lot, right?"
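The generate-then-rank loop described above can be illustrated with a small Python sketch. It is a toy under stated assumptions, not the system developed in this work: the names `ContextGraph`, `Candidate`, `salience` and `respond`, the dimension labels, the relation labels `part_of` and `evaluates`, and the scoring rule (rank first by how many dimensions a candidate touches, then by how connected its referenced mentions are) are all hypothetical. The candidates' references to graph nodes are also annotated by hand here, whereas a real system would have to derive them.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Hypothetical labels for the three sociocultural dimensions named in the text.
DIMENSIONS = ("information", "social_focus", "emotion")


@dataclass
class ContextGraph:
    """Toy contextual graph: nodes are mentions, edges are labeled coherence relations."""
    nodes: dict[str, str] = field(default_factory=dict)  # mention -> dimension
    edges: list[tuple[str, str, str]] = field(default_factory=list)  # (source, relation, target)

    def add_node(self, mention: str, dimension: str) -> None:
        if dimension not in DIMENSIONS:
            raise ValueError(f"unknown dimension: {dimension}")
        self.nodes[mention] = dimension

    def relate(self, source: str, relation: str, target: str) -> None:
        self.edges.append((source, relation, target))

    def degree(self, mention: str) -> int:
        # Number of relations the mention takes part in, as source or target.
        return sum(mention in (s, t) for s, _, t in self.edges)


@dataclass
class Candidate:
    text: str
    refs: list[str]  # graph nodes the candidate response picks up (hand-annotated here)


def salience(candidate: Candidate, graph: ContextGraph) -> tuple[int, int]:
    # Hypothetical score: dimensions covered first, then connectedness of referenced nodes.
    known = [r for r in candidate.refs if r in graph.nodes]
    dims = {graph.nodes[r] for r in known}
    return len(dims), sum(graph.degree(r) for r in known)


def respond(candidates: list[Candidate], graph: ContextGraph) -> str | None:
    # Pool: keep only candidates grounded in at least one dimension of the current context.
    pool = [c for c in candidates if any(r in graph.nodes for r in c.refs)]
    if not pool:
        return None
    return max(pool, key=lambda c: salience(c, graph)).text


# Context after "I just had a walk along the Charles river. The bank is gorgeous."
graph = ContextGraph()
graph.add_node("the Charles river", "information")
graph.add_node("the bank", "information")
graph.add_node("gorgeous", "emotion")  # positive sentiment
graph.relate("the bank", "part_of", "the Charles river")
graph.relate("gorgeous", "evaluates", "the bank")

candidates = [
    Candidate("I'm a teller at the bank.", ["bank (financial institution)"]),
    Candidate("Is the Charles river far from here?", ["the Charles river"]),
    Candidate("It seems that you like the bank a lot, right?", ["the bank", "gorgeous"]),
]
print(respond(candidates, graph))  # prints: It seems that you like the bank a lot, right?
```

In this sketch the "teller" reply never enters the pool, because it refers to nothing in the contextual graph; this is one way an explicit context representation could guard against the kind of derailment shown in the dialog above.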
Thanks to its explainability-first approach, the developed system is immediately useful for human-in-the-loop applications, two of which are elaborated in this work. First, language learners can socially interact with the system, review its internal linguistic representation of the conversational context, and directly access its communicative strategies and tactics. They thereby gain full exposure to different aspects of social conversation in a novel and systematic way, which enables them to deepen their critical, creative, communicative, and systems thinking. Second, the system can serve as an automatic annotator of social dialog, whose output (including the contextual graph showing meaning interactions and alignments across multiple sociocultural dimensions) can be used to study how people from different neurodevelopmental groups co-construct context-sensitive meaning in social conversation. Realizing these applications at real-life scale is the next step for future work.