Abstract
To understand multimodal interactions between humans, or between humans and machines, it is minimally necessary to identify the content of the agents’ communicative acts in the dialogue. This content can be conveyed by overt linguistic expressions (speech or writing), by content-bearing gesture, or by the integration of both. But it must be interpreted relative to a deeper understanding of each agent’s Theory of Mind (their mental state, desires, and intentions) in the context of the dialogue as it dynamically unfolds. This, in turn, can require identifying and tracking nonverbal behaviors, such as gaze, body posture, facial expressions, and actions, all of which contribute to understanding how expressions are contextualized in the dialogue and interpreted relative to the epistemic attitudes of each agent. In this paper, we introduce Lexical Event Modeling (LEM), which adopts Generative Lexicon’s approach to event structure to provide a lexical semantics for ontic and epistemic actions as used in Bolander’s interpretation of Dynamic Epistemic Logic. This allows for the compositional construction of epistemic models of a dialogue state. We demonstrate how veridical and false belief scenarios are treated compositionally within this model.