Abstract
In this paper, we extend Abstract Meaning Representation (AMR) in order to represent situated multimodal dialogue, with a focus on the modality of gesture. AMR is a general-purpose meaning representation that has become popular for its transparent structure, its ease of annotation, its available corpora, and its overall expressiveness. While AMR was designed to represent meaning in language, whether text or speech, gesture accompanying speech conveys a number of novel communicative dimensions, including situational reference, spatial locations, manner, attitude, orientation, backchanneling, and others. In this paper, we explore how to combine multimodal elements into a single representation for alignment and grounded meaning, using gesture as a case study. As a platform for multimodal situated dialogue annotation, we believe that Gesture AMR has several attractive properties. It is adequately expressive at both the utterance and dialogue levels, while easily accommodating the structures inherent in gestural expressions. Further, AMR's native reentrancy facilitates both the linking between modalities and the eventual situational grounding of referents to contextual bindings.
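To make the role of reentrancy concrete, the following is a minimal sketch, not drawn from the paper's annotation scheme: it models an AMR-style graph as subject-role-object triples and detects a variable shared between a speech sub-graph and a gesture sub-graph. All names here (the roles, the `point-gesture` concept, the helper `reentrant_vars`) are illustrative assumptions, not Gesture AMR's actual inventory.

```python
# A toy AMR-style graph as (source, role, target) triples.
# The variable "b" (a block) is referenced both by the spoken
# utterance and by a pointing gesture -- a reentrancy that could
# align the two modalities to one grounded referent.
triples = [
    ("u", "instance", "utterance"),
    ("u", "ARG1", "b"),            # speech mentions the block
    ("b", "instance", "block"),
    ("g", "instance", "point-gesture"),  # illustrative concept name
    ("g", "target", "b"),          # gesture indicates the same block
]

def reentrant_vars(triples):
    """Return variables appearing as the object of more than one
    non-instance edge, i.e. reentrant nodes in the graph."""
    counts = {}
    for src, role, obj in triples:
        if role != "instance":
            counts[obj] = counts.get(obj, 0) + 1
    return {v for v, n in counts.items() if n > 1}

print(reentrant_vars(triples))  # prints {'b'}: the shared referent
```

Under this sketch, situational grounding would amount to binding the shared variable to an object in the dialogue context, so that both modalities resolve to the same entity.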