Abstract
In this paper, we introduce a multimodal environment and semantics for facilitating communication and interaction with a computational agent that serves as a proxy for a robot. To this end, we have created an embodied 3D simulation that enables both the generation and interpretation of multiple modalities, including language, gesture, and the visualization of objects moving and agents acting in their environment. Objects are encoded with rich semantic typing and action affordances, while actions themselves are encoded as multimodal expressions (programs), allowing for contextually salient inferences and decisions in the environment.