Abstract
The task of semantic annotation of images rests on the ability to retrieve an image using a query by concept, rather than a query by keyword match in the text surrounding the image or a query by low-level features. The term 'semantic gap' refers to the breach between the content the user wants to find when she searches for an image and the actual low-level features corresponding to that content. Semantic annotation approaches focus on bridging this gap by recording content-relevant information, for instance, the categories to which objects in the image belong (e.g. dog, person), the event depicted in the scene (e.g. sleeping), and the location and setting of the action. This thesis proposes a model for complexly structured semantic annotation of images depicting events and activities that incorporates notions from text annotation schemes such as SpatialML (Mani et al. 2010) and ISO-Space (Pustejovsky et al. 2013). An annotation that characterizes events, the semantic roles of participants, the motion and visual characteristics of figures, as well as details concerning the location and time, would play a significant role in structured image retrieval, and a mapping between annotated semantic entities and the image’s low-level features would likely assist the event recognition and description generation tasks.