Lorenzo Gregori
This study investigates the automatic identification of action concepts using machine learning algorithms applied to a linguistic dataset derived from the IMAGACT ontology of actions. This resource comprises 1,010 action concepts, each represented by video scenes and enriched with multilingual linguistic annotations. Specifically, each video scene is associated with the complete set of verbs that can be used to describe the depicted action in each of the languages included in the ontology. Based on these data, automatic clustering of video scenes was conducted using the associated lexical items as features, under the hypothesis that semantically similar actions tend to be expressed by similar groups of verbs. Hierarchical agglomerative clustering was first employed to establish an evaluation framework and to construct a gold standard of validated action clusters. Subsequently, a semi-supervised approach based on Affinity Propagation was trained on the annotated data. Cluster coherence was evaluated, yielding promising results in terms of internal consistency and semantic interpretability. In addition, an interactive web interface of the action map was developed to enable users to visualize and browse the resulting clusters of video scenes.
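As a rough illustration of the working hypothesis (scenes described by similar sets of verbs depict similar actions), the following minimal sketch clusters a handful of scenes by the overlap of their verb sets. It is not the pipeline used in the study: the toy annotations, the Jaccard distance, the average-linkage criterion, the stopping threshold, and all function names are assumptions introduced here for illustration only.

```python
from itertools import combinations

# Hypothetical toy data: scene id -> set of verbs that can describe the scene.
# These annotations are invented for illustration and are NOT taken from IMAGACT.
SCENES = {
    "scene_a": {"en:take", "en:grab", "it:prendere"},
    "scene_b": {"en:take", "en:pick up", "it:prendere"},
    "scene_c": {"en:put", "en:place", "it:mettere"},
    "scene_d": {"en:put", "en:lay", "it:mettere"},
}


def jaccard_distance(a, b):
    """Return 1 - |a & b| / |a | b|; two empty sets are treated as identical."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def average_linkage(c1, c2, scenes):
    """Mean pairwise Jaccard distance between the scenes of two clusters."""
    dists = [jaccard_distance(scenes[x], scenes[y]) for x in c1 for y in c2]
    return sum(dists) / len(dists)


def agglomerative_clustering(scenes, threshold):
    """Merge the two closest clusters until the closest pair exceeds `threshold`.

    Assumption: average linkage and a fixed distance threshold stand in for
    whatever linkage and cut-off criteria the actual study adopted.
    """
    # Start bottom-up, with every scene in its own cluster.
    clusters = [frozenset([name]) for name in sorted(scenes)]
    while len(clusters) > 1:
        best_dist, i, j = min(
            (average_linkage(clusters[i], clusters[j], scenes), i, j)
            for i, j in combinations(range(len(clusters)), 2)
        )
        if best_dist > threshold:
            break  # remaining clusters are too dissimilar to merge
        merged = clusters[i] | clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return sorted(sorted(c) for c in clusters)


if __name__ == "__main__":
    print(agglomerative_clustering(SCENES, threshold=0.6))
```

In this toy setting, scenes that share most of their verbs are merged first, while scenes with disjoint verb sets stay in separate clusters. On real data, the distance measure and the cut-off would need to be chosen and validated against annotated clusters.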