
Distinguished Lecture Series - Talk by Hilde Kuehne (University of Tuebingen)

We are pleased to announce our upcoming Distinguished Lecture Series talk by Hilde Kuehne (University of Tübingen)! The talk will take place in person on April 2nd, in room UN32.101. Professor Kuehne will also be available for meetings on April 2nd. If you are interested in scheduling a meeting, please email .

Prof. Dr. Hilde Kuehne is a professor at the Tuebingen AI Center at the University of Tuebingen and an affiliated professor at the MIT-IBM Watson AI Lab. Her research focuses on learning without labels and multimodal video understanding. She has created several highly cited datasets and mainly works on analyzing large collections of untrimmed video data and other multimodal data sources. Her experience includes projects with various European and US universities with a focus on video and image processing. She has published several high-impact works in the field, including HMDB, which was awarded the ICCV 2021 Helmholtz Prize and the PAMI Mark Everingham Prize in 2022. She has organized several workshops in the field and currently serves as general chair for ICCV 2025. Beyond her research, she is committed to bringing more diversity to STEM and is a board member of the Women in Computer Vision Initiative.

Title: Advances in self-supervised multimodal learning

The field of multimodal learning has seen significant progress in recent years, mainly enabled by advances in contrastive and autoregressive learning techniques. This talk presents the latest developments in this domain. I will first briefly recap the concept of embedding space learning, which involves mapping multimodal input data, such as images, text, and video, into a shared feature space. Building on that, I will discuss the abilities of vision-language models that can arise from this, namely spatial and spatio-temporal grounding, which involves localizing objects and actions in images and videos. Finally, the talk will close with an outlook on the challenges and future directions in multimodal learning, including the learning of multimodal structures to improve the efficiency and scalability of future systems.

Date: April 2nd, 2025
Time: 9:45
Place: Universitätsstraße 32.101, Campus Vaihingen of the University of Stuttgart.

Looking forward to seeing you all there! No registration necessary.