15.06.2026 - 5 Papers Accepted at ICML
The ELLIS Unit Stuttgart is proud to announce that five of its research papers have been accepted for presentation at the International Conference on Machine Learning (ICML 2026), one of the premier conferences in the field of machine learning. This achievement underscores the unit’s commitment to advancing cutting-edge research in artificial intelligence and machine learning.
Accepted Papers:
Title: “Smart: Scalable Mesh-free Aerodynamic Simulations from Raw Geometries using a Transformer-based Surrogate Model”
Authors: Jan Hagnberger and Mathias Niepert
Abstract: Machine learning-based surrogate models have emerged as more efficient alternatives to numerical solvers for physical simulations over complex geometries, such as car bodies. Many existing models incorporate the simulation mesh as an additional input, thereby reducing prediction errors. However, generating a simulation mesh for new geometries is computationally costly. In contrast, mesh-free methods, which do not rely on the simulation mesh, typically incur higher errors. Motivated by these considerations, we introduce SMART, a neural surrogate model that predicts physical quantities at arbitrary query locations using only a point-cloud representation of the geometry, without requiring access to the simulation mesh. The geometry and simulation parameters are encoded into a shared latent space that captures both structural and parametric characteristics of the physical field. A physics decoder then attends to the encoder’s intermediate latent representations to map spatial queries to physical quantities. Through this cross-layer interaction, the model jointly updates latent geometric features and the evolving physical field. Extensive experiments show that SMART is competitive with and often outperforms existing methods that rely on the simulation mesh as input, demonstrating its capabilities for industry-level simulations.
arXiv:2601.18707
Title: “Learning the Neighborhood: Contrast-Free Multimodal Self-Supervised Molecular Graph Pretraining”
Authors: Boshra Ariguib and Mathias Niepert and Andrei Manolache
Abstract: High-quality molecular representations are essential for property prediction and molecular design, yet large labeled datasets remain scarce. While self-supervised pretraining on molecular graphs has shown promise, many existing approaches either depend on hand-crafted augmentations or complex generative objectives, and often rely solely on 2D topology, leaving valuable 3D structural information underutilized. To address this gap, we introduce C-FREE (Contrast-Free Representation learning on Ego-nets), a simple framework that integrates 2D graphs with ensembles of 3D conformers. C-FREE learns molecular representations by predicting subgraph embeddings from their complementary neighborhoods in the latent space, using fixed-radius ego-nets as modeling units across different conformers. This design allows us to integrate both geometric and topological information within a hybrid Graph Neural Network (GNN)-Transformer backbone, without negatives, positional encodings, or expensive pre-processing. Pretraining on the GEOM dataset, which provides rich 3D conformational diversity, C-FREE achieves state-of-the-art results on MoleculeNet, surpassing contrastive, generative, and other multimodal self-supervised methods. Fine-tuning across datasets with diverse sizes and molecule types further demonstrates that pretraining transfers effectively to new chemical domains, highlighting the importance of 3D-informed molecular representations.
arXiv:2509.22468
Title: “Logical Guidance for the Exact Composition of Diffusion Models”
Authors: Francesco Alesiani and Jonathan H. Warrell and Tanja Bien and Henrik Christiansen and Matheus Ferraz and Mathias Niepert
Abstract: We propose LOGDIFF (Logical Guidance for the Exact Composition of Diffusion Models), a guidance framework for diffusion models that enables principled constrained generation with complex logical expressions at inference time. We study when exact score-based guidance for complex logical formulas can be obtained from guidance signals associated with atomic properties. First, we derive an exact Boolean calculus that provides a sufficient condition for exact logical guidance. Specifically, if a formula admits a circuit representation in which conjunctions combine conditionally independent subformulas and disjunctions combine subformulas that are either conditionally independent or mutually exclusive, exact logical guidance is achievable. In this case, the guidance signal can be computed exactly from atomic scores and posterior probabilities using an efficient recursive algorithm. Moreover, we show that, for commonly encountered classes of distributions, any desired Boolean formula is compilable into such a circuit representation. Second, by combining atomic guidance scores with posterior probability estimates, we introduce a hybrid guidance approach that bridges classifier guidance and classifier-free guidance, applicable to both compositional logical guidance and standard conditional generation. We demonstrate the effectiveness of our framework on multiple image and protein structure generation tasks.
arXiv:2602.05549
Title: “FOCA: Future-Oriented Conditioning for Data-Efficient Vision-Language-Action Adaptation”
Authors: Minh Duc Nguyen and Nghiem Tuong Diep and Nguyen Gia Binh and Trong-Bao Ho and Doanh Le Thien and Quang Tan Nguyen and Thien-Loc Ha and Tran Van Nhiem and Bao Thach and Tran Xuan Nhat and Tuan Anh Tran and Artur Habuda and Philip Lund Møller and Tran Nguyen Le and Daniel Sonntag and Mathias Niepert and Khoa D. Doan and Vu N. Duong and Hung Ngo and Minh Nhat VU and Duy Minh Ho Nguyen and An Thai Le and Vien Anh Ngo
Abstract: Vision–Language–Action (VLA) models enable general-purpose robotic control via large-scale multimodal pretraining, yet their effectiveness under few-shot imitation learning remains limited. We conduct a systematic stress test of state-of-the-art VLA models and show that performance degrades sharply as demonstrations are reduced, revealing a key weakness of existing adaptation strategies. To address this, we introduce FOCA, a future-oriented conditioning framework for data-efficient VLA adaptation. FOCA combines explicit prediction of task-grounded future interaction embeddings with implicit alignment to future goal observations, enabling long-horizon reasoning in latent space without pixel-level prediction. This formulation naturally supports action-free co-training with synthetic videos from video world models and can be interpreted as learning a future-conditioned value-like representation. Extensive experiments demonstrate FOCA achieves 95.7% success with 20 demonstrations on LIBERO, improves 7–12% on RoboCasa, and delivers up to 26% absolute gains on real robots, establishing a new state of the art in few-shot VLA adaptation.
Title: “Protein Fold Classification at Scale: Benchmarking and Pretraining”
Authors: Dexiong Chen and Andrei Manolache and Mathias Niepert and Karsten Borgwardt
Abstract: Classifying protein topology is essential for deciphering biological function, but progress is held back by the lack of large-scale benchmarks that avoid duplicates and by models that do not scale well. We introduce TEDBench, a large-scale, non-redundant benchmark for protein fold classification constructed from the Encyclopedia of Domains (TED) and Foldseek-clustered AlphaFold structures. We show that on TEDBench, current protein representation learning methods either require very large models or fail to deliver strong performance. To address this challenge, we propose Masked Invariant Autoencoders (MiAE), a self-supervised framework for protein structure representation learning. MiAE uses an extremely high masking ratio of up to 90% with an SE(3)-invariant encoder and a lightweight decoder that reconstructs backbone coordinates from the latent representation and mask tokens. MiAE scales well and outperforms supervised counterparts and state-of-the-art baselines on TEDBench, establishing a strong recipe for protein fold classification. To test transfer beyond AlphaFold structures, we further benchmark on a curated dataset from experimental structures of CATH v4.4. TEDBench is available at this https URL.
arXiv:2605.18552

