
Brittle Minds, Fixable Activations: Understanding Belief Representations in Language Models

Matteo Bortoletto, Constantin Ruhdorfer, Lei Shi, Andreas Bulling

Findings of Empirical Methods in Natural Language Processing (EMNLP), 2025.


BibTeX

@inproceedings{bortoletto25_femnlp,
  title = {Brittle {{Minds}}, {{Fixable Activations}}: {{Understanding Belief Representations}} in {{Language Models}}},
  shorttitle = {Brittle {{Minds}}, {{Fixable Activations}}},
  author = {Bortoletto, Matteo and Ruhdorfer, Constantin and Shi, Lei and Bulling, Andreas},
  booktitle = {Findings of Empirical Methods in Natural Language Processing (EMNLP)},
  year = {2025},
  doi = {10.48550/arXiv.2406.17513}
}