Brittle Minds, Fixable Activations: Understanding Belief Representations in Language Models
Matteo Bortoletto, Constantin Ruhdorfer, Lei Shi, Andreas Bulling
Findings of Empirical Methods in Natural Language Processing (EMNLP), 2025.
BibTeX
@inproceedings{bortoletto25_femnlp,
  title = {Brittle {{Minds}}, {{Fixable Activations}}: {{Understanding Belief Representations}} in {{Language Models}}},
  shorttitle = {Brittle {{Minds}}, {{Fixable Activations}}},
  author = {Bortoletto, Matteo and Ruhdorfer, Constantin and Shi, Lei and Bulling, Andreas},
  booktitle = {Findings of Empirical Methods in Natural Language Processing (EMNLP)},
  year = {2025},
  doi = {10.48550/arXiv.2406.17513}
}