
Finding the Dwarf: Recovering Precise Types from WebAssembly Binaries

Daniel Lehmann, Michael Pradel

Proceedings of the 43rd ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI), pp. 1–16, 2022.


Abstract

The increasing popularity of WebAssembly creates a demand for understanding and reverse engineering WebAssembly binaries. Recovering high-level function types is an important part of this process. One method to recover types is data-flow analysis, but it is complex to implement and may require manual heuristics when logical constraints fall short. In contrast, this paper presents SnowWhite, a learning-based approach for recovering precise, high-level parameter and return types for WebAssembly functions. It improves over prior work on learning-based type recovery by representing the types-to-predict in an expressive type language, which can describe a large number of complex types, instead of the fixed and usually small type vocabulary used previously. Thus, recovery of a single type is no longer a classification task but sequence prediction, for which we build on the success of neural sequence-to-sequence models. We evaluate SnowWhite on a new, large-scale dataset of 6.3 million type samples extracted from 300,905 WebAssembly object files. The results show the type language is expressive, precisely describing 1,225 types instead of the 7 to 35 types considered in previous learning-based approaches. Despite this expressiveness, our type recovery has high accuracy, exactly matching 44.5% (75.2%) of all parameter types and 57.7% (80.5%) of all return types within the top-1 (top-5) predictions.
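The shift from classification to sequence prediction described above can be illustrated with a small sketch. The Python code below uses a hypothetical, heavily simplified type language (not SnowWhite's actual grammar): it flattens a nested C-like type such as `const char *` into a token sequence that a sequence-to-sequence model could emit one token at a time. It also computes top-k exact-match accuracy, the kind of metric behind the "top-1 (top-5)" figures, assuming a prediction counts only if its entire token sequence equals the ground truth. The class names, token spellings, and the `to_tokens` and `top_k_exact_match` helpers are illustrative assumptions.

```python
from dataclasses import dataclass
from typing import List, Sequence, Union


# Hypothetical, simplified type constructors (not the paper's exact type language).
@dataclass(frozen=True)
class Primitive:
    name: str  # e.g. "int", "char", "float"


@dataclass(frozen=True)
class Pointer:
    target: "Type"


@dataclass(frozen=True)
class Const:
    inner: "Type"


@dataclass(frozen=True)
class Struct:
    name: str  # assumption: structs are identified only by a name token


Type = Union[Primitive, Pointer, Const, Struct]


def to_tokens(t: Type) -> List[str]:
    """Flatten a nested type into a prefix-order token sequence."""
    if isinstance(t, Pointer):
        return ["pointer"] + to_tokens(t.target)
    if isinstance(t, Const):
        return ["const"] + to_tokens(t.inner)
    if isinstance(t, Struct):
        return ["struct", t.name]
    if isinstance(t, Primitive):
        return ["primitive", t.name]
    raise TypeError(f"unsupported type: {t!r}")


def top_k_exact_match(
    predictions: Sequence[Sequence[List[str]]],
    ground_truth: Sequence[List[str]],
    k: int,
) -> float:
    """Fraction of samples whose true token sequence appears among the first k ranked predictions."""
    if not ground_truth:
        raise ValueError("ground_truth must not be empty")
    hits = sum(
        1 for ranked, truth in zip(predictions, ground_truth) if truth in ranked[:k]
    )
    return hits / len(ground_truth)


if __name__ == "__main__":
    # const char *  ->  pointer to const char
    print(to_tokens(Pointer(Const(Primitive("char")))))
    # ['pointer', 'const', 'primitive', 'char']
```

Because types are built compositionally from a few constructors, such a language can describe many more distinct types than a fixed label set, at the cost of the model having to get every token right for an exact match.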


BibTeX

@inproceedings{lehmann22_pldi, title = {Finding the Dwarf: Recovering Precise Types from WebAssembly Binaries}, author = {Lehmann, Daniel and Pradel, Michael}, year = {2022}, booktitle = {Proceedings of the 43rd ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI)}, pages = {1--16}, preprint = {https://software-lab.org/publications/pldi2022.pdf} }