Selected Publications
Please refer to our ELLIS Unit's Google Scholar for a complete list of references, or to each individual researcher's Google Scholar page.
2023
Journal Articles
-
SCENE: Reasoning about Traffic Scenes using Heterogeneous Graph Neural Networks
Thomas Monninger, Julian Schmidt, Jan Rupprecht, David Raba, Julian Jordan, Daniel Frank, Steffen Staab, Klaus Dietmayer
IEEE Robotics and Automation Letters, pp. 1–8, 2023.
Understanding traffic scenes requires considering heterogeneous information about dynamic agents and the static infrastructure. In this work we propose SCENE, a methodology to encode diverse traffic scenes in heterogeneous graphs and to reason about these graphs using a heterogeneous Graph Neural Network encoder and task-specific decoders. The heterogeneous graphs, whose structures are defined by an ontology, consist of different nodes with type-specific node features and different relations with type-specific edge features. In order to exploit all the information given by these graphs, we propose to use cascaded layers of graph convolution. The result is an encoding of the scene. Task-specific decoders can be applied to predict desired attributes of the scene. Extensive evaluation on two diverse binary node classification tasks shows the main strength of this methodology: despite being generic, it even manages to outperform task-specific baselines. The further application of our methodology to the task of node classification in various knowledge graphs shows its transferability to other domains.
@article{monninger23_ral, title = {SCENE: Reasoning about Traffic Scenes using Heterogeneous Graph Neural Networks}, author = {Monninger, Thomas and Schmidt, Julian and Rupprecht, Jan and Raba, David and Jordan, Julian and Frank, Daniel and Staab, Steffen and Dietmayer, Klaus}, journal = {IEEE Robotics and Automation Letters}, year = {2023}, doi = {10.1109/LRA.2023.3234771}, pages = {1--8} }
Conference Papers
-
Made of Steel? Learning Plausible Materials for Components in the Vehicle Repair Domain
Annerose Eichel, Sabine Schulte im Walde
Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics (EACL), 2023.
@inproceedings{eichel23_eacl, author = {Eichel, Annerose and {Schulte im Walde}, Sabine}, title = {Made of Steel? Learning Plausible Materials for Components in the Vehicle Repair Domain}, booktitle = {Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics (EACL)}, year = {2023} }
-
Link Prediction with Attention Applied on Multiple Knowledge Graph Embedding Models
Cosimo Gregucci, Mojtaba Nayyeri, Daniel Hernandez, Steffen Staab
Proceedings of the ACM Web Conference, 2023.
@inproceedings{gregucci23_websci, title = {Link Prediction with Attention Applied on Multiple Knowledge Graph Embedding Models}, author = {Gregucci, Cosimo and Nayyeri, Mojtaba and Hernandez, Daniel and Staab, Steffen}, year = {2023}, booktitle = {Proceedings of the ACM Web Conference} }
-
Emotional Framing in the Spreading of False and True Claims
Akram Sadat Hosseini, Steffen Staab
Proceedings of the 15th ACM Web Science Conference, 2023.
@inproceedings{hosseini23_websci, title = {Emotional Framing in the Spreading of False and True Claims}, author = {Hosseini, Akram Sadat and Staab, Steffen}, year = {2023}, booktitle = {Proceedings of the 15th ACM Web Science Conference} }
-
A Systematic Search for Compound Semantics in Pretrained BERT Architectures
Filip Miletic, Sabine Schulte im Walde
Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics (EACL), 2023.
@inproceedings{miletic23_eacl, author = {Miletic, Filip and {Schulte im Walde}, Sabine}, title = {A Systematic Search for Compound Semantics in Pretrained BERT Architectures}, booktitle = {Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics (EACL)}, year = {2023} }
2022
Journal Articles
-
Improved Classification Rates for Localized SVMs
Ingrid Blaschzyk, Ingo Steinwart
Journal of Machine Learning Research, 23(165), pp. 1–59, 2022.
Localized support vector machines solve SVMs on many spatially defined small chunks, and besides their computational benefit compared to global SVMs, one of their main characteristics is the freedom of choosing arbitrary kernel and regularization parameters on each cell. We take advantage of this observation to derive global learning rates for localized SVMs with Gaussian kernels and hinge loss. It turns out that, under suitable sets of assumptions, our rates outperform known classification rates for localized SVMs, for global SVMs, and for other learning algorithms based on, e.g., plug-in rules or trees. The localized SVM rates are achieved under a set of margin conditions, which describe the behavior of the data-generating distribution, and no assumption on the existence of a density is made. Moreover, we show that our rates are obtained adaptively, that is, without knowing the margin parameters in advance. The statistical analysis of the excess risk relies on a simple partitioning-based technique, which splits the input space into a subset that is close to the decision boundary and a subset that is sufficiently far away. A crucial condition for deriving improved global rates is a margin condition that relates the distance to the decision boundary to the amount of noise.
@article{blaschzyk22_jmlr, title = {Improved Classification Rates for Localized SVMs}, author = {Blaschzyk, Ingrid and Steinwart, Ingo}, journal = {Journal of Machine Learning Research}, volume = {23}, number = {165}, year = {2022}, pages = {1--59} }
-
In the Arms of a Robot: Designing Autonomous Hugging Robots with Intra-Hug Gestures
Alexis E. Block, Hasti Seifi, Otmar Hilliges, Roger Gassert, Katherine J. Kuchenbecker
ACM Transactions on Human-Robot Interaction Special Issue on Designing the Robot Body: Critical Perspectives on Affective Embodied Interaction (THRI), 2022.
@article{block22_THRI, title = {In the Arms of a Robot: Designing Autonomous Hugging Robots with Intra-Hug Gestures}, author = {Block, Alexis E. and Seifi, Hasti and Hilliges, Otmar and Gassert, Roger and Kuchenbecker, Katherine J.}, journal = {ACM Transactions on Human-Robot Interaction Special Issue on Designing the Robot Body: Critical Perspectives on Affective Embodied Interaction (THRI)}, year = {2022} }
-
Training Two-Layer ReLU Networks with Gradient Descent is Inconsistent
David Holzmüller, Ingo Steinwart
Journal of Machine Learning Research, 23(181), pp. 1–82, 2022.
We prove that two-layer (Leaky)ReLU networks initialized by, e.g., the widely used method proposed by He et al. (2015) and trained using gradient descent on a least-squares loss are not universally consistent. Specifically, we describe a large class of one-dimensional data-generating distributions for which, with high probability, gradient descent only finds a bad local minimum of the optimization landscape, since it is unable to move the biases far away from their initialization at zero. It turns out that in these cases, the found network essentially performs linear regression even if the target function is non-linear. We further provide numerical evidence that this happens in practical situations for some multi-dimensional distributions, and that stochastic gradient descent exhibits similar behavior. We also provide empirical results on how the choice of initialization and optimizer can influence this behavior.
Preprint: https://arxiv.org/abs/2002.04861
@article{holzmueller22_jmlr, title = {Training Two-Layer ReLU Networks with Gradient Descent is Inconsistent}, author = {Holzmüller, David and Steinwart, Ingo}, journal = {Journal of Machine Learning Research}, volume = {23}, number = {181}, year = {2022}, pages = {1--82}, preprint = {https://arxiv.org/abs/2002.04861} }
-
Predicting the Force Map of an ERT-Based Tactile Sensor Using Simulation and Deep Networks
Hyosang Lee, Huanbo Sun, Hyunkyu Park, Gokhan Serhat, Bernard Javot, Georg Martius, Katherine J. Kuchenbecker
IEEE Transactions on Automation Science and Engineering (TASE), 2022.
Electrical resistance tomography (ERT) can be used to create large-scale soft tactile sensors that are flexible and robust. Good performance requires a fast and accurate mapping from the sensor’s sequential voltage measurements to the distribution of force across its surface. However, particularly with multiple contacts, this task is challenging for both previously developed approaches: physics-based modeling and end-to-end data-driven learning. Some promising results were recently achieved using sim-to-real transfer learning, but estimating multiple contact locations and accurate contact forces remains difficult because simulations tend to be less accurate with a high number of contact locations and/or high force. This paper introduces a modular hybrid method that combines simulation data synthesized from an electromechanical finite element model with real measurements collected from a new ERT-based tactile sensor. We use about 290,000 simulated and 90,000 real measurements to train two deep neural networks: the first (Transfer-Net) captures the inevitable gap between simulation and reality, and the second (Recon-Net) reconstructs contact forces from voltage measurements. The number of contacts, contact locations, force magnitudes, and contact diameters are evaluated for a manually collected multi-contact dataset of 150 measurements. Our modular pipeline’s results outperform predictions by both a physics-based model and end-to-end learning.
@article{lee22_TASE, title = {Predicting the Force Map of an ERT-Based Tactile Sensor Using Simulation and Deep Networks}, author = {Lee, Hyosang and Sun, Huanbo and Park, Hyunkyu and Serhat, Gokhan and Javot, Bernard and Martius, Georg and Kuchenbecker, Katherine J.}, journal = {IEEE Transactions on Automation Science and Engineering (TASE)}, year = {2022}, doi = {10.1109/TASE.2022.3156184} }
-
Neural Software Analysis
Michael Pradel, Satish Chandra
Communications of the ACM, 65(1), pp. 86–96, 2022.
Developer tools that use a neural machine learning model to make predictions about previously unseen code.
doi: 10.1145/3460348
@article{pradel22_cacm, author = {Pradel, Michael and Chandra, Satish}, title = {Neural Software Analysis}, journal = {Communications of the ACM}, volume = {65}, number = {1}, pages = {86--96}, year = {2022}, doi = {10.1145/3460348} }
-
A Soft Thumb-sized Vision-based Sensor with Accurate All-round Force Perception
Huanbo Sun, Katherine J. Kuchenbecker, Georg Martius
Nature Machine Intelligence, 4, 2022.
Vision-based haptic sensors have emerged as a promising approach to robotic touch due to affordable high-resolution cameras and successful computer vision techniques; however, their physical design and the information they provide do not yet meet the requirements of real applications. We present a robust, soft, low-cost, vision-based, thumb-sized three-dimensional haptic sensor named Insight, which continually provides a directional force-distribution map over its entire conical sensing surface. Constructed around an internal monocular camera, the sensor has only a single layer of elastomer over-moulded on a stiff frame to guarantee sensitivity, robustness and soft contact. Furthermore, Insight uniquely combines photometric stereo and structured light using a collimator to detect the three-dimensional deformation of its easily replaceable flexible outer shell. The force information is inferred by a deep neural network that maps images to the spatial distribution of three-dimensional contact force (normal and shear). Insight has an overall spatial resolution of 0.4 mm, a force magnitude accuracy of around 0.03 N and a force direction accuracy of around five degrees over a range of 0.03–2 N for numerous distinct contacts with varying contact area. The presented hardware and software design concepts can be transferred to a wide variety of robot parts.
@article{sun22_NMI, title = {A Soft Thumb-sized Vision-based Sensor with Accurate All-round Force Perception}, author = {Sun, Huanbo and Kuchenbecker, Katherine J. and Martius, Georg}, journal = {Nature Machine Intelligence}, volume = {4}, organization = {Max Planck Institute for Intelligent Systems}, year = {2022}, doi = {10.1038/s42256-021-00439-3} }
-
Distributional Measures of Semantic Abstraction
Sabine Schulte im Walde, Diego Frassinelli
Frontiers in Artificial Intelligence: Language and Computation, 4(796756), 2022.
This article provides an in-depth study of distributional measures for distinguishing between degrees of semantic abstraction. Abstraction is considered a “central construct in cognitive science” (Barsalou, 2003) and a “process of information reduction that allows for efficient storage and retrieval of central knowledge” (Burgoon et al., 2013). Relying on the distributional hypothesis, computational studies have successfully exploited measures of contextual co-occurrence and neighbourhood density to distinguish between conceptual semantic categorisations. So far, these studies have modeled semantic abstraction across lexical-semantic tasks such as ambiguity; diachronic meaning changes; abstractness vs. concreteness; and hypernymy. Yet, the distributional approaches target different conceptual types of semantic relatedness, and to our knowledge not much attention has been paid to applying, comparing or analysing the computational abstraction measures across conceptual tasks. The current article suggests a novel perspective that exploits variants of distributional measures to investigate semantic abstraction in English in terms of the abstract–concrete dichotomy (e.g., glory–banana) and in terms of the generality–specificity distinction (e.g., animal–fish), in order to compare the strengths and weaknesses of the measures regarding categorisations of abstraction, and to determine and investigate conceptual differences. In a series of experiments we identify reliable distributional measures for both instantiations of lexical-semantic abstraction and reach a precision higher than 0.7, but the measures clearly differ for the abstract–concrete vs. abstract–specific distinctions and for nouns vs. verbs.
Overall, we identify two groups of measures, (i) frequency and word entropy when distinguishing between more and less abstract words in terms of the generality–specificity distinction, and (ii) neighbourhood density variants (especially target–context diversity) when distinguishing between more and less abstract words in terms of the abstract–concrete dichotomy. We conclude that more general words are used more often and are less surprising than more specific words, and that abstract words establish themselves empirically in semantically more diverse contexts than concrete words. Finally, our experiments once more point out that distributional models of conceptual categorisations need to take word classes and ambiguity into account: results for nouns vs. verbs differ in many respects, and ambiguity hinders fine-tuning empirical observations.
@article{schulteimwalde22_fai, author = {{Schulte im Walde}, Sabine and Frassinelli, Diego}, title = {Distributional Measures of Semantic Abstraction}, journal = {Frontiers in Artificial Intelligence: Language and Computation}, volume = {4}, number = {796756}, year = {2022}, doi = {10.3389/frai.2021.796756} }
Conference Papers
-
Neuro-Symbolic Visual Dialog
Adnen Abdessaied, Mihai Bâce, Andreas Bulling
Proceedings of the 29th International Conference on Computational Linguistics (COLING), pp. 1–11, 2022.
We propose Neuro-Symbolic Visual Dialog (NSVD), the first method to combine deep learning and symbolic program execution for multi-round visually-grounded reasoning. NSVD significantly outperforms existing purely-connectionist methods on two key challenges inherent to visual dialog: long-distance co-reference resolution as well as vanishing question-answering performance. We demonstrate the latter by proposing a more realistic and stricter evaluation scheme in which we use predicted answers for the full dialog history when calculating accuracy. We describe two variants of our model and show that using this new scheme, our best model achieves an accuracy of 99.72% on CLEVR-Dialog, a relative improvement of more than 10% over the state of the art, while only requiring a fraction of training data. Moreover, we demonstrate that our neuro-symbolic models have a higher mean first failure round, are more robust against incomplete dialog histories, and generalise better not only to dialogs that are up to three times longer than those seen during training but also to unseen question types and scenes.
@inproceedings{abdessaied22_coling, author = {Abdessaied, Adnen and Bâce, Mihai and Bulling, Andreas}, title = {Neuro-Symbolic Visual Dialog}, booktitle = {Proceedings of the 29th International Conference on Computational Linguistics (COLING)}, year = {2022}, pages = {1--11}, preprint = {https://perceptualui.org/publications/abdessaied22_coling/} }
-
Tensor-based Graph Modularity for Text Data Clustering
Rafika Boutalbi, Mira Ait-Saada, Anastasiia Iurshina, Steffen Staab, Mohamed Nadif
ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR), pp. 1–5, 2022.
Graphs are used in several applications to represent similarities between instances. For text data, we can represent texts by different features such as bag-of-words, static embeddings (Word2vec, GloVe, etc.), and contextual embeddings (BERT, RoBERTa, etc.), leading to multiple similarities (or graphs) based on each representation. The proposal posits that incorporating the local invariance within every graph and the consistency across different graphs leads to a consensus clustering that improves the document clustering. This problem is complex and is made harder by the sparsity and the noisy data included in each graph. To this end, we rely on the modularity metric, which effectively evaluates graph clustering in such circumstances. Therefore, we present a novel approach for text clustering based on both a sparse tensor representation and graph modularity. This makes it possible to cluster texts (nodes) while capturing information arising from the different graphs. We iteratively maximize a Tensor-based Graph Modularity criterion. Extensive experiments on benchmark text clustering datasets are performed, showing that the proposed algorithm, referred to as Tensor Graph Modularity (TGM), outperforms other baseline methods on the clustering task. The source code is available at https://github.com/TGMclustering/TGMclustering.
@inproceedings{boutalbi22_sigir, title = {Tensor-based Graph Modularity for Text Data Clustering}, author = {Boutalbi, Rafika and Ait-Saada, Mira and Iurshina, Anastasiia and Staab, Steffen and Nadif, Mohamed}, year = {2022}, booktitle = {ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR)}, pages = {1--5} }
-
Projection Predictive Inference for Generalized Linear and Additive Multilevel Models
Alejandro Catalina, Paul-Christian Bürkner, Aki Vehtari
Artificial Intelligence and Statistics (AISTATS) Conference Proceedings, pp. 1–23, 2022.
Projection predictive inference is a decision theoretic Bayesian approach that decouples model estimation from decision making. Given a reference model previously built including all variables present in the data, projection predictive inference projects its posterior onto a constrained space of a subset of variables. Variable selection is then performed by sequentially adding relevant variables until predictive performance is satisfactory. Previously, projection predictive inference has been demonstrated only for generalized linear models (GLMs) and Gaussian processes (GPs) where it showed superior performance to competing variable selection procedures. In this work, we extend projection predictive inference to support variable and structure selection for generalized linear multilevel models (GLMMs) and generalized additive multilevel models (GAMMs). Our simulative and real-world experiments demonstrate that our method can drastically reduce the model complexity required to reach reference predictive performance and achieve good frequency properties.
@inproceedings{catalina22_aistats, title = {Projection Predictive Inference for Generalized Linear and Additive Multilevel Models}, author = {Catalina, Alejandro and Bürkner, Paul-Christian and Vehtari, Aki}, booktitle = {Artificial Intelligence and Statistics (AISTATS) Conference Proceedings}, pages = {1--23}, year = {2022}, doi = {10.48550/arXiv.2010.06994} }
-
CrystalBLEU: Precisely and Efficiently Measuring the Similarity of Code
Aryaz Eghbali, Michael Pradel
Proceedings of the 37th IEEE/ACM International Conference on Automated Software Engineering (ASE), pp. 1–12, 2022.
ACM SIGSOFT Distinguished Paper Award
Recent years have brought a surge of work on predicting pieces of source code, e.g., for code completion, code migration, program repair, or translating natural language into code. All this work faces the challenge of evaluating the quality of a prediction w.r.t. some oracle, typically in the form of a reference solution. A common evaluation metric is the BLEU score, an n-gram-based metric originally proposed for evaluating natural language translation, but adopted in software engineering because it can be easily computed on any programming language and enables automated evaluation at scale. However, a key difference between natural and programming languages is that in the latter, completely unrelated pieces of code may have many common n-grams simply because of the syntactic verbosity and coding conventions of programming languages. We observe that these trivially shared n-grams hamper the ability of the metric to distinguish between truly similar code examples and code examples that are merely written in the same language. This paper presents CrystalBLEU, an evaluation metric based on BLEU, that allows for precisely and efficiently measuring the similarity of code. Our metric preserves the desirable properties of BLEU, such as being language-agnostic, able to handle incomplete or partially incorrect code, and efficient, while reducing the noise caused by trivially shared n-grams. We evaluate CrystalBLEU on two datasets from prior work and on a new, labeled dataset of semantically equivalent programs. 
Our results show that CrystalBLEU can distinguish similar from dissimilar code examples 1.9–4.5 times more effectively, when compared to the original BLEU score and a previously proposed variant of BLEU for code.
@inproceedings{eghbali22_ase, title = {CrystalBLEU: Precisely and Efficiently Measuring the Similarity of Code}, author = {Eghbali, Aryaz and Pradel, Michael}, year = {2022}, pages = {1--12}, booktitle = {Proceedings of the 37th IEEE/ACM International Conference on Automated Software Engineering (ASE)} }
-
Investigating Independence vs. Control: Agenda-Setting in Russian News Coverage on Social Media
Annerose Eichel, Gabriella Lapesa, Sabine Schulte im Walde
Proceedings of the 13th International Conference on Language Resources and Evaluation (LREC), pp. 5314–5323, 2022.
Agenda-setting is a widely explored phenomenon in political science: powerful stakeholders (governments or their financial supporters) have control over the media and set their agenda: political and economic powers determine which news should be salient. This is a clear case of targeted manipulation to divert the public attention from serious issues affecting internal politics (such as economic downturns and scandals) by flooding the media with potentially distracting information. We investigate agenda-setting in the Russian social media landscape, exploring the relation between economic indicators and mentions of foreign geopolitical entities, as well as of Russia itself. Our contributions are at three levels: at the level of the domain of the investigation, our study is the first to substructure the Russian media landscape in state-controlled vs. independent outlets in the context of strategic distraction from negative economic trends; at the level of the scope of the investigation, we involve a large set of geopolitical entities (while previous work has focused on the U.S.); at the qualitative level, our analysis of posts on Ukraine, whose relationship with Russia is of high geopolitical relevance, provides further insights into the contrast between state-controlled and independent outlets.
@inproceedings{eichel23_lrec, author = {Eichel, Annerose and Lapesa, Gabriella and {Schulte im Walde}, Sabine}, title = {Investigating Independence vs. Control: Agenda-Setting in Russian News Coverage on Social Media}, booktitle = {Proceedings of the 13th International Conference on Language Resources and Evaluation (LREC)}, year = {2022}, pages = {5314--5323} }
-
BenchIE: A Framework for Multi-Faceted Fact-Based Open Information Extraction Evaluation
Kiril Gashteovski, Mingying Yu, Bhushan Kotnis, Carolin Lawrence, Mathias Niepert, Goran Glavaš
Proc. of the 60th Annual Meeting of the Association for Computational Linguistics (ACL), 2022.
Intrinsic evaluations of OIE systems are carried out either manually – with human evaluators judging the correctness of extractions – or automatically, on standardized benchmarks. The latter, while much more cost-effective, is less reliable, primarily because of the incompleteness of the existing OIE benchmarks: the ground truth extractions do not include all acceptable variants of the same fact, leading to unreliable assessment of models’ performance. Moreover, the existing OIE benchmarks are available for English only. In this work, we introduce BenchIE: a benchmark and evaluation framework for comprehensive evaluation of OIE systems for English, Chinese and German. In contrast to existing OIE benchmarks, BenchIE takes into account informational equivalence of extractions: our gold standard consists of fact synsets, clusters in which we exhaustively list all surface forms of the same fact. We benchmark several state-of-the-art OIE systems using BenchIE and demonstrate that these systems are significantly less effective than indicated by existing OIE benchmarks. We make BenchIE (data and evaluation code) publicly available.
Preprint: https://arxiv.org/abs/2109.06850
@inproceedings{gashteovski22_acl, title = {BenchIE: A Framework for Multi-Faceted Fact-Based Open Information Extraction Evaluation}, author = {Gashteovski, Kiril and Yu, Mingying and Kotnis, Bhushan and Lawrence, Carolin and Niepert, Mathias and Glavaš, Goran}, year = {2022}, booktitle = {Proc. of the 60th Annual Meeting of the Association for Computational Linguistics (ACL)}, preprint = {https://arxiv.org/abs/2109.06850} }
-
Modular and Iterative Multilingual Open Information Extraction
Bhushan Kotnis, Kiril Gashteovski, Daniel Onoro Rubio, Ammar Shaker, Vanesa Rodriguez-Tembras, Makoto Takamoto, Mathias Niepert, Carolin Lawrence
Proc. of the 60th Annual Meeting of the Association for Computational Linguistics (ACL), 2022.
Open Information Extraction (OpenIE) is the task of extracting (subject, predicate, object) triples from natural language sentences. Current OpenIE systems extract all triple slots independently. In contrast, we investigate the hypothesis that it may be beneficial to extract triple slots iteratively: first extract easy slots, followed by the difficult ones by conditioning on the easy slots, and therefore achieve a better overall extraction. Based on this hypothesis, we propose a neural OpenIE system, MILLIE, that operates in an iterative fashion. Due to the iterative nature, the system is also modular: it is possible to seamlessly integrate rule based extraction systems with a neural end-to-end system, thereby allowing rule based systems to supply extraction slots which MILLIE can leverage for extracting the remaining slots. We confirm our hypothesis empirically: MILLIE outperforms SOTA systems on multiple languages ranging from Chinese to Arabic. Additionally, we are the first to provide an OpenIE test dataset for Arabic.
@inproceedings{kotnis22_acl, title = {Modular and Iterative Multilingual Open Information Extraction}, author = {Kotnis, Bhushan and Gashteovski, Kiril and Rubio, Daniel Onoro and Shaker, Ammar and Rodriguez-Tembras, Vanesa and Takamoto, Makoto and Niepert, Mathias and Lawrence, Carolin}, year = {2022}, booktitle = {Proc. of the 60th Annual Meeting of the Association for Computational Linguistics (ACL)}, url = {https://openreview.net/pdf?id=KNqKOUnl_3F} }
-
Finding the Dwarf: Recovering Precise Types from WebAssembly Binaries
Daniel Lehmann, Michael Pradel
Proceedings of the 43rd ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI), pp. 1–16, 2022.
The increasing popularity of WebAssembly creates a demand for understanding and reverse engineering WebAssembly binaries. Recovering high-level function types is an important part of this process. One method to recover types is data-flow analysis, but it is complex to implement and may require manual heuristics when logical constraints fall short. In contrast, this paper presents SnowWhite, a learning-based approach for recovering precise, high-level parameter and return types for WebAssembly functions. It improves over prior work on learning-based type recovery by representing the types-to-predict in an expressive type language, which can describe a large number of complex types, instead of the fixed, and usually small type vocabulary used previously. Thus, recovery of a single type is no longer a classification task but sequence prediction, for which we build on the success of neural sequence-to-sequence models. We evaluate SnowWhite on a new, large-scale dataset of 6.3 million type samples extracted from 300,905 WebAssembly object files. The results show the type language is expressive, precisely describing 1,225 types instead of the 7 to 35 types considered in previous learning-based approaches. Despite this expressiveness, our type recovery has high accuracy, exactly matching 44.5% (75.2%) of all parameter types and 57.7% (80.5%) of all return types within the top-1 (top-5) predictions.
@inproceedings{lehmann22_pldi, title = {Finding the Dwarf: Recovering Precise Types from WebAssembly Binaries}, author = {Lehmann, Daniel and Pradel, Michael}, year = {2022}, pages = {1--16}, booktitle = {Proceedings of the 43rd ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI)}, preprint = {https://software-lab.org/publications/pldi2022.pdf} }
-
Generating Realistic Vulnerabilities via Neural Code Editing: An Empirical Study
Yu Nong, Yuzhe Ou, Michael Pradel, Feng Chen, Haipeng Cai
Proceedings of the ACM Symposium on the Foundations of Software Engineering (FSE), pp. 1–13, 2022.
The availability of large-scale, realistic vulnerability datasets is essential both for benchmarking existing techniques and for developing effective new data-driven approaches for software security. Yet such datasets are critically lacking. A promising solution is to generate such datasets by injecting vulnerabilities into real-world programs, which are richly available. Thus, in this paper, we explore the feasibility of vulnerability injection through neural code editing. With a synthetic dataset and a real-world one, we investigate the potential and gaps of three state-of-the-art neural code editors for vulnerability injection. We find that the studied editors have critical limitations on the real-world dataset, where the best accuracy is only 10.03%, versus 79.40% on the synthetic dataset. While the graph-based editors are more effective (successfully injecting vulnerabilities in up to 34.93% of real-world testing samples) than the sequence-based one (0 success), they still suffer from complex code structures and fall short for long edits due to their insufficient designs of the preprocessing and deep learning (DL) models. We reveal the promise of neural code editing for generating realistic vulnerable samples, as they help boost the effectiveness of DL-based vulnerability detectors by up to 49.51% in terms of F1 score. We also provide insights into the gaps in current editors (e.g., they are good at deleting but not at replacing code) and actionable suggestions for addressing them (e.g., designing effective editing primitives).
@inproceedings{nong22_fse, title = {Generating Realistic Vulnerabilities via Neural Code Editing: An Empirical Study}, author = {Nong, Yu and Ou, Yuzhe and Pradel, Michael and Chen, Feng and Cai, Haipeng}, year = {2022}, pages = {1--13}, booktitle = {Proceedings of the ACM Symposium on the Foundations of Software Engineering (FSE)}, preprint = {https://software-lab.org/publications/fse2022_vuln_inj_study.pdf} }
-
Utilizing Expert Features for Contrastive Learning of Time-Series Representations
Manuel Nonnenmacher, Lukas Oldenburg, Ingo Steinwart, David Reeb
Proc. of the 39th International Conference on Machine Learning (ICML), pp. 1–21, 2022.
We present an approach that incorporates expert knowledge for time-series representation learning. Our method employs expert features to replace the commonly used data transformations in previous contrastive learning approaches. We do this since time-series data frequently stems from the industrial or medical field where expert features are often available from domain experts, while transformations are generally elusive for time-series data. We start by proposing two properties that useful time-series representations should fulfill and show that current representation learning approaches do not ensure these properties. We therefore devise ExpCLR, a novel contrastive learning approach built on an objective that utilizes expert features to encourage both properties for the learned representation. Finally, we demonstrate on three real-world time-series datasets that ExpCLR surpasses several state-of-the-art methods for both unsupervised and semi-supervised representation learning.
Preprint: https://arxiv.org/abs/2206.11517
@inproceedings{nonnenmacher22_icml, title = {Utilizing Expert Features for Contrastive Learning of Time-Series Representations}, author = {Nonnenmacher, Manuel and Oldenburg, Lukas and Steinwart, Ingo and Reeb, David}, year = {2022}, pages = {1--21}, booktitle = {Proc. of the 39th International Conference on Machine Learning (ICML)}, preprint = {https://arxiv.org/abs/2206.11517} }
-
SOSP: Efficiently Capturing Global Correlations by Second-Order Structured Pruning
Manuel Nonnenmacher, Thomas Pfeil, Ingo Steinwart, David Reeb
Proc. of the Tenth International Conference on Learning Representations (ICLR), pp. 1–24, 2022.
Pruning neural networks reduces inference time and memory costs. On standard hardware, these benefits will be especially prominent if coarse-grained structures, like feature maps, are pruned. We devise two novel saliency-based methods for second-order structured pruning (SOSP) which include correlations among all structures and layers. Our main method SOSP-H employs an innovative second-order approximation, which enables saliency evaluations by fast Hessian-vector products. SOSP-H thereby scales like a first-order method despite taking into account the full Hessian. We validate SOSP-H by comparing it to our second method SOSP-I that uses a well-established Hessian approximation, and to numerous state-of-the-art methods. While SOSP-H performs on par or better in terms of accuracy, it has clear advantages in terms of scalability and efficiency. This allowed us to scale SOSP-H to large-scale vision tasks, even though it captures correlations across all layers of the network. To underscore the global nature of our pruning methods, we evaluate their performance not only by removing structures from a pretrained network, but also by detecting architectural bottlenecks. We show that our algorithms allow to systematically reveal architectural bottlenecks, which we then remove to further increase the accuracy of the networks.
Preprint: https://arxiv.org/abs/2110.11395
@inproceedings{nonnenmacher22_iclr, title = {SOSP: Efficiently Capturing Global Correlations by Second-Order Structured Pruning}, author = {Nonnenmacher, Manuel and Pfeil, Thomas and Steinwart, Ingo and Reeb, David}, year = {2022}, pages = {1--24}, booktitle = {Proc. of the Tenth International Conference on Learning Representations (ICLR)}, preprint = {https://arxiv.org/abs/2110.11395} }
-
Robot, Pass Me the Tool: Handle Visibility Facilitates Task-oriented Handovers
Valerio Ortenzi, Maija Filipovica, Diar Abdlkarim, Tommaso Pardi, Chie Takahashi, Alan Wing, Massimiliano Di Luca, Katherine J. Kuchenbecker
Proceedings of the ACM/IEEE International Conference on Human-Robot Interaction (HRI), pp. 256–264, 2022.
A human handing over an object modulates their grasp and movements to accommodate their partner’s capabilities, which greatly increases the likelihood of a successful transfer. State-of-the-art robot behavior lacks this level of user understanding, resulting in interactions that force the human partner to shoulder the burden of adaptation. This paper investigates how visual occlusion of the object being passed affects the subjective perception and quantitative performance of the human receiver. We performed an experiment in virtual reality where seventeen participants were tasked with repeatedly reaching to take a tool from the hand of a robot; each of the three tested objects (hammer, screwdriver, scissors) was presented in a wide variety of poses. We carefully analysed the user’s hand and head motions, the time to grasp the object, and the chosen grasp location, as well as participants’ ratings of the grasp they just performed. Results show that initial visibility of the handle significantly increases the reported holdability and immediate usability of a tool. Furthermore, a robot that offers objects so that their handles are more occluded forces the receiver to spend more time in planning and executing the grasp and also lowers the probability that the tool will be grasped by the handle. Together these findings indicate that robots can more effectively support their human work partners by increasing the visibility of the intended grasp location of objects being passed.
@inproceedings{ortenzi22_HRI, title = {Robot, Pass Me the Tool: Handle Visibility Facilitates Task-oriented Handovers}, author = {Ortenzi, Valerio and Filipovica, Maija and Abdlkarim, Diar and Pardi, Tommaso and Takahashi, Chie and Wing, Alan and Di Luca, Massimiliano and Kuchenbecker, Katherine J.}, booktitle = {Proceedings of the ACM/IEEE International Conference on Human-Robot Interaction (HRI)}, pages = {256--264}, year = {2022}, doi = {10.5555/3523760.3523797} }
-
Nalin: Learning from Runtime Behavior to Find Name-Value Inconsistencies in Jupyter Notebooks
Jibesh Patra, Michael Pradel
Proceedings of the 44th IEEE/ACM International Conference on Software Engineering (ICSE), pp. 1–13, 2022.
Variable names are important to understand and maintain code. If a variable name and the value stored in the variable do not match, then the program suffers from a name-value inconsistency, which is due to one of two situations that developers may want to fix: Either a correct value is referred to through a misleading name, which negatively affects code understandability and maintainability, or the correct name is bound to a wrong value, which may cause unexpected runtime behavior. Finding name-value inconsistencies is hard because it requires an understanding of the meaning of names and knowledge about the values assigned to a variable at runtime. This paper presents Nalin, a technique to automatically detect name-value inconsistencies. The approach combines a dynamic analysis that tracks assignments of values to names with a neural machine learning model that predicts whether a name and a value fit together. To the best of our knowledge, this is the first work to formulate the problem of finding coding issues as a classification problem over names and runtime values. We apply Nalin to 106,652 real-world Python programs, where meaningful names are particularly important due to the absence of statically declared types. Our results show that the classifier detects name-value inconsistencies with high accuracy, that the warnings reported by Nalin have a precision of 80% and a recall of 76% w.r.t. a ground truth created in a user study, and that our approach complements existing techniques for finding coding issues.
@inproceedings{patra22_icse, title = {Nalin: Learning from Runtime Behavior to Find Name-Value Inconsistencies in Jupyter Notebooks}, author = {Patra, Jibesh and Pradel, Michael}, year = {2022}, pages = {1--13}, booktitle = {Proceedings of the 44th {IEEE/ACM} International Conference on Software Engineering ({ICSE})}, preprint = {https://software-lab.org/publications/icse2022_Nalin.pdf} }
-
Ordered Subgraph Aggregation Networks
Chendi Qian, Gaurav Rattan, Floris Geerts, Christopher Morris, Mathias Niepert
Proceedings of the 36th Conference on Neural Information Processing Systems (NeurIPS), 2022.
Numerous subgraph-enhanced graph neural networks (GNNs) have emerged recently, provably boosting the expressive power of standard (message-passing) GNNs. However, there is a limited understanding of how these approaches relate to each other and to the Weisfeiler–Leman hierarchy. Moreover, current approaches either use all subgraphs of a given size, sample them uniformly at random, or use hand-crafted heuristics instead of learning to select subgraphs in a data-driven manner. Here, we offer a unified way to study such architectures by introducing a theoretical framework and extending the known expressivity results of subgraph-enhanced GNNs. Concretely, we show that increasing subgraph size always increases the expressive power and develop a better understanding of their limitations by relating them to the established k-WL hierarchy. In addition, we explore different approaches for learning to sample subgraphs using recent methods for backpropagating through complex discrete probability distributions. Empirically, we study the predictive performance of different subgraph-enhanced GNNs, showing that our data-driven architectures increase prediction accuracy on standard benchmark datasets compared to non-data-driven subgraph-enhanced graph neural networks while reducing computation time.
Preprint: https://arxiv.org/abs/2206.11168
@inproceedings{qian22_neurips, title = {Ordered Subgraph Aggregation Networks}, author = {Qian, Chendi and Rattan, Gaurav and Geerts, Floris and Morris, Christopher and Niepert, Mathias}, year = {2022}, booktitle = {Proceedings of the 36th Conference on Neural Information Processing Systems (NeurIPS)}, preprint = {https://arxiv.org/abs/2206.11168} }
-
PDEBENCH: An Extensive Benchmark for Scientific Machine Learning
Makoto Takamoto, Timothy Praditia, Raphael Leiteritz, Dan MacKinlay, Francesco Alesiani, Dirk Pflüger, Mathias Niepert
Proceedings of the 36th Conference on Neural Information Processing Systems (NeurIPS), 2022.
Machine learning-based modeling of physical systems has gained increasing interest in recent years. Despite recent progress, there is still a lack of such benchmarks for scientific ML with sufficient volume and variety that are easy to use but still challenging and representative for a wide range of problems. In this paper, we introduce PDEBench, a benchmark suite of time-dependent simulation tasks based on Partial Differential Equations (PDEs). PDEBench comprises both code and data to benchmark the performance of novel machine learning models against both classical numerical simulations and machine learning baselines. Our proposed set of benchmark problems contributes in particular the following unique features: (1) A much wider range of PDEs than existing approaches, ranging from relatively common examples to more realistic and difficult ones; (2) much larger ready-to-use datasets than state-of-the-art, comprising multiple simulation-runs across varying initial or boundary conditions and model parameters; (3) and it provides easily extensible source codes with user-friendly APIs for data generation and baseline results with advanced machine learning models (FNO, U-Net, PINN, Gradient-based inverse method). PDEBench allows researchers to extend the dataset freely for their own purposes using a standardized API, and to compare the performance of their new models. Finally, we propose new metrics to help to understand and evaluate a given ML model in the context of scientific ML. With those metrics we identified tasks for which the present ML methods cannot provide acceptable accuracy, and propose them as future challenge tasks for the community.
@inproceedings{takamoto22_neurips, title = {PDEBENCH: An Extensive Benchmark for Scientific Machine Learning}, author = {Takamoto, Makoto and Praditia, Timothy and Leiteritz, Raphael and MacKinlay, Dan and Alesiani, Francesco and Pflüger, Dirk and Niepert, Mathias}, year = {2022}, booktitle = {Proceedings of the 36th Conference on Neural Information Processing Systems (NeurIPS)}, preprint = {https://openreview.net/forum?id=dh_MkX0QfrK} }
-
Hyperbolic Embedding Inference for Structured Multi-Label Prediction
Bo Xiong, M. Cochez, Mojtaba Nayyeri, Steffen Staab
Proceedings of the 36th Conference on Neural Information Processing Systems (NeurIPS), 2022.
@inproceedings{xiong22_neurips_2, title = {Hyperbolic Embedding Inference for Structured Multi-Label Prediction}, author = {Xiong, Bo and Cochez, M. and Nayyeri, Mojtaba and Staab, Steffen}, year = {2022}, booktitle = {Proceedings of the 36th Conference on Neural Information Processing Systems (NeurIPS)} }
-
Faithful Embeddings for EL++ Knowledge Bases
Bo Xiong, Nico Potyka, Trung-Kien Tran, Mojtaba Nayyeri, Steffen Staab
Proceedings of the 21st International Semantic Web Conference (ISWC2022), pp. 1–18, 2022.
Recently, increasing efforts are put into learning continual representations for symbolic knowledge bases (KBs). However, these approaches either only embed the data-level knowledge (ABox) or suffer from inherent limitations when dealing with concept-level knowledge (TBox), i.e., they cannot faithfully model the logical structure present in the KBs. We present BoxEL, a geometric KB embedding approach that allows for better capturing the logical structure (i.e., ABox and TBox axioms) in the description logic EL++. BoxEL models concepts in a KB as axis-parallel boxes that are suitable for modeling concept intersection, entities as points inside boxes, and relations between concepts/entities as affine transformations. We show theoretical guarantees (soundness) of BoxEL for preserving logical structure. Namely, the learned model of BoxEL embedding with loss 0 is a (logical) model of the KB. Experimental results on (plausible) subsumption reasonings and a real-world application for protein-protein prediction show that BoxEL outperforms traditional knowledge graph embedding methods as well as state-of-the-art EL++ embedding approaches.
Preprint: https://arxiv.org/abs/2201.09919
@inproceedings{xiong22_iswc, title = {Faithful Embeddings for EL++ Knowledge Bases}, author = {Xiong, Bo and Potyka, Nico and Tran, Trung-Kien and Nayyeri, Mojtaba and Staab, Steffen}, year = {2022}, pages = {1--18}, booktitle = {Proceedings of the 21st International Semantic Web Conference (ISWC2022)}, preprint = {https://arxiv.org/abs/2201.09919} }
-
Ultrahyperbolic Knowledge Graph Embeddings
Bo Xiong, Shichao Zhu, Mojtaba Nayyeri, Chengjin Xu, Shirui Pan, Chuan Zhou, Steffen Staab
ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD), pp. 1–10, 2022.
Recent knowledge graph (KG) embeddings have been advanced by hyperbolic geometry due to its superior capability for representing hierarchies. The topological structures of real-world KGs, however, are rather heterogeneous, i.e., a KG is composed of multiple distinct hierarchies and non-hierarchical graph structures. Therefore, a homogeneous (either Euclidean or hyperbolic) geometry is not sufficient for fairly representing such heterogeneous structures. To capture the topological heterogeneity of KGs, we present an ultrahyperbolic KG embedding (UltraE) in an ultrahyperbolic (or pseudo-Riemannian) manifold that seamlessly interleaves hyperbolic and spherical manifolds. In particular, we model each relation as a pseudo-orthogonal transformation that preserves the pseudo-Riemannian bilinear form. The pseudo-orthogonal transformation is decomposed into various operators (i.e., circular rotations, reflections and hyperbolic rotations), allowing for simultaneously modeling heterogeneous structures as well as complex relational patterns. Experimental results on three standard KGs show that UltraE outperforms previous Euclidean- and hyperbolic-based approaches.
@inproceedings{xiong22_kdd, title = {Ultrahyperbolic Knowledge Graph Embeddings}, author = {Xiong, Bo and Zhu, Shichao and Nayyeri, Mojtaba and Xu, Chengjin and Pan, Shirui and Zhou, Chuan and Staab, Steffen}, year = {2022}, pages = {1--10}, booktitle = {ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD)} }
-
Pseudo-Riemannian Graph Convolutional Networks
Bo Xiong, Shichao Zhu, Nico Potyka, Shirui Pan, Chuan Zhou, Steffen Staab
Proceedings of the 36th Conference on Neural Information Processing Systems (NeurIPS), 2022.
Graph Convolutional Networks (GCNs) are typically studied through the lens of Euclidean geometry. Non-Euclidean Riemannian manifolds provide specific inductive biases for embedding hierarchical or spherical data, but cannot align well with data of mixed topologies. We consider a larger class of semi-Riemannian manifolds with indefinite metric that generalize hyperboloid and sphere as well as their submanifolds. We develop new geodesic tools that allow for extending neural network operations into geodesically disconnected semi-Riemannian manifolds. As a consequence, we derive a principled Semi-Riemannian GCN that first models data in semi-Riemannian manifolds of constant nonzero curvature in the context of graph neural networks. Our method provides a geometric inductive bias that is sufficiently flexible to model mixed heterogeneous topologies like hierarchical graphs with cycles. Empirical results demonstrate that our method outperforms Riemannian counterparts when embedding graphs of complex topologies.
Preprint: https://arxiv.org/abs/2106.03134
@inproceedings{xiong22_neurips, title = {Pseudo-Riemannian Graph Convolutional Networks}, author = {Xiong, Bo and Zhu, Shichao and Potyka, Nico and Pan, Shirui and Zhou, Chuan and Staab, Steffen}, year = {2022}, booktitle = {Proceedings of the 36th Conference on Neural Information Processing Systems (NeurIPS)}, preprint = {https://arxiv.org/abs/2106.03134} }
2021
Journal Articles
-
Amortized Bayesian Model Comparison with Evidential Deep Learning
Stefan T. Radev, Marco D’Alessandro, Ulf K. Mertens, Andreas Voss, Ullrich Köthe, Paul-Christian Bürkner
IEEE Transactions on Neural Networks and Learning Systems (TNNLS), pp. 1–12, 2021.
Comparing competing mathematical models of complex processes is a shared goal among many branches of science. The Bayesian probabilistic framework offers a principled way to perform model comparison and extract useful metrics for guiding decisions. However, many interesting models are intractable with standard Bayesian methods, as they lack a closed-form likelihood function or the likelihood is computationally too expensive to evaluate. In this work, we propose a novel method for performing Bayesian model comparison using specialized deep learning architectures. Our method is purely simulation-based and circumvents the step of explicitly fitting all alternative models under consideration to each observed dataset. Moreover, it requires no hand-crafted summary statistics of the data and is designed to amortize the cost of simulation over multiple models, datasets, and dataset sizes. This makes the method especially effective in scenarios where model fit needs to be assessed for a large number of datasets, so that case-based inference is practically infeasible. Finally, we propose a novel way to measure epistemic uncertainty in model comparison problems. We demonstrate the utility of our method on toy examples and simulated data from nontrivial models from cognitive science and single-cell neuroscience. We show that our method achieves excellent results in terms of accuracy, calibration, and efficiency across the examples considered in this work. We argue that our framework can enhance and enrich model-based analysis and inference in many fields dealing with computational models of natural processes. We further argue that the proposed measure of epistemic uncertainty provides a unique proxy to quantify absolute evidence even in a framework which assumes that the true data-generating model is within a finite set of candidate models.
@article{radev21_tnnls, title = {Amortized Bayesian Model Comparison with Evidential Deep Learning}, author = {Radev, Stefan T. and D'Alessandro, Marco and Mertens, Ulf K. and Voss, Andreas and Köthe, Ullrich and Bürkner, Paul-Christian}, journal = {IEEE Transactions on Neural Networks and Learning Systems (TNNLS)}, year = {2021}, pages = {1--12}, doi = {10.1109/TNNLS.2021.3124052} }
-
Rank-normalization, Folding, and Localization: An Improved Rhat for Assessing Convergence of MCMC (with discussion)
Aki Vehtari, Andrew Gelman, Daniel Simpson, Bob Carpenter, Paul-Christian Bürkner
Bayesian Analysis, 16(2), pp. 667–718, 2021.
Markov chain Monte Carlo is a key computational tool in Bayesian statistics, but it can be challenging to monitor the convergence of an iterative stochastic algorithm. In this paper we show that the convergence diagnostic R̂ of Gelman and Rubin (1992) has serious flaws. Traditional R̂ will fail to correctly diagnose convergence failures when the chain has a heavy tail or when the variance varies across the chains. In this paper we propose an alternative rank-based diagnostic that fixes these problems. We also introduce a collection of quantile-based local efficiency measures, along with a practical approach for computing Monte Carlo error estimates for quantiles. We suggest that common trace plots should be replaced with rank plots from multiple chains. Finally, we give recommendations for how these methods should be used in practice.
doi: 10.1214/20-BA1221
@article{vehtari21_ba, title = {Rank-normalization, Folding, and Localization: An Improved Rhat for Assessing Convergence of MCMC (with discussion)}, author = {Vehtari, Aki and Gelman, Andrew and Simpson, Daniel and Carpenter, Bob and Bürkner, Paul-Christian}, journal = {Bayesian Analysis}, volume = {16}, number = {2}, year = {2021}, pages = {667--718}, doi = {10.1214/20-BA1221} }
Conference Papers
-
Efficient Learning of Discrete-Continuous Computation Graphs
David Friede, Mathias Niepert
Advances in Neural Information Processing Systems (NeurIPS), pp. 1–13, 2021.
Numerous models for supervised and reinforcement learning benefit from combinations of discrete and continuous model components. End-to-end learnable discrete-continuous models are compositional, tend to generalize better, and are more interpretable. A popular approach to building discrete-continuous computation graphs is that of integrating discrete probability distributions into neural networks using stochastic softmax tricks. Prior work has mainly focused on computation graphs with a single discrete component on each of the graph’s execution paths. We analyze the behavior of more complex stochastic computation graphs with multiple sequential discrete components. We show that it is challenging to optimize the parameters of these models, mainly due to small gradients and local minima. We then propose two new strategies to overcome these challenges. First, we show that increasing the scale parameter of the Gumbel noise perturbations during training improves the learning behavior. Second, we propose dropout residual connections specifically tailored to stochastic, discrete-continuous computation graphs. With an extensive set of experiments, we show that we can train complex discrete-continuous models which one cannot train with standard stochastic softmax tricks. We also show that complex discrete-stochastic models generalize better than their continuous counterparts on several benchmark datasets.
@inproceedings{friede21_neurips, title = {Efficient Learning of Discrete-Continuous Computation Graphs}, author = {Friede, David and Niepert, Mathias}, booktitle = {Advances in Neural Information Processing Systems (NeurIPS)}, pages = {1--13}, year = {2021}, url = {https://proceedings.neurips.cc/paper/2021/file/3556a3018cce3076e27dbbf9645b44d5-Paper.pdf} }
-
Answering Complex Queries in Knowledge Graphs with Bidirectional Sequence Encoders
Bhushan Kotnis, Carolin Lawrence, Mathias Niepert
Proc. of the AAAI Conference on Artificial Intelligence (AAAI), pp. 4968–4977, 2021.
Representation learning for knowledge graphs (KGs) has focused on the problem of answering simple link prediction queries. In this work we address the more ambitious challenge of predicting the answers of conjunctive queries with multiple missing entities. We propose Bidirectional Query Embedding (BiQE), a method that embeds conjunctive queries with models based on bi-directional attention mechanisms. Contrary to prior work, bidirectional self-attention can capture interactions among all the elements of a query graph. We introduce two new challenging datasets for studying conjunctive query inference and conduct experiments on several benchmark datasets that demonstrate BiQE significantly outperforms state-of-the-art baselines.
@inproceedings{kotnis21_aaai, title = {Answering Complex Queries in Knowledge Graphs with Bidirectional Sequence Encoders}, author = {Kotnis, Bhushan and Lawrence, Carolin and Niepert, Mathias}, year = {2021}, booktitle = {Proc. of the AAAI Conference on Artificial Intelligence (AAAI)}, preprint = {https://arxiv.org/abs/2004.02596}, volume = {35}, number = {6}, pages = {4968--4977}, url = {https://ojs.aaai.org/index.php/AAAI/article/view/16630} }
-
Explaining Neural Matrix Factorization with Gradient Rollback
Carolin Lawrence, Timo Sztyler, Mathias Niepert
Proc. of the AAAI Conference on Artificial Intelligence (AAAI), pp. 4987–4995, 2021.
Explaining the predictions of neural black-box models is an important problem, especially when such models are used in applications where user trust is crucial. Estimating the influence of training examples on a learned neural model’s behavior allows us to identify training examples most responsible for a given prediction and, therefore, to faithfully explain the output of a black-box model. The most generally applicable existing method is based on influence functions, which scale poorly for larger sample sizes and models. We propose gradient rollback, a general approach for influence estimation, applicable to neural models where each parameter update step during gradient descent touches a smaller number of parameters, even if the overall number of parameters is large. Neural matrix factorization models trained with gradient descent are part of this model class. These models are popular and have found a wide range of applications in industry. Especially knowledge graph embedding methods, which belong to this class, are used extensively. We show that gradient rollback is highly efficient at both training and test time. Moreover, we show theoretically that the difference between gradient rollback’s influence approximation and the true influence on a model’s behavior is smaller than known bounds on the stability of stochastic gradient descent. This establishes that gradient rollback is robustly estimating example influence. We also conduct experiments which show that gradient rollback provides faithful explanations for knowledge base completion and recommender datasets. An implementation and an appendix are available.
@inproceedings{lawrence21_aaai, title = {Explaining Neural Matrix Factorization with Gradient Rollback}, author = {Lawrence, Carolin and Sztyler, Timo and Niepert, Mathias}, year = {2021}, booktitle = {Proc. of the AAAI Conference on Artificial Intelligence (AAAI)}, preprint = {https://arxiv.org/abs/2010.05516}, volume = {35}, number = {6}, pages = {4987--4995}, url = {https://ojs.aaai.org/index.php/AAAI/article/view/16632} }
-
Implicit MLE: Backpropagating Through Discrete Exponential Family Distributions
Mathias Niepert, Pasquale Minervini, Luca Franceschi
Advances in Neural Information Processing Systems (NeurIPS), pp. 1–13, 2021.
Combining discrete probability distributions and combinatorial optimization problems with neural network components has numerous applications but poses several challenges. We propose Implicit Maximum Likelihood Estimation (I-MLE), a framework for end-to-end learning of models combining discrete exponential family distributions and differentiable neural components. I-MLE is widely applicable as it only requires the ability to compute the most probable states and does not rely on smooth relaxations. The framework encompasses several approaches such as perturbation-based implicit differentiation and recent methods to differentiate through black-box combinatorial solvers. We introduce a novel class of noise distributions for approximating marginals via perturb-and-MAP. Moreover, we show that I-MLE simplifies to maximum likelihood estimation when used in some recently studied learning settings that involve combinatorial solvers. Experiments on several datasets suggest that I-MLE is competitive with and often outperforms existing approaches which rely on problem specific relaxations.
@inproceedings{niepert21_neurips, title = {Implicit {MLE}: Backpropagating Through Discrete Exponential Family Distributions}, author = {Niepert, Mathias and Minervini, Pasquale and Franceschi, Luca}, booktitle = {Advances in Neural Information Processing Systems (NeurIPS)}, pages = {1--13}, year = {2021}, url = {https://proceedings.neurips.cc/paper/2021/file/7a430339c10c642c4b2251756fd1b484-Paper.pdf}, preprint = {https://arxiv.org/abs/2106.01798} }
-
Thinking Like a Developer? Comparing the Attention of Humans with Neural Models of Code
Matteo Paltenghi, Michael Pradel
Proceedings of the 36th IEEE/ACM International Conference on Automated Software Engineering (ASE), pp. 867–879, 2021.
Neural models of code are successfully tackling various prediction tasks, complementing and sometimes even outperforming traditional program analyses. While most work focuses on end-to-end evaluations of such models, it often remains unclear what the models actually learn, and to what extent their reasoning about code matches that of skilled humans. A poor understanding of the model reasoning risks deploying models that are right for the wrong reason, and taking decisions based on spurious correlations in the training dataset. This paper investigates to what extent the attention weights of effective neural models match the reasoning of skilled humans. To this end, we present a methodology for recording human attention and use it to gather 1,508 human attention maps from 91 participants, which is the largest such dataset we are aware of. Computing human-model correlations shows that the copy attention of neural models often matches the way humans reason about code (Spearman rank coefficients of 0.49 and 0.47), which gives an empirical justification for the intuition behind copy attention. In contrast, the regular attention of models is mostly uncorrelated with human attention. We find that models and humans sometimes focus on different kinds of tokens, e.g., strings are important to humans but mostly ignored by models. The results also show that human-model agreement positively correlates with accurate predictions by a model, which calls for neural models that even more closely mimic human reasoning. Beyond the insights from our study, we envision the release of our dataset of human attention maps to help understand future neural models of code and to foster work on human-inspired models.
@inproceedings{paltenghi21_ase, author = {Paltenghi, Matteo and Pradel, Michael}, title = {Thinking Like a Developer? Comparing the Attention of Humans with Neural Models of Code}, booktitle = {Proceedings of the 36th {IEEE/ACM} International Conference on Automated Software Engineering ({ASE})}, pages = {867--879}, year = {2021}, doi = {10.1109/ASE51524.2021.9678712} }
-
Semantic Bug Seeding: A Learning-based Approach for Creating Realistic Bugs
Jibesh Patra, Michael Pradel
Proceedings of the 29th ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering ESEC/FSE, pp. 906–918, 2021.
When working on techniques to address the widespread problem of software bugs, one often faces the need for a large number of realistic bugs in real-world programs. Such bugs can either help evaluate an approach, e.g., in the form of a bug benchmark or a suite of program mutations, or even help build the technique, e.g., in learning-based bug detection. Because gathering a large number of real bugs is difficult, a common approach is to rely on automatically seeded bugs. Prior work seeds bugs based on syntactic transformation patterns, which often results in unrealistic bugs and typically cannot introduce new, application-specific code tokens. This paper presents SemSeed, a technique for automatically seeding bugs in a semantics-aware way. The key idea is to imitate what a given real-world bug would look like in other programs by semantically adapting the bug pattern to the local context. To reason about the semantics of pieces of code, our approach builds on learned token embeddings that encode the semantic similarities of identifiers and literals. Our evaluation with real-world JavaScript software shows that the approach effectively reproduces real bugs and clearly outperforms a semantics-unaware approach. The seeded bugs are useful as training data for learning-based bug detection, where they significantly improve the bug detection ability. 
Moreover, we show that SemSeed-created bugs complement existing mutation testing operators, and that our approach is efficient enough to seed hundreds of thousands of bugs within an hour.@inproceedings{patra21_esec, author = {Patra, Jibesh and Pradel, Michael}, editor = {Spinellis, Diomidis and Gousios, Georgios and Chechik, Marsha and Penta, Massimiliano Di}, title = {Semantic Bug Seeding: A Learning-based Approach for Creating Realistic Bugs}, booktitle = {Proceedings of the 29th {ACM} Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering {ESEC/FSE}}, pages = {906--918}, year = {2021}, doi = {10.1145/3468264.3468623} } -
Neural Photofit: Gaze-based Mental Image Reconstruction
Florian Strohm, Ekta Sood, Sven Mayer, Philipp Müller, Mihai Bâce, Andreas Bulling
Proc. IEEE International Conference on Computer Vision (ICCV), pp. 245–254, 2021.
We propose a novel method that leverages human fixations to visually decode the image a person has in mind into a photofit (facial composite). Our method combines three neural networks: an encoder, a scoring network, and a decoder. The encoder extracts image features and predicts a neural activation map for each face looked at by a human observer. A neural scoring network compares the human and neural attention and predicts a relevance score for each extracted image feature. Finally, image features are aggregated into a single feature vector as a linear combination of all features weighted by relevance, which a decoder decodes into the final photofit. We train the neural scoring network on a novel dataset containing gaze data of 19 participants looking at collages of synthetic faces. We show that our method significantly outperforms a mean baseline predictor and report on a human study that shows that we can decode photofits that are visually plausible and close to the observer’s mental image. Code and dataset available upon request.@inproceedings{strohm21_iccv, title = {Neural Photofit: Gaze-based Mental Image Reconstruction}, author = {Strohm, Florian and Sood, Ekta and Mayer, Sven and Müller, Philipp and Bâce, Mihai and Bulling, Andreas}, year = {2021}, booktitle = {Proc. IEEE International Conference on Computer Vision (ICCV)}, doi = {10.1109/ICCV48922.2021.00031}, pages = {245--254} } -
IdBench: Evaluating Semantic Representations of Identifier Names in Source Code
Yaza Wainakh, Moiz Rauf, Michael Pradel
Proceedings of the 43rd IEEE/ACM International Conference on Software Engineering (ICSE), pp. 562–573, 2021.
Identifier names convey useful information about the intended semantics of code. Name-based program analyses use this information, e.g., to detect bugs, to predict types, and to improve the readability of code. At the core of name-based analyses are semantic representations of identifiers, e.g., in the form of learned embeddings. The high-level goal of such a representation is to encode whether two identifiers, e.g., len and size, are semantically similar. Unfortunately, it is currently unclear to what extent semantic representations match the semantic relatedness and similarity perceived by developers. This paper presents IdBench, the first benchmark for evaluating semantic representations against a ground truth created from thousands of ratings by 500 software developers. We use IdBench to study state-of-the-art embedding techniques proposed for natural language, an embedding technique specifically designed for source code, and lexical string distance functions. Our results show that the effectiveness of semantic representations varies significantly and that the best available embeddings successfully represent semantic relatedness. On the downside, no existing technique provides a satisfactory representation of semantic similarities, among other reasons because identifiers with opposing meanings are incorrectly considered to be similar, which may lead to fatal mistakes, e.g., in a refactoring tool. Studying the strengths and weaknesses of the different techniques shows that they complement each other. 
As a first step toward exploiting this complementarity, we present an ensemble model that combines existing techniques and that clearly outperforms the best available semantic representation.@inproceedings{wainakh21_icse, author = {Wainakh, Yaza and Rauf, Moiz and Pradel, Michael}, title = {IdBench: Evaluating Semantic Representations of Identifier Names in Source Code}, booktitle = {Proceedings of the 43rd {IEEE/ACM} International Conference on Software Engineering ({ICSE})}, pages = {562--573}, year = {2021}, doi = {10.1109/ICSE43902.2021.00059} } -
Uncertainty Estimation and Calibration with Finite-State Probabilistic RNNs
Cheng Wang, Carolin Lawrence, Mathias Niepert
Proc. of the Ninth International Conference on Learning Representations (ICLR), 2021.
Uncertainty quantification is crucial for building reliable and trustable machine learning systems. We propose to estimate uncertainty in recurrent neural networks (RNNs) via stochastic discrete state transitions over recurrent timesteps. The uncertainty of the model can be quantified by running a prediction several times, each time sampling from the recurrent state transition distribution, leading to potentially different results if the model is uncertain. Alongside uncertainty quantification, our proposed method offers several advantages in different settings. The proposed method can (1) learn deterministic and probabilistic automata from data, (2) learn well-calibrated models on real-world classification tasks, (3) improve the performance of out-of-distribution detection, and (4) control the exploration-exploitation trade-off in reinforcement learning. An implementation is available.@inproceedings{wang21_iclr, title = {Uncertainty Estimation and Calibration with Finite-State Probabilistic RNNs}, author = {Wang, Cheng and Lawrence, Carolin and Niepert, Mathias}, year = {2021}, booktitle = {Proc. of the Ninth International Conference on Learning Representations (ICLR)}, preprint = {https://arxiv.org/abs/2011.12010}, url = {https://openreview.net/forum?id=9EKHN1jOlA} }
2020
Journal Articles
-
Approximate Leave-future-out Cross-validation for Bayesian Time Series Models
Paul-Christian Bürkner, Jonah Gabry, Aki Vehtari
Journal of Statistical Computation and Simulation, 90(14), pp. 2499–2523, 2020.
One of the common goals of time series analysis is to use the observed series to inform predictions for future observations. In the absence of any actual new data to predict, cross-validation can be used to estimate a model’s future predictive accuracy, for instance, for the purpose of model comparison or selection. Exact cross-validation for Bayesian models is often computationally expensive, but approximate cross-validation methods have been developed, most notably methods for leave-one-out cross-validation (LOO-CV). If the actual prediction task is to predict the future given the past, LOO-CV provides an overly optimistic estimate because the information from future observations is available to influence predictions of the past. To properly account for the time series structure, we can use leave-future-out cross-validation (LFO-CV). Like exact LOO-CV, exact LFO-CV requires refitting the model many times to different subsets of the data. Using Pareto smoothed importance sampling, we propose a method for approximating exact LFO-CV that drastically reduces the computational costs while also providing informative diagnostics about the quality of the approximation.@article{buerkner20_jscs, title = {Approximate Leave-future-out Cross-validation for Bayesian Time Series Models}, author = {Bürkner, Paul-Christian and Gabry, Jonah and Vehtari, Aki}, journal = {Journal of Statistical Computation and Simulation}, volume = {90}, number = {14}, year = {2020}, pages = {2499--2523}, doi = {10.1080/00949655.2020.1783262} } -
Sobolev Norm Learning Rates for Regularized Least-Squares Algorithm
Simon Fischer, Ingo Steinwart
Journal of Machine Learning Research (JMLR), 21, pp. 1–38, 2020.
Learning rates for least-squares regression are typically expressed in terms of L2-norms. In this paper we extend these rates to norms stronger than the L2-norm without requiring the regression function to be contained in the hypothesis space. In the special case of Sobolev reproducing kernel Hilbert spaces used as hypotheses spaces, these stronger norms coincide with fractional Sobolev norms between the used Sobolev space and L2. As a consequence, not only the target function but also some of its derivatives can be estimated without changing the algorithm. From a technical point of view, we combine the well-known integral operator techniques with an embedding property, which so far has only been used in combination with empirical process arguments. This combination results in new finite sample bounds with respect to the stronger norms. From these finite sample bounds our rates easily follow. Finally, we prove the asymptotic optimality of our results in many cases.@article{fischer20_jmlr, author = {Fischer, Simon and Steinwart, Ingo}, title = {Sobolev Norm Learning Rates for Regularized Least-Squares Algorithm}, journal = {Journal of Machine Learning Research (JMLR)}, year = {2020}, volume = {21}, pages = {1--38}, preprint = {https://arxiv.org/abs/1702.07254}, url = {https://www.jmlr.org/papers/volume21/19-734/19-734.pdf} }
Conference Papers
-
Quantification of Users’ Visual Attention During Everyday Mobile Device Interactions
Mihai Bâce, Sander Staal, Andreas Bulling
Proc. ACM SIGCHI Conference on Human Factors in Computing Systems (CHI), pp. 1–14, 2020.
We present the first real-world dataset and quantitative evaluation of visual attention of mobile device users in-situ, i.e. while using their devices during everyday routine. Understanding user attention is a core research challenge in mobile HCI but previous approaches relied on usage logs or self-reports that are only proxies and consequently reflect attention neither completely nor accurately. Our evaluations are based on Everyday Mobile Visual Attention (EMVA) – a new 32-participant dataset containing around 472 hours of video snippets recorded over more than two weeks in real life using the front-facing camera as well as associated usage logs, interaction events, and sensor data. Using an eye contact detection method, we are the first to quantify the highly dynamic nature of everyday visual attention across users, mobile applications, and usage contexts. We discuss key insights from our analyses that highlight the potential and inform the design of future mobile attentive user interfaces.@inproceedings{bace20_chi, title = {Quantification of Users' Visual Attention During Everyday Mobile Device Interactions}, author = {B{\^a}ce, Mihai and Staal, Sander and Bulling, Andreas}, year = {2020}, pages = {1--14}, booktitle = {Proc. ACM SIGCHI Conference on Human Factors in Computing Systems (CHI)}, doi = {10.1145/3313831.3376449}, news = {https://ethz.ch/en/news-and-events/eth-news/news/2020/09/our-actual-attention-is-now-measurable.html}, video = {https://www.youtube.com/watch?v=SzLn3LujIqw} } -
Flexible Prior Elicitation via the Prior Predictive Distribution
Marcelo Hartmann, Georgi Agiashvili, Paul Bürkner, Arto Klami
Proceedings of the 36th Conference on Uncertainty in Artificial Intelligence (UAI), pp. 1129–1138, 2020.
The prior distribution for the unknown model parameters plays a crucial role in the process of statistical inference based on Bayesian methods. However, specifying suitable priors is often difficult even when detailed prior knowledge is available in principle. The challenge is to express quantitative information in the form of a probability distribution. Prior elicitation addresses this question by extracting subjective information from an expert and transforming it into a valid prior. Most existing methods, however, require information to be provided on the unobservable parameters, whose effect on the data generating process is often complicated and hard to understand. We propose an alternative approach that only requires knowledge about the observable outcomes - knowledge which is often much easier for experts to provide. Building upon a principled statistical framework, our approach utilizes the prior predictive distribution implied by the model to automatically transform experts' judgements about plausible outcome values to suitable priors on the parameters. We also provide computational strategies to perform inference and guidelines to facilitate practical use.@inproceedings{hartmann20_uai, title = {Flexible Prior Elicitation via the Prior Predictive Distribution}, author = {Hartmann, Marcelo and Agiashvili, Georgi and Bürkner, Paul and Klami, Arto}, booktitle = {Proceedings of the 36th Conference on Uncertainty in Artificial Intelligence (UAI)}, pages = {1129--1138}, volume = {124}, year = {2020}, url = {https://proceedings.mlr.press/v124/hartmann20a.html}, preprint = {https://arxiv.org/abs/2002.09868} } -
Predicting Degrees of Technicality in Automatic Terminology Extraction
Anna Hätty, Dominik Schlechtweg, Michael Dorna, Sabine Schulte im Walde
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics (ACL), pp. 2883–2889, 2020.
While automatic term extraction is a well-researched area, computational approaches to distinguish between degrees of technicality are still understudied. We semi-automatically create a German gold standard of technicality across four domains, and illustrate the impact of a web-crawled general-language corpus on technicality prediction. When defining a classification approach that combines general-language and domain-specific word embeddings, we go beyond previous work and align vector spaces to gain comparative embeddings. We suggest two novel models to exploit general- vs. domain-specific comparisons: a simple neural network model with pre-computed comparative-embedding information as input, and a multi-channel model computing the comparison internally. Both models outperform previous approaches, with the multi-channel model performing best.@inproceedings{haetty20_acl, author = {Hätty, Anna and Schlechtweg, Dominik and Dorna, Michael and {Schulte im Walde}, Sabine}, title = {Predicting Degrees of Technicality in Automatic Terminology Extraction}, booktitle = {Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics (ACL)}, pages = {2883--2889}, year = {2020}, doi = {10.18653/v1/2020.acl-main.258} } -
TypeWriter: Neural Type Prediction with Search-based Validation
Michael Pradel, Georgios Gousios, Jason Liu, Satish Chandra
Proceedings of the 28th ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering (ESEC/FSE), pp. 209–220, 2020.
Maintaining large code bases written in dynamically typed languages, such as JavaScript or Python, can be challenging due to the absence of type annotations: simple data compatibility errors proliferate, IDE support is limited, and APIs are hard to comprehend. Recent work attempts to address those issues through either static type inference or probabilistic type prediction. Unfortunately, static type inference for dynamic languages is inherently limited, while probabilistic approaches suffer from imprecision. This paper presents TypeWriter, the first combination of probabilistic type prediction with search-based refinement of predicted types. TypeWriter’s predictor learns to infer the return and argument types for functions from partially annotated code bases by combining the natural language properties of code with programming language-level information. To validate predicted types, TypeWriter invokes a gradual type checker with different combinations of the predicted types, while navigating the space of possible type combinations in a feedback-directed manner. We implement the TypeWriter approach for Python and evaluate it on two code corpora: a multi-million line code base at Facebook and a collection of 1,137 popular open-source projects. We show that TypeWriter’s type predictor achieves an F1 score of 0.64 (0.79) in the top-1 (top-5) predictions for return types, and 0.57 (0.80) for argument types, which clearly outperforms prior type prediction models. By combining predictions with search-based validation, TypeWriter can fully annotate between 14% and 44% of the files in a randomly selected corpus, while ensuring type correctness. A comparison with a static type inference tool shows that TypeWriter adds many more non-trivial types. 
TypeWriter currently suggests types to developers at Facebook and several thousands of types have already been accepted with minimal changes.@inproceedings{pradel20_esec, author = {Pradel, Michael and Gousios, Georgios and Liu, Jason and Chandra, Satish}, editor = {Devanbu, Prem and Cohen, Myra B. and Zimmermann, Thomas}, title = {TypeWriter: Neural Type Prediction with Search-based Validation}, booktitle = {Proceedings of the 28th {ACM} Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering ({ESEC/FSE})}, pages = {209--220}, year = {2020}, doi = {10.1145/3368089.3409715} } -
Improving Natural Language Processing Tasks with Human Gaze-Guided Neural Attention
Ekta Sood, Simon Tannert, Philipp Müller, Andreas Bulling
Advances in Neural Information Processing Systems (NeurIPS), pp. 1–15, 2020.
A lack of corpora has so far limited advances in integrating human gaze data as a supervisory signal in neural attention mechanisms for natural language processing (NLP). We propose a novel hybrid text saliency model (TSM) that, for the first time, combines a cognitive model of reading with explicit human gaze supervision in a single machine learning framework. We show on four different corpora that our hybrid TSM duration predictions are highly correlated with human gaze ground truth. We further propose a novel joint modelling approach to integrate the predictions of the TSM into the attention layer of a network designed for a specific upstream task without the need for task-specific human gaze data. We demonstrate that our joint model outperforms the state of the art in paraphrase generation on the Quora Question Pairs corpus by more than 10% in BLEU-4 and achieves state-of-the-art performance for sentence compression on the challenging Google Sentence Compression corpus. As such, our work introduces a practical approach for bridging between data-driven and cognitive models and demonstrates a new way to integrate human gaze-guided neural attention into NLP tasks.@inproceedings{sood20_neurips, author = {Sood, Ekta and Tannert, Simon and Müller, Philipp and Bulling, Andreas}, title = {Improving Natural Language Processing Tasks with Human Gaze-Guided Neural Attention}, year = {2020}, pages = {1--15}, booktitle = {Advances in Neural Information Processing Systems (NeurIPS)}, url = {https://proceedings.neurips.cc/paper/2020/hash/460191c72f67e90150a093b4585e7eb4-Abstract.html} }
2019
Journal Articles
-
Getafix: Learning to Fix Bugs Automatically
Johannes Bader, Andrew Scott, Michael Pradel, Satish Chandra
Proceedings of the ACM on Programming Languages, 3(OOPSLA), pp. 159:1–159:27, 2019.
Static analyzers help find bugs early by warning about recurring bug categories. While fixing these bugs still remains a mostly manual task in practice, we observe that fixes for a specific bug category often are repetitive. This paper addresses the problem of automatically fixing instances of common bugs by learning from past fixes. We present Getafix, an approach that produces human-like fixes while being fast enough to suggest fixes in time proportional to the amount of time needed to obtain static analysis results in the first place. Getafix is based on a novel hierarchical clustering algorithm that summarizes fix patterns into a hierarchy ranging from general to specific patterns. Instead of an expensive exploration of a potentially large space of candidate fixes, Getafix uses a simple yet effective ranking technique that uses the context of a code change to select the most appropriate fix for a given bug. Our evaluation applies Getafix to 1,268 bug fixes for six bug categories reported by popular static analyzers for Java, including null dereferences, incorrect API calls, and misuses of particular language constructs. The approach predicts exactly the human-written fix as the top-most suggestion between 12% and 91% of the time, depending on the bug category. The top-5 suggestions contain fixes for 526 of the 1,268 bugs. Moreover, we report on deploying the approach within Facebook, where it contributes to the reliability of software used by billions of people. To the best of our knowledge, Getafix is the first industrially-deployed automated bug-fixing tool that learns fix patterns from past, human-written fixes to produce human-like fixes.
doi: 10.1145/3360585
@article{bader19_pl, author = {Bader, Johannes and Scott, Andrew and Pradel, Michael and Chandra, Satish}, title = {Getafix: Learning to Fix Bugs Automatically}, journal = {Proceedings of the ACM on Programming Languages}, volume = {3}, number = {{OOPSLA}}, pages = {159:1--159:27}, year = {2019}, doi = {10.1145/3360585} } -
Learning Rates for Kernel-Based Expectile Regression
Muhammad Farooq, Ingo Steinwart
Machine Learning, 108, pp. 203–227, 2019.
Conditional expectiles are becoming an increasingly important tool in finance as well as in other areas of applications. We analyse a support vector machine type approach for estimating conditional expectiles and establish learning rates that are minimax optimal modulo a logarithmic factor if Gaussian RBF kernels are used and the desired expectile is smooth in a Besov sense. As a special case, our learning rates improve the best known rates for kernel-based least squares regression in the aforementioned scenario. Key ingredients of our statistical analysis are a general calibration inequality for the asymmetric least squares loss, a corresponding variance bound as well as an improved entropy number bound for Gaussian RBF kernels.
doi: 10.1007/s10994-018-5762-9
Preprint: https://arxiv.org/abs/1702.07552
@article{farooq19_ml, author = {Farooq, Muhammad and Steinwart, Ingo}, title = {Learning Rates for Kernel-Based Expectile Regression}, year = {2019}, volume = {108}, pages = {203--227}, journal = {Machine Learning}, preprint = {https://arxiv.org/abs/1702.07552}, doi = {10.1007/s10994-018-5762-9} } -
MPIIGaze: Real-World Dataset and Deep Appearance-Based Gaze Estimation
Xucong Zhang, Yusuke Sugano, Mario Fritz, Andreas Bulling
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 41(1), pp. 162–175, 2019.
Learning-based methods are believed to work well for unconstrained gaze estimation, i.e. gaze estimation from a monocular RGB camera without assumptions regarding user, environment, or camera. However, current gaze datasets were collected under laboratory conditions and methods were not evaluated across multiple datasets. Our work makes three contributions towards addressing these limitations. First, we present the MPIIGaze dataset, which contains 213,659 full face images and corresponding ground-truth gaze positions collected from 15 users during everyday laptop use over several months. An experience sampling approach ensured continuous gaze and head poses and realistic variation in eye appearance and illumination. To facilitate cross-dataset evaluations, 37,667 images were manually annotated with eye corners, mouth corners, and pupil centres. Second, we present an extensive evaluation of state-of-the-art gaze estimation methods on three current datasets, including MPIIGaze. We study key challenges including target gaze range, illumination conditions, and facial appearance variation. We show that image resolution and the use of both eyes affect gaze estimation performance, while head pose and pupil centre information are less informative. Finally, we propose GazeNet, the first deep appearance-based gaze estimation method. GazeNet improves on the state of the art by 22% (from a mean error of 13.9 degrees to 10.8 degrees) for the most challenging cross-dataset evaluation.@article{zhang19_pami, title = {MPIIGaze: Real-World Dataset and Deep Appearance-Based Gaze Estimation}, author = {Zhang, Xucong and Sugano, Yusuke and Fritz, Mario and Bulling, Andreas}, year = {2019}, journal = {IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI)}, doi = {10.1109/TPAMI.2017.2778103}, pages = {162--175}, volume = {41}, number = {1} }
Conference Papers
-
Learning Discrete Structures for Graph Neural Networks
Luca Franceschi, Mathias Niepert, Massimiliano Pontil, Xiao He
Proc. of the 36th International Conference on Machine Learning (ICML), pp. 1–11, 2019.
Graph neural networks (GNNs) are a popular class of machine learning models whose major advantage is their ability to incorporate a sparse and discrete dependency structure between data points. Unfortunately, GNNs can only be used when such a graph-structure is available. In practice, however, real-world graphs are often noisy and incomplete or might not be available at all. With this work, we propose to jointly learn the graph structure and the parameters of graph convolutional networks (GCNs) by approximately solving a bilevel program that learns a discrete probability distribution on the edges of the graph. This allows one to apply GCNs not only in scenarios where the given graph is incomplete or corrupted but also in those where a graph is not available. We conduct a series of experiments that analyze the behavior of the proposed method and demonstrate that it outperforms related methods by a significant margin.@inproceedings{franceschi19_icml, title = {Learning Discrete Structures for Graph Neural Networks}, author = {Franceschi, Luca and Niepert, Mathias and Pontil, Massimiliano and He, Xiao}, year = {2019}, booktitle = {Proc. of the 36th International Conference on Machine Learning (ICML)}, preprint = {https://arxiv.org/abs/1903.11960}, pages = {1--11}, url = {http://proceedings.mlr.press/v97/franceschi19a/franceschi19a.pdf} } -
Attending to Future Tokens For Bidirectional Sequence Generation
Carolin Lawrence, Bhushan Kotnis, Mathias Niepert
Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP), pp. 1–10, 2019.
Neural sequence generation is typically performed token-by-token and left-to-right. Whenever a token is generated, only previously produced tokens are taken into consideration. In contrast, for problems such as sequence classification, bidirectional attention, which takes both past and future tokens into consideration, has been shown to perform much better. We propose to make the sequence generation process bidirectional by employing special placeholder tokens. Treated as a node in a fully connected graph, a placeholder token can take past and future tokens into consideration when generating the actual output token. We verify the effectiveness of our approach experimentally on two conversational tasks where the proposed bidirectional model outperforms competitive baselines by a large margin.
doi: 10.18653/v1/D19-1001
Preprint: https://arxiv.org/abs/1908.05915
@inproceedings{lawrence19_emnlp, author = {Lawrence, Carolin and Kotnis, Bhushan and Niepert, Mathias}, title = {Attending to Future Tokens For Bidirectional Sequence Generation}, booktitle = {Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP)}, pages = {1--10}, year = {2019}, doi = {10.18653/v1/D19-1001}, preprint = {https://arxiv.org/abs/1908.05915} } -
NL2Type: Inferring JavaScript Function Types from Natural Language Information
Rabee Sohail Malik, Jibesh Patra, Michael Pradel
Proceedings of the 41st IEEE/ACM International Conference on Software Engineering (ICSE), pp. 304–315, 2019.
JavaScript is dynamically typed and hence lacks the type safety of statically typed languages, leading to suboptimal IDE support, difficult to understand APIs, and unexpected runtime behavior. Several gradual type systems have been proposed, e.g., Flow and TypeScript, but they rely on developers to annotate code with types. This paper presents NL2Type, a learning-based approach for predicting likely type signatures of JavaScript functions. The key idea is to exploit natural language information in source code, such as comments, function names, and parameter names, a rich source of knowledge that is typically ignored by type inference algorithms. We formulate the problem of predicting types as a classification problem and train a recurrent, LSTM-based neural model that, after learning from an annotated code base, predicts function types for unannotated code. We evaluate the approach with a corpus of 162,673 JavaScript files from real-world projects. NL2Type predicts types with a precision of 84.1% and a recall of 78.9% when considering only the top-most suggestion, and with a precision of 95.5% and a recall of 89.6% when considering the top-5 suggestions. The approach outperforms both JSNice, a state-of-the-art approach that analyzes implementations of functions instead of natural language information, and DeepTyper, a recent type prediction approach that is also based on deep learning. Beyond predicting types, NL2Type serves as a consistency checker for existing type annotations. We show that it discovers 39 inconsistencies that deserve developer attention (from a manual analysis of 50 warnings), most of which are due to incorrect type annotations.@inproceedings{malik19_icse, author = {Malik, Rabee Sohail and Patra, Jibesh and Pradel, Michael}, editor = {Atlee, Joanne M. 
and Bultan, Tevfik and Whittle, Jon}, title = {NL2Type: Inferring JavaScript Function Types from Natural Language Information}, booktitle = {Proceedings of the 41st {IEEE/ACM} International Conference on Software Engineering ({ICSE})}, pages = {304--315}, year = {2019}, doi = {10.1109/ICSE.2019.00045} } -
A Wind of Change: Detecting and Evaluating Lexical Semantic Change across Times and Domains
Dominik Schlechtweg, Anna Hätty, Marco del Tredici, Sabine Schulte im Walde
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics (ACL), pp. 732–746, 2019.
We perform an interdisciplinary large-scale evaluation for detecting lexical semantic divergences in a diachronic and in a synchronic task: semantic sense changes across time, and semantic sense changes across domains. Our work addresses the superficialness and lack of comparison in assessing models of diachronic lexical change, by bringing together and extending benchmark models on a common state-of-the-art evaluation task. In addition, we demonstrate that the same evaluation task and modelling approaches can successfully be utilised for the synchronic detection of domain-specific sense divergences in the field of term extraction.
doi: 10.18653/v1/P19-1072
@inproceedings{schlechtweg19_acl, author = {Schlechtweg, Dominik and Hätty, Anna and del Tredici, Marco and {Schulte im Walde}, Sabine}, title = {A Wind of Change: Detecting and Evaluating Lexical Semantic Change across Times and Domains}, booktitle = {Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics (ACL)}, pages = {732--746}, year = {2019}, doi = {10.18653/v1/P19-1072} }
-
State-Regularized Recurrent Neural Networks
Cheng Wang, Mathias Niepert
Proc. of the 36th International Conference on Machine Learning (ICML), pp. 6596–6606, 2019.
Recurrent neural networks are a widely used class of neural architectures with two shortcomings. First, it is difficult to understand what exactly they learn. Second, they tend to work poorly on sequences requiring long-term memorization, despite having this capacity in principle. We aim to address both shortcomings with a class of recurrent networks that use a stochastic state transition mechanism between cell applications. This mechanism, which we term state-regularization, makes RNNs transition between a finite set of learnable states. We evaluate state-regularized RNNs on (1) regular languages for the purpose of automata extraction; (2) nonregular languages such as balanced parentheses, palindromes, and the copy task where external memory is required; and (3) real-world sequence learning tasks for sentiment analysis, visual object recognition, and language modeling. We show that state-regularization simplifies the extraction of finite state automata from the RNN’s state transition dynamics; forces RNNs to operate more like automata with external memory and less like finite state machines; and makes RNNs more interpretable.
@inproceedings{wang19_icml, title = {State-Regularized Recurrent Neural Networks}, author = {Wang, Cheng and Niepert, Mathias}, year = {2019}, booktitle = {Proc. of the 36th International Conference on Machine Learning (ICML)}, preprint = {https://arxiv.org/abs/1901.08817}, pages = {6596--6606}, url = {https://proceedings.mlr.press/v97/wang19j.html} }
-
Evaluation of Appearance-Based Methods and Implications for Gaze-Based Applications
Xucong Zhang, Yusuke Sugano, Andreas Bulling
Proc. ACM SIGCHI Conference on Human Factors in Computing Systems (CHI), pp. 1–13, 2019.
Appearance-based gaze estimation methods that only require an off-the-shelf camera have significantly improved but they are still not yet widely used in the human-computer interaction (HCI) community. This is partly because it remains unclear how they perform compared to model-based approaches as well as dominant, special-purpose eye tracking equipment. To address this limitation, we evaluate the performance of state-of-the-art appearance-based gaze estimation for interaction scenarios with and without personal calibration, indoors and outdoors, for different sensing distances, as well as for users with and without glasses. We discuss the obtained findings and their implications for the most important gaze-based applications, namely explicit eye input, attentive user interfaces, gaze-based user modelling, and passive eye monitoring. To democratise the use of appearance-based gaze estimation and interaction in HCI, we finally present OpenGaze (www.opengaze.org), the first software toolkit for appearance-based gaze estimation and interaction.Code: http://www.opengaze.org/
@inproceedings{zhang19_chi, author = {Zhang, Xucong and Sugano, Yusuke and Bulling, Andreas}, title = {Evaluation of Appearance-Based Methods and Implications for Gaze-Based Applications}, booktitle = {Proc. ACM SIGCHI Conference on Human Factors in Computing Systems (CHI)}, year = {2019}, doi = {10.1145/3290605.3300646}, pages = {1--13} }
2018
Journal Articles
-
DeepBugs: a Learning Approach to Name-based Bug Detection
Michael Pradel, Koushik Sen
Proceedings of the ACM on Programming Languages, 2(OOPSLA), pp. 147:1–147:25, 2018.
Natural language elements in source code, e.g., the names of variables and functions, convey useful information. However, most existing bug detection tools ignore this information and therefore miss some classes of bugs. The few existing name-based bug detection approaches reason about names on a syntactic level and rely on manually designed and tuned algorithms to detect bugs. This paper presents DeepBugs, a learning approach to name-based bug detection, which reasons about names based on a semantic representation and which automatically learns bug detectors instead of manually writing them. We formulate bug detection as a binary classification problem and train a classifier that distinguishes correct from incorrect code. To address the challenge that effectively learning a bug detector requires examples of both correct and incorrect code, we create likely incorrect code examples from an existing corpus of code through simple code transformations. A novel insight learned from our work is that learning from artificially seeded bugs yields bug detectors that are effective at finding bugs in real-world code. We implement our idea into a framework for learning-based and name-based bug detection. Three bug detectors built on top of the framework detect accidentally swapped function arguments, incorrect binary operators, and incorrect operands in binary operations. Applying the approach to a corpus of 150,000 JavaScript files yields bug detectors that have a high accuracy (between 89% and 95%), are very efficient (less than 20 milliseconds per analyzed file), and reveal 102 programming mistakes (with 68% true positive rate) in real-world code.doi: 10.1145/3276517
@article{pradel18_pl, author = {Pradel, Michael and Sen, Koushik}, title = {DeepBugs: a Learning Approach to Name-based Bug Detection}, journal = {Proceedings of the ACM on Programming Languages}, volume = {2}, number = {{OOPSLA}}, pages = {147:1--147:25}, year = {2018}, doi = {10.1145/3276517} }
Conference Papers
-
Bilingual Sentiment Embeddings: Joint Projection of Sentiment Across Languages
Jeremy Barnes, Roman Klinger, Sabine Schulte im Walde
Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) (ACL), pp. 2483–2493, 2018.
Sentiment analysis in low-resource languages suffers from a lack of annotated corpora to estimate high-performing models. Machine translation and bilingual word embeddings provide some relief through cross-lingual sentiment approaches. However, they either require large amounts of parallel data or do not sufficiently capture sentiment information. We introduce Bilingual Sentiment Embeddings (BLSE), which jointly represent sentiment information in a source and target language. This model only requires a small bilingual lexicon, a source-language corpus annotated for sentiment, and monolingual word embeddings for each language. We perform experiments on three language combinations (Spanish, Catalan, Basque) for sentence-level cross-lingual sentiment classification and find that our model significantly outperforms state-of-the-art methods on four out of six experimental setups, as well as capturing complementary information to machine translation. Our analysis of the resulting embedding space provides evidence that it represents sentiment information in the resource-poor target language without any annotated data in that language.doi: 10.18653/v1/P18-1231
@inproceedings{barnes18_acl, author = {Barnes, Jeremy and Klinger, Roman and {Schulte im Walde}, Sabine}, title = {Bilingual Sentiment Embeddings: Joint Projection of Sentiment Across Languages}, booktitle = {Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) (ACL)}, pages = {2483--2493}, year = {2018}, doi = {10.18653/v1/P18-1231} }
-
Learning Sequence Encoders for Temporal Knowledge Graph Completion
Alberto García-Durán, Sebastijan Dumančić, Mathias Niepert
Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP), pp. 4816–4821, 2018.
Research on link prediction in knowledge graphs has mainly focused on static multi-relational data. In this work we consider temporal knowledge graphs where relations between entities may only hold for a time interval or a specific point in time. In line with previous work on static knowledge graphs, we propose to address this problem by learning latent entity and relation type representations. To incorporate temporal information, we utilize recurrent neural networks to learn time-aware representations of relation types which can be used in conjunction with existing latent factorization methods. The proposed approach is shown to be robust to common challenges in real-world KGs: the sparsity and heterogeneity of temporal expressions. Experiments show the benefits of our approach on four temporal KGs. The data sets are available under a permissive BSD-3 license.doi: 10.18653/v1/D18-1516
Preprint: https://arxiv.org/abs/1809.03202
@inproceedings{garciaduran18_emnlp, author = {García-Durán, Alberto and Dumančić, Sebastijan and Niepert, Mathias}, title = {Learning Sequence Encoders for Temporal Knowledge Graph Completion}, booktitle = {Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP)}, pages = {4816--4821}, year = {2018}, doi = {10.18653/v1/D18-1516}, preprint = {https://arxiv.org/abs/1809.03202} }
-
KBLRN: End-to-End Learning of Knowledge Base Representations with Latent, Relational, and Numerical Features
Alberto García-Durán, Mathias Niepert
Proc. of the 34th Conference on Uncertainty in Artificial Intelligence (UAI), pp. 1–10, 2018.
We present KBLRN, a framework for end-to-end learning of knowledge base representations from latent, relational, and numerical features. KBLRN integrates feature types with a novel combination of neural representation learning and probabilistic product of experts models. To the best of our knowledge, KBLRN is the first approach that learns representations of knowledge bases by integrating latent, relational, and numerical features. We show that instances of KBLRN outperform existing methods on a range of knowledge base completion tasks. We contribute novel data sets enriching commonly used knowledge base completion benchmarks with numerical features. The data sets are available under a permissive BSD-3 license. We also investigate the impact numerical features have on the KB completion performance of KBLRN.
@inproceedings{garciaduran18_uai, title = {KBLRN: End-to-End Learning of Knowledge Base Representations with Latent, Relational, and Numerical Features}, author = {García-Durán, Alberto and Niepert, Mathias}, year = {2018}, booktitle = {Proc. of the 34th Conference on Uncertainty in Artificial Intelligence (UAI)}, preprint = {https://arxiv.org/abs/1709.04676}, pages = {1--10}, url = {http://auai.org/uai2018/proceedings/papers/149.pdf} }
-
Training Person-Specific Gaze Estimators from Interactions with Multiple Devices
Xucong Zhang, Michael Xuelin Huang, Yusuke Sugano, Andreas Bulling
Proc. ACM SIGCHI Conference on Human Factors in Computing Systems (CHI), pp. 1–12, 2018.
Learning-based gaze estimation has significant potential to enable attentive user interfaces and gaze-based interaction on the billions of camera-equipped handheld devices and ambient displays. While training accurate person- and device-independent gaze estimators remains challenging, person-specific training is feasible but requires tedious data collection for each target device. To address these limitations, we present the first method to train person-specific gaze estimators across multiple devices. At the core of our method is a single convolutional neural network with shared feature extraction layers and device-specific branches that we train from face images and corresponding on-screen gaze locations. Detailed evaluations on a new dataset of interactions with five common devices (mobile phone, tablet, laptop, desktop computer, smart TV) and three common applications (mobile game, text editing, media center) demonstrate the significant potential of cross-device training. We further explore training with gaze locations derived from natural interactions, such as mouse or touch input.@inproceedings{zhang18_chi, title = {Training Person-Specific Gaze Estimators from Interactions with Multiple Devices}, author = {Zhang, Xucong and Huang, Michael Xuelin and Sugano, Yusuke and Bulling, Andreas}, year = {2018}, booktitle = {Proc. ACM SIGCHI Conference on Human Factors in Computing Systems (CHI)}, doi = {10.1145/3173574.3174198}, pages = {1--12} }
2017
Journal Articles
-
A Bernstein-type Inequality for Some Mixing Processes and Dynamical Systems with an Application to Learning
Hanyuan Hang, Ingo Steinwart
Annals of Statistics, 45, pp. 708–743, 2017.
We establish a Bernstein-type inequality for a class of stochastic processes that includes the classical geometrically φ-mixing processes, Rio’s generalization of these processes, and many time-discrete dynamical systems. Modulo a logarithmic factor and some constants, our Bernstein-type inequality coincides with the classical Bernstein inequality for i.i.d. data. We further use this new Bernstein-type inequality to derive an oracle inequality for generic regularized empirical risk minimization algorithms and data generated by such processes. Applying this oracle inequality to support vector machines using the Gaussian kernels for binary classification, we obtain essentially the same rate as for i.i.d. processes. For least squares and quantile regression, the resulting learning rates match, up to some arbitrarily small extra term in the exponent, the optimal rates for i.i.d. processes.
doi: 10.1214/16-AOS1465
Preprint: http://arxiv.org/pdf/1501.03059v1.pdf
@article{hang17_as, author = {Hang, Hanyuan and Steinwart, Ingo}, title = {A {B}ernstein-type Inequality for Some Mixing Processes and Dynamical Systems with an Application to Learning}, year = {2017}, volume = {45}, pages = {708--743}, journal = {Annals of Statistics}, preprint = {http://arxiv.org/pdf/1501.03059v1.pdf}, doi = {10.1214/16-AOS1465} }
Conference Papers
-
Learning Graph Representations with Embedding Propagation
Alberto García-Durán, Mathias Niepert
Advances in Neural Information Processing Systems (NeurIPS), pp. 5125–5136, 2017.
We propose Embedding Propagation (EP), an unsupervised learning framework for graph-structured data. EP learns vector representations of graphs by passing two types of messages between neighboring nodes. Forward messages consist of label representations such as representations of words and other attributes associated with the nodes. Backward messages consist of gradients that result from aggregating the label representations and applying a reconstruction loss. Node representations are finally computed from the representation of their labels. With significantly fewer parameters and hyperparameters, an instance of EP is competitive with and often outperforms state-of-the-art unsupervised and semi-supervised learning methods on a range of benchmark data sets.
@inproceedings{garciaduran17_neurips, title = {Learning Graph Representations with Embedding Propagation}, author = {García-Durán, Alberto and Niepert, Mathias}, booktitle = {Advances in Neural Information Processing Systems (NeurIPS)}, pages = {5125--5136}, year = {2017}, url = {https://proceedings.neurips.cc/paper/2017/file/e0688d13958a19e087e123148555e4b4-Paper.pdf}, doi = {10.5555/3295222.3295265}, preprint = {https://arxiv.org/abs/1710.03059} }
-
Gaze Embeddings for Zero-Shot Image Classification
Nour Karessli, Zeynep Akata, Bernt Schiele, Andreas Bulling
Proc. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 6412–6421, 2017.
Zero-shot image classification using auxiliary information, such as attributes describing discriminative object properties, requires time-consuming annotation by domain experts. We instead propose a method that relies on human gaze as auxiliary information, exploiting that even non-expert users have a natural ability to judge class membership. We present a data collection paradigm that involves a discrimination task to increase the information content obtained from gaze data. Our method extracts discriminative descriptors from the data and learns a compatibility function between image and gaze using three novel gaze embeddings: Gaze Histograms (GH), Gaze Features with Grid (GFG) and Gaze Features with Sequence (GFS). We introduce two new gaze-annotated datasets for fine-grained image classification and show that human gaze data is indeed class discriminative, provides a competitive alternative to expert-annotated attributes, and outperforms other baselines for zero-shot image classification.@inproceedings{karessli17_cvpr, title = {Gaze Embeddings for Zero-Shot Image Classification}, author = {Karessli, Nour and Akata, Zeynep and Schiele, Bernt and Bulling, Andreas}, year = {2017}, booktitle = {Proc. IEEE Conference on Computer Vision and Pattern Recognition (CVPR)}, pages = {6412-6421}, doi = {10.1109/CVPR.2017.679} } -
Everyday Eye Contact Detection Using Unsupervised Gaze Target Discovery
Xucong Zhang, Yusuke Sugano, Andreas Bulling
Proc. ACM Symposium on User Interface Software and Technology (UIST), pp. 193–203, 2017.
Eye contact is an important non-verbal cue in social signal processing and promising as a measure of overt attention in human-object interactions and attentive user interfaces. However, robust detection of eye contact across different users, gaze targets, camera positions, and illumination conditions is notoriously challenging. We present a novel method for eye contact detection that combines a state-of-the-art appearance-based gaze estimator with a novel approach for unsupervised gaze target discovery, i.e. without the need for tedious and time-consuming manual data annotation. We evaluate our method in two real-world scenarios: detecting eye contact at the workplace, including on the main work display, from cameras mounted to target objects, as well as during everyday social interactions with the wearer of a head-mounted egocentric camera. We empirically evaluate the performance of our method in both scenarios and demonstrate its effectiveness for detecting eye contact independent of target object type and size, camera position, and user and recording environment.@inproceedings{zhang17_uist, title = {Everyday Eye Contact Detection Using Unsupervised Gaze Target Discovery}, author = {Zhang, Xucong and Sugano, Yusuke and Bulling, Andreas}, year = {2017}, pages = {193-203}, doi = {10.1145/3126594.3126614}, booktitle = {Proc. ACM Symposium on User Interface Software and Technology (UIST)}, video = {https://www.youtube.com/watch?v=ccrS5XuhQpk} }
2016
Conference Papers
-
Automatic Semantic Classification of German Preposition Types: Comparing Hard and Soft Clustering Approaches across Features
Maximilian Köper, Sabine Schulte im Walde
Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers) (ACL), pp. 256–263, 2016.
This paper addresses an automatic classification of preposition types in German, comparing hard and soft clustering approaches and various window- and syntax-based co-occurrence features. We show that (i) the semantically most salient preposition features (i.e., subcategorised nouns) are the most successful, and that (ii) soft clustering approaches are required for the task but reveal quite different attitudes towards predicting ambiguity.doi: 10.18653/v1/P16-2042
@inproceedings{koeper16_acl, author = {Köper, Maximilian and {Schulte im Walde}, Sabine}, title = {Automatic Semantic Classification of German Preposition Types: Comparing Hard and Soft Clustering Approaches across Features}, booktitle = {Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers) (ACL)}, pages = {256--263}, year = {2016}, doi = {10.18653/v1/P16-2042} }
-
Learning Convolutional Neural Networks for Graphs
Mathias Niepert, Mohamed Ahmed, Konstantin Kutzkov
Proc. of the 33rd International Conference on Machine Learning (ICML), pp. 2014–2023, 2016.
Numerous important problems can be framed as learning from graph data. We propose a framework for learning convolutional neural networks for arbitrary graphs. These graphs may be undirected, directed, and with both discrete and continuous node and edge attributes. Analogous to image-based convolutional networks that operate on locally connected regions of the input, we present a general approach to extracting locally connected regions from graphs. Using established benchmark data sets, we demonstrate that the learned feature representations are competitive with state of the art graph kernels and that their computation is highly efficient.Preprint: https://arxiv.org/abs/1605.05273
@inproceedings{niepert16_icml, title = {Learning Convolutional Neural Networks for Graphs}, author = {Niepert, Mathias and Ahmed, Mohamed and Kutzkov, Konstantin}, year = {2016}, booktitle = {Proc. of the 33rd International Conference on Machine Learning (ICML)}, preprint = {https://arxiv.org/abs/1605.05273}, pages = {2014--2023}, doi = {10.5555/3045390.3045603} }
-
Integrating Distributional Lexical Contrast into Word Embeddings for Antonym-Synonym Distinction
Kim-Anh Nguyen, Sabine Schulte im Walde, Ngoc Thang Vu
Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers) (ACL), pp. 454–459, 2016.
We propose a novel vector representation that integrates lexical contrast into distributional vectors and strengthens the most salient features for determining degrees of word similarity. The improved vectors significantly outperform standard models and distinguish antonyms from synonyms with an average precision of 0.66–0.76 across word classes (adjectives, nouns, verbs). Moreover, we integrate the lexical contrast vectors into the objective function of a skip-gram model. The novel embedding outperforms state-of-the-art models on predicting word similarities in SimLex-999, and on distinguishing antonyms from synonyms.doi: 10.18653/v1/P16-2074
@inproceedings{nguyen16_acl, author = {Nguyen, Kim-Anh and {Schulte im Walde}, Sabine and Vu, Ngoc Thang}, title = {Integrating Distributional Lexical Contrast into Word Embeddings for Antonym-Synonym Distinction}, booktitle = {Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers) (ACL)}, pages = {454--459}, year = {2016}, doi = {10.18653/v1/P16-2074} }
-
Discriminative Gaifman Models
Mathias Niepert
Advances in Neural Information Processing Systems (NeurIPS), pp. 1–9, 2016.
We present discriminative Gaifman models, a novel family of relational machine learning models. Gaifman models learn feature representations bottom up from representations of locally connected and bounded-size regions of knowledge bases (KBs). Considering local and bounded-size neighborhoods of knowledge bases renders logical inference and learning tractable, mitigates the problem of overfitting, and facilitates weight sharing. Gaifman models sample neighborhoods of knowledge bases so as to make the learned relational models more robust to missing objects and relations which is a common situation in open-world KBs. We present the core ideas of Gaifman models and apply them to large-scale relational learning problems. We also discuss the ways in which Gaifman models relate to some existing relational machine learning approaches.@inproceedings{niepert16_neurips, title = {Discriminative Gaifman Models}, author = {Niepert, Mathias}, booktitle = {Advances in Neural Information Processing Systems (NeurIPS)}, pages = {1--9}, year = {2016}, url = {https://proceedings.neurips.cc/paper/2016/file/7c4ede33a62160a19586f6e26eaefacf-Paper.pdf}, doi = {10.5555/3157382.3157479}, preprint = {https://arxiv.org/abs/1610.09369} } -
AggreGaze: Collective Estimation of Audience Attention on Public Displays
Yusuke Sugano, Xucong Zhang, Andreas Bulling
Proc. ACM Symposium on User Interface Software and Technology (UIST), pp. 821–831, 2016.
Gaze is frequently explored in public display research given its importance for monitoring and analysing audience attention. However, current gaze-enabled public display interfaces require either special-purpose eye tracking equipment or explicit personal calibration for each individual user. We present AggreGaze, a novel method for estimating spatio-temporal audience attention on public displays. Our method requires only a single off-the-shelf camera attached to the display, does not require any personal calibration, and provides visual attention estimates across the full display. We achieve this by 1) compensating for errors of state-of-the-art appearance-based gaze estimation methods through on-site training data collection, and by 2) aggregating uncalibrated and thus inaccurate gaze estimates of multiple users into joint attention estimates. We propose different visual stimuli for this compensation: a standard 9-point calibration, moving targets, text and visual stimuli embedded into the display content, as well as normal video content. Based on a two-week deployment in a public space, we demonstrate the effectiveness of our method for estimating attention maps that closely resemble ground-truth audience gaze distributions.@inproceedings{sugano16_uist, title = {AggreGaze: Collective Estimation of Audience Attention on Public Displays}, author = {Sugano, Yusuke and Zhang, Xucong and Bulling, Andreas}, year = {2016}, booktitle = {Proc. ACM Symposium on User Interface Software and Technology (UIST)}, doi = {10.1145/2984511.2984536}, pages = {821-831}, video = {https://www.youtube.com/watch?v=eFK39S_lgdg} } -
A 3D Morphable Eye Region Model for Gaze Estimation
Erroll Wood, Tadas Baltrušaitis, Louis-Philippe Morency, Peter Robinson, Andreas Bulling
Proc. European Conference on Computer Vision (ECCV), pp. 297–313, 2016.
Morphable face models are a powerful tool, but have previously failed to model the eye accurately due to complexities in its material and motion. We present a new multi-part model of the eye that includes a morphable model of the facial eye region, as well as an anatomy-based eyeball model. It is the first morphable model that accurately captures eye region shape, since it was built from high-quality head scans. It is also the first to allow independent eyeball movement, since we treat it as a separate part. To showcase our model we present a new method for illumination- and head-pose–invariant gaze estimation from a single RGB image. We fit our model to an image through analysis-by-synthesis, solving for eye region shape, texture, eyeball pose, and illumination simultaneously. The fitted eyeball pose parameters are then used to estimate gaze direction. Through evaluation on two standard datasets we show that our method generalizes to both webcam and high-quality camera images, and outperforms a state-of-the-art CNN method achieving a gaze estimation accuracy of 9.44° in a challenging user-independent scenario.
@inproceedings{wood16_eccv, author = {Wood, Erroll and Baltru{\v{s}}aitis, Tadas and Morency, Louis-Philippe and Robinson, Peter and Bulling, Andreas}, title = {A 3D Morphable Eye Region Model for Gaze Estimation}, booktitle = {Proc. European Conference on Computer Vision (ECCV)}, year = {2016}, pages = {297--313}, doi = {10.1007/978-3-319-46448-0_18} }
-
Spatio-Temporal Modeling and Prediction of Visual Attention in Graphical User Interfaces
Pingmei Xu, Yusuke Sugano, Andreas Bulling
Proc. ACM SIGCHI Conference on Human Factors in Computing Systems (CHI), pp. 3299–3310, 2016.
We present a computational model to predict users’ spatio-temporal visual attention for WIMP-style (windows, icons, mouse, pointer) graphical user interfaces. Like existing models of bottom-up visual attention in computer vision, our model does not require any eye tracking equipment. Instead, it predicts attention solely using information available to the interface, specifically users’ mouse and keyboard input as well as the UI components they interact with. To study our model in a principled way we further introduce a method to synthesize user interface layouts that are functionally equivalent to real-world interfaces, such as from Gmail, Facebook, or GitHub. We first quantitatively analyze attention allocation and its correlation with user input and UI components using ground-truth gaze, mouse, and keyboard data of 18 participants performing a text editing task. We then show that our model predicts attention maps more accurately than state-of-the-art methods. Our results underline the significant potential of spatio-temporal attention modeling for user interface evaluation, optimization, or even simulation.@inproceedings{xu16_chi, title = {Spatio-Temporal Modeling and Prediction of Visual Attention in Graphical User Interfaces}, author = {Xu, Pingmei and Sugano, Yusuke and Bulling, Andreas}, year = {2016}, pages = {3299-3310}, booktitle = {Proc. ACM SIGCHI Conference on Human Factors in Computing Systems (CHI)}, doi = {10.1145/2858036.2858479} }
2015
Journal Articles
-
Fully Adaptive Density-Based Clustering
Ingo Steinwart
Annals of Statistics, 43(5), pp. 2132–2167, 2015.
The clusters of a distribution are often defined by the connected components of a density level set. However, this definition depends on the user-specified level. We address this issue by proposing a simple, generic algorithm, which uses an almost arbitrary level set estimator to estimate the smallest level at which there is more than one connected component. In the case where this algorithm is fed with histogram-based level set estimates, we provide a finite sample analysis, which is then used to show that the algorithm consistently estimates both the smallest level and the corresponding connected components. We further establish rates of convergence for the two estimation problems, and last but not least, we present a simple, yet adaptive strategy for determining the width-parameter of the involved density estimator in a data-dependent way.
doi: 10.1214/15-AOS1331
Preprint: http://arxiv.org/pdf/1409.8437v2.pdf
@article{Steinwart15_as, author = {Steinwart, Ingo}, title = {Fully Adaptive Density-Based Clustering}, journal = {Annals of Statistics}, volume = {43}, number = {5}, pages = {2132--2167}, year = {2015}, preprint = {http://arxiv.org/pdf/1409.8437v2.pdf}, doi = {10.1214/15-AOS1331} } -
Towards an Axiomatic Approach to Hierarchical Clustering of Measures
Philipp Thomann, Ingo Steinwart, Nico Schmid
Journal of Machine Learning Research (JMLR), 16, pp. 1949–2002, 2015.
We propose some axioms for hierarchical clustering of probability measures and investigate their ramifications. The basic idea is to let the user stipulate the clusters for some elementary measures. This is done without the need of any notion of metric, similarity or dissimilarity. Our main results then show that for each suitable choice of user-defined clustering on elementary measures we obtain a unique notion of clustering on a large set of distributions satisfying a set of additivity and continuity axioms. We illustrate the developed theory by numerous examples including some with and some without a density.@article{thomann15_jmlr, author = {Thomann, Philipp and Steinwart, Ingo and Schmid, Nico}, title = {Towards an Axiomatic Approach to Hierarchical Clustering of Measures}, journal = {Journal of Machine Learning Research (JMLR)}, volume = {16}, pages = {1949--2002}, year = {2015}, preprint = {https://arxiv.org/abs/1508.03712}, url = {https://www.jmlr.org/papers/v16/thomann15a.html} }
Conference Papers
-
Orbits: Enabling Gaze Interaction in Smart Watches using Moving Targets
Augusto Esteves, Eduardo Velloso, Andreas Bulling, Hans Gellersen
Proc. ACM Symposium on User Interface Software and Technology (UIST), pp. 457–466, 2015.
We introduce Orbits, a novel gaze interaction technique that enables hands-free input on smart watches. The technique relies on moving controls to leverage the smooth pursuit movements of the eyes and detect whether and at which control the user is looking. In Orbits, controls include targets that move in a circular trajectory in the face of the watch, and can be selected by following the desired one for a small amount of time. We conducted two user studies to assess the technique’s recognition and robustness, which demonstrated how Orbits is robust against false positives triggered by natural eye movements and how it presents a hands-free, high-accuracy way of interacting with smart watches using off-the-shelf devices. Finally, we developed three example interfaces built with Orbits: a music player, a notifications face plate and a missed call menu. Despite relying on moving controls – very unusual in current HCI interfaces – these were generally well received by participants in a third and final study.
@inproceedings{esteves15_uist, title = {Orbits: Enabling Gaze Interaction in Smart Watches using Moving Targets}, author = {Esteves, Augusto and Velloso, Eduardo and Bulling, Andreas and Gellersen, Hans}, year = {2015}, booktitle = {Proc. ACM Symposium on User Interface Software and Technology (UIST)}, doi = {10.1145/2807442.2807499}, pages = {457--466} }
-
Prediction of Search Targets From Fixations in Open-world Settings
Hosnieh Sattar, Sabine Müller, Mario Fritz, Andreas Bulling
Proc. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 981-990, 2015.
Previous work on predicting the target of visual search from human fixations only considered closed-world settings in which training labels are available and predictions are performed for a known set of potential targets. In this work we go beyond the state of the art by studying search target prediction in an open-world setting in which we no longer assume that we have fixation data to train for the search targets. We present a dataset containing fixation data of 18 users searching for natural images from three image categories within synthesised image collages of about 80 images. In a closed-world baseline experiment we show that we can predict the correct target image out of a candidate set of five images. We then present a new problem formulation for search target prediction in the open-world setting that is based on learning compatibilities between fixations and potential targets.
@inproceedings{sattar15_cvpr, author = {Sattar, Hosnieh and M{\"{u}}ller, Sabine and Fritz, Mario and Bulling, Andreas}, title = {Prediction of Search Targets From Fixations in Open-world Settings}, booktitle = {Proc. IEEE Conference on Computer Vision and Pattern Recognition (CVPR)}, year = {2015}, pages = {981-990}, doi = {10.1109/CVPR.2015.7298700} }
-
Rendering of Eyes for Eye-Shape Registration and Gaze Estimation
Erroll Wood, Tadas Baltrušaitis, Xucong Zhang, Yusuke Sugano, Peter Robinson, Andreas Bulling
Proc. IEEE International Conference on Computer Vision (ICCV), pp. 3756-3764, 2015.
Images of the eye are key in several computer vision problems, such as shape registration and gaze estimation. Recent large-scale supervised methods for these problems require time-consuming data collection and manual annotation, which can be unreliable. We propose synthesizing perfectly labelled photo-realistic training data in a fraction of the time. We used computer graphics techniques to build a collection of dynamic eye-region models from head scan geometry. These were randomly posed to synthesize close-up eye images for a wide range of head poses, gaze directions, and illumination conditions. We used our model’s controllability to verify the importance of realistic illumination and shape variations in eye-region training data. Finally, we demonstrate the benefits of our synthesized training data (SynthesEyes) by out-performing state-of-the-art methods for eye-shape registration as well as cross-dataset appearance-based gaze estimation in the wild.
@inproceedings{wood15_iccv, title = {Rendering of Eyes for Eye-Shape Registration and Gaze Estimation}, author = {Wood, Erroll and Baltru{\v{s}}aitis, Tadas and Zhang, Xucong and Sugano, Yusuke and Robinson, Peter and Bulling, Andreas}, doi = {10.1109/ICCV.2015.428}, year = {2015}, pages = {3756-3764}, booktitle = {Proc. IEEE International Conference on Computer Vision (ICCV)} }
-
Appearance-based Gaze Estimation in the Wild
Xucong Zhang, Yusuke Sugano, Mario Fritz, Andreas Bulling
Proc. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 4511-4520, 2015.
Appearance-based gaze estimation is believed to work well in real-world settings but existing datasets were collected under controlled laboratory conditions and methods were not evaluated across multiple datasets. In this work we study appearance-based gaze estimation in the wild. We present the MPIIGaze dataset that contains 213,659 images we collected from 15 participants during natural everyday laptop use over more than three months. Our dataset is significantly more variable than existing datasets with respect to appearance and illumination. We also present a method for in-the-wild appearance-based gaze estimation using multimodal convolutional neural networks, which significantly outperforms state-of-the-art methods in the most challenging cross-dataset evaluation setting. We present an extensive evaluation of several state-of-the-art image-based gaze estimation algorithms on three current datasets, including our own. This evaluation provides clear insights and allows us to identify key research challenges of gaze estimation in the wild.
@inproceedings{zhang15_cvpr, author = {Zhang, Xucong and Sugano, Yusuke and Fritz, Mario and Bulling, Andreas}, title = {Appearance-based Gaze Estimation in the Wild}, booktitle = {Proc. IEEE Conference on Computer Vision and Pattern Recognition (CVPR)}, year = {2015}, pages = {4511-4520}, doi = {10.1109/CVPR.2015.7299081}, video = {https://www.youtube.com/watch?v=rw6LZA1USG8} }
2014
Journal Articles
-
A Tutorial on Human Activity Recognition Using Body-worn Inertial Sensors
Andreas Bulling, Ulf Blanke, Bernt Schiele
ACM Computing Surveys, 46(3), pp. 1–33, 2014.
The last 20 years have seen an ever increasing research activity in the field of human activity recognition. As activity recognition has matured considerably, so has the number of challenges in designing, implementing and evaluating activity recognition systems. This tutorial aims to provide a comprehensive hands-on introduction for newcomers to the field of human activity recognition. It specifically focuses on activity recognition using on-body inertial sensors. We first discuss the key research challenges that human activity recognition shares with general pattern recognition and identify those challenges that are specific to human activity recognition. We then describe the concept of an activity recognition chain (ARC) as a general-purpose framework for designing and evaluating activity recognition systems. We detail each component of the framework, provide references to related research and introduce the best practice methods developed by the activity recognition research community. We conclude with the educational example problem of recognising different hand gestures from inertial sensors attached to the upper and lower arm. We illustrate how each component of this framework can be implemented for this specific activity recognition problem and demonstrate how different implementations compare and how they impact overall recognition performance.
doi: 10.1145/2499621
@article{bulling14_csur, author = {Bulling, Andreas and Blanke, Ulf and Schiele, Bernt}, title = {A Tutorial on Human Activity Recognition Using Body-worn Inertial Sensors}, journal = {ACM Computing Surveys}, volume = {46}, number = {3}, year = {2014}, pages = {1--33}, doi = {10.1145/2499621} }
Conference Papers
-
Chasing Hypernyms in Vector Spaces with Entropy
Enrico Santus, Alessandro Lenci, Qin Lu, Sabine Schulte im Walde
Proceedings of the 14th Conference of the European Chapter of the Association for Computational Linguistics, (Volume 2: Short Papers) (EACL), pp. 38–42, 2014.
In this paper, we introduce SLQS, a new entropy-based measure for the unsupervised identification of hypernymy and its directionality in Distributional Semantic Models (DSMs). SLQS is assessed through two tasks: (i.) identifying the hypernym in hyponym-hypernym pairs, and (ii.) discriminating hypernymy among various semantic relations. In both tasks, SLQS outperforms other state-of-the-art measures.
doi: 10.3115/v1/E14-4008
@inproceedings{santus14_eacl, author = {Santus, Enrico and Lenci, Alessandro and Lu, Qin and {Schulte im Walde}, Sabine}, title = {Chasing Hypernyms in Vector Spaces with Entropy}, booktitle = {Proceedings of the 14th Conference of the European Chapter of the Association for Computational Linguistics, (Volume 2: Short Papers) (EACL)}, pages = {38--42}, year = {2014}, doi = {10.3115/v1/E14-4008} }
2013
Conference Papers
-
A Multimodal LDA Model integrating Textual, Cognitive and Visual Modalities
Stephen Roller, Sabine Schulte im Walde
Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing (EMNLP), pp. 1146–1157, 2013.
Recent investigations into grounded models of language have shown that holistic views of language and perception can provide higher performance than independent views. In this work, we improve a two-dimensional multimodal version of Latent Dirichlet Allocation (Andrews et al., 2009) in various ways. (1) We outperform text-only models in two different evaluations, and demonstrate that low-level visual features are directly compatible with the existing model. (2) We present a novel way to integrate visual features into the LDA model using unsupervised clusters of images. The clusters are directly interpretable and improve on our evaluation tasks. (3) We provide two novel ways to extend the bimodal models to support three or more modalities. We find that the three-, four-, and five-dimensional models significantly outperform models using only one or two modalities, and that nontextual modalities each provide separate, disjoint knowledge that cannot be forced into a shared, latent structure.
@inproceedings{roller13_emnlp, author = {Roller, Stephen and {Schulte im Walde}, Sabine}, title = {A Multimodal LDA Model integrating Textual, Cognitive and Visual Modalities}, booktitle = {Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing (EMNLP)}, pages = {1146--1157}, year = {2013} }
-
Using Subcategorization Knowledge to improve Case Prediction for Translation to German
Marion Weller, Alexander Fraser, Sabine Schulte im Walde
Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (ACL), pp. 593–603, 2013.
This paper demonstrates the need and impact of subcategorization information for SMT. We combine (i) features on source-side syntactic subcategorization and (ii) an external knowledge base with quantitative, dependency-based information about target-side subcategorization frames. A manual evaluation of an English-to-German translation task shows that the subcategorization information has a positive impact on translation quality through better prediction of case.
@inproceedings{weller13_acl, author = {Weller, Marion and Fraser, Alexander and {Schulte im Walde}, Sabine}, title = {Using Subcategorization Knowledge to improve Case Prediction for Translation to German}, booktitle = {Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (ACL)}, pages = {593--603}, year = {2013} }
2011
Journal Articles
-
Eye Movement Analysis for Activity Recognition Using Electrooculography
Andreas Bulling, Jamie A. Ward, Hans Gellersen, Gerhard Tröster
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 33(4), pp. 741-753, 2011.
In this work we investigate eye movement analysis as a new sensing modality for activity recognition. Eye movement data was recorded using an electrooculography (EOG) system. We first describe and evaluate algorithms for detecting three eye movement characteristics from EOG signals - saccades, fixations, and blinks - and propose a method for assessing repetitive patterns of eye movements. We then devise 90 different features based on these characteristics and select a subset of them using minimum redundancy maximum relevance feature selection (mRMR). We validate the method using an eight participant study in an office environment using an example set of five activity classes: copying a text, reading a printed paper, taking hand-written notes, watching a video, and browsing the web. We also include periods with no specific activity (the NULL class). Using a support vector machine (SVM) classifier and a person-independent (leave-one-out) training scheme, we obtain an average precision of 76.1% and recall of 70.5% over all classes and participants. The work demonstrates the promise of eye-based activity recognition (EAR) and opens up discussion on the wider applicability of EAR to other activities that are difficult, or even impossible, to detect using common sensing modalities.
@article{bulling11_pami, author = {Bulling, Andreas and Ward, Jamie A. and Gellersen, Hans and Tr{\"{o}}ster, Gerhard}, title = {Eye {M}ovement {A}nalysis for {A}ctivity {R}ecognition {U}sing {E}lectrooculography}, journal = {IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI)}, volume = {33}, number = {4}, year = {2011}, pages = {741-753}, doi = {10.1109/TPAMI.2010.86} }
Conference Papers
-
Optimal Learning Rates for Least Squares SVMs Using Gaussian Kernels
Mona Eberts, Ingo Steinwart
Advances in Neural Information Processing Systems (NeurIPS), pp. 1539–1547, 2011.
We prove a new oracle inequality for support vector machines with Gaussian RBF kernels solving the regularized least squares regression problem. To this end, we apply the modulus of smoothness. With the help of the new oracle inequality we then derive learning rates that can also be achieved by a simple data-dependent parameter selection method. Finally, it turns out that our learning rates are asymptotically optimal for regression functions satisfying certain standard smoothness conditions.
@inproceedings{eberts11_neurips, title = {Optimal Learning Rates for Least Squares {SVM}s Using {G}aussian Kernels}, author = {Eberts, Mona and Steinwart, Ingo}, booktitle = {Advances in Neural Information Processing Systems (NeurIPS)}, pages = {1539--1547}, year = {2011}, url = {https://proceedings.neurips.cc/paper/2011/file/51ef186e18dc00c2d31982567235c559-Paper.pdf}, volume = {24} }
2010
Conference Papers
-
Universal Kernels on Non-Standard Input Spaces
Andreas Christmann, Ingo Steinwart
Advances in Neural Information Processing Systems (NeurIPS), pp. 406–414, 2010.
During the last years support vector machines (SVMs) have been successfully applied even in situations where the input space X is not necessarily a subset of ℝ^d. Examples include SVMs using probability measures to analyse e.g. histograms or coloured images, SVMs for text classification and web mining, and SVMs for applications from computational biology using, e.g., kernels for trees and graphs. Moreover, SVMs are known to be consistent to the Bayes risk, if either the input space is a complete separable metric space and the reproducing kernel Hilbert space (RKHS) H ⊂ L_p(P_X) is dense, or if the SVM is based on a universal kernel k. So far, however, there are no RKHSs of practical interest known that satisfy these assumptions if X ⊄ ℝ^d. We close this gap by providing a general technique based on Taylor-type kernels to explicitly construct universal kernels on compact metric spaces which are not subsets of ℝ^d. We apply this technique for the following special cases: universal kernels on the set of probability measures, universal kernels based on Fourier transforms, and universal kernels for signal processing.
@inproceedings{christmann10_neurips, title = {Universal Kernels on Non-Standard Input Spaces}, author = {Christmann, Andreas and Steinwart, Ingo}, booktitle = {Advances in Neural Information Processing Systems (NeurIPS)}, pages = {406--414}, year = {2010}, volume = {23}, url = {https://papers.nips.cc/paper/2010/hash/4e0cb6fb5fb446d1c92ede2ed8780188-Abstract.html} }
2009
Journal Articles
-
Consistency of Support Vector Machines for Forecasting the Evolution of an Unknown Ergodic Dynamical System from Observations with Unknown Noise
Ingo Steinwart, Marian Anghel
Annals of Statistics, 37, pp. 841–875, 2009.
We consider the problem of forecasting the next (observable) state of an unknown ergodic dynamical system from a noisy observation of the present state. Our main result shows, for example, that support vector machines (SVMs) using Gaussian RBF kernels can learn the best forecaster from a sequence of noisy observations if (a) the unknown observational noise process is bounded and has a summable α-mixing rate and (b) the unknown ergodic dynamical system is defined by a Lipschitz continuous function on some compact subset of ℝ^d and has a summable decay of correlations for Lipschitz continuous functions. In order to prove this result we first establish a general consistency result for SVMs and all stochastic processes that satisfy a mixing notion that is substantially weaker than α-mixing.
doi: 10.1214/07-AOS562
Preprint: http://arxiv.org/pdf/0707.0322
@article{steinwart09_as, author = {Steinwart, Ingo and Anghel, Marian}, title = {Consistency of Support Vector Machines for Forecasting the Evolution of an Unknown Ergodic Dynamical System from Observations with Unknown Noise}, journal = {Annals of Statistics}, volume = {37}, pages = {841--875}, year = {2009}, preprint = {http://arxiv.org/pdf/0707.0322}, doi = {10.1214/07-AOS562} }
Conference Papers
-
Fast Learning from Non-i.i.d. Observations
Ingo Steinwart, Andreas Christmann
Advances in Neural Information Processing Systems (NeurIPS), pp. 1768–1776, 2009.
We prove an oracle inequality for generic regularized empirical risk minimization algorithms learning from α-mixing processes. To illustrate this oracle inequality, we use it to derive learning rates for some learning methods including least squares SVMs. Since the proof of the oracle inequality uses recent localization ideas developed for independent and identically distributed (i.i.d.) processes, it turns out that these learning rates are close to the optimal rates known in the i.i.d. case.
@inproceedings{steinwart09_neurips, title = {Fast Learning from Non-i.i.d. Observations}, author = {Steinwart, Ingo and Christmann, Andreas}, booktitle = {Advances in Neural Information Processing Systems (NeurIPS)}, volume = {22}, pages = {1768--1776}, year = {2009}, url = {https://papers.nips.cc/paper/2009/hash/a89cf525e1d9f04d16ce31165e139a4b-Abstract.html} }
2008
Conference Papers
-
Combining EM Training and the MDL Principle for an Automatic Verb Classification incorporating Selectional Preferences
Sabine Schulte im Walde, Christian Hying, Christian Scheible, Helmut Schmid
Proceedings of the 46th Annual Meeting of the Association for Computational Linguistics (ACL), pp. 496–504, 2008.
This paper presents an innovative, complex approach to semantic verb classification that relies on selectional preferences as verb properties. The probabilistic verb class model underlying the semantic classes is trained by a combination of the EM algorithm and the MDL principle, providing soft clusters with two dimensions (verb senses and subcategorisation frames with selectional preferences) as a result. A language-model-based evaluation shows that after 10 training iterations the verb class model results are above the baseline results.
@inproceedings{schulteimwalde08_acl, author = {{Schulte im Walde}, Sabine and Hying, Christian and Scheible, Christian and Schmid, Helmut}, title = {Combining EM Training and the MDL Principle for an Automatic Verb Classification incorporating Selectional Preferences}, booktitle = {Proceedings of the 46th Annual Meeting of the Association for Computational Linguistics (ACL)}, pages = {496--504}, year = {2008} }
Books
-
Support Vector Machines
Ingo Steinwart, Andreas Christmann
Springer, New York, 2008.
Explains the principles that make support vector machines a successful modelling and prediction tool for a variety of applications. Rigorous treatment of state-of-the-art results on support vector machines. Suitable for both graduate students and researchers in statistical machine learning.
@book{steinwart08_svm, author = {Steinwart, Ingo and Christmann, Andreas}, year = {2008}, title = {Support Vector Machines}, publisher = {Springer}, address = {New York}, doi = {10.1007/978-0-387-77242-4} }
2007
Journal Articles
-
Fast rates for support vector machines using Gaussian kernels
Ingo Steinwart, Clint Scovel
Annals of Statistics, 35, pp. 575–607, 2007.
For binary classification we establish learning rates up to the order of n^{-1} for support vector machines (SVMs) with hinge loss and Gaussian RBF kernels. These rates are in terms of two assumptions on the considered distributions: Tsybakov’s noise assumption to establish a small estimation error, and a new geometric noise condition which is used to bound the approximation error. Unlike previously proposed concepts for bounding the approximation error, the geometric noise assumption does not employ any smoothness assumption.
Paper: https://www.jstor.org/stable/25463569
Preprint: http://arxiv.org/pdf/0708.1838
@article{steinwart07_as, author = {Steinwart, Ingo and Scovel, Clint}, journal = {Annals of Statistics}, title = {Fast rates for support vector machines using {G}aussian kernels}, volume = {35}, pages = {575--607}, year = {2007}, preprint = {http://arxiv.org/pdf/0708.1838}, url = {https://www.jstor.org/stable/25463569} }
2005
Journal Articles
-
A Classification Framework for Anomaly Detection
Ingo Steinwart, Don Hush, Clint Scovel
Journal of Machine Learning Research (JMLR), 6, pp. 211–232, 2005.
One way to describe anomalies is by saying that anomalies are not concentrated. This leads to the problem of finding level sets for the data generating density. We interpret this learning problem as a binary classification problem and compare the corresponding classification risk with the standard performance measure for the density level problem. In particular it turns out that the empirical classification risk can serve as an empirical performance measure for the anomaly detection problem. This allows us to compare different anomaly detection algorithms empirically, i.e. with the help of a test set. Furthermore, by the above interpretation we can give a strong justification for the well-known heuristic of artificially sampling ’labeled’ samples, provided that the sampling plan is well chosen. In particular this enables us to propose a support vector machine (SVM) for anomaly detection for which we can easily establish universal consistency. Finally, we report some experiments which compare our SVM to other commonly used methods including the standard one-class SVM.
@article{steinwart05_jmlr, author = {Steinwart, Ingo and Hush, Don and Scovel, Clint}, journal = {Journal of Machine Learning Research (JMLR)}, title = {A Classification Framework for Anomaly Detection}, year = {2005}, pages = {211--232}, volume = {6}, url = {https://www.jmlr.org/papers/v6/steinwart05a.html}, doi = {10.5555/1046920.1058109} }
2004
Journal Articles
-
On Robustness Properties of Convex Risk Minimization Methods for Pattern Recognition
Andreas Christmann, Ingo Steinwart
Journal of Machine Learning Research (JMLR), 5, pp. 1007–1034, 2004.
The paper brings together methods from two disciplines: machine learning theory and robust statistics. We argue that robustness is an important aspect and we show that many existing machine learning methods based on the convex risk minimization principle have - besides other good properties - also the advantage of being robust. Robustness properties of machine learning methods based on convex risk minimization are investigated for the problem of pattern recognition. Assumptions are given for the existence of the influence function of the classifiers and for bounds on the influence function. Kernel logistic regression, support vector machines, least squares and the AdaBoost loss function are treated as special cases. Some results on the robustness of such methods are also obtained for the sensitivity curve and the maxbias, which are two other robustness criteria. A sensitivity analysis of the support vector machine is given.
@article{christmann04_jmlr, author = {Christmann, Andreas and Steinwart, Ingo}, title = {On Robustness Properties of Convex Risk Minimization Methods for Pattern Recognition}, journal = {Journal of Machine Learning Research (JMLR)}, volume = {5}, year = {2004}, pages = {1007--1034}, url = {http://www.jmlr.org/papers/volume5/christmann04a/christmann04a.pdf}, doi = {10.5555/1005332.1016792} }