Auto-regressive pre-trained transformer models, herein referred to simply as transformer models, have emerged as the de facto approach for performing formal reasoning over natural language. However, the extent to which these models can truly acquire the logical semantics of natural language and the rules of inference remains a subject of ongoing debate within the research community.
Accordingly, in this thesis, we pose the following central question: can transformer models learn the logical semantics of natural language and the rules of inference? To answer this question, we identify two problems that are particularly well suited to providing insight into the matter. First, we explore model-checking with natural language, which probes the first component of our question, namely whether transformer models can learn the logical semantics of natural language. Second, we explore natural language satisfiability, through which we investigate whether these models are capable of learning the rules of inference.
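To make the model-checking task concrete, the sketch below encodes a toy instance in Python: a finite model maps entities to their properties, and the truth of a quantified sentence such as "every artist is a beekeeper" is evaluated against it. The fragment, entity names, and helper functions are illustrative assumptions, not the thesis's actual datasets or format.

```python
# Hypothetical toy instance of natural-language model checking: given a finite
# model (which entities have which properties) and a quantified sentence,
# decide whether the sentence is true in that model.
model = {
    "alice": {"artist", "beekeeper"},
    "bob": {"artist"},
}

def every(p: str, q: str, entities: dict) -> bool:
    """Truth condition of 'every p is a q' in the given model."""
    return all(q in props for props in entities.values() if p in props)

def some(p: str, q: str, entities: dict) -> bool:
    """Truth condition of 'some p is a q' in the given model."""
    return any(q in props for props in entities.values() if p in props)

# "Every artist is a beekeeper" is false here (bob is a counterexample),
# while "some artist is a beekeeper" is true (alice witnesses it).
print(every("artist", "beekeeper", model))  # False
print(some("artist", "beekeeper", model))   # True
```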
In our exploration, we employ a range of language fragments, systematically varying grammatical constructs with logical significance. In the context of natural language satisfiability, such variations directly influence the computational complexity of the problem, enabling us to examine the impact of computational complexity on the reasoning capabilities of transformer models. To ensure a rigorous and faithful evaluation, we take deliberate measures to avoid common pitfalls of data synthesis, particularly those arising from random sampling. For instance, in the case of natural language satisfiability, we ensure that the datasets contain a sufficient number of challenging problem instances by sampling from the phase-change region, that is, the region in clause-variable space where the probability of satisfiability is approximately 0.5.

By adopting such a methodologically sound evaluation framework, we derive several key insights. First, linguistic constructs such as generalised quantifiers, binary predicates, and anaphora exert a significant influence on the ability of transformer models to solve model-checking tasks. Second, transformer models can learn the logical semantics of natural language if the underlying problem is simple. Third, the computational complexity of the problem has a significant effect on transformer models' ability to perform reasoning. Fourth, despite their impressive capabilities, these models are still unable to learn the rules of inference.
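As a rough illustration of the phase-change sampling strategy, the sketch below uses random propositional 3-SAT as a stand-in for the natural-language fragments studied in the thesis: instances drawn at a clause-to-variable ratio near the empirically observed critical value of roughly 4.27 are satisfiable about half the time. The constant, function names, and brute-force solver are assumptions for illustration, not the thesis's actual data generator.

```python
import random
from itertools import product

# For random 3-SAT, the probability of satisfiability drops sharply as the
# clause-to-variable ratio crosses a critical value near 4.27; at that ratio,
# P(satisfiable) is roughly 0.5, which is where the hardest instances cluster.
CRITICAL_RATIO = 4.27

def sample_3sat(num_vars: int, ratio: float = CRITICAL_RATIO, rng=random):
    """Sample a random 3-SAT instance at the given clause-variable ratio.

    A clause is a tuple of three non-zero ints; a positive int means the
    variable occurs un-negated, a negative int means negated (DIMACS-style).
    """
    num_clauses = round(ratio * num_vars)
    clauses = []
    for _ in range(num_clauses):
        variables = rng.sample(range(1, num_vars + 1), 3)
        clauses.append(tuple(v if rng.random() < 0.5 else -v for v in variables))
    return clauses

def is_satisfiable(clauses, num_vars):
    """Brute-force satisfiability check; only feasible for small num_vars."""
    for assignment in product([False, True], repeat=num_vars):
        if all(any(assignment[abs(lit) - 1] == (lit > 0) for lit in clause)
               for clause in clauses):
            return True
    return False

def empirical_sat_rate(num_vars=12, trials=200):
    """Estimate P(satisfiable) at the critical ratio; expect a value near 0.5."""
    hits = sum(is_satisfiable(sample_3sat(num_vars), num_vars)
               for _ in range(trials))
    return hits / trials

if __name__ == "__main__":
    print(f"Empirical P(sat) near the phase change: {empirical_sat_rate():.2f}")
```

Sampling at this ratio, rather than uniformly over the clause-variable space, keeps a dataset from being dominated by trivially satisfiable or trivially unsatisfiable instances that a model could solve with shallow heuristics.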
- Natural Language Processing
- Language Models
- Natural Language Inference
- Sentence-Level Semantics
Reasoning with Natural Language: Probing Transformer models' ability to perform formal reasoning in natural language
Batawala Acharige, T. (Author). 9 Jun 2025
Student thesis: PhD