In Reinforcement Learning, the observations provided to an autonomous agent are typically encoded numerically, in a form specific to the problem at hand. This makes it challenging to transfer knowledge between problems when the encoded representations do not align. We therefore consider language-based problems in which pre-trained encoders provide a consistent representation and allow knowledge to be transferred between linguistically similar observations. However, defining a problem with language normally requires a human to annotate every observation in the environment, a requirement we refer to as direct supervision. Instead, this work introduces a novel solution in which human-provided natural language instructions are completed without supervision and then used as the language annotations. Furthermore, we assume that the same human can provide feedback on whether an instruction has been completed correctly, so annotations require only weak supervision. To enable such a method, language in some form must exist in the observation space; we therefore adopt a solution based on approaches used in the generation of Text Games. Specifically, template-based language is defined using a set of generation rules that map the observation's features to a linguistic form. The advantage of our method is that both the instructions and the template rules can be defined by a domain expert who need not understand Reinforcement Learning. To the best of our knowledge, no prior work in Reinforcement Learning considers: 1) the systematic introduction of language, via language generation rules, to problems that would otherwise be solved numerically, and 2) abstraction to natural language through an unsupervised instruction-following approach. Lastly, this thesis proposes a software framework (HELIOS) that standardises, and improves the efficiency of, creating language generation and instruction-following approaches for Reinforcement Learning problems. The efficacy and limitations of the proposed contributions are evaluated on a set of benchmark problems as well as in a large state-space Chess environment. We find that for Chess, defining language with template rules can provide results as good as or better than both the original numeric representation and natural language annotations extracted from a public Chess forum. The final demonstration uses a Sailing simulation problem, where results show the impact instructions have on the boat's path and how this leads to reaching the goal successfully. When instructions are completed without supervision, both the number of episodes required for training and the number of bad outcomes in early exploration are reduced. In summary, the main contribution of this work is a generally applicable framework for introducing natural language into a Reinforcement Learning solution that can: 1) minimise the need for direct supervision and 2) facilitate the transfer of linguistic knowledge for improved generalisability.
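To make the idea of template-based generation rules concrete, the sketch below shows one way such rules might look in practice: a few hand-written conditions map numeric observation features to sentences, and a naive keyword check stands in for the instruction-following signal that, in the thesis, would come from a learned model plus occasional human feedback. All names, fields, and thresholds here are hypothetical illustrations, not taken from the thesis or from HELIOS.

```python
# Hypothetical sketch of template-based language generation rules.
# Feature names (wind_speed, wind_dir, distance_to_goal) are invented
# for illustration; the thesis' actual rules and environments may differ.

def describe_observation(obs: dict) -> str:
    """Apply simple template rules to an observation's numeric features."""
    rules = [
        # Each rule pairs a condition on the observation with a sentence template.
        (lambda o: o["wind_speed"] > 15, "The wind is strong from the {wind_dir}."),
        (lambda o: o["wind_speed"] <= 15, "The wind is light from the {wind_dir}."),
        (lambda o: o["distance_to_goal"] < 5, "The goal is close ahead."),
    ]
    sentences = [template.format(**obs) for condition, template in rules if condition(obs)]
    return " ".join(sentences)


def instruction_satisfied(instruction: str, annotation: str) -> bool:
    """Naive check that the instruction's terms appear in the generated annotation.
    A weakly supervised instruction-following model would replace this check."""
    return all(word in annotation.lower() for word in instruction.lower().split())


obs = {"wind_speed": 20, "wind_dir": "north", "distance_to_goal": 3}
annotation = describe_observation(obs)
print(annotation)                                        # "The wind is strong from the north. The goal is close ahead."
print(instruction_satisfied("strong wind", annotation))  # True
```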
| Date of Award | 1 Aug 2024 |
| --- | --- |
| Original language | English |
| Awarding Institution | The University of Manchester |
| Supervisor | Andre Freitas (Supervisor) & Jonathan Shapiro (Supervisor) |
- text games
- real world
- natural language
- instruction following
- reinforcement learning
Improving Real-World Reinforcement Learning by Self Completing Human Instructions on Rule Defined Language
Osborne, P. (Author). 1 Aug 2024
Student thesis: PhD