Understanding natural language sentences with word embedding and multi-modal interaction

Junpei Zhong, Tetsuya Ogata, Angelo Cangelosi, Chenguang Yang

Research output: Chapter in Book/Conference proceeding › Conference contribution › peer-review

Abstract

Understanding and grounding human commands expressed in natural language is a fundamental requirement for service robotic applications. Although there have been several attempts toward this goal, storing and processing natural-language corpora within an interactive system remains a bottleneck. Neural- and statistics-based (N&S) natural language processing has shown potential to address this problem: with the large data sets now available, such methods can extract semantic relationships while parsing a corpus of natural language (NL) text with far less hand-crafted design than rule-based language processing methods. In this paper, we show how two N&S word embedding methods, Word2vec and GloVe, can be used as pre-training tools for natural language understanding in a multi-modal environment. Combined with two different multiple time-scale recurrent neural models, they form hybrid neural language understanding models, which we evaluate in a robot manipulation experiment.
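As a minimal sketch of the pre-training idea described in the abstract (not the authors' implementation), the following Python snippet shows how Word2vec and GloVe embeddings can turn tokenized command sentences into dense vector sequences suitable for feeding into a recurrent language-understanding network. The toy command corpus, the 50-dimensional vector size, and the pre-trained GloVe model name are illustrative assumptions; the paper's multiple time-scale recurrent models are not reproduced here.

    # Sketch: word embeddings as pre-training for language understanding.
    # Corpus, dimensions, and the GloVe model choice are assumptions.
    import numpy as np
    from gensim.models import Word2Vec
    import gensim.downloader as api

    # Toy corpus of robot-command sentences (hypothetical examples).
    corpus = [
        ["pick", "up", "the", "red", "cube"],
        ["push", "the", "blue", "ball", "left"],
        ["place", "the", "red", "cube", "on", "the", "box"],
    ]

    # Word2vec: train skip-gram embeddings directly on the command corpus.
    w2v = Word2Vec(corpus, vector_size=50, window=3, min_count=1, sg=1)

    # GloVe: load pre-trained vectors via gensim's downloader.
    glove = api.load("glove-wiki-gigaword-50")

    def embed(sentence, vectors):
        """Map a tokenized sentence to a (timesteps, dim) embedding array."""
        return np.stack([vectors[w] for w in sentence if w in vectors])

    # Each sentence becomes a sequence of dense vectors that a recurrent
    # model would consume timestep by timestep.
    seq_w2v = embed(corpus[0], w2v.wv)
    seq_glove = embed(corpus[0], glove)
    print(seq_w2v.shape, seq_glove.shape)  # e.g. (5, 50) (5, 50)

In this setup the embedding layer is fixed by pre-training, so the downstream recurrent network only has to learn the mapping from vector sequences to grounded actions rather than learning word representations from scratch.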

Original language: English
Title of host publication: 7th Joint IEEE International Conference on Development and Learning and on Epigenetic Robotics, ICDL-EpiRob 2017
Publisher: IEEE
Pages: 184-189
Number of pages: 6
Volume: 2018-January
ISBN (Electronic): 9781538637159
DOIs
Publication status: Published - 2 Apr 2018
Event: 7th Joint IEEE International Conference on Development and Learning and on Epigenetic Robotics, ICDL-EpiRob 2017 - Lisbon, Portugal
Duration: 18 Sept 2017 – 21 Sept 2017

Conference

Conference: 7th Joint IEEE International Conference on Development and Learning and on Epigenetic Robotics, ICDL-EpiRob 2017
Country/Territory: Portugal
City: Lisbon
Period: 18/09/17 – 21/09/17
