
Developmental Word Grounding Through a Growing Neural Network With a Humanoid Robot



Abstract:

This paper presents an unsupervised approach to integrating speech and visual information without using any prepared data. The approach enables a humanoid robot, Incremental Knowledge Robot 1 (IKR1), to learn word meanings. It differs from most existing approaches in that the robot learns online from audio-visual input, rather than from stationary data provided in advance. In addition, the robot is capable of learning incrementally, which is considered indispensable to lifelong learning. A noise-robust, self-organized growing neural network is developed to represent the topological structure of unsupervised online data. We are also developing an active-learning mechanism, called "desire for knowledge," that lets the robot select, for subsequent learning, the object about which it possesses the least information. Experimental results show that the approach raises the efficiency of the learning process. Based on the audio and visual data, a mental model is constructed for the robot; it forms a basis for IKR1's inner world and builds a bridge connecting the learned concepts with current and past scenes.
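The "desire for knowledge" mechanism can be illustrated in outline: among the objects in view, the robot chooses the one about which it has accumulated the least information as its next learning target. The sketch below is an illustrative assumption, not the paper's actual formulation; the function name, the use of raw observation counts as the information measure, and the object labels are all hypothetical.

```python
# Hypothetical sketch of a "desire for knowledge" selection rule:
# pick the object with the least accumulated information, here
# approximated by the fewest audio-visual observations so far.
def select_next_object(observation_counts):
    """Return the object id with the smallest observation count."""
    return min(observation_counts, key=observation_counts.get)

# Example: "ball" has been observed least, so it is selected next.
counts = {"cup": 5, "ball": 2, "box": 7}
print(select_next_object(counts))  # -> ball
```

In practice the information measure would be derived from the learned audio-visual model rather than a bare count, but the selection principle is the same: direct attention to the least-known object.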
Page(s): 451 - 462
Date of Publication: 12 March 2007


PubMed ID: 17416171

I. Introduction

As human beings, we discriminate among various concepts by forming "iconic representations" of them; judgments of resemblance or difference are based on comparisons of these iconic representations [1]. We also interpret their meaning through language. Therefore, in a sense, we ground the meanings of language in their perceptual context. For a robot to be more like a human, it must recognize the sound patterns of words and understand their meanings; it must ground language in its world as mediated by its perceptual, motor, and cognitive capacities. Under such a scenario, the robot must analyze the current scene along with the associated utterance, integrate the extracted information, and finally acquire the word meanings.

References
1.
S. Harnad, "The symbol grounding problem", Physica D, vol. 42, no. 1-3, pp. 335-346, Jun. 1990.
2.
J. Siskind, "A computational study of cross-situational techniques for learning word-to-meaning mappings", Cognition, vol. 61, no. 1/2, pp. 39-91, Oct./Nov. 1996.
3.
J. Siskind, "Learning word-to-meaning mappings" in Models of Language Acquisition: Inductive and Deductive Approaches, U.K., London:Oxford Univ. Press, pp. 121-153, Jul. 2000.
4.
S. Wachsmuth, G. Socher, H. Brandt-Pook, F. Kummert and G. Sagerer, "Integration of vision and speech understanding using Bayesian networks", Videre: J. Comput. Vis. Res., vol. 1, no. 4, pp. 61-83, 2000.
5.
A. L. Gorin, D. Petrovska-Delacretaz, G. Riccardi and J. Wright, "Learning spoken language without transcriptions", Proc. IEEE Workshop Speech Recog. and Understanding, pp. 293-296, 1999.
6.
T. Regier, The Human Semantic Potential: Spatial Language and Constrained Connectionism, MA, Cambridge:MIT Press, 1996.
7.
T. Oates, Z. Eyler-Walker and P. R. Cohen, "Toward natural language interfaces for robotic agents: Grounding linguistic meaning in sensors", Proc. 4th Int. Conf. Auton. Agents, pp. 227-228, 2000.
8.
D. Roy and A. Pentland, "Learning words from sights and sounds: A computational model", Cogn. Sci., vol. 26, no. 1, pp. 113-146, 2002.
9.
N. Iwahashi, "Language acquisition through a human-robot interface by combining speech, visual, and behavioral information", Inf. Sci., vol. 156, no. 1/2, pp. 109-121, Nov. 2003.
10.
N. Iwahashi, "Active and unsupervised learning of spoken words through a multimodal interface", Proc. 13th IEEE Workshop Robot and Human Interactive Commun., pp. 437-442, 2004.
11.
C. Yu and D. Ballard, "On the integration of grounding language and learning objects", Proc. 19th Nat. Conf. Artif. Intell. (AAAI), pp. 488-494, Jul. 2004.
12.
L. Steels and P. Vogt, "Grounding adaptive language games in robotic agents", Proc. ECAL, pp. 474-482, 1997.
13.
L. Steels and J.-C. Baillie, "Shared grounding of event descriptions by autonomous robots", Robot. Auton. Syst., vol. 43, no. 2/3, pp. 163-173, 2003.
14.
L. Steels, F. Kaplan, A. McIntyre and J. Van Looveren, "Crucial factors in the origins of word-meaning" in The Transition to Language, U.K., Oxford:Oxford Univ. Press, pp. 252-271, 2002.
15.
P. Vogt, "The emergence of compositional structures in perceptually grounded language games", Artif. Intell., vol. 167, no. 1/2, pp. 206-242, Sep. 2005.
16.
J. Weng, J. McClelland, A. Pentland, O. Sporns, I. Stockman, M. Sur, et al., "Autonomous mental development by robots and animals", Science, vol. 291, no. 5504, pp. 599-600, 2001.
17.
J. Elman, "Learning and development in neural networks: The importance of starting small", Cognition, vol. 48, no. 1, pp. 71-99, 1993.
18.
S. Thrun and T. Mitchell, "Learning one more thing", Proc. IJCAI, pp. 1217-1223, Aug. 1995.
19.
S. Imai, T. Kobayashi, K. Tokuda, T. Masuko and K. Koishida, Speech Signal Processing Toolkit: SPTK Version 3.0, 2002.
20.
A. Gersho and R. M. Gray, Vector Quantization and Signal Compression, MA, Boston:Kluwer, 1992.
21.
C. S. Myers and L. R. Rabiner, "A comparative study of several dynamic time-warping algorithms for connected word recognition", Bell Syst. Tech. J., vol. 60, no. 7, pp. 1389-1409, Sep. 1981.
22.
F. Shen and O. Hasegawa, "An incremental network for on-line unsupervised classification and topology learning", Neural Netw., vol. 19, no. 1, pp. 90-106, Jan. 2006.
23.
D. Roy, K. Hsiao and N. Mavridis, "Mental imagery for a conversational robot", IEEE Trans. Syst. Man Cybern. B Cybern., vol. 34, no. 3, pp. 1374-1383, Jun. 2004.
24.
K. Hsiao, N. Mavridis and D. Roy, "Coupling perception and simulation: Steps towards conversational robotics", Proc. IEEE/RSJ Int. Conf. Intell. Robots Syst., pp. 928-933, 2003.
25.
J. Piaget and B. Inhelder, The Child's Conception of Space, U.K., London:Routledge and Kegan Paul, 1956.

