
Active Learning on Service Providing Model: Adjustment of Robot Behaviors Through Human Feedback


Abstract:

As robots enter humans' daily lives, the tasks assigned to them vary widely, and so do the needs of the people who interact with them. When facing different users, it is therefore important for a robot to personalize its interactions and provide the services each user desires. This paper proposes a learning strategy for the service-providing model. Through human feedback, the strategy enables the robot to learn users' needs and preferences and to adjust its behaviors accordingly. We assume that users' needs and preferences may vary over time, so the goal of this paper is to make the adjustment of robot behaviors adapt to those variations; in turn, the robot's service-providing model can be adjusted online as well. That is, the robot can select a new action from among the favorable actions it has already chosen, or an action that is not unfavorable but has recently annoyed the user. To implement our system, the service robot under discussion is applied to a home environment. For performance evaluation, we have conducted extensive experiments that demonstrate the robot can provide services to different users and adapt to changes in their preferences.
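
To make the online adjustment concrete, the following is a minimal sketch (in Python) of one way such feedback-driven adaptation could be realized; it is an illustration under assumptions stated here, not the model proposed in the paper. It assumes the human provides scalar feedback after each service action, uses an exponential recency-weighted value update (as in standard incremental reinforcement-learning estimates [4]) so that recent feedback dominates when preferences drift, and temporarily excludes actions whose recent feedback indicates they have annoyed the user. The class name PreferenceModel, the parameters alpha, dislike_threshold, and cooldown, and the example behavior names are all hypothetical.

import random
from collections import defaultdict

class PreferenceModel:
    """Illustrative sketch of a feedback-driven service-providing model.

    One preference value is kept per candidate service behavior; a
    recency-weighted update lets the estimate track preference drift.
    This is a hypothetical example, not the authors' implementation.
    """

    def __init__(self, actions, alpha=0.3, dislike_threshold=-0.5, cooldown=5):
        self.actions = list(actions)            # candidate service behaviors
        self.alpha = alpha                      # step size: weight given to recent feedback
        self.dislike_threshold = dislike_threshold
        self.cooldown = cooldown                # steps an annoying action is avoided
        self.value = defaultdict(float)         # estimated preference per action
        self.annoyed_until = defaultdict(int)   # step until which an action is skipped
        self.step = 0

    def select_action(self):
        """Pick the highest-valued action that has not recently annoyed the user."""
        self.step += 1
        allowed = [a for a in self.actions if self.annoyed_until[a] <= self.step]
        if not allowed:                         # every action is on cooldown
            allowed = self.actions              # fall back to the full action set
        if random.random() < 0.1:               # small exploration so new favorites can emerge
            return random.choice(allowed)
        return max(allowed, key=lambda a: self.value[a])

    def update(self, action, feedback):
        """Exponential recency-weighted average of scalar human feedback."""
        self.value[action] += self.alpha * (feedback - self.value[action])
        if feedback <= self.dislike_threshold:  # strongly negative feedback:
            self.annoyed_until[action] = self.step + self.cooldown  # avoid it for a while

In use, a loop such as a = model.select_action(); model.update(a, feedback) would run at each service opportunity. Because the update is recency-weighted, a behavior the user begins to reject is displaced within a few interactions, which is the kind of online adaptation to preference change described in the abstract.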
Published in: IEEE Transactions on Cognitive and Developmental Systems (Volume: 10, Issue: 3, September 2018)
Page(s): 701-711
Date of Publication: 23 November 2017


I. Introduction

Human–robot interaction (HRI) [1] has emerged as a research field dedicated to understanding, designing, and evaluating robotic systems for use by or with humans. While multimodal channels and sensor fusion allow for more robust communication between humans and robots, research on how to make HRI more natural is still ongoing. Early work on human acceptance of robots [26] can be regarded as a pioneering step toward developing natural communication interfaces. Moving away from a robot-centered point of view, more and more researchers have been drawn to study the interactions and reciprocal effects between robots and humans. A new generation of robots is expected to adapt to users, to interact with them naturally, and to participate in their daily lives. One important perspective is that robots should interact with humans in the ways that are most natural to the humans.

References
[1] M. Goodrich and A. Schultz, "Human–robot interaction: A survey," in Foundations and Trends in Human–Computer Interaction, Hanover, MA, USA: Now, 2007.
[2] T. Kuriyama and Y. Kuniyoshi, "Co-creation of human–robot interaction rules through response prediction and habituation/dishabituation," Proc. IEEE/RSJ Int. Conf. Intell. Robots Syst. (IROS), pp. 4990-4995, Oct. 2009.
[3] T. Taha, J. V. Miró and G. Dissanayake, "A POMDP framework for modelling human interaction with assistive robots," Proc. IEEE Int. Conf. Robot. Autom. (ICRA), pp. 544-549, May 2011.
[4] R. S. Sutton and A. G. Barto, Reinforcement Learning: An Introduction, Cambridge, MA, USA: MIT Press, 1998.
[5] P. Kormushev, B. Ugurlu, S. Calinon, N. G. Tsagarakis and D. G. Caldwell, "Bipedal walking energy minimization by reinforcement learning with evolving policy parameterization," Proc. IEEE/RSJ Int. Conf. Intell. Robots Syst. (IROS), pp. 318-324, Sep. 2011.
[6] T. Hester, M. Quinlan and P. Stone, "Generalized model learning for reinforcement learning on a humanoid robot," Proc. IEEE Int. Conf. Robot. Autom. (ICRA), pp. 2369-2374, May 2010.
[7] P. Henry, C. Vollmer, B. Ferris and D. Fox, "Learning to navigate through crowded environments," Proc. IEEE Int. Conf. Robot. Autom. (ICRA), pp. 981-986, May 2010.
[8] W. B. Knox and P. Stone, "Reinforcement learning from human reward: Discounting in episodic tasks," Proc. 21st IEEE Int. Symp. Robot Human Interact. Commun. (RO-MAN), pp. 878-885, Sep. 2012.
[9] A. L. Thomaz, G. Hoffman and C. Breazeal, "Real-time interactive reinforcement learning for robots," Proc. AAAI Workshop Human Comprehensible Mach. Learn., pp. 1000-1005, 2005.
[10] R. Stiefelhagen et al., "Enabling multimodal human–robot interaction for the Karlsruhe humanoid robot," IEEE Trans. Robot., vol. 23, no. 5, pp. 840-851, Oct. 2007.
[11] A. Atrash and J. Pineau, "A Bayesian reinforcement learning approach for customizing human–robot interfaces," Proc. 13th Int. Conf. Intell. User Interfaces, pp. 355-360, 2009.
[12] N. Mitsunaga, C. Smith, T. Kanda, H. Ishiguro and N. Hagita, "Adapting robot behavior for human–robot interaction," IEEE Trans. Robot., vol. 24, no. 4, pp. 911-916, Aug. 2008.
[13] Á. Castro-González, F. Amirabdollahian, D. Polani, M. Malfaz and M. A. Salichs, "Robot self-preservation and adaptation to user preferences in game play: A preliminary study," Proc. IEEE Int. Conf. Robot. Biomech. (ROBIO), pp. 2491-2498, Dec. 2011.
[14] J. Chan and G. Nejat, "A learning-based control architecture for an assistive robot providing social engagement during cognitively stimulating activities," Proc. IEEE Int. Conf. Robot. Autom. (ICRA), pp. 3928-3933, 2011.
[15] M. L. Puterman, Markov Decision Processes, New York, NY, USA: Wiley, 1994.
[16] R. I. Brafman and M. Tennenholtz, "R-max—A general polynomial time algorithm for near-optimal reinforcement learning," J. Mach. Learn. Res., vol. 3, pp. 213-231, Oct. 2002.
[17] C. M. Cannon and M. R. Bseikri, "Is dopamine required for natural reward?," Physiol. Behav., vol. 81, no. 5, pp. 741-748, 2004.
[18] W. Schultz, "Predictive reward signal of dopamine neurons," J. Neurophysiol., vol. 80, no. 1, pp. 1-27, 1998.
[19] C. D. Fiorillo, P. N. Tobler and W. Schultz, "Discrete coding of reward probability and uncertainty by dopamine neurons," Science, vol. 299, no. 5614, pp. 1898-1902, 2003.
[20] E. M. Palmer, T. S. Horowitz, A. Torralba and J. M. Wolfe, "What are the shapes of response time distributions in visual search?," J. Exp. Psychol. Human Perception Perform., vol. 37, no. 1, pp. 58-71, 2011.
[21] W. E. Hockley, "Analysis of response time distributions in the study of cognitive processes," J. Exp. Psychol. Learn. Memory Cogn., vol. 10, no. 4, pp. 598-615, 1984.
[22] C. Watkins, "Learning from delayed rewards," Ph.D. dissertation, Univ. of Cambridge, Cambridge, U.K., 1989.
[23] E. M. Palmer, T. S. Horowitz, A. Torralba and J. M. Wolfe, "What are the shapes of response time distributions in visual search?," J. Exp. Psychol. Human Perception Perform., vol. 37, no. 1, pp. 58-71, 2011.
[24] B. Rosman, M. Hawasly and S. Ramamoorthy, "Bayesian policy reuse," Mach. Learn., vol. 104, no. 1, pp. 99-127, 2016.
[25] P. Hernandez-Leal, E. M. de Cote and L. E. Sucar, "A framework for learning and planning against switching strategies in repeated games," Connection Sci., vol. 26, no. 2, pp. 103-122, 2014.
[26] A. B. Koku, A. Sekmen and A. Alford, "Towards socially acceptable robots," Proc. IEEE Int. Conf. Syst. Man Cybern., pp. 894-899, 2000.