I. Introduction
In recent years, collaborative robots have emerged that are lightweight, easy to install and disassemble, and offer higher safety. The potential of human-robot co-working is promising, and how humans and machines can work together has become an important issue. To make human-robot interaction more intelligent, we believe that natural language can help humans and machines exchange messages and cooperate with each other. The interaction is carried out through voice and comprehended with the help of computer vision to achieve better human-robot interaction. In this paper, we show how to integrate natural language understanding with computer vision for collaborative robot interaction.