1. Introduction
With the development of deep learning, the ability of computer to perceive and understand visual information is evolving rapidly. Remarkable results have been achieved in the fields of vision, and a large number of excellent models have been applied. However, in terms of music content understanding, the work is progressing slowly. Although several related tasks have been put forward, these are not enough to fully congnize music. Music auto-tagging aims to annotate music with a series of tags, and the purpose of music captioning is to give us a simple description of the music. Neither of these can effectively help people congnize music content. This paper therefore propose the Music Question Answering (MQA). It takes music and questions as input, and predicts the answers as the output. We hope this can be a new chapter in music cognition.