Video Dialog via Multi-Grained Convolutional Self-Attention Context Multi-Modal Networks | IEEE Journals & Magazine | IEEE Xplore