Fusing Temporally Distributed Multi-Modal Semantic Clues for Video Question Answering | IEEE Conference Publication | IEEE Xplore