Learning to Assemble Neural Module Tree Networks for Visual Grounding | IEEE Conference Publication | IEEE Xplore