Audio-visual Keyword Spotting for Mandarin Based on Discriminative Local Spatial-Temporal Descriptors | IEEE Conference Publication | IEEE Xplore