Abstract:
Spoken language understanding (SLU) results based on clean reference transcripts are much more accurate than those based on automatic speech recognition (ASR) transcripts. Effective use of manually checked clean transcripts is therefore key to improving SLU performance. This paper proposes a siamese network with contrastive learning to improve SLU performance. The siamese network operates on sentence pairs, each composed of an ASR transcript and a clean transcript, for the SLU task. During training, contrastive learning pulls the sentence-level semantic representations of ASR transcripts and clean transcripts closer together. During inference, no clean transcript is available, so a k-nearest neighbors (KNN) semantic search via the siamese network first retrieves a pseudo clean transcript; the ASR transcript and the pseudo clean transcript then form the sentence pair used for prediction. Experiments on three benchmark datasets demonstrate the effectiveness of the proposed approach, which improves Intent Classification (IC) performance by over 1.3% on the SLURP dataset.
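The inference step described in the abstract (retrieve a pseudo clean transcript by nearest-neighbor search, then pair it with the ASR transcript) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the toy 3-dimensional vectors, the example sentences, and the choice of cosine similarity are all assumptions made for illustration. In practice the vectors would come from the trained siamese encoder.

```python
import math

def cosine(u, v):
    # Cosine similarity between two equal-length vectors.
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def knn_pseudo_clean(query_vec, clean_vecs, clean_texts, k=1):
    # Hypothetical helper: rank stored clean transcripts by similarity
    # to the ASR transcript embedding and return the top k as (score, text).
    scored = [(cosine(query_vec, v), t) for v, t in zip(clean_vecs, clean_texts)]
    scored.sort(reverse=True)
    return scored[:k]

# Toy index of clean transcripts with made-up embeddings (assumption).
clean_texts = [
    "turn on the kitchen lights",
    "play some jazz music",
    "what is the weather today",
]
clean_vecs = [
    [0.9, 0.1, 0.0],
    [0.0, 1.0, 0.2],
    [0.1, 0.0, 1.0],
]

# A noisy ASR transcript and its made-up embedding (assumption).
asr_text = "turn on the chicken lights"
asr_vec = [0.8, 0.2, 0.1]

score, pseudo_clean = knn_pseudo_clean(asr_vec, clean_vecs, clean_texts, k=1)[0]

# The pair that a downstream classifier would receive.
sentence_pair = (asr_text, pseudo_clean)
print(sentence_pair)
```

With k=1 the search returns the single closest clean transcript; a larger k would return several candidates, and how those would be combined is not stated in the abstract.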
Notes: This DOI was registered to an article that was not presented by the author(s) at this conference. As per section 8.2.1.B.13 of IEEE's "Publication Services and Products Board Operations Manual," IEEE has chosen to exclude this article from distribution. We regret any inconvenience.
Published in: ICASSP 2024 - 2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
Date of Conference: 14-19 April 2024
Date Added to IEEE Xplore: 18 March 2024