Unsupervised Audio-Caption Aligning Learns Correspondences Between Individual Sound Events and Textual Phrases

IEEE Conference Publication | IEEE Xplore
