Acoustic VR in the mouth: A real-time speech-driven visual tongue system


Abstract:

We propose an acoustic-VR system that converts acoustic signals of human speech (Chinese) into realistic 3D tongue animation sequences in real time. Directly capturing the 3D geometry of the tongue at a frame rate that matches its swift movement during speech production is known to be challenging. We handle this difficulty by using electromagnetic articulography (EMA) sensors as the intermediate medium linking the acoustic data to the simulated virtual reality. We leverage deep neural networks to train a model that maps the input acoustic signals to the positional information of pre-defined EMA sensors, based on 1,108 utterances. We then develop a novel reduced physics-based dynamics model for simulating the tongue's motion. Unlike existing methods, our deformable model is nonlinear, volume-preserving, and accommodates collisions between the tongue and the oral cavity (mostly with the jaw). The tongue's deformation can be highly localized, which imposes extra difficulties for existing spectral model-reduction methods; we instead adopt a spatial reduction method that allows an expressive subspace representation of the tongue's deformation. We systematically evaluate the simulated tongue shapes against real-world shapes acquired by MRI/CT. Our experiments demonstrate that the proposed system delivers a realistic visual tongue animation corresponding to a user's speech signal.
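The first stage of the pipeline, mapping acoustic features of each speech frame to EMA sensor positions with a deep neural network, can be sketched as a plain feed-forward regression. The sketch below is illustrative only: the feature dimension (39, e.g. MFCCs with deltas), the number of sensors (6, giving 18 output coordinates), and the layer sizes are assumptions, not the authors' configuration, and a real system would train the weights on the utterance corpus rather than use random ones.

```python
import numpy as np

def init_mlp(sizes, rng):
    """He-initialized weights for a fully connected network (untrained)."""
    return [(rng.standard_normal((m, n)) * np.sqrt(2.0 / m), np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]

def forward(params, x):
    """Map a batch of acoustic feature frames to EMA sensor coordinates."""
    for i, (W, b) in enumerate(params):
        x = x @ W + b
        if i < len(params) - 1:
            x = np.maximum(x, 0.0)  # ReLU on hidden layers; linear output
    return x

rng = np.random.default_rng(0)
# 39-dim acoustic features per frame -> 18 values
# (x, y, z positions of 6 hypothetical EMA sensors)
params = init_mlp([39, 256, 256, 18], rng)
frames = rng.standard_normal((5, 39))  # 5 speech frames
ema = forward(params, frames)
print(ema.shape)  # (5, 18)
```

The predicted sensor trajectories would then drive the reduced physics-based tongue model as boundary conditions, one frame at a time, to produce the animation.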
Date of Conference: 18-22 March 2017
Date Added to IEEE Xplore: 06 April 2017
Electronic ISSN: 2375-5334
Conference Location: Los Angeles, CA, USA

1 Introduction

The human tongue is a muscular organ that plays an essential role in speech production. A high-quality visual representation of the tongue for specific speech sounds is important in speech research and has numerous potential applications. For example, in the rehabilitation of speech disorders [16], a realistic visualization of 3D tongue motion could provide a visible paradigm that helps an individual achieve correct tongue articulation when producing various speech sounds.

References
1.
Y. S. Akgul, C. Kambhamettu and M. Stone, "Automatic extraction and tracking of the tongue contours", IEEE Transactions on Medical Imaging, vol. 18, no. 10, pp. 1035-1045, 1999.
2.
S. S. An, T. Kim and D. L. James, "Optimizing cubature for efficient integration of subspace deformations", ACM Trans. Graph., vol. 27, no. 5, pp. 165:1-165:10, Dec 2008.
3.
B. S. Atal, J. J. Chang, M. V. Mathews and J. W. Tukey, "Inversion of articulatory-to-acoustic transformation in the vocal tract by a computer-sorting technique", The Journal of the Acoustical Society of America, vol. 63, no. 5, pp. 1535-1555, 1978.
4.
P. Badin, G. Bailly, L. Reveret, M. Baciu, C. Segebarth and C. Savariaux, "Three-dimensional linear articulatory modeling of tongue, lips and face based on MRI and video images", Journal of Phonetics, vol. 30, no. 3, pp. 533-553, 2002.
5.
T. Baer, J. Gore, S. Boyce and P. Nye, "Application of MRI to the analysis of speech production", Magnetic Resonance Imaging, vol. 5, no. 1, pp. 1-7, 1987.
6.
J. Barbič and D. L. James, "Real-time subspace integration for St. Venant-Kirchhoff deformable models" in ACM Transactions on Graphics (TOG), ACM, vol. 24, pp. 982-990, 2005.
7.
R. W. Bisseling and A. L. Hof, "Handling of impact forces in inverse dynamics", Journal of biomechanics, vol. 39, no. 13, pp. 2438-2444, 2006.
8.
S. Buchaillard, P. Perrier and Y. Payan, "A biomechanical model of cardinal vowel production: Muscle activations and the impact of gravity on tongue positioning", The Journal of the Acoustical Society of America, vol. 126, no. 4, pp. 2033-2051, 2009.
9.
M. G. Choi and H.-S. Ko, "Modal warping: Real-time simulation of large rotational deformation and manipulation", IEEE Transactions on Visualization and Computer Graphics, vol. 11, no. 1, pp. 91-101, Jan. 2005.
10.
P. Cignoni, C. Rocchini and R. Scopigno, "Metro: measuring error on simplified surfaces" in Computer Graphics Forum, Wiley Online Library, vol. 17, pp. 167-174, 1998.
11.
M. Fu, M. S. Barlaz, J. L. Holtrop, J. L. Perry, D. P. Kuehn, R. K. Shosted, et al., "High-frame-rate full-vocal-tract 3D dynamic speech imaging", Magnetic Resonance in Medicine, 2016.
12.
J. M. Gérard, J. Ohayon, V. Luboz, P. Perrier and Y. Payan, "Indentation for estimating the human tongue soft tissues constitutive law: application to a 3D biomechanical model" in Medical Simulation, Springer, pp. 77-83, 2004.
13.
J. M. Gérard, P. Perrier and Y. Payan, "3D biomechanical tongue modeling to study speech production", Speech Production: Models, Phonetic Processes and Techniques, pp. 85-102, 2006.
14.
K. O. Grenab, Atlas of Topographical and Applied Human Anatomy: Head and Neck, 1965.
15.
B. P. Halpern, "Functional anatomy of the tongue and mouth of mammals" in Drinking behavior, Springer, pp. 1-92, 1977.
16.
M. N. Hegde, Introduction to communicative disorders. Pro Ed, 1995.
17.
M. Hirayama, E. Vatikiotis-Bateson and M. Kawato, "Inverse dynamics of speech motor control", Advances in neural information processing systems, pp. 1043-1050, 1994.
18.
S. Hiroya and M. Honda, "Estimation of articulatory movements from speech acoustics using an HMM-based speech production model", IEEE Transactions on Speech and Audio Processing, vol. 12, no. 2, pp. 175-185, 2004.
19.
T. Hueber, G. Aversano, G. Chollet, B. Denby, G. Dreyfus, Y. Oussar, et al., "Eigentongue feature extraction for an ultrasound-based silent speech interface", 2007 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP '07), vol. 1, pp. 1-1245, 2007.
20.
P. W. Iltis, J. Frahm, D. Voit, A. A. Joseph, E. Schoonderwaldt and E. Altenmüller, "High-speed real-time magnetic resonance imaging of fast tongue movements in elite horn players", Quantitative imaging in medicine and surgery, vol. 5, no. 3, pp. 374-381, 2015.
21.
S. Ioffe and C. Szegedy, "Batch normalization: Accelerating deep network training by reducing internal covariate shift", arXiv preprint arXiv:1502.03167, 2015.
22.
G. Irving, C. Schroeder and R. Fedkiw, "Volume conserving finite element simulations of deformable models", ACM Trans. Graph., vol. 26, no. 3, July 2007.
23.
R. D. Kent, "Research on speech motor control and its disorders: A review and prospective", Journal of Communication Disorders, vol. 33, no. 5, pp. 391-428, 2000.
24.
S. A. King and R. E. Parent, "A 3D parametric tongue model for animated speech", The Journal of Visualization and Computer Animation, vol. 12, no. 3, pp. 107-115, 2001.
25.
M. Li, C. Kambhamettu and M. Stone, "Automatic contour tracking in ultrasound images", Clinical Linguistics & Phonetics, vol. 19, no. 6-7, pp. 545-554, 2005.
26.
S. G. Lingala, B. P. Sutton, M. E. Miquel and K. S. Nayak, "Recommendations for real-time speech MRI", Journal of Magnetic Resonance Imaging, vol. 43, no. 1, pp. 28-44, 2016.
27.
S. G. Lingala, Y. Zhu, Y.-C. Kim, A. Toutios, S. Narayanan and K. S. Nayak, "A fast and flexible MRI system for the study of dynamic vocal tract shaping", Magnetic Resonance in Medicine, 2016.
28.
A. J. Lundberg and M. Stone, "Three-dimensional tongue surface reconstruction: Practical considerations for ultrasound data", The Journal of the Acoustical Society of America, vol. 106, no. 5, pp. 2858-2867, 1999.
29.
J. Luo, K. Ying and J. Bai, "Savitzky-Golay smoothing and differentiation filter for even number data", Signal Processing, vol. 85, no. 7, pp. 1429-1434, 2005.
30.
S. McLeod, "Speech-language pathologists' knowledge of tongue/palate contact for consonants", Clinical Linguistics & Phonetics, vol. 25, no. 11-12, pp. 1004-1013, 2011.