
Orthogonal Region Selection Network for Laryngeal Closure Detection in Laryngoscopy Videos



Abstract:

Vocal folds (VFs) play a critical role in breathing, swallowing, and speech production. VF dysfunctions caused by various medical conditions can significantly reduce patients' quality of life and lead to life-threatening conditions such as aspiration pneumonia, caused by food and/or liquid "invasion" into the windpipe. Laryngeal endoscopy is routinely used in clinical practice to inspect the larynx and to assess VF function. Unfortunately, the resulting videos are only visually inspected, leading to loss of valuable information that could be used for early diagnosis and for disease or treatment monitoring. In this paper, we propose a deep learning-based image analysis solution for automated detection of laryngeal adductor reflex (LAR) events in laryngeal endoscopy videos. Laryngeal endoscopy image analysis is a challenging task because of anatomical variations and various imaging problems. Analysis of LAR events is further complicated by data imbalance, since these are rare events. To tackle this problem, we propose a deep learning system that consists of a two-stream network with a novel orthogonal region selection subnetwork. To the best of our knowledge, this is the first deep learning network that learns to directly map its input to a VF open/close state without first segmenting or tracking the VF region, which drastically reduces the labor-intensive manual annotation needed for mask or track generation. The proposed two-stream network and the orthogonal region selection subnetwork allow integration of local and global information for improved performance. The experimental results show promising performance for the automated, objective, and quantitative analysis of LAR events from laryngeal endoscopy videos.

Clinical relevance - This paper presents an objective, quantitative, and automatic deep learning-based system for detection of laryngeal adductor reflex (LAR) events in laryngoscopy videos.
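To make the high-level architecture description above concrete, the following is a minimal sketch of a two-stream classifier with a separate region selection subnetwork. The abstract does not provide implementation details, so the module names, the ResNet-18 backbones, and the specific parameterization of the "orthogonal" selection (independent row and column attention combined into a rank-1 spatial mask) are illustrative assumptions, not the authors' implementation.

```python
# Illustrative sketch only: backbones, module names, and the row/column
# attention formulation of the "orthogonal" region selection are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision.models import resnet18


class OrthogonalRegionSelector(nn.Module):
    """Scores rows and columns of the frame separately and combines them
    into a soft spatial mask that highlights a local region of interest."""

    def __init__(self, in_channels=3):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(in_channels, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
        )
        self.row_head = nn.Conv2d(64, 1, 1)
        self.col_head = nn.Conv2d(64, 1, 1)

    def forward(self, x):
        f = self.features(x)                          # B x 64 x H' x W'
        row_attn = torch.softmax(self.row_head(f).mean(dim=3), dim=2)  # B x 1 x H'
        col_attn = torch.softmax(self.col_head(f).mean(dim=2), dim=2)  # B x 1 x W'
        # Outer product of the two 1-D attentions gives a rank-1 spatial mask.
        mask = torch.einsum('bch,bcw->bchw', row_attn, col_attn)
        mask = F.interpolate(mask, size=x.shape[-2:], mode='bilinear',
                             align_corners=False)
        return x * mask                                # soft "cropped" local view


class TwoStreamLARNet(nn.Module):
    """Global stream sees the whole frame; local stream sees the selected region.
    The two feature vectors are fused to predict a per-frame VF open/close state."""

    def __init__(self, num_classes=2):
        super().__init__()
        self.selector = OrthogonalRegionSelector()
        self.global_stream = resnet18(num_classes=128)
        self.local_stream = resnet18(num_classes=128)
        self.classifier = nn.Linear(256, num_classes)

    def forward(self, frame):
        local_view = self.selector(frame)
        g = self.global_stream(frame)
        l = self.local_stream(local_view)
        return self.classifier(torch.cat([g, l], dim=1))
```

Under these assumptions, a per-frame open/close prediction would be aggregated over time to flag LAR (closure) events, which is the detection task described in the abstract.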
Date of Conference: 20-24 July 2020
Date Added to IEEE Xplore: 27 August 2020
PubMed ID: 33018436
Conference Location: Montreal, QC, Canada

I. Introduction

The vocal folds (VFs) are a pair of muscles in the larynx (voice box), the tubular structure that connects the throat to the windpipe (trachea). The VFs function like a valve in the upper airway, opening and closing as needed for breathing, swallowing, and speaking [24]. Therefore, VF dysfunction can result in breathing difficulty (dyspnea), swallowing dysfunction (dysphagia), and/or voice impairment (dysphonia), all of which can significantly reduce the patient's quality of life [31]. Moreover, dyspnea and dysphagia can become life-threatening as a result of VF "valving" impairment: inappropriate VF closure can obstruct breathing, while inappropriate VF opening can allow food and liquid into the airway, leading to choking and/or lung infection (aspiration pneumonia) [21]. While numerous medical conditions can result in life-threatening VF dysfunction, the most prevalent are neurological disorders (e.g., stroke, Parkinson's disease, and amyotrophic lateral sclerosis) and head and neck cancer [27]. Aspiration pneumonia is a leading cause of morbidity and mortality in these conditions [9], highlighting the clinical need for improved medical management of VF dysfunction. Our proposed solution entails incorporating deep learning-based automated video analysis methods into the current clinical test for VF dysfunction.

