Introduction
Myocardial contrast echocardiography (MCE) is a cardiac ultrasound imaging tool that utilizes microbubbles as contrast agents. The microbubbles are injected intravenously and flow within the blood vessels. This can improve endocardial visualization and the assessment of left ventricle (LV) structure and function, complementing the conventional B-mode echo [1]. Furthermore, MCE can assess myocardial perfusion through the controlled destruction and replenishment of microbubbles [2]. Such perfusion information is useful for the diagnosis of coronary artery diseases (CAD) [1].
However, the analysis of MCE has been restricted to human visual assessment. Such qualitative assessment is time consuming and relies heavily on the experience of the clinician [3]. Automatic MCE quantification is desired because it is faster, more accurate and less operator-dependent. Quantification can involve the measurements of LV volumes, ejection fraction, myocardial volumes and thickness. It can also involve the assessment of myocardial perfusion by analyzing myocardial intensity changes over time. Myocardial segmentation is a widely used method that serves as an intermediate step to obtaining such MCE quantifications. However, manual segmentation is time-consuming and requires high level of expertise and training. The motivation of this paper is to develop an automatic approach for fast and accurate myocardial segmentation in MCE data. The resultant segmentation can additionally be used as input for subsequent tasks such as initialization for tracking algorithms [4].
Although there have been much work on segmentation in B-mode echocardiography, most of these methods do not work well on MCE. MCE utilizes contrast-enhanced ultrasound imaging technique to detect microbubbles by retaining non-linear signals from microbubble oscillations while removing other linear signals from tissue. This is fundamentally different from B-mode echo which simply reflects and captures the linear tissue signals. This results in very different image appearances between MCE and B-mode echo. Specifically, one has to consider the following challenges when performing segmentation on MCE data.
Intensity variations in the image due to ultrasound speckle noise, shadowing and attenuation artifacts, low signal-to-noise ratio, contrast changes over time during microbubble destruction and perfusion imaging [5].
Geometrical variations in the pose and shape of the myocardium due to different apical chamber views, heart motion and probe motion. Each chamber view is acquired at a different probe position and captures a different 2D cross-section of the LV, resulting in variations of the 2D myocardial shape and orientation.
Misleading intensity information such as 1) presence of structures (papillary muscle) with similar appearance to the myocardium, 2) weak image gradient information resulting in unclear myocardial border (especially epicardium).
Speckle patterns in MCE are decorrelated due to the highly dynamic bubble signals as opposed to the static tissue speckle patterns in B-mode [6]. This poses challenges to some tracking algorithms which work by finding corresponding speckle patterns in different frames.
In this paper, we propose a fully automatic method that segments the myocardium on 2D MCE image. The method is based on random forest (RF) [7] and incorporates global shape information into the RF framework through the use of a statistical shape model that explicitly captures the myocardial shape. This provides stronger and more meaningful structural constraints that guide the RF segmentation more accurately. Our method takes the advantages of both the RF and the shape model in order to address the above challenges of MCE myocardial segmentation. The RF has strong local discriminative power and serves as a good intensity/appearance model that captures the large intensity variations of the MCE data. The shape model captures the myocardial shape variations of the MCE data and imposes a global shape constraint to guide the RF segmentation. This avoids inaccurate segmentation due to misleading intensity information on the MCE. Fig. 1 shows the overall pipeline of our myocardial segmentation approach. The main contributions of our proposed method are:
A novel shape model (SM) feature is introduced which incorporates the shape model into the RF framework. The feature improves RF segmentation by generating smoother and more coherent RF probability map that conforms to myocardial structure and shape. (Fig. 1c-d)
A shape model fitting algorithm is developed to fit the shape model to the RF probability map to produce a smooth and plausible myocardial contour. (Fig. 1d-e)
A convolutional neural network (CNN) is employed to automatically detect a bounding box enclosing the myocardium. This pre-processing step removes any pose variations of the myocardium and improves the subsequent RF segmentation using SM features. The two-stage process of bounding box detection followed by shape inference is similar to [9]. (Fig. 1a-b)
The approach is extended to full-cycle 2D+t MCE sequences in which a temporal constraint is imposed to ensure temporal consistency in the sequence segmentations.
The proposed approach is evaluated on a larger dataset of 2D and 2D+t MCE data from 21 subjects.
Overall pipeline of our myocardial segmentation approach. The three main components of the approach are bounding box detection (a-b), RF segmentation (c-d) and shape model fitting (d-e).
The rest of this paper is structured as follows. In Section II, we discuss some existing approaches related to cardiac segmentation. In Section III, we describe the different components of our segmentation method and extend it to sequence segmentation. In Section IV, we describe the experiments conducted to evaluate the proposed approach. In Section V, we report the segmentation results of our approach and show that it outperforms the other state-of-the-art methods. We conclude the paper in Section VI and discuss some limitations and future directions.
Related Work
In this section, we give a brief overview of the segmentation methods used for cardiac ultrasound and highlight their limitations when applied to our MCE data. Current literature focus mainly on either LV segmentation [9]–[15] or myocardial segmentation [4], [16]–[19] in B-mode echocardiography. There are fewer of them addressing segmentation in MCE [3], [20]–[22].
Existing methods can be classified into two categories. The first category defines the segmentation task as a contour finding problem in which the optimal contour is found by an optimization procedure based on image information regularized by constraints applied on the contour. Two representative approaches in this category are the active contour [23] and active shape model (ASM) [24]. Active contour uses a parametric representation of the curves regularized by geometrical constraints. It is used to detect endocardial border in short-axis B-mode echo by Mishra et al [10] and Mignotte and Meunier [11]. Malpica et al. [21] perform myocardial segmentation in MCE sequences using a coupled active contour which imposes distance constraints between the epi and endocardial contours guided by inter-frame motion estimation derived from optical flow. However, active contour approaches depend on edge information which is unreliable on MCE image. Unclear epicardial border and the presence of papillary muscle often lead to spurious contour estimation. Level sets adopted by Lin et al. [12] combines edge and region information for LV segmentation. However, region information is adversely affected by the large intensity variations in MCE. A recent paper by Pedrosa et al. [18] uses the B-spline explicit active surfaces framework for 3D B-mode myocardial segmentation and they compare different ways of coupling epi and endocardial segmentations.
ASM is another widely used approach in cardiac segmentation. It builds a statistical shape model from a set of training shapes by Principal Component Analysis (PCA). The model is then deformed to fit an image in ways similar to the variations observed in the training set. Butakoff et al. [19] uses an automatically constructed ASM for myocardial segmentation in 3D B-mode echo. Pickard et al. [22] uses ASM for MCE myocardial segmentation and they apply a specialized gradient vector flow field to increase its edge capture range. Bosch et al. [13] extends ASM by using a joint model of shape, appearance and motion to detect time-continuous LV endocardial contours over time-normalized 2D+t B-mode sequences. The advantage of ASM is that it can account for the myocardial shape variations in our MCE data and provide shape prior to help in cases where the myocardial border is unclear. However, the linear intensity model used by ASM is insufficient to characterize the appearance of MCE data which exhibit huge intensity variations. Therefore, a more powerful non-linear intensity model is required. Furthermore, some of the above methods require good manual initialization of the contour to prevent the solution from getting stuck in poor local minima.
The second category uses machine learning techniques to solve the segmentation task which is often cast as a pixel-wise classification problem. For example, Binder et al. [25] use an artificial neural network for pixel classification while Xiao et al. [26] combine maximum a posteriori and Markov random field methods for region-based segmentation. For the purpose of this paper, we focus on the RF [7] as the classifier due to its accuracy and computational efficiency. RF has been successfully applied to various segmentation tasks in the medical imaging field [27]–[29].
RF is well-suited for our MCE data because it effectively builds a non-linear appearance model based on local intensity regions. Therefore, it can better cope with the large intensity variations of the MCE data compared to the simple linear intensity model of ASM. However, the classic RF that utilizes only local appearance features has some limitations. First, the intensity information in MCE may be misleading and result in inaccurate segmentation. For instance, the RF can misclassify the papillary muscle as the myocardium because they have similar intensity. The epicardial boundary is also often difficult to identify based on intensity information alone. Second, the RF classifier assigns a class label to each pixel independently. Structural and contextual relationships between the pixel labels are ignored [28], [30]. This results in segmentation with inconsistent pixel labelling which leads to unsmooth boundaries, false detections in the background and holes in the regions of interests. Lastly, RF outputs an intermediate probability map which needs to be post-processed to obtain the final segmentation.
To overcome the above limitations of the classic RF, several works have incorporated local contextual information into the RF framework. Lempitsky et al. [17] demonstrates promising myocardial delineation on 3D B-mode echo by using image pixel coordinates as position feature for the RF in order to learn the shape of the myocardium implicitly. Verhoek et al. [4] further extends the method by using optical flow to propagate the single-frame RF segmentation for sequence tracking. Montillo et al. [28] introduces the entangled RF which uses intermediate probability predictions from higher levels of the tree as features for training the deeper levels. Kontschieder et al. [30] introduces the structured RF that incorporates structural relationships into the output predictions by predicting structured class labels for a patch region rather than an independent class label for each individual pixel. This is used by Oktay et al. [15] to extract a new boundary representation of the 3D B-mode echo data which is then utilized in a multi-atlas framework for LV segmentation. Milletari et al. [14] employs a Hough forests with implicit shape and appearance priors for the simultaneous detection and segmentation of LV in 3D B-mode echo. The Hough forest votes for the location of the LV centroid and the contour is subsequently estimated by using a code-book of surface patches associated with the votes. All the above works use contextual information to improve the RF segmentation. But they only impose weak structural constraint locally and do not explicitly learn the myocardial shapes from the MCE data.
Method
The overall pipeline for our myocardial segmentation method is summarized in Fig. 1. Given an input MCE image
A. Bounding Box Detection
As we will show later (Section V-A), our SM features only work well on RF input images which do not contain significant pose variations (translation, scaling, rotation) in the myocardium. This requires the rigid alignment of myocardium in RF input images. This can be done by image registration [31] but the running time can be long. To this end, we employ a CNN to automatically detect a bounding box containing the myocardium (Fig. 1b). The bounding box is then used to extract a sub-image from the original image and the sub-image is rescaled to a fixed size. This ensures the RF input image
Similar to [9], we define the bounding box
B. Shape Model Construction
In this section, we explain the construction of a statistical shape model of the myocardium which is required for the subsequent stages of our segmentation pipeline. Statistical shape model is extremely useful for representing objects with complex shapes. It captures plausible shape variations while removing variations due to noise. The 2D myocardial shape can be effectively represented using the point distribution model [24] which uses a set of landmarks to describe the shape. The model is built from a set of training shapes using PCA which captures the correlations of the landmarks among the training set. Each training shape is defined by a set of \begin{equation} \boldsymbol {x}=\bar { \boldsymbol {x} } + \boldsymbol {Pb} \end{equation}
(a) Manual annotations showing key landmarks in red and other landmarks in green. (b) First and second modes of variation of the shape model with
C. RF With SM Feature
Myocardial segmentation using RF can be treated as a pixel-wise binary classification problem [17]. The RF takes the sub-image
In the classic RF, the binary feature test is based on appearance/intensities of local image regions. This is either the mean pixel intensity of a displaced feature box from the reference pixel or the difference of the mean pixel intensities of two such feature boxes [36]. We introduce an additional novel SM feature [8] that is derived from the shape model constructed in Section III-B). An SM feature is constructed by randomly sampling some values of \begin{equation} { t }_{ \textrm {SM} }^{ \boldsymbol {b},\tau }(\boldsymbol {p})= \begin{cases} 1,&\textrm {if}\quad D(\boldsymbol {p},B(\bar { \boldsymbol {x} } +\boldsymbol {Pb}))>\tau \\ 0,& \textrm {otherwise}. \end{cases} \end{equation}
D. Shape Model Fitting
The RF probability map is an intermediate output that cannot be used directly for subsequent analysis and applications. Simple post-processing on the probability map such as thresholding and edge detection produce noisy segmentations with false positives and incoherent boundaries due to the pixel-based nature of the RF classifier. In this section, we again make use of the shape model by fitting it to the RF probability map [8]. This generates a final closed myocardial contour that is smooth and coherent. The shape model allows only plausible myocardial shape which improves the segmentation accuracy by imposing shape constraints that correct for some of the misclassifications made by the RF.
Shape model fitting is formulated as an optimization problem where we want to find the optimal values of the shape and pose parameters \begin{align}&\underset {\boldsymbol {b},\boldsymbol {\theta }}{\text {min}}~ { { \left \|{ { \boldsymbol {I} }_{ \textrm {RF} }-{ \boldsymbol {I} }_{ \textrm {M} }({ T }_{ \boldsymbol {\theta } }(\bar { \boldsymbol {x} } +\boldsymbol {Pb})) }\right \| }^{ 2 }+\alpha \frac { 1 }{ K } \sum _{ i=1 }^{ K }{ \frac { \left |{ { b }_{ i } }\right | }{ \sqrt { \lambda _{ i } } } } }\notag \\&\text {subject to} -{ s }\sqrt { \lambda _{ i } } <{ b }_{ i }<{ s }\sqrt { \lambda _{ i } }, \quad i = 1, \ldots, K. \end{align}
E. Sequence Segmentation
We extend the above segmentation method for single 2D MCE image to 2D+t MCE sequences. The proposed method introduces temporal consistency to sequence segmentation by ensuring that the segmentation of the current frame does not differ too much from that of the previous frame. Specifically, an additional temporal constraint term is added to (3) in the shape model fitting step. The new objective function to minimize becomes \begin{align}&\underset {\boldsymbol {b},\boldsymbol {\theta }}{\text {min}}~{{\left \|{ { \boldsymbol {I} }_{ \textrm {RF} }\!-\!{\boldsymbol {I} }_{ \textrm {M} }({ T }_{ \boldsymbol {\theta } }(\boldsymbol {x})) }\right \| }^{ 2 } \!+\!\alpha \frac { 1 }{ K } \sum _{ i=1 }^{ K }{ \frac {\left |{{b}_{i}}\right |}{\sqrt {\lambda _{i}}}}} {+\beta \frac { 1 }{ 2M } { \left \|{ \boldsymbol {x}\!-\!{ \boldsymbol {x} }_{ prev } }\right \| }^{ 2 } } \notag \\&\text {subject to} \,\, -{ s }\sqrt { \lambda _{ i } } <{ b }_{ i }<{ s }\sqrt { \lambda _{ i } }, \quad i = 1, \ldots, K. \end{align}
Experiments
A. Data and Annotations
Our dataset consists of healthy subjects and CAD patients who are defined as those demonstrating ≥70% luminal diameter stenosis of any major epicardial artery by qualitative coronary angiography. There are a total of 21 subjects of which 10 did not demonstrate CAD, 5 had single-vessel disease and 6 had multi-vessel disease. MCE exams from these 21 subjects were used in this paper. The data were acquired using a Philips iE33 ultrasound machine (Philips Medical Systems, Best, Netherlands) and SonoVue (Bracco Research SA, Geneva, Switzerland) as the contrast agent. For each of the 21 exams, MCE sequences in the apical 2, 3 and 4-chamber views are acquired. From these sequences, we randomly selected 242 2D MCE images to form
Manual annotations for the two datasets are done by an expert. For
B. Evaluation Metrics
Segmentation accuracy is evaluated quantitatively using Jaccard index which measures the overlap between two contours, mean absolute distance (MAD) which measures the average point-to-point distance between two contours and Hausdorff distance (HD) which measures the maximum distance between two contours. In addition, clinical indices such as endocardial and myocardial areas are also computed from the segmentations and compared to the ground truth in terms of correlation, bias and standard deviation. Correlation and Bland-Altman plots are also presented. Paired t-test is used to test for significant differences at 5% significance level.
C. Implementation Details
The CNN for the bounding box detection is implemented in Caffe [39] and runs on a machine with one NVIDIA GeForce GTX 950 GPU. The parameters used for the CNN are described in Section III-A.
The shape model is constructed with parameter
For the RF, we use 20 trees with a maximum tree depth of 24. Further increasing the number of trees in the forest and the depth of each tree adds to the computational cost with no significant improvement in segmentation accuracy. All training images for the RF are pre-processed using histogram equalization to reduce intensity variations between different images. 10% of the pixels from the training images are randomly selected for tree training. For the shape model fitting, we empirically set
Unless otherwise stated, 3-fold cross-validation is applied on the training set of
Using a machine with Intel Core i7-4770 at 3.40 GHz and 32 GB of memory, RF training takes 7.1 minutes for 1 tree. Given a test image, RF segmentation takes about 25.5s using 20 trees but tree prediction can be parallelized so that it takes 1.3s per tree. The bounding box detection takes 0.1s and the shape model fitting takes 7.8s. In total, our fully automatic algorithm takes around 9.2s to segment one image.
D. Comparisons With Other State-of-the-Art Approaches
We compare our proposed approach to ASM for static segmentation on
We use a modified ASM [40] which selects a set of optimal features for the appearance model around each landmark point instead of using the Mahalanobis distance. The parameters for shape model construction are
The non-rigid image registration is based on B-spline free-form deformations [31]. The error metric used is the sum-of-squared difference and two resolution levels are used. Smoothness penalty is set to 0.003 and B-spline control point spacing is set to 32 pixels. For each MCE sequence, the first frame is registered to all the other frames. A manual segmentation is performed on the first frame and it is propagated to the other frames in the sequence through the transformation fields found by registration.
Optical flow is based on the algorithm described in [41]. It is a variational model based on the gray level constancy and the gradient constancy assumptions, together with a smoothness constraint. The weighting for the smoothness term is set to 0.05. A multi-resolution approach is also used with a downsampling ratio set to 0.75 and the width of the coarsest level set to 10 pixels. Optical flow motion fields are computed between consecutive frame pairs in the sequence and are used to propagate the manual segmentation on the first frame to all the other frames.
The above methods all require some manual user inputs as initialization and the algorithm parameters are optimized using grid search.
Results
We evaluate different components of our segmentation method separately in the four sections below. Sections V-A to V-C evaluate the bounding box detection, the SM feature and the proposed static segmentation algorithm respectively using
A. Bounding Box Detection
In this experiment, we evaluate the accuracy of the CNN bounding box detection algorithm against the manual ground truth. We measure the detection error as
Top: Visual comparison of the bounding boxes estimated by CNN (blue) and the ground truth (red). Bottom: Visual comparison of the final contours estimated by our fully automatic approach M2 (blue) and the ground truth (red).
In the next experiment, we study the effect of bounding box on the RF probability map. Specifically, we analyze three cases in which different input images are used for RF training and testing. BBNone: Original full size image; BBCNN: Sub-image cropped from the bounding box detected by CNN; BBManual: Sub-image cropped from the manual bounding box; The RF probability maps obtained for these three cases are evaluated against the ground truth segmentations using Jaccard index as shown in Fig. 3. Our SM feature works under the assumption that the shape of interest in all the RF input images are aligned to a reference. The bounding box effectively performs this alignment by cropping out a sub-image that is free from any myocardial pose variations which leads to more accurate RF segmentation. BBNone has the worst segmentation accuracy because it does not account for any pose variations. BBCNN improves the segmentation accuracy but does not perform as well as BBManual. This is due to possible inaccuracy in the CNN-detected bounding box and pose variations may not be completely removed. However, BBCNN is fully-automatic while BBManual requires the manual annotation of bounding box which makes the overall segmentation method semi-automatic. An additional advantage of the bounding box is that it removes irrelevant image regions and reduces the image size so that subsequent RF segmentation and shape model fitting is faster.
Effect of bounding box detection on RF segmentation. Error bar denotes the standard deviation over Dataset1.
To further investigate the dependence of RF segmentation results on the bounding box detected, we conduct the following experiment. We perturb each ground truth bounding box
B. Shape Model Feature
We study the effect of SM feature on RF segmentation by comparing it with the classic RF that uses simple intensity features [36] as well as RF that uses other contextual features such as the entanglement features [28] and position features [17]. We use the sub-images extracted from the manual bounding boxes BBManual to train and test all the RFs in this section.
Fig. 4 compares the probability maps generated by RFs using different features. Our SM feature produces smoother and more coherent probability map. Other RFs often misclassify low intensity structure in the LV cavity as myocardium (row 1) and the high intensity region in the myocardium as background (row 2). RF with SM feature can overcome these problems by imposing a global shape constraint.
Fig. 5 shows the segmentation performance of different RFs at different tree depths. Our SM feature RF outperforms the other RFs at all tree depths. This is because the SM feature captures the explicit geometry of the myocardium to guide the segmentation. A binary test of the SM feature in the split node partitions the image space using meaningful myocardial shape based on the shape model. Position feature [17] also learns the myocardial shape implicitly but its binary test partitions the image space using simple straight line which is less effective. To learn complex shape like the myocardium, many more position feature tests are needed compared to SM feature tests which learn the myocardial shape directly. This causes SM feature to be the more discriminative feature at lower levels of the tree, leading to better results than other RFs especially at lower tree depths.
C. Static Segmentation on ${{Dataset1}}$
We compare the following different approaches for the evaluation of static segmentation on
ASM: Active shape Model [40]
M1: BBCNN+ SM feature RF + Canny edge detector
M2: BBCNN+ SM feature RF + Shape model fitting
M3: BBCNN+ SM feature RF + Shape model fitting (View specific)
M4: BBManual+ SM feature RF + Shape model fitting
Fig. 6a presents the segmentation accuracy results (Jaccard, MAD and HD) and Table III reports the evaluation on clinical indices such as endocardial area and myocardial area. ASM does not perform well because it uses a simple intensity profile model to search for the best landmark position. This model is not adequate for noisy MCE images. RF can provide a much stronger and discriminative intensity model. In M1, we apply our segmentation method using the CNN bounding box detection algorithm (BBCNN) and the RF with SM features. However, we replace the last shape model fitting step with a Canny edge detector to obtain a binary edge map as the final segmentation. This gives better results than the ASM but the final segmentation is not regularized by global shape constraint. To improve the results further, shape model fitting is added in M2 to give our fully automatic segmentation approach. It combines the local discriminative power of RF with the global shape constraint imposed by the shape model. The fitting guides the segmentation in regions where the RF probability map has low confidence predictions. It also ensures the final segmentation is a smooth and coherent contour that represents plausible myocardial shape. The bottom row of Fig. 7 shows the final myocardial contours predicted by M2. The method is able to segment the myocardium accurately even in the presence of shadowing and attenuation artifacts which result in unclear epicardial border. In M3, we trained three separate RF models and shape models for the three different apical chamber views. There is a small improvement in results since each model learns more specifically the different anatomy of each view although a general model that includes all views (M2) is also quite robust. Since there are less training data for each view model in M3, we expect the results to improve with more data. In M4, we replace the CNN bounding box in M2 with manual bounding box (BBManual). This results in a semi-automatic approach which accurately removes any myocardial pose variations and achieves the best results in the final segmentation.
Segmentation accuracy results of the various approaches for (a)
D. Sequence Segmentation on ${{Dataset2}}$
Fig. 6b presents the segmentation accuracy of the different approaches on
N1 is our static segmentation method which uses (3) for shape model fitting. It is the same as M2 and both RF model and shape model are trained on
N3 is our sequence segmentation method and it achieves the best results by extending N1 using (4) for shape model fitting. The temporal constraint allows N3 to produce temporally more consistent segmentations throughout the sequence with the most improvement reflected by the HD metric. The clinical indices of endocardial and myocardial areas are also computed for the automatic segmentations from N3 and compared to the manual segmentations in terms of correlation and Bland-Altman plots as shown in Fig. 8. It can be observed that our method results in overestimation for smaller areas and underestimation for larger areas. Fig. 9 shows the myocardial and endocardial area-time curves over one cardiac cycle. The curves are the mean across the 12 sequences in
Correlation plots (a,c) and Bland-Altman plots (b,d) of the endocardial and myocardial areas computed for the manual segmentations and estimated segmentations from N3. For BA plots, solid line: bias; dotted lines: limits of agreement
Mean endocardial (blue) and myocardial (red) areas of 12 sequences over one cardiac cycle computed from ground truth (dash lines) and N3 segmentations (solid lines).
Discussion and Conclusion
We have presented a fully automatic approach for fast and accurate segmentation of myocardium in 2D MCE image. The proposed method uses a statistical shape model to guide the RF segmentation by imposing global shape constraints. This is done by first incorporating a novel SM feature into the RF framework and then fitting the shape model to the RF probability map to obtain the final myocardial contour. The SM feature outperforms other contextual features by producing more accurate RF probability map while the shape model fitting step further improves the final segmentation results by producing a smooth and coherent contour. Bounding box detection using CNN serves as an important image alignment step that improves the performance of subsequent RF segmentation using SM feature. The method is further extended to 2D+t MCE sequence which imposes temporal consistency in the final sequence segmentations. The overall segmentation method combines the advantages of both ASM and RF, and outperforms either of this method used alone.
Our proposed method is generic and can be applied to other image data containing large intensity variations where prior knowledge of shape becomes important in guiding the segmentation. Current study only limits our approach to contrast echocardiography data. As future work, we will evaluate our method on other datasets of B-mode echocardiography or even medical data of different modalities in order to test the robustness and generality of the approach. In addition, our sequence dataset
Extension to 3D data should also work in principle. In this case, we need to compute the shortest distance from a point to a surface for the SM feature which can increase computational cost. Shape model fitting can also take longer if 3D binary volume needs to be generated from the mesh surface of 3D shape model. Since this is the limiting step with the longest running time in our segmentation pipeline, future work will look at more efficient ways of optimizing this step in order to make our approach real-time. One way is to define the shape model on distance maps directly instead of the landmarks as described in [42]. This will save the computational cost of converting the contour into a distance map or binary mask during SM feature computation and shape model fitting.
We have added a simple temporal constraint term to induce temporally smooth sequence segmentation. More sophisticated tracking algorithms such as block matching and optical flow could be incorporated into our framework to ensure temporal consistency. This is done in [4] where the RF segmentation is propagated using optical flow. Finally, effort should also be directed at more accurate bounding box detection since we have shown that the final segmentations obtained using our approach is strongly dependent on this step.
ACKNOWLEDGMENT
The authors would like to thank Dr Robert Eckersley, Prof. Daniel Rueckert and other members from the BioMedIA group for their help and advice. The work of Y. Li was supported by the Imperial College President’s Ph.D. Scholarships. The work of M.-X. Tang was supported by the Engineering and Physical Sciences Research Council, U.K. under Grant EP/M011933/1.