Introduction
Superpixel segmentation groups image pixels into perceptually consistent and homogeneous atomic regions, which is one of the powerful pre-processing tools in computer vision applications. Compared with pixels, superpixels provide richer region description and contour information. Particularly, superpixels can align well with the natural object boundaries with a small number of image primitives. As a result, using superpixel as a basic processing unit can significantly reduce the computational complexity of subsequent vision applications, such as image segmentation [1], [2], saliency detection [3], [4], object tracking [5], [6], 3D reconstruction, and semantic segmentation [7], [8].
Although intensive superpixel segmentation methods with different characteristics have been proposed, adaptively determining the number and the position of initial seeds is still a thorny problem. For example, as the most popular superpixel segmentation methods, SLIC [9], FCN [10], and ERS [11] require artificially enumerating the number of initial seeds and uniformly distributing the positions of initial seeds. However, the inappropriate number of initial seeds will significantly affect the redundancy and boundary adherence [12]. In addition, as shown in Fig. 1(a), uniform seed initialization is sensitive to noises and boundaries, especially when the initial seeds fall outside of small objects or slender regions [13].
(a) Uniform seed initialization, (b) Non-uniform seed initialization, (c) Uniform superpixel segmentation (SLIC), (d) Non-uniform superpixel segmentation (Ours).
Compared with uniform seed initialization, fewer schemes for non-uniform seed initialization are proposed. For example, Li et al. [14] used square-wise asymmetric partition with an unfixed size to initialize the number of seeds. Kang et al. [13] initialized the seeds by averaging the distribution of areas, which requires setting the maximum number of superpixels. Liu et al. [15] adjusted random seed initialization to non-uniform seed distribution by projecting samples. Wang et al. [16] initialized seeds based on the edge density distribution. Pan et al. [17] adjusted the position of initial seeds based on the structure of image region. Wang et al. [18] adjusted the seed position by calculating the pixel gradient within pre-defined region blocks. However, most of these methods still need to pre-set the initial seed number, or adjust the initial seed position under uniform seed distribution.
Efficient distance measure is another factor for high-precision superpixel segmentation. For instance, SLIC [9] constructed a distance measure based on color and spatial proximity, which is widely used for pixel label assignment. Note that the introduction of spatial proximity can ensure the regularity of superpixels. However, when the contrast between adjacent objects is small, the proposed distance measure is prone to produce weak object boundaries. The reason is that the distribution of seed positions or the consistency of color features reinforces the weight of spatial metric. Additionally, the label assignment of pixels based on weighted distance measure will produce small disconnected regions isolated from the superpixel body. Unfortunately, the disconnected regions are present in abundance, which can significantly affect the connectivity and regularity of the generated superpixels.
Motivated by the above issues, we propose an adaptive superpixel segmentation with non-uniform seed initialization. The proposed algorithm can adaptively determine the number and the position of initial seeds, and is robust to the segmentation of small objects and slender regions. Moreover, we propose a seed augmentation method according to the sparsity and distribution of seeds. Fig. 1(b) and (d) show the proposed non-uniform seed initialization and corresponding superpixel segmentation results. In addition, we conduct extensive experiments on the natural and traffic scene datasets, which demonstrate that the proposed algorithm can obtain favorable results against the state-of-the-art methods.
The contributions of our work are summarized as follows.
From the perspective of the side of the circumscribed rectangle of the small object, we propose a scheme of non-uniform seed initialization based on minimum step and interval boundary gradient. The proposed scheme can adaptively allocate fewer seeds in flat regions, while allocating more seeds in object regions with sharp gradient changes.
By limiting the search region proportional to minimum step, we propose a weighted distance measure with color and spatial constraints, which reduces the computational complexity and enhances the precision of pixel label assignment.
We quantify the disconnected regions are present in abundance, and propose a post-processing method for disconnected regions based on the area of predefined small objects. The proposed method can enhance the connectivity and regularity of the generated superpixels.
The rest of the paper is organized as follows. Section II reviews the state-of-the-art methods. Section III describes the proposed superpixel segmentation algorithm in detail. The extensive experimental results and analyses are presented in Section IV. Finally, Section V concludes our work.
Related Work
The concept of superpixels is first proposed by Ren et al. [19]. According to different label assignment modes, the existing superpixel segmentation methods can be classified into clustering-based methods, graph-based methods, and deep-network-based methods [20].
A. Clustering-Based Methods
Clustering-based methods iteratively refine the labels of pixels based on the feature similarity until certain convergence conditions are met. In particular, Achanta et al. [9] proposed a simple linear iterative clustering (SLIC) algorithm, which is the most widely-used and prominent model. Although SLIC has the advantages of the controllable number of superpixels and fast speed, the boundary adherence of the generated superpixels needs to be further optimized. Moreover, SLIC is sensitive to low-contrast objects and slender regions. Therefore, many extended SLIC methods were proposed. For example, Li et al. [21] mapped each image pixel to a ten-dimension feature space, and proposed a superpixel segmentation based on linear spectral clustering (LSC). This method can obtain better boundary adherence, but it is sensitive to noises; Liu et al. proposed the manifold SLIC (mSLIC) [22] and the intrinsic manifold SLIC (imSLIC) [15]. The advantages of the two algorithms are fast computation and random seed initialization; Nagar et al. [23] used reflection symmetry to optimize the size and shape of superpixels (SymmSLIC); SNIC [24] improved the number of iterations and connectivity, which requires lesser memory; Zhao et al. [25] proposed a spherical superpixel segmentation for wide-angle images (SphSLIC); Wu et al. [26] proposed a fuzzy SLIC (FSLIC) based on neighboring spatial information and onion peeling; Uziel et al. [12] proposed a bayesian adaptive superpixel segmentation (BASS), which is not easy to control the final number of superpixels. Inspired by BASS, Zhao et al. [27] proposed an adaptive superpixel segmentation (ASS) method to automatically select the number of superpixels.
In addition, Wu et al. [20] constructed texture-aware and structure-preserving to optimize the boundary adherence of superpixels (TaSp). Nevertheless, the proposed method needs to initialize the seeds uniformly. Li et al. [28] introduced depth information into the distance measure (FCSS), which is sensitive to the position of initial seeds. Xu et al. [29] introduced contour information to achieve high-quality superpixel segmentation (RD). Zhou et al. [30] used gradient information between adjacent pixels to update the position of the initial seeds (VSSS). Gong et al. [31] introduced differential evolution and objective function to generate compact superpixels (DES). Xiao et al. [32] achieved superpixel segmentation by iteratively adjusting the weights of features (CAS). Ban et al. [33] proposed a superpixel segmentation based on Gaussian mixture model (GMM), which can initialize seed number by setting the horizontal and vertical steps. Shen et al. [34] achieved real-time segmentation by fusing density-based clustering and merging small superpixels (DBSCAN). Wang et al. [16] fused geodesic distance and energy function to produce structure-sensitive superpixels (SSS). Pan et al. [17] acquired weights of image features through the neural network (TRS). Wang et al. [35] proposed a superpixel segmentation based on edge-weighted centroidal voronoi tessellations (VCells). Bergh et al. [36] proposed a method of hill-climbing optimization (SEEDS), which tends to generate irregular superpixels. Levinshtein et al. [37] adopted geometric-flow and local image gradient to generate superpixels (TP). Moore et al. [38] added vertical and horizontal topological constraints to generate superpixel grids (Lattices). However, the superpixels generated by Lattices are irregular. Sun et al. [39] proposed a local adaptive distance (LAD), which can enhance the discrimination of low-contrast regions by adjusting the feature and spatial spaces.
In general, clustering-based methods have the advantages of the controllable number of superpixels, high computational efficiency, and better regularity. However, clustering-based methods have poor boundary adherence and segmentation accuracy. Compared with the aforementioned clustering-based methods, the distinctions of our proposed algorithm lie in: 1) Non-uniform seed initialization; 2) Adaptive determination of initial seed number and position. As a result, our proposed algorithm can distribute more seeds in small objects and slender regions. The summary of the clustering-based methods is shown in Table I.
B. Graph-Based Methods
Graph-based methods formalize superpixel segmentation as an undirected graph partitioning problem. The pixel in an image is treated as a node in the graph, while the similarity of pixels is measured by the weight between nodes. The superpixels are generated by minimizing the objective function defined over the graph. For instance, Li et al. [14] formalized superpixel segmentation as a square-wise partition, and constructed a combinatorial optimization strategy to generate superpixels (SwA). Ni et al. [40] proposed a superpixel segmentation combining dual similarity and entropy rate, which is robust to remote images with complex textures (AOS). Fu et al. [41] used spatial structure to generate regular superpixels (RPS). Felzenszwalb et al. [42] proposed a graph segmentation method based on minimum spanning tree (FH). Liu et al. [11] constructed the entropy rate of random walk and the objective function of balancing term (ERS). Nevertheless, ERS is prone to generate irregular superpixels. Vargas-Munoz et al. [43] proposed a superpixel segmentation based on the sequences of image foresting transforms (ISF), which used grid sampling strategy to locate seeds. Shen et al. [44] optimized the results of superpixel segmentation through the energy function construction (LRW). It has high computational complexity, especially when the image size or the initial seed number is large. Kang et al. [13] added dynamic nodes to the random walk model, and reduced redundant computation by limiting the walk range (DRW). It has linear computational complexity. Wang et al. [18] used the gradient of all pixels to adjust the initial seed position, and then adopted nonlocal random walk (ANRW) to generate superpixels. However, this method requires uniform seed initialization and artificial initial seed number.
Overall, graph-based methods have an advantage in boundary adherence. Compared to clustering-based methods, graph-based methods have higher complexity, and are prone to generate irregular superpixels. The summary of the graph-based methods is shown in Table II.
C. Deep-Network-Based Methods
Deep-network-based methods utilize deep neural network to capture local image features and build associations between pixels and surrounding grids. For example, Jampani et al. [45] generated superpixels (SSN) by fusing deep neural network and differentiable SLIC. It is the first end-to-end trainable network for superpixel segmentation. Pan et al. [46] proposed a fast superpixel generation algorithm that can generate superpixels with lattice topology (LT). Yang et al. [10] integrated convolutional neural network (CNN) and fullyconvolutional network to generate superpixels on a regular grid. Wang et al. [7] constructed pixel-grid level context and boundary-perceiving loss to improve the boundary adherence of superpixels (AINet). Eliasof et al. [47] proposed a different objective function for refining object boundaries (RUN). Zhu et al. [48] proposed an unsupervised CNN-based method (LNS-Net), which does not require manual labels. Tu et al. [49] used pixel affinity net to predict pixel affinities, and constructed a segmentation-aware loss to generate superpixels (SEAL-ERS).
In general, most deep-network-based methods rely on supervised information provided by pre-trained deep neural network, and require a combination of boundary optimization or subsequent iterative updates. However, the construction of associations tends to generate superpixels that align well with the whole object, which helps to capture more context information of pixels. The summary of the aforementioned deep-network-based is shown in Table III.
Proposed Algorithm
Focusing on high-precision and connectivity, we propose a superpixel segmentation algorithm with non-uniform seed initialization (NuSI). From the perspective of the side of the circumscribed rectangle of the small object, the proposed algorithm can adaptively determine the number and the position of initial seeds. Moreover, the size of the generated superpixels can be adjusted according to the gradient of the object boundary. Our proposed algorithm is a clustering-based method, which mainly includes three stages: non-uniform seed initialization, weighted distance measure, and post-processing of disconnected regions.
A. Non-Uniform Seed Initialization Based on Minimum Step and Interval Boundary Gradient
The number and the position of initial seeds are critical for the quality of superpixel segmentation. To obtain higher boundary adherence, most of the existing methods need to increase the number of initial seeds. However, the enumerated seed initialization will introduce a new hyperparameter indirectly. Moreover, the larger number of initial seeds will produce redundant information. In particular, uniform seed initialization is sensitive to noises and object boundaries, especially when the initial seeds fall outside of small objects or slender regions.
To adaptively determine the number and the position of initial seeds according to image characteristics, we propose a scheme of non-uniform seed initialization based on the side of the circumscribed rectangle of the small object and interval boundary gradient. Suppose the image size is
First, we define minimum step d, which is the side of the circumscribed rectangle of the small object. As shown in Fig. 2, the minimum step d can be adaptively determined according to the datasets or image characteristics. Note that the minimum step d can guarantee the seed distribution within the small object. Therefore, the determination of the minimum step d can provide important prior information for the adaptive and high-quality distribution of initial seeds.
Second, we calculate the number of grids for the initial seeds, and the distribution of seeds on each grid. Specifically, the number of grids Ng of initial seeds in an image is calculated:
\begin{equation*}
\text{Ng} = {\mathrm{ floor }}\left\{ {H/d,{\mathrm{ or }}W/d} \right\}, \tag{1}
\end{equation*}
In order to accurately locate the seeds on each grid, we propose a boundary detection method with interval boundary gradient. The reason is that boundary can roughly reflect the distribution and shape of objects, and then guide the number and the position of initial seeds. Moreover, we use interval gradient to represent gradient changes around pixels, which can avoid the interference of noises. Based on the normalized LAB color components, the boundary point Bp is defined as:
\begin{align*}
Bp{\mathrm{ }} =& \left\{ {\begin{array}{ll} i,& {\mathrm{ if }}{{D}_{\mathrm{a}}} \geq {{\mathrm{T}}_L} \& amp; {{D}_I} \geq {{\mathrm{T}}_L}\\
0, & \text{else} \end{array}} \right. \tag{2}\\
{{D}_{\mathrm{a}}} =& {\mathrm{ }} \min{\mathrm{ }}\{ f({{\mathrm{L}}_{i + 1}}) - f({{L}_i}),{\mathrm{ }}f({{A}_{i + 1}}) \\
& -f({{A}_i}),{\mathrm{ }}f({{B}_{i + 1}}) - f({{B}_i})\} \tag{3}\\
{{D}_I} =& {\mathrm{ }} \min{\mathrm{ }}\{ f({{\mathrm{L}}_{i + 2}}) - f({{L}_i}),{\mathrm{ }}f({{A}_{i + 2}}) \\
& - f({{A}_i}),{\mathrm{ }}f({{B}_{i + 2}}) - f({{B}_i})\} \tag{4}
\end{align*}
Based on the boundary points calculated in (2), we calculate the position coordinates of the neighboring boundary points on each grid. Then, we define the center of the adjacent boundary points as the position of the initial seeds. Although the boundary determination relying only on the color space can accurately locate the boundary, it may lead to fewer seeds on the grid with large boundary distances. Therefore, we construct a seed augmentation method for grids with sparse seed distributions, which can enhance the regularity of superpixels.
For grids without seeds, we add seeds in units of
\begin{align*}
\mathrm{S}(\mathrm{i}) =& S(y) + w * \mathrm{E}t,{\mathrm{ }}w = 1,2,\ldots,t \tag{5}\\
\text{Et} =& round{\mathrm{ }}({{D}_w}/(t + 1)) \tag{6}\\
{{D}_w} =& \left| {S(\mathrm{y} + 1) - \mathrm{S}(y)} \right|, \tag{7}\\
t =& floor{\mathrm{ }}({{D}_w}/(kl * d)), \tag{8}
\end{align*}
Finally, combining LAB color components, texture feature [50], and seed coordinates, we construct a 6-dimension seed metrics
B. Distance Measure With Search Region and Feature Constraints
Distance measure is another factor for high-precision superpixel segmentation. However, the label assignment of pixels based on weighted distance measure is sensitive to the position of initial seeds, and tends to generate under-segmentation superpixels. Moreover, efficient segmentation of low-contrast regions is also crucial for high-precision superpixel segmentation. Therefore, we propose a distance measure with search region and feature constraints.
First, we limit the search region proportional to
Second, to distinguish low-contrast regions, we introduce constraints on color and spatial distance. The proposed constraints can effectively ensure the boundary adherence of superpixels. Specifically, the constrained color distance
\begin{align*}
D{{^{\prime}}_c} =& {\mathrm{ }}\max {\mathrm{ }}\left\{ {\left| {{{l}_i} - {{l}_s}} \right|,\left| {{{a}_i} - {{a}_s}} \right|,\left| {{{b}_i} - {{b}_s}} \right|} \right\} \tag{9}\\
D{{^{\prime}}_p} =& {\mathrm{ }}\max {\mathrm{ }}\left\{ {\left| {{{x}_i} - {{x}_s}} \right|,\left| {{{y}_i} - {{y}_s}} \right|} \right\} \tag{10}
\end{align*}
If
\begin{align*}
{{D}_c} =& \sqrt {{{{({{l}_i} - {{l}_s})}}^2} + {{{({{a}_i} - {{a}_s})}}^2} + {{{({{b}_i} - {{b}_s})}}^2}} \tag{11}\\
{{D}_p} =& \sqrt {{{{({{x}_i} - {{x}_s})}}^2} + {{{({{y}_i} - {{y}_s})}}^2}} \tag{12}\\
{{D}_{\mathrm{t}}} =& \sqrt {{{{({{\mathrm{t}}_i} - {{t}_s})}}^2}} \tag{13}\\
D =& \sqrt {{{{({{D}_c})}}^2} + {{{(\frac{\alpha }{d}{{D}_p})}}^2} + \beta {{{({{D}_{\mathrm{t}}})}}^2}} \tag{14}\\
Label(i) =& {\mathrm{ }}\arg \min {\mathrm{ }}\left\{ {D(s)} \right\} \tag{15}
\end{align*}
Finally, we assign seed labels to each pixel, and update the 6-dimension features for each seed. In addition, we adopt an iterative approach to adjust the position of seeds, which can enhance the regularity of superpixels.
C. Post-Processing of Disconnected Regions
Disconnected regions are defined as isolated regions that are not connected to the superpixel body. The reason for this phenomenon is that there is no seed inside the disconnected region, or the spatial distance between the pixel and the surrounding seeds is large. In addition, the limit of the search region is also a factor for the generation of disconnected regions. Some examples of disconnected regions are shown in Fig. 4(b).
To enhance the connectivity of superpixels, most existing methods [31], [9] roughly merge these disconnected regions into the nearest neighbor superpixels. However, the forced merging relying only on adjacency tends to generate under-segmentation superpixels. In particular, some small objects with large contrast are coarsely merged into adjacent superpixels. Fortunately, the proposed non-uniform seed initialization is an adaptive seed initialization scheme, which can reduce disconnected pixels with flat regions. Moreover, we introduce the constraint of spatial distance in weighted distance measure, which can also reduce pixels being assigned to seed labels with larger spatial distance. Nevertheless, the proposed non-uniform seed initialization tends to aggregate more seeds in regions where the boundary gradient changes sharply, which may lead to the increase of disconnected regions. Therefore, we propose a post-processing method for disconnected regions based on the area of small objects defined by the minimum step d.
First, we calculate the connectivity of each superpixel. After removing the superpixel body, we calculate the position and the number of disconnected regions. Second, we calculate the area
\begin{equation*}
Label(\text{Dr}) = {{\mathrm{N}}_S} * i + k \tag{16}
\end{equation*}
For the disconnected region with small area, we calculate the superpixel labels corresponding to the disconnected pixels in 8-neighborhood (excluding the original superpixel label of disconnected pixels). Then, the L-component gradient Ddp between the disconnected pixel and adjacent seeds are calculated:
\begin{align*}
{{D}_{\text{dp}}}(s) =& \left| {\mathrm{L}(j) - {{L}_s}(j)} \right| \tag{17}\\
Label(j) =& \arg \min {\mathrm{ }}\left\{ {{{D}_{\text{dp}}}(s)} \right\} \tag{18}
\end{align*}
Finally, we complete the label re-assignment of each disconnected pixel, and the post-processing of disconnected regions. The summary of the proposed superpixel segmentation algorithm is shown in Algorithm 1.
Algorithm 1: Adaptive Superpixel Segmentation With Non-Uniform Seed Initialization (NuSI).
Input: Input image I, minimum step d, gradient threshold TL, constant kl, proportional parameter v, color threshold Tc, spatial threshold Tp, normalization parameter α, β, iteration number Iter, and scale parameter r.
Output: Superpixel label for each pixel.
Initialize the label
Initialize the distance
Set the minimum step d, and calculate the number of grids Ng.
Boundary detection with interval boundary gradient.
Initialize seed position and add seeds.
Limit search region and extract pixel features.
for each iteration do
Calculate the constrained color and spatial distance,
if
Calculate the distance measure D,
if
end
end
Update seed features.
end
Calculate the connectivity of each superpixel.
Calculate the area of each disconnected region.
if
Assign new superpixel label to disconnected regions,
else
Assign superpixel label with the L-component gradient.
end
D. Complexity
Suppose the image size is
Experiments
To verify the effectiveness of our proposed superpixel segmentation algorithm, we first give the widely-used datasets and evaluation metrics. Second, the effectiveness of non-uniform seed initialization and post-processing of disconnected regions are verified. After that, the performance of our proposed algorithm is demonstrated compared with the state-of-the-art methods qualitatively and quantitatively. Finally, the application verification of superpixel segmentation algorithm is presented.
We implemented our proposed algorithm on MATLAB R2018a with Intel Xeon Silver 4210 2.10 GHz CPU and 32GB RAM. In addition, the configurations required to run the deep-network-based methods are: 64bit-Ubuntu 20.04 operating system, 10 GB GeForce RTX 2080 Ti GPU, Pytorch 1.7.1 deep learning framework, Python 3.8 data processing. Through extensive experimental verification, the parameter ranges of our proposed algorithm can be given empirically:
A. Datasets and Evaluation Metrics
We conduct comparison experiments on Berkeley Segmentation Dataset (BSDS) and Cambridge-driving Labeled Video Database (CamVid). BSDS [51] is composed of 500 outdoor images of natural scenes, which is the most widely-used superpixel segmentation evaluation dataset. The resolution of the image is 321×481 or 481×321. Each image contains 5 manually annotated ground-truths. Nevertheless, some boundary annotations of ground-truths are not accurate.
CamVid [52] is a traffic scene dataset consisting of 701 images, which provides 32 ground-truth labels. The resolution of the image is 360×480. The images of CamVid are taken from the perspective of a driving car, which contains multiple types of objects, complex backgrounds, and different lighting conditions. The ground-truth of each image contains some small objects and noises, which exacerbate the difficulty of superpixel segmentation.
Three widely-used evaluation metrics in the field of superpixel segmentation are selected: under-segmentation error (UE), boundary recall (BR), and achievable segmentation accuracy (ASA).
1) Under-segmentation Error (UE) [53]: It is a metric of boundary adherence, which measures the fraction of pixel leaks across the boundary of ground-truth. The smaller the UE, the better segmentation performance is achieved. It penalizes superpixels that cross multiple objects, which is defined as:
\begin{equation*}
UE{\mathrm{(}}G,S{\mathrm{)}} = \frac{1}{N}\sum\limits_{{{G}_i}} {\sum\limits_{{{S}_j} \cap {{G}_i} \ne \Phi } {\min \left\{ {\left| {{{S}_j} \cap {{G}_i}} \right|,\left| {{{S}_j} - {{G}_i}} \right|} \right\}} } \tag{19}
\end{equation*}
2) Boundary Recall (BR) [54]: It measures the percentage of the superpixel boundary and the true boundary of the object. A higher BR indicates that more ground-truth boundaries are found. The BR is defined as:
\begin{equation*}
BR{\mathrm{(}}s{\mathrm{)}} = \frac{{\sum\nolimits_{p \in B{\mathrm{(}}g{\mathrm{)}}} {I\left[ {{{{\min }}_{q \in B{\mathrm{(}}s{\mathrm{)}}}}\left\| {p - q} \right\| < \varepsilon } \right]} }}{{\left| {B{\mathrm{(}}g{\mathrm{)}}} \right|}} \tag{20}
\end{equation*}
3) Achievable Segmentation Accuracy (ASA) [53]: It measures the maximum coincidence between the region where the superpixel is located and the ground-truth region. It reflects the upper bound of the region segmentation accuracy that can be achieved. ASA is defined as:
\begin{equation*}
ASA{\mathrm{(}}G,S{\mathrm{)}} = \frac{1}{N}\sum\limits_{{{S}_j}} {\mathop {\max }\limits_{{{G}_i}} } \left\{ {\left| {{{S}_j} \cap {{G}_i}} \right|} \right\} \tag{21}
\end{equation*}
B. Ablation Experiments
1) Effectiveness of Non-uniform Seed Initialization: In this subsection, we will verify the proposed scheme of non-uniform seed initialization can not only adaptively generate the number of initial seeds, but also adaptively adjust the position of initial seeds according to the minimum step d and interval boundary gradient.
First, under the conditions of d = 10 & TL = 0.01, d = 20 & TL= 0.01, and d = 20 & TL= 0.02, the numbers of initial seeds corresponding to each image on BSDS dataset are verified. As shown in Fig. 5, under different d and TL conditions, the number of initial seeds on BSDS dataset are mostly distributed between [50, 1500], which is consistent with the range of the enumerated number of initial seeds. Importantly, the initial seed number can be adaptively adjusted according to the image characteristics. Moreover, the larger minimum step d is, the fewer grids are traversed and the fewer seeds are distributed. Furthermore, as TL increased, fewer initial seeds are detected. The reason is that the larger TL, the pixels with small contrast may be judged as the same object. However, the number of initial seeds for individual images also presented a larger case. The reason is that there are more independent small regions in the image, and more boundary points are detected.
To verify the effectiveness of the proposed non-uniform seed initialization intuitively, we presented the distribution of boundary points. As can be observed in Fig. 6(a), the boundary points are detected with high precision. Obviously, the object regions with sharp gradient changes are assigned more boundary points. It verified the method of interval boundary gradient can accurately locate the boundary of objects. Furthermore, under the same initial seed number, we gave the distributions of uniform and non-uniform seed initialization. As shown in Fig. 6(b) and (c), uniform seed initialization has strong randomness, which makes it difficult to ensure that the seeds are distributed within the objects. In contrast, non-uniform seed initialization distributed fewer seeds in regions with flat density. Meanwhile, more seeds are distributed to regions with large feature differences. In particular, non-uniform seed initialization ensured seeds cover the small objects and slender regions.
2) Effectiveness of Weighted Distance Measure: An effective distance measure is essential to obtain high boundary adherence and regular superpixels. In the distance measure section, three main hyperparameters are included: proportional parameter v, color threshold Tc, and spatial threshold Tp.
As can be observed in Table IV, the larger the proportional parameter v is, the running time of the proposed algorithm increases significantly and the performance of the proposed algorithm decreases. The reason is that the proportional parameter v determines the number of seeds around the pixel, which results in more seeds being traversed. As a result, more distant pixels are assigned within the same superpixel. Second, the introduction of color threshold Tc and spatial threshold Tp improved the performance of superpixel segmentation, but increased the running time of the algorithm. In addition, under the conditions of v = 4, Tc = 40, and Tp = 80, our proposed algorithm achieved the best experimental results. It reflected that a large search is not necessary for superpixel segmentation algorithms. Notably, it proved again that the introduction of color threshold Tc and spatial threshold Tp is useful for achieving high-precision superpixel segmentation. Bold values show the best performance.
3) Effectiveness of Post-processing of Disconnected Regions: Quantity detection and efficient processing of disconnected regions are crucial for high-quality superpixel segmentation. First, the number of disconnected regions and disconnected pixels corresponding to each image on BSDS and CamVid datasets are calculated. As illustrated in Fig. 7, the images on BSDS and CamVid datasets generally have disconnected regions (Dr) and disconnected pixels (Dp), and the numbers of disconnected regions and disconnected pixels are large. Moreover, the images on BSDS and CamVid datasets correspond to different numbers of disconnected regions and disconnected pixels, which is related to the image characteristics. Therefore, it is necessary for post-processing of disconnected regions.
The number of disconnected regions (Dr) and disconnected pixels (Dp) on BSDS and CamVid datasets.
To further verify the ubiquity of disconnected regions, we gave the number of disconnected regions and disconnected pixels generated by uniform and non-uniform seed initialization, respectively. For objective comparison, we represent the result of uniform seed initialization with the number of disconnected regions and disconnected pixels generated by SLIC [9]. Meanwhile, with the same initial seed number, we do not introduce color and spatial constraints in SLIC. Also, the adaptive method TRS [17] for non-uniform seed initialization was chosen as the comparison algorithm. From Table V, we can see that SLIC and TRS also generated a large number of disconnected regions and disconnected pixels, which further proved the disconnected regions and disconnected pixels are ubiquitous in clustering-based methods. The main reason is that the distribution of seed positions and the smaller spatial distance exacerbate the generation of disconnected regions and disconnected pixels. Furthermore, the number of disconnected regions and disconnected pixels generated by the non-uniform seed initialization methods (i.e., our proposed algorithm and TRS) are strongly influenced by the minimum step d and threshold parameter TL. Compared with uniform seed initialization, non-uniform seed initialization can also generate fewer disconnected regions and disconnected pixels. The reason is that more initial seeds are distributed inside the small disconnected regions.
To handle disconnected regions and disconnected pixels, we propose a scheme based on the area of predefined small objects. The visual comparison results of the proposed scheme before and after processing are verified. According to Fig. 8, there are a large number of disconnected regions and disconnected pixels, which seriously affected the connectivity and regularity of superpixels. Moreover, the proposed post-processing scheme based on the area of predefined small objects significantly reduced the number of disconnected regions and disconnected pixels, and generated regular superpixels.
The scale parameter r determines the area of predefined small objects. Therefore, we quantitatively gave the effect of different scale parameters r on the segmentation results. Table VI presented the three evaluation metrics improved as the cale parameter r increased. The reason for this phenomenon is that the larger scale parameter r divides more small objects into individual superpixels, which ensures the presence of small objects and the accuracy of the boundary segmentation as much as possible. Moreover, as the scale parameter r gradually increases, this influence will become weaker.
C. Comparison to the State-of-the-Art Methods
To comprehensively evaluate the performance of our proposed algorithm (NuSI), we compare it with the following state-of-the-art methods, i.e., SLIC [9], LSC [21], SNIC [24], LRW [44], ERS [11], FCN [10], RUN [47], DBSCAN [34], and BASS [12]. Among them, SLIC, LSC, and SNIC are representative methods based on clustering, LRW and ERS are representative methods based on graph, FCN and RUN are representative methods based on deep networks. All comparison algorithms are implemented with the original source code provided by the authors or code library [53], [54].
1) Visual Comparison: To visually verify the effectiveness of our proposed algorithm, we will give some segmentation examples on BSDS and CamVid datasets. To ensure the objectivity of the comparison results, all visual segmentation results of the comparison algorithms on BSDS and CamVid datasets are performed at the same initial number of superpixels.
As shown in Fig. 9, our proposed algorithm (NuSI) has a significant advantage in segmenting small objects as well as slender regions. This is attributed to the proposed non-uniform seed initialization method, which enables the seed points to fall precisely inside small objects and slender regions. Second, the contours of the superpixels generated by our proposed algorithm (NuSI) are smoother and finer, especially for the small objects and slender regions. Particularly, our proposed algorithm allocated more seeds in regions with sharp gradient changes, which proved that the scheme of non-uniform seed initialization can adaptively distribute the position of seeds. In addition, BASS also obtained better experimental results, which noted that the adaptive approach (i.e., our proposed algorithm and BASS) is state-of-the-art in terms of visual comparison results. Under the same number of initial seeds, all comparison algorithms can obtain better segmentation results. However, the method of uniform seed initialization is susceptible to the randomness of seed distribution, and it is difficult to ensure accurate segmentation of small objects and slender regions. Furthermore, our proposed algorithm is able to generate more regular superpixels and fewer under-segmentation superpixels. In general, except for the ERS, most of the state-of-the-art methods generated regular superpixels.
2) Quantitative Comparison: In this subsection, we compared the proposed algorithm with the state-of-the-art methods quantitatively. First, under the adaptive number of initial seeds, the performance comparisons between our proposed algorithm and the state-of-the-art methods on BSDS and CamVid datasets are given, which are shown in Tables VII and VIII, respectively. (Parameters:
It can be seen from Table VII that our proposed algorithm obtained the best UE and BR compared with the state-of-the- art methods on BSDS dataset. In particular, our proposed algorithm has a significant advantage over the comparison algorithms in terms of under-segmentation error. This trait benefits from the proposed scheme of non-uniform seed initialization can locate more effective seeds. As a result, more small objects and slender regions are segmented into independent superpixels. Although BASS achieved the best ASA on BSDS dataset, our proposed algorithm also obtained the competitive ASA.
Table VIII shows that ERS and our proposed algorithm achieved the best segmentation performance under the three quantitative metrics on CamVid dataset. The lowest UE further proved that the superpixels generated by our proposedalgorithm do not cross multiple objects. ERS (i.e., graph-based method) achieved the best BR and ASA, which reflected that pixel-by-pixel traversal helps to obtain superpixels with higher boundary adherence. Moreover, compared to FCN and RUN, our proposed unsupervised algorithm still has advantages in segmentation performance on CamVid dataset. It proved that unsupervised superpixel segmentation methods can achieve the best segmentation performance in specific application scenarios. In addition, the experimental results on CamVid dataset are generally lower compared to the BSDS dataset. The reason is that CamVid dataset has complex scenes and contains multiple types of objects.
On the other hand, to verify the robustness of our proposed algorithm, we gave the variation interval of the segmentation accuracy on BSDS and CamVid datasets. (Parameters:
Finally, we calculated the average running time of each image on BSDS testset between our proposed algorithm and comparison algorithms. As shown in Table IX, DBSCAN has a significant advantage in terms of running time, which is due to the fast pixel label assignment by density-based clustering and C++-based programming environment. Moreover, our proposed algorithm is only shorter than the LRW method. The reason is that the extraction of grid boundaries during the non-uniform seed initialization and the re-assignment of disconnected pixels both require a certain amount of computational time. In addition, the running time of our proposed algorithm is related to the minimum step d and the gradient threshold TL. It indirectly proved that the smaller the initial number of superpixels, the shorter the running time of our proposed algorithm.
D. Application Verification
Semantic segmentation of traffic scene is one of the application fields of superpixel segmentation. To verify the applicability of our proposed algorithm, we used superpixel segmentation to refine the results of semantic segmentation on CamVid dataset. The representative semantic segmentation algorithms FCNSS [55] and DDR [56] are selected. As can be observed in Fig. 11, our proposed algorithm is able to maintain the boundaries of small objects and slender regions for semantic segmentation on CamVid dataset. In addition, our proposed algorithm can also erase noisy regions, which in turn improved the connectivity and accuracy of semantic segmentation.
Visual Comparison of semantic segmentation results with superpixels on CamVid dataset.
To provide a comprehensive quantitative analysis, we presented the refinement results of different superpixel segmentation algorithms for semantic segmentation under four evaluation metrics (i.e., UE, BR, ASA, and mIoU [56]). From Table X, the results of semantic segmentation are both improved by the refinement of SLIC and our proposed algorithm. Compared to SLIC, our proposed algorithm achieved higher performance improvement, which proved that our proposed algorithm has more advantageous in accurately segmenting objects. Finally, under all evaluation metrics, our proposed algorithm achieved the best segmentation performance. This further proved the applicability and robustness of our proposed algorithm on different semantic segmentation results.
Conclusion
Superpixel segmentation can provide fine-grained raw boundaries and few atomic regions for subsequent computer vision applications. By mining image characteristics, we propose a clustering-based superpixel segmentation algorithm with linear computational complexity. First, we propose a scheme of non-uniform seed initialization, which can adaptively determine the number and the position of initial seeds. Moreover, we quantify the disconnected regions and disconnected pixels are ubiquitous in clustering-based methods. Experiments on the nature and traffic scene datasets demonstrate that the proposed scheme of non-uniform seed initialization is effective, and our proposed superpixel segmentation algorithm can obtain outstanding performance compared with the state-of-the-art methods. Finally, in the traffic scene, we demonstrated the applicability of our proposed algorithm by applying superpixel segmentation to refine the results of semantic segmentation.