I. Introduction
Existing multioutput learning mainly aims to predict multiple outputs for a given input. In many cases, the output has a structure that can be exploited when training models, e.g., sequences, strings, trees, lattices, or graphs. To infer structured outputs from an observation sequence rather than from a single data point, label sequence learning, also called sequence labeling, has been widely studied; here the output sequence has inherent interconnections among its elements and is not a simple concatenation of individual units. Sequence labeling is also an important step in most natural language processing (NLP) applications and has been applied to numerous real-world tasks, including but not limited to part-of-speech (POS) tagging [1], named entity recognition (NER) [2], [3], and speech recognition [4].