Journals & Magazines >IEEE Access >Volume: 9

A Two-Stage Biomedical Event Trigger Detection Method Based on Hybrid Neural Network and Sentence Embeddings

The framework of the two-stage biomedical event trigger detection method based on hybrid neural network and sentence embeddings.

Abstract:

Biomedical event extraction is a challenging task in biomedical text mining, which plays an important role in improving biomedical research and disease prevention. As the...Show More

Metadata

Abstract:

Biomedical event extraction is a challenging task in biomedical text mining, which plays an important role in improving biomedical research and disease prevention. As the crucial and prerequisite step in event extraction, biomedical trigger detection has attracted much attention. Previous approaches usually depended on feature engineering with unbalanced data. In this paper, we propose a two-stage method based on hybrid neural network for trigger detection, which divides trigger detection into recognition stage and classification stage. In the first stage, we build a BiLSTM based recognition model integrating attention mechanism (Att-BiLSTM). In the second stage, the classification model based on Passive-Aggressive online algorithm is constructed. Furthermore, to enrich sentence-level features, we establish sentence embeddings and add reading gate. On the multi-level event extraction (MLEE) corpus test dataset, our method achieves an F-score of 80.26%, which achieves the state-of-the-art systems.

The framework of the two-stage biomedical event trigger detection method based on hybrid neural network and sentence embeddings.

Published in: IEEE Access ( Volume: 9)

Page(s): 81926 - 81935

Date of Publication: 03 June 2021

Electronic ISSN: 2169-3536

DOI: 10.1109/ACCESS.2021.3085992

Funding Agency:

Contents

CCBY - IEEE is not the copyright holder of this material. Please follow the instructions via https://creativecommons.org/licenses/by/4.0/ to obtain full-text articles and stipulations in the API documentation.

SECTION I.

Introduction

With the development of Internet, the resources are expanding at an exponential speed, especially in the biomedical domain, which has made it harder than ever for scientists to research and extract knowledge from the vast unstructured biomedical scientific literature. Therefore, biomedical information extraction techniques arise and develop rapidly. Biomedical information extraction mainly studies how to extract useful information from a large number of biomedical literatures automatically. Event extraction, which is an effective way to represent the structured knowledge from unstructured text, is a fundamental technology for information extraction. As the prerequisite step of biomedical event extraction, biomedical event trigger identification has received extensive attention. Biomedical event extraction can help biomedical scientists to do research conveniently, and provide inspiration and basis for the diagnosis, prevention, treatment and new drug research. Also, there are many useful applications for biomedical event task, such as domain search engine [1], pathway curtain [2], medical research [3] and so on. Meanwhile, many evaluation tasks have been organized for providing novel methods of biomedical event extraction tasks, such as BioNLP 2009 [4], BioNLP 2011 [5], BioNLP 2013 [6], and BioNLP 2016 [7].

As described in BioNLP 2009 share task, a biomedical event includes an event trigger and one or more arguments. Event trigger, especially a word or a phrase, typically signifies the occurrence of a biomedical event. Generally, event triggers are verbs or gerunds, and arguments are often biomedical entities or other events. For example, in the sentence fragment of Figure 1, there are three biomedical events: E1, E2 and E3. E1 is a Gene_expression type event signaled by the trigger word “production”, and “PROTE-10” is the Theme type argument. E2 is a Positive_regulation type event, including a trigger word “induction”, a Cause argument “Cdc41”, and a Theme type argument which is a nested event E1; E3 is a Negative_regulation type event, which consists of a trigger word “prevented” and a Theme type argument E2. The structures of the three events are as follows:

FIGURE 1.

An example of biomedical event in a fragment of text.

Show All

Event E1 (Type: Gene_expression, Trigger: production, Theme: PROTE-10);

Event E2 (Type: Positive_regulation, Trigger: induction, Theme: E1, Cause: Cdc41);

Event E3 (Type: Negative_regulation, Trigger: prevented, Theme: E2).

As we know, there are some challenges for biomedical event trigger task. One is the most current methods [8]–[11], [36] usually rely on the artificial designed features and need complex natural language processing (NLP) tools with poor generalizability, which lack of reusability and timeliness. Hence, high level features for the semantic information based on the deep neural networks are needed [12]–[18]. The other is that the dataset is small and sparse. Especially, biomedical event trigger identification is a multi-classification task of 19 categories, which sub class has sparse data. Therefore, it needs the effective method to improve the classification performance. Thus, we propose a hybrid structure of BiLSTM and Passive-Aggressive (PA) online algorithm. The BiLSTM retains information over long distances of both directions in time successfully [30]. The PA online algorithm is a family of margin based online algorithms for binary and multiclass categorization, which has been verified the effect on biomedical event extraction [31]. Furthermore, to filter out the irrelevant noise and find the important units in the input sequence which are influential for the output, we integrate attention mechanism in the proposed architecture.

In addition, most of the state-of-the-art trigger detection methods are based on one-stage model. That is, the recognition and classification are integrated into one task. One stage methods classify the instances into a negative instance or the other 19 classes directly. The negative instances with large proportion may be classified incorrectly. To further improve the performance, we propose a two-stage method, which divides trigger detection into recognition stage and classification stage. The two-stage method decomposes a complex problem into two simple problems, which reduces the difficulty of the problem; at the same time, it alleviates the data imbalance. In the first stage, triggers in biomedical literatures are distinguished from non-triggers without identifying their types; then, in the second stage, the correct trigger types are determined. Since a large number of negative examples are filtered, the recognition performance can be improved. We build a BiLSTM model integrating attention mechanism (Att-BiLSTM) for binary classification in the first stage. Meanwhile, considering the dataset is small after filtering negative instances, PA online algorithm is adopted for multi-classification in the second stage. Moreover, we construct the sentence embeddings, which can enrich sentence-level features. Based on the above, our two-stage event trigger detection method based on hybrid neural network and sentence embeddings is expected to further improve the performance of trigger detection. Experimental results show that our method can achieve an F-score of 80.26% on the MLEE corpus [8], which outperforms the state-of-the-art performance.

To sum up, the contributions of our method are listed as follows:

We present a novel effective hybrid machine network for precisely identifying event triggers, including attention based bidirectional long short term memory (Att-BiLSTM) and Passive-Aggressive (PA) online algorithm.
We utilize two-stage method to combine the hybrid machine network. The two-stage method divides trigger detection into two subtasks: trigger recognition and classification, which can alleviate the problem of class imbalance effectively.
We build the sentence embeddings, calculated by the pre-trained word embeddings and the supplementary fine-tuned word embeddings with the training process. The sentence embeddings can get the entire sentence feature representation, and enrich the input representation of data.

The rest part of this paper is organized as follows: Section 2 analyzes related work on biomedical event trigger detection. Section 3 describes the details of our proposed method. Then, experimental results and analyses are illustrated in Section 4. Section 5 gives the discussion. Finally, conclusion and future work are shown in Section 6.

SECTION II.

Related Work

In the past years, many methods have been proposed for biomedical event trigger detection. Rule-based methods are firstly proposed to deal with the limited resources of annotated texts [19], [20]. These methods focus on the definition of a set of extraction rules, and commonly employ pattern recognition techniques to generate word patterns with pre-defined dictionary. The rule-based methods are time consuming and require a lot of domain knowledge. Meanwhile, it is difficult to cover all types of trigger pattern.

Machine learning methods treat trigger detection as a multi-class classification task. On the MLEE corpus, Pyysalo et al. [8] employed a SVM-based approach. They designed salient lexical and local context features manually, including whether a word has a capital letter or a number and so on, and fed them into a one-versus-the-rest SVM classifier to detect event trigger. Zhou and Zhong [10] presented a semi-supervised learning framework to identify the trigger based on hidden topics. In this framework, the sentences in the un-annotated corpus are elaborately and automatically assigned with event annotations based on their distances to these sentences. Zhou et al. [11] learned biomedical domain knowledge from a large text corpus built from Medline and embedded it into word features. The embedded features were then combined with syntactic and semantic context features using the multiple kernel learning method. Our previous work [36] proposed a two-stage trigger detection method based on SVM classifier, which designed rich features and integrated feature selection. For the biomedical event extraction evaluation tasks, pipeline-based event extraction systems confirm the feasibility on many datasets, such as TEES [21], EventMine [22] and so on. The above methods rely on the hand-crafted features which are time consuming. Also, different features are needed to tailor for different task, thus not making them generalizable.

On the other hand, various neural networks have been applied into biomedical event extraction task successfully in recent years. Wang et al. [12] employed a neural network architecture to learn significant feature representation based on dependency relation tree, and dynamically adjusted the embeddings while training for adapting to the trigger classification task. Nie et al. [13] proposed a word EANNP architecture to conduct event identification. Wang et al. [14] employed CNN to exploit higher-level features automatically for trigger identification. They kept one candidate trigger along with $N$ -words around it and entity mention features as a raw input, giving up complex input with hand-designed features. Although these methods can effectively alleviate the problem of manually extracting features, however, most of them rely on local sentence representation features only within a window, which may be insufficient for trigger identification. To capture the semantic information of long-distance words, RNN are employed in trigger identification recently. Rahul et al. [15] proposed a Bidirectional RNN based method to extract the higher level features across the sentence. They employed bidirectional LSTM (F-score is 78.71%) and bidirectional GRU (F-score is 79.11%) to identify triggers respectively. To obtain more syntactic information and capture the most crucial semantic information, our previous works [16] trained dependency-based word embeddings and integrated attention mechanism based on the BiLSTM neural network. [17] constructed sentence vector and proposed the multi-level attention. However, the class imbalance problem needs to be further solved.

In recent years, to take advantage of traditional machine methods on small datasets, a scheme to combine these methods with neural networks emerges. Ebert et al. [23] proposed a method which combines CNN and SVM for the emotional classification on Twitter posts. Huang et al. [24] employed LSTM and SVM for drug-drug interaction (DDI) extraction, and the experimental results demonstrated the effectiveness of the proposed approach. To incorporate the advantages of SVM and CNN, Ahlawat and Choudhary [25] adopted a hybrid model for handwritten digit recognition.

The above advanced one-stage methods have their notable advantages for trigger detection. However, the impact of data imbalance on the results is not considered, thus the improvements in small dataset is limited. As [26] mentioned, the problem of data balance depends on the ratio of the number of instances in the minority category to the number of instances in the majority category in the data. On the commonly used event extraction dataset MLEE, there are 14964 negative instances and 1756 positive instances, where the class imbalance is serious. Accordingly, we propose a two-stage method to alleviate the problem of class imbalance. Furthermore, to take the advantage of the hybrid model, we employ BiLSTM in feature extraction, and utilize the effectiveness of PA online algorithm in deal with small datasets. In addition, sentence embeddings is constructed to supplement global information.

SECTION III.

Methodology

Figure 2 is our framework of trigger detection, which consists of two parts: the input representation of the data, and the trigger detection model based on hybrid neural network integrating two-stage method.

FIGURE 2.

Our framework of trigger detection.

Show All

In the first part, the original texts will be converted to the corresponding pre-trained embeddings as an input. The fine-tuned embeddings with training are another input. In addition, the sentence-level embeddings generated by the pre-trained word embeddings and fine-tuned word embeddings are the supplemental input information. The three kinds of inputs will be sent to the second part for detection. In the second part, trigger detection is divided into two stages, namely trigger recognition and trigger classification. In the first stage, the candidate word is judged whether it is a trigger word, not determining its specific type. Then the predicted positive instances of the first stage will be sent to the second stage. In this stage, the predicted positive instances are classified to specific categories. In the first stage, we build an Att-BiLSTM model for the binary-class classification. In the second stage, we use Att-BiLSTM and PA algorithm to classify the predicted trigger words respectively, and the experimental results verify the effectiveness of the PA algorithm on small dataset. Finally, the negative prediction instances filtered out in the first stage are added back to form the complete prediction result for the final performance evaluation.

A. Input and Representation of Data

1) Dependency-Based Word Embeddings

In recent years, word embeddings have been widely used in information processing. Different from other NLP tasks, biomedical event extraction needs more information in dependency contexts than in linear contexts [27]. Therefore, we employ Word2vecf [28] to train dependency-based word embeddings as feature representation, which can yield more focused embeddings, capturing more functional and less topical similarity.

In this work, we download 5.7G PubMed abstracts, and parse them with Gdep parser, which is a dependency parse tool specialized for biomedical texts. Then, we employ Word2vecf to train dependency-based word embeddings with the word contexts in syntactic relations derived in the previous step.

2) Sentence Embeddings

There is a strong association among events appearing in a sentence, therefore, the global information of the sentence is critical to biomedical event extraction. However, the sentence-level features may be ignored only using the word-level embeddings. Thus, we construct sentence embeddings as supplementary inputs in our BiLSTM architecture.

The previous works [16], [29] have validated the effectiveness of sentence embeddings on biomedical event extraction and biomedical named entity recognition (NER). With similar approach, two different kinds of word embeddings in the whole training process are employed. One is the pre-trained dependency-based word embeddings $x_{t}$ , the other is retained word embeddings $x_{t}$ ’, the initial value of $x_{t}$ ’ is the same as $x_{t}$ . However, $x_{t}$ ’ is fine-tuned with training process. The pre-trained dependency-based word embeddings can obtain the potential feature information from large scale unlabeled corpus, while, the fine-tuned word embeddings contain richer information associated with the biomedical event trigger. In order to take advantage of both word embeddings, we construct the sentence embeddings based on our LSTM framework. We have four ways to obtain the sentence embeddings: on the one hand, considering the fine-tuning and pre-trained word embeddings of the whole sentence; on the other hand, calculating the difference and sum of the above two word embeddings. Then, we compute the sentence embeddings by averaging and maximizing the word embeddings. Finally, we construct the sentence embeddings by averaging the difference of the two kinds of word embeddings, which has the best performance.

Figure 3 gives the memory cell of the LSTM integrating sentence embeddings. As (1) shown, the sentence vector $d_{0}$ is generated by averaging the differences of all the words’ two word embeddings in a sentence, and $n$ is the length of the sentence. In addition, a reading gate $r_{t} \in \left [{ {0,1} }\right]^{n}$ is added to control what information should be retrained for future time steps. Experimental results show that the sentence embeddings can improve the performance of our trigger identification significantly.\begin{equation*} d_{0} =\frac {1}{n}\left({\sum \limits _{t=1}^{T} ({x}'_{t} -x_{t})}\right)\tag{1}\end{equation*} View Source

FIGURE 3.

Memory cell of BLSTM integrating sentence embeddings.

Show All

B. Two-Stage Method Based on Hybrid Neural Network

1) Recognition Stage

In this stage, the event triggers in biomedical literatures are distinguished from non-triggers without identifying their types. We build a BiLSTM neural network for binary-class classification and integrate attention mechanism to focus on the key words in the sentence. Also, the sentence embeddings are constructed to obtain the sentence-level features.

a: BiLSTM Integrating Sentence Embeddings

BiLSTM networks process the data in both directions with two separate hidden layers, which are then fed forwards to the same output layer. The forward pass output ($\vec {h_{t}}$ ) and the backward pass output ($\vec {h_{t}}$ ) are combined by summation. The output at $t$ moment is shown as (2).

In addition, our new BiLSTM architecture after adding the fine-tuned word embeddings is described as (3) to (6). The reading gate is shown as (7). The sentence information can be calculated as (8) at $t$ moment. After integrating the sentence embeddings, the cell value $c_{t} $ is modified to (9). $x$ is the input embeddings at time $t$ , and $i$ , $f$ , $o$ and $c$ are input gate, forget gate, output gate and the proposed values respectively, all of which are the same size as the hidden vector $h$ . $w_{xh} $ , $w_{hh} $ are the input connections, recurrent connections respectively, and $b_{h}$ is the bias value. $\sigma $ represents the logistic sigmoid function, $\odot $ denotes the element-wise multiplication, and $c_{t}$ means the true cell value at time $t$ .\begin{align*} h_{t}=&[\vec {h_{t}} \oplus \vec {h_{t}}] \tag{2}\\ i_{t}=&\sigma (x_{t} \cdot w_{xh}^{i} +{x}'_{t} \cdot w_{x'h}^{i} +h_{t-1} \cdot w_{h{h}'}^{i} +b_{h}^{i}) \tag{3}\\ f_{t}=&\sigma (x_{t} \cdot w_{xh}^{f} +{x}'_{t} \cdot w_{x'h}^{f} +h_{t-1} \cdot w_{h{h}'}^{f} +b_{h}^{f}) \tag{4}\\ o_{t}=&\sigma (x_{t} \cdot w_{xh}^{o} +{x}'_{t} \cdot w_{x'h}^{o} +h_{t-1} \cdot w_{h{h}'}^{o} +b_{h}^{o}) \tag{5}\\ \tilde {c}_{t}=&\tanh (x_{t} \cdot w_{xh}^{c} +{x}'_{t} \cdot w_{x'h}^{c} +h_{t-1} \cdot w_{h{h}'}^{c} +b_{h}^{c}) \tag{6}\\ r_{t}=&\sigma (x_{t} \cdot w_{xh}^{r} +x_{t}^{\prime } \cdot w_{x'h}^{r} +h_{t-1} \cdot w_{hh'}^{r} +b_{h}^{r}) \tag{7}\\ d_{t}=&r_{t} \odot d_{t-1} \tag{8}\\ c_{t}=&i_{t} \odot \tilde {c}_{t} +f_{t} \odot c_{t-1} +\tanh (d_{t})\tag{9}\end{align*} View Source

b: Attention Mechanism

Attention-based neural networks have recently demonstrated success in various tasks, which can filter out the irrelevant noise and find the important units in the input sequence influential to the output. Different from setting a calculation formula, we set a random initial weight matrix and let it tune with training process. Then, the common features of triggers can be learned by neural networks automatically, and the corresponding weight of the words with these features will be increased, thus, the critical information will be captured. As shown in (10), $H\in R^{d_{w} \times L}$ represents the final BiLSTM hidden layer output vector matrix, consisting of output vectors [$h_{1}, h_{2}, \ldots, h_{L}$ ], where $L$ is the sentence length, $d_{w}$ refers to the dimension of the word vectors. In (11), the attention mechanism will produce a vector $\alpha $ of attention weights, where $w$ denotes a trained parameter vector and $w^{T}$ is a transpose. Then, in (12), a weighted representation $\gamma $ is formed by a weighted sum of the output vectors $H$ . Finally, the ultimate semantic information of the sentence is produced from (13), where $h_{i}^{\ast } $ denotes the final sentence representation. The dimension of $\alpha $ , $w$ , $\gamma $ and $h^{\ast } $ is $L$ , $d_{w}$ , $d_{w}$ , $d_{w}$ separately.\begin{align*} N=&\tanh (H) \tag{10}\\ \alpha=&soft\max (w^{T}N) \tag{11}\\ \gamma=&H\alpha ^{T} \tag{12}\\ h^{\ast }=&\tanh (\gamma)\tag{13}\end{align*} View Source

2) Classification Stage

In this stage, the correct trigger type for each trigger candidate is determined. Att-BiLSTM and PA algorithm are utilized for multi-class classification respectively. Since a large number of negative instances are filtered out in the first stage, the semantic contextual information is insufficient. Therefore, the PA algorithm is more suitable for multi-class classification in the second stage, and experimental results verify this.

a: Trigger Prediction Based on Att-BiLSTM

Trigger identification aims to assign each token to a specific event trigger type or a negative class if it does not belong to a trigger class. In this work, we treat each token of sentences as a trigger candidate instance. Then, the hidden state $h_{i}^{\ast } $ of each word is generated by the BiLSTM model integrating attention mechanism. Finally, we employ softmax classifier to predict label $\hat {y}$ of each trigger candidate. The classifier takes the hidden state $h_{i}^{\ast } $ as input:\begin{align*} \hat {p}(y\vert x)=&soft\max (Wh_{i}^{\ast } +b) \tag{14}\\ \hat {y}=&\arg \max _{y} \hat {p}(y\vert x)\tag{15}\end{align*} View Source

In our model, the objective function is the cross-entropy loss defined as (16). In (16), $t_{i}^{j} $ denotes the $j$ -th type distribution of the $i$ -th instance, and $\hat {p}_{i}^{j} $ is the predicted distribution.\begin{equation*} L(\theta)=-\sum \limits _{i} \sum \limits _{j} {t_{i}^{j} \log (\hat {p}_{i}^{j})}\tag{16}\end{equation*} View Source

b: Trigger Prediction Based on PA Online Algorithm

The previous work has demonstrated the effect on biomedical event extraction [31]. To improve the robustness of a classifier and reduce the number of possible combinations, parameter $C$ is utilized to optimize the classifiers, and the mean of selected classifiers is adopted. In this work, the trigger multiclass categorization label with the highest score is regarded as the predicted class label when using online algorithms. The PA algorithm has good generalization ability as SVM, and the interested readers can refer to [32] for more details.

SECTION IV.

Experiments and Analysis

A. Corpus and Evaluation

Our experiments are conducted on the commonly used dataset multi-level event extraction (MLEE) [8]. The MLEE corpus encompasses all levels of biomedical organization from the molecular level to the whole organism. The related event triggers are divided into four categories (i.e. Anatomical, Molecular, General and Planned), containing 19 pre-defined trigger classes, such as “Blood vessel development”, “Regulation” and “Cell proliferation”. The multiple existing triggers of each sentence are labeled with numbers, types, locations and words on the corpus. The overall static distribution of the MLEE corpus is shown in Table 1. It contains 262 event documents, including 2608 sentences and 6677 events. The events are divided into 19 sub-categories, and event trigger detection aims to identify the correct sub-class of an event.

TABLE 1 The Static Distribution of MLEE Corpus

We combine the training and development datasets of the MLEE corpus for training, use development dataset for tuning parameters, and employ the test dataset for testing. The commonly used evaluation criteria $P$ (recision)/$R$ (ecall)/$F$ (score) for NLP classification tasks is adopted, which is defined as (17), where $TP$ , $FP$ and $FN$ are short for True Positives, False Positives and False Negatives respectively.\begin{align*} P=&\frac {TP}{TP+FP}, \\ R=&\frac {TP}{TP+FN}, \\ F=&\frac {2^\ast P^\ast R}{P+R}\tag{17}\end{align*} View Source

B. Hyper-Parameters Settings

In this section, we provide the training details and hyper-parameters of our model. We use GPU to speed up the training of neural network, and the graphics card we use is RTX 2080. Our framework is implemented based on Theano [33], the batch_size is set to 64, the number of layers is selected as 1 from the set {1, 2, 4}. The dimensions of all the word-embeddings employed in the experiments are 200. In addition, we set the dropout [34] rate to 0.5, the number of hidden nodes to 200, the maximum number of iterations to 100. We select the Adadelta [35] as the stochastic-gradient descent algorithm. The learning rate is selected as 0.001 from the set {0.01, 0.001, 0.0001}. The parameter $C$ of PA algorithm is selected as 0.00005 from the set {0.00001, 0.00005, 0.0001, 0.0005, 0.001, 0.005, 0.01, 0.05}. The main hyper-parameters settings are shown in Table 2.

TABLE 2 The Setting of Hyper-Parameters

C. Experiments and Results

As mentioned in the Related Work, there are some advanced approaches to detect event triggers. They are listed as follows.

SVM1: a SVM based classifier proposed by Pyysalo et al. [8], which defined rich hand-crafted features. It has significant potential over existing systems, and we select this method as the baseline method.

SVM2: a SVM based classifier proposed by Zhou and Zhong [10], which integrated hidden topics and rich hand-crafted features.

SVM3: our proposed two-stage trigger detection method based on SVM [36], which designed rich features and integrated feature selection.

EANNP: a word embedding assisted neural network proposed by Nie et al. [13], which incorporated the semantic and syntactic information.

CNN: a CNN classifier proposed by Wang et al. [14], which utilized CNN with different windows.

LSTM: a LSTM classifier proposed by Rahul et al. [15], which employed LSTM network with entity type embedding and word embeddings.

GRU: a GRU classifier proposed by Rahul et al. [15], which used GRU network with entity type information and word embeddings.

LSTMs: These LSTM classifiers are proposed in our previous work [16], [17]. References [16] integrated dependency word embeddings and attention mechanism. References [17] proposed multi-level attention, which consists of word level and sentence level attention.

The above methods have their notable advantages. As shown in Table 3, we compare the performance of our proposed approach with other methods on the MLEE corpus. It can be found that:

On the whole, the methods based on deep neural network almost obtain better performance than the traditional methods. It verifies that these deep learning methods can capture the effective high semantic representations, which are more suitable for trigger detection task. In addition, the best performance of the traditional method [36] is also based on the two-stage method, demonstrating the effectiveness of the two-stage method.
The performances of LSTM and GRU models are all better than CNN model on average F-score, which verifies that the recurrent neural network can obtain global contextual information by the biomedical event surrounding information. Also, it proves that sequential models are more appropriate for biomedical event text.
The best F-score value is achieved by our proposed model, which outperforms the state-of-the-art method Att-BLSTM by 0.53% on F-score, GRU by 1.15% on F- score, and the SVM approach by 0.51% on F-score. Although [36] has achieved comparable performance, it needs to design complex features, which is time costing and lack of generalizability. It is worth mentioning that we use no additional handcrafted features compared with the above methods and our model has superior precision, and F-score than most systems. The results illustrate the effectiveness of our method.

TABLE 3 Overall Experimental Results of Event Trigger Detection

D. Detailed Analysis

1) Impact of Different Components

To further verify the effectiveness of different components in our proposed network, we design several sub-networks as shown in Table 4. Line 1 to Line 4 are the four different one-stage method results, line 5 and line 6 list two different two-stage method results. As shown in Table 4, we can find that:

Sentence embeddings can establish the connection between word-level features and sentence-level features, enrich global sentence information and obtain more accurate hidden layer representation, which are helpful for the complex biomedical event extraction. As shown in Table 4, the F-score of integrating sentence embeddings (line 2) is improved by 3.75% than the BiLSTM model (line 1) significantly, achieving 77.96%, which validates the effectiveness of the sentence embeddings.
The F-score of integrating attention mechanism (line 3) is improved by 3.92% than the BiLSTM model (line 1), which demonstrates the impact of attention. Furthermore, the results of line 4 integrating both sentence embeddings and attention achieve the better performance on 79.73% F-score. It is worth mentioning that the recall is improved markedly after integrating attention mechanism. The main reason may be that the attention mechanism helps to filter out the irrelevant noise and find the important units in the input sequence, so that more event trigger words are recalled.
To verify the effectiveness of the two-stage method based on hybrid neural network, we employ Att-BiLSTM neural network (line 5) and PA algorithm (line 6) respectively in the second stage. In this stage, the corpus scale is small and the contextual semantic information is seriously damaged, therefore, PA algorithm is more suitable than BiLSTM neural network. In addition, compared with the result of one-stage method (line 4), the hybrid neural network based on two-stage method (line 6) performs better. As shown in Table 4, the two-stage method based on hybrid neural network achieves the best F-score, which demonstrate the effectiveness of our proposed method.

TABLE 4 Impact of Different Components

2) Detailed Category Performance Analysis

There are 19 event types on the MLEE corpus, “Regulation”, “Positive regulation”, “Negative regulation” and “Binding” are complex biomedical event types, which include multiple arguments or nest other events. The other 15 event types are simple events which comprise one trigger and one argument. To further investigate the potential of our proposed method, we list the detailed F-scores for all 19 event types in Table 5, which includes the results of our experiment, the baseline method [8] and Wang et al. ’s method [14]. Since He et al. [16] didn’t give the detailed performance, therefore we select Wang et al. ’s results for comparison, which has the best performance of all the other methods. From Table 5, it can be observed that the proposed method outperforms the Pyysalo et al. ’s baseline method on 13 event types, and achieves the same F-score on 1 event types. For instance, in the complex event categories “Regulation”, “Positive regulation” and “Negative regulation”, the F-scores of the proposed approach are 3.4%, 6.4% and 2.94% higher than the baseline method respectively; and the proposed method’s F-score of event categories “Growth”, “Synthesis”, “Catabolism” and “Phosphorylation” is over 10% higher than the baseline method. Furthermore, our method outperforms Wang et al. ’s method on 9 event types and achieves the same F-score on 3 event types. Such as in the “Binding” and “Positive regulation” complex event categories, the F-scores are improved by 7.84% and 3.84% respectively. The results show that the proposed method has advantage of detecting complex event type trigger, which needs more compositional semantic features. It demonstrates the effectiveness of the sentence embeddings and hybrid neural network.

TABLE 5 Detailed Performance of 19 Event Types

As shown in Table 5, some event trigger types have a very large percentage in the test dataset, and some are very small. To evaluate the performance of our method more effectively, we list the trigger types which account for more than 3% of the total event triggers as main trigger types (Figure. 4A). Figure. 4B gives the performance of the baseline method [8], Wang et al.’s method [16] and our proposed method on the main trigger types corresponding to Figure. 4A. It can be observed that our proposed method has better performance in most main trigger types than the baseline method and Wang et al.’s method, especially on the “Positive regulation”, “Negative regulation”, “Regulation” and “Gene expression” trigger types which occupy very large proportion. The experimental results illustrate the effectiveness and generalization ability of our model.

FIGURE 4.

(A) Percentage of the main trigger types. (B) Performance of the main trigger types.

Show All

E. Error Analysis

As shown in Table 5, our model has achieved good performance in most sub categories, especially the F-scores of “Positive_regulation” and “Negative_regulation” of complex events. But for some simple event triggers, such as “Development” and “Breakdown”, the F-scores are relatively low. According to the error prediction results, we classify the error causes as follows:

Lack of training data: Some of the trigger types appeared in the test set have very few samples in the training set, such as “Dephosphorylation” and “Remodeling”, which affect the classification performance of these sub categories.
Data ambiguity: Same words may trigger different types of events in the training set and test set respectively.
Co-reference: For the same entity, the expressions may be different. For example, people usually use “it” instead of “PROTE-10” mentioned in the previous text. This problem may lead to the sparsity of entity labels and misclassification.

SECTION V.

Discussion

From the above experimental results, we can conclude that our model outperforms most state-of-the-art systems. The detailed analysis for the improvement is as follows:

A. Two-Stage Method Based on Hybrid Neural Network

The two-stage method divides trigger detection into trigger recognition stage and classification stage. Recognition stage judges whether the candidate word is a trigger word. Classification stage determines the predicted trigger word as a specific type. The merits of the two-stage method can be summarized as follows: firstly, it decomposes one complex problem into two simple problems, which reduces the difficulty of the problem. Secondly, it can alleviate the problem of class imbalance effectively. Since the original raw data of biomedical events are very sparse, therefore, in the one stage method, the proportion of all negative instances and the minority positive instance is great disparity. However, in the two-stage method, the two subclasses consist of negative instances and all the positive instances in the recognition stage. Meanwhile, the multi subclasses are composed of only the positive instances in the multi-class classification stage. Therefore, the class imbalance can be alleviated in both stages. Also, the two-stage method costs shorter training time.

Furthermore, in the first stage, our model can automatically exploit the high level and latent features by BiLSTM, which avoid complex artificial features based on specific task. In the second stage, the positive instance set scale is small and the contextual semantic information is insufficient, therefore, PA algorithm is more suitable for the classification stage. Therefore, the two-stage method based on hybrid neural network performs well.

B. Sentence Embeddings

To take advantage of both the pre-trained word embeddings and fine-tuned word embeddings, we construct the sentence embeddings according to the two kind word embeddings of all the words within a sentence. Sentence embeddings, as a supplementary input of the proposed architecture, can establish the connection between word-level features and sentence-level features, enrich global sentence information and obtain more accurate hidden layer representation. In addition, for our trigger identification task, there is a strong association among the events appearing in a sentence. There are multiple events in a sentence, and each event has its own triggers and arguments. The semantic information among the triggers and arguments may be helpful to identify each other. For example, in the sentence “The results indicate that alpha-MSH exerts modulatory effects on the activation of $\ldots$ ”, it is difficult to identify “effects” as a Positive_regulation type trigger just depending on its word embeddings because “effects” also appears as other trigger types in the training dataset. In this case, the global sentence-level features are more important than the word embeddings. According to the other trigger word “activation”, which always appears as Positive_regulation type trigger, it is more helpful to classify “effects” correctly.

Also, there may be nested events in a sentence, which means a trigger may be the argument of other events in the sentence (such as event E2 and E3 of Figure. 1). That means, there might be interrelation between any two words in a sentence. Therefore, the global information of the sentence is important for event extraction. Thus, we construct the sentence embeddings to capture the global sentence-level features. The experimental results reveal that the sentence embeddings have the significant impact on trigger detection.

C. Attention Mechanism

Attention mechanism can automatically focus on the words that have decisive effect on classification, and capture the most important semantic information in a sentence, without using extra knowledge and NLP systems. Thus, the irrelevant noise can be filtered out, and the critical semantic information is reinforced, which is benefit to trigger identification. For example, in the sentence “Inhibition of angiogenesis has been shown to be an effective strategy in the therapy.” (Figure 5), there are three triggers, which are “Inhibition”, “angiogenesis” and “therapy” respectively. All the words are treated equally before integrating attention. However, after integrating the attention mechanism in our architecture, as shown in Figure 5, most of the verbs (“has”, “shown”) and nouns (“Inhibition”, “angiogenesis”, “therapy”) in the sentence are strengthened by the attention weights in the training process. Since the event triggers are usually verbs or gerunds, and arguments are usually nouns or other triggers (nested events), therefore, the potential triggers and arguments might be focused on via the attention, which might be helpful for trigger identification. Also, the enhancement of the argument information might be helpful for determining the trigger to correct types. In this work, the attention employed is not designed for specific task, which enhances the reusability of the proposed model. Figure 5 shows to what extent the attentive model focuses on the contextual representations. Overall, more triggers are identified after integrating the attention mechanism.

FIGURE 5.

Visualization of attentions over words.

Show All

SECTION VI.

Conclusion and Future Work

In this paper, we propose a novel effective two-stage event trigger detection method based on hybrid neural network and sentence embeddings. The two-stage method can alleviate the class imbalance problem. BiLSTM architecture is employed to capture higher level features, and PA algorithm is utilized for small scale dataset. Furthermore, sentence embeddings enrich the sentence-level information and obtain more abundant contextual information for the events within a sentence. Meanwhile, attention mechanism captures the most important semantic information. Experimental results conducted on a real-word multi-level event extraction (MLEE) corpus dataset demonstrate the effectiveness of our proposed method.

In the future, we would like to integrate domain knowledge for improving the performance of biomedical trigger detection and try to solve the problems mentioned in the error analysis. Meanwhile, we would like to explore more different effective neural networks to detect event triggers and other information extraction tasks.

References is not available for this document.

MIT Libraries

MIT Libraries

A Two-Stage Biomedical Event Trigger Detection Method Based on Hybrid Neural Network and Sentence Embeddings

Abstract:

Metadata

Abstract:

Funding Agency:

Introduction

Related Work