I. Introduction
Assertion detection is a key task within the area of Clinical Natural Language Processing (NLP) [1]. It usually involves identifying the assertion types for medical concepts in clinical text, namely certainty (whether the medical concept is positive, negated, possible, or hypothetical), temporality (whether the medical concept refers to the present or to the patient's history), and experiencer (whether the medical concept pertains to the patient or a family member). Figure 1 shows an example of medical concepts and the corresponding assertions. This task plays a crucial role in understanding medical concepts in free-text Electronic Health Records (EHRs), directly impacting the accuracy of clinical decision-making and the efficiency of patient care. As a core component of clinical NLP, assertion detection also holds significant potential for enhancing information retrieval and automated clinical reasoning. However, it faces challenges such as class imbalance and the unstructured nature of clinical notes. Particularly challenging is the classification of assertions like ‘Possible’ and ‘Family’, which occur less frequently and are often expressed ambiguously.
Previous studies have widely applied rule-based methods such as NegEx [2] and ConText [1] in clinical NLP software, setting a benchmark in medical informatics with applications in tools like OHNLP Toolkit [3], MedTagger [4], medspaCy [5], and cTAKES [6]. However, these rule-based approaches are limited by their fixed patterns: they cannot enumerate every way an assertion may be phrased, which often leads to low recall. To overcome these limitations, deep learning methods such as convolutional neural networks (CNNs) and long short-term memory (LSTM) networks [7]–[9] were introduced. Although these approaches show promise, they still require substantial amounts of labeled data and tend to underperform on small or imbalanced datasets.
Fig. 1. Examples of assertions in clinical texts. Medical concepts and the corresponding assertions are highlighted.
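To make the recall limitation of fixed-pattern rules concrete, the following minimal Python sketch mimics the general idea behind trigger-based negation detection. It is not the NegEx or ConText implementation: the trigger list, the token window, and the function name `is_negated` are illustrative assumptions, and real systems use far larger lexicons and more careful scope rules.

```python
import re

# Hypothetical, deliberately tiny trigger list (an assumption for illustration);
# real rule-based systems ship much larger curated lexicons.
NEGATION_TRIGGERS = ["no", "denies", "without", "negative for"]

# Assumed scope: how many whitespace tokens after a trigger are considered negated.
WINDOW = 5


def is_negated(sentence: str, concept: str) -> bool:
    """Return True if `concept` appears within WINDOW tokens after a trigger."""
    text = sentence.lower()
    concept = concept.lower()
    for trigger in NEGATION_TRIGGERS:
        # Match the trigger as a whole word or phrase.
        for match in re.finditer(r"\b" + re.escape(trigger) + r"\b", text):
            following = text[match.end():].split()[:WINDOW]
            if concept in " ".join(following):
                return True
    return False


# A phrasing covered by the trigger list is detected.
print(is_negated("Patient denies chest pain.", "chest pain"))
# A negation phrased outside the fixed patterns is missed (a recall error).
print(is_negated("Chest pain was ruled out.", "chest pain"))
```

The second call illustrates the failure mode discussed above: "ruled out" negates the concept, but because it is absent from the fixed trigger list, the rule does not fire.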