Abstract:
Recent studies in Legal NLP have shown a lack of structured data for training Deep Learning models on several tasks. With the growing importance of privacy policies in today's digital world, the research community has released multiple privacy-policy datasets in the last few years. However, other empirical studies have shown that domain-specific language models trained on one legal subdomain transfer poorly to more distant subdomains. Because the focus has been on privacy policies, models have not been tested on other kinds of policies. In this work, we release the CSIAC-DoDIN V1.0 dataset, which focuses on cybersecurity policies and on the responsibilities and procedures of the organizations involved. This first version supports classic Legal NLP tasks, such as several multiclass classification tasks and text co-occurrence. We also provide a baseline for this dataset and its tasks, with experiments using classic transformer-based language models such as BERT, RoBERTa, Legal-BERT, and PrivBERT.
Published in: 2023 International Conference on Computational Science and Computational Intelligence (CSCI)
Date of Conference: 13-15 December 2023
Date Added to IEEE Xplore: 19 July 2024