QALD-9-plus: A Multilingual Dataset for Question Answering over DBpedia and Wikidata Translated by Native Speakers


Abstract:

The ability to have the same experience for different user groups (i.e., accessibility) is one of the most important characteristics of Web-based systems. The same is true for Knowledge Graph Question Answering (KGQA) systems, which provide access to Semantic Web data via a natural language interface. While following our research agenda on the multilingual aspect of the accessibility of KGQA systems, we identified several ongoing challenges. One of them is the lack of multilingual KGQA benchmarks. In this work, we extend one of the most popular KGQA benchmarks, QALD-9, by introducing high-quality translations of its questions into 8 languages, provided by native speakers, and by transferring the SPARQL queries of QALD-9 from DBpedia to Wikidata, such that the usability and relevance of the dataset are strongly increased. Five of the languages (Armenian, Ukrainian, Lithuanian, Bashkir, and Belarusian) have, to the best of our knowledge, never been considered in the KGQA research community before. The latter two are considered “endangered” by UNESCO. We call the extended dataset QALD-9-plus and have made it available online (Figshare: https://doi.org/10.6084/m9.figshare.16864273; GitHub: https://github.com/Perevalov/qald_9_plus).
Date of Conference: 26-28 January 2022
Date Added to IEEE Xplore: 23 March 2022
ISBN Information:
Print on Demand (PoD) ISSN: 2325-6516
Conference Location: Laguna Hills, CA, USA

I. Introduction

The core task of a Knowledge Graph Question Answering (KGQA) system is to represent a natural language question in the form of a structured query (e.g., SPARQL) to a knowledge graph (KG). In other words, KGQA systems provide access to the data in KGs via a natural-language user interface, such that end users are not required to learn a particular query language to fetch data manually. The relevance (or accuracy) of the answers given by such a system should approach human performance and reduce the labor cost of learning a particular query language; otherwise, the system is useless. Many researchers aim at measuring and increasing Question Answering (QA) quality or the quality of particular KGQA sub-tasks, such as named entity linking (e.g., [1]) and expected answer type prediction (e.g., [2]). However, the accessibility characteristic of KGQA systems is often overlooked (accessibility for the Web is defined by the W3C: https://www.w3.org/standards/webdesign/accessibility). In this context, perfect accessibility denotes an equivalent experience for all user groups of a particular KGQA system. Hence, research questions such as “How many people can really take advantage of a high-quality KGQA system?”, “Who are these people?”, and “How diverse are they?” often go unnoticed.
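To make the core task concrete, the following minimal sketch shows how one question, phrased in several languages, can be paired with a structured query for each knowledge graph. Everything in it is an illustrative assumption rather than content taken from the dataset: the `KGQARecord` class, its field names, the example question, its Ukrainian rendering, and both SPARQL queries are hypothetical and do not reflect the actual file layout of QALD-9-plus.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class KGQARecord:
    """Hypothetical benchmark record: one question in several languages,
    paired with one SPARQL query per knowledge graph."""

    questions: dict[str, str]  # language code -> question text
    queries: dict[str, str]    # knowledge graph name -> SPARQL query

    def question(self, lang: str, fallback: str = "en") -> str:
        # Fall back to the English wording when no translation exists.
        return self.questions.get(lang, self.questions[fallback])

    def query(self, kg: str) -> str:
        # Raises KeyError if the record has no query for this knowledge graph.
        return self.queries[kg]


# Illustrative record (assumed content, not copied from the dataset).
RECORD = KGQARecord(
    questions={
        "en": "Who is the mayor of Berlin?",
        "uk": "Хто є мером Берліна?",
    },
    queries={
        # Prefixes are omitted for brevity; both queries are assumptions.
        "dbpedia": "SELECT ?x WHERE { dbr:Berlin dbo:leader ?x }",
        "wikidata": "SELECT ?x WHERE { wd:Q64 wdt:P6 ?x }",
    },
)

if __name__ == "__main__":
    for lang in ("en", "uk", "lt"):
        print(lang, "->", RECORD.question(lang))
    print(RECORD.query("wikidata"))
```

In this sketch, a language without a translation (here `lt`) falls back to the English wording, which is exactly the kind of unequal experience that a multilingual benchmark is meant to expose.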

