QALD-9-plus: A Multilingual Dataset for Question Answering over DBpedia and Wikidata Translated by Native Speakers | IEEE Conference Publication | IEEE Xplore


Abstract:

The ability to have the same experience for different user groups (i.e., accessibility) is one of the most important characteristics of Web-based systems. The same is true for Knowledge Graph Question Answering (KGQA) systems, which provide access to Semantic Web data via a natural-language interface. While following our research agenda on the multilingual aspect of the accessibility of KGQA systems, we identified several ongoing challenges. One of them is the lack of multilingual KGQA benchmarks. In this work, we extend one of the most popular KGQA benchmarks, QALD-9, by introducing high-quality translations of its questions into 8 languages, provided by native speakers, and by transferring the SPARQL queries of QALD-9 from DBpedia to Wikidata, such that the usability and relevance of the dataset are strongly increased. Five of the languages (Armenian, Ukrainian, Lithuanian, Bashkir, and Belarusian) have, to the best of our knowledge, never been considered in the KGQA research community before. The latter two are considered “endangered” by UNESCO. We call the extended dataset QALD-9-plus and have made it available online (Figshare: https://doi.org/10.6084/m9.figshare.16864273; GitHub: https://github.com/Perevalov/qald_9_plus).
Date of Conference: 26-28 January 2022
Date Added to IEEE Xplore: 23 March 2022
ISBN Information:
Print on Demand(PoD) ISSN: 2325-6516
Conference Location: Laguna Hills, CA, USA

I. Introduction

The core task of a Knowledge Graph Question Answering (KGQA) system is to represent a natural-language question in the form of a structured query (e.g., SPARQL) to a knowledge graph (KG). In other words, KGQA systems provide access to the data in KGs via a natural-language user interface, such that end users are not required to learn a particular query language to fetch data manually. Obviously, the relevance (or accuracy) of the answers given by such a system should approach human performance, and the system should reduce the labor cost of learning a particular query language; otherwise, the system is useless. Many researchers aim at measuring and increasing the Question Answering (QA) quality or the quality of particular KGQA sub-tasks, such as named entity linking (e.g., [1]) or expected answer type prediction (e.g., [2]). However, the accessibility characteristic of KGQA systems is often overlooked (accessibility for the Web is defined by the W3C: https://www.w3.org/standards/webdesign/accessibility). In this context, perfect accessibility denotes an equivalent experience for all user groups of a particular KGQA system. Hence, research questions such as “How many people can really take advantage of a high-quality KGQA system?”, “Who are these people?”, and “How diverse are they?” often go unnoticed.
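To make the core task described at the start of this section concrete, the following minimal sketch pairs one question, given in two languages, with a structured query for each of two KGs. It is an illustration only: the class `KGQAExample`, its fields, the sample question, and both SPARQL strings are assumptions introduced here, they do not reproduce the file format or the contents of any benchmark, and the queries were not checked against live endpoints.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class KGQAExample:
    """Hypothetical container: one question in several languages, one query per KG."""

    translations: Dict[str, str]  # ISO 639-1 language code -> question text
    sparql: Dict[str, str]        # KG name -> SPARQL query answering the question

    def question(self, lang: str, fallback: str = "en") -> str:
        """Return the question in `lang`, or in `fallback` if no translation exists."""
        return self.translations.get(lang, self.translations[fallback])


# Illustrative item written by hand for this sketch.
example = KGQAExample(
    translations={
        "en": "Who is the mayor of Berlin?",
        "de": "Wer ist der Bürgermeister von Berlin?",
    },
    sparql={
        # DBpedia-style query (assumed property: dbo:leader)
        "dbpedia": (
            "PREFIX dbo: <http://dbpedia.org/ontology/> "
            "PREFIX dbr: <http://dbpedia.org/resource/> "
            "SELECT ?uri WHERE { dbr:Berlin dbo:leader ?uri }"
        ),
        # Wikidata-style query (Q64 = Berlin, P6 = head of government)
        "wikidata": (
            "PREFIX wd: <http://www.wikidata.org/entity/> "
            "PREFIX wdt: <http://www.wikidata.org/prop/direct/> "
            "SELECT ?uri WHERE { wd:Q64 wdt:P6 ?uri }"
        ),
    },
)

if __name__ == "__main__":
    for lang in ("en", "de", "uk"):
        # "uk" has no translation in this toy item, so the English text is used.
        print(lang, "->", example.question(lang))
    for kg, query in example.sparql.items():
        print(kg, "->", query)
```

The fallback in `question` hints at the accessibility concern raised above: a user whose language has no translation is served the English text, which is not an equivalent experience.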

References
1.
D. Diefenbach, K. Singh, A. Both, D. Cherix, C. Lange and S. Auer, "The Qanary ecosystem: Getting new insights by composing question answering pipelines", Web Engineering - 17th International Conference, ICWE 2017, Rome, Italy, June 5–8, 2017, Proceedings, vol. 10360, pp. 171-189, 2017.
2.
A. Perevalov and A. Both, "Augmentation-based answer type classification of the SMART dataset", Proceedings of the SeMantic AnsweR Type prediction task (SMART) at ISWC 2020, vol. 2774, pp. 1-9, 2020.
3.
R. Usbeck, R. H. Gusmita, A. N. Ngomo and M. Saleem, 9th challenge on question answering over linked data (QALD-9), 2018.
4.
I. Rybin, V. Korablinov, P. Efimov and P. Braslavski, "RuBQ 2.0: An innovated Russian question answering dataset" in The Semantic Web, Cham:Springer International Publishing, pp. 532-547, 2021.
5.
R. Cui, R. Aralikatte, H. Lent and D. Hershcovich, "Multilingual compositional Wikidata questions", arXiv, 2021.
6.
E. Loginova, S. Varanasi and G. Neumann, "Towards end-to-end multilingual question answering", Information Systems Frontiers (ISF), vol. 22, pp. 1-14, 3 2020.
7.
A. Neves, A. Lamurias and F. Couto, "Biomedical question answering using extreme multi-label classification and ontologies in the multilingual panorama", Semantic Indexing and Information Retrieval for Health Held in conjunction with the 42nd European Conference on Information Retrieval (SIIRH@ECIR), 2020.
8.
S. Auer, C. Bizer, G. Kobilarov, J. Lehmann, R. Cyganiak and Z. Ives, "DBpedia: A nucleus for a web of open data" in The semantic web, Springer, 2007.
9.
K. Höffner, S. Walter, E. Marx, R. Usbeck, J. Lehmann and A.-C. Ngonga Ngomo, "Survey on challenges of question answering in the semantic web", Semantic Web, vol. 8, 11 2016.
10.
D. Sorokin and I. Gurevych, "Modeling semantics with gated graph neural networks for knowledge base question answering", arXiv preprint, 2018.
11.
D. Diefenbach, A. Both, K. Singh and P. Maret, "Towards a question answering system over the semantic web", Semantic Web, vol. 11, pp. 421-439, 2020.
12.
L. Siciliani, P. Basile, P. Lops and G. Semeraro, "MQALD: Evaluating the impact of modifiers in question answering over knowledge graphs", Semantic Web, vol. Pre-press, pp. 1-17, 09 2021.
13.
D. Vrandečić and M. Krötzsch, "Wikidata: A free collaborative knowledgebase", Communications of the ACM, vol. 57, no. 10, pp. 78-85, Sep. 2014.
14.
D. Keysers, N. Schärli, N. Scales, H. Buisman, D. Furrer, S. Kashubin, N. Momchev, D. Sinopalnikov, L. Stafiniak, T. Tihon et al., "Measuring compositional generalization: A comprehensive method on realistic data", arXiv preprint, 2019.
15.
K. Bollacker, C. Evans, P. Paritosh, T. Sturge and J. Taylor, "Freebase: a collaboratively created graph database for structuring human knowledge", Proceedings of the 2008 ACM SIGMOD international conference on Management of data, pp. 1247-1250, 2008.
16.
Codes for the representation of names of languages - part 1: Alpha-2 code, Geneva, CH:International Organization for Standardization, Jul. 2002.
17.
R. Usbeck, M. Röder, A.-C. Ngonga Ngomo, C. Baron, A. Both, M. Brümmer, et al., "Gerbil: General entity annotator benchmarking framework", Proceedings of the 24th International Conference on World Wide Web, pp. 1133-1143, 2015.
18.
M. Burtsev, A. Seliverstov, R. Airapetyan, M. Arkhipov, D. Baymurzina, N. Bushkov, et al., DeepPavlov: Open-source library for dialogue systems, Melbourne, Australia:Association for Computational Linguistics, pp. 122-127, 2018.
19.
T. P. Tanon, M. D. de Assuncao, E. Caron and F. M. Suchanek, "Demoing Platypus-a multilingual question answering platform for Wikidata", European Semantic Web Conference, pp. 111-116, 2018.