QALD-9-plus: A Multilingual Dataset for Question Answering over DBpedia and Wikidata Translated by Native Speakers

Abstract:

The ability to have the same experience for different user groups (i.e., accessibility) is one of the most important characteristics of Web-based systems. The same is true for Knowledge Graph Question Answering (KGQA) systems, which provide access to Semantic Web data via a natural-language interface. While following our research agenda on the multilingual aspect of the accessibility of KGQA systems, we identified several ongoing challenges. One of them is the lack of multilingual KGQA benchmarks. In this work, we extend one of the most popular KGQA benchmarks, QALD-9, by introducing high-quality translations of its questions into 8 languages, provided by native speakers, and by transferring the SPARQL queries of QALD-9 from DBpedia to Wikidata, so that the usability and relevance of the dataset are strongly increased. Five of the languages (Armenian, Ukrainian, Lithuanian, Bashkir and Belarusian) have, to the best of our knowledge, never been considered in the KGQA research community before. The latter two are considered “endangered” by UNESCO. We call the extended dataset QALD-9-plus and have made it available online¹.

¹Figshare: https://doi.org/10.6084/m9.figshare.16864273. GitHub: https://github.com/Perevalov/qald_9_plus.
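To make the shape of such a multilingual benchmark entry concrete, the following sketch parses a hypothetical QALD-style JSON record that carries one question in several languages (keyed by ISO 639-1 codes) together with equivalent DBpedia and Wikidata SPARQL queries. The record, its question strings, and the `dbpedia`/`wikidata` sub-keys under `query` are assumptions made for illustration, not an excerpt from QALD-9-plus; the `question`/`language`/`string` field names follow the general QALD JSON convention, but the exact schema should be checked against the released files. The helpers `question_in` and `languages` are hypothetical.

```python
import json
from typing import List, Optional

# Hypothetical QALD-style record, written for illustration only (not taken from
# QALD-9-plus). The "question"/"language"/"string" field names follow the QALD
# JSON convention; the "dbpedia"/"wikidata" keys under "query" are assumptions.
RECORD_JSON = """
{
  "id": "1",
  "question": [
    {"language": "en", "string": "What is the capital of Germany?"},
    {"language": "de", "string": "Was ist die Hauptstadt von Deutschland?"},
    {"language": "uk", "string": "Яка столиця Німеччини?"}
  ],
  "query": {
    "dbpedia": "SELECT ?uri WHERE { <http://dbpedia.org/resource/Germany> <http://dbpedia.org/ontology/capital> ?uri }",
    "wikidata": "SELECT ?uri WHERE { <http://www.wikidata.org/entity/Q183> <http://www.wikidata.org/prop/direct/P36> ?uri }"
  }
}
"""


def question_in(record: dict, lang: str) -> Optional[str]:
    """Return the question text for an ISO 639-1 code, or None if untranslated."""
    for entry in record["question"]:
        if entry["language"] == lang:
            return entry["string"]
    return None


def languages(record: dict) -> List[str]:
    """List the language codes in which the record's question is available."""
    return [entry["language"] for entry in record["question"]]


record = json.loads(RECORD_JSON)
for kg in ("dbpedia", "wikidata"):
    # The same question is answerable over both KGs via KG-specific queries.
    print(kg, "->", record["query"][kg])
print(languages(record))          # ['en', 'de', 'uk']
print(question_in(record, "de"))  # Was ist die Hauptstadt von Deutschland?
print(question_in(record, "ba"))  # None (no Bashkir string in this toy record)
```

With translations keyed by language code, adding a new language (for example, Bashkir as `ba`) only means appending one more entry to the `question` list; the SPARQL queries stay untouched.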

Published in: 2022 IEEE 16th International Conference on Semantic Computing (ICSC)

Date of Conference: 26-28 January 2022

Date Added to IEEE Xplore: 23 March 2022

Print on Demand (PoD) ISSN: 2325-6516

DOI: 10.1109/ICSC52841.2022.00045

Conference Location: Laguna Hills, CA, USA

I. Introduction

The core task of a Knowledge Graph Question Answering (KGQA) system is to represent a natural-language question as a structured query (e.g., in SPARQL) over a knowledge graph (KG). In other words, KGQA systems provide access to the data in KGs via a natural-language user interface, so that end users are not required to learn a particular query language to fetch data manually. The relevance (or accuracy) of the answers given by such a system should approach human performance, and the system should spare users the effort of learning a particular query language; otherwise, it is useless. Many researchers aim to measure and improve Question Answering (QA) quality, or the quality of particular KGQA sub-tasks such as named entity linking (e.g., [1]) and expected answer type prediction (e.g., [2]). However, the accessibility² of KGQA systems is often overlooked. In this context, perfect accessibility denotes an equivalent experience for all user groups of a particular KGQA system. Hence, research questions such as “How many people can really take advantage of a high-quality KGQA system?”, “Who are these people?”, and “How diverse are they?” often go unaddressed.

²Accessibility for the Web is defined by the W3C: https://www.w3.org/standards/webdesign/accessibility
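The core KGQA task described in this section, turning a natural-language question into a structured query, can be made concrete with a deliberately tiny sketch. It matches a single hand-written English template and consults hypothetical lookup tables that stand in for the entity and relation linking steps of a real system; `to_sparql`, `ENTITIES`, `RELATIONS`, and the template are illustrative assumptions, not the method of any system cited here. The KG identifiers used are `dbr:Germany` and `dbo:capital` for DBpedia, and Wikidata's `Q183` (Germany) and `P36` (capital).

```python
import re
from typing import Optional

# Hypothetical linking tables: a real KGQA system would obtain these IRIs through
# named entity linking and relation linking instead of a hard-coded dictionary.
ENTITIES = {"Germany": {"dbpedia": "dbr:Germany", "wikidata": "wd:Q183"}}
RELATIONS = {"capital": {"dbpedia": "dbo:capital", "wikidata": "wdt:P36"}}

# Prefix declarations so that each generated query is self-contained SPARQL.
PREFIXES = {
    "dbpedia": "PREFIX dbr: <http://dbpedia.org/resource/>\n"
               "PREFIX dbo: <http://dbpedia.org/ontology/>\n",
    "wikidata": "PREFIX wd: <http://www.wikidata.org/entity/>\n"
                "PREFIX wdt: <http://www.wikidata.org/prop/direct/>\n",
}

# A single English-only template: "What is the <relation> of <entity>?"
TEMPLATE = re.compile(r"^What is the (\w+) of (\w+)\?$")


def to_sparql(question: str, kg: str) -> Optional[str]:
    """Map a templated question to a SPARQL query for the given KG, or None."""
    match = TEMPLATE.match(question.strip())
    if match is None:
        return None  # the question is not covered by the template
    relation, entity = match.groups()
    if relation not in RELATIONS or entity not in ENTITIES:
        return None  # linking failed: surface form missing from the tables
    subject = ENTITIES[entity][kg]
    predicate = RELATIONS[relation][kg]
    return PREFIXES[kg] + f"SELECT ?uri WHERE {{ {subject} {predicate} ?uri }}"


print(to_sparql("What is the capital of Germany?", "wikidata"))
print(to_sparql("Яка столиця Німеччини?", "wikidata"))  # None: English-only template
```

The Ukrainian question returns `None` even though it asks exactly the same thing, which is the accessibility gap discussed above: a system built around English-only components serves only the users who can phrase their question in English.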

References

[1] D. Diefenbach, K. Singh, A. Both, D. Cherix, C. Lange and S. Auer, "The Qanary ecosystem: Getting new insights by composing question answering pipelines", in Web Engineering: 17th International Conference ICWE 2017, Rome, Italy, June 5-8, 2017, Proceedings, vol. 10360, pp. 171-189, 2017.

[2] A. Perevalov and A. Both, "Augmentation-based answer type classification of the SMART dataset", Proceedings of the SeMantic AnsweR Type prediction task (SMART) at ISWC 2020, vol. 2774, pp. 1-9, 2020.

[3] R. Usbeck, R. H. Gusmita, A. N. Ngomo and M. Saleem, "9th challenge on question answering over linked data (QALD-9)", 2018.

[4] I. Rybin, V. Korablinov, P. Efimov and P. Braslavski, "RuBQ 2.0: An innovated Russian question answering dataset", in The Semantic Web, Cham: Springer International Publishing, pp. 532-547, 2021.

[5] R. Cui, R. Aralikatte, H. Lent and D. Hershcovich, "Multilingual compositional Wikidata questions", arXiv, 2021.

[6] E. Loginova, S. Varanasi and G. Neumann, "Towards end-to-end multilingual question answering", Information Systems Frontiers (ISF), vol. 22, pp. 1-14, Mar. 2020.

[7] A. Neves, A. Lamurias and F. Couto, "Biomedical question answering using extreme multi-label classification and ontologies in the multilingual panorama", Semantic Indexing and Information Retrieval for Health, held in conjunction with the 42nd European Conference on Information Retrieval (SIIRH@ECIR), 2020.

[8] S. Auer, C. Bizer, G. Kobilarov, J. Lehmann, R. Cyganiak and Z. Ives, "DBpedia: A nucleus for a web of open data", in The Semantic Web, Springer, 2007.

[9] K. Höffner, S. Walter, E. Marx, R. Usbeck, J. Lehmann and A.-C. Ngonga Ngomo, "Survey on challenges of question answering in the semantic web", Semantic Web, vol. 8, Nov. 2016.

[10] D. Sorokin and I. Gurevych, "Modeling semantics with gated graph neural networks for knowledge base question answering", arXiv preprint, 2018.

[11] D. Diefenbach, A. Both, K. Singh and P. Maret, "Towards a question answering system over the semantic web", Semantic Web, vol. 11, pp. 421-439, 2020.

[12] L. Siciliani, P. Basile, P. Lops and G. Semeraro, "MQALD: Evaluating the impact of modifiers in question answering over knowledge graphs", Semantic Web, pre-press, pp. 1-17, Sep. 2021.

[13] D. Vrandečić and M. Krötzsch, "Wikidata: A free collaborative knowledgebase", Communications of the ACM, vol. 57, no. 10, pp. 78-85, Sep. 2014.

[14] D. Keysers, N. Schärli, N. Scales, H. Buisman, D. Furrer, S. Kashubin, N. Momchev, D. Sinopalnikov, L. Stafiniak, T. Tihon et al., "Measuring compositional generalization: A comprehensive method on realistic data", arXiv preprint, 2019.

[15] K. Bollacker, C. Evans, P. Paritosh, T. Sturge and J. Taylor, "Freebase: A collaboratively created graph database for structuring human knowledge", Proceedings of the 2008 ACM SIGMOD International Conference on Management of Data, pp. 1247-1250, 2008.

[16] "Codes for the representation of names of languages - Part 1: Alpha-2 code", Geneva, CH: International Organization for Standardization, Jul. 2002.

[17] R. Usbeck, M. Röder, A.-C. Ngonga Ngomo, C. Baron, A. Both, M. Brümmer et al., "GERBIL: General entity annotator benchmarking framework", Proceedings of the 24th International Conference on World Wide Web, pp. 1133-1143, 2015.

[18] M. Burtsev, A. Seliverstov, R. Airapetyan, M. Arkhipov, D. Baymurzina, N. Bushkov et al., "DeepPavlov: Open-source library for dialogue systems", Melbourne, Australia: Association for Computational Linguistics, pp. 122-127, 2018.

[19] T. P. Tanon, M. D. de Assuncao, E. Caron and F. M. Suchanek, "Demoing Platypus - a multilingual question answering platform for Wikidata", European Semantic Web Conference, pp. 111-116, 2018.
