Loading [MathJax]/extensions/MathMenu.js
HITSnDIFFs: From Truth Discovery to Ability Discovery by Recovering Matrices with the Consecutive Ones Property | IEEE Conference Publication | IEEE Xplore

HITSnDIFFs: From Truth Discovery to Ability Discovery by Recovering Matrices with the Consecutive Ones Property


Abstract:

We analyze a general problem in a crowd-sourced setting where one user asks a question (also called item) and other users return answers (also called labels) for this que...Show More

Abstract:

We analyze a general problem in a crowd-sourced setting where one user asks a question (also called item) and other users return answers (also called labels) for this question. Different from existing crowd sourcing work which focuses on finding the most appropriate label for the question (the “truth”), our problem is to determine a ranking of the users based on their ability to answer questions. We call this problem “ability discovery” to emphasize the connection to and duality with the more well-studied problem of “truth discovery”. To model items and their labels in a principled way, we draw upon Item Response Theory (IRT) which is the widely accepted theory behind standardized tests such as SAT and GRE. We start from an idealized setting where the relative performance of users is consistent across items and better users choose better fitting labels for each item. We posit that a principled algorithmic solution to our more general problem should solve this ideal setting correctly and observe that the response matrices in this setting obey the Consecutive Ones Property (C1P). While C1P is well understood algorithmically with various discrete algorithms, we devise a novel variant of the HITS algorithm which we call “HITSnDIFFs” (or HnD), and prove that it can recover the ideal C1P-permutation in case it exists. Unlike fast combinatorial algorithms for finding the consecutive ones permutation (if it exists), HnD also returns an ordering when such a permutation does not exist. Thus it provides a principled heuristic for our problem that is guaranteed to return the correct answer in the ideal setting. Our experiments show that HnD produces user rankings with robustly high accuracy compared to state-of-the-art truth discovery methods. We also show that our novel variant of HITS scales better in the number of users than ABH, the only prior spectral C1P reconstruction algorithm.
Date of Conference: 13-16 May 2024
Date Added to IEEE Xplore: 23 July 2024
ISBN Information:

ISSN Information:

Conference Location: Utrecht, Netherlands

Funding Agency:

References is not available for this document.

I. Introduction

Motivation. We first present a couple of examples from a class to a more general crowdsourcing context.

Select All
1.
"Amazon mechanical turk", 2023, [online] Available: https.//www.mturk.com/.
2.
"Piazza", 2023, [online] Available: https://piazza.com/.
3.
W. E. Arnoldi, "The principle of minimized iterations in the solution of the matrix eigenvalue problem", Quarterly of applied mathematics, vol. 9, no. 1, pp. 17-29, 1951.
4.
J. E. Atkins, E. G. Boman and B. Hendrickson, "A spectral algorithm for seriation and the consecutive ones problem", SIAM Journal on Computing, vol. 28, no. 1, pp. 297-310, 1998.
5.
R. D. Bock, "Estimating item parameters and latent ability when responses are scored in two or more nominal categories", Psychometrika, vol. 37, no. 1, pp. 29-51, 1972.
6.
K. S. Booth and G. S. Lueker, "Testing for the consecutive ones property interval graphs and graph planarity using PQ-tree algorithms", Journal of Computer and System Sciences, vol. 13, no. 3, pp. 335-379, 1976.
7.
C. Chai, G. Li, J. Li, D. Deng and J. Feng, "Cost-effective crowdsourced entity resolution: A partial-order approach", SIGMOD, pp. 969-984, 2016.
8.
Z. Chen, S. Mitra, R. Ravi and W. Gatterbauer, "HITSnDIFFs: From truth discovery to ability discovery by recovering matrices with the consecutive ones property: Code and experiments", 2023, [online] Available: https://github.com/northeastern-datalab/HITSnDIFFs/.
9.
HITSnDIFFs: From truth discovery to ability discovery by recovering matrices with the consecutive ones property (extended version), 2023, [online] Available: https://arxiv.org/abs/2401.00013.
10.
J. K. Cullum and R. A. Willoughby, Lanczos algorithms for large symmetric eigenvalue computations: Vol. I: Theory. SIAM, 2002, [online] Available: https://doi.org/10.1137/1.9780898719192.
11.
N. Dalvi, A. Dasgupta, R. Kumar and V. Rastogi, "Aggregating crowdsourced binary ratings", WWW, pp. 285-294, 2013.
12.
A. P. Dawid and A. M. Skene, "Maximum likelihood estimation of observer error-rates using the EM algorithm", Applied statistics, pp. 20-28, 1979.
13.
X. L. Dong, L. Berti-Equille, Y. Hu and D. Srivastava, "Global detection of complex copying relationships between sources", Proceedings of the VLDB Endowment, vol. 3, no. 1–2, pp. 1358-1369, 2010.
14.
X. L. Dong, L. Berti-Equille and D. Srivastava, "Integrating conflicting data: the role of source dependence", Proceedings of the VLDB Endowment, vol. 2, no. 1, pp. 550-561, 2009.
15.
X. L. Dong, B. Saha and D. Srivastava, "Less is more: Selecting sources wisely for integration", Proceedings of the VLDB Endowment, vol. 6, no. 2, pp. 37-48, 2012.
16.
J. Fan, G. Li, B. C. Ooi, K. Tan and J. Feng, "icrowd: An adaptive crowdsourcing framework", SIGMOD, pp. 1015-1030, 2015.
17.
M. Fiedler, "Laplacian of graphs and algebraic connectivity", Banach Center Publications, vol. 25, no. 1, pp. 57-70, 1989.
18.
T. Finin, W. Murnane, A. Karandikar, N. Keller, J. Martineau and M. Dredze, "Annotating named entities in twitter data with crowdsourcing", NAACL HLT 2010 Workshop on Creating Speech and Language Data with Amazon's Mechanical Turk, pp. 80-88, 2010.
19.
M. J. Franklin, D. Kossmann, T. Kraska, S. Ramesh and R. Xin, "CrowdDB: answering queries with crowdsourcing", SIGMOD, pp. 61-72, 2011.
20.
G. Frobenius, "Uber matrizen aus nicht negativen elementen", Sitzungs-berichte der Königlich Preussischen Akademie der Wissenschaften, pp. 456-477, 1912.
21.
D. R. Fulkerson and O. A. Gross, "Incidence matrices and interval graphs", Pacific Journal of Mathematics, vol. 15, no. 3, pp. 835-855, 1965.
22.
A. Ghosh, S. Kale and P. McAfee, "Who moderates the moderators?: crowdsourcing abuse detection in user-generated content", EC, pp. 167-176, 2011.
23.
J. Hauschild and F. Pollmann, "Efficient numerical simulations with Tensor Networks: Tensor Network Python (TeNPy)", SciPost Phys. Lect. Notes, pp. 5, 2018.
24.
H. Hu, Y. Zheng, Z. Bao, G. Li, J. Feng and R. Cheng, "Crowdsourced POI labelling: Location-aware result inference and task assignment", ICDE, pp. 61-72, 2016.
25.
Y. Hu and J. Scott, "A fast multilevel fiedler and profile reduction code", RAL-TR-2003–36 2003.
26.
H. Kajino, Y. Tsuboi and H. Kashima, "A convex formulation for learning from crowds", AAAI, pp. 73-79, 2012.
27.
D. Kendall, "Incidence matrices interval graphs and seriation in archeology", Pacific Journal of mathematics, vol. 28, no. 3, pp. 565-570, 1969.
28.
N. M. Kingston and N. J. Dorans, "The feasibility of using item response theory as a psychometric model for the GRE aptitude test", ETS Research Report Series, vol. 1982, no. 1, 1982.
29.
J. M. Kleinberg, "Authoritative sources in a hyperlinked environment", J. ACM, vol. 46, no. 5, pp. 604-632, 1999.
30.
C. Lanczos, "An iteration method for the solution of the eigenvalue problem of linear differential and integral operators", 1950, [online] Available: https://doi.org/DOI:10.6028/JRES.045.026.

Contact IEEE to Subscribe

References

References is not available for this document.