Efficient Data Analytics on Augmented Similarity Triplets

Abstract:

Data analysis requires a pairwise proximity measure over objects. Recent work has extended this to situations where the distance information between objects is given as the outcomes of distance comparisons among three objects (triplets). Humans find comparison tasks much easier than exact distance computation, and such data can easily be obtained in large quantities via crowdsourcing. In this work, we propose triplets augmentation, an efficient method to extend the triplets data by inferring the hidden implicit information from the existing data. Triplets augmentation improves the quality of kernel-based and kernel-free data analytics. We also propose a novel set of algorithms for common data analysis tasks based on triplets. These methods work directly with triplets and avoid kernel evaluations, and are therefore scalable to big data. We demonstrate that our methods outperform the current best-known techniques and are robust to noisy data.
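The abstract does not spell out how implicit information is inferred from existing triplets. As one illustration, assume a triplet (a, b, c) records that a is closer to b than to c, i.e., d(a, b) < d(a, c). Comparisons that share the anchor a can then be chained by transitivity: d(a, b) < d(a, c) and d(a, c) < d(a, e) imply d(a, b) < d(a, e). The sketch below computes this per-anchor transitive closure with Warshall's algorithm. The function name `augment_triplets` is hypothetical, and this is not necessarily the augmentation procedure proposed in the paper, which may exploit further structure of distances.

```python
from collections import defaultdict


def augment_triplets(triplets):
    """Return the input triplets plus every triplet implied by transitivity.

    Assumed encoding (hypothetical): a triplet (a, b, c) means d(a, b) < d(a, c).
    Only comparisons sharing the same anchor a are chained together.
    """
    # reach[a][b] holds every c for which d(a, b) < d(a, c) is known.
    reach = defaultdict(lambda: defaultdict(set))
    for a, b, c in triplets:
        reach[a][b].add(c)

    augmented = set()
    for a, rel in reach.items():
        # All objects compared against anchor a.
        nodes = set(rel) | {c for cs in rel.values() for c in cs}
        # Warshall's algorithm: if b is closer than k and k is closer than e,
        # then b is closer than e (all relative to anchor a).
        for k in nodes:
            if k not in rel:
                continue  # k is never the closer object, so it extends nothing
            for i in nodes:
                if i in rel and k in rel[i]:
                    rel[i] |= rel[k]
        # Emit triplets, skipping degenerate (a, b, b) ones that contradictory
        # (cyclic) input could produce.
        for b, cs in rel.items():
            augmented.update((a, b, c) for c in cs if c != b)
    return augmented


print(sorted(augment_triplets([(0, 1, 2), (0, 2, 3)])))
# [(0, 1, 2), (0, 1, 3), (0, 2, 3)]
```

Here the two input triplets yield the extra triplet (0, 1, 3) at no crowdsourcing cost. The sketch assumes consistent input: noisy, contradictory triplets would be propagated into the inferred set rather than corrected.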

Published in: 2022 IEEE International Conference on Big Data (Big Data)

Date of Conference: 17-20 December 2022

Date Added to IEEE Xplore: 26 January 2023

DOI: 10.1109/BigData55660.2022.10021104

Conference Location: Osaka, Japan

I. Introduction

To extract knowledge from data, it is generally assumed that input data is drawn from a vector space with a well-defined pairwise proximity measure [3]. However, big data comes in a wide variety, and in many cases (e.g., sequences, images, and text) the data objects are not given as feature vectors. Such data needs to be mapped to a meaningful feature space where vector-space-based data mining algorithms can be employed. Many data mining and machine learning methods, such as support vector machines (SVM), kernel-PCA, and agglomerative clustering, do not explicitly require input data as feature vectors; instead, they use only pairwise distance or similarity information [2]. Moreover, the choice of a distance or similarity measure is often arbitrary and does not necessarily capture the inherent similarity between objects.
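To illustrate that some methods need only pairwise distances, here is a minimal single-linkage agglomerative clustering sketch that reads a precomputed distance matrix and never touches feature vectors. The function name `single_linkage` and the toy matrix `D` are hypothetical; the naive search over all cluster pairs at every merge is chosen for clarity, not speed.

```python
def single_linkage(dist, num_clusters):
    """Agglomerative clustering driven only by a pairwise distance matrix.

    dist is a symmetric n x n list of lists. Starting from singletons, the
    two clusters whose closest members are nearest (single linkage) are
    merged repeatedly until num_clusters clusters remain.
    """
    clusters = [{i} for i in range(len(dist))]
    while len(clusters) > num_clusters:
        best = None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                # Single linkage: distance between the closest pair of members.
                d = min(dist[i][j] for i in clusters[x] for j in clusters[y])
                if best is None or d < best[0]:
                    best = (d, x, y)
        _, x, y = best
        clusters[x] |= clusters[y]
        del clusters[y]  # y > x, so index x is unaffected
    return clusters


# Objects 0 and 1 are close to each other, as are 2 and 3.
D = [[0, 1, 9, 8],
     [1, 0, 7, 9],
     [9, 7, 0, 2],
     [8, 9, 2, 0]]
print(single_linkage(D, 2))  # [{0, 1}, {2, 3}]
```

In practice, a library routine such as SciPy's `scipy.cluster.hierarchy.linkage` with `method='single'`, which accepts a condensed distance matrix, would be the usual choice.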
