
Fast Computation of Persistent Homology with Data Reduction and Data Partitioning


Abstract:

Persistent homology is a method of data analysis that is based in the mathematical field of topology. Unfortunately, the run-time and memory complexities associated with computing persistent homology inhibit general use for the analysis of big data. For example, the best tools currently available to compute persistent homology can process only a few thousand data points in ℝ3. Several studies have proposed using sampling or data reduction methods to attack this limit. While these approaches enable the computation of persistent homology on much larger data sets, the methods are approximate. Furthermore, while they largely preserve the results of large topological features, they generally miss reporting information about the small topological features that are present in the data set. While this abstraction is useful in many cases, there are data analysis needs where the smaller features are also significant (e.g., brain artery analysis). This paper explores a combination of data reduction and data partitioning to compute persistent homology on big data that enables the identification of both large and small topological features from the input data set. To reduce the approximation errors that typically accompany data reduction for persistent homology, the described method also includes a mechanism of “upscaling” the data circumscribing the large topological features that are computed from the sampled data. Experimental results demonstrate a significant improvement in the scale at which persistent homology can be computed.
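To make the computation the abstract refers to concrete, the following is a minimal, stdlib-only sketch of the simplest case: 0-dimensional persistent homology (connected components) of a Euclidean point cloud under a Vietoris–Rips filtration. This is an illustration of the general technique, not the authors' method; the function name `h0_persistence` and the toy input are placeholders. It exploits the fact that H0 death times coincide with the edge weights of a minimum spanning tree, computed here with Kruskal's algorithm and union-find.

```python
import math
from itertools import combinations

def h0_persistence(points):
    """0-dimensional persistence diagram of a point cloud.

    Each point spawns a connected component at filtration value 0; a
    component dies when a Vietoris-Rips edge merges it into another.
    Those death times are exactly the minimum-spanning-tree edge
    weights, found via Kruskal's algorithm with union-find.
    Returns a list of (birth, death) pairs; one component never dies.
    """
    n = len(points)
    parent = list(range(n))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # All pairwise edges, sorted by Euclidean length (the filtration order).
    edges = sorted(
        (math.dist(points[i], points[j]), i, j)
        for i, j in combinations(range(n), 2)
    )
    diagram = []
    for w, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
            diagram.append((0.0, w))  # a component dies at scale w
    diagram.append((0.0, math.inf))   # the component that survives forever
    return diagram
```

On two well-separated clusters, the diagram shows short-lived pairs (within-cluster merges) and one long-lived pair (the cluster-to-cluster merge), which is precisely the large-feature/small-feature distinction the abstract draws. Higher-dimensional features (loops, voids) require boundary-matrix reduction and are where the run-time and memory costs discussed above arise.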
Date of Conference: 09-12 December 2019
Date Added to IEEE Xplore: 24 February 2020
Conference Location: Los Angeles, CA, USA

I. Introduction

We live in the age of data. Every day, massive volumes of data are analyzed to extract meaningful information, a task generally referred to as data analysis or data mining. Data analysis has grown over the past few decades into a vast and interdisciplinary field of study encompassing statistics, mathematics, and computer science. Numerous methods have been developed to analyze large and complex data sets to extract useful knowledge. An emerging method of data analysis is based in the mathematical field of topology. Topology is the study of the properties of space that are preserved under certain types of deformations [1]. Over the last 15 years, substantial effort has been devoted to applying topological methods to problems involving large and complicated data sets. This gave birth to a field of study called Topological Data Analysis (TDA) [2]–[6]. The fundamental idea is that topological methods can be used to study patterns or shapes that are preserved despite the presence of noise and variations in the data. The ability of TDA to identify shapes under certain deformations renders it robust to noise and leads to discovering properties of data that are not discernible by conventional methods of data analysis [3], [4].

