A Rank-by-Feature Framework for Unsupervised Multidimensional Data Exploration Using Low Dimensional Projections
IEEE Conference Publication

Abstract:

Exploratory analysis of multidimensional data sets is challenging because of the difficulty in comprehending more than three dimensions. Two fundamental statistical principles for exploratory analysis are (1) to examine each dimension first and then find relationships among dimensions, and (2) to try graphical displays first and then find numerical summaries [1]. We implement these principles in a novel conceptual framework called the rank-by-feature framework. In the framework, users can choose a ranking criterion that interests them and sort 1D or 2D axis-parallel projections according to that criterion. We introduce the rank-by-feature prism, a color-coded lower-triangular matrix that guides users to desired features. Statistical graphs (histogram, boxplot, and scatterplot) and information visualization techniques (overview, coordination, and dynamic query) are combined to help users effectively traverse 1D and 2D axis-parallel projections and, finally, to interactively find interesting features.
Date of Conference: 10-12 October 2004
Date Added to IEEE Xplore: 24 January 2005
Print ISBN: 0-7803-8779-3
Print ISSN: 1522-404X
Conference Location: Austin, TX, USA
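To make the ranking idea in the abstract concrete, here is a minimal Python sketch under stated assumptions. It scores 1D projections by variance and 2D projections by absolute Pearson correlation (both are stand-in criteria chosen for illustration, not necessarily the criteria the framework offers), sorts the projections by score, and fills a lower-triangular matrix of pairwise scores of the kind the rank-by-feature prism color-codes. The function names `rank_1d`, `rank_2d`, and `prism_matrix`, and the sample data, are hypothetical.

```python
import math
from itertools import combinations

# Hypothetical example data: each key is a dimension (column), each list its values.
data = {
    "a": [1.0, 2.0, 3.0, 4.0],
    "b": [2.0, 4.0, 6.0, 8.0],
    "c": [4.0, 1.0, 3.0, 2.0],
}


def variance(xs):
    """Stand-in 1D criterion (assumption): population variance of one column."""
    mean = sum(xs) / len(xs)
    return sum((x - mean) ** 2 for x in xs) / len(xs)


def abs_pearson(xs, ys):
    """Stand-in 2D criterion (assumption): absolute Pearson correlation of two columns."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0  # correlation is undefined for a constant column; score it 0 here
    return abs(sxy / math.sqrt(sxx * syy))


def rank_1d(columns, criterion=variance):
    """Sort 1D axis-parallel projections (single dimensions) by score, highest first."""
    scored = [(name, criterion(values)) for name, values in columns.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


def rank_2d(columns, criterion=abs_pearson):
    """Sort 2D axis-parallel projections (dimension pairs) by score, highest first."""
    scored = [(a, b, criterion(columns[a], columns[b]))
              for a, b in combinations(columns, 2)]
    return sorted(scored, key=lambda item: item[2], reverse=True)


def prism_matrix(columns, criterion=abs_pearson):
    """Lower-triangular score matrix: cell [i][j] with j < i scores dimensions i and j.

    Cells on or above the diagonal stay None; mapping scores to colors is left
    to the display layer.
    """
    names = list(columns)
    matrix = [[None] * len(names) for _ in names]
    for i in range(len(names)):
        for j in range(i):
            matrix[i][j] = criterion(columns[names[i]], columns[names[j]])
    return names, matrix


if __name__ == "__main__":
    print(rank_1d(data))      # b first (variance 5.0), then a and c (1.25 each)
    print(rank_2d(data)[0])   # ('a', 'b', 1.0): b is an exact multiple of a
    names, matrix = prism_matrix(data)
    for name, row in zip(names, matrix):
        print(name, row)
```

Swapping in another criterion only requires passing a different function with the same signature; because Python's `sorted` is stable, projections with tied scores keep their original column order.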

1 Introduction

Multidimensional data sets are common in many data analysis applications, e.g., microarray data analysis, census data analysis, and market basket analysis. A data set that can be represented as a spreadsheet with more than three columns can be thought of as multidimensional. Without loss of generality, we assume that each column is a dimension (or variable) and each row is a data item. Dealing with multidimensionality has challenged researchers in many disciplines, both because it is difficult to comprehend more than three dimensions and because of the computational overhead.
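As a minimal sketch of this spreadsheet view, assuming a small hypothetical table (the column names and values below are invented for illustration), the following Python code transposes rows into per-dimension columns and enumerates the 1D and 2D axis-parallel projections. The number of 2D projections grows as k(k - 1)/2 for k dimensions, which suggests why inspecting every projection by hand quickly becomes impractical.

```python
from itertools import combinations
from math import comb

# Hypothetical table: one header row, then one data item per row.
header = ["age", "income", "height", "weight"]
rows = [
    [34, 52000, 170, 68],
    [29, 48000, 182, 80],
    [45, 61000, 165, 59],
]


def to_columns(header, rows):
    """Transpose row-major records into one list of values per dimension."""
    return {name: [row[i] for row in rows] for i, name in enumerate(header)}


def axis_parallel_projections(dims):
    """List every 1D projection (one dimension) and 2D projection (unordered pair)."""
    return [(d,) for d in dims], list(combinations(dims, 2))


columns = to_columns(header, rows)
one_d, two_d = axis_parallel_projections(header)
print(len(one_d), "1D and", len(two_d), "2D projections")  # 4 1D and 6 2D projections

# The number of 2D projections grows as k * (k - 1) / 2 for k dimensions.
for k in (10, 100, 1000):
    print(k, "dimensions ->", comb(k, 2), "scatterplots")
```

With 10, 100, and 1000 dimensions the loop reports 45, 4950, and 499500 pairwise scatterplots respectively.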

References
[1] D. S. Moore and G. P. McCabe, Introduction to the Practice of Statistics, 3rd ed., W. H. Freeman and Company, New York, NY, 1999.
[2] T. Kohonen, Self-Organizing Maps, 3rd ed., Springer, New York, 2000.
[3] J. H. Friedman, "Exploratory Projection Pursuit," J. Am. Statistical Assoc., Vol. 82, No. 397, pp. 249-266, 1987.
[4] D. Asimov, "The Grand Tour: A Tool for Viewing Multidimensional Data," SIAM Journal on Scientific and Statistical Computing, Vol. 6, No. 1, pp. 128-143, 1985.
[5] P. J. Huber, "Projection Pursuit," The Annals of Statistics, Vol. 13, No. 2, pp. 435-475, 1985.
[6] D. R. Cook, A. Buja, J. Cabrera, and H. Hurley, "Grand Tour and Projection Pursuit," Journal of Computational and Graphical Statistics, Vol. 23, pp. 225-250, 1995.
[7] M. O. Ward, "XmdvTool: Integrating Multiple Methods for Visualizing Multivariate Data," Proc. IEEE Visualization '94, pp. 326-336, 1994.
[8] D. Guo, "Coordinating Computational and Visual Approaches for Interactive Feature Selection and Multivariate Clustering," Information Visualization, Vol. 2, pp. 232-246, 2003.
[9] Spotfire DecisionSite, Spotfire, http://www.spotfire.com/
[10] H. Liu and H. Motoda, Feature Selection for Knowledge Discovery and Data Mining, Kluwer Academic Publishers, Boston, 1998.
[11] R. Agrawal, J. Gehrke, D. Gunopulos, and P. Raghavan, "Automatic Subspace Clustering of High Dimensional Data for Data Mining Applications," Proc. SIGMOD '98, pp. 94-105, 1998.
[12] C. C. Aggarwal, C. Procopiuc, J. Wolf, P. Yu, and J.-S. Park, "Fast Algorithms for Projected Clustering," Proc. SIGMOD '99, pp. 61-72, 1999.
[13] M. Ankerst, S. Berchtold, and D. A. Keim, "Similarity Clustering of Dimensions for an Enhanced Visualization of Multidimensional Data," Proc. Int'l Symp. Information Visualization, pp. 52-60, 1998.
[14] M. Friendly, "Corrgrams: Exploratory Displays for Correlation Matrices," The American Statistician, Vol. 19, pp. 316-325, 2002.
[15] A. Inselberg and B. Dimsdale, "Parallel Coordinates: A Tool for Visualizing Multidimensional Geometry," Proc. Int'l Symp. Information Visualization, pp. 361-375, 1990.
[16] J. Seo and B. Shneiderman, "Interactively Exploring Hierarchical Clustering Results," IEEE Computer, Vol. 35, No. 7, pp. 80-86, 2002.
[17] J. Yang, W. Peng, M. O. Ward, and E. A. Rundensteiner, "Interactive Hierarchical Dimension Ordering, Spacing and Filtering for Exploration of High Dimensional Datasets," Proc. Int'l Symp. Information Visualization, pp. 105-112, October 2003.
