Abstract:
Signature files have been studied extensively, as an access method for textual databases. Many approaches have been proposed for searching signatures files efficiently. H...Show MoreMetadata
Abstract:
Signature files have been studied extensively, as an access method for textual databases. Many approaches have been proposed for searching signatures files efficiently. However, different methods make different assumptions and use different performance measures, making it difficult to compare their performance. In this paper, we study three basic methods proposed in the literature, namely, the indexed descriptor file, the two-level superimposed coding scheme, and the partitioned signature file approach. The contribution of this paper is two-fold. First, we present a uniform analytical performance model so that the methods can be compared fairly and consistently. The analysis shows that the two-level superimposed coding scheme, if stored in a transposed file, has the best performance. Second, we extend the two-level superimposed coding method into a multilevel superimposed coding method, we obtain the optimal number of levels for the multilevel method and show that for databases with reasonable size the optimal value is much larger than 2, which is assumed in the two-level method. The accuracy of the analytical formula is demonstrated by simulation.<>
Published in: IEEE Transactions on Knowledge and Data Engineering ( Volume: 7, Issue: 3, June 1995)
DOI: 10.1109/69.390248
References is not available for this document.
Select All
1.
S. Christodoulakis and C. Faloutsos,, "Design considerations for a message file server", IEEE Trans. Software Eng., vol. 10, no. 2, pp. 201-210, Mar. 1984.
2.
S. Christodoulakis,, M. Theodoridou,, F. Ho,, M. Papa, and A. Pathria,, "Multimedia document presentation information extraction and documentformation in MINOS: A model and a system", ACM Trans Office Information Systems, vol. 4, no. 4, pp. 345-383, Oct. 1986.
3.
U. Deppisch,, "S-Tree: A dynamic balanced signature index for office retrieval", Proc. ACM Conf. Research and Development Information Retrieval, pp. 77-87, 1986-Sept.
4.
C. Faloutsos,, "Signature-based text retrieval methods: A survey", Data Engineering Bulletin, vol. 13, no. 1, pp. 25-32, Mar. 1990.
5.
C. Faloutsos,, "Signature files: Design and performance comparison of some signatureextraction methods", Proc. 1985 SIGMOD Conf., pp. 63-82, 1985-May,.
6.
C. Faloutsos and S. Christodoulakis,, "Description and performance analysis of signature file methods for officefiling", ACM Trans. Office Information Systems, vol. 5, no. 3, pp. 237-257, July 1987.
7.
C. Faloutsos and S. Christodoulakis,, "Optimal signature extraction and information loss", ACM Trans. Database Systems, vol. 12, no. 3, pp. 395-428, Sept. 1987.
8.
D.L. Lee and C.-W. Leng,, "Partitioned signature files: Design issues and performance evaluation", ACM Trans Information Systems, vol. 7, no. 2, pp. 158-180, Apr. 1989.
9.
D.L. Lee and C.-W. Leng,, "A partitioned signature file structure for multiattribute and text retrieval", Proc. Sixth Intl Conf. Data Engineering, pp. 389-397, 1990-Feb.
10.
D.J. Lipman and W.R. Pearson,, "Rapid and sensitive protein similarity searches", Science, vol. 227, pp. 1,435-1,441, Mar., 1985.
11.
J.L. Pfaltz,, W.J. Berman, and E.M. Cagley,, "Partial match retrieval using indexed descriptor files", Comm. ACM, vol. 23, no. 9, pp. 522-528, Sept., 1980.
12.
R. Sacks-Davis,, A. Kent, and K. Ramamohanarao,, "Multikey access methods based on superimposed coding techniques", ACM Trans. Database Systems, vol. 12, no. 4, pp. 655-696, Dec. 1987.
13.
R. Sacks-Davis, and K. Ramamohanarao,, "A two-level superimposed coding scheme for partial match retrieval", Information Systems, vol. 8, no. 4, pp. 273-280, 1983.
14.
C. Stanfill and B. Kahle,, "Parallel free-text search on the connection machine system", Comm. ACM, vol. 29, no. 12, pp. 1,229-1,239, Dec. 1986.
15.
W.W. Chang and H.J. Schek,, "A signature access method for the Starburst database system", Proc. 15th Intl Conf. Very Large Data Bases, pp. 145-153, 1989-Aug.
16.
C. Faloutsos,, "Access methods for text", ACM Computing Surveys, vol. 17, no. 1, pp. 49-74, Mar. 1985.
17.
C. Faloutsos and S. Christodoulakis,, "Signature files: An access method for documents and its analytical performanceevaluation", ACM Trans. Office Information Systems, vol. 2, no. 4, pp. 267-288, Oct. 1984.
18.
D.L. Lee,, "A word-parallel bit-serial signature processor for superimposed coding", Proc. Second Intl Conf. Data Engineering, pp. 352-359, 1986-Feb.
19.
C.W. Leng and D.L. Lee,, "Optimal weight assignment for signature generation", ACM Trans Database Systems, vol. 17, no. 2, pp. 346-373, June, 1992.
20.
C.S. Roberts,, "Partial-match retrieval via method of superimposed codes", Proc. IEEE, vol. 67, no. 12, pp. 1,624-1,642, Dec. 1979.
21.
B. Yuwono and D.L. Lee,, "A backend text retrieval machine for signature-based document ranking", Proc. Intl Conf. Computing and Information (ICCI 91), pp. 288-297, 1991-May.