Date of Award

1986

Document Type

Dissertation

Degree Name

Doctor of Philosophy (PhD)

Abstract

This work discusses three topics. The first is an investigation of the topological properties of the p-norm model of Salton, Fox, and Wu. It is shown that certain properties of the p-norm model that one would expect to hold, given the topological origin of the model, do not in fact hold. These properties include the ability to change the query by changing p and the ability to adequately separate documents. Since these properties do hold in the model as actually constructed, it must be that the properties do not follow from the topological origin of the model.
The second topic is a search for a usable model with an adequate theoretical basis. To construct such a model, the topological paradigm is defined. This paradigm establishes a minimal set of requirements that any system with a topological foundation should meet. A particular example of the paradigm, the Topological Information Retrieval System (TIRS), is constructed. It is shown that all of the desired properties of the p-norm model hold for the TIRS model. The various query systems that may be used with TIRS are discussed. These include a natural language interface and a weighted Boolean query system, as well as two specialized interfaces. The weighted Boolean query system has the property that pairs, when treated as units, have all of the properties of the non-weighted Boolean lattice. The run time of the system is estimated twice: once for an inverted file implementation and once for an implementation using k-d trees. These run times are much better than those of traditional systems.
The third topic is a reexamination of the standard models of information retrieval, considered as cases of the topological paradigm. The paradigm is shown to be a unifying model, in that all of the standard models, i.e., the Boolean, vector space, fuzzy set theoretic, and probabilistic models, as well as a hierarchical model, are shown to be instances of the paradigm.
An appendix contains a review of relevant topics from topology and abstract algebra.

Pages

93
