LSU Doctoral Dissertations

Efficient Indexing for Structured and Unstructured Data

Identifier

etd-08182014-125357

Author

Manish Madhukar Patil, Louisiana State University and Agricultural and Mechanical College

Degree

Doctor of Philosophy (PhD)

Department

Computer Science

Document Type

Dissertation

Abstract

The collection of digital data is growing at an exponential rate. Data originates from a wide range of sources such as text feeds, biological sequencers, internet traffic over routers, sensors, and many others. To mine intelligent information from these sources, users have to query the data. Indexing techniques aim to reduce query time by preprocessing the data. The diversity of data sources in the real world makes it imperative to develop application-specific indexing solutions based on the data to be queried. Data can be structured, i.e., relational tables, or unstructured, i.e., free text. Moreover, an increasing number of applications need to seamlessly analyze both kinds of data, making data integration a central issue. Integrating text with structured data needs to account for missing values, errors in the data, etc. Probabilistic models have been proposed recently for this purpose. These models are also useful for applications where uncertainty is inherent in the data, e.g., sensor networks. This dissertation aims to propose efficient indexing solutions for several problems that lie at the intersection of databases and information retrieval, such as joining ranked inputs and full-text document searching. Other well-known problems of ranked retrieval and pattern matching are also studied under probabilistic settings. For each problem, the worst-case theoretical bounds of the proposed solutions are established and/or their practicality is demonstrated by thorough experimentation.

Date

2014

Document Availability at the Time of Submission

Release the entire work immediately for access worldwide.

Recommended Citation

Patil, Manish Madhukar, "Efficient Indexing for Structured and Unstructured Data" (2014). LSU Doctoral Dissertations. 785.
https://repository.lsu.edu/gradschool_dissertations/785

Committee Chair

Shah, Rahul

DOI

10.31390/gradschool_dissertations.785

Included in

Computer Sciences Commons
