Semester of Graduation

Summer 2022

Degree

Master of Science in Computer Science (MSCS)

Department

Computer Science

Document Type

Thesis

Abstract

The field of stringology studies algorithms and data structures used for processing strings efficiently. The goal of this thesis is to investigate 2-dimensional (2D) variants of some fundamental string problems, including \textit{Exact Pattern Matching} and \textit{Longest Common Substring}.

In the 2D pattern matching problem, we are given a matrix $\M[1\dd n,1\dd n]$ that consists of $N = n \times n$ symbols drawn from an alphabet $\Sigma$ of size $\sigma$. The query consists of an $m \times m$ square matrix $\PP[1\dd m, 1\dd m]$ drawn from the same alphabet, and the task is to find all the locations of $\PP$ in $\M$. For such square patterns, data structures such as suffix trees and suffix arrays support efficient pattern matching. However, a suffix tree occupies $O(N \log N)$ bits, which is significantly more than the original text's size of $N\log \sigma$ bits. Therefore, it is imperative to design compressed data structures that support pattern matching queries efficiently and occupy space close to the original text's size. In this thesis, we design a compact text index of size $O(N \log\log N + N \log\sigma)$ bits that at least supports efficient inverse suffix array queries. Although the question of designing a compressed text index that leads to efficient pattern matching remains elusive, this index gives hope for the existence of a full 2D compressed text index with functionality comparable to that of the 1D case.
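To make the problem statement concrete, the following is a minimal brute-force sketch of 2D exact pattern matching. It is not the index described in the thesis; it is only a baseline that checks every candidate position. The function name `find_occurrences` and the representation of matrices as lists of equal-length strings are assumptions made for illustration.

```python
def find_occurrences(text, pattern):
    """Return all 0-indexed (row, col) top-left corners where the
    m x m pattern occurs in the n x n text.

    Both arguments are lists of equal-length strings (an assumed
    representation; one string per matrix row).
    """
    n, m = len(text), len(pattern)
    hits = []
    for i in range(n - m + 1):
        for j in range(n - m + 1):
            # Compare the pattern row by row against the text window at (i, j).
            if all(text[i + r][j:j + m] == pattern[r] for r in range(m)):
                hits.append((i, j))
    return hits


text = ["abab",
        "baba",
        "abab",
        "baba"]
pattern = ["ab",
           "ba"]
print(find_occurrences(text, pattern))
```

This baseline inspects up to $m^2$ symbols at each of the roughly $N$ candidate positions and uses no preprocessing, which is the cost that an index over $\M$ is meant to avoid.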

On the other hand, in the Longest Common 2D Substring problem, we are given two 2D strings (matrices), and the task is to report the size of their longest common 2D substring (submatrix). It is natural to ask whether a sublinear-time algorithm exists for this task. We answer this question positively by presenting a sublinear-time \textit{quantum} algorithm. In addition, we prove that any quantum algorithm requires at least $\tilde{\Omega}(N^{2/3})$ time to solve this problem.
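As an illustration of the problem only, here is a classical brute-force sketch that is unrelated to the quantum algorithm of the thesis. It assumes that both inputs are square matrices given as lists of equal-length strings and that the common substring is a square submatrix; the names `square_blocks` and `longest_common_square` are hypothetical.

```python
def square_blocks(mat, s):
    """Return the set of all s x s submatrices of mat, each as a tuple of row strings."""
    n = len(mat)
    return {
        tuple(mat[i + r][j:j + s] for r in range(s))
        for i in range(n - s + 1)
        for j in range(n - s + 1)
    }


def longest_common_square(a, b):
    """Return the side length of the largest square submatrix common to a and b (0 if none)."""
    # Try side lengths from largest to smallest and stop at the first match.
    for s in range(min(len(a), len(b)), 0, -1):
        if square_blocks(a, s) & square_blocks(b, s):
            return s
    return 0


a = ["abc",
     "def",
     "ghi"]
b = ["xef",
     "xhi",
     "xxx"]
print(longest_common_square(a, b))
```

This sketch reads every symbol of both inputs many times, so it is far from sublinear; it only pins down what quantity is being reported.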

Date

7-26-2022

Committee Chair

Shah, Rahul

DOI

10.31390/gradschool_theses.5642
