Master of Science in Computer Science (MSCS)
Genome indexing is the basis for many bioinformatics applications. Read mapping (sequence alignment) is one such application: it aligns millions of short reads against a reference genome. Several tools, such as BLAST, SOAP, BOWTIE, Cloudburst, and Rapid Parallel Genome Indexing with MapReduce, use an indexing technique to align short reads. Many contemporary alignment techniques are time consuming and memory intensive, and cannot easily be scaled to larger genomes. The suffix tree is a popular data structure that can be used to overcome the drawbacks of other alignment techniques. However, constructing a suffix tree is itself highly memory intensive and time consuming. In this thesis, a MapReduce-based parallel construction of the suffix tree is proposed. The performance of the algorithm is measured on the Hadoop framework over a commodity cluster in which each node has 8 GB of primary memory. The results show that the suffix tree for a large dataset such as the human genome can be constructed in significantly less time.
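The general idea of partitioning suffix indexing across map and reduce steps can be illustrated with a small single-process sketch. This is not the algorithm from the thesis: it is a simplified illustration that assumes suffixes are partitioned by their first k characters, and it produces sorted suffix positions per partition rather than a full suffix tree. All function names and the parameter k are hypothetical.

```python
from collections import defaultdict

def map_suffixes(genome, k):
    """Map step: emit (k-character prefix, start position) for every suffix."""
    for i in range(len(genome)):
        yield genome[i:i + k], i

def shuffle(pairs):
    """Group positions by prefix key, as a MapReduce shuffle would."""
    groups = defaultdict(list)
    for key, pos in pairs:
        groups[key].append(pos)
    return groups

def reduce_bucket(genome, positions):
    """Reduce step: order one bucket's suffixes lexicographically."""
    return sorted(positions, key=lambda i: genome[i:])

def build_index(genome, k=2):
    """Run map, shuffle, and reduce in one process (hypothetical driver)."""
    genome = genome + "$"  # terminator so every suffix is unique
    groups = shuffle(map_suffixes(genome, k))
    index = {prefix: reduce_bucket(genome, pos) for prefix, pos in groups.items()}
    return genome, index

def find(read, genome, index, k=2):
    """Return start positions of read; assumes len(read) >= k."""
    bucket = index.get(read[:k], [])
    return sorted(i for i in bucket if genome.startswith(read, i))
```

For example, `genome, index = build_index("GATTACATTA")` followed by `find("ATTA", genome, index)` looks only at the bucket for the prefix "AT". Because each bucket is built independently, the reduce step is the part that could be distributed across cluster nodes.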
Document Availability at the Time of Submission
Secure the entire work for patent and/or proprietary purposes for a period of one year. Student has submitted appropriate documentation which states: During this period the copyright owner also agrees not to exercise her/his ownership rights, including public use in works, without prior authorization from LSU. At the end of the one year period, either we or LSU may request an automatic extension for one additional year. At the end of the one year secure period (or its extension, if such is requested), the work will be released for access worldwide.
Satish, Umesh Chandra, "Parallel Suffix Tree Construction for Genome Sequence Using Hadoop" (2013). LSU Master's Theses. 1665.