Master of Science in Electrical Engineering (MSEE)
Electrical and Computer Engineering
In recent years, scientific applications have become increasingly data intensive. The increase in the size of data generated by scientific applications necessitates collaboration and sharing data among the nation's education and research institutions. To address this, distributed storage systems spanning multiple institutions over wide area networks have been developed. One of the important features of distributed storage systems is providing global unified name space across all participating institutions, which enables easy data sharing without the knowledge of actual physical location of data. This feature depends on the ``location metadata'' of all data sets in the system being available to all participating institutions. This introduces new challenges. In this thesis, we study different metadata server layouts in terms of high availability, scalability and performance. A central metadata server is a single point of failure leading to low availability. Ensuring high availability requires replication of metadata servers. A synchronously replicated metadata servers layout introduces synchronization overhead which degrades the performance of data operations. We propose an asynchronously replicated multi-master metadata servers layout which ensures high availability, scalability and provides better performance. We discuss the implications of asynchronously replicated multi-master metadata servers on metadata consistency and conflict resolution. Further, we design and implement our own asynchronous multi-master replication tool, deploy it in the state-wide distributed data storage system called PetaShare, and compare performance of all three metadata server layouts: central metadata server, synchronously replicated multi-master metadata servers and asynchronously replicated multi-master metadata servers.
Document Availability at the Time of Submission
Release the entire work immediately for access worldwide.
Akturk, Ismail, "Asynchronous replication of metadata across multi-master servers in distributed data storage systems" (2009). LSU Master's Theses. 4113.