Identifier

etd-11142012-040550

Degree

Master of Science in Engineering Science (MSES)

Department

Engineering Science (Interdepartmental Program)

Document Type

Thesis

Abstract

Rhetorical Structure Theory (Mann et al. 1988), a popular approach for analyzing discourse coherence, suggests that coherent text can be placed into a hierarchical organization of clauses. Identification of a text’s rhetorical structure through automatic discourse analysis is a crucial element for many of today’s Natural Language Processing tasks, but no sufficient tool is available. The current state-of-the-art discourse parser, SPADE (Soricut et al. 2003), is limited to parsing discourse within a single sentence. HILDA (Hernault et al. 2010) extends the parsing abilities of SPADE to the document level, but with a decrease in performance. This study achieved document-level discourse parsing without sacrificing performance. Provided that the text was already segmented into elementary discourse units, the task of discourse parsing was separated into three steps: structuring, nuclearity labeling, and relation labeling. An algorithm was developed for classifying relation existence, nuclearity, and relation label that improved upon previous methods. New features were explored for all three steps to maintain state-of-the-art performance when parsing at the document level.

Date

2012

Document Availability at the Time of Submission

Student has submitted appropriate documentation to restrict access to LSU for 365 days after which the document will be released for worldwide access.

Committee Chair

Knapp, Gerald
