Date of Award

1993

Document Type

Dissertation

Degree Name

Doctor of Philosophy (PhD)

Department

Electrical and Computer Engineering

First Advisor

Suresh Rai

Abstract

This dissertation presents improved techniques for analyzing network-connected (NCF), 2-connected (2CF), task-based (TBF), and subcube (SF) functionality measures in a hypercube multiprocessor with faulty processing elements (PE) and/or communication elements (CE). These measures help study system-level fault tolerance issues and relate to various application modes in the hypercube. Solutions discussed in the text fall into probabilistic and deterministic models. The probabilistic measure assumes a stochastic graph of the hypercube where PE's and/or CE's may fail with certain probabilities, while the deterministic model considers that some system components are already failed and aims to determine the system functionality. For probabilistic model, MIL-HDBK-217F is used to predict PE and CE failure rates for an Intel iPSC system. First, a technique called CAREL is presented. A proof of its correctness is included in an appendix. Using the shelling ordering concept, CAREL is shown to solve the exact probabilistic NCF measure for a hypercube in time polynomial in the number of spanning trees. However, this number increases exponentially in the hypercube dimension. This dissertation, then, aims to more efficiently obtain lower and upper bounds on the measures. Algorithms, presented in the text, generate tighter bounds than had been obtained previously and run in time polynomial in the cube dimension. The proposed algorithms for probabilistic 2CF measure consider PE and/or CE failures. In attempting to evaluate deterministic measures, a hybrid method for fault tolerant broadcasting in the hypercube is proposed. This method combines the favorable features of redundant and non-redundant techniques. A generalized result on the deterministic TBF measure for the hypercube is then described. Two distributed algorithms are proposed to identify the largest operational subcubes in a hypercube $C\sb{n}$ with faulty PE's. Method 1, called LOS1, requires a list of faulty components and utilizes the CMB operator of CAREL to solve the problem. In case the number of unavailable nodes (faulty or busy) increases, an alternative distributed approach, called LOS2, processes m available nodes in O(mn) time. The proposed techniques are simple and efficient.

Pages

201

Share

COinS