Doctor of Philosophy (PhD)


Engineering Science

Document Type



In this research, we focus on the application of reinforcement learning (RL) to automated agent tasks with considerable target variability (i.e., targets characterized by stochastic distributions), in particular the learning of inspect/correct tasks. Examples include automated identification and correction of rivet failures in airplane maintenance procedures, and automated cleaning of surgical instruments in a hospital sterilization processing department. The location of defects and the corrective action required for each vary from one task episode to the next. What must be learned are optimal stochastic strategies, rather than an optimal response to any single defect type and location. RL has been widely applied in robotics and autonomous agents research, but primarily to problems with relatively low variability compared to the task requirements overall.

We characterize the performance of RL at varying levels of variability in a grid world environment at different task complexity levels, and analyze the RL performance problems observed during the experiments. The experiments revealed that higher variability in stochastic environments significantly reduces the RL agent's performance because of forgetting (or overwriting) effects: the most recent observation from the stochastic environment unduly influences learned behavior. Furthermore, we characterize the impact of variability on hyperparameter selection.
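The overwriting effect can be illustrated with the standard tabular SARSA update, Q(s,a) <- Q(s,a) + alpha * (r + gamma * Q(s',a') - Q(s,a)). The sketch below is not taken from the dissertation; the reward sequence, state names, and learning rates are illustrative assumptions. It applies the update to a single state-action pair that receives nine rewards of +1 followed by one reward of -1, and compares a high and a low learning rate.

```python
def sarsa_update(q, s, a, r, s_next, a_next, alpha, gamma):
    """One tabular SARSA step: move Q(s, a) toward r + gamma * Q(s', a')."""
    target = r + gamma * q.get((s_next, a_next), 0.0)
    old = q.get((s, a), 0.0)
    q[(s, a)] = old + alpha * (target - old)
    return q[(s, a)]


def final_estimate(rewards, alpha):
    """Repeatedly update one state-action pair whose next state is terminal.

    gamma is set to 0 so the target is just the observed reward
    (an assumption made to keep the example small).
    """
    q = {}
    for r in rewards:
        sarsa_update(q, "s", "a", r, "terminal", None, alpha, gamma=0.0)
    return q[("s", "a")]


# Illustrative stochastic outcome: mostly +1, with one late -1 (mean 0.8).
rewards = [1.0] * 9 + [-1.0]

high = final_estimate(rewards, alpha=0.9)  # dominated by the last reward
low = final_estimate(rewards, alpha=0.1)   # smooths over the history

print(f"alpha=0.9 -> {high:.4f}")
print(f"alpha=0.1 -> {low:.4f}")
```

With the high learning rate, the single most recent observation drives the estimate negative even though the average reward is 0.8; with the low learning rate the estimate stays positive but converges more slowly. This is the trade-off that makes hyperparameter choice sensitive to environmental variability.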

To help mitigate the impact of variability on RL performance, we developed a chain of Q-tables approach aimed at reducing the impact of subtask variability on other subtasks within a training episode. The performance of the chain of Q-tables approach was assessed against the original SARSA RL and the double SARSA approach. In high and very high variability cases, the chain of Q-tables approach outperforms the others in terms of efficiency, accumulated reward, number of steps, and computational time.
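The abstract does not describe how the tables are constructed or linked, so the following is only one plausible reading, offered as a sketch: each subtask in an episode gets its own value table, and a SARSA update touches only the table of the active subtask. The class name `QTableChain`, the subtask index `k`, and the grid coordinates are hypothetical and are not taken from the dissertation.

```python
from collections import defaultdict


class QTableChain:
    """Hypothetical sketch: one independent value table per subtask."""

    def __init__(self, n_subtasks, alpha=0.5, gamma=0.9):
        self.tables = [defaultdict(float) for _ in range(n_subtasks)]
        self.alpha = alpha
        self.gamma = gamma

    def update(self, k, s, a, r, s_next, a_next, done):
        """SARSA update applied only to the table of subtask k."""
        q = self.tables[k]
        # No bootstrapping past the end of a subtask (an assumption).
        bootstrap = 0.0 if done else q[(s_next, a_next)]
        q[(s, a)] += self.alpha * (r + self.gamma * bootstrap - q[(s, a)])


chain = QTableChain(n_subtasks=2)

# The same state-action pair gets opposite rewards in two subtasks.
chain.update(0, s=(0, 0), a="right", r=1.0,
             s_next=(0, 1), a_next="right", done=True)
chain.update(1, s=(0, 0), a="right", r=-1.0,
             s_next=(0, 1), a_next="right", done=True)

print(chain.tables[0][((0, 0), "right")])
print(chain.tables[1][((0, 0), "right")])
```

In a single shared table, the second update would partly overwrite the first. Keeping the tables separate means the value learned for one subtask is untouched by what happens in another, which is the isolation property the abstract attributes to the approach.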

An adaptive hyperparameter setting method was developed based on a sample variability metric. The approach quickly estimates the environmental variability and automatically sets appropriate hyperparameter values.



Committee Chair

Knapp, Gerald M.

Included in

Robotics Commons