Degree

Doctor of Philosophy (PhD)

Department

Division of Computer Science and Engineering

Document Type

Dissertation

Abstract

Owing to significant advances in experimental and computational techniques, materials data are now abundant. Data-driven research calls for a system that manages and shares these data and supports a set of tools for effective data analysis and modeling. In general, a given material property M can be treated as a multivariate data problem. The dimensions of M are the values of the property itself, the conditions (pressure P, temperature T, and multi-component composition X) that control the property, and relevant metadata I (such as source and date).
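
As a rough illustration of this multivariate view, one data point of M could be held in a small record type and flattened into one row per measurement. This is only a sketch: the class name, field names, units, and all values below are hypothetical placeholders and are not taken from the dissertation's database.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class ViscosityRecord:
    """One data point of M: property value, conditions, and metadata."""
    log_eta: float                       # property value (assumed: log10 of viscosity)
    pressure_gpa: float                  # condition P (assumed unit: GPa)
    temperature_k: float                 # condition T (assumed unit: K)
    composition: Dict[str, float] = field(default_factory=dict)  # X1, X2, ...
    metadata: Dict[str, str] = field(default_factory=dict)       # I1, I2, ...

    def as_row(self) -> Dict[str, object]:
        """Flatten the record into one row with a column per dimension."""
        row: Dict[str, object] = {
            "log_eta": self.log_eta,
            "P": self.pressure_gpa,
            "T": self.temperature_k,
        }
        row.update({f"X_{name}": frac for name, frac in self.composition.items()})
        row.update({f"I_{name}": value for name, value in self.metadata.items()})
        return row


# Placeholder values for illustration only, not real measurements.
record = ViscosityRecord(
    log_eta=1.5,
    pressure_gpa=0.0,
    temperature_k=1800.0,
    composition={"SiO2": 0.5, "MgO": 0.5},
    metadata={"source": "hypothetical-example", "date": "2020"},
)
row = record.as_row()
print(row)
```

Keeping composition and metadata as dictionaries lets the number of X and I dimensions vary from one record to the next, which matches the open-ended X1, X2, … and I1, I2, … in the description.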

Here we present a comprehensive database that draws on both experimental and computational sources, together with an innovative visual analytics system for melt viscosity (η), which can be represented as M(η, P, T, X1, X2, …, I1, I2, …). We implemented the parallel coordinates plot (PCP) method and extended it with new non-standard features, such as derived axes/sub-axes, dimension merging, binary scaling, and nested plots. The enhanced PCP offers many insights relevant to the underlying physics, data modeling, and the guidance of future experiments and computations.
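
For readers unfamiliar with the basic technique, the sketch below shows the core of a standard parallel coordinates plot: each dimension is scaled onto its own vertical axis, and each data point becomes a polyline with one vertex per axis. It covers only the plain PCP, not the derived axes, dimension merging, binary scaling, or nested plots described above. The function name and the sample rows are hypothetical.

```python
from typing import Dict, List, Tuple


def pcp_polylines(
    rows: List[Dict[str, float]], axes: List[str]
) -> List[List[Tuple[int, float]]]:
    """Map each row to a polyline of (axis_index, scaled_value) vertices.

    Each axis is min-max scaled to [0, 1] independently, so dimensions
    with different units can share one plot.
    """
    bounds = {a: (min(r[a] for r in rows), max(r[a] for r in rows)) for a in axes}
    lines = []
    for r in rows:
        line = []
        for i, a in enumerate(axes):
            lo, hi = bounds[a]
            # A constant axis has no spread; place its points at mid-height.
            y = 0.5 if hi == lo else (r[a] - lo) / (hi - lo)
            line.append((i, y))
        lines.append(line)
    return lines


# Placeholder rows for illustration only, not real measurements.
rows = [
    {"P": 0.0, "T": 1500.0, "log_eta": 3.0},
    {"P": 10.0, "T": 2000.0, "log_eta": 1.0},
    {"P": 5.0, "T": 2500.0, "log_eta": 0.0},
]
lines = pcp_polylines(rows, ["P", "T", "log_eta"])
for line in lines:
    print(line)
```

The returned vertices can be passed to any plotting library to draw one line per data point.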

Constructing viscosity models is a non-trivial process, and existing models are often limited to a subset of the parameter space, such as ambient pressure conditions. To develop a generalized model that applies to a wider parameter space, we trained various machine learning models, including neural networks, Decision Tree, Random Forest, and XGBoost. We evaluated model performance based on the loss function, error distribution, and model continuity.

Our results show that neural network models outperformed the physics-based models as well as all tree-based models. A small neural network with two hidden layers, each containing 64 nodes, was found to be sufficient to model both the ambient pressure dataset and the complete dataset. Although it reduced the RMSE only marginally, a larger neural network consisting of four hidden layers with 128 nodes in each layer could provide an even better fit for the complete dataset in terms of model continuity and error distribution. Tree-based models could follow the training data, but their predictions vary strongly with small changes in the parameter space, making them less applicable to continuous numerical data. Our data visualization and modeling approach is expected to be useful to researchers who explore and model materials data; for instance, the density property can be incorporated as a new attribute in our system.
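
The continuity issue with tree-based models can be seen in a toy setting. The sketch below fits a hand-written one-dimensional regression tree to synthetic points drawn from a smooth Arrhenius-like curve and compares its predictions with the curve itself on a fine temperature grid. Everything here is an assumption made for illustration: the curve and its coefficients, the tree depth, and the tree implementation are not the dissertation's data or models, and the smooth reference is the closed-form curve rather than a trained neural network.

```python
from typing import List


def mean(values: List[float]) -> float:
    return sum(values) / len(values)


def sse(values: List[float]) -> float:
    """Sum of squared deviations from the mean."""
    m = mean(values)
    return sum((v - m) ** 2 for v in values)


def fit_tree(xs: List[float], ys: List[float], depth: int):
    """Fit a 1D regression tree; returns ('leaf', value) or ('split', thr, left, right)."""
    if depth == 0 or len(set(xs)) < 2:
        return ("leaf", mean(ys))
    pairs = sorted(zip(xs, ys))
    best_cost, best_i = None, None
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # cannot split between identical x values
        cost = sse([y for _, y in pairs[:i]]) + sse([y for _, y in pairs[i:]])
        if best_cost is None or cost < best_cost:
            best_cost, best_i = cost, i
    thr = (pairs[best_i - 1][0] + pairs[best_i][0]) / 2
    left, right = pairs[:best_i], pairs[best_i:]
    return (
        "split",
        thr,
        fit_tree([x for x, _ in left], [y for _, y in left], depth - 1),
        fit_tree([x for x, _ in right], [y for _, y in right], depth - 1),
    )


def predict(tree, x: float) -> float:
    while tree[0] == "split":
        tree = tree[2] if x <= tree[1] else tree[3]
    return tree[1]


def smooth_curve(t: float) -> float:
    """Hypothetical Arrhenius-like log-viscosity; coefficients are made up."""
    return -4.0 + 8000.0 / t


# Synthetic training points on the smooth curve (illustration only).
train_t = [1500.0 + 50.0 * i for i in range(21)]
train_y = [smooth_curve(t) for t in train_t]
tree = fit_tree(train_t, train_y, depth=3)

# Evaluate both on a fine grid with 1 K spacing.
grid = [1500.0 + i for i in range(1001)]
tree_preds = [predict(tree, t) for t in grid]
smooth = [smooth_curve(t) for t in grid]

max_jump_tree = max(abs(b - a) for a, b in zip(tree_preds, tree_preds[1:]))
max_step_smooth = max(abs(b - a) for a, b in zip(smooth, smooth[1:]))
print("distinct tree predictions:", len(set(tree_preds)))
print("largest step, tree vs smooth:", max_jump_tree, max_step_smooth)
```

A tree of depth 3 has at most eight leaves, so its prediction is a staircase with at most eight levels. Neighboring grid points that fall on either side of a split threshold receive different leaf values, which is the kind of abrupt variation under small parameter changes described above.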

Date

1-12-2023

Committee Chair

Karki, Bijaya B.

DOI

10.31390/gradschool_dissertations.6036
