Doctor of Philosophy (PhD)


Electrical and Computer Engineering

My primary objective in this dissertation is to establish a framework within which I systematically study the fundamental tradeoff between deliverable and private information in statistical inference. My research was partly motivated by the emerging and widespread privacy concerns of users in many machine learning problems.

In this dissertation, I begin by introducing examples in which I am concerned with privacy leakage versus decision utility in statistical inference problems. I then go into further detail about what I have achieved in formulating and solving such problems using information-theoretic metrics in a variety of settings. Both related work and my own results are summarized in the first chapter.

In the second chapter, I introduce the problem of detecting any subgraph using binary codeword queries. I then seek and find the limits imposed by the privacy of each graph, which help me develop an understanding of privacy-versus-utility problems.

In the third chapter, I shift my focus from the original graphical framework to a more general bin allocation problem, motivated by concerns about privacy leakage in users’ web surfing patterns when proxy or VPN services are used. After formulating the problem, I introduce submodular functions as a means of simplifying such problems and finding their solutions.

In chapter four, I expand upon the concept introduced in chapter three by allowing uncertainty between hypotheses, and I find the relationship among distinguishability, privacy leakage, and utility in a deterministic bin allocation framework.

In chapters five and six, motivated by my previous work, I shift my focus to the tradeoff between utility and leaked information when randomization, rather than a deterministic mapping, is introduced as a privacy protection mechanism. In particular, I first seek solutions using the typical and widely accepted Information Bottleneck (IB) approach. I then detail how the original information bottleneck method does not necessarily provide an optimal solution to the proposed problem. I then offer my own novel approach based upon Augmented Lagrange Multipliers (ALM) and the Alternating Direction Method of Multipliers (ADMM), as well as the inherent structures of both the objective function and the privacy constraints, and I support it with both theoretical justification and empirical evidence. My approach is shown to attain notable improvements over the IB framework, with a well-justified gain in the efficiency of local convergence.

Finally, in chapter seven, I present plans to cope with the lack of true statistics by exploiting a set of information-theoretic measures that have been shown to be more robust to limited amounts of training data than the regular mutual information measure.



Committee Chair

Wei, Shuangqing