Symbolic and Deep Learning Based Data Representation Methods for Activity Recognition and Image Understanding at Pixel Level
Doctor of Philosophy (PhD)
Efficient representation of large amount of data particularly images and video helps in the analysis, processing and overall understanding of the data. In this work, we present two frameworks that encapsulate the information present in such data. At first, we present an automated symbolic framework to recognize particular activities in real time from videos. The framework uses regular expressions for symbolically representing (possibly infinite) sets of motion characteristics obtained from a video. It is a uniform framework that handles trajectory-based and periodic articulated activities and provides polynomial time graph algorithms for fast recognition. The regular expressions representing motion characteristics can either be provided manually or learnt automatically from positive and negative examples of strings (that describe dynamic behavior) using offline automata learning frameworks. Confidence measures are associated with recognitions using Levenshtein distance between a string representing a motion signature and the regular expression describing an activity. We have used our framework to recognize trajectory-based activities like vehicle turns (U-turns, left and right turns, and K-turns), vehicle start and stop, person running and walking, and periodic articulated activities like digging, waving, boxing, and clapping in videos from the VIRAT public dataset, the KTH dataset, and a set of videos obtained from YouTube. Next, we present a core sampling framework that is able to use activation maps from several layers of a Convolutional Neural Network (CNN) as features to another neural network using transfer learning to provide an understanding of an input image. The intermediate map responses of a Convolutional Neural Network (CNN) contain information about an image that can be used to extract contextual knowledge about it. Our framework creates a representation that combines features from the test data and the contextual knowledge gained from the responses of a pretrained network, processes it and feeds it to a separate Deep Belief Network. We use this representation to extract more information from an image at the pixel level, hence gaining understanding of the whole image. We experimentally demonstrate the usefulness of our framework using a pretrained VGG-16 model to perform segmentation on the BAERI dataset of Synthetic Aperture Radar (SAR) imagery and the CAMVID dataset. Using this framework, we also reconstruct images by removing noise from noisy character images. The reconstructed images are encoded using Quadtrees. Quadtrees can be an efficient representation in learning from sparse features. When we are dealing with handwritten character images, they are quite susceptible to noise. Hence, preprocessing stages to make the raw data cleaner can improve the efficacy of their use. We improve upon the efficiency of probabilistic quadtrees by using a pixel level classifier to extract the character pixels and remove noise from the images. The pixel level denoiser uses a pretrained CNN trained on a large image dataset and uses transfer learning to aid the reconstruction of characters. In this work, we primarily deal with classification of noisy characters and create the noisy versions of handwritten Bangla Numeral and Basic Character datasets and use them and the Noisy MNIST dataset to demonstrate the usefulness of our approach.
Document Availability at the Time of Submission
Secure the entire work for patent and/or proprietary purposes for a period of one year. Student has submitted appropriate documentation which states: During this period the copyright owner also agrees not to exercise her/his ownership rights, including public use in works, without prior authorization from LSU. At the end of the one year period, either we or LSU may request an automatic extension for one additional year. At the end of the one year secure period (or its extension, if such is requested), the work will be released for access worldwide.
Karki, Manohar, "Symbolic and Deep Learning Based Data Representation Methods for Activity Recognition and Image Understanding at Pixel Level" (2017). LSU Doctoral Dissertations. 4298.