
Exploiting Network Science for Feature Extraction and Representation Learning

Authors :
Kartikeya Bhardwaj
Publication Year :
2019
Publisher :
Carnegie Mellon University, 2019.

Abstract

Networks are ubiquitous in many real-world problems, from modeling information diffusion over social networks and transportation systems to understanding protein-protein interactions, human mobility, and computational sustainability, among many others. Recently, due to the ongoing Big Data revolution, the fields of machine learning and Artificial Intelligence (AI) have also become extremely important, with AI largely dominated by representation learning techniques such as deep learning. However, research at the intersection of network science, machine learning, and AI remains mostly unexplored. Specifically, most prior research focuses on how machine learning techniques can be used to solve "network" problems, such as predicting information diffusion on social networks or classifying blogger interests in a blog network. In this thesis, by contrast, we answer the following key question: How can we exploit network science to improve machine learning and representation learning models when addressing general problems?

To answer this question, we address several problems at the intersection of network science, machine learning, and AI. Specifically, we address four fundamental research challenges: (i) Network Science for Traditional Machine Learning, (ii) Representation Learning for Small-Sample Datasets, (iii) Network Science-Based Deep Learning Model Compression, and (iv) Network Science for Neural Architecture Space Exploration. In other words, we show that many problems are governed by latent network dynamics that must be incorporated into the machine learning or representation learning models.

To this end, we first demonstrate how network science can be used for traditional machine learning problems such as spatiotemporal time-series prediction and application-specific feature extraction. More precisely, we propose a new framework called Network-of-Dynamic Bayesian Networks (NDBN) to address a complex probabilistic learning problem over networks with known but rapidly changing structure. We also propose a new domain-specific network inference approach for the case where the network structure is unknown and only high-dimensional data is available. We further introduce a new network science-based, application-specific feature extraction method called K-Hop Learning. As concrete case studies, we show that the NDBN framework and K-Hop Learning significantly outperform traditional machine learning techniques on computational sustainability problems, namely short-term solar energy and river flow prediction, respectively.

We then discuss how network science can be used to address general representation learning problems with high-dimensional, small-sample datasets. Here, we propose a new network community-based dimensionality reduction framework called FeatureNet. Our approach is based on a new correlation-based network construction technique that explicitly discovers hidden communities in high-dimensional raw data. We show the effectiveness of FeatureNet on many diverse small-sample problems, for which deep learning typically overfits, and demonstrate that our technique achieves significantly higher accuracy than ten state-of-the-art dimensionality reduction methods (up to 40% improvement). Since a simple correlation-based network alone cannot capture meaningful features for problems like image classification, we next turn to deep learning models such as Convolutional Neural Networks (CNNs).
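To make the FeatureNet idea just described concrete, the following is a minimal sketch, not the thesis implementation: it builds a correlation network over the raw features, detects communities in that network, and aggregates each community into a single reduced feature. The 0.5 correlation threshold, the use of greedy modularity maximization, and the mean-aggregation per community are illustrative assumptions.

```python
import numpy as np
import networkx as nx
from networkx.algorithms.community import greedy_modularity_communities

def community_reduce(X, corr_threshold=0.5):
    """Reduce X of shape (n_samples, n_features) to one feature per community."""
    corr = np.abs(np.corrcoef(X, rowvar=False))        # feature-feature |correlation|
    G = nx.Graph()
    G.add_nodes_from(range(X.shape[1]))
    rows, cols = np.where(np.triu(corr, k=1) > corr_threshold)
    G.add_edges_from(zip(rows, cols))                  # link strongly correlated features
    communities = greedy_modularity_communities(G)     # hidden feature groups
    # Aggregate each community into one reduced feature (mean of its members).
    return np.column_stack([X[:, sorted(c)].mean(axis=1) for c in communities])

# Toy small-sample example: 200 samples, 1000 raw features driven by 10 latent groups.
rng = np.random.default_rng(0)
latent = rng.standard_normal((200, 10))
X = np.repeat(latent, 100, axis=1) + 0.5 * rng.standard_normal((200, 1000))
print(community_reduce(X).shape)                       # roughly (200, 10) if groups are recovered
```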
Indeed, in the era of the Internet-of-Things (IoT), the computational cost of deep networks has become a critical challenge for deploying such models on resource-constrained edge devices. Towards this, model compression has emerged as an important area of research. However, when a computationally expensive CNN (or even a compressed model) cannot fit within the memory budget of a single IoT device, it must be distributed across multiple devices, which leads to significant inter-device communication. To alleviate this problem, we propose a new model compression framework called the Network-of-Neural Networks (NoNN), which first exploits network science to partition a large "teacher" model's knowledge into disjoint groups and then trains an individual "student" model for each group. This results in a set of student modules that satisfy the strict resource constraints of individual IoT devices. Extensive experiments on five well-known image classification tasks show that NoNN achieves accuracy similar to the teacher model and significantly outperforms the prior art. We also deploy our proposed framework on real hardware such as Raspberry Pis and Odroids to demonstrate that NoNN yields up to 12× reduction in latency and up to 14× reduction in energy per device with negligible loss of accuracy.

Finally, since deep networks are essentially networks of (artificial) neurons, network science is a natural candidate for studying their architectural characteristics. Hence, we model deep networks from a network science perspective to identify which architecture-level characteristics enable models with different numbers of parameters and layers to achieve comparable accuracy. To this end, we propose new metrics called NN-Mass and NN-Density to study the architecture design space of deep networks. We further demonstrate theoretically that (i) for a given depth and width, CNN architectures with higher NN-Mass achieve lower generalization error, and (ii) irrespective of the number of parameters and layers (but for the same width), models with similar NN-Mass yield similar test accuracy. We then present extensive empirical evidence for these two theoretical insights by conducting experiments on real image classification tasks such as CIFAR-10 and CIFAR-100. Lastly, we exploit the latter insight to directly design efficient architectures that achieve accuracy comparable to large models (~97% on the CIFAR-10 dataset) with up to 3× reduction in total parameters. This ultimately reveals how model sizes can be reduced directly from the architecture perspective.

In summary, in this thesis we address several problems at the intersection of network science, machine learning, and representation learning. Our research comprehensively demonstrates that network science can not only play a significant role but also lead to excellent results in both machine learning and representation learning.
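As a rough illustration of the NoNN training idea described above, the sketch below distills each disjoint filter partition of a teacher's final convolutional map into its own small student, so that every student only reproduces the activations belonging to its partition. This is a hedged sketch under assumptions, not the thesis code: the toy models, the MSE-based transfer loss, and the hand-picked partitions are placeholders (in practice the partitions would come from a network science-based grouping of the teacher's filters, and the student outputs would be combined for the final prediction).

```python
import torch
import torch.nn.functional as F

def nonn_distill_step(teacher, students, partitions, x, optimizers):
    """partitions[i] lists the teacher's final-conv channels owned by student i."""
    with torch.no_grad():
        t_act = teacher(x)                              # (batch, n_filters, H, W)
    for student, channels, opt in zip(students, partitions, optimizers):
        s_act = student(x)                              # student predicts only its channels
        loss = F.mse_loss(s_act, t_act[:, channels])    # match its slice of teacher knowledge
        opt.zero_grad()
        loss.backward()
        opt.step()

# Toy usage with placeholder conv nets (illustrative only).
teacher = torch.nn.Conv2d(3, 8, 3, padding=1)           # stand-in for a teacher's final conv layer
partitions = [[0, 1, 2, 3], [4, 5, 6, 7]]                # two disjoint filter groups
students = [torch.nn.Conv2d(3, len(p), 3, padding=1) for p in partitions]
optimizers = [torch.optim.SGD(s.parameters(), lr=0.01) for s in students]
nonn_distill_step(teacher, students, partitions, torch.randn(4, 3, 32, 32), optimizers)
```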

Details

Database :
OpenAIRE
Accession number :
edsair.doi.dedup.....386884af31fcfdb7cd853e7f12b40327
Full Text :
https://doi.org/10.1184/r1/9907484.v1