Author: "Klabjan, Diego" - Searchworks@Jio Institute Digital Library Search Results

Your search keyword '"Klabjan, Diego"' showing total 618 results

Start Over Author "Klabjan, Diego"

618 results on '"Klabjan, Diego"'

51. The Impact of the Mini-batch Size on the Variance of Gradients in Stochastic Gradient Descent

Author: Qian, Xin and Klabjan, Diego
Subjects: Mathematics - Optimization and Control, Computer Science - Machine Learning
Abstract: The mini-batch stochastic gradient descent (SGD) algorithm is widely used in training machine learning models, in particular deep learning models. We study SGD dynamics under linear regression and two-layer linear networks, with an easy extension to deeper linear networks, by focusing on the variance of the gradients, which is the first study of this nature. In the linear regression case, we show that in each iteration the norm of the gradient is a decreasing function of the mini-batch size $b$ and thus the variance of the stochastic gradient estimator is a decreasing function of $b$. For deep neural networks with $L_2$ loss we show that the variance of the gradient is a polynomial in $1/b$. The results back the important intuition that smaller batch sizes yield lower loss function values which is a common believe among the researchers. The proof techniques exhibit a relationship between stochastic gradient estimators and initial weights, which is useful for further research on the dynamics of SGD. We empirically provide further insights to our results on various datasets and commonly used deep network structures.
Published: 2020

52. Keyword-based Topic Modeling and Keyword Selection

Author: Wang, Xingyu, Zhang, Lida, and Klabjan, Diego
Subjects: Statistics - Machine Learning, Computer Science - Information Retrieval, Computer Science - Machine Learning
Abstract: Certain type of documents such as tweets are collected by specifying a set of keywords. As topics of interest change with time it is beneficial to adjust keywords dynamically. The challenge is that these need to be specified ahead of knowing the forthcoming documents and the underlying topics. The future topics should mimic past topics of interest yet there should be some novelty in them. We develop a keyword-based topic model that dynamically selects a subset of keywords to be used to collect future documents. The generative process first selects keywords and then the underlying documents based on the specified keywords. The model is trained by using a variational lower bound and stochastic gradient optimization. The inference consists of finding a subset of keywords where given a subset the model predicts the underlying topic-word matrix for the unknown forthcoming documents. We compare the keyword topic model against a benchmark model using viral predictions of tweets combined with a topic model. The keyword-based topic model outperforms this sophisticated baseline model by 67%.
Published: 2020

53. Listwise Learning to Rank by Exploring Unique Ratings

Author: Zhu, Xiaofeng and Klabjan, Diego
Subjects: Computer Science - Information Retrieval, Computer Science - Machine Learning, Statistics - Machine Learning
Abstract: In this paper, we propose new listwise learning-to-rank models that mitigate the shortcomings of existing ones. Existing listwise learning-to-rank models are generally derived from the classical Plackett-Luce model, which has three major limitations. (1) Its permutation probabilities overlook ties, i.e., a situation when more than one document has the same rating with respect to a query. This can lead to imprecise permutation probabilities and inefficient training because of selecting documents one by one. (2) It does not favor documents having high relevance. (3) It has a loose assumption that sampling documents at different steps is independent. To overcome the first two limitations, we model ranking as selecting documents from a candidate set based on unique rating levels in decreasing order. The number of steps in training is determined by the number of unique rating levels. We propose a new loss function and associated four models for the entire sequence of weighted classification tasks by assigning high weights to the selected documents with high ratings for optimizing Normalized Discounted Cumulative Gain (NDCG). To overcome the final limitation, we further propose a novel and efficient way of refining prediction scores by combining an adapted Vanilla Recurrent Neural Network (RNN) model with pooling given selected documents at previous steps. We encode all of the documents already selected by an RNN model. In a single step, we rank all of the documents with the same ratings using the last cell of the RNN multiple times. We have implemented our models using three settings: neural networks, neural networks with gradient boosting, and regression trees with gradient boosting. We have conducted experiments on four public datasets. The experiments demonstrate that the models notably outperform state-of-the-art learning-to-rank models.
Published: 2020
Full Text: View/download PDF

54. Conditional Hierarchical Bayesian Tucker Decomposition for Genetic Data Analysis

Author: Sandler, Adam, Klabjan, Diego, and Luo, Yuan
Subjects: Computer Science - Machine Learning, Statistics - Methodology, Statistics - Machine Learning
Abstract: We analyze large, multi-dimensional, sparse counting data sets, finding unsupervised groups to provide unique insights into genetic data. We create gene and biological pathway groups based on patients' variants to find common risk factors for four common types of cancer (breast, lung, prostate, and colorectal) and autism spectrum disorder. To accomplish this, we extend latent Dirichlet allocation to multiple dimensions and design distinct methods for hierarchical topic modeling. We find that our conditional hierarchical Bayesian Tucker decomposition models are more coherent than baseline models., Comment: 42 pages, 10 figures, 5 tables
Published: 2019

55. Mixture-based Multiple Imputation Model for Clinical Data with a Temporal Dimension

Author: Xue, Ye, Klabjan, Diego, and Luo, Yuan
Subjects: Computer Science - Machine Learning, Statistics - Machine Learning
Abstract: The problem of missing values in multivariable time series is a key challenge in many applications such as clinical data mining. Although many imputation methods show their effectiveness in many applications, few of them are designed to accommodate clinical multivariable time series. In this work, we propose a multiple imputation model that capture both cross-sectional information and temporal correlations. We integrate Gaussian processes with mixture models and introduce individualized mixing weights to handle the variance of predictive confidence of Gaussian process models. The proposed model is compared with several state-of-the-art imputation algorithms on both real-world and synthetic datasets. Experiments show that our best model can provide more accurate imputation than the benchmarks on all of our datasets.
Published: 2019
Full Text: View/download PDF

56. Data Extraction from Charts via Single Deep Neural Network

Author: Liu, Xiaoyi, Klabjan, Diego, and NBless, Patrick
Subjects: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Computation and Language, Computer Science - Machine Learning
Abstract: Automatic data extraction from charts is challenging for two reasons: there exist many relations among objects in a chart, which is not a common consideration in general computer vision problems; and different types of charts may not be processed by the same model. To address these problems, we propose a framework of a single deep neural network, which consists of object detection, text recognition and object matching modules. The framework handles both bar and pie charts, and it may also be extended to other types of charts by slight revisions and by augmenting the training data. Our model performs successfully on 79.4% of test simulated bar charts and 88.0% of test simulated pie charts, while for charts outside of the training domain it degrades for 57.5% and 62.3%, respectively.
Published: 2019

57. Dynamic Cell Structure via Recursive-Recurrent Neural Networks

Author: Qian, Xin, Kennedy, Matthew, and Klabjan, Diego
Subjects: Computer Science - Machine Learning, Computer Science - Neural and Evolutionary Computing, Statistics - Machine Learning
Abstract: In a recurrent setting, conventional approaches to neural architecture search find and fix a general model for all data samples and time steps. We propose a novel algorithm that can dynamically search for the structure of cells in a recurrent neural network model. Based on a combination of recurrent and recursive neural networks, our algorithm is able to construct customized cell structures for each data sample and time step, allowing for a more efficient architecture search than existing models. Experiments on three common datasets show that the algorithm discovers high-performance cell architectures and achieves better prediction accuracy compared to the GRU structure for language modelling and sentiment analysis.
Published: 2019

58. Scale Invariant Power Iteration

Author: Kim, Cheolmin, Kim, Youngseok, and Klabjan, Diego
Subjects: Mathematics - Optimization and Control, Computer Science - Machine Learning, Statistics - Machine Learning
Abstract: Power iteration has been generalized to solve many interesting problems in machine learning and statistics. Despite its striking success, theoretical understanding of when and how such an algorithm enjoys good convergence property is limited. In this work, we introduce a new class of optimization problems called scale invariant problems and prove that they can be efficiently solved by scale invariant power iteration (SCI-PI) with a generalized convergence guarantee of power iteration. By deriving that a stationary point is an eigenvector of the Hessian evaluated at the point, we show that scale invariant problems indeed resemble the leading eigenvector problem near a local optimum. Also, based on a novel reformulation, we geometrically derive SCI-PI which has a general form of power iteration. The convergence analysis shows that SCI-PI attains local linear convergence with a rate being proportional to the top two eigenvalues of the Hessian at the optimum. Moreover, we discuss some extended settings of scale invariant problems and provide similar convergence results for them. In numerical experiments, we introduce applications to independent component analysis, Gaussian mixtures, and non-negative matrix factorization. Experimental results demonstrate that SCI-PI is competitive to state-of-the-art benchmark algorithms and often yield better solutions.
Published: 2019

59. Convergence Analyses of Online ADAM Algorithm in Convex Setting and Two-Layer ReLU Neural Network

Author: Fang, Biyi and Klabjan, Diego
Subjects: Computer Science - Machine Learning, Computer Science - Data Structures and Algorithms, Statistics - Machine Learning
Abstract: Nowadays, online learning is an appealing learning paradigm, which is of great interest in practice due to the recent emergence of large scale applications such as online advertising placement and online web ranking. Standard online learning assumes a finite number of samples while in practice data is streamed infinitely. In such a setting gradient descent with a diminishing learning rate does not work. We first introduce regret with rolling window, a new performance metric for online streaming learning, which measures the performance of an algorithm on every fixed number of contiguous samples. At the same time, we propose a family of algorithms based on gradient descent with a constant or adaptive learning rate and provide very technical analyses establishing regret bound properties of the algorithms. We cover the convex setting showing the regret of the order of the square root of the size of the window in the constant and dynamic learning rate scenarios. Our proof is applicable also to the standard online setting where we provide the first analysis of the same regret order (the previous proofs have flaws). We also study a two layer neural network setting with ReLU activation. In this case we establish that if initial weights are close to a stationary point, the same square root regret bound is attainable. We conduct computational experiments demonstrating a superior performance of the proposed algorithms.
Published: 2019

60. Automatic Ontology Learning from Domain-Specific Short Unstructured Text Data

Author: Xu, Yiming, Rajpathak, Dnyanesh, Gibbs, Ian, and Klabjan, Diego
Subjects: Computer Science - Information Retrieval, Computer Science - Machine Learning, Statistics - Machine Learning
Abstract: Ontology learning is a critical task in industry, dealing with identifying and extracting concepts captured in text data such that these concepts can be used in different tasks, e.g. information retrieval. Ontology learning is non-trivial due to several reasons with limited amount of prior research work that automatically learns a domain specific ontology from data. In our work, we propose a two-stage classification system to automatically learn an ontology from unstructured text data. We first collect candidate concepts, which are classified into concepts and irrelevant collocates by our first classifier. The concepts from the first classifier are further classified by the second classifier into different concept types. The proposed system is deployed as a prototype at a company and its performance is validated by using complaint and repair verbatim data collected in automotive industry from different data sources.
Published: 2019

61. Stochastic Variance-Reduced Heavy Ball Power Iteration

Author: Kim, Cheolmin and Klabjan, Diego
Subjects: Mathematics - Optimization and Control
Abstract: We present a stochastic variance-reduced heavy ball power iteration algorithm for solving PCA and provide a convergence analysis for it. The algorithm is an extension of heavy ball power iteration, incorporating a step size so that progress can be controlled depending on the magnitude of the variance of stochastic gradients. The algorithm works with any size of the mini-batch, and if the step size is appropriately chosen, it attains global linear convergence to the first eigenvector of the covariance matrix in expectation. The global linear convergence result in expectation is analogous to those of stochastic variance-reduced gradient methods for convex optimization but due to non-convexity of PCA, it has never been shown for previous stochastic variants of power iteration since it requires very different techniques. We provide the first such analysis and stress that our framework can be used to establish convergence of the previous stochastic algorithms for any initial vector and in expectation. Experimental results show that the algorithm attains acceleration in a large batch regime, outperforming benchmark algorithms especially when the eigen-gap is small.
Published: 2019

62. Autoencoders and Generative Adversarial Networks for Imbalanced Sequence Classification

Author: Ger, Stephanie, Jambunath, Yegna Subramanian, and Klabjan, Diego
Subjects: Computer Science - Machine Learning, Statistics - Machine Learning
Abstract: Generative Adversarial Networks (GANs) have been used in many different applications to generate realistic synthetic data. We introduce a novel GAN with Autoencoder (GAN-AE) architecture to generate synthetic samples for variable length, multi-feature sequence datasets. In this model, we develop a GAN architecture with an additional autoencoder component, where recurrent neural networks (RNNs) are used for each component of the model in order to generate synthetic data to improve classification accuracy for a highly imbalanced medical device dataset. In addition to the medical device dataset, we also evaluate the GAN-AE performance on two additional datasets and demonstrate the application of GAN-AE to a sequence-to-sequence task where both synthetic sequence inputs and sequence outputs must be generated. To evaluate the quality of the synthetic data, we train encoder-decoder models both with and without the synthetic data and compare the classification model performance. We show that a model trained with GAN-AE generated synthetic data outperforms models trained with synthetic data generated both with standard oversampling techniques such as SMOTE and Autoencoders as well as with state of the art GAN-based models.
Published: 2019

63. Layer Flexible Adaptive Computational Time

Author: Zhang, Lida, Ebrahimi, Abdolghani, and Klabjan, Diego
Subjects: Computer Science - Machine Learning, Statistics - Machine Learning, I.2.6
Abstract: Deep recurrent neural networks perform well on sequence data and are the model of choice. However, it is a daunting task to decide the structure of the networks, i.e. the number of layers, especially considering different computational needs of a sequence. We propose a layer flexible recurrent neural network with adaptive computation time, and expand it to a sequence to sequence model. Different from the adaptive computation time model, our model has a dynamic number of transmission states which vary by step and sequence. We evaluate the model on a financial data set and Wikipedia language modeling. Experimental results show the performance improvement of 7\% to 12\% and indicate the model's ability to dynamically change the number of layers along with the computational steps., Comment: 11 pages, 5 figures
Published: 2018

64. An inverse classification framework with limited budget and maximum number of perturbed samples

Author: Koo, Jaehoon, Klabjan, Diego, and Utke, Jean
Published: 2023
Full Text: View/download PDF

65. Open-set recognition of breast cancer treatments

Author: Cao, Alexander, Klabjan, Diego, and Luo, Yuan
Published: 2023
Full Text: View/download PDF

66. Combined convolutional and recurrent neural networks for hierarchical classification of images

Author: Koo, Jaehoon, Klabjan, Diego, and Utke, Jean
Subjects: Computer Science - Machine Learning, Statistics - Machine Learning
Abstract: Deep learning models based on CNNs are predominantly used in image classification tasks. Such approaches, assuming independence of object categories, normally use a CNN as a feature learner and apply a flat classifier on top of it. Object classes in many settings have hierarchical relations, and classifiers exploiting these relations should perform better. We propose hierarchical classification models combining a CNN to extract hierarchical representations of images, and an RNN or sequence-to-sequence model to capture a hierarchical tree of classes. In addition, we apply residual learning to the RNN part in oder to facilitate training our compound model and improve generalization of the model. Experimental results on a real world proprietary dataset of images show that our hierarchical networks perform better than state-of-the-art CNNs.
Published: 2018

67. Unified recurrent neural network for many feature types

Author: Stec, Alexander, Klabjan, Diego, and Utke, Jean
Subjects: Statistics - Machine Learning, Computer Science - Machine Learning
Abstract: There are time series that are amenable to recurrent neural network (RNN) solutions when treated as sequences, but some series, e.g. asynchronous time series, provide a richer variation of feature types than current RNN cells take into account. In order to address such situations, we introduce a unified RNN that handles five different feature types, each in a different manner. Our RNN framework separates sequential features into two groups dependent on their frequency, which we call sparse and dense features, and which affect cell updates differently. Further, we also incorporate time features at the sequential level that relate to the time between specified events in the sequence and are used to modify the cell's memory state. We also include two types of static (whole sequence level) features, one related to time and one not, which are combined with the encoder output. The experiments show that the modeling framework proposed does increase performance compared to standard cells.
Published: 2018

68. Nested multi-instance classification

Author: Stec, Alexander, Klabjan, Diego, and Utke, Jean
Subjects: Statistics - Machine Learning, Computer Science - Machine Learning
Abstract: There are classification tasks that take as inputs groups of images rather than single images. In order to address such situations, we introduce a nested multi-instance deep network. The approach is generic in that it is applicable to general data instances, not just images. The network has several convolutional neural networks grouped together at different stages. This primarily differs from other previous works in that we organize instances into relevant groups that are treated differently. We also introduce a method to replace instances that are missing which successfully creates neutral input instances and consistently outperforms standard fill-in methods in real world use cases. In addition, we propose a method for manual dropout when a whole group of instances is missing that allows us to use richer training data and obtain higher accuracy at the end of training. With specific pretraining, we find that the model works to great effect on our real world and pub-lic datasets in comparison to baseline methods, justifying the different treatment among groups of instances.
Published: 2018

69. Dynamic Prediction Length for Time Series with Sequence to Sequence Networks

Author: Harmon, Mark and Klabjan, Diego
Subjects: Computer Science - Machine Learning, Computer Science - Artificial Intelligence, Statistics - Machine Learning
Abstract: Recurrent neural networks and sequence to sequence models require a predetermined length for prediction output length. Our model addresses this by allowing the network to predict a variable length output in inference. A new loss function with a tailored gradient computation is developed that trades off prediction accuracy and output length. The model utilizes a function to determine whether a particular output at a time should be evaluated or not given a predetermined threshold. We evaluate the model on the problem of predicting the prices of securities. We find that the model makes longer predictions for more stable securities and it naturally balances prediction accuracy and length.
Published: 2018

70. Forecasting Crime with Deep Learning

Author: Stec, Alexander and Klabjan, Diego
Subjects: Statistics - Machine Learning, Computer Science - Learning
Abstract: The objective of this work is to take advantage of deep neural networks in order to make next day crime count predictions in a fine-grain city partition. We make predictions using Chicago and Portland crime data, which is augmented with additional datasets covering weather, census data, and public transportation. The crime counts are broken into 10 bins and our model predicts the most likely bin for a each spatial region at a daily level. We train this data using increasingly complex neural network structures, including variations that are suited to the spatial and temporal aspects of the crime prediction problem. With our best model we are able to predict the correct bin for overall crime count with 75.6% and 65.3% accuracy for Chicago and Portland, respectively. The results show the efficacy of neural networks for the prediction problem and the value of using external datasets in addition to standard crime data.
Published: 2018

71. Bayesian active learning for choice models with deep Gaussian processes

Author: Yang, Jie and Klabjan, Diego
Subjects: Statistics - Machine Learning, Computer Science - Learning
Abstract: In this paper, we propose an active learning algorithm and models which can gradually learn individual's preference through pairwise comparisons. The active learning scheme aims at finding individual's most preferred choice with minimized number of pairwise comparisons. The pairwise comparisons are encoded into probabilistic models based on assumptions of choice models and deep Gaussian processes. The next-to-compare decision is determined by a novel acquisition function. We benchmark the proposed algorithm and models using functions with multiple local optima and one public airline itinerary dataset. The experiments indicate the effectiveness of our active learning algorithm and models.
Published: 2018

72. k-Nearest Neighbors by Means of Sequence to Sequence Deep Neural Networks and Memory Networks

Author: Xu, Yiming and Klabjan, Diego
Subjects: Computer Science - Machine Learning, Computer Science - Artificial Intelligence, Statistics - Machine Learning
Abstract: k-Nearest Neighbors is one of the most fundamental but effective classification models. In this paper, we propose two families of models built on a sequence to sequence model and a memory network model to mimic the k-Nearest Neighbors model, which generate a sequence of labels, a sequence of out-of-sample feature vectors and a final label for classification, and thus they could also function as oversamplers. We also propose 'out-of-core' versions of our models which assume that only a small portion of data can be loaded into memory. Computational experiments show that our models on structured datasets outperform k-Nearest Neighbors, a feed-forward neural network, XGBoost, lightGBM, random forest and a memory network, due to the fact that our models must produce additional output and not just the label. On image and text datasets, the performance of our model is close to many state-of-the-art deep models. As an oversampler on imbalanced datasets, the sequence to sequence kNN model often outperforms Synthetic Minority Over-sampling Technique and Adaptive Synthetic Sampling.
Published: 2018

73. Improved Classification Based on Deep Belief Networks

Author: Koo, Jaehoon and Klabjan, Diego
Subjects: Statistics - Machine Learning, Computer Science - Machine Learning
Abstract: For better classification generative models are used to initialize the model and model features before training a classifier. Typically it is needed to solve separate unsupervised and supervised learning problems. Generative restricted Boltzmann machines and deep belief networks are widely used for unsupervised learning. We developed several supervised models based on DBN in order to improve this two-phase strategy. Modifying the loss function to account for expectation with respect to the underlying generative model, introducing weight bounds, and multi-level programming are applied in model development. The proposed models capture both unsupervised and supervised objectives effectively. The computational study verifies that our models perform better than the two-phase training approach.
Published: 2018

74. A Stochastic Large-scale Machine Learning Algorithm for Distributed Features and Observations

Author: Fang, Biyi and Klabjan, Diego
Subjects: Statistics - Machine Learning, Computer Science - Machine Learning
Abstract: As the size of modern data sets exceeds the disk and memory capacities of a single computer, machine learning practitioners have resorted to parallel and distributed computing. Given that optimization is one of the pillars of machine learning and predictive modeling, distributed optimization methods have recently garnered ample attention, in particular when either observations or features are distributed, but not both. We propose a general stochastic algorithm where observations, features, and gradient components can be sampled in a double distributed setting, i.e., with both features and observations distributed. Very technical analyses establish convergence properties of the algorithm under different conditions on the learning rate (diminishing to zero or constant). Computational experiments in Spark demonstrate a superior performance of our algorithm versus a benchmark in early iterations of the algorithm, which is due to the stochastic components of the algorithm., Comment: 11 figures, 41 pages
Published: 2018

75. Truth Validation with Evidence

Author: Wongchaisuwat, Papis and Klabjan, Diego
Subjects: Computer Science - Artificial Intelligence, Statistics - Machine Learning
Abstract: In the modern era, abundant information is easily accessible from various sources, however only a few of these sources are reliable as they mostly contain unverified contents. We develop a system to validate the truthfulness of a given statement together with underlying evidence. The proposed system provides supporting evidence when the statement is tagged as false. Our work relies on an inference method on a knowledge graph (KG) to identify the truthfulness of statements. In order to extract the evidence of falseness, the proposed algorithm takes into account combined knowledge from KG and ontologies. The system shows very good results as it provides valid and concise evidence. The quality of KG plays a role in the performance of the inference method which explicitly affects the performance of our evidence-extracting algorithm., Comment: 40 pages (including Appendix), 3 tables, 3 figures
Published: 2018

76. Competitive Multi-agent Inverse Reinforcement Learning with Sub-optimal Demonstrations

Author: Wang, Xingyu and Klabjan, Diego
Subjects: Statistics - Machine Learning, Computer Science - Learning
Abstract: This paper considers the problem of inverse reinforcement learning in zero-sum stochastic games when expert demonstrations are known to be not optimal. Compared to previous works that decouple agents in the game by assuming optimality in expert strategies, we introduce a new objective function that directly pits experts against Nash Equilibrium strategies, and we design an algorithm to solve for the reward function in the context of inverse reinforcement learning with deep neural networks as model approximations. In our setting the model and algorithm do not decouple by agent. In order to find Nash Equilibrium in large-scale games, we also propose an adversarial training algorithm for zero-sum stochastic games, and show the theoretical appeal of non-existence of local optima in its objective function. In our numerical experiments, we demonstrate that our Nash Equilibrium and inverse reinforcement learning algorithms address games that are not amenable to previous approaches using tabular representations. Moreover, with sub-optimal expert demonstrations our algorithms recover both reward functions and strategies with good quality., Comment: 31 pages, to be presented at ICML 2018
Published: 2018

77. Generative Adversarial Nets for Multiple Text Corpora

Author: Wang, Baiyang and Klabjan, Diego
Subjects: Computer Science - Computation and Language
Abstract: Generative adversarial nets (GANs) have been successfully applied to the artificial generation of image data. In terms of text data, much has been done on the artificial generation of natural language from a single corpus. We consider multiple text corpora as the input data, for which there can be two applications of GANs: (1) the creation of consistent cross-corpus word embeddings given different word embeddings per corpus; (2) the generation of robust bag-of-words document embeddings for each corpora. We demonstrate our GAN models on real-world text data sets from different corpora, and show that embeddings from both models lead to improvements in supervised learning problems.
Published: 2017

78. A mixed methods analysis of caller-emergency medical dispatcher communication during 9–1–1 calls for out-of-hospital cardiac arrest

Author: Richards, Christopher T., McCarthy, Danielle M., Markul, Eddie, Rottman, Doreen R., Lindeman, Patricia, Prabhakaran, Shyam, Klabjan, Diego, Holl, Jane L., and Cameron, Kenzie A.
Published: 2022
Full Text: View/download PDF

79. Truth validation with evidence

Author: Wongchaisuwat, Papis and Klabjan, Diego
Published: 2022
Full Text: View/download PDF

80. OSTSC: Over Sampling for Time Series Classification in R

Author: Dixon, Matthew, Klabjan, Diego, and Wei, Lan
Subjects: Statistics - Computation, Statistics - Machine Learning
Abstract: The OSTSC package is a powerful oversampling approach for classifying univariant, but multinomial time series data in R. This article provides a brief overview of the oversampling methodology implemented by the package. A tutorial of the OSTSC package is provided. We begin by providing three test cases for the user to quickly validate the functionality in the package. To demonstrate the performance impact of OSTSC, we then provide two medium size imbalanced time series datasets. Each example applies a TensorFlow implementation of a Long Short-Term Memory (LSTM) classifier - a type of a Recurrent Neural Network (RNN) classifier - to imbalanced time series. The classifier performance is compared with and without oversampling. Finally, larger versions of these two datasets are evaluated to demonstrate the scalability of the package. The examples demonstrate that the OSTSC package improves the performance of RNN classifiers applied to highly imbalanced time series data. In particular, OSTSC is observed to increase the AUC of LSTM from 0.543 to 0.784 on a high frequency trading dataset consisting of 30,000 time series observations.
Published: 2017

81. A Simple and Fast Algorithm for L1-norm Kernel PCA

Author: Kim, Cheolmin and Klabjan, Diego
Subjects: Statistics - Machine Learning, Computer Science - Machine Learning
Abstract: We present an algorithm for L1-norm kernel PCA and provide a convergence analysis for it. While an optimal solution of L2-norm kernel PCA can be obtained through matrix decomposition, finding that of L1-norm kernel PCA is not trivial due to its non-convexity and non-smoothness. We provide a novel reformulation through which an equivalent, geometrically interpretable problem is obtained. Based on the geometric interpretation of the reformulated problem, we present a fixed-point type algorithm that iteratively computes a binary weight for each observation. As the algorithm requires only inner products of data vectors, it is computationally efficient and the kernel trick is applicable. In the convergence analysis, we show that the algorithm converges to a local optimal solution in a finite number of steps. Moreover, we provide a rate of convergence analysis, which has been never done for any L1-norm PCA algorithm, proving that the sequence of objective values converges at a linear rate. In numerical experiments, we show that the algorithm is robust in the presence of entry-wise perturbations and computationally scalable, especially in a large-scale setting. Lastly, we introduce an application to outlier detection where the model based on the proposed algorithm outperforms the benchmark algorithms., Comment: 14 pages, 7 figures
Published: 2017
Full Text: View/download PDF

82. Semantic Document Distance Measures and Unsupervised Document Revision Detection

Author: Zhu, Xiaofeng, Klabjan, Diego, and Bless, Patrick
Subjects: Computer Science - Information Retrieval, Computer Science - Computation and Language
Abstract: In this paper, we model the document revision detection problem as a minimum cost branching problem that relies on computing document distances. Furthermore, we propose two new document distance measures, word vector-based Dynamic Time Warping (wDTW) and word vector-based Tree Edit Distance (wTED). Our revision detection system is designed for a large scale corpus and implemented in Apache Spark. We demonstrate that our system can more precisely detect revisions than state-of-the-art methods by utilizing the Wikipedia revision dumps https://snap.stanford.edu/data/wiki-meta.html and simulated data sets.
Published: 2017

83. Unsupervised Terminological Ontology Learning based on Hierarchical Topic Modeling

Author: Zhu, Xiaofeng, Klabjan, Diego, and Bless, Patrick
Subjects: Computer Science - Computation and Language, Computer Science - Information Retrieval, Computer Science - Machine Learning
Abstract: In this paper, we present hierarchical relationbased latent Dirichlet allocation (hrLDA), a data-driven hierarchical topic model for extracting terminological ontologies from a large number of heterogeneous documents. In contrast to traditional topic models, hrLDA relies on noun phrases instead of unigrams, considers syntax and document structures, and enriches topic hierarchies with topic relations. Through a series of experiments, we demonstrate the superiority of hrLDA over existing topic models, especially for building hierarchies. Furthermore, we illustrate the robustness of hrLDA in the settings of noisy data sets, which are likely to occur in many practical scenarios. Our ontology evaluation results show that ontologies extracted from hrLDA are very competitive with the ontologies created by domain experts.
Published: 2017
Full Text: View/download PDF

84. Online Adaptive Machine Learning Based Algorithm for Implied Volatility Surface Modeling

Author: Zeng, Yaxiong and Klabjan, Diego
Subjects: Statistics - Machine Learning, Computer Science - Learning, Quantitative Finance - Computational Finance
Abstract: In this work, we design a machine learning based method, online adaptive primal support vector regression (SVR), to model the implied volatility surface (IVS). The algorithm proposed is the first derivation and implementation of an online primal kernel SVR. It features enhancements that allow efficient online adaptive learning by embedding the idea of local fitness and budget maintenance to dynamically update support vectors upon pattern drifts. For algorithm acceleration, we implement its most computationally intensive parts in a Field Programmable Gate Arrays hardware, where a 132x speedup over CPU is achieved during online prediction. Using intraday tick data from the E-mini S&P 500 options market, we show that the Gaussian kernel outperforms the linear kernel in regulating the size of support vectors, and that our empirical IVS algorithm beats two competing online methods with regards to model complexity and regression errors (the mean absolute percentage error of our algorithm is up to 13%). Best results are obtained at the center of the IVS grid due to its larger number of adjacent support vectors than the edges of the grid. Sensitivity analysis is also presented to demonstrate how hyper parameters affect the error rates and model complexity., Comment: 34 Pages
Published: 2017

85. Improving the Expected Improvement Algorithm

Author: Qin, Chao, Klabjan, Diego, and Russo, Daniel
Subjects: Computer Science - Learning, Statistics - Machine Learning
Abstract: The expected improvement (EI) algorithm is a popular strategy for information collection in optimization under uncertainty. The algorithm is widely known to be too greedy, but nevertheless enjoys wide use due to its simplicity and ability to handle uncertainty and noise in a coherent decision theoretic framework. To provide rigorous insight into EI, we study its properties in a simple setting of Bayesian optimization where the domain consists of a finite grid of points. This is the so-called best-arm identification problem, where the goal is to allocate measurement effort wisely to confidently identify the best arm using a small number of measurements. In this framework, one can show formally that EI is far from optimal. To overcome this shortcoming, we introduce a simple modification of the expected improvement algorithm. Surprisingly, this simple change results in an algorithm that is asymptotically optimal for Gaussian best-arm identification problems, and provably outperforms standard EI by an order of magnitude., Comment: Submitted to NIPS 2017
Published: 2017

86. Diminishing Batch Normalization

Author: Ma, Yintai and Klabjan, Diego
Subjects: Computer Science - Machine Learning
Abstract: In this paper, we propose a generalization of the Batch Normalization (BN) algorithm, diminishing batch normalization (DBN), where we update the BN parameters in a diminishing moving average way. BN is very effective in accelerating the convergence of a neural network training phase that it has become a common practice. Our proposed DBN algorithm remains the overall structure of the original BN algorithm while introduces a weighted averaging update to some trainable parameters. We provide an analysis of the convergence of the DBN algorithm that converges to a stationary point with respect to trainable parameters. Our analysis can be easily generalized for original BN algorithm by setting some parameters to constant. To the best knowledge of authors, this analysis is the first of its kind for convergence with Batch Normalization introduced. We analyze a two-layer model with arbitrary activation function. The primary challenge of the analysis is the fact that some parameters are updated by gradient while others are not. The convergence analysis applies to any activation function that satisfies our common assumptions. In the numerical experiments, we test the proposed algorithm on complex modern CNN models with stochastic gradients and ReLU activation. We observe that DBN outperforms the original BN algorithm on MNIST, NI and CIFAR-10 datasets with reasonable complex FNN and CNN models.
Published: 2017

87. Activation Ensembles for Deep Neural Networks

Author: Harmon, Mark and Klabjan, Diego
Subjects: Statistics - Machine Learning, Computer Science - Learning
Abstract: Many activation functions have been proposed in the past, but selecting an adequate one requires trial and error. We propose a new methodology of designing activation functions within a neural network at each layer. We call this technique an "activation ensemble" because it allows the use of multiple activation functions at each layer. This is done by introducing additional variables, $\alpha$, at each activation layer of a network to allow for multiple activation functions to be active at each neuron. By design, activations with larger $\alpha$ values at a neuron is equivalent to having the largest magnitude. Hence, those higher magnitude activations are "chosen" by the network. We implement the activation ensembles on a variety of datasets using an array of Feed Forward and Convolutional Neural Networks. By using the activation ensemble, we achieve superior results compared to traditional techniques. In addition, because of the flexibility of this methodology, we more deeply explore activation functions and the features that they capture.
Published: 2017

88. An Attention-Based Deep Net for Learning to Rank

Author: Wang, Baiyang and Klabjan, Diego
Subjects: Computer Science - Learning
Abstract: In information retrieval, learning to rank constructs a machine-based ranking model which given a query, sorts the search results by their degree of relevance or importance to the query. Neural networks have been successfully applied to this problem, and in this paper, we propose an attention-based deep neural network which better incorporates different embeddings of the queries and search results with an attention-based mechanism. This model also applies a decoder mechanism to learn the ranks of the search results in a listwise fashion. The embeddings are trained with convolutional neural networks or the word2vec model. We demonstrate the performance of this model with image retrieval and text querying data sets.
Published: 2017

89. Semi-supervised Learning for Discrete Choice Models

Author: Yang, Jie, Shebalov, Sergey, and Klabjan, Diego
Subjects: Statistics - Machine Learning, Computer Science - Learning
Abstract: We introduce a semi-supervised discrete choice model to calibrate discrete choice models when relatively few requests have both choice sets and stated preferences but the majority only have the choice sets. Two classic semi-supervised learning algorithms, the expectation maximization algorithm and the cluster-and-label algorithm, have been adapted to our choice modeling problem setting. We also develop two new algorithms based on the cluster-and-label algorithm. The new algorithms use the Bayesian Information Criterion to evaluate a clustering setting to automatically adjust the number of clusters. Two computational studies including a hotel booking case and a large-scale airline itinerary shopping case are presented to evaluate the prediction accuracy and computational effort of the proposed algorithms. Algorithmic recommendations are rendered under various scenarios.
Published: 2017

90. Subset Selection for Multiple Linear Regression via Optimization

Author: Park, Young Woong and Klabjan, Diego
Subjects: Statistics - Machine Learning
Abstract: Subset selection in multiple linear regression aims to choose a subset of candidate explanatory variables that tradeoff fitting error (explanatory power) and model complexity (number of variables selected). We build mathematical programming models for regression subset selection based on mean square and absolute errors, and minimal-redundancy-maximal-relevance criteria. The proposed models are tested using a linear-program-based branch-and-bound algorithm with tailored valid inequalities and big M values and are compared against the algorithms in the literature. For high dimensional cases, an iterative heuristic algorithm is proposed based on the mathematical programming models and a core set concept, and a randomized version of the algorithm is derived to guarantee convergence to the global optimum. From the computational experiments, we find that our models quickly find a quality solution while the rest of the time is spent to prove optimality; the iterative algorithms find solutions in a relatively short time and are competitive compared to state-of-the-art algorithms; using ad-hoc big M values is not recommended.
Published: 2017
Full Text: View/download PDF

91. Bayesian Network Learning via Topological Order

Author: Park, Young Woong and Klabjan, Diego
Subjects: Statistics - Machine Learning, Computer Science - Data Structures and Algorithms
Abstract: We propose a mixed integer programming (MIP) model and iterative algorithms based on topological orders to solve optimization problems with acyclic constraints on a directed graph. The proposed MIP model has a significantly lower number of constraints compared to popular MIP models based on cycle elimination constraints and triangular inequalities. The proposed iterative algorithms use gradient descent and iterative reordering approaches, respectively, for searching topological orders. A computational experiment is presented for the Gaussian Bayesian network learning problem, an optimization problem minimizing the sum of squared errors of regression models with L1 penalty over a feature network with application of gene network inference in bioinformatics.
Published: 2017

92. Retention Prediction in Sandbox Games with Bipartite Tensor Factorization

Author: Sifa, Rafet, Fedell, Michael, Franklin, Nathan, Klabjan, Diego, Ram, Shiva, Venugopal, Arpan, Demediuk, Simon, Drachen, Anders, Kacprzyk, Janusz, Series Editor, Pal, Nikhil R., Advisory Editor, Bello Perez, Rafael, Advisory Editor, Corchado, Emilio S., Advisory Editor, Hagras, Hani, Advisory Editor, Kóczy, László T., Advisory Editor, Kreinovich, Vladik, Advisory Editor, Lin, Chin-Teng, Advisory Editor, Lu, Jie, Advisory Editor, Melin, Patricia, Advisory Editor, Nedjah, Nadia, Advisory Editor, Nguyen, Ngoc Thanh, Advisory Editor, Wang, Jun, Advisory Editor, Arai, Kohei, editor, Kapoor, Supriya, editor, and Bhatia, Rahul, editor
Published: 2020
Full Text: View/download PDF

93. Improved Classification Based on Deep Belief Networks

Author: Koo, Jaehoon, Klabjan, Diego, Goos, Gerhard, Founding Editor, Hartmanis, Juris, Founding Editor, Bertino, Elisa, Editorial Board Member, Gao, Wen, Editorial Board Member, Steffen, Bernhard, Editorial Board Member, Woeginger, Gerhard, Editorial Board Member, Yung, Moti, Editorial Board Member, Farkaš, Igor, editor, Masulli, Paolo, editor, and Wermter, Stefan, editor
Published: 2020
Full Text: View/download PDF

94. Optimization for Large-Scale Machine Learning with Distributed Features and Observations

Author: Nathan, Alexandros and Klabjan, Diego
Subjects: Statistics - Machine Learning, Computer Science - Learning
Abstract: As the size of modern data sets exceeds the disk and memory capacities of a single computer, machine learning practitioners have resorted to parallel and distributed computing. Given that optimization is one of the pillars of machine learning and predictive modeling, distributed optimization methods have recently garnered ample attention in the literature. Although previous research has mostly focused on settings where either the observations, or features of the problem at hand are stored in distributed fashion, the situation where both are partitioned across the nodes of a computer cluster (doubly distributed) has barely been studied. In this work we propose two doubly distributed optimization algorithms. The first one falls under the umbrella of distributed dual coordinate ascent methods, while the second one belongs to the class of stochastic gradient/coordinate descent hybrid methods. We conduct numerical experiments in Spark using real-world and simulated data sets and study the scaling properties of our methods. Our empirical evaluation of the proposed algorithms demonstrates the out-performance of a block distributed ADMM method, which, to the best of our knowledge is the only other existing doubly distributed optimization algorithm.
Published: 2016

95. Predicting Shot Making in Basketball Learnt from Adversarial Multiagent Trajectories

Author: Harmon, Mark, Ebrahimi, Abdolghani, Lucey, Patrick, and Klabjan, Diego
Subjects: Statistics - Machine Learning, Computer Science - Machine Learning
Abstract: In this paper, we predict the likelihood of a player making a shot in basketball from multiagent trajectories. Previous approaches to similar problems center on hand-crafting features to capture domain specific knowledge. Although intuitive, recent work in deep learning has shown this approach is prone to missing important predictive features. To circumvent this issue, we present a convolutional neural network (CNN) approach where we initially represent the multiagent behavior as an image. To encode the adversarial nature of basketball, we use a multi-channel image which we then feed into a CNN. Additionally, to capture the temporal aspect of the trajectories we "fade" the player trajectories. We find that this approach is superior to a traditional FFN model. By using gradient ascent to create images using an already trained CNN, we discover what features the CNN filters learn. Last, we find that a combined CNN+FFN is the best performing network with an error rate of 39%.
Published: 2016

96. Iteratively Reweighted Least Squares Algorithms for L1-Norm Principal Component Analysis

Author: Park, Young Woong and Klabjan, Diego
Subjects: Statistics - Machine Learning, Computer Science - Machine Learning
Abstract: Principal component analysis (PCA) is often used to reduce the dimension of data by selecting a few orthonormal vectors that explain most of the variance structure of the data. L1 PCA uses the L1 norm to measure error, whereas the conventional PCA uses the L2 norm. For the L1 PCA problem minimizing the fitting error of the reconstructed data, we propose an exact reweighted and an approximate algorithm based on iteratively reweighted least squares. We provide convergence analyses, and compare their performance against benchmark algorithms in the literature. The computational experiment shows that the proposed algorithms consistently perform best.
Published: 2016
Full Text: View/download PDF

97. Regularization for Unsupervised Deep Neural Nets

Author: Wang, Baiyang and Klabjan, Diego
Subjects: Computer Science - Learning, Computer Science - Neural and Evolutionary Computing
Abstract: Unsupervised neural networks, such as restricted Boltzmann machines (RBMs) and deep belief networks (DBNs), are powerful tools for feature selection and pattern recognition tasks. We demonstrate that overfitting occurs in such models just as in deep feedforward neural networks, and discuss possible regularization methods to reduce overfitting. We also propose a "partial" approach to improve the efficiency of Dropout/DropConnect in this scenario, and discuss the theoretical justification of these methods from model convergence and likelihood bounds. Finally, we compare the performance of these methods based on their likelihood and classification error rates for various pattern recognition data sets.
Published: 2016

98. Rapid Prediction of Player Retention in Free-to-Play Mobile Games

Author: Drachen, Anders, Lundquist, Eric Thurston, Kung, Yungjen, Rao, Pranav Simha, Klabjan, Diego, Sifa, Rafet, and Runge, Julian
Subjects: Statistics - Machine Learning, Computer Science - Social and Information Networks, Statistics - Applications
Abstract: Predicting and improving player retention is crucial to the success of mobile Free-to-Play games. This paper explores the problem of rapid retention prediction in this context. Heuristic modeling approaches are introduced as a way of building simple rules for predicting short-term retention. Compared to common classification algorithms, our heuristic-based approach achieves reasonable and comparable performance using information from the first session, day, and week of player activity., Comment: Draft Submitted to AIIDE-16. 7 pages, 5 figures, 3 tables
Published: 2016

99. An Aggregate and Iterative Disaggregate Algorithm with Proven Optimality in Machine Learning

Author: Park, Young Woong and Klabjan, Diego
Subjects: Statistics - Machine Learning, Computer Science - Learning
Abstract: We propose a clustering-based iterative algorithm to solve certain optimization problems in machine learning, where we start the algorithm by aggregating the original data, solving the problem on aggregated data, and then in subsequent steps gradually disaggregate the aggregated data. We apply the algorithm to common machine learning problems such as the least absolute deviation regression problem, support vector machines, and semi-supervised support vector machines. We derive model-specific data aggregation and disaggregation procedures. We also show optimality, convergence, and the optimality gap of the approximated solution in each iteration. A computational study is provided.
Published: 2016
Full Text: View/download PDF

100. Algorithms for Generalized Cluster-wise Linear Regression

Author: Park, Young Woong, Jiang, Yan, Klabjan, Diego, and Williams, Loren
Subjects: Statistics - Machine Learning, Computer Science - Learning
Abstract: Cluster-wise linear regression (CLR), a clustering problem intertwined with regression, is to find clusters of entities such that the overall sum of squared errors from regressions performed over these clusters is minimized, where each cluster may have different variances. We generalize the CLR problem by allowing each entity to have more than one observation, and refer to it as generalized CLR. We propose an exact mathematical programming based approach relying on column generation, a column generation based heuristic algorithm that clusters predefined groups of entities, a metaheuristic genetic algorithm with adapted Lloyd's algorithm for K-means clustering, a two-stage approach, and a modified algorithm of Sp{\"a}th \cite{Spath1979} for solving generalized CLR. We examine the performance of our algorithms on a stock keeping unit (SKU) clustering problem employed in forecasting halo and cannibalization effects in promotions using real-world retail data from a large supermarket chain. In the SKU clustering problem, the retailer needs to cluster SKUs based on their seasonal effects in response to promotions. The seasonal effects are the results of regressions with predictors being promotion mechanisms and seasonal dummies performed over clusters generated. We compare the performance of all proposed algorithms for the SKU problem with real-world and synthetic data.
Published: 2016
Full Text: View/download PDF

Catalog

Books, media, physical & digital resources

See catalog results

Searchworks

Select search scope, currently: Articles Catalog books, media & more in Jio Institute collections Articles journal articles & other e-resources

Search

Search Constraints

Refine your results

Search Limiters

Topic

Publication Year Range

Language

Publication Type

Journal

Region

Database

Publisher

618 results on '"Klabjan, Diego"'

Search Results

Catalog

Select search scope, currently: Articles

Catalog

books, media & more in Jio Institute collections

Articles

journal articles & other e-resources