119 results for "Chen Rong"
Search Results
2. A New Theoretical Perspective on Data Heterogeneity in Federated Optimization
- Author
-
Wang, Jiayi, Wang, Shiqiang, Chen, Rong-Rong, and Ji, Mingyue
- Subjects
Computer Science - Machine Learning ,Computer Science - Distributed, Parallel, and Cluster Computing ,Computer Science - Information Theory ,Mathematics - Optimization and Control - Abstract
In federated learning (FL), data heterogeneity is the main reason that existing theoretical analyses are pessimistic about the convergence rate. In particular, for many FL algorithms, the convergence rate grows dramatically when the number of local updates becomes large, especially when the product of the gradient divergence and local Lipschitz constant is large. However, empirical studies show that more local updates can improve the convergence rate even when these two parameters are large, which is inconsistent with the theoretical findings. This paper aims to bridge this gap between theoretical understanding and practical performance by providing a theoretical analysis from a new perspective on data heterogeneity. In particular, we propose a new and weaker assumption compared to the local Lipschitz gradient assumption, named the heterogeneity-driven pseudo-Lipschitz assumption. We show that this assumption and the gradient divergence assumption can jointly characterize the effect of data heterogeneity. By deriving a convergence upper bound for FedAvg and its extensions, we show that, compared to the existing works, the local Lipschitz constant is replaced by the much smaller heterogeneity-driven pseudo-Lipschitz constant, and the corresponding convergence upper bound can be significantly reduced for the same number of local updates, although its order stays the same. In addition, when the local objective function is quadratic, more insights on the impact of data heterogeneity can be obtained using the heterogeneity-driven pseudo-Lipschitz constant. For example, we can identify a region where FedAvg can outperform mini-batch SGD even when the gradient divergence can be arbitrarily large. Our findings are validated using experiments., Comment: ICML 2024
- Published
- 2024
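The local-update-versus-averaging dynamic analyzed in this abstract can be sketched with FedAvg on heterogeneous quadratic client objectives; the dimensions, synthetic data, step size, and update counts below are illustrative choices, not taken from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)
d, K = 5, 4  # model dimension, number of clients

# Heterogeneous quadratic local objectives f_k(w) = 0.5 * ||A_k w - b_k||^2
A = [rng.normal(size=(20, d)) for _ in range(K)]
b = [rng.normal(size=20) for _ in range(K)]

def global_loss(w):
    return sum(0.5 * np.sum((A[k] @ w - b[k]) ** 2) for k in range(K)) / K

def fedavg(rounds=50, local_steps=10, lr=0.01):
    w = np.zeros(d)
    for _ in range(rounds):
        updates = []
        for k in range(K):            # each client runs local GD from the server model
            wk = w.copy()
            for _ in range(local_steps):
                grad = A[k].T @ (A[k] @ wk - b[k])
                wk -= lr * grad
            updates.append(wk)
        w = np.mean(updates, axis=0)  # server averages the local models
    return w

w = fedavg()
print(global_loss(np.zeros(d)), global_loss(w))
```

Varying `local_steps` in this sketch is the experiment the abstract alludes to: with heterogeneous clients, more local updates can still reduce the global loss per round.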
3. Dynamic Matrix Factor Models for High Dimensional Time Series
- Author
-
Yu, Ruofan, Chen, Rong, Xiao, Han, and Han, Yuefeng
- Subjects
Statistics - Methodology ,Economics - Econometrics - Abstract
Matrix time series, which consist of matrix-valued data observed over time, are prevalent in various fields such as economics, finance, and engineering. Such matrix time series data are often observed in high dimensions. Matrix factor models are employed to reduce the dimensionality of such data, but they lack the capability to make predictions without specified dynamics in the latent factor process. To address this issue, we propose a two-component dynamic matrix factor model that extends the standard matrix factor model by incorporating a matrix autoregressive structure for the low-dimensional latent factor process. This two-component model injects prediction capability into the matrix factor model and provides deeper insights into the dynamics of high-dimensional matrix time series. We present the estimation procedures of the model and their theoretical properties, as well as an empirical analysis of the estimation procedures via simulations, and a case study of New York City taxi data, demonstrating the performance and usefulness of the model.
- Published
- 2024
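The two-component structure described above (a matrix factor observation equation driven by a matrix-autoregressive latent factor process) can be sketched as a simulation; all dimensions and coefficient choices below are illustrative, not from the paper:

```python
import numpy as np

rng = np.random.default_rng(1)
p1, p2 = 20, 15      # observed matrix dimensions
k1, k2 = 3, 2        # latent factor dimensions
T = 200

R = rng.normal(size=(p1, k1))   # row loading matrix
C = rng.normal(size=(p2, k2))   # column loading matrix
A = 0.5 * np.eye(k1)            # matrix-AR(1) row coefficient (stable)
B = 0.5 * np.eye(k2)            # matrix-AR(1) column coefficient (stable)

F = np.zeros((T, k1, k2))       # latent matrix factor process
X = np.zeros((T, p1, p2))       # observed matrix time series
for t in range(1, T):
    F[t] = A @ F[t - 1] @ B.T + rng.normal(size=(k1, k2))  # latent dynamics
    X[t] = R @ F[t] @ C.T + rng.normal(size=(p1, p2))      # observation equation

# The AR component is what "injects prediction capability":
# a one-step-ahead forecast of the observation via the factor dynamics.
X_hat = R @ (A @ F[-1] @ B.T) @ C.T
print(X_hat.shape)
```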
4. Congruences modulo powers of $5$ and $7$ for the crank and rank parity functions and related mock theta functions
- Author
-
Chen, Dandan, Chen, Rong, and Garvan, Frank
- Subjects
Mathematics - Number Theory ,Mathematics - Combinatorics ,05A17, 11F33, 11F37, 11P83, 33D15 - Abstract
It is well known that Ramanujan conjectured congruences modulo powers of $5$, $7$ and $11$ for the partition function. These were subsequently proved by Watson (1938) and Atkin (1967). In 2009 Choi, Kang, and Lovejoy proved congruences modulo powers of $5$ for the crank parity function. The generating function for the analogous rank parity function is $f(q)$, the first example of a mock theta function that Ramanujan mentioned in his last letter to Hardy. Recently we proved congruences modulo powers of $5$ for the rank parity function, and here we extend these congruences to powers of $7$. We also show how these congruences imply congruences modulo powers of $5$ and $7$ for the coefficients of the related third order mock theta function $\omega(q)$, using Atkin-Lehner involutions and transformation results of Zwegers. Finally we prove a family of congruences modulo powers of $7$ for the crank parity function., Comment: 44 pages
- Published
- 2024
5. Seymour and Woodall's conjecture holds for graphs with independence number two
- Author
-
Chen, Rong and Deng, Zijian
- Subjects
Mathematics - Combinatorics - Abstract
In 2001, Woodall (and, independently, Seymour) conjectured that every graph $G$ contains every complete bipartite graph on $\chi(G)$ vertices as a minor, where $\chi(G)$ is the chromatic number of $G$. In this paper, we prove that for each positive integer $\ell$ with $2\ell \leq \chi(G)$, each graph $G$ with independence number two contains a $K^{\ell}_{\ell,\chi(G)-\ell}$-minor, implying that Seymour and Woodall's conjecture holds for graphs with independence number two, where $K^{\ell}_{\ell,\chi(G)-\ell}$ is the graph obtained from $K_{\ell,\chi(G)-\ell}$ by making every pair of vertices on the side of the bipartition of size $\ell$ adjacent.
- Published
- 2024
6. PARALLELGPUOS: A Concurrent OS-level GPU Checkpoint and Restore System using Validated Speculation
- Author
-
Huang, Zhuobin, Wei, Xingda, Hao, Yingyi, Chen, Rong, Han, Mingcong, Gu, Jinyu, and Chen, Haibo
- Subjects
Computer Science - Distributed, Parallel, and Cluster Computing ,Computer Science - Operating Systems - Abstract
Checkpointing (C) and restoring (R) are key components for GPU tasks. POS is an OS-level GPU C/R system: it can transparently checkpoint or restore processes that use the GPU, without requiring any cooperation from the application, a key feature required by modern systems like the cloud. Moreover, POS is the first OS-level C/R system that can execute C/R concurrently with application execution: a critical feature that is trivially achieved when the processes run only on the CPU, but becomes challenging when the processes use the GPU. The problem is how to ensure consistency during concurrent execution given the lack of application semantics that transparency implies. CPU processes can leverage OS and hardware paging to fix inconsistencies without application semantics. Unfortunately, the GPU bypasses the OS and paging for high performance. POS fills the semantic gap by speculatively extracting buffer access information of GPU kernels during runtime. Thanks to the simple and well-structured nature of GPU kernels, our speculative extraction (with runtime validation) achieves 100% accuracy on applications from training to inference whose domains span vision, large language models, and reinforcement learning. Based on the extracted semantics, we systematically overlap C/R with application execution, and achieve orders of magnitude higher performance under various tasks compared with the state-of-the-art OS-level GPU C/R, including training fault tolerance, live GPU process migration, and cold start acceleration in GPU-based serverless computing.
- Published
- 2024
7. Characterizing the Dilemma of Performance and Index Size in Billion-Scale Vector Search and Breaking It with Second-Tier Memory
- Author
-
Cheng, Rongxin, Peng, Yifan, Wei, Xingda, Xie, Hongrui, Chen, Rong, Shen, Sijie, and Chen, Haibo
- Subjects
Computer Science - Distributed, Parallel, and Cluster Computing ,Computer Science - Databases ,Computer Science - Information Retrieval - Abstract
Vector searches on large-scale datasets are critical to modern online services like web search and RAG, which necessitates storing the datasets and their indexes on secondary storage such as SSDs. In this paper, we are the first to characterize the trade-off between performance and index size in existing SSD-based graph and cluster indexes: to improve throughput by 5.7$\times$ and 1.7$\times$, these indexes pay storage amplifications of 5.8$\times$ and 7.7$\times$ with respect to the dataset size, respectively. The root cause is that the coarse-grained access of SSDs mismatches the fine-grained random reads required by vector indexes with small amplification. This paper argues that second-tier memory, such as remote DRAM/NVM connected via RDMA or CXL, is a powerful storage medium for addressing the problem from a system's perspective, thanks to its fine-grained access granularity. However, putting existing indexes -- primarily designed for SSD -- directly on second-tier memory cannot fully utilize its power. Meanwhile, second-tier memory still behaves more like storage, so using it as DRAM is also inefficient. To this end, we build a graph and a cluster index that center around the performance features of second-tier memory. With careful execution engine and index layout designs, we show that vector indexes can achieve optimal performance with orders of magnitude smaller index amplification, on a variety of second-tier memory devices. Based on our improved graph and cluster indexes on second-tier memory, we further conduct a systematic study between them to help developers choose the right index for their workloads. Interestingly, the findings on second-tier memory contradict the ones on SSDs.
- Published
- 2024
8. Time-Varying Matrix Factor Models
- Author
-
Chen, Bin, Chen, Elynn Y., Bolivar, Stevenson, and Chen, Rong
- Subjects
Statistics - Methodology - Abstract
Matrix-variate data of high dimensions are frequently observed in finance and economics, spanning extended time periods, such as the long-term data on international trade flows among numerous countries. To address potential structural shifts and explore the matrix structure's informational context, we propose a time-varying matrix factor model. This model accommodates changing factor loadings over time, revealing the underlying dynamic structure through nonparametric principal component analysis and facilitating dimension reduction. We establish the consistency and asymptotic normality of our estimators under general conditions that allow for weak correlations across time, rows, or columns of the noise. A novel approach is introduced to overcome rotational ambiguity in the estimators, enhancing the clarity and interpretability of the estimated loading matrices. Our simulation study highlights the merits of the proposed estimators and the effectiveness of the smoothing operation. In an application to international trade flows, we investigate the trading hubs, centrality, patterns, and trends in the trading network.
- Published
- 2024
9. Characterizing Network Requirements for GPU API Remoting in AI Applications
- Author
-
Wang, Tianxia, Chen, Zhuofu, Wei, Xingda, Gu, Jinyu, Chen, Rong, and Chen, Haibo
- Subjects
Computer Science - Operating Systems ,Computer Science - Networking and Internet Architecture - Abstract
GPU remoting is a promising technique for supporting AI applications. Networking plays a key role in enabling remoting. However, for efficient remoting, the network requirements in terms of latency and bandwidth are unknown. In this paper, we take a GPU-centric approach to derive the minimum latency and bandwidth requirements for GPU remoting, while ensuring no (or little) performance degradation for AI applications. Our study, including a theoretical model, demonstrates that, with careful remoting design, unmodified AI applications can run on the remoting setup using commodity networking hardware without any overhead, or even with better performance, with low network demands.
- Published
- 2024
10. Cross-target Stance Detection by Exploiting Target Analytical Perspectives
- Author
-
Ding, Daijun, Chen, Rong, Jing, Liwen, Zhang, Bowen, Huang, Xu, Dong, Li, Zhao, Xiaowen, and Song, Ge
- Subjects
Computer Science - Computation and Language - Abstract
Cross-target stance detection (CTSD) is an important task that infers the attitude toward the destination target by utilizing annotated data derived from the source target. One important approach in CTSD is to extract domain-invariant features to bridge the knowledge gap between multiple targets. However, the informal and short text structure and implicit expressions complicate the extraction of domain-invariant knowledge. In this paper, we propose a Multi-Perspective Prompt-Tuning (MPPT) model for CTSD that uses the analysis perspective as a bridge to transfer knowledge. First, we develop a two-stage instruct-based chain-of-thought method (TsCoT) to elicit target analysis perspectives and provide natural language explanations (NLEs) from multiple viewpoints by formulating instructions based on a large language model (LLM). Second, we propose a multi-perspective prompt-tuning framework (MultiPLN) to fuse the NLEs into the stance predictor. Extensive experimental results demonstrate the superiority of MPPT over state-of-the-art baseline methods.
- Published
- 2024
11. Borodin-Kostochka Conjecture holds for odd-hole-free graphs
- Author
-
Chen, Rong, Lan, Kaiyang, Lin, Xinheng, and Zhou, Yidong
- Subjects
Mathematics - Combinatorics - Abstract
The Borodin-Kostochka Conjecture states that for a graph $G$, if $\Delta(G)\geq 9$, then $\chi(G)\leq\max\{\Delta(G)-1,\omega(G)\}$. In this paper, we prove that the Borodin-Kostochka Conjecture holds for odd-hole-free graphs., Comment: Submitted
- Published
- 2023
12. LSCD: A Large-Scale Screen Content Dataset for Video Compression
- Author
-
Cheng, Yuhao, Zhang, Siru, Yan, Yiqiang, Chen, Rong, and Zhang, Yun
- Subjects
Computer Science - Multimedia ,Computer Science - Computer Vision and Pattern Recognition - Abstract
Multimedia compression allows us to watch videos, see pictures, and hear sounds within a limited bandwidth, which helped the internet flourish. Over the past decades, multimedia compression achieved great success using hand-crafted features and systems. With the development of artificial intelligence, much research has emerged on applying neural networks to video compression to replace complicated hand-designed systems. Beyond producing advanced algorithms, researchers have also extended compression to different content types, such as User Generated Content (UGC). With the rapid development of mobile devices, screen content videos have become an important part of multimedia data. However, we find the community lacks a large-scale dataset for screen content video compression, which impedes the fast development of the corresponding learning-based algorithms. To fill this gap and accelerate research on this special type of video, we propose the Large-scale Screen Content Dataset (LSCD), which contains 714 source sequences. We also provide an analysis of the proposed dataset to highlight features of screen content videos, which will help researchers better understand how to explore new algorithms. Besides collecting and post-processing the data to organize the dataset, we also provide a benchmark containing the performance of both traditional codecs and learning-based methods.
- Published
- 2023
13. Transactional Indexes on (RDMA or CXL-based) Disaggregated Memory with Repairable Transaction
- Author
-
Wei, Xingda, Wang, Haotian, Wang, Tianxia, Chen, Rong, Gu, Jinyu, Zuo, Pengfei, and Chen, Haibo
- Subjects
Computer Science - Databases - Abstract
The failure-atomic and isolated execution of clients' operations is a default requirement for a system that serves multiple loosely coupled clients at a server. However, disaggregated memory (DM) breaks this requirement in remote indexes because a client operation is disaggregated into multiple remote reads/writes. Current indexes focus on performance improvements and largely ignore tolerating client failures. We argue that a practical DM index should be transactional: each index operation should be failure-atomic and isolated in addition to being concurrency isolated. We present repairable transaction (rTX), a lightweight primitive to execute DM index operations. Each rTX can detect other failed rTXes on-the-fly with the help of concurrency control. Upon detection, it will repair their non-atomic updates online with the help of logging, thus hiding their failures from healthy clients. By further removing unnecessary logging and delegating concurrency control to existing carefully-tuned index algorithms, we show that transactional indexes can be built with low performance overhead on disaggregated memory. We have refactored two state-of-the-art DM indexes, RaceHashing and Sherman (B+Tree), with rTX. Evaluations show that rTX is 1.2 to 2X faster than alternatives such as distributed transactions. Meanwhile, its overhead is at most 42% compared to non-fault-tolerant indexes.
- Published
- 2023
14. Graphs that contain a $K_{1,2,3}$ and no induced subdivision of $K_4$ are $4$-colorable
- Author
-
Chen, Rong
- Subjects
Mathematics - Combinatorics - Abstract
In 2012, L\'ev\^eque, Maffray, and Trotignon conjectured that each graph $G$ that contains no induced subdivision of $K_4$ is $4$-colorable. In this paper, we prove that this conjecture holds when $G$ contains a $K_{1,2,3}$.
- Published
- 2023
15. NPSA: Nonparametric Simulated Annealing for Global Optimization
- Author
-
Chen, Rong, Schumitzky, Alan, Kryshchenko, Alona, Otalvaro, Julian D., Yamada, Walter M., and Neely, Michael N.
- Subjects
Statistics - Methodology ,Mathematics - Optimization and Control - Abstract
In this paper we describe NPSA, the first parallel nonparametric global maximum likelihood optimization algorithm using simulated annealing (SA). Unlike the nonparametric adaptive grid search method NPAG, which is not guaranteed to find a global optimum and may suffer from the curse of dimensionality, NPSA is a global optimizer and is free from these grid-related issues. We illustrate NPSA with a number of examples, including a pharmacokinetics (PK) model for Voriconazole, and show that NPSA may be taken as an upgrade to the current grid-search-based nonparametric methods., Comment: 35 pages, 11 figures, 1 table
- Published
- 2023
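NPSA itself couples SA with a nonparametric likelihood model; the generic simulated annealing loop it builds on can be sketched as follows (the objective, cooling schedule, and parameters are toy choices for illustration, not the paper's algorithm):

```python
import math
import random

def simulated_annealing(f, x0, n_iter=20000, step=0.5, t0=1.0, seed=0):
    """Minimize f over the reals with a geometric cooling schedule."""
    rng = random.Random(seed)
    x, fx = x0, f(x0)
    best_x, best_f = x, fx
    for i in range(n_iter):
        temp = t0 * 0.999 ** i                    # geometric cooling
        y = x + rng.gauss(0.0, step)              # propose a local move
        fy = f(y)
        # Accept downhill moves always; uphill moves with Boltzmann probability
        if fy < fx or rng.random() < math.exp(-(fy - fx) / max(temp, 1e-12)):
            x, fx = y, fy
            if fx < best_f:
                best_x, best_f = x, fx
    return best_x, best_f

# A multimodal test function: many local minima, global minimum 0 at x = 0
f = lambda x: x * x + 10.0 * (1.0 - math.cos(2.0 * math.pi * x))
best_x, best_f = simulated_annealing(f, x0=8.0)
print(best_x, best_f)
```

The occasional accepted uphill move is what lets SA escape the local minima that a pure grid refinement or greedy descent would get stuck in.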
16. Graphs with girth $2\ell+1$ and without longer odd holes are $3$-colorable
- Author
-
Chen, Rong
- Subjects
Mathematics - Combinatorics - Abstract
For a number $\ell\geq 2$, let $\mathcal{G}_{\ell}$ denote the family of graphs which have girth $2\ell+1$ and have no odd hole with length greater than $2\ell+1$. Plummer and Zha conjectured that every 3-connected and internally 4-connected graph in $\mathcal{G}_2$ is 3-colorable. Wu, Xu, and Xu conjectured that every graph in $\bigcup_{\ell\geq2}\mathcal{G}_{\ell}$ is 3-colorable. Chudnovsky et al. and Wu et al., respectively, proved that every graph in $\mathcal{G}_2$ and $\mathcal{G}_3$ is 3-colorable. In this paper, we prove that every graph in $\bigcup_{\ell\geq5}\mathcal{G}_{\ell}$ is 3-colorable., Comment: arXiv admin note: text overlap with arXiv:2210.12376
- Published
- 2022
17. Characterizing Off-path SmartNIC for Accelerating Distributed Systems
- Author
-
Wei, Xingda, Cheng, Rongxin, Yang, Yuhan, Chen, Rong, and Chen, Haibo
- Subjects
Computer Science - Distributed, Parallel, and Cluster Computing - Abstract
SmartNICs have recently emerged as an appealing device for accelerating distributed systems. However, there has not been a comprehensive characterization of SmartNICs, and existing designs typically only leverage a single communication path for workload offloading. This paper presents the first holistic study of a representative off-path SmartNIC, specifically the Bluefield-2, from a communication-path perspective. Our experimental study systematically explores the key performance characteristics of communication among the client, on-board SoC, and host, and offers insightful findings and advice for designers. Moreover, we propose the concurrent use of multiple communication paths of a SmartNIC and present a pioneering guideline to expose new optimization opportunities for various distributed systems. To demonstrate the effectiveness of our approach, we conducted case studies on a SmartNIC-based distributed file system (LineFS) and an RDMA-based disaggregated key-value store (DrTM-KV). Our experimental results show improvements of up to 30% and 25% for LineFS and DrTM-KV, respectively., Comment: Published at OSDI23
- Published
- 2022
18. Graphs with girth $2\ell+1$ and without longer odd holes that contain an odd $K_4$-subdivision
- Author
-
Chen, Rong and Zhou, Yidong
- Subjects
Mathematics - Combinatorics ,05C15, 05C17, 05C69 - Abstract
We say that a graph $G$ has an {\em odd $K_4$-subdivision} if some subgraph of $G$ is a $K_4$-subdivision whose faces are all odd holes of $G$. For a number $\ell\geq 2$, let $\mathcal{G}_{\ell}$ denote the family of graphs which have girth $2\ell+1$ and have no odd hole with length greater than $2\ell+1$. Wu, Xu and Xu conjectured that every graph in $\bigcup_{\ell\geq2}\mathcal{G}_{\ell}$ is 3-colorable. Recently, Chudnovsky et al. and Wu et al., respectively, proved that every graph in $\mathcal{G}_2$ and $\mathcal{G}_3$ is 3-colorable. In this paper, we prove that no $4$-vertex-critical graph in $\bigcup_{\ell\geq5}\mathcal{G}_{\ell}$ has an odd $K_4$-subdivision. Using this result, Chen proved that all graphs in $\bigcup_{\ell\geq5}\mathcal{G}_{\ell}$ are 3-colorable., Comment: A figure of an odd $K_4$-subdivision was added
- Published
- 2022
19. Path Integral Quantum Monte Carlo Method for Light Nuclei
- Author
-
Chen, Rong
- Subjects
Nuclear Theory - Abstract
I describe the first continuous-space nuclear path integral quantum Monte Carlo method, and calculate the ground state properties of light nuclei including the deuteron, triton, helium-3, and helium-4, using both local chiral interactions up to next-to-next-to-leading order and the Argonne $v_6'$ interaction. Compared with diffusion-based quantum Monte Carlo methods such as Green's function Monte Carlo and auxiliary field diffusion Monte Carlo, path integral quantum Monte Carlo has the advantage that it can directly calculate the expectation value of operators without tradeoff, whether or not they commute with the Hamiltonian. For operators that commute with the Hamiltonian, e.g., the Hamiltonian itself, the path integral quantum Monte Carlo light-nuclei results agree with Green's function Monte Carlo and auxiliary field diffusion Monte Carlo results. For other operator expectations, which are important for understanding nuclear measurements but do not commute with the Hamiltonian and therefore cannot be accurately calculated by diffusion-based quantum Monte Carlo methods without tradeoff, the path integral quantum Monte Carlo method gives reliable results. I show root-mean-square radii, one-particle number density distributions, and Euclidean response functions for single-nucleon couplings. I also systematically describe all the sampling algorithms used in this work, the strategies that make the computation efficient, the error estimations, and the details of the implementation of the code used to perform the calculations. This work can serve as a benchmark for future calculations of larger nuclei or finite-temperature nuclear matter using path integral quantum Monte Carlo., Comment: Rong Chen's 2020 PhD thesis at Arizona State University, Tempe, Arizona, USA. PhD Advisor, Professor Kevin E. Schmidt
- Published
- 2022
20. RPEM: Randomized Monte Carlo Parametric Expectation Maximization Algorithm
- Author
-
Chen, Rong, Schumitzky, Alan, Kryshchenko, Alona, Garreau, Romain, Otalvaro, Julian D., Yamada, Walter M., and Neely, Michael N.
- Subjects
Statistics - Methodology - Abstract
Inspired by quantum Monte Carlo, by always using unbiased estimators and sampling discrete and continuous variables simultaneously with the Metropolis algorithm, we present a novel, fast, and accurate high-performance Monte Carlo Parametric Expectation Maximization (MCPEM) algorithm, which we name Randomized Parametric Expectation Maximization (RPEM). In particular, we compared RPEM with Monolix's SAEM and Certara's QRPEM for a realistic two-compartment Voriconazole model with ordinary differential equations (ODEs), using simulated data. We show that RPEM is 3 to 4 times faster than SAEM and QRPEM, and more accurate in reconstructing the population parameters., Comment: 28 pages, 6 figures, 2 tables
- Published
- 2022
21. Path Integral Quantum Monte Carlo Calculations of Light Nuclei
- Author
-
Chen, Rong and Schmidt, Kevin E.
- Subjects
Nuclear Theory - Abstract
We describe a path-integral ground-state quantum Monte Carlo method for light nuclei in continuous space. We show how to efficiently update and sample the paths with spin-isospin dependent and spin-orbit interactions. We apply the method to the triton and alpha particle using both local chiral interactions at next-to-next-to-leading order (N$^2$LO) and the Argonne interactions. For operators, like the total energy, that commute with the Hamiltonian, our results agree with Green's function Monte Carlo and auxiliary field diffusion Monte Carlo calculations. For operators that do not commute with the Hamiltonian and for Euclidean response functions, the path-integral formulation allows straightforward calculation without forward walking or the increased variance typical of diffusion methods. We demonstrate this by calculating density distributions, root-mean-square radii, and Euclidean response functions for single-nucleon couplings., Comment: 15 pages, 9 figures, 6 tables. Accepted by Physical Review C on 2022-9-28; published on 2022-10-26
- Published
- 2022
22. No Provisioned Concurrency: Fast RDMA-codesigned Remote Fork for Serverless Computing
- Author
-
Wei, Xingda, Lu, Fangming, Wang, Tianxia, Gu, Jinyu, Yang, Yuhan, Chen, Rong, and Chen, Haibo
- Subjects
Computer Science - Operating Systems ,Computer Science - Distributed, Parallel, and Cluster Computing - Abstract
Serverless platforms essentially face a tradeoff between container startup time and provisioned concurrency (i.e., cached instances), which is further exacerbated by the frequent need for remote container initialization. This paper presents MITOSIS, an operating system primitive that provides fast remote fork by exploiting a deep codesign of the OS kernel with RDMA. By leveraging the fast remote read capability of RDMA and partial state transfer across serverless containers, MITOSIS bridges the performance gap between local and remote container initialization. MITOSIS is the first to fork over 10,000 new containers from one instance across multiple machines within a second, while allowing the new containers to efficiently transfer the pre-materialized states of the forked one. We have implemented MITOSIS on Linux and integrated it with FN, a popular serverless platform. Under load spikes in real-world serverless workloads, MITOSIS reduces the function tail latency by 89% with orders of magnitude lower memory usage. For serverless workflows that require state transfer, MITOSIS improves their execution time by 86%., Comment: To appear in OSDI'23
- Published
- 2022
23. Improved iterative methods for solving risk parity portfolio
- Author
-
Choi, Jaehyuk and Chen, Rong
- Subjects
Quantitative Finance - Portfolio Management - Abstract
Risk parity, also known as equal risk contribution, has recently gained increasing attention as a portfolio allocation method. However, solving for the portfolio weights requires numerical methods, as no analytic solution is available. This study improves two existing iterative methods: the cyclical coordinate descent (CCD) and Newton methods. We enhance the CCD method by simplifying the formulation using a correlation matrix and imposing an additional rescaling step. We also suggest an improved initial guess, inspired by the CCD method, for the Newton method. Numerical experiments show that the improved CCD method performs best and is approximately three times faster than the original CCD method, saving more than 40% of the iterations.
- Published
- 2022
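A minimal sketch of the cyclical coordinate descent approach to risk parity, with a final rescaling step of the kind the abstract mentions (this is the standard CCD update for the equal-risk-contribution problem, not the authors' exact improved variant; the covariance matrix is made up):

```python
import numpy as np

def risk_parity_ccd(cov, tol=1e-10, max_iter=1000):
    """Equal-risk-contribution weights via cyclical coordinate descent.

    Solves min 0.5 w' Sigma w - sum_i b_i log(w_i) with b_i = 1/n,
    then rescales w to sum to one.
    """
    n = cov.shape[0]
    b = np.full(n, 1.0 / n)       # equal risk budgets
    w = np.full(n, 1.0 / n)       # initial guess
    for _ in range(max_iter):
        w_prev = w.copy()
        for i in range(n):
            # First-order condition in w_i is a quadratic:
            # cov[i,i] w_i^2 + c_i w_i - b_i = 0, with c_i = sum_{j!=i} cov[i,j] w_j
            c = cov[i] @ w - cov[i, i] * w[i]
            w[i] = (-c + np.sqrt(c * c + 4.0 * cov[i, i] * b[i])) / (2.0 * cov[i, i])
        if np.max(np.abs(w - w_prev)) < tol:
            break
    return w / w.sum()            # rescale to a fully invested portfolio

# Toy covariance matrix (illustrative volatilities and correlations)
vol = np.array([0.1, 0.2, 0.3])
corr = np.array([[1.0, 0.5, 0.2], [0.5, 1.0, 0.3], [0.2, 0.3, 1.0]])
cov = np.outer(vol, vol) * corr
w = risk_parity_ccd(cov)
rc = w * (cov @ w)                # risk contributions should be (nearly) equal
print(w, rc)
```

Each coordinate update solves its one-dimensional optimality condition in closed form, which is why the sweep converges quickly without line searches.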
24. KRCORE: a microsecond-scale RDMA control plane for elastic computing
- Author
-
Wei, Xingda, Lu, Fangming, Chen, Rong, and Chen, Haibo
- Subjects
Computer Science - Networking and Internet Architecture - Abstract
We present KRCORE, an RDMA library with a microsecond-scale control plane on commodity RDMA hardware for elastic computing. KRCORE can establish a full-fledged RDMA connection within 10{\mu}s (hundreds or thousands of times faster than verbs), while only maintaining a (small) fixed-sized connection metadata at each node, regardless of the cluster scale. The key ideas include virtualizing pre-initialized kernel-space RDMA connections instead of creating one from scratch, and retrofitting advanced RDMA dynamic connected transport with static transport for both low connection overhead and high networking speed. Under load spikes, KRCORE can shorten the worker bootstrap time of an existing disaggregated key-value store (namely RACE Hashing) by 83%. In serverless computing (namely Fn), KRCORE can also reduce the latency for transferring data through RDMA by 99%., Comment: To appear in USENIX ATC'2022 (https://www.usenix.org/conference/atc22/presentation/wei)
- Published
- 2021
25. CP Factor Model for Dynamic Tensors
- Author
-
Han, Yuefeng, Yang, Dan, Zhang, Cun-Hui, and Chen, Rong
- Subjects
Statistics - Methodology ,Economics - Econometrics - Abstract
Observations in various applications are frequently represented as a time series of multidimensional arrays, called tensor time series, preserving the inherent multidimensional structure. In this paper, we present a factor model approach, in a form similar to tensor CP decomposition, to the analysis of high-dimensional dynamic tensor time series. As the loading vectors are uniquely defined but not necessarily orthogonal, it is significantly different from the existing tensor factor models based on Tucker-type tensor decomposition. The model structure allows for a set of uncorrelated one-dimensional latent dynamic factor processes, making it much more convenient to study the underlying dynamics of the time series. A new high order projection estimator is proposed for such a factor model, utilizing the special structure and the idea of the higher order orthogonal iteration procedures commonly used in Tucker-type tensor factor models and general tensor CP decomposition procedures. Theoretical investigation provides statistical error bounds for the proposed methods, which show the significant advantage of utilizing the special model structure. A simulation study is conducted to further demonstrate the finite sample properties of the estimators. A real data application is used to illustrate the model and its interpretations.
- Published
- 2021
26. Congruence modulo 4 for Andrews' integer partition with even parts below odd parts
- Author
-
Chen, Dandan and Chen, Rong
- Subjects
Mathematics - Number Theory - Abstract
We find and prove a class of congruences modulo 4 for Andrews' partition function in terms of a certain ternary quadratic form. We also discuss the distribution of $\overline{\mathcal{EO}}(n)$ and further prove that $\overline{\mathcal{EO}}(n)\equiv0\pmod4$ for almost all $n$. This study was inspired by similar congruences modulo 4 in the work of the second author and Garvan.
- Published
- 2021
27. A Practical Algorithm Design and Evaluation for Heterogeneous Elastic Computing with Stragglers
- Author
-
Woolsey, Nicholas, Kliewer, Joerg, Chen, Rong-Rong, and Ji, Mingyue
- Subjects
Computer Science - Information Theory ,Computer Science - Distributed, Parallel, and Cluster Computing - Abstract
Our extensive real measurements over Amazon EC2 show that the virtual instances often have different computing speeds even if they share the same configurations. This motivates us to study heterogeneous Coded Storage Elastic Computing (CSEC) systems where machines, with different computing speeds, join and leave the network arbitrarily over different computing steps. In CSEC systems, a Maximum Distance Separable (MDS) code is used for coded storage such that the file placement does not have to be redefined with each elastic event. Computation assignment algorithms are used to minimize the computation time given the computation speeds of the different machines. While previous studies of heterogeneous CSEC do not include stragglers (the slow machines during the computation), we develop a new framework for heterogeneous CSEC that introduces straggler tolerance. Based on this framework, we design a novel algorithm using our previously proposed approach for heterogeneous CSEC such that the system can handle any subset of stragglers of a specified size while minimizing the computation time. Furthermore, we establish a trade-off between computation time and straggler tolerance. Another major limitation of existing CSEC designs is the lack of practical evaluations using real applications. In this paper, we evaluate the performance of our designs on Amazon EC2 for applications of the power iteration and linear regression. Evaluation results show that the proposed heterogeneous CSEC algorithms outperform the state-of-the-art designs by more than 30%., Comment: 6 pages, 2 figures, accepted by IEEE Globecom 2021
- Published
- 2021
28. Generating Functions of the Hurwitz Class Numbers Associated with Certain Mock Theta Functions
- Author
-
Chen, Dandan and Chen, Rong
- Subjects
Mathematics - Number Theory - Abstract
We find Hecke-Rogers type series representations of the generating functions of the Hurwitz class numbers, which are very close to certain mock theta functions. We also prove two combinatorial interpretations of Hurwitz class numbers that appear in the OEIS.
- Published
- 2021
29. The $9$-connected Excluded Minors for the Class of Quasi-graphic Matroids
- Author
-
Chen, Rong
- Subjects
Mathematics - Combinatorics - Abstract
The class of quasi-graphic matroids, recently introduced by Geelen, Gerards, and Whittle, is minor closed and contains both the class of lifted-graphic matroids and the class of frame matroids, each of which generalises the class of graphic matroids. In this paper, we prove that the matroids $U_{3,7}$ and $U_{4,7}$ are the only $9$-connected excluded minors for the class of quasi-graphic matroids.
- Published
- 2021
30. Generalized Autoregressive Moving Average Models with GARCH Errors
- Author
-
Zheng, Tingguo, Xiao, Han, and Chen, Rong
- Subjects
Statistics - Methodology ,Economics - Econometrics - Abstract
One of the important and widely used classes of models for non-Gaussian time series is the generalized autoregressive moving average models (GARMA), which specify an ARMA structure for the conditional mean process of the underlying time series. However, in many applications one often encounters conditional heteroskedasticity. In this paper we propose a new class of models, referred to as GARMA-GARCH models, that jointly specify both the conditional mean and conditional variance processes of a general non-Gaussian time series. Under the general modeling framework, we propose three specific models, as examples, for proportional time series, nonnegative time series, and skewed and heavy-tailed financial time series. Maximum likelihood estimation (MLE) and quasi Gaussian MLE (GMLE) are used to estimate the parameters. Simulation studies and three applications are used to demonstrate the properties of the models and the estimation procedures.
- Published
- 2021
31. Evidence for nesting driven charge density waves instabilities in a quasi-two-dimensional material: LaAgSb2
- Author
-
Bosak, Alexeï, Souliou, Sofia-Michaela, Faugeras, Clément, Heid, Rolf, Molas, Maciej R., Chen, Rong-Yan, Wang, Nan-Lin, Potemski, Marek, and Tacon, Matthieu Le
- Subjects
Condensed Matter - Strongly Correlated Electrons - Abstract
Since their theoretical prediction by Peierls in the 1930s, charge density waves (CDW) have been one of the most commonly encountered electronic phases in low dimensional metallic systems. The instability mechanism originally proposed combines Fermi surface nesting and electron-phonon coupling but is, strictly speaking, only valid in one dimension. In higher dimensions, its relevance is questionable as sharp maxima in the static electronic susceptibility $\chi(q)$ are smeared out, and are, in many cases, unable to account for the periodicity of the observed charge modulations. Here, we investigate the quasi-two-dimensional LaAgSb2, which exhibits two CDW transitions, by a combination of diffuse x-ray scattering, inelastic x-ray scattering and ab initio calculations. We demonstrate that the CDW formation is driven by phonon softening. The corresponding Kohn anomalies are visualized in 3D through the momentum distribution of the x-ray diffuse scattering intensity. We show that they can be quantitatively accounted for by considering the electronic susceptibility calculated from a Dirac-like band, weighted by anisotropic electron-phonon coupling. This remarkable agreement sheds new light on the importance of Fermi surface nesting in CDW formation., Comment: 20 pages, 9 figures, submitted to Phys. Rev. Rep
- Published
- 2021
32. Independent Reinforcement Learning for Weakly Cooperative Multiagent Traffic Control Problem
- Author
-
Zhang, Chengwei, Jin, Shan, Xue, Wanli, Xie, Xiaofei, Chen, Shengyong, and Chen, Rong
- Subjects
Computer Science - Machine Learning - Abstract
The adaptive traffic signal control (ATSC) problem can be modeled as a multiagent cooperative game among urban intersections, where intersections cooperate to optimize their common goal. Recently, reinforcement learning (RL) has achieved marked successes in managing sequential decision making problems, which motivates us to apply RL to the ATSC problem. In this study we use independent reinforcement learning (IRL) to solve a complex traffic cooperative control problem. One of the largest challenges of this problem is that the observation information of each intersection is typically only partially observable, which limits the learning performance of IRL algorithms. To this end, we model the traffic control problem as a partially observable weak cooperative traffic model (PO-WCTM) to optimize the overall traffic situation of a group of intersections. Different from a traditional IRL task that averages the returns of all agents in fully cooperative games, the learning goal of each intersection in PO-WCTM is to reduce the cooperative difficulty of learning, which is also consistent with the traffic environment hypothesis. We also propose an IRL algorithm called Cooperative Important Lenient Double DQN (CIL-DDQN), which extends the Double DQN (DDQN) algorithm with two mechanisms: a forgetful experience mechanism and a lenient weight training mechanism. The former decreases the importance of experiences stored in the experience replay buffer, which addresses the problem of experience failure caused by strategy changes of other agents. The latter increases the weight of experiences with high estimation and `leniently' trains the DDQN neural network, which improves the probability of selecting cooperative joint strategies. Experimental results show that CIL-DDQN outperforms other methods in almost all performance indicators of the traffic control problem.
- Published
- 2021
33. Simultaneous Decorrelation of Matrix Time Series
- Author
-
Han, Yuefeng, Chen, Rong, Zhang, Cun-Hui, and Yao, Qiwei
- Subjects
Statistics - Methodology ,Economics - Econometrics - Abstract
We propose a contemporaneous bilinear transformation for a $p\times q$ matrix time series to alleviate the difficulties in modeling and forecasting matrix time series when $p$ and/or $q$ are large. The resulting transformed matrix assumes a block structure consisting of several small matrices, and those small matrix series are uncorrelated across all times. Hence an overall parsimonious model is achieved by modelling each of those small matrix series separately without the loss of information on the linear dynamics. Such a parsimonious model often has better forecasting performance, even when the underlying true dynamics deviates from the assumed uncorrelated block structure after transformation. The uniform convergence rates of the estimated transformation are derived, which vindicate an important virtue of the proposed bilinear transformation, i.e. it is technically equivalent to the decorrelation of a vector time series of dimension $\max(p,q)$ instead of $p\times q$. The proposed method is illustrated numerically via both simulated and real data examples.
- Published
- 2021
34. Congruences modulo powers of 5 for the rank parity function
- Author
-
Chen, Dandan, Chen, Rong, and Garvan, Frank
- Subjects
Mathematics - Number Theory ,Mathematics - Combinatorics ,05A17, 11F30, 11F37, 11P82, 11P83 - Abstract
It is well known that Ramanujan conjectured congruences modulo powers of 5, 7 and 11 for the partition function. These were subsequently proved by Watson (1938) and Atkin (1967). In 2009 Choi, Kang, and Lovejoy proved congruences modulo powers of 5 for the crank parity function. The generating function for the rank parity function is $f(q)$, the first example of a mock theta function that Ramanujan mentioned in his last letter to Hardy. We prove congruences modulo powers of 5 for the rank parity function., Comment: 25 pages
- Published
- 2021
35. Rank Determination in Tensor Factor Model
- Author
-
Han, Yuefeng, Chen, Rong, and Zhang, Cun-Hui
- Subjects
Statistics - Methodology ,Economics - Econometrics ,Primary 62H25, 62H12, secondary 62F07 - Abstract
The factor model is an appealing and effective analytic tool for high-dimensional time series, with a wide range of applications in economics, finance and statistics. This paper develops two criteria for determining the number of factors in tensor factor models, where the signal part of an observed tensor time series assumes a Tucker decomposition with the core tensor as the factor tensor. The task is to determine the dimensions of the core tensor. One of the proposed criteria is similar to information-based criteria for model selection, and the other extends the approaches based on ratios of consecutive eigenvalues often used in factor analysis for panel time series. Theoretical results, including sufficient conditions and convergence rates, are established. The results include the vector factor models as special cases, with additional convergence rates. Simulation studies provide promising finite sample performance for the two criteria.
- Published
- 2020
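The eigenvalue-ratio idea behind the second criterion can be illustrated in its simplest vector-case form: estimate the number of factors as the index maximizing the ratio of consecutive eigenvalues of a sample covariance matrix. This is a hedged sketch of the general ratio-based approach, not the paper's tensor criterion; `estimate_num_factors` and `k_max` are illustrative names:

```python
import numpy as np

def estimate_num_factors(X, k_max=8):
    """Eigenvalue-ratio estimate of the number of factors.

    X: (T, p) array of observations of a vector time series.
    Returns the k maximizing lambda_k / lambda_{k+1} of the sample
    covariance, a common ratio-based criterion in factor analysis.
    """
    Xc = X - X.mean(axis=0)
    S = Xc.T @ Xc / X.shape[0]                    # sample covariance
    lam = np.sort(np.linalg.eigvalsh(S))[::-1]    # descending eigenvalues
    ratios = lam[:k_max] / lam[1:k_max + 1]
    return int(np.argmax(ratios)) + 1             # 1-based factor count

# Toy check: 2 strong factors plus idiosyncratic noise.
rng = np.random.default_rng(0)
F = rng.normal(size=(500, 2))
L = 5.0 * rng.normal(size=(20, 2))
X = F @ L.T + rng.normal(size=(500, 20))
print(estimate_num_factors(X))  # → 2
```

The ratio spikes at the true rank because signal eigenvalues are of a larger order than noise eigenvalues; the paper's criteria extend this logic to each mode of the tensor.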
36. Distributed Learning with Low Communication Cost via Gradient Boosting Untrained Neural Network
- Author
-
Zhang, Xiatian, He, Xunshi, Wang, Nan, and Chen, Rong
- Subjects
Computer Science - Machine Learning ,Computer Science - Distributed, Parallel, and Cluster Computing - Abstract
For high-dimensional data, distributed GBDT incurs huge communication costs because the communication volume of GBDT is related to the number of features. To overcome this problem, we propose a novel gradient boosting algorithm, the Gradient Boosting Untrained Neural Network (GBUN). GBUN ensembles untrained, randomly generated neural networks that softly distribute data samples to multiple neuron outputs, dramatically reducing the communication costs of distributed learning. To avoid creating huge neural networks for high-dimensional data, we extend the Simhash algorithm to mimic the forward calculation of the neural network. Our experiments on multiple public datasets show that GBUN is as good as conventional GBDT in terms of prediction accuracy and much better in scaling for distributed learning. Compared to conventional GBDT varieties, GBUN speeds up the training process by up to 13 times on a cluster with 64 machines, and by up to 4614 times on a cluster with 100KB/s network bandwidth. Therefore, GBUN is not only an efficient distributed learning algorithm but also has great potential for federated learning.
- Published
- 2020
37. A proof of the mod $4$ unimodal sequence conjectures and related mock theta functions
- Author
-
Chen, Rong and Garvan, Frank
- Subjects
Mathematics - Number Theory ,Mathematics - Combinatorics ,05A17, 05A30, 11E41, 11P84 (Primary), 11F33, 11P81, 11P83, 33D15 (Secondary) - Abstract
In 2012 Bryson, Ono, Pitman and Rhoades showed how the generating functions for certain strongly unimodal sequences are related to quantum modular and mock modular forms. They proved some parity results and conjectured some mod 4 congruences for the coefficients of these generating functions. In 2016 Kim, Lim and Lovejoy obtained similar results for odd-balanced unimodal sequences and made similar mod 4 conjectures. We prove all of these mod 4 conjectures and similar congruences for the Andrews spt-function and related mock theta functions. Our method of proof involves new Hecke-Rogers type identities for indefinite binary quadratic forms and the Hurwitz class number., Comment: 30 pages
- Published
- 2020
38. Demystifying Why Local Aggregation Helps: Convergence Analysis of Hierarchical SGD
- Author
-
Wang, Jiayi, Wang, Shiqiang, Chen, Rong-Rong, and Ji, Mingyue
- Subjects
Computer Science - Machine Learning ,Computer Science - Distributed, Parallel, and Cluster Computing ,Computer Science - Information Theory ,Mathematics - Optimization and Control - Abstract
Hierarchical SGD (H-SGD) has emerged as a new distributed SGD algorithm for multi-level communication networks. In H-SGD, before each global aggregation, workers send their updated local models to local servers for aggregations. Despite recent research efforts, the effect of local aggregation on global convergence still lacks theoretical understanding. In this work, we first introduce a new notion of "upward" and "downward" divergences. We then use it to conduct a novel analysis to obtain a worst-case convergence upper bound for two-level H-SGD with non-IID data, non-convex objective function, and stochastic gradient. By extending this result to the case with random grouping, we observe that this convergence upper bound of H-SGD is between the upper bounds of two single-level local SGD settings, with the number of local iterations equal to the local and global update periods in H-SGD, respectively. We refer to this as the "sandwich behavior". Furthermore, we extend our analytical approach based on "upward" and "downward" divergences to study the convergence for the general case of H-SGD with more than two levels, where the "sandwich behavior" still holds. Our theoretical results provide key insights into why local aggregation can be beneficial in improving the convergence of H-SGD., Comment: 36 pages, in AAAI 2022
- Published
- 2020
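The two-level H-SGD structure (local SGD steps at each worker, periodic local-server averaging, then global averaging) can be sketched on toy scalar quadratics. This is a structural illustration only, not the paper's algorithm or analysis; `hsgd_quadratic` and all parameters are illustrative assumptions:

```python
import numpy as np

def hsgd_quadratic(w0, data_groups, lr=0.1, local_steps=5,
                   group_aggs=4, rounds=10):
    """Two-level hierarchical SGD sketch on scalar quadratics.

    Each worker i minimizes 0.5*(w - c_i)^2 (gradient: w - c_i).
    Workers run `local_steps` SGD steps; each local server averages
    its group's models; after `group_aggs` local aggregations the
    global server averages all group models.
    """
    w = float(w0)
    for _ in range(rounds):                        # global rounds
        group_models = []
        for group in data_groups:                  # one local server per group
            wg = w
            for _ in range(group_aggs):            # local aggregation periods
                worker_models = []
                for c in group:                    # workers in this group
                    wi = wg
                    for _ in range(local_steps):   # local SGD updates
                        wi -= lr * (wi - c)
                    worker_models.append(wi)
                wg = float(np.mean(worker_models)) # local aggregation
            group_models.append(wg)
        w = float(np.mean(group_models))           # global aggregation
    return w

# Two groups with heterogeneous (non-IID) optima; the global model
# settles at the average of all c_i.
groups = [[0.0, 1.0], [3.0, 4.0]]
print(round(hsgd_quadratic(10.0, groups), 3))  # → 2.0
```

On quadratics every level contracts linearly toward its group mean, so the model converges to the overall average; the paper's contribution is quantifying how the intermediate aggregation level affects this convergence for general non-convex objectives.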
39. Evolutionary dynamics in financial markets with heterogeneities in strategies and risk tolerance
- Author
-
Xu, Wen-Juan, Zhong, Chen-Yang, Ren, Fei, Qiu, Tian, Chen, Rong-Da, He, Yun-Xin, and Zhong, Li-Xin
- Subjects
Quantitative Finance - General Finance ,Nonlinear Sciences - Adaptation and Self-Organizing Systems ,Physics - Physics and Society - Abstract
In nature and human societies, the effects of homogeneous and heterogeneous characteristics on the evolution of collective behaviors are quite different from each other. It is of great importance to understand the underlying mechanisms of the occurrence of such differences. By incorporating pair pattern strategies and reference point strategies into an agent-based model, we have investigated the coupled effects of heterogeneous investment strategies and heterogeneous risk tolerance on price fluctuations. In a market flooded with investors with homogeneous investment strategies or homogeneous risk tolerance, large price fluctuations are easy to occur. In a market flooded with investors with heterogeneous investment strategies or heterogeneous risk tolerance, the price fluctuations are suppressed. For a heterogeneous population, the coexistence of investors with pair pattern strategies and reference point strategies causes the price to fluctuate slowly around a typical equilibrium point, and both a large price fluctuation and a no-trading state are avoided: the pair pattern strategies push the system far away from the equilibrium while the reference point strategies pull the system back to the equilibrium. A theoretical analysis indicates that the evolutionary dynamics in the present model is governed by the competition between different strategies. The strategy that causes large price fluctuations loses more while the strategy that pulls the system back to the equilibrium gains more. Overly frequent trading does harm to one's pursuit of wealth.
- Published
- 2020
40. Coupled effects of epidemic information and risk awareness on contagion
- Author
-
Xu, Wen-Juan, Zhong, Chen-Yang, Ye, Hui-Fen, Chen, Rong-Da, Qiu, Tian, Ren, Fei, and Zhong, Li-Xin
- Subjects
Physics - Physics and Society ,Quantitative Biology - Populations and Evolution - Abstract
By incorporating delayed epidemic information and self-restricted travel behavior into the SIS model, we have investigated the coupled effects of timely and accurate epidemic information and people's sensitivity to the epidemic information on contagion. In the population with only local random movement, whether the epidemic information is delayed or not has no effect on the spread of the epidemic. People's high sensitivity to the epidemic information leads to their risk aversion behavior and the spread of the epidemic is suppressed. In the population with only global person-to-person movement, timely and accurate epidemic information helps an individual cut off the connections with the infected in time and the epidemic is brought under control in no time. A delay in the epidemic information leads to an individual's misjudgment of who has been infected and who has not, which in turn leads to rapid progress and a higher peak of the epidemic. In the population with coexistence of local and global movement, timely and accurate epidemic information and people's high sensitivity to the epidemic information play an important role in curbing the epidemic. A theoretical analysis indicates that people's misjudgment caused by the delayed epidemic information leads to a higher encounter probability between the susceptible and the infected and people's self-restricted travel behavior helps reduce such an encounter probability. A functional relation between the ratio of infected individuals and the susceptible-infected encounter probability has been found.
- Published
- 2020
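The coupled effect of information delay and risk awareness on contagion can be illustrated with a minimal mean-field SIS sketch, in which the contact rate is damped by the infected fraction reported some steps earlier. The response function `1/(1 + alpha * I)` and all parameter names are illustrative assumptions, not the model of the paper:

```python
def sis_with_delayed_info(beta=0.4, gamma=0.2, alpha=5.0,
                          delay=7, steps=300, i0=0.01):
    """Mean-field SIS sketch with delayed-information risk aversion.

    The effective contact rate is beta / (1 + alpha * I(t - delay)):
    individuals restrict contacts based on the infected fraction
    reported `delay` steps ago. Returns the trajectory of I(t).
    """
    I = [i0]
    for t in range(steps):
        i_reported = I[max(0, t - delay)]            # delayed information
        contact = beta / (1.0 + alpha * i_reported)  # self-restriction
        i = I[-1]
        di = contact * i * (1 - i) - gamma * i       # standard SIS dynamics
        I.append(min(max(i + di, 0.0), 1.0))
    return I

# Timely information suppresses the peak: with a lag, restriction
# reacts to stale numbers and the epidemic overshoots.
peak_timely = max(sis_with_delayed_info(delay=0))
peak_delayed = max(sis_with_delayed_info(delay=14))
print(peak_timely < peak_delayed)
```

In this toy version both trajectories share the same equilibrium, but the delayed feedback lets the infected fraction overshoot it, mirroring the abstract's observation that delayed information leads to faster progress and a higher peak.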
41. FLCD: A Flexible Low Complexity Design of Coded Distributed Computing
- Author
-
Woolsey, Nicholas, Wang, Xingyue, Chen, Rong-Rong, and Ji, Mingyue
- Subjects
Computer Science - Information Theory ,Computer Science - Distributed, Parallel, and Cluster Computing ,Computer Science - Performance - Abstract
We propose a flexible low complexity design (FLCD) of coded distributed computing (CDC) with empirical evaluation on Amazon Elastic Compute Cloud (Amazon EC2). CDC can expedite MapReduce like computation by trading increased map computations to reduce communication load and shuffle time. A main novelty of FLCD is to utilize the design freedom in defining map and reduce functions to develop asymptotic homogeneous systems to support varying intermediate values (IV) sizes under a general MapReduce framework. Compared to existing designs with constant IV sizes, FLCD offers greater flexibility in adapting to network parameters and significantly reduces the implementation complexity by requiring fewer input files and shuffle groups. The FLCD scheme is the first proposed low-complexity CDC design that can operate on a network with an arbitrary number of nodes and computation load. We perform empirical evaluations of the FLCD by executing the TeraSort algorithm on an Amazon EC2 cluster. This is the first time that theoretical predictions of the CDC shuffle time are validated by empirical evaluations. The evaluations demonstrate a 2.0 to 4.24x speedup compared to conventional uncoded MapReduce, a 12% to 52% reduction in total time, and a wider range of operating network parameters compared to existing CDC schemes., Comment: 13 pages, 4 figures
- Published
- 2020
42. Coded Elastic Computing on Machines with Heterogeneous Storage and Computation Speed
- Author
-
Woolsey, Nicholas, Chen, Rong-Rong, and Ji, Mingyue
- Subjects
Computer Science - Information Theory ,Computer Science - Distributed, Parallel, and Cluster Computing ,Mathematics - Optimization and Control - Abstract
We study the optimal design of heterogeneous Coded Elastic Computing (CEC) where machines have varying computation speeds and storage. CEC introduced by Yang et al. in 2018 is a framework that mitigates the impact of elastic events, where machines can join and leave at arbitrary times. In CEC, data is distributed among machines using a Maximum Distance Separable (MDS) code such that subsets of machines can perform the desired computations. However, state-of-the-art CEC designs only operate on homogeneous networks where machines have the same speeds and storage. This may not be practical. In this work, based on an MDS storage assignment, we develop a novel computation assignment approach for heterogeneous CEC networks to minimize the overall computation time. We first consider the scenario where machines have heterogeneous computing speeds but the same storage, and then the scenario where both heterogeneities are present. We propose a novel combinatorial optimization formulation and solve it exactly by decomposing it into a convex optimization problem for finding the optimal computation load and a "filling problem" for finding the exact computation assignment. A low-complexity "filling algorithm" is adapted and can be completed within a number of iterations that is at most the number of available machines., Comment: 30 pages, 4 figures
- Published
- 2020
43. A Combinatorial Design for Cascaded Coded Distributed Computing on General Networks
- Author
-
Woolsey, Nicholas, Chen, Rong-Rong, and Ji, Mingyue
- Subjects
Computer Science - Information Theory ,Computer Science - Computational Complexity ,Computer Science - Distributed, Parallel, and Cluster Computing - Abstract
Coding theoretic approaches have been developed to significantly reduce the communication load in modern distributed computing systems. In particular, coded distributed computing (CDC) introduced by Li et al. can efficiently trade computation resources to reduce the communication load in MapReduce-like computing systems. In the more general cascaded CDC, Map computations are repeated at r nodes to significantly reduce the communication load among nodes tasked with computing Q Reduce functions s times. In this paper, we propose a novel low-complexity combinatorial design for cascaded CDC which 1) determines both input file and output function assignments, 2) requires a significantly smaller number of input files and output functions, and 3) operates on heterogeneous networks where nodes have varying storage and computing capabilities. We provide an analytical characterization of the computation-communication tradeoff, from which we show that the proposed scheme can outperform the state-of-the-art scheme proposed by Li et al. for homogeneous networks. Further, when the network is heterogeneous, we show that the performance of the proposed scheme can be better than its homogeneous counterpart. In addition, the proposed scheme is optimal within a constant factor of the information theoretic converse bound when fixing the input file and output function assignments., Comment: 30 pages, 6 figures
- Published
- 2020
44. A New Combinatorial Coded Design for Heterogeneous Distributed Computing
- Author
-
Woolsey, Nicholas, Chen, Rong-Rong, and Ji, Mingyue
- Subjects
Computer Science - Information Theory ,Computer Science - Distributed, Parallel, and Cluster Computing - Abstract
Coded Distributed Computing (CDC) introduced by Li et al. in 2015 offers an efficient approach to trading computing power to reduce the communication load in general distributed computing frameworks such as MapReduce and Spark. In particular, increasing the computation load in the Map phase by a factor of r can create coded multicasting opportunities that reduce the communication load in the Shuffle phase by the same factor. However, the CDC scheme is designed for homogeneous settings, where the storage, computation load and communication load on the computing nodes are the same. In addition, it requires an exponentially large number of input files (data batches), reduce functions and multicasting groups relative to the number of nodes to achieve the promised gain. We address these limitations by proposing a novel CDC approach based on a combinatorial design, which accommodates heterogeneous networks where nodes have varying storage and computing capabilities. In addition, the proposed approach requires exponentially fewer input files compared to the original CDC scheme proposed by Li et al. Meanwhile, the resulting computation-communication trade-off maintains the multiplicative gain compared to conventional uncoded unicast and asymptotically achieves the optimal performance established by Li et al., Comment: 30 pages, 5 figures
- Published
- 2020
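For context, the homogeneous computation-communication tradeoff of Li et al. that CDC designs are benchmarked against is L(r) = (1/r)(1 - r/K) for K nodes and computation load r. A small sketch of that baseline formula (the heterogeneous characterizations in the entries above are different):

```python
from fractions import Fraction

def cdc_comm_load(r, K):
    """Communication load of homogeneous CDC (Li et al.):
    L(r) = (1/r) * (1 - r/K).  Uncoded MapReduce corresponds to
    r = 1, i.e. L(1) = 1 - 1/K."""
    r, K = Fraction(r), Fraction(K)
    return (1 / r) * (1 - r / K)

K = 10
for r in (1, 2, 5):
    # Increasing the Map computation load r shrinks the Shuffle load
    # by roughly a multiplicative factor of r.
    print(r, cdc_comm_load(r, K))
```

With K = 10, the load drops from 9/10 at r = 1 to 2/5 at r = 2 and 1/10 at r = 5, illustrating the multiplicative gain the abstracts refer to.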
45. Tensor Factor Model Estimation by Iterative Projection
- Author
-
Han, Yuefeng, Chen, Rong, Yang, Dan, and Zhang, Cun-Hui
- Subjects
Statistics - Methodology ,Economics - Econometrics ,Mathematics - Statistics Theory ,Primary 62H25, 62H12, secondary 62R07 - Abstract
Tensor time series, which is a time series consisting of tensorial observations, has become ubiquitous. It typically exhibits high dimensionality. One approach for dimension reduction is to use a factor model structure, in a form similar to Tucker tensor decomposition, except that the time dimension is treated as a dynamic process with a time dependent structure. In this paper we introduce two approaches to estimate such a tensor factor model by using iterative orthogonal projections of the original tensor time series. These approaches extend the existing estimation procedures and improve the estimation accuracy and convergence rate significantly, as proven in our theoretical investigation. Our algorithms are similar to the higher order orthogonal projection method for tensor decomposition, but with significant differences due to the need to unfold tensors in the iterations and the use of autocorrelation. Consequently, our analysis is significantly different from the existing ones. Computational and statistical lower bounds are derived to prove the optimality of the sample size requirement and convergence rate for the proposed methods. A simulation study is conducted to further illustrate the statistical properties of these estimators.
- Published
- 2020
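The iterative-projection idea can be illustrated in the matrix (order-2) special case X_t = A F_t B' + E_t: alternately project the data onto the current estimate of one mode's loading space, then take leading eigenvectors of a lag-1 moment for the other mode. This is a hedged simplification under assumed names, not the paper's exact procedure:

```python
import numpy as np

def iterative_projection_loadings(X, k1, k2, n_iter=5):
    """Sketch of iterative-projection loading estimation for a matrix
    factor model X_t = A F_t B' + E_t.

    X: (T, p1, p2) array.  Returns orthonormal bases (A_hat, B_hat).
    Lag-1 moments are used (autocorrelation, as in the abstract) so
    that white observation noise averages out.
    """
    T, p1, p2 = X.shape
    B = np.eye(p2)[:, :k2]                   # crude initialization
    for _ in range(n_iter):
        Y = X @ B                            # project columns: (T, p1, k2)
        M1 = sum(Y[t] @ Y[t - 1].T for t in range(1, T)) / (T - 1)
        M1 = M1 @ M1.T                       # symmetrize
        A = np.linalg.eigh(M1)[1][:, ::-1][:, :k1]
        Z = A.T @ X                          # project rows: (T, k1, p2)
        M2 = sum(Z[t].T @ Z[t - 1] for t in range(1, T)) / (T - 1)
        M2 = M2 @ M2.T
        B = np.linalg.eigh(M2)[1][:, ::-1][:, :k2]
    return A, B

# Toy demo: recover loading spaces from a simulated matrix factor model.
rng = np.random.default_rng(1)
A0 = np.linalg.qr(rng.normal(size=(8, 2)))[0]
B0 = np.linalg.qr(rng.normal(size=(6, 2)))[0]
F = np.zeros((400, 2, 2))
for t in range(1, 400):
    F[t] = 0.9 * F[t - 1] + rng.normal(size=(2, 2))   # AR(1) factors
X = 5.0 * np.einsum('pi,tij,qj->tpq', A0, F, B0)
X = X + rng.normal(size=X.shape)                      # observation noise
A_hat, B_hat = iterative_projection_loadings(X, 2, 2)
print(np.linalg.norm(A_hat @ A_hat.T - A0 @ A0.T))    # small => spaces match
```

Projecting onto the other mode's estimated loading space before forming each moment matrix is what improves the convergence rate relative to one-shot estimation, which is the effect the paper quantifies.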
46. Multi-Task Learning Enhanced Single Image De-Raining
- Author
-
Fan, Yulong, Chen, Rong, and Li, Bo
- Subjects
Computer Science - Computer Vision and Pattern Recognition ,Electrical Engineering and Systems Science - Image and Video Processing - Abstract
Rain removal in images is an important task in the computer vision field and is attracting increasing attention. In this paper, we address the non-trivial problem of removing the visual effect of rain streaks from a single image. Differing from existing work, our method combines various semantic constraint tasks in a proposed multi-task regression model for rain removal. These tasks reinforce the model's capabilities from the content, edge-aware, and local texture similarity perspectives, respectively. To further improve the performance of multi-task learning, we also present two simple but powerful dynamic weighting algorithms. The proposed multi-task enhanced network (MENET) is a powerful convolutional neural network based on U-Net for rain removal research, with a specific focus on utilizing multiple task constraints and exploiting the synergy among them to facilitate the model's rain removal capacity. It is noteworthy that the adaptive weighting scheme has further resulted in improved network capability. We conduct several experiments on synthetic and real rain images, and achieve superior rain removal performance over several selected state-of-the-art (SOTA) approaches. The overall effect of our method is impressive, even in the decomposition of heavy rain and rain streak accumulation. The source code and some results can be found at: https://github.com/SumiHui/MENET., Comment: 15 pages, 16 figures
- Published
- 2020
47. Modeling Multivariate Spatial-Temporal Data with Latent Low-Dimensional Dynamics
- Author
-
Chen, Elynn Y., Yun, Xin, Chen, Rong, and Yao, Qiwei
- Subjects
Statistics - Methodology - Abstract
High-dimensional multivariate spatial-temporal data arise frequently in a wide range of applications; however, there are relatively few statistical methods that can simultaneously deal with spatial, temporal and variable-wise dependencies in large data sets. In this paper, we propose a new approach to utilize the correlations in variable, space and time to achieve dimension reduction and to facilitate spatial/temporal predictions in the high-dimensional settings. The multivariate spatial-temporal process is represented as a linear transformation of a lower-dimensional latent factor process. The spatial dependence structure of the factor process is further represented non-parametrically in terms of latent empirical orthogonal functions. The low-dimensional structure is completely unknown in our setting and is learned entirely from data collected irregularly over space but regularly over time. We propose innovative estimation and prediction methods based on the latent low-rank structures. Asymptotic properties of the estimators and predictors are established. Extensive experiments on synthetic and real data sets show that, while the dimensions are reduced significantly, the spatial, temporal and variable-wise covariance structures are largely preserved. The efficacy of our method is further confirmed by the prediction performances on both synthetic and real data sets., Comment: arXiv admin note: text overlap with arXiv:1710.06351
- Published
- 2020
48. Heterogeneous Computation Assignments in Coded Elastic Computing
- Author
-
Woolsey, Nicholas, Chen, Rong-Rong, and Ji, Mingyue
- Subjects
Computer Science - Information Theory ,Computer Science - Distributed, Parallel, and Cluster Computing - Abstract
We study the optimal design of a heterogeneous coded elastic computing (CEC) network where machines have varying relative computation speeds. CEC introduced by Yang {\it et al.} is a framework which mitigates the impact of elastic events, where machines join and leave the network. A set of data is distributed among storage constrained machines using a Maximum Distance Separable (MDS) code such that any subset of machines of a specific size can perform the desired computations. This design eliminates the need to re-distribute the data after each elastic event. In this work, we develop a process for an arbitrary heterogeneous computing network to minimize the overall computation time by defining an optimal computation load, or number of computations assigned to each machine. We then present an algorithm to define a specific computation assignment among the machines that makes use of the MDS code and meets the optimal computation load., Comment: Submitted to ISIT 2020
- Published
- 2020
49. Hybrid Kronecker Product Decomposition and Approximation
- Author
-
Cai, Chencheng, Chen, Rong, and Xiao, Han
- Subjects
Statistics - Methodology ,Statistics - Machine Learning - Abstract
Discovering the underlying low-dimensional structure of high-dimensional data has attracted a significant amount of research recently and has been shown to have a wide range of applications. As an effective dimension reduction tool, singular value decomposition is often used to analyze high-dimensional matrices, which are traditionally assumed to admit a low-rank matrix approximation. In this paper, we propose a new approach. We assume a high-dimensional matrix can be approximated by a sum of a small number of Kronecker products of matrices with potentially different configurations, termed a hybrid Kronecker outer Product Approximation (hKoPA). It provides an extremely flexible form of dimension reduction compared to the low-rank matrix approximation. Challenges arise in estimating an hKoPA when the configurations of the component Kronecker products are different or unknown. We propose an estimation procedure when the set of configurations is given, and a joint configuration determination and component estimation procedure when the configurations are unknown. Specifically, a least squares backfitting algorithm is used when the configuration is given; when the configuration is unknown, an iterative greedy algorithm is used. Both simulation and real image examples show that the proposed algorithms have promising performance. The hybrid Kronecker product approximation may have potentially wider applications in low-dimensional representation of high-dimensional data.
- Published
- 2019
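The least squares backfitting step for a given set of configurations can be sketched using the classical Van Loan-Pitsianis rearrangement, under which a single Kronecker product becomes a rank-1 matrix. The function names, sweep count, and configurations below are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def rearrange(M, m1, n1):
    # Van Loan-Pitsianis rearrangement: if M = A kron B with A (m1 x n1),
    # the rearranged matrix equals the rank-1 outer product vec(A) vec(B)^T.
    m2, n2 = M.shape[0] // m1, M.shape[1] // n1
    return (M.reshape(m1, m2, n1, n2)
             .transpose(0, 2, 1, 3)
             .reshape(m1 * n1, m2 * n2))

def fit_kron(M, m1, n1):
    # Best single Kronecker product in least squares: rank-1 truncated SVD
    # of the rearranged matrix, folded back into A kron B.
    m2, n2 = M.shape[0] // m1, M.shape[1] // n1
    U, s, Vt = np.linalg.svd(rearrange(M, m1, n1), full_matrices=False)
    A = np.sqrt(s[0]) * U[:, 0].reshape(m1, n1)
    B = np.sqrt(s[0]) * Vt[0].reshape(m2, n2)
    return np.kron(A, B)

def hkopa_backfit(M, configs, sweeps=20):
    # Least squares backfitting: cycle through the components, refitting
    # each one against the residual left by all the others.
    terms = [np.zeros_like(M) for _ in configs]
    for _ in range(sweeps):
        for i, (m1, n1) in enumerate(configs):
            resid = M - sum(t for j, t in enumerate(terms) if j != i)
            terms[i] = fit_kron(resid, m1, n1)
    return terms

# Demo: an 8 x 8 matrix built from two components with DIFFERENT
# configurations, which a single low-rank or single-Kronecker fit misses.
rng = np.random.default_rng(2)
M = (np.kron(rng.normal(size=(2, 2)), rng.normal(size=(4, 4)))
     + np.kron(rng.normal(size=(4, 4)), rng.normal(size=(2, 2))))
terms = hkopa_backfit(M, configs=[(2, 2), (4, 4)])
rel_err = np.linalg.norm(M - sum(terms)) / np.linalg.norm(M)
print(round(rel_err, 4))   # residual typically shrinks over the sweeps
```

Each refit is a projection, so the residual is non-increasing across sweeps; this monotonicity is what makes the backfitting loop a safe, if greedy, estimation procedure.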
50. KoPA: Automated Kronecker Product Approximation
- Author
-
Cai, Chencheng, Chen, Rong, and Xiao, Han
- Subjects
Mathematics - Statistics Theory ,Statistics - Methodology ,Statistics - Machine Learning - Abstract
We consider the problem of matrix approximation and denoising induced by the Kronecker product decomposition. Specifically, we propose to approximate a given matrix by the sum of a few Kronecker products of matrices, which we refer to as the Kronecker product approximation (KoPA). Because the Kronecker product is an extension of the outer product from vectors to matrices, KoPA extends the low-rank matrix approximation and includes it as a special case. Compared with the latter, KoPA also offers greater flexibility, since it allows the user to choose the configuration, that is, the dimensions of the two smaller matrices forming the Kronecker product. On the other hand, the configuration to be used is usually unknown and needs to be determined from the data in order to achieve the optimal balance between accuracy and parsimony. We propose to use extended information criteria to select the configuration. Under the paradigm of high-dimensional analysis, we show that the proposed procedure is able to select the true configuration with probability tending to one, under suitable conditions on the signal-to-noise ratio. We demonstrate the superiority of KoPA over low-rank approximations through numerical studies and several benchmark image examples.
- Published
- 2019
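The configuration selection step can be sketched as a grid search scored by an information criterion: for each admissible configuration (m1, n1), fit the best single Kronecker product via a rank-1 SVD of the Van Loan-Pitsianis rearrangement, then trade residual against parameter count. The paper's extended information criterion uses a different penalty; the BIC-style penalty, the candidate grid, and the handling of degenerate configurations below are illustrative assumptions.

```python
import numpy as np

# Hedged sketch of configuration selection for a single Kronecker product
# approximation. The penalty is BIC-style (log(d) per parameter), which is
# an assumption, not the paper's extended criterion.
rng = np.random.default_rng(3)
M = np.kron(rng.normal(size=(4, 2)), rng.normal(size=(2, 4)))
M = M + 0.05 * rng.normal(size=M.shape)       # noisy 8 x 8 observation

def kron_rss(M, m1, n1):
    # Residual sum of squares of the best single Kronecker fit: the energy
    # outside the top singular direction of the rearranged matrix.
    m2, n2 = M.shape[0] // m1, M.shape[1] // n1
    R = (M.reshape(m1, m2, n1, n2)
          .transpose(0, 2, 1, 3)
          .reshape(m1 * n1, m2 * n2))
    s = np.linalg.svd(R, compute_uv=False)
    return float(np.sum(s[1:] ** 2))

d = M.size
best = None
for m1 in (1, 2, 4, 8):
    for n1 in (1, 2, 4, 8):
        m2, n2 = 8 // m1, 8 // n1
        if min(m1 * n1, m2 * n2) == 1:
            continue  # skip degenerate configs whose fit is trivially exact
        params = m1 * n1 + m2 * n2
        ic = d * np.log(kron_rss(M, m1, n1) / d) + np.log(d) * params
        if best is None or ic < best[0]:
            best = (ic, m1, n1)
print(best[1], best[2])   # usually the true configuration (4, 2)
```

The tension the criterion resolves is visible in the loop: a mismatched configuration leaves a large residual, while an overly rich one fits noise at the cost of many parameters; the penalized score balances the two.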