Author: "Pan, Gang" / Topic: computer science - machine learning - Searchworks@Jio Institute Digital Library Search Results

Your search keyword '"Pan, Gang"' showing total 23 results

Start Over Author "Pan, Gang" Topic computer science - machine learning

23 results on '"Pan, Gang"'

1. Toward Large-scale Spiking Neural Networks: A Comprehensive Survey and Future Directions

Author: Hu, Yangfan, Zheng, Qian, Li, Guoqi, Tang, Huajin, and Pan, Gang
Subjects: Computer Science - Machine Learning
Abstract: Deep learning has revolutionized artificial intelligence (AI), achieving remarkable progress in fields such as computer vision, speech recognition, and natural language processing. Moreover, the recent success of large language models (LLMs) has fueled a surge in research on large-scale neural networks. However, the escalating demand for computing resources and energy consumption has prompted the search for energy-efficient alternatives. Inspired by the human brain, spiking neural networks (SNNs) promise energy-efficient computation with event-driven spikes. To provide future directions toward building energy-efficient large SNN models, we present a survey of existing methods for developing deep spiking neural networks, with a focus on emerging Spiking Transformers. Our main contributions are as follows: (1) an overview of learning methods for deep spiking neural networks, categorized by ANN-to-SNN conversion and direct training with surrogate gradients; (2) an overview of network architectures for deep spiking neural networks, categorized by deep convolutional neural networks (DCNNs) and Transformer architecture; and (3) a comprehensive comparison of state-of-the-art deep SNNs with a focus on emerging Spiking Transformers. We then further discuss and outline future directions toward large-scale SNNs.
Published: 2024

2. Resisting Stochastic Risks in Diffusion Planners with the Trajectory Aggregation Tree

Author: Feng, Lang, Gu, Pengjie, An, Bo, and Pan, Gang
Subjects: Computer Science - Machine Learning, Computer Science - Artificial Intelligence
Abstract: Diffusion planners have shown promise in handling long-horizon and sparse-reward tasks due to the non-autoregressive plan generation. However, their inherent stochastic risk of generating infeasible trajectories presents significant challenges to their reliability and stability. We introduce a novel approach, the Trajectory Aggregation Tree (TAT), to address this issue in diffusion planners. Compared to prior methods that rely solely on raw trajectory predictions, TAT aggregates information from both historical and current trajectories, forming a dynamic tree-like structure. Each trajectory is conceptualized as a branch and individual states as nodes. As the structure evolves with the integration of new trajectories, unreliable states are marginalized, and the most impactful nodes are prioritized for decision-making. TAT can be deployed without modifying the original training and sampling pipelines of diffusion planners, making it a training-free, ready-to-deploy solution. We provide both theoretical analysis and empirical evidence to support TAT's effectiveness. Our results highlight its remarkable ability to resist the risk from unreliable trajectories, guarantee the performance boosting of diffusion planners in $100\%$ of tasks, and exhibit an appreciable tolerance margin for sample quality, thereby enabling planning with a more than $3\times$ acceleration., Comment: ICML 2024 (Spotlight)
Published: 2024

3. Off-OAB: Off-Policy Policy Gradient Method with Optimal Action-Dependent Baseline

Author: Meng, Wenjia, Zheng, Qian, Yang, Long, Yin, Yilong, and Pan, Gang
Subjects: Computer Science - Machine Learning, Computer Science - Artificial Intelligence
Abstract: Policy-based methods have achieved remarkable success in solving challenging reinforcement learning problems. Among these methods, off-policy policy gradient methods are particularly important due to that they can benefit from off-policy data. However, these methods suffer from the high variance of the off-policy policy gradient (OPPG) estimator, which results in poor sample efficiency during training. In this paper, we propose an off-policy policy gradient method with the optimal action-dependent baseline (Off-OAB) to mitigate this variance issue. Specifically, this baseline maintains the OPPG estimator's unbiasedness while theoretically minimizing its variance. To enhance practical computational efficiency, we design an approximated version of this optimal baseline. Utilizing this approximation, our method (Off-OAB) aims to decrease the OPPG estimator's variance during policy optimization. We evaluate the proposed Off-OAB method on six representative tasks from OpenAI Gym and MuJoCo, where it demonstrably surpasses state-of-the-art methods on the majority of these tasks., Comment: 12 pages, 3 figures
Published: 2024

4. Generalizable Sleep Staging via Multi-Level Domain Alignment

Author: Wang, Jiquan, Zhao, Sha, Jiang, Haiteng, Li, Shijian, Li, Tao, and Pan, Gang
Subjects: Electrical Engineering and Systems Science - Signal Processing, Computer Science - Machine Learning
Abstract: Automatic sleep staging is essential for sleep assessment and disorder diagnosis. Most existing methods depend on one specific dataset and are limited to be generalized to other unseen datasets, for which the training data and testing data are from the same dataset. In this paper, we introduce domain generalization into automatic sleep staging and propose the task of generalizable sleep staging which aims to improve the model generalization ability to unseen datasets. Inspired by existing domain generalization methods, we adopt the feature alignment idea and propose a framework called SleepDG to solve it. Considering both of local salient features and sequential features are important for sleep staging, we propose a Multi-level Feature Alignment combining epoch-level and sequence-level feature alignment to learn domain-invariant feature representations. Specifically, we design an Epoch-level Feature Alignment to align the feature distribution of each single sleep epoch among different domains, and a Sequence-level Feature Alignment to minimize the discrepancy of sequential features among different domains. SleepDG is validated on five public datasets, achieving the state-of-the-art performance., Comment: Accepted by the Thirty-Eighth AAAI Conference on Artificial Intelligence (AAAI-24)
Published: 2023

5. FP3O: Enabling Proximal Policy Optimization in Multi-Agent Cooperation with Parameter-Sharing Versatility

Author: Feng, Lang, Xing, Dong, Zhang, Junru, and Pan, Gang
Subjects: Computer Science - Machine Learning, Computer Science - Artificial Intelligence, Computer Science - Multiagent Systems
Abstract: Existing multi-agent PPO algorithms lack compatibility with different types of parameter sharing when extending the theoretical guarantee of PPO to cooperative multi-agent reinforcement learning (MARL). In this paper, we propose a novel and versatile multi-agent PPO algorithm for cooperative MARL to overcome this limitation. Our approach is achieved upon the proposed full-pipeline paradigm, which establishes multiple parallel optimization pipelines by employing various equivalent decompositions of the advantage function. This procedure successfully formulates the interconnections among agents in a more general manner, i.e., the interconnections among pipelines, making it compatible with diverse types of parameter sharing. We provide a solid theoretical foundation for policy improvement and subsequently develop a practical algorithm called Full-Pipeline PPO (FP3O) by several approximations. Empirical evaluations on Multi-Agent MuJoCo and StarCraftII tasks demonstrate that FP3O outperforms other strong baselines and exhibits remarkable versatility across various parameter-sharing configurations.
Published: 2023

6. Mitigating Communication Costs in Neural Networks: The Role of Dendritic Nonlinearity

Author: Wu, Xundong, Zhao, Pengfei, Yu, Zilin, Ma, Lei, Yip, Ka-Wa, Tang, Huajin, Pan, Gang, and Huang, Tiejun
Subjects: Computer Science - Neural and Evolutionary Computing, Computer Science - Machine Learning, Quantitative Biology - Neurons and Cognition
Abstract: Our comprehension of biological neuronal networks has profoundly influenced the evolution of artificial neural networks (ANNs). However, the neurons employed in ANNs exhibit remarkable deviations from their biological analogs, mainly due to the absence of complex dendritic trees encompassing local nonlinearity. Despite such disparities, previous investigations have demonstrated that point neurons can functionally substitute dendritic neurons in executing computational tasks. In this study, we scrutinized the importance of nonlinear dendrites within neural networks. By employing machine-learning methodologies, we assessed the impact of dendritic structure nonlinearity on neural network performance. Our findings reveal that integrating dendritic structures can substantially enhance model capacity and performance while keeping signal communication costs effectively restrained. This investigation offers pivotal insights that hold considerable implications for the development of future neural network accelerators.
Published: 2023

7. Constrained Update Projection Approach to Safe Policy Optimization

Author: Yang, Long, Ji, Jiaming, Dai, Juntao, Zhang, Linrui, Zhou, Binbin, Li, Pengfei, Yang, Yaodong, and Pan, Gang
Subjects: Computer Science - Machine Learning, Computer Science - Artificial Intelligence
Abstract: Safe reinforcement learning (RL) studies problems where an intelligent agent has to not only maximize reward but also avoid exploring unsafe areas. In this study, we propose CUP, a novel policy optimization method based on Constrained Update Projection framework that enjoys rigorous safety guarantee. Central to our CUP development is the newly proposed surrogate functions along with the performance bound. Compared to previous safe RL methods, CUP enjoys the benefits of 1) CUP generalizes the surrogate functions to generalized advantage estimator (GAE), leading to strong empirical performance. 2) CUP unifies performance bounds, providing a better understanding and interpretability for some existing algorithms; 3) CUP provides a non-convex implementation via only first-order optimizers, which does not require any strong approximation on the convexity of the objectives. To validate our CUP method, we compared CUP against a comprehensive list of safe RL baselines on a wide range of tasks. Experiments show the effectiveness of CUP both in terms of reward and safety constraint satisfaction. We have opened the source code of CUP at this link https://github.com/zmsn-2077/ CUP-safe-rl., Comment: Accepted by NeurIPS2022. arXiv admin note: substantial text overlap with arXiv:2202.07565
Published: 2022

8. TinyLight: Adaptive Traffic Signal Control on Devices with Extremely Limited Resources

Author: Xing, Dong, Zheng, Qian, Liu, Qianhui, and Pan, Gang
Subjects: Computer Science - Machine Learning, Computer Science - Artificial Intelligence
Abstract: Recent advances in deep reinforcement learning (DRL) have largely promoted the performance of adaptive traffic signal control (ATSC). Nevertheless, regarding the implementation, most works are cumbersome in terms of storage and computation. This hinders their deployment on scenarios where resources are limited. In this work, we propose TinyLight, the first DRL-based ATSC model that is designed for devices with extremely limited resources. TinyLight first constructs a super-graph to associate a rich set of candidate features with a group of light-weighted network blocks. Then, to diminish the model's resource consumption, we ablate edges in the super-graph automatically with a novel entropy-minimized objective function. This enables TinyLight to work on a standalone microcontroller with merely 2KB RAM and 32KB ROM. We evaluate TinyLight on multiple road networks with real-world traffic demands. Experiments show that even with extremely limited resources, TinyLight still achieves competitive performance. The source code and appendix of this work can be found at \url{https://bit.ly/38hH8t8}., Comment: Accepted by IJCAI 2022 (Long Oral)
Published: 2022

9. Dynamic Ensemble Bayesian Filter for Robust Control of a Human Brain-machine Interface

Author: Qi, Yu, Zhu, Xinyun, Xu, Kedi, Ren, Feixiao, Jiang, Hongjie, Zhu, Junming, Zhang, Jianmin, Pan, Gang, and Wang, Yueming
Subjects: Computer Science - Machine Learning, Computer Science - Artificial Intelligence, Electrical Engineering and Systems Science - Signal Processing, Quantitative Biology - Neurons and Cognition
Abstract: Objective: Brain-machine interfaces (BMIs) aim to provide direct brain control of devices such as prostheses and computer cursors, which have demonstrated great potential for mobility restoration. One major limitation of current BMIs lies in the unstable performance in online control due to the variability of neural signals, which seriously hinders the clinical availability of BMIs. Method: To deal with the neural variability in online BMI control, we propose a dynamic ensemble Bayesian filter (DyEnsemble). DyEnsemble extends Bayesian filters with a dynamic measurement model, which adjusts its parameters in time adaptively with neural changes. This is achieved by learning a pool of candidate functions and dynamically weighting and assembling them according to neural signals. In this way, DyEnsemble copes with variability in signals and improves the robustness of online control. Results: Online BMI experiments with a human participant demonstrate that, compared with the velocity Kalman filter, DyEnsemble significantly improves the control accuracy (increases the success rate by 13.9% and reduces the reach time by 13.5% in the random target pursuit task) and robustness (performs more stably over different experiment days). Conclusion: Our results demonstrate the superiority of DyEnsemble in online BMI control. Significance: DyEnsemble frames a novel and flexible framework for robust neural decoding, which is beneficial to different neural decoding applications.
Published: 2022

10. CUP: A Conservative Update Policy Algorithm for Safe Reinforcement Learning

Author: Yang, Long, Ji, Jiaming, Dai, Juntao, Zhang, Yu, Li, Pengfei, and Pan, Gang
Subjects: Computer Science - Machine Learning, Computer Science - Artificial Intelligence
Abstract: Safe reinforcement learning (RL) is still very challenging since it requires the agent to consider both return maximization and safe exploration. In this paper, we propose CUP, a Conservative Update Policy algorithm with a theoretical safety guarantee. We derive the CUP based on the new proposed performance bounds and surrogate functions. Although using bounds as surrogate functions to design safe RL algorithms have appeared in some existing works, we develop them at least three aspects: (i) We provide a rigorous theoretical analysis to extend the surrogate functions to generalized advantage estimator (GAE). GAE significantly reduces variance empirically while maintaining a tolerable level of bias, which is an efficient step for us to design CUP; (ii) The proposed bounds are tighter than existing works, i.e., using the proposed bounds as surrogate functions are better local approximations to the objective and safety constraints. (iii) The proposed CUP provides a non-convex implementation via first-order optimizers, which does not depend on any convex approximation. Finally, extensive experiments show the effectiveness of CUP where the agent satisfies safe constraints. We have opened the source code of CUP at https://github.com/RL-boxes/Safe-RL.
Published: 2022

11. Thompson Sampling for Unimodal Bandits

Author: Yang, Long, Li, Zhao, Hu, Zehong, Ruan, Shasha, Li, Shijian, Pan, Gang, and Chen, Hongyang
Subjects: Computer Science - Machine Learning, Computer Science - Artificial Intelligence
Abstract: In this paper, we propose a Thompson Sampling algorithm for \emph{unimodal} bandits, where the expected reward is unimodal over the partially ordered arms. To exploit the unimodal structure better, at each step, instead of exploration from the entire decision space, our algorithm makes decision according to posterior distribution only in the neighborhood of the arm that has the highest empirical mean estimate. We theoretically prove that, for Bernoulli rewards, the regret of our algorithm reaches the lower bound of unimodal bandits, thus it is asymptotically optimal. For Gaussian rewards, the regret of our algorithm is $\mathcal{O}(\log T)$, which is far better than standard Thompson Sampling algorithms. Extensive experiments demonstrate the effectiveness of the proposed algorithm on both synthetic data sets and the real-world applications., Comment: There are some technical parts need to be improved. We will fix these places and provide an updated version
Published: 2021

12. On Convergence of Gradient Expected Sarsa($\lambda$)

Author: Yang, Long, Zheng, Gang, Zhang, Yu, Zheng, Qian, Li, Pengfei, and Pan, Gang
Subjects: Computer Science - Machine Learning
Abstract: We study the convergence of $\mathtt{Expected~Sarsa}(\lambda)$ with linear function approximation. We show that applying the off-line estimate (multi-step bootstrapping) to $\mathtt{Expected~Sarsa}(\lambda)$ is unstable for off-policy learning. Furthermore, based on convex-concave saddle-point framework, we propose a convergent $\mathtt{Gradient~Expected~Sarsa}(\lambda)$ ($\mathtt{GES}(\lambda)$) algorithm. The theoretical analysis shows that our $\mathtt{GES}(\lambda)$ converges to the optimal solution at a linear convergence rate, which is comparable to extensive existing state-of-the-art gradient temporal difference learning algorithms. Furthermore, we develop a Lyapunov function technique to investigate how the step-size influences finite-time performance of $\mathtt{GES}(\lambda)$, such technique of Lyapunov function can be potentially generalized to other GTD algorithms. Finally, we conduct experiments to verify the effectiveness of our $\mathtt{GES}(\lambda)$., Comment: This submission has been accepted by AAAI2021
Published: 2020

13. Sample Complexity of Policy Gradient Finding Second-Order Stationary Points

Author: Yang, Long, Zheng, Qian, and Pan, Gang
Subjects: Computer Science - Machine Learning
Abstract: The goal of policy-based reinforcement learning (RL) is to search the maximal point of its objective. However, due to the inherent non-concavity of its objective, convergence to a first-order stationary point (FOSP) can not guarantee the policy gradient methods finding a maximal point. A FOSP can be a minimal or even a saddle point, which is undesirable for RL. Fortunately, if all the saddle points are \emph{strict}, all the second-order stationary points (SOSP) are exactly equivalent to local maxima. Instead of FOSP, we consider SOSP as the convergence criteria to character the sample complexity of policy gradient. Our result shows that policy gradient converges to an $(\epsilon,\sqrt{\epsilon\chi})$-SOSP with probability at least $1-\widetilde{\mathcal{O}}(\delta)$ after the total cost of $\mathcal{O}\left(\dfrac{\epsilon^{-\frac{9}{2}}}{(1-\gamma)\sqrt\chi}\log\dfrac{1}{\delta}\right)$, where $\gamma\in(0,1)$. Our result improves the state-of-the-art result significantly where it requires $\mathcal{O}\left(\dfrac{\epsilon^{-9}\chi^{\frac{3}{2}}}{\delta}\log\dfrac{1}{\epsilon\chi}\right)$. Our analysis is based on the key idea that decomposes the parameter space $\mathbb{R}^p$ into three non-intersected regions: non-stationary point, saddle point, and local optimal region, then making a local improvement of the objective of RL in each region. This technique can be potentially generalized to extensive policy gradient methods., Comment: This submission has been accepted by AAAI2021
Published: 2020

14. Dynamic Ensemble Modeling Approach to Nonstationary Neural Decoding in Brain-Computer Interfaces

Author: Qi, Yu, Liu, Bin, Wang, Yueming, and Pan, Gang
Subjects: Electrical Engineering and Systems Science - Signal Processing, Computer Science - Machine Learning, Statistics - Machine Learning
Abstract: Brain-computer interfaces (BCIs) have enabled prosthetic device control by decoding motor movements from neural activities. Neural signals recorded from cortex exhibit nonstationary property due to abrupt noises and neuroplastic changes in brain activities during motor control. Current state-of-the-art neural signal decoders such as Kalman filter assume fixed relationship between neural activities and motor movements, thus will fail if this assumption is not satisfied. We propose a dynamic ensemble modeling (DyEnsemble) approach that is capable of adapting to changes in neural signals by employing a proper combination of decoding functions. The DyEnsemble method firstly learns a set of diverse candidate models. Then, it dynamically selects and combines these models online according to Bayesian updating mechanism. Our method can mitigate the effect of noises and cope with different task behaviors by automatic model switching, thus gives more accurate predictions. Experiments with neural data demonstrate that the DyEnsemble method outperforms Kalman filters remarkably, and its advantage is more obvious with noisy signals.
Published: 2019

15. Gradient Q$(\sigma, \lambda)$: A Unified Algorithm with Function Approximation for Reinforcement Learning

Author: Yang, Long, Zhang, Yu, Zheng, Qian, Li, Pengfei, and Pan, Gang
Subjects: Computer Science - Machine Learning, Statistics - Machine Learning
Abstract: Full-sampling (e.g., Q-learning) and pure-expectation (e.g., Expected Sarsa) algorithms are efficient and frequently used techniques in reinforcement learning. Q$(\sigma,\lambda)$ is the first approach unifies them with eligibility trace through the sampling degree $\sigma$. However, it is limited to the tabular case, for large-scale learning, the Q$(\sigma,\lambda)$ is too expensive to require a huge volume of tables to accurately storage value functions. To address above problem, we propose a GQ$(\sigma,\lambda)$ that extends tabular Q$(\sigma,\lambda)$ with linear function approximation. We prove the convergence of GQ$(\sigma,\lambda)$. Empirical results on some standard domains show that GQ$(\sigma,\lambda)$ with a combination of full-sampling with pure-expectation reach a better performance than full-sampling and pure-expectation methods.
Published: 2019

16. FiDi-RL: Incorporating Deep Reinforcement Learning with Finite-Difference Policy Search for Efficient Learning of Continuous Control

Author: Shi, Longxiang, Li, Shijian, Cao, Longbing, Yang, Long, Zheng, Gang, and Pan, Gang
Subjects: Computer Science - Machine Learning, Computer Science - Artificial Intelligence, Statistics - Machine Learning
Abstract: In recent years significant progress has been made in dealing with challenging problems using reinforcement learning.Despite its great success, reinforcement learning still faces challenge in continuous control tasks. Conventional methods always compute the derivatives of the optimal goal with a costly computation resources, and are inefficient, unstable and lack of robust-ness when dealing with such tasks. Alternatively, derivative-based methods treat the optimization process as a blackbox and show robustness and stability in learning continuous control tasks, but not data efficient in learning. The combination of both methods so as to get the best of the both has raised attention. However, most of the existing combination works adopt complex neural networks (NNs) as the policy for control. The double-edged sword of deep NNs can yield better performance, but also makes it difficult for parameter tuning and computation. To this end, in this paper we presents a novel method called FiDi-RL, which incorporates deep RL with Finite-Difference (FiDi) policy search.FiDi-RL combines Deep Deterministic Policy Gradients (DDPG)with Augment Random Search (ARS) and aims at improving the data efficiency of ARS. The empirical results show that FiDi-RL can improves the performance and stability of ARS, and provide competitive results against some existing deep reinforcement learning methods, Comment: I found some theoretical errors
Published: 2019

17. Expected Sarsa($\lambda$) with Control Variate for Variance Reduction

Author: Yang, Long, Zhang, Yu, Wen, Jun, Zheng, Qian, Li, Pengfei, and Pan, Gang
Subjects: Computer Science - Machine Learning, Computer Science - Artificial Intelligence, Statistics - Machine Learning
Abstract: Off-policy learning is powerful for reinforcement learning. However, the high variance of off-policy evaluation is a critical challenge, which causes off-policy learning falls into an uncontrolled instability. In this paper, for reducing the variance, we introduce control variate technique to $\mathtt{Expected}$ $\mathtt{Sarsa}$($\lambda$) and propose a tabular $\mathtt{ES}$($\lambda$)-$\mathtt{CV}$ algorithm. We prove that if a proper estimator of value function reaches, the proposed $\mathtt{ES}$($\lambda$)-$\mathtt{CV}$ enjoys a lower variance than $\mathtt{Expected}$ $\mathtt{Sarsa}$($\lambda$). Furthermore, to extend $\mathtt{ES}$($\lambda$)-$\mathtt{CV}$ to be a convergent algorithm with linear function approximation, we propose the $\mathtt{GES}$($\lambda$) algorithm under the convex-concave saddle-point formulation. We prove that the convergence rate of $\mathtt{GES}$($\lambda$) achieves $\mathcal{O}(1/T)$, which matches or outperforms lots of state-of-art gradient-based algorithms, but we use a more relaxed condition. Numerical experiments show that the proposed algorithm performs better with lower variance than several state-of-art gradient-based TD learning algorithms: $\mathtt{GQ}$($\lambda$), $\mathtt{GTB}$($\lambda$) and $\mathtt{ABQ}$($\zeta$).
Published: 2019

18. Policy Optimization with Stochastic Mirror Descent

Author: Yang, Long, Zhang, Yu, Zheng, Gang, Zheng, Qian, Li, Pengfei, Huang, Jianhang, Wen, Jun, and Pan, Gang
Subjects: Computer Science - Machine Learning, Statistics - Machine Learning
Abstract: Improving sample efficiency has been a longstanding goal in reinforcement learning. This paper proposes $\mathtt{VRMPO}$ algorithm: a sample efficient policy gradient method with stochastic mirror descent. In $\mathtt{VRMPO}$, a novel variance-reduced policy gradient estimator is presented to improve sample efficiency. We prove that the proposed $\mathtt{VRMPO}$ needs only $\mathcal{O}(\epsilon^{-3})$ sample trajectories to achieve an $\epsilon$-approximate first-order stationary point, which matches the best sample complexity for policy optimization. The extensive experimental results demonstrate that $\mathtt{VRMPO}$ outperforms the state-of-the-art policy gradient methods in various settings.
Published: 2019

19. Brain Network Construction and Classification Toolbox (BrainNetClass)

Author: Zhou, Zhen, Chen, Xiaobo, Zhang, Yu, Qiao, Lishan, Yu, Renping, Pan, Gang, Zhang, Han, and Shen, Dinggang
Subjects: Quantitative Biology - Neurons and Cognition, Computer Science - Computer Vision and Pattern Recognition, Computer Science - Machine Learning
Abstract: Brain functional network has become an increasingly used approach in understanding brain functions and diseases. Many network construction methods have been developed, whereas the majority of the studies still used static pairwise Pearson's correlation-based functional connectivity. The goal of this work is to introduce a toolbox namely "Brain Network Construction and Classification" (BrainNetClass) to the field to promote more advanced brain network construction methods. It comprises various brain network construction methods, including some state-of-the-art methods that were recently developed to capture more complex interactions among brain regions along with connectome feature extraction, reduction, parameter optimization towards network-based individualized classification. BrainNetClass is a MATLAB-based, open-source, cross-platform toolbox with graphical user-friendly interfaces for cognitive and clinical neuroscientists to perform rigorous computer-aided diagnosis with interpretable result presentations even though they do not possess neuroimage computing and machine learning knowledge. We demonstrate the implementations of this toolbox on real resting-state functional MRI datasets. BrainNetClass (v1.0) can be downloaded from https://github.com/zzstefan/BrainNetClass.
Published: 2019
Full Text: View/download PDF

20. TBQ($\sigma$): Improving Efficiency of Trace Utilization for Off-Policy Reinforcement Learning

Author: Shi, Longxiang, Li, Shijian, Cao, Longbing, Yang, Long, and Pan, Gang
Subjects: Computer Science - Machine Learning, Computer Science - Artificial Intelligence, Statistics - Machine Learning, 68Wxx
Abstract: Off-policy reinforcement learning with eligibility traces is challenging because of the discrepancy between target policy and behavior policy. One common approach is to measure the difference between two policies in a probabilistic way, such as importance sampling and tree-backup. However, existing off-policy learning methods based on probabilistic policy measurement are inefficient when utilizing traces under a greedy target policy, which is ineffective for control problems. The traces are cut immediately when a non-greedy action is taken, which may lose the advantage of eligibility traces and slow down the learning process. Alternatively, some non-probabilistic measurement methods such as General Q($\lambda$) and Naive Q($\lambda$) never cut traces, but face convergence problems in practice. To address the above issues, this paper introduces a new method named TBQ($\sigma$), which effectively unifies the tree-backup algorithm and Naive Q($\lambda$). By introducing a new parameter $\sigma$ to illustrate the \emph{degree} of utilizing traces, TBQ($\sigma$) creates an effective integration of TB($\lambda$) and Naive Q($\lambda$) and continuous role shift between them. The contraction property of TB($\sigma$) is theoretically analyzed for both policy evaluation and control settings. We also derive the online version of TBQ($\sigma$) and give the convergence proof. We empirically show that, for $\epsilon\in(0,1]$ in $\epsilon$-greedy policies, there exists some degree of utilizing traces for $\lambda\in[0,1]$, which can improve the efficiency in trace utilization for off-policy reinforcement learning, to both accelerate the learning process and improve the performance., Comment: 8 pages
Published: 2019

21. Field-aware Neural Factorization Machine for Click-Through Rate Prediction

Author: Zhang, Li, Shen, Weichen, Li, Shijian, and Pan, Gang
Subjects: Computer Science - Machine Learning, Statistics - Machine Learning
Abstract: Recommendation systems and computing advertisements have gradually entered the field of academic research from the field of commercial applications. Click-through rate prediction is one of the core research issues because the prediction accuracy affects the user experience and the revenue of merchants and platforms. Feature engineering is very important to improve click-through rate prediction. Traditional feature engineering heavily relies on people's experience, and is difficult to construct a feature combination that can describe the complex patterns implied in the data. This paper combines traditional feature combination methods and deep neural networks to automate feature combinations to improve the accuracy of click-through rate prediction. We propose a mechannism named 'Field-aware Neural Factorization Machine' (FNFM). This model can have strong second order feature interactive learning ability like Field-aware Factorization Machine, on this basis, deep neural network is used for higher-order feature combination learning. Experiments show that the model has stronger expression ability than current deep learning feature combination models like the DeepFM, DCN and NFM.
Published: 2019

22. Qualitative Measurements of Policy Discrepancy for Return-Based Deep Q-Network

Author: Meng, Wenjia, Zheng, Qian, Yang, Long, Li, Pengfei, and Pan, Gang
Subjects: Computer Science - Machine Learning, Computer Science - Artificial Intelligence, Statistics - Machine Learning
Abstract: The deep Q-network (DQN) and return-based reinforcement learning are two promising algorithms proposed in recent years. DQN brings advances to complex sequential decision problems, while return-based algorithms have advantages in making use of sample trajectories. In this paper, we propose a general framework to combine DQN and most of the return-based reinforcement learning algorithms, named R-DQN. We show the performance of traditional DQN can be improved effectively by introducing return-based reinforcement learning. In order to further improve the R-DQN, we design a strategy with two measurements which can qualitatively measure the policy discrepancy. Moreover, we give the two measurements' bounds in the proposed R-DQN framework. We show that algorithms with our strategy can accurately express the trace coefficient and achieve a better approximation to return. The experiments, conducted on several representative tasks from the OpenAI Gym library, validate the effectiveness of the proposed measurements. The results also show that the algorithms with our strategy outperform the state-of-the-art methods.
Published: 2018
Full Text: View/download PDF

23. Spiking Deep Residual Network

Author: Hu, Yangfan, Tang, Huajin, and Pan, Gang
Subjects: Computer Science - Neural and Evolutionary Computing, Computer Science - Computer Vision and Pattern Recognition, Computer Science - Machine Learning
Abstract: Spiking neural networks (SNNs) have received significant attention for their biological plausibility. SNNs theoretically have at least the same computational power as traditional artificial neural networks (ANNs). They possess potential of achieving energy-efficiency while keeping comparable performance to deep neural networks (DNNs). However, it is still a big challenge to train a very deep SNN. In this paper, we propose an efficient approach to build a spiking version of deep residual network (ResNet). ResNet is considered as a kind of the state-of-the-art convolutional neural networks (CNNs). We employ the idea of converting a trained ResNet to a network of spiking neurons, named Spiking ResNet (S-ResNet). We propose a shortcut conversion model to appropriately scale continuous-valued activations to match firing rates in SNN, and a compensation mechanism to reduce the error caused by discretisation. Experimental results demonstrate that, compared with the state-of-the-art SNN approaches, the proposed Spiking ResNet achieves the best performance on CIFAR-10, CIFAR-100, and ImageNet 2012. Our work is the first time to build a SNN deeper than 40, with comparable performance to ANNs on a large-scale dataset.
Published: 2018

Catalog

Books, media, physical & digital resources

See catalog results

Searchworks

Select search scope, currently: Articles

Catalog

books, media & more in Jio Institute collections

Articles

journal articles & other e-resources

Refine your results

23 results on '"Pan, Gang"'

1. Toward Large-scale Spiking Neural Networks: A Comprehensive Survey and Future Directions

2. Resisting Stochastic Risks in Diffusion Planners with the Trajectory Aggregation Tree

3. Off-OAB: Off-Policy Policy Gradient Method with Optimal Action-Dependent Baseline

4. Generalizable Sleep Staging via Multi-Level Domain Alignment

5. FP3O: Enabling Proximal Policy Optimization in Multi-Agent Cooperation with Parameter-Sharing Versatility

6. Mitigating Communication Costs in Neural Networks: The Role of Dendritic Nonlinearity

7. Constrained Update Projection Approach to Safe Policy Optimization

8. TinyLight: Adaptive Traffic Signal Control on Devices with Extremely Limited Resources

9. Dynamic Ensemble Bayesian Filter for Robust Control of a Human Brain-machine Interface

10. CUP: A Conservative Update Policy Algorithm for Safe Reinforcement Learning

11. Thompson Sampling for Unimodal Bandits

12. On Convergence of Gradient Expected Sarsa($\lambda$)

13. Sample Complexity of Policy Gradient Finding Second-Order Stationary Points

14. Dynamic Ensemble Modeling Approach to Nonstationary Neural Decoding in Brain-Computer Interfaces

15. Gradient Q$(\sigma, \lambda)$: A Unified Algorithm with Function Approximation for Reinforcement Learning

16. FiDi-RL: Incorporating Deep Reinforcement Learning with Finite-Difference Policy Search for Efficient Learning of Continuous Control

17. Expected Sarsa($\lambda$) with Control Variate for Variance Reduction

18. Policy Optimization with Stochastic Mirror Descent

19. Brain Network Construction and Classification Toolbox (BrainNetClass)

20. TBQ($\sigma$): Improving Efficiency of Trace Utilization for Off-Policy Reinforcement Learning

21. Field-aware Neural Factorization Machine for Click-Through Rate Prediction

22. Qualitative Measurements of Policy Discrepancy for Return-Based Deep Q-Network

23. Spiking Deep Residual Network

Catalog

Searchworks

Select search scope, currently: Articles Catalog books, media & more in Jio Institute collections Articles journal articles & other e-resources

Search

Search Constraints

Refine your results

Search Limiters

Topic

Publication Year Range

Language

Publication Type

Database

23 results on '"Pan, Gang"'

Search Results

Catalog

Select search scope, currently: Articles

Catalog

books, media & more in Jio Institute collections

Articles

journal articles & other e-resources