31 results on '"Zou, Quan"'
Search Results
2. FMAlign2: a novel fast multiple nucleotide sequence alignment method for ultralong datasets.
- Author
-
Zhang, Pinglu, Liu, Huan, Wei, Yanming, Zhai, Yixiao, Tian, Qinzhong, and Zou, Quan
- Subjects
SEQUENCE alignment, NUCLEOTIDE sequence, SOURCE code, RESEARCH personnel, BIOINFORMATICS
- Abstract
Motivation: In bioinformatics, multiple sequence alignment (MSA) is a crucial task. However, conventional methods often struggle with aligning ultralong sequences. To address this issue, researchers have designed MSA methods rooted in a vertical division strategy, which segments sequence data for parallel alignment. A prime example of this approach is FMAlign, which utilizes the FM-index to extract common seeds and segment the sequences accordingly. Results: FMAlign2 leverages the suffix array to identify maximal exact matches, redefining the approach of FMAlign from searching for global chains to partial chains. By using a vertical division strategy, a large-scale problem is deconstructed into manageable tasks, enabling parallel execution of subMSA. Furthermore, sequence-profile alignment and refinement are incorporated to concatenate subsets, yielding the final result seamlessly. Compared to FMAlign, FMAlign2 markedly augments the segmentation of sequences and significantly reduces the time while maintaining accuracy, especially on ultralong datasets. Importantly, FMAlign2 enhances existing MSA methods by conferring the capability to handle sequences reaching billions in length within an acceptable time frame. Availability and implementation: Source code and datasets are available at https://github.com/malabz/FMAlign2 and https://zenodo.org/records/10435770. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
3. Novel H/ACA Box snoRNA Mining and Secondary Structure Prediction Algorithms
- Author
-
Zou, Quan, Guo, Maozu, Wang, Chunyu, Han, Yingpeng, Li, Wenbin, Hutchison, David, Series editor, Kanade, Takeo, Series editor, Kittler, Josef, Series editor, Kleinberg, Jon M., Series editor, Mattern, Friedemann, Series editor, Mitchell, John C., Series editor, Naor, Moni, Series editor, Nierstrasz, Oscar, Series editor, Pandu Rangan, C., Series editor, Steffen, Bernhard, Series editor, Sudan, Madhu, Series editor, Terzopoulos, Demetri, Series editor, Tygar, Doug, Series editor, Vardi, Moshe Y., Series editor, Weikum, Gerhard, Series editor, Goebel, Randy, editor, Siekmann, Jörg, editor, Wahlster, Wolfgang, editor, Wen, Peng, editor, Li, Yuefeng, editor, Polkowski, Lech, editor, Yao, Yiyu, editor, Tsumoto, Shusaku, editor, and Wang, Guoyin, editor
- Published
- 2009
- Full Text
- View/download PDF
4. cmFSM: a scalable CPU-MIC coordinated drug-finding tool by frequent subgraph mining
- Author
-
Yang, Shunyun, Guo, Runxin, Liu, Rui, Liao, Xiangke, Zou, Quan, Shi, Benyun, and Peng, Shaoliang
- Published
- 2018
- Full Text
- View/download PDF
5. Matrix reconstruction with reliable neighbors for predicting potential MiRNA–disease associations.
- Author
-
Feng, Hailin, Jin, Dongdong, Li, Jian, Li, Yane, Zou, Quan, and Liu, Tongcun
- Subjects
MICRORNA, NEIGHBORS, FORECASTING, BIOINFORMATICS, CONFIDENCE
- Abstract
Numerous experimental studies have indicated that alteration and dysregulation in microRNAs (miRNAs) are associated with serious diseases. Identifying disease-related miRNAs is therefore an essential and challenging task in bioinformatics research. Computational methods are an efficient and economical alternative to conventional biomedical studies and can reveal underlying miRNA–disease associations for subsequent experimental confirmation with reasonable confidence. Despite the success of existing computational approaches, most of them only rely on the known miRNA–disease associations to predict associations without adding other data to increase the prediction accuracy, and they are affected by issues of data sparsity. In this paper, we present MRRN, a model that combines matrix reconstruction with node reliability to predict probable miRNA–disease associations. In MRRN, the most reliable neighbors of miRNA and disease are used to update the original miRNA–disease association matrix, which significantly reduces data sparsity. Unknown miRNA–disease associations are reconstructed by aggregating the most reliable first-order neighbors to increase prediction accuracy by representing the local and global structure of the heterogeneous network. Five-fold cross-validation of MRRN produced an area under the curve (AUC) of 0.9355 and area under the precision-recall curve (AUPR) of 0.2646, values that were greater than those produced by comparable models. Two different types of case studies using three diseases were conducted to demonstrate the accuracy of MRRN, and all top 30 predicted miRNAs were verified. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
6. MDADP: A Webserver Integrating Database and Prediction Tools for Microbe-Disease Associations.
- Author
-
Wang, Lei, Li, Hao, Wang, Yuqi, Tan, Yihong, Chen, Zhiping, Pei, Tingrui, and Zou, Quan
- Subjects
INTERNET servers, DATABASES, INFORMATION resources, FORECASTING, HUMAN body, MICROBIOLOGY
- Abstract
More and more evidence has demonstrated that microbiota play important roles in the life processes of the human body. In recent years, various computational methods have been proposed for identifying potentially disease-associated microbes to save costs in traditional biological experiments. However, prediction performances of these methods are generally limited by outdated and incomplete datasets. And moreover, until now, there are limited studies that can provide visual predictive tools for inferring possible microbe-disease associations (MDAs) as well. Hence, in this manuscript, a novel webserver called MDADP will be proposed to identify latent MDAs, in which, a new MDA database together with interactive prediction tools for MDAs studies will be designed simultaneously. Especially, in the newly constructed MDA database, 2019 known MDAs between 58 diseases and 703 microbes have been manually collected first. And then, through adopting the average ranking method and the co-confidence method respectively, eight representative computational models have been integrated together to identify potential disease-related microbes. As a result, MDADP can provide not only interactive features for users to access and capture MDAs entities, but also effective tools for users to identify candidate microbes for different diseases. To our knowledge, MDADP is the first online platform that incorporates a new MDA database with comprehensive MDA prediction tools. Therefore, we believe that it will be a valuable source of information for researches in microbiology and disease-related fields. MDADP can be accessed at http://mdadp.leelab2997.cn. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
7. Application of learning to rank in bioinformatics tasks.
- Author
-
Ru, Xiaoqing, Ye, Xiucai, Sakurai, Tetsuya, and Zou, Quan
- Subjects
BIOINFORMATICS, TASKS, ALGORITHMS
- Abstract
Over the past decades, learning to rank (LTR) algorithms have been gradually applied to bioinformatics. Such methods have shown significant advantages in multiple research tasks in this field. Therefore, it is necessary to summarize and discuss the application of these algorithms so that these algorithms are convenient and contribute to bioinformatics. In this paper, the characteristics of LTR algorithms and their strengths over other types of algorithms are analyzed based on the application of multiple perspectives in bioinformatics. Finally, the paper further discusses the shortcomings of the LTR algorithms, the methods and means to better use the algorithms and some open problems that currently exist. [ABSTRACT FROM AUTHOR]
- Published
- 2021
- Full Text
- View/download PDF
8. VPTMdb: a viral posttranslational modification database.
- Author
-
Xiang, Yujia, Zou, Quan, and Zhao, Lilin
- Subjects
*POST-translational modification, *INTERNET servers, *DRUG target, *DNA viruses, *VIRAL proteins, *PHOSPHORYLATION
- Abstract
In viruses, posttranslational modifications (PTMs) are essential for their life cycle. Recognizing viral PTMs is very important for a better understanding of the mechanism of viral infections and finding potential drug targets. However, few studies have investigated the roles of viral PTMs in virus–human interactions using comprehensive viral PTM datasets. To fill this gap, we developed the first comprehensive viral posttranslational modification database (VPTMdb) for collecting systematic information of PTMs in human viruses and infected host cells. The VPTMdb contains 1240 unique viral PTM sites with 8 modification types from 43 viruses (818 experimentally verified PTM sites manually extracted from 150 publications and 422 PTMs extracted from SwissProt) as well as 13 650 infected cells' PTMs extracted from seven global proteomics experiments in six human viruses. The investigation of viral PTM sequences motifs showed that most viral PTMs have the consensus motifs with human proteins in phosphorylation and five cellular kinase families phosphorylate more than 10 viral species. The analysis of protein disordered regions presented that more than 50% glycosylation sites of double-strand DNA viruses are in the disordered regions, whereas single-strand RNA and retroviruses prefer ordered regions. Domain–domain interaction analysis indicating potential roles of viral PTMs play in infections. The findings should make an important contribution to the field of virus–human interaction. Moreover, we created a novel sequence-based classifier named VPTMpre to help users predict viral protein phosphorylation sites. VPTMdb online web server (http://vptmdb.com:8787/VPTMdb/) was implemented for users to download viral PTM data and predict phosphorylation sites of interest. [ABSTRACT FROM AUTHOR]
- Published
- 2021
- Full Text
- View/download PDF
9. Protein Complexes Identification with Family-Wise Error Rate Control.
- Author
-
He, Zengyou, Zhao, Can, Liang, Hao, Xu, Bo, and Zou, Quan
- Abstract
The detection of protein complexes from protein-protein interaction network is a fundamental issue in bioinformatics and systems biology. To solve this problem, numerous methods have been proposed from different angles in the past decades. However, the study on detecting statistically significant protein complexes still has not received much attention. Although there are a few methods available in the literature for identifying statistically significant protein complexes, none of these methods can provide a more strict control on the error rate of a protein complex in terms of family-wise error rate (FWER). In this paper, we propose a new detection method SSF that is capable of controlling the FWER of each reported protein complex. More precisely, we first present a $p$-value calculation method based on Fisher's exact test to quantify the association between each protein and a given candidate protein complex. Consequently, we describe the key modules of the SSF algorithm: a seed expansion procedure for significant protein complexes search and a set cover strategy for redundancy elimination. The experimental results on five benchmark data sets show that: (1) our method can achieve the highest precision; (2) it outperforms three competing methods in terms of normalized mutual information (NMI) and F1 score in most cases. [ABSTRACT FROM AUTHOR]
- Published
- 2020
- Full Text
- View/download PDF
10. Investigating Maize Yield-Related Genes in Multiple Omics Interaction Network Data.
- Author
-
Jiang, Jing, Xing, Fei, Zeng, Xiangxiang, and Zou, Quan
- Abstract
Zea mays (maize) is the highest yielding food crop globally, feeding large numbers of people across the planet. It is thus especially important to explore the key genes that affect maize production with prior knowledge. Merging multiple datasets of different types can improve the accuracy of candidate genes prediction results, so we constructed interaction networks using gene, mRNA, protein, and expression profile datasets. A network propagation schedule was used considering combined scores obtained by integrating both network scores and significance scores for each candidate gene based on the guilt-by-association principle. An SVM model was used to optimize the weighted parameters to achieve more reliable results, according to the accuracy of label classification. We found that integrating multiple omics data with more data types improves the reliability of the results. We investigated the GO terms particularly associated with the top 100 candidate genes and the known genes, and analyzed the roles that these genes play in determining the phenotype of maize. We hope that the candidate genes identified here will provide a biological perspective and contribute to maize breeding research. [ABSTRACT FROM AUTHOR]
- Published
- 2020
- Full Text
- View/download PDF
11. Details in the evaluation of circular RNA detection tools: Reply to Chen and Chuang.
- Author
-
Zeng, Xiangxiang, Lin, Wei, Guo, Maozu, and Zou, Quan
- Subjects
CIRCULAR RNA, BIG data, TOXINS, PLASMIDS, DATABASES
- Abstract
In their comment, Chen and Chuang [] pointed out several weak points of our recent paper []. Here we respond in detail to clarify the dataset we used in our work. We also discuss the three confounding factors they listed in their comment. [ABSTRACT FROM AUTHOR]
- Published
- 2019
- Full Text
- View/download PDF
12. Deep learning in omics: a survey and guideline.
- Author
-
Zhang, Zhiqiang, Zhao, Yi, Liao, Xiangke, Shi, Wenqiang, Li, Kenli, Zou, Quan, and Peng, Shaoliang
- Subjects
MACHINE learning, DEEP learning, ARTIFICIAL intelligence, GENE expression, DATA analysis
- Abstract
Omics, such as genomics, transcriptome and proteomics, has been affected by the era of big data. A huge amount of high dimensional and complex structured data has made it no longer applicable for conventional machine learning algorithms. Fortunately, deep learning technology can contribute toward resolving these challenges. There is evidence that deep learning can handle omics data well and resolve omics problems. This survey aims to provide an entry-level guideline for researchers, to understand and use deep learning in order to solve omics problems. We first introduce several deep learning models and then discuss several research areas which have combined omics and deep learning in recent years. In addition, we summarize the general steps involved in using deep learning which have not yet been systematically discussed in the existent literature on this topic. Finally, we compare the features and performance of current mainstream open source deep learning frameworks and present the opportunities and challenges involved in deep learning. This survey will be a good starting point and guideline for omics researchers to understand deep learning. [ABSTRACT FROM AUTHOR]
- Published
- 2019
- Full Text
- View/download PDF
13. Editorial: Machine Learning Techniques on Gene Function Prediction.
- Author
-
Zou, Quan, Sangaiah, Arun Kumar, and Mrozek, Dariusz
- Subjects
MACHINE learning, GENES, DEEP learning, RANDOM walks
- Abstract
Yu et al. constructed a weighted four-layer disease-disease similarity network to characterize the associations at different levels between diseases. He et al. proposed an NRLMFMDA (neighborhood regularized logistic matrix factorization method for miRNA-disease association prediction) by integrating miRNA functional similarity, disease semantic similarity, Gaussian interaction profile kernel similarity, and experimental validation of disease-miRNA association. Liu et al. classified muscle-invasive bladder cancer into two conservative subtypes using miRNA, mRNA, and lncRNA expression data; investigated subtype-related biological pathways; and evaluated the subtype classification performance using machine learning methods. [Extracted from the article]
- Published
- 2019
- Full Text
- View/download PDF
14. Efficient computation of motif discovery on Intel Many Integrated Core (MIC) Architecture.
- Author
-
Peng, Shaoliang, Cheng, Minxia, Huang, Kaiwen, Cui, YingBo, Zhang, Zhiqiang, Guo, Runxin, Zhang, Xiaoyu, Yang, Shunyun, Liao, Xiangke, Lu, Yutong, Zou, Quan, and Shi, Benyun
- Subjects
COMPUTATIONAL biology, BIOINFORMATICS, NUCLEOTIDE sequence, AMINO acid sequence, HIGH performance computing
- Abstract
Background: Novel sequence motifs detection is becoming increasingly essential in computational biology. However, the high computational cost greatly constrains the efficiency of most motif discovery algorithms. Results: In this paper, we accelerate MEME algorithm targeted on Intel Many Integrated Core (MIC) Architecture and present a parallel implementation of MEME called MIC-MEME based on hybrid CPU/MIC computing framework. Our method focuses on parallelizing the starting point searching method and improving iteration updating strategy of the algorithm. MIC-MEME has achieved significant speedups of 26.6 for ZOOPS model and 30.2 for OOPS model on average for the overall runtime when benchmarked on the experimental platform with two Xeon Phi 3120 coprocessors. Conclusions: Furthermore, MIC-MEME has been compared with state-of-the-art methods and it shows good scalability with respect to dataset size and the number of MICs. Source code: https://github.com/hkwkevin28/MIC-MEME. [ABSTRACT FROM AUTHOR]
- Published
- 2018
- Full Text
- View/download PDF
15. Prediction of potential disease-associated microRNAs using structural perturbation method.
- Author
-
Zeng, Xiangxiang, Liu, Li, Lü, Linyuan, and Zou, Quan
- Subjects
MICRORNA, DISEASE susceptibility, BIOINFORMATICS, PROPHECY, PERTURBATION theory
- Abstract
Motivation: The identification of disease-related microRNAs (miRNAs) is an essential but challenging task in bioinformatics research. Similarity-based link prediction methods are often used to predict potential associations between miRNAs and diseases. In these methods, all unobserved associations are ranked by their similarity scores. Higher score indicates higher probability of existence. However, most previous studies mainly focus on designing advanced methods to improve the prediction accuracy while neglecting to investigate the link predictability of the networks that present the miRNAs and diseases associations. In this work, we construct a bilayer network by integrating the miRNA-disease network, the miRNA similarity network and the disease similarity network. We use structural consistency as an indicator to estimate the link predictability of the related networks. On the basis of the indicator, a derivative algorithm, called structural perturbation method (SPM), is applied to predict potential associations between miRNAs and diseases. Results: The link predictability of bilayer network is higher than that of miRNA-disease network, indicating that the prediction of potential miRNAs-diseases associations on bilayer network can achieve higher accuracy than based merely on the miRNA-disease network. A comparison between the SPM and other algorithms reveals the reliable performance of SPM which performed well in a 5-fold crossvalidation. We test fifteen networks. The AUC values of SPM are higher than some well-known methods, indicating that SPM could serve as a useful computational method for improving the identification accuracy of miRNA-disease associations. Moreover, in a case study on breast neoplasm, 80% of the top-20 predicted miRNAs have been manually confirmed by previous experimental studies. [ABSTRACT FROM AUTHOR]
- Published
- 2018
- Full Text
- View/download PDF
16. Special Protein Molecules Computational Identification.
- Author
-
Zou, Quan and He, Wenying
- Subjects
*PROTEIN expression, *PROTEIN genetics, *COMPUTATIONAL biology, *MOLECULAR genetics, *GENETIC regulation
- Abstract
Computational identification of special protein molecules is a key issue in understanding protein function. It can guide molecular experiments and help to save costs. I assessed 18 papers published in the special issue of Int. J. Mol. Sci., and also discussed the related works. The computational methods employed in this special issue focused on machine learning, network analysis, and molecular docking. New methods and new topics were also proposed. There were in addition several wet experiments, with proven results showing promise. I hope our special issue will help in protein molecules identification researches. [ABSTRACT FROM AUTHOR]
- Published
- 2018
- Full Text
- View/download PDF
17. Hadoop Applications in Bioinformatics
- Author
-
Li Xu-bin, Zou Quan, Jiang Wenrui, and Jiang Yi
- Subjects
Service (systems architecture), Distributed database, Computer science, Reliable computing, Cloud computing, Bioinformatics, Open source, Scalability, Operating system, Data-intensive computing, Distributed File System
- Abstract
Bioinformatics is in a dilemma that traditional analysis tools work hard on the large-scale data from the high-throughput sequencing. In recent years, the open source Apache Hadoop project, which adopts MapReduce framework and distributed file system, brings bioinformatics researchers opportunities to obtain a scalable, efficient and reliable computing performance on Linux clusters and Cloud Computing Service. In this paper, we present Hadoop-based applications employed in bioinformatics, covering next-generation sequencing and other biological domains. In addition, we discuss obstacles and future works about Hadoop in bioinformatics.
- Published
- 2012
18. Editorial: Machine Learning Techniques on Gene Function Prediction Volume II.
- Author
-
Qi, Ren, Sangaiah, Arun Kumar, Mrozek, Dariusz, and Zou, Quan
- Subjects
FEATURE selection, DEEP learning, GENES, FORECASTING
- Published
- 2022
- Full Text
- View/download PDF
19. A comprehensive overview and evaluation of circular RNA detection tools.
- Author
-
Zeng, Xiangxiang, Lin, Wei, Guo, Maozu, and Zou, Quan
- Subjects
CIRCULAR RNA, RNA splicing, MICRORNA, MEDICAL databases, RIBONUCLEASES, BIOMARKERS
- Abstract
Circular RNA (circRNA) is mainly generated by the splice donor of a downstream exon joining to an upstream splice acceptor, a phenomenon known as backsplicing. It has been reported that circRNA can function as microRNA (miRNA) sponges, transcriptional regulators, or potential biomarkers. The availability of massive non-polyadenylated transcriptomes data has facilitated the genome-wide identification of thousands of circRNAs. Several circRNA detection tools or pipelines have recently been developed, and it is essential to provide useful guidelines on these pipelines for users, including a comprehensive and unbiased comparison. Here, we provide an improved and easy-to-use circRNA read simulator that can produce mimicking backsplicing reads supporting circRNAs deposited in CircBase. Moreover, we compared the performance of 11 circRNA detection tools on both simulated and real datasets. We assessed their performance regarding metrics such as precision, sensitivity, F1 score, and Area under Curve. It is concluded that no single method dominated on all of these metrics. Among all of the state-of-the-art tools, CIRI, CIRCexplorer, and KNIFE, which achieved better balanced performance between their precision and sensitivity, compared favorably to the other methods. [ABSTRACT FROM AUTHOR]
- Published
- 2017
- Full Text
- View/download PDF
20. A novel features ranking metric with application to scalable visual and bioinformatics data classification.
- Author
-
Zou, Quan, Zeng, Jiancang, Cao, Liujuan, and Ji, Rongrong
- Subjects
*BIG data, *BIOINFORMATICS, *CLASSIFICATION algorithms, *DIMENSION reduction (Statistics), *PROTEIN-protein interactions, *TASK performance
- Abstract
Coming with the big data era, the filtering of uninformative data becomes emerging. To this end, ranking the high dimensionality features plays an important role. However, most of the state-of-art methods focus on improving the classification accuracy while the stability of the dimensionality reduction is simply ignored. In this paper, we proposed a Max-Relevance-Max-Distance (MRMD) feature ranking method, which balances accuracy and stability of feature ranking and prediction task. In order to prove the effectiveness on big data, we tested our method on two different datasets. The first one is image classification, which is a benchmark dataset with high dimensionality, while the second one is protein–protein interaction prediction data, which comes from our previous private research and has massive instances. Experiments prove that our method maintained the accuracy together with the stability on both two big datasets. Moreover, our method runs faster than other filtering and wrapping methods, such as mRMR and Information Gain. [ABSTRACT FROM AUTHOR]
- Published
- 2016
- Full Text
- View/download PDF
21. System-Level Insights into the Cellular Interactome of a Non-Model Organism: Inferring, Modelling and Analysing Functional Gene Network of Soybean (Glycine max).
- Author
-
Xu, Yungang, Guo, Maozu, Zou, Quan, Liu, Xiaoyan, Wang, Chunyu, and Liu, Yang
- Subjects
CROP genetics, SOYBEAN, PROTEIN-protein interactions, CELLULAR signal transduction, BIOINFORMATICS, GENE regulatory networks, MODULAR design
- Abstract
Cellular interactome, in which genes and/or their products interact on several levels, forming transcriptional regulatory-, protein interaction-, metabolic-, signal transduction networks, etc., has attracted decades of research focuses. However, such a specific type of network alone can hardly explain the various interactive activities among genes. These networks characterize different interaction relationships, implying their unique intrinsic properties and defects, and covering different slices of biological information. Functional gene network (FGN), a consolidated interaction network that models fuzzy and more generalized notion of gene-gene relations, have been proposed to combine heterogeneous networks with the goal of identifying functional modules supported by multiple interaction types. There are yet no successful precedents of FGNs on sparsely studied non-model organisms, such as soybean (Glycine max), due to the absence of sufficient heterogeneous interaction data. We present an alternative solution for inferring the FGNs of soybean (SoyFGNs), in a pioneering study on the soybean interactome, which is also applicable to other organisms. SoyFGNs exhibit the typical characteristics of biological networks: scale-free, small-world architecture and modularization. Verified by co-expression and KEGG pathways, SoyFGNs are more extensive and accurate than an orthology network derived from Arabidopsis. As a case study, network-guided disease-resistance gene discovery indicates that SoyFGNs can provide system-level studies on gene functions and interactions. This work suggests that inferring and modelling the interactome of a non-model plant are feasible. It will speed up the discovery and definition of the functions and interactions of other genes that control important functions, such as nitrogen fixation and protein or lipid synthesis. 
The efforts of the study are the basis of our further comprehensive studies on the soybean functional interactome at the genome and microRNome levels. Additionally, a web tool for information retrieval and analysis of SoyFGNs can be accessed at SoyFN: . [ABSTRACT FROM AUTHOR]
- Published
- 2014
- Full Text
- View/download PDF
22. Scalable Data Mining Algorithms in Computational Biology and Biomedicine.
- Author
-
Zou, Quan, Mrozek, Dariusz, Ma, Qin, and Xu, Yungang
- Subjects
*ALGORITHMS, *SERIAL publications, *DATA mining, *BIOINFORMATICS
- Published
- 2017
- Full Text
- View/download PDF
23. Survey of MapReduce frame operation in bioinformatics.
- Author
-
Zou, Quan, Li, Xu-Bin, Jiang, Wen-Rui, Lin, Zi-Yu, Li, Gui-Lin, and Chen, Ke
- Subjects
*BIOINFORMATICS, *NUCLEOTIDE sequencing, *CLOUD computing, *GENE mapping research, *COMPUTATIONAL biology
- Abstract
Bioinformatics is challenged by the fact that traditional analysis tools have difficulty in processing large-scale data from high-throughput sequencing. The open source Apache Hadoop project, which adopts the MapReduce framework and a distributed file system, has recently given bioinformatics researchers an opportunity to achieve scalable, efficient and reliable computing performance on Linux clusters and on cloud computing services. In this article, we present MapReduce frame-based applications that can be employed in the next-generation sequencing and other biological domains. In addition, we discuss the challenges faced by this field as well as the future works on parallel computing in bioinformatics. [ABSTRACT FROM PUBLISHER]
- Published
- 2014
- Full Text
- View/download PDF
24. Protein Remote Homology Detection by Combining Chou's Pseudo Amino Acid Composition and Profile-Based Protein Representation.
- Author
-
Liu, Bin, Wang, Xiaolong, Zou, Quan, Dong, Qiwen, and Chen, Qingcai
- Subjects
PROTEIN research, HOMOLOGY (Biology), SUPPORT vector machines, AMINO acids, BIOINFORMATICS
- Abstract
Protein remote homology detection is a key problem in bioinformatics. Currently the discriminative methods, such as Support Vector Machine (SVM) can achieve the best performance. The most efficient approach to improve the performance of SVM-based methods is to find a general protein representation method that is able to convert proteins with different lengths into fixed length vectors and captures the different properties of the proteins for the discrimination. The bottleneck of designing the protein representation method is that native proteins have different lengths. Motivated by the success of the pseudo amino acid composition (PseAAC) proposed by Chou, we applied this approach for protein remote homology detection. Some new indices derived from the amino acid index (AAIndex) database are incorporated into the PseAAC to improve the generalization ability of this method. Finally, the performance is further improved by combining the modified PseAAC with profile-based protein representation containing the evolutionary information extracted from the frequency profiles. Our experiments on a well-known benchmark show this method achieves superior or comparable performance with current state-of-the-art methods. [ABSTRACT FROM AUTHOR]
- Published
- 2013
- Full Text
- View/download PDF
25. Genomic analysis of silkworm microRNA promoters and clusters.
- Author
-
Huang, Yong, Shen, Xing, Zou, Quan, Huang, Jin, and Tang, Shun
- Subjects
GENOMICS, RNA, SILKWORMS, ANIMAL genetics, ANTISENSE RNA, MESSENGER RNA, GENETIC transcription, BIOINFORMATICS
- Abstract
MicroRNAs (miRNAs) are endogenous single-stranded RNAs of 18-22 nt in length, which can regulate the complementary mRNAs at the post-transcriptional level by cleavage or repression of translation of the target mRNAs. Studies have shown that the majority of animal miRNAs are transcribed from independent transcription units, and some are transcribed together with their host genes. However, the nature of the primary transcript of intergenic miRNAs remains unknown. Silkworm (Bombyx mori) miRNAs are representative of those of the Lepidoptera insects and many of them are conserved in Caenorhabditis elegans and other animal species. To date, little is known about the transcriptional regulation of silkworm miRNA genes. We performed the genomic analysis on the silkworm miRNA transcripts around the promoter region including the transcription start site (TSS) and the TATA-box, and on the organization of the miRNA cluster. In 73 pre-miRNAs from the silkworm 131 promoters were detected via a bioinformatics approach. Among them the portion of non-conserved promoters is greater than that of the conserved ones. The genomic organization of pre-miRNAs of the silkworm was globally analyzed and it was determined that 11 of them were organized into five clusters. Sequence alignment showed that paralogs existed for some of the miRNAs in the cluster. These results may increase the understanding of the specific sequences upstream of the pre-miRNAs and of the functional implications of miRNA clusters in the silkworm. [ABSTRACT FROM AUTHOR]
- Published
- 2011
- Full Text
- View/download PDF
26. Advanced Machine Learning Techniques for Bioinformatics.
- Author
-
Zou, Quan and Liu, Qi
- Abstract
The papers in this special section focus on the machine learning methods, and applications of these methods to computational biology. [ABSTRACT FROM AUTHOR]
- Published
- 2019
- Full Text
- View/download PDF
27. Molecular Computing and Bioinformatics.
- Author
-
Liang, Xin, Zhu, Wen, Lv, Zhibin, and Zou, Quan
- Subjects
BIOINFORMATICS ,MOLECULAR biology ,MOLECULAR structure ,ORGANIC chemistry ,COMPUTER engineering ,INTERDISCIPLINARY approach to knowledge - Abstract
Molecular computing and bioinformatics are two important interdisciplinary sciences that study molecules and computers. Molecular computing is a branch of computing that uses DNA, biochemistry, and molecular biology hardware, instead of traditional silicon-based computer technologies. Research and development in this area concerns theory, experiments, and applications of molecular computing. The core advantage of molecular computing is its potential to pack vastly more circuitry onto a microchip than silicon will ever be capable of—and to do it cheaply. Molecules are only a few nanometers in size, making it possible to manufacture chips that contain billions—even trillions—of switches and components. To develop molecular computers, computer scientists must draw on expertise in subjects not usually associated with their field, including organic chemistry, molecular biology, bioengineering, and smart materials. Bioinformatics works on the contrary; bioinformatics researchers develop novel algorithms or software tools for computing or predicting the molecular structure or function. Molecular computing and bioinformatics pay attention to the same object, and have close relationships, but work toward different orientations. [ABSTRACT FROM AUTHOR]
- Published
- 2019
- Full Text
- View/download PDF
28. Misprediction of Structural Disorder in Halophiles.
- Author
-
Pancsa, Rita, Kovacs, Denes, Tompa, Peter, and Zou, Quan
- Subjects
HALOPHILIC microorganisms ,MICROBIOLOGY of extreme environments ,PROTEIN structure ,BIOINFORMATICS ,PROTEOMICS - Abstract
Whereas the concept of intrinsic disorder derives from biophysical observations of the lack of structure of proteins or protein regions under native conditions, many of our respective concepts rest on proteome-scale bioinformatics predictions. It is established that most predictors work reliably on proteins commonly encountered, but it is often neglected that we know very little about their performance on proteins of microorganisms that thrive in environments of extreme temperature, pH, or salt concentration, which may cause adaptive sequence composition bias. To address this issue, we predicted structural disorder for the complete proteomes of different extremophile groups by popular prediction methods and compared them to those of the reference mesophilic group. While significant deviations from mesophiles could be explained by a lack or gain of disordered regions in hyperthermophiles and radiotolerants, respectively, we found systematic overprediction in the case of halophiles. Additionally, examples were collected from the Protein Data Bank (PDB) to demonstrate misprediction and to help understand the underlying biophysical principles, i.e., halophilic proteins maintain a highly acidic and hydrophilic surface to avoid aggregation in high salt conditions. Although sparseness of data on disordered proteins from extremophiles precludes the development of dedicated general predictors, we do formulate recommendations for how to address their disorder with current bioinformatics tools. [ABSTRACT FROM AUTHOR]
- Published
- 2019
- Full Text
- View/download PDF
29. Deep learning methods for bioinformatics and biomedicine.
- Author
-
Wang, Yansu, Xu, Lei, and Zou, Quan
- Subjects
*BIOINFORMATICS , *DEEP learning - Published
- 2023
- Full Text
- View/download PDF
30. Bioinformatics applications on Apache Spark.
- Author
-
Guo, Runxin, Zhao, Yi, Zou, Quan, Fang, Xiaodong, and Peng, Shaoliang
- Subjects
BIOINFORMATICS ,FAULT tolerance (Engineering) - Abstract
With the rapid development of next-generation sequencing technology, ever-increasing quantities of genomic data pose a tremendous challenge to data processing. Therefore, there is an urgent need for highly scalable and powerful computational systems. Among the state-of-the-art parallel computing platforms, Apache Spark is a fast, general-purpose, in-memory, iterative computing framework for large-scale data processing that ensures high fault tolerance and high scalability by introducing the resilient distributed dataset abstraction. In terms of performance, Spark can be up to 100 times faster in terms of memory access and 10 times faster in terms of disk access than Hadoop. Moreover, it provides advanced application programming interfaces in Java, Scala, Python, and R. It also supports some advanced components, including Spark SQL for structured data processing, MLlib for machine learning, GraphX for computing graphs, and Spark Streaming for stream computing. We surveyed Spark-based applications used in next-generation sequencing and other biological domains, such as epigenetics, phylogeny, and drug discovery. The results of this survey are used to provide a comprehensive guideline allowing bioinformatics researchers to apply Spark in their own fields. [ABSTRACT FROM AUTHOR]
- Published
- 2018
- Full Text
- View/download PDF
31. CWLy-SVM: A support vector machine-based tool for identifying cell wall lytic enzymes.
- Author
-
Meng, Chaolu, Guo, Fei, and Zou, Quan
- Subjects
*LYSINS , *INTERNET servers , *FEATURE selection , *FEATURE extraction , *DRUG development , *BACTERIAL cell walls - Abstract
• We identified cell wall lytic enzymes with a bioinformatics approach to overcome the inefficiency of in vitro experiments, and provide a website tool that wraps the proposed model.
• Our proposed model outperforms the state-of-the-art method in the jackknife cross-validation test.
• We comprehensively analyzed the optimal feature set of the proposed model from the perspective of data and biological meaning.
Cell wall lytic enzymes, as an important biotechnical tool in drug development, agriculture and the food industry, have attracted increasing research attention. In this research area, the accurate identification of cell wall lytic enzymes is one of the key and fundamental tasks. In this study, in order to eliminate the inefficiency of in vitro experiments, a support vector machine-based cell wall lytic enzyme identification model was constructed using bioinformatics. This machine learning process includes feature extraction, feature selection, model training and optimization. According to the jackknife cross-validation test, this model obtained a sensitivity of 0.853, a specificity of 0.977, an MCC of 0.845 and an AUC of 0.915. These benchmark results demonstrate that the proposed model outperforms the state-of-the-art method and that it has powerful cell wall lytic enzyme identification ability. Furthermore, we comprehensively analyzed the selected optimal features and used the proposed model to construct a user-friendly web server called CWLy-SVM to identify cell wall lytic enzymes, which is available at http://server.malab.cn/CWLy-SVM/index.jsp. [ABSTRACT FROM AUTHOR]
- Published
- 2020
- Full Text
- View/download PDF