37 results on '"Expression Data"'
Search Results
2. DPEBic: detecting essential proteins in gene expressions using encoding and biclustering algorithm
- Author
-
Ali, Anooja, Hulipalled, Vishwanath R., Patil, S. S., and Abdulkader, Raees
- Published
- 2021
3. Efficient Algorithm for Microarray Probes Re-annotation
- Author
-
Foszner, Pawel, Gruca, Aleksandra, Polanski, Andrzej, Marczyk, Michal, Jaksik, Roman, Polanska, Joanna, Hutchison, David, Series editor, Kanade, Takeo, Series editor, Kittler, Josef, Series editor, Kleinberg, Jon M., Series editor, Mattern, Friedemann, Series editor, Mitchell, John C., Series editor, Naor, Moni, Series editor, Nierstrasz, Oscar, Series editor, Pandu Rangan, C., Series editor, Steffen, Bernhard, Series editor, Sudan, Madhu, Series editor, Terzopoulos, Demetri, Series editor, Tygar, Doug, Series editor, Vardi, Moshe Y., Series editor, Weikum, Gerhard, Series editor, Goebel, Randy, editor, Siekmann, Jörg, editor, Wahlster, Wolfgang, editor, Jędrzejowicz, Piotr, editor, Nguyen, Ngoc Thanh, editor, and Hoang, Kiem, editor
- Published
- 2011
4. Normalization of Biological Expression Data Based on Selection of a Stable Element Set
- Author
-
Bouki, Yoshihiko, Yoshihiro, Takuya, Inoue, Etsuko, Nakagawa, Masaru, Hutchison, David, Series editor, Kanade, Takeo, Series editor, Kittler, Josef, Series editor, Kleinberg, Jon M., Series editor, Mattern, Friedemann, Series editor, Mitchell, John C., Series editor, Naor, Moni, Series editor, Nierstrasz, Oscar, Series editor, Pandu Rangan, C., Series editor, Steffen, Bernhard, Series editor, Sudan, Madhu, Series editor, Terzopoulos, Demetri, Series editor, Tygar, Doug, Series editor, Vardi, Moshe Y., Series editor, Weikum, Gerhard, Series editor, Goebel, Randy, editor, Siekmann, Jörg, editor, Wahlster, Wolfgang, editor, König, Andreas, editor, Dengel, Andreas, editor, Hinkelmann, Knut, editor, Kise, Koichi, editor, Howlett, Robert J., editor, and Jain, Lakhmi C., editor
- Published
- 2011
5. Prediction of Combinatorial Protein-Protein Interaction Networks from Expression Data Using Statistics on Conditional Probability
- Author
-
Fujiki, Takatoshi, Inoue, Etsuko, Yoshihiro, Takuya, Nakagawa, Masaru, Hutchison, David, Kanade, Takeo, Kittler, Josef, Kleinberg, Jon M., Mattern, Friedemann, Mitchell, John C., Naor, Moni, Nierstrasz, Oscar, Pandu Rangan, C., Steffen, Bernhard, Sudan, Madhu, Terzopoulos, Demetri, Tygar, Doug, Vardi, Moshe Y., Weikum, Gerhard, Goebel, Randy, Siekmann, Jörg, Wahlster, Wolfgang, Setchi, Rossitza, editor, Jordanov, Ivan, editor, Howlett, Robert J., editor, and Jain, Lakhmi C., editor
- Published
- 2010
6. Identifying Coexpressed Genes
- Author
-
Wang, Qihua, Härdle, Wolfgang, Mori, Yuichi, and Vieu, Philippe
- Published
- 2007
7. Estimating Gene Networks from Expression Data and Binding Location Data via Boolean Networks
- Author
-
Hirose, Osamu, Nariai, Naoki, Tamada, Yoshinori, Bannai, Hideo, Imoto, Seiya, Miyano, Satoru, Hutchison, David, editor, Kanade, Takeo, editor, Kittler, Josef, editor, Kleinberg, Jon M., editor, Mattern, Friedemann, editor, Mitchell, John C., editor, Naor, Moni, editor, Nierstrasz, Oscar, editor, Pandu Rangan, C., editor, Steffen, Bernhard, editor, Sudan, Madhu, editor, Terzopoulos, Demetri, editor, Tygar, Doug, editor, Vardi, Moshe Y., editor, Weikum, Gerhard, editor, Gervasi, Osvaldo, editor, Gavrilova, Marina L., editor, Kumar, Vipin, editor, Laganà, Antonio, editor, Lee, Heow Pueh, editor, Mun, Youngsong, editor, Taniar, David, editor, and Tan, Chih Jeng Kenneth, editor
- Published
- 2005
8. Generating Contexts for Expression Data Using Pathway Queries
- Author
-
Sohler, Florian, Hutchison, David, editor, Kanade, Takeo, editor, Kittler, Josef, editor, Kleinberg, Jon M., editor, Mattern, Friedemann, editor, Mitchell, John C., editor, Naor, Moni, editor, Nierstrasz, Oscar, editor, Pandu Rangan, C., editor, Steffen, Bernhard, editor, Sudan, Madhu, editor, Terzopoulos, Demetri, editor, Tygar, Doug, editor, Vardi, Moshe Y., editor, Weikum, Gerhard, editor, Fages, François, editor, and Soliman, Sylvain, editor
- Published
- 2005
9. Erratum: A Noise Removal Algorithm for Time Series Microarray Data
- Author
-
Naresh Doni Jayavelu and Nadav Bar
- Subjects
Differentially expressed genes, Series (mathematics), Computer science, Expression data, Microarray analysis techniques, Noise removal, Raw data, Algorithm, Selection (genetic algorithm), Fold change
- Abstract
The time-series microarray data that we used to demonstrate the algorithm were received from the groups of Prof. A. Laegreid and Prof. M. Kuiper. In the original submission we accidentally referred to the source of the raw data, GSE32869 and GSE13009, but we actually received and used modified datasets. Specifically, in the Methods section we write, ≪Prior to the expression data fitting, we performed the selection of the differentially expressed genes based on fold change and p-value including normalization.≫, whereas this selection was actually performed by other groups at our University. We would therefore like to add the following acknowledgements
- Published
- 2013
10. A Unified Adaptive Co-identification Framework for High-D Expression Data
- Author
-
Cody Ashby, Shuzhong Zhang, Xiuzhen Huang, Bilian Chen, and Kun Wang
- Subjects
Identification (information), Expression data, Computer science, Pattern recognition (psychology), Key (cryptography), Data mining, computer.software_genre, Cluster analysis, computer, Expression (mathematics), Block (data storage)
- Abstract
High-throughput techniques are producing large-scale, high-dimensional (e.g., 4D with genes vs. timepoints vs. conditions vs. tissues) genome-wide gene expression data. This creates increasing demand for effective methods for partitioning the data into biologically relevant groups. Current clustering and co-clustering approaches have limitations: they may be very time consuming and work only for low-dimensional expression datasets. In this work, we introduce a new notion of "co-identification", which allows systematic identification of genes participating in different functional groups under different conditions or different development stages. The key contribution of our work is to build a unified computational framework of co-identification that enables clustering to be high-dimensional and adaptive. Our framework is based upon a generic optimization model and a general optimization method termed Maximum Block Improvement. Testing results on yeast and Arabidopsis expression data are presented to demonstrate the high efficiency of our approach and its effectiveness.
- Published
- 2012
11. Automatic Inference of Regulatory and Dynamical Properties from Incomplete Gene Interaction and Expression Data
- Author
-
Eric Fanchon, Claudine Chaouiya, Fabien Corblin, Denis Thieffry, and Laurent Trilling
- Subjects
Formalism (philosophy of mathematics), Answer set programming, Theoretical computer science, Gene interaction, Computer science, Expression data, Automatic inference, Constraint programming, Gene regulatory network, Algorithm
- Abstract
Advanced mathematical methods and computational tools are required to properly understand the behavior of large and complex regulatory networks that control cellular processes. Since available data are predominantly qualitative or semi-quantitative, discrete (logical) modeling approaches are increasingly used to model these networks. Here, relying on the multilevel logical formalism developed by R. Thomas et al. [7,9,8], we propose a computational approach that makes it possible (i) to check the existence of at least one consistent model, given partial data on the regulatory structure and dynamical properties, and (ii) to infer properties common to all consistent models. Such properties represent non-trivial deductions and could be used by the biologist to design new experiments. Rather than focusing on a single plausible solution, i.e., a fully defined model, we consider the whole class of models consistent with the available data and some economy criteria, from which we deduce shared properties. We use constraint programming to represent this class of models as the set of all solutions of a set of constraints [3]. For the sake of efficiency, we have developed a framework, called SysBiOX, enabling (i) the integration of partial gene interaction and expression data into constraints and (ii) the resolution of these constraints in order to infer properties about the structure or the behaviors of the gene network. SysBiOX is implemented in ASP (Answer Set Programming) using Clingo [4].
- Published
- 2012
12. Normalization of Biological Expression Data Based on Selection of a Stable Element Set
- Author
-
Yoshihiko Bouki, Takuya Yoshihiro, Etsuko Inoue, and Masaru Nakagawa
- Subjects
Normalization (statistics), Computer science, Expression data, Stable element, Data mining, computer.software_genre, Scaling, computer, Algorithm
- Abstract
Normalization of biological expression data is the process of removing experimental or technical biases from the data so that accurate analysis can be performed. Computational analysis of expression data is now widely applied because a very large number of genes or proteins can be examined simultaneously. In this paper we propose a new normalization method for expression data that is based on selection of a stable element set. Our idea is that using a subset of genes or proteins that are relatively stably expressed leads to more accurate normalization. We formulate the problem and give an algorithm that solves it in practical time. Through evaluation with artificial and real data, we found that our method outperforms global scaling, and that global scaling tends to over-correct the bias.
- Published
- 2011
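For readers unfamiliar with the baseline named in the abstract of result 12, the following is a minimal sketch of one common form of global scaling, not the authors' stable-element method. It assumes each sample is rescaled so that its mean intensity matches the average of all sample means; the function name `global_scale` and the toy data are hypothetical.

```python
def global_scale(samples):
    """Rescale each sample so that its mean equals the mean of all sample means.

    `samples` is a list of lists: one inner list of expression values per sample.
    This is an illustrative assumption about what "global scaling" means here.
    """
    means = [sum(values) / len(values) for values in samples]
    target = sum(means) / len(means)
    # Multiply every value in a sample by the same factor (target / sample mean).
    return [
        [value * target / mean for value in values]
        for values, mean in zip(samples, means)
    ]


# Toy usage: the second sample is uniformly twice as bright as the first.
scaled = global_scale([[1.0, 2.0, 3.0], [2.0, 4.0, 6.0]])
```

Because every value in a sample is multiplied by the same factor, a few strongly changing genes can shift the factor for the whole sample, which is one way such a scheme can over-correct.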
13. Functional Classification of Plant Plasma Membrane Transporters
- Author
-
Burkhard Schulz
- Subjects
Membrane, Transport activity, Expression data, Chemistry, Guard cell, Gene family, Membrane Transporters, Transporter, Membrane transport, Cell biology
- Abstract
A short overview of the very large and diverse group of plasma membrane transport systems is presented here. Emphasized are the transporters with important physiological functions in higher plants, which are localized to the plasma membrane. Members of gene families or transporters that do not localize to the PM are generally not discussed in this overview. In most cases, structural features of transporters rather than their transport activity were used for functional classification because many groups and families of transporters have a broad transport spectrum and can assume characteristics of biphasic or high- and low-affinity transporters. Expression data was mentioned if it helped to indicate possible functional aspects of transporters.
- Published
- 2010
14. Discovering Regulatory Overlapping RNA Transcripts
- Author
-
Gerald R. Fink, Sudeep D. Agarwala, Timothy Danford, Robin D. Dowell, Paula Grisafi, and David K. Gifford
- Subjects
Genetics, Tiling array, genetic structures, Regulatory sequence, Expression data, Transcription (biology), Gene expression, RNA, sense organs, Computational biology, Two Color Microarray, Biology, Gene
- Abstract
STEREO is a novel algorithm that discovers cis-regulatory RNA interactions by assembling complete and potentially overlapping same-strand RNA transcripts from tiling expression data. STEREO first identifies coherent segments of transcription and then discovers individual transcripts that are consistent with the observed segments given intensity and shape constraints. We used STEREO to identify 1446 regions of overlapping transcription in two strains of yeast, including transcripts that comprise a new form of molecular toggle switch that controls gene variegation.
- Published
- 2010
15. A Hybrid Methodology for Pattern Recognition in Signaling Cervical Cancer Pathways
- Author
-
Jaime Berumen, David Escarcega, Fernando Ramos, and Ana María Espinosa
- Subjects
Cervical cancer, Messenger RNA, Microarray, business.industry, Pattern recognition, Biology, medicine.disease, Mapk signaling pathway, Expression data, Pattern recognition (psychology), Gene expression, medicine, Artificial intelligence, business, Signalling pathways
- Abstract
Cervical Cancer (CC) results from infection with high-risk Human Papilloma Viruses. mRNA microarray expression data provide biologists with evidence of cellular compensatory gene expression mechanisms in CC progression. Pattern recognition of signalling pathways through expression data can reveal interesting insights for the understanding of CC. Consequently, gene expression data should be submitted to different pre-processing tasks. In this paper we propose a methodology based on the integration of expression data and signalling pathways as a necessary phase for pattern recognition within CC signalling pathways. Our results provide a top-down interpretation approach in which biologists interact with the recognized patterns inside signalling pathways.
- Published
- 2010
16. Prediction of Combinatorial Protein-Protein Interaction Networks from Expression Data Using Statistics on Conditional Probability
- Author
-
Takuya Yoshihiro, Etsuko Inoue, Takatoshi Fujiki, and Masaru Nakagawa
- Subjects
Computer science, Bayesian network, Conditional probability, computer.software_genre, Protein expression, Protein protein interaction network, Protein–protein interaction, Data set, Boolean network, Expression data, Interaction network, Statistics, Data mining, computer
- Abstract
In this paper we propose a method that retrieves combinatorial protein-protein interactions in order to predict interaction networks from protein expression data, based on statistics on conditional probability. Our method retrieves combinations of three proteins A, B and C that exhibit combinatorial effects among them. The combinatorial effect considered in this paper does not include the "sole effect" between two proteins A-C or B-C, so that we can retrieve the combinatorial effect which appears only when proteins A, B and C come together. We evaluate our method with a real protein expression data set and obtain several combinations of three proteins in which protein-protein interactions are predicted.
- Published
- 2010
17. Detection of Significant Biclusters in Gene Expression Data using Reactive Greedy Randomized Adaptive Search Algorithm
- Author
-
Achuthsankar S. Nair and Smitha Dharan
- Subjects
Biclustering, Expression data, business.industry, Computer science, Search algorithm, GRASP, Gene expression, Pattern recognition, Artificial intelligence, business, Metaheuristic, Greedy randomized adaptive search procedure, Randomness
- Abstract
In microarray gene expression data, a bicluster is a subset of genes which exhibit similar expression patterns along a subset of conditions. Each bicluster is represented as a tightly co-regulated submatrix of the gene expression matrix. One of the most popular measures to evaluate the quality of a bicluster is the mean squared residue score. Our approach aims to detect significant biclusters from a large microarray dataset through a metaheuristic algorithm called reactive greedy randomized adaptive search procedure (R-GRASP). The method finds biclusters by starting from small, tightly co-regulated bicluster seeds and iteratively adding more genes and conditions to them while keeping the mean squared residue score below a predetermined threshold. In R-GRASP, the parameter that defines the blend of greediness and randomness is self-adjusting, depending on the quality of solutions found previously. We performed statistical validation of the biclusters obtained through p-value calculation and evaluated the results against those of basic GRASP as well as the classic work of Cheng and Church. The experimental results on the Yeast gene expression data indicate that Reactive GRASP outperforms basic GRASP and the Cheng and Church approach.
- Published
- 2009
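The mean squared residue score named in the abstract of result 17 can be illustrated with a short sketch. It follows the standard Cheng and Church definition (each entry minus its row mean and column mean, plus the overall submatrix mean, squared and averaged); it is not the R-GRASP search itself, and the function name `mean_squared_residue` and the toy matrices are hypothetical.

```python
def mean_squared_residue(matrix, rows, cols):
    """Mean squared residue of the submatrix selected by `rows` and `cols`.

    `matrix` is a list of lists (genes x conditions); `rows` and `cols` are
    index lists defining the candidate bicluster.
    """
    sub = [[matrix[i][j] for j in cols] for i in rows]
    n_rows, n_cols = len(sub), len(sub[0])
    row_means = [sum(row) / n_cols for row in sub]
    col_means = [sum(sub[i][j] for i in range(n_rows)) / n_rows
                 for j in range(n_cols)]
    overall = sum(row_means) / n_rows
    total = 0.0
    for i in range(n_rows):
        for j in range(n_cols):
            # Residue: deviation of the entry from a purely additive row + column model.
            residue = sub[i][j] - row_means[i] - col_means[j] + overall
            total += residue ** 2
    return total / (n_rows * n_cols)


# A perfectly additive pattern (rows differ only by a constant shift) scores about 0.
coherent = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [4.0, 5.0, 6.0]]
score = mean_squared_residue(coherent, [0, 1, 2], [0, 1, 2])
```

A lower score means the selected genes vary more coherently across the selected conditions, which is why a search procedure can grow a bicluster only while the score stays below a threshold.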
18. Integrating Multiple-Platform Expression Data through Gene Set Features
- Author
-
Jiří Kléma, Filip Železný, Jakub Tolar, and Matěj Holec
- Subjects
Boosting (machine learning), business.industry, Gene sets, Biology, computer.software_genre, Machine learning, Random subspace method, Fully coupled, ComputingMethodologies_PATTERNRECOGNITION, Expression data, Artificial intelligence, Data mining, business, Gene, computer, Multiple platform
- Abstract
We demonstrate a set-level approach to the integration of multiple-platform gene expression data for predictive classification and show its utility for boosting classification performance when single-platform samples are rare. We explore three ways of defining gene sets, including a novel way based on the notion of a fully coupled flux related to metabolic pathways. In two tissue classification tasks, we empirically show that the gene set based approach is useful for combining heterogeneous expression data, while surprisingly, in experiments constrained to a single platform, biologically meaningful gene sets acting as sample features are often outperformed by random gene sets with no biological relevance.
- Published
- 2009
19. Similarity Matches of Gene Expression Data Based on Wavelet Transform
- Author
-
Mong-Shu Lee, Li-Yu Liu, and Mu-Yen Chen
- Subjects
Series (mathematics), Structural similarity, business.industry, Wavelet transform, Pattern recognition, computer.software_genre, Measure (mathematics), Similarity (network science), Expression data, Data mining, Artificial intelligence, Time series, business, computer, Mathematics, Regulator gene
- Abstract
This study presents a similarity-determining method for measuring regulatory relationships between pairs of genes from microarray time series data. The proposed similarity metrics are based on a new structural-similarity measure used to compare the quality of images. We make use of the Dual-Tree Wavelet Transform (DTWT) since it provides approximate shift invariance and maintains the structures between pairs of regulation-related time series expression data. Despite the simplicity of the presented method, experimental results demonstrate that it enhances the similarity index when tested on known transcriptional regulatory genes.
- Published
- 2009
20. Evaluating Between-Pathway Models with Expression Data
- Author
-
Lenore J. Cowen, Donna K. Slonim, Benjamin Hescott, and Mark D.M. Leiserson
- Subjects
Network motif, Deletion mutant, Expression data, Computer science, Microarray gene expression, Computational biology, Data mining, computer.software_genre, Gene, computer
- Abstract
Between-Pathway Models (BPMs) are network motifs consisting of pairs of putative redundant pathways. In this paper, we show how adding another source of high-throughput data, microarray gene expression data from knockout experiments, allows us to identify a compensatory functional relationship between genes from the two BPM pathways. We evaluate the quality of the BPMs from four different studies, and we describe how our methods might be extended to refine pathways.
- Published
- 2009
21. Mimosa: Mixture Model of Co-expression to Detect Modulators of Regulatory Interaction
- Author
-
Logan J. Everett, Sridhar Hannenhalli, Matthew E. B. Hansen, and Larry N. Singh
- Subjects
0303 health sciences, Maximum likelihood, Computational biology, Expression (computer science), Biology, Bioinformatics, Mixture model, Correlation, 03 medical and health sciences, 0302 clinical medicine, Expression data, 030220 oncology & carcinogenesis, Mixture modeling, Gene, Transcription factor, 030304 developmental biology
- Abstract
Functionally related genes tend to be correlated in their expression patterns across multiple conditions and/or tissue-types. Thus co-expression networks are often used to investigate functional groups of genes. In particular, when one of the genes is a transcription factor (TF), the co-expression-based interaction is interpreted, with caution, as a direct regulatory interaction. However, any particular TF, and more importantly, any particular regulatory interaction, is likely to be active only in a subset of experimental conditions. Moreover, the subset of expression samples where the regulatory interaction holds may be marked by presence or absence of a modifier gene, such as an enzyme that post-translationally modifies the TF. Such subtlety of regulatory interactions is overlooked when one computes an overall expression correlation. Here we present a novel mixture modeling approach where a TF-Gene pair is presumed to be significantly correlated (with unknown coefficient) in a (unknown) subset of expression samples. The parameters of the model are estimated using a Maximum Likelihood approach. The estimated mixture of expression samples is then mined to identify genes potentially modulating the TF-Gene interaction. We have validated our approach using synthetic data and on three biological cases in cow and in yeast. While limited in some ways, as discussed, the work represents a novel approach to mine expression data and detect potential modulators of regulatory interactions.
- Published
- 2009
22. geneSetFinder: A Multiagent Architecture for Gathering Biological Information
- Author
-
Daniel Glez-Peña, Reyes Pavón, Rosalía Laza, Julia Glez-Dopazo, and Florentino Fdez-Riverola
- Subjects
World Wide Web, ComputingMethodologies_PATTERNRECOGNITION, Expression data, Computer science, Gene sets, Volume (computing), Context (language use), DNA microarray experiment, Genomics, Architecture, Base (topology), Data science
- Abstract
The past decade has seen a tremendous growth in the amount of experimental and computational biomedical data, specifically in the areas of genomics and proteomics. In this context, the immense volume of data resulting from DNA microarray experiments presents a major data analysis challenge. Current methods for genome-wide analysis of expression data typically rely on biologically relevant gene sets instead of individual genes. This has translated into a need for sophisticated tools to mine, integrate and prioritize massive amounts of information. In this work we report the development of a multiagent architecture that supports the construction of gene sets coming from multiple heterogeneous data sources. The proposed architecture is the basis of a publicly available web portal in which end users are able to extract lists of genes from multiple heterogeneous data sources.
- Published
- 2009
23. The Impact of Gene Selection on Imbalanced Microarray Expression Data
- Author
-
Xingquan Zhu, Abhijit S. Pandya, Muhammad Shoaib, Abu H. M. Kamal, and Sam Hsu
- Subjects
Clustering high-dimensional data, Gene selection, Microarray, Expression data, Microarray analysis techniques, Data mining, Computational biology, Biology, computer.software_genre, Data imbalance, Gene, computer, Random forest
- Abstract
Microarray experiments usually produce data that are small in volume but high-dimensional. Selecting a number of genes relevant to the task at hand is usually one of the most important steps in expression data analysis. While numerous studies have demonstrated the effectiveness of gene selection from different perspectives, existing endeavors, unfortunately, ignore the reality of data imbalance, where one type of sample (e.g., cancer tissues) may be significantly rarer than the other (e.g., normal tissues). In this paper, we carry out a systematic study to investigate the impact of gene selection on imbalanced microarray data. Our objective is to understand what consequences gene selection may have for the final results when it is applied to imbalanced expression data. For this purpose, we apply five gene selection measures to eleven microarray datasets, and employ four learning methods to build classification models from the data containing selected genes only. Our study brings important findings and draws numerous conclusions on (1) the impact of gene selection on imbalanced data, and (2) the behaviors of different learning methods on the selected data.
- Published
- 2009
24. Semi-supervised Clustering of Yeast Gene Expression Data
- Author
-
Alexander Schönhuth, Alexander Schliep, and Ivan G. Costa
- Subjects
Set (abstract data type), DNA binding site, Biological data, Expression data, Gene expression, Constrained clustering, Computational biology, Data mining, Biology, Cluster analysis, computer.software_genre, computer, Semi supervised clustering
- Abstract
To identify modules of interacting molecules, gene expression is often analyzed with clustering methods. Constrained or semi-supervised clustering provides a framework to augment the primary gene expression data with secondary data to arrive at biologically meaningful clusters. Here, we present an approach using constrained clustering and report favorable results on a biological data set of gene expression time-courses in yeast together with predicted transcription factor binding site information.
- Published
- 2009
25. Biclustering Expression Data Based on Expanding Localized Substructures
- Author
-
Melih Sözdinler and Cesim Erten (Işık University, Faculty of Engineering, Department of Computer Engineering)
- Subjects
Enrichment ratio, Localize substructure, Bioinformatics, Real data sets, Biclustering algorithm, Biclusters, computer.software_genre, Gene, Matrix algebra, Data matrix (multivariate statistics), Correlation, Biclustering, Matrix (mathematics), Data matrices, Biology, Microarray data, Yeast cell cycle, Mathematics, Bipartite graph, Matrix, business.industry, Search spaces, Block matrix, Pattern recognition, Sub-matrices, Quantitative Biology::Genomics, Expression (mathematics), Gene expression data, Adaptive noise, Expression data, Gene expression, Artificial intelligence, Data mining, Localization procedure, business, Row, computer, Algorithms
- Abstract
Biclustering gene expression data is the problem of extracting submatrices of genes and conditions exhibiting significant correlation across both the rows and the columns of a data matrix of expression values. We provide a method, LEB (Localize-and-Extract Biclusters), which reduces the search space to local neighborhoods within the matrix by first localizing correlated structures. The localization procedure takes its roots from effective use of graph-theoretical methods applied to problems exhibiting a similar structure to that of biclustering. Once interesting structures are localized, the search space reduces to small neighborhoods and the biclusters are extracted from these localities. We evaluate the effectiveness of our method with extensive experiments using both artificial and real datasets.
- Published
- 2009
26. Using Supervised Complexity Measures in the Analysis of Cancer Gene Expression Data Sets
- Author
-
Ana Carolina Lorena, Marcilio C. P. de Souto, Liciana R. Peres, and Ivan G. Costa
- Subjects
business.industry, Computer science, Pattern recognition, Machine learning, computer.software_genre, Support vector machine, Data set, Complexity index, Expression data, Cancer gene, Artificial intelligence, High dimensionality, business, Classifier (UML), computer, Linear separability
- Abstract
Supervised Machine Learning methods have been successfully applied to gene expression based cancer diagnosis. Characteristics intrinsic to cancer gene expression data sets, such as high dimensionality, low number of samples and presence of noise, make the classification task very difficult. Furthermore, limitations in the classifier performance may often be attributed to characteristics intrinsic to a particular data set. This paper presents an analysis of gene expression data sets for cancer diagnosis using classification complexity measures. Such measures consider data geometry, distribution and linear separability as indications of complexity of the classification task. The results obtained indicate that the cancer data sets investigated are formed by mostly linearly separable non-overlapping classes, supporting the good predictive performance of robust linear classifiers, such as SVMs, on the given data sets. Furthermore, we found two complexity indices that were good indicators of the difficulty of gene expression based cancer diagnosis.
- Published
- 2009
27. Boosting the Performance of Inference Algorithms for Transcriptional Regulatory Networks Using a Phylogenetic Approach
- Author
-
Xiuwei Zhang and Bernard M. E. Moret
- Subjects
Boosting (machine learning), Phylogenetic tree, business.industry, Bayesian network, Inference, Biology, Machine learning, computer.software_genre, Expression data, Data mining, Artificial intelligence, business, Algorithm, computer, Biological network, Network model
- Abstract
Inferring transcriptional regulatory networks from gene-expression data remains a challenging problem, in part because of the noisy nature of the data and the lack of strong network models. Time-series expression data have shown promise, and recent work by Babu on the evolution of regulatory networks in E. coli and S. cerevisiae opened another avenue of investigation. In this paper we take the evolutionary approach one step further, by developing ML-based refinement algorithms that take advantage of established phylogenetic relationships among a group of related organisms and of a simple evolutionary model for regulatory networks to improve the inference of these networks for these organisms from expression data gathered under similar conditions. We use simulations with different methods for generating gene-expression data, different phylogenies, and different evolutionary rates, and use different network inference algorithms, to study the performance of our algorithmic boosters. The results of simulations (including various tests to exclude confounding factors) demonstrate clear and significant improvements (in both specificity and sensitivity) on the performance of current inference algorithms. Thus gene-expression studies across a range of related organisms could yield significantly more accurate regulatory networks than single-organism studies.
- Published
- 2008
28. Fuzzy Support Vector Machine for Genes Expression Data Analysis
- Author
-
Agnieszka Wieclawek, Joanna Musioł, and Urszula Mazurek
- Subjects
Support vector machine, Relevance vector machine, Set (abstract data type), ComputingMethodologies_PATTERNRECOGNITION, Current (mathematics), Structured support vector machine, Computer science, Expression data, Data mining, computer.software_genre, computer, Fuzzy logic, Class (biology)
- Abstract
The current study presents two approaches to the fuzzy support vector machine. The first approach implements the fuzzy support vector machine for solving a two class problem. The second approach employs the fuzzy support vector machine for a multi-class problem. In both cases fuzzy classifiers have been used for genes expression data analysis. The first method has been tested on clinical data acquired at the Silesian Medical University. Then the dataset from Kent Ridge Biomedical Data Set Repository has been used to simulate the performance of the second tool.
- Published
- 2008
29. APMA Database for Affymetrix Target Sequences Mapping, Quality Assessment and Expression Data Mining
- Author
-
Atif Shahab, Joanne Chen, Yuriy L. Orlov, Vladimir A. Kuznetsov, and Jiangtao Zhou
- Subjects
Annotation ,Interactive presentation ,Quality assessment ,Expression data ,Interspersed repeat ,Gene chip analysis ,Computational biology ,Data mining ,Biology ,computer.software_genre ,Genome ,computer ,Sequence mapping - Abstract
We have developed an online database, APMA (Affymetrix Probe Mapping and Annotation), for interactive presentation, search and visualization of Affymetrix target sequence mapping and annotation 〈http://apma.bii.astar.edu.sg〉. APMA contains revised genome localization of the Affymetrix U133 GeneChip initial (target) probe sequences. We designed APMA to be used as a filter before data analysis and data mining, so that noisy expression signals, false correlations and false gene expression patterns can be reduced. Discrepancies found in probeset annotation and target sequence mapping account for up to 30% of probesets, including about 25% of Affymetrix probesets derived from target sequences that overlap interspersed repeats and 1.8% of original target sequences with erroneous sequence orientation. 86% of U133 target sequences passed our quality-control filtering.
- Published
- 2007
30. Facial Expression Analysis on Semantic Neighborhood Preserving Embedding
- Author
-
Youdong Zhao, Yunde Jia, and Shuang Xu
- Subjects
Facial expression, business.industry, ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION, Representation (systemics), Pattern recognition, Expression Feature, Expression (mathematics), law.invention, law, Expression data, Metric (mathematics), Embedding, Computer vision, Artificial intelligence, business, Manifold (fluid mechanics), Mathematics
- Abstract
In this study, an expression manifold is constructed by Neighborhood Preserving Embedding (NPE) based on the expression semantic metric for a global representation of all possible facial expression images. On this learned manifold, images with semantically 'similar' expressions are mapped onto nearby points even when their lighting, pose and individual appearance are quite different. The proposed manifold extracts the universal expression feature and reveals the intrinsic semantic global structure and the essential relations of the expression data. Experimental results demonstrate the effectiveness of our approach.
- Published
- 2007
31. Identifying Coexpressed Genes
- Author
-
Qihua Wang
- Subjects
business.industry, Computer science, False positives and false negatives, nutritional and metabolic diseases, Pattern recognition, A-weighting, nervous system diseases, Reduction (complexity), Noise, Expression data, parasitic diseases, Outlier, population characteristics, Cluster tree, Artificial intelligence, business, Cluster analysis, health care economics and organizations
- Abstract
Some gene expression data contain outliers and noise because of experimental error. In clustering, outliers and noise can result in false positives and false negatives. This motivates us to develop a weighting method that adjusts the expression data so that the effects of outliers and noise decrease, reducing false positives and false negatives in clustering.
- Published
- 2006
32. Biological Specifications for a Synthetic Gene Expression Data Generation Model
- Author
-
Giorgio Valentini, Francesca Ruffino, and Marco Muselli
- Subjects
Gene expression profiling, Set (abstract data type), Expression data, Synthetic gene, Computer science, Open problem, Gene expression, Data mining, Computational biology, computer.software_genre, Gene, Independent component analysis, computer
- Abstract
An open problem in gene expression data analysis is the evaluation of the performance of gene selection methods applied to discover biologically relevant sets of genes. The problem is difficult because the entire set of genes involved in specific biological processes is usually unknown or only partially known, which makes a correct comparison between different gene selection methods infeasible. The natural solution to this problem is to develop an artificial model that generates gene expression data, so that the set of biologically relevant genes is known in advance. The models proposed in the literature, even if useful for a preliminary evaluation of gene selection methods, did not explicitly consider the biological characteristics of gene expression data. The main aim of this work is to identify the main biological characteristics that need to be considered when designing a model for validating gene selection methods based on the analysis of DNA microarray data.
- Published
- 2006
33. Gene Regulatory Network Construction Using Dynamic Bayesian Network (DBN) with Structure Expectation Maximization (SEM)
- Author
-
Peifa Jia, Yu Zhang, Zhidong Deng, and Hongshan Jiang
- Subjects
Structure (mathematical logic), Computer science, business.industry, Gene regulatory network, Bayesian network, Machine learning, computer.software_genre, Variable-order Bayesian network, ComputingMethodologies_PATTERNRECOGNITION, Expression data, Expectation–maximization algorithm, ComputingMethodologies_GENERAL, Artificial intelligence, Rough set, business, computer, Dynamic Bayesian network
- Abstract
Discovering gene relationships from gene expression data is a hot topic in the post-genomic era. In recent years, the Bayesian network has become a popular method for reconstructing gene regulatory networks because of its statistical nature. However, it is not suitable for analyzing time-series data and cannot deal with cycles in the gene regulatory network. In this paper we apply the dynamic Bayesian network to model gene relationships in order to overcome these difficulties. By incorporating the structural expectation maximization algorithm into the dynamic Bayesian network model, we develop a new method to learn the regulatory network from S. cerevisiae cell cycle gene expression data. The experimental results demonstrate that our method is more accurate than previous work.
- Published
- 2006
34. Identifying Submodules of Cellular Regulatory Networks
- Author
-
Neil D. Lawrence, Guido Sanguinetti, and Magnus Rattray
- Subjects
Network architecture, Biological data, Theoretical computer science, Expression data, Scale (chemistry), Cellular network, Throughput, Data mining, Biology, computer.software_genre, computer, Eigendecomposition of a matrix, Task (project management)
- Abstract
Recent high throughput techniques in molecular biology have brought about the possibility of directly identifying the architecture of regulatory networks on a genome-wide scale. However, the computational task of estimating fine-grained models on a genome-wide scale is daunting. Therefore, it is of great importance to be able to reliably identify submodules of the network that can be effectively modelled as independent subunits. In this paper we present a procedure to obtain submodules of a cellular network by using information from gene-expression measurements. We integrate network architecture data with genome-wide gene expression measurements in order to determine which regulatory relations are actually confirmed by the expression data. We then use this information to obtain non-trivial submodules of the regulatory network using two distinct algorithms, a naive exhaustive algorithm and a spectral algorithm based on the eigendecomposition of an affinity matrix. We test our method on two yeast biological data sets, using regulatory information obtained from chromatin immunoprecipitation.
- Published
- 2006
35. Generating Contexts for Expression Data Using Pathway Queries
- Author
-
Florian Sohler
- Subjects
Microarray, Computer science, Expression data, Gene expression, Computational biology, Data mining, DNA microarray, computer.software_genre, computer, Throughput (business), Biological network
- Abstract
The measurement of gene expression data using microarrays has become a standard high throughput method in many areas of biology and medicine. Despite some issues in quality and reproducibility of microarray and derived data [3,4], microarrays are still considered one of the most promising experimental techniques for the understanding of complex molecular mechanisms, and the analysis of gene expression data is still a very active area of research in bioinformatics and statistics.
- Published
- 2005
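36. Korrelation von Genexpressionsprofilen mit prognostisch relevanten Parametern beim kolorektalen Karzinom [Correlation of gene expression profiles with prognostically relevant parameters in colorectal carcinoma]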
- Author
-
Birgit Weber, Thomas Brümmendorf, M. Heinze, Irina Klaman, Klaus Hermann, J. Gröne, Heinz-Johannes Buhr, and Benno Mann
- Subjects
Gene expression profiling, Colorectal cancer, Expression data, medicine, Cancer research, Cancer, Lymph node metastasis, Biology, medicine.disease, Gene, Laser capture microdissection, Invasion front
- Abstract
Gene expression profiling is a powerful new tool for obtaining information about a patient's prognosis beyond established classifications such as TNM and UICC, as shown for other tumor entities. The aim of this study was to identify prognostic signatures for patients with colorectal cancer of all UICC subgroups, especially for patients with versus without lymph node metastasis (N0 vs. N+), and for patients with versus without distant metastasis (M0 vs. M+). We applied Affymetrix GeneChips (33,000 genes) and Laser Capture Microdissection (LCM) in 25 patients with colorectal cancer. Hierarchical cluster analysis discriminates characteristic expression patterns of normal mucosae from those of cancer tissues (tumor cluster) of the tumor center (ZT) and the invasion front (IF). A characteristic pattern for N+ could be identified within the tumor cluster. No significant correlation was found between tumor clusters and UICC classification or M0 vs. M+. To obtain information beyond the established prognostic parameters by gene expression profiling in colorectal cancer, expression data have to be correlated with clinicopathological data and the number of patients has to be expanded.
- Published
- 2003
37. A Simple Hyper-Geometric Approach for Discovering Putative Transcription Factor Binding Sites
- Author
-
Nir Friedman, Gill Bejerano, and Yoseph Barash
- Subjects
Genetics, DNA binding site, Simple (abstract algebra), Expression data, Gene expression, Promoter, Computational biology, Biology, Gene, Hypergeometric distribution, Conserved sequence
- Abstract
A central issue in molecular biology is understanding the regulatory mechanisms that control gene expression. The recent flood of genomic and post-genomic data opens the way for computational methods elucidating the key components that play a role in these mechanisms. One important consequence is the ability to recognize groups of genes that are co-expressed using microarray expression data. We then wish to identify in silico putative transcription factor binding sites in the promoter regions of these genes that might explain the coregulation and hint at possible regulators. In this paper we describe a simple and fast, yet powerful, two-stage approach to this task. Using a rigorous hypergeometric statistical analysis and a straightforward computational procedure, we find small conserved sequence kernels. These are then stochastically expanded into PSSMs using an EM-like procedure. We demonstrate the utility and speed of our methods by applying them to several data sets from recent literature. We also compare these results with those of MEME when run on the same sets.
- Published
- 2001
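The hypergeometric analysis mentioned in the abstract of entry 37 can be illustrated with a generic over-representation test. This is a minimal sketch of the standard hypergeometric tail probability, not the authors' implementation: the function name `hypergeom_tail` and all counts in the example are hypothetical and chosen only for illustration.

```python
from math import comb


def hypergeom_tail(N: int, K: int, n: int, k: int) -> float:
    """Return P(X >= k) for a hypergeometric variable X.

    N: total number of promoters in the genome (assumed background set)
    K: promoters in the background that contain the candidate sequence
    n: promoters in the co-expressed gene cluster
    k: promoters in the cluster that contain the candidate sequence
    """
    total = comb(N, n)
    upper = min(K, n)
    # Sum the probabilities of seeing k, k+1, ..., upper hits in the cluster.
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / total


if __name__ == "__main__":
    # Hypothetical counts: 6000 promoters, 300 contain the sequence,
    # and 10 of the 50 promoters in a cluster contain it.
    p_value = hypergeom_tail(N=6000, K=300, n=50, k=10)
    print(f"P(X >= 10) = {p_value:.3e}")
```

A small tail probability suggests that the candidate sequence occurs in the cluster more often than random sampling of promoters would explain. When many candidate sequences are scored this way, a multiple-testing correction would normally be applied.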