18 results for "Oliveira, Arlindo"
Search Results
2. DegreeCox - a network-based regularization method for survival analysis.
- Author
-
Veríssimo, André, Oliveira, Arlindo Limede, Sagot, Marie-France, and Vinga, Susana
- Subjects
- *
PROPORTIONAL hazards models , *MATHEMATICAL regularization , *SURVIVAL analysis (Biometry) , *ONCOLOGY , *COST functions , *MANAGEMENT - Abstract
Background: Modeling survival oncological data has become a major challenge, as the increase in the amount of molecular information now available means that the number of features greatly exceeds the number of observations. One possible solution to cope with this dimensionality problem is the use of additional constraints in the cost function optimization. LASSO and other sparsity methods have already been applied successfully with this idea. Although this leads to more interpretable models, these methods still do not fully profit from the relations between the features, especially when these can be represented through graphs. We propose DEGREECOX, a method that applies network-based regularizers to infer Cox proportional hazards models when the features are genes and the outcome is patient survival. In particular, we propose to use network centrality measures to constrain the model in terms of significant genes. Results: We applied DEGREECOX to three datasets of ovarian cancer carcinoma and tested several centrality measures, such as weighted degree, betweenness and closeness centrality. The a priori network information was retrieved from Gene Co-Expression Networks and Gene Functional Maps. When compared with RIDGE and LASSO, DEGREECOX shows an improvement in the classification of high- and low-risk patients, on a par with NET-COX. The use of network information is especially relevant with datasets that are not easily separated. In terms of RMSE and C-index, DEGREECOX gives results that are similar to those of the best performing methods, in a few cases slightly better. Conclusions: Network-based regularization seems to be a promising framework to deal with the dimensionality problem. The centrality metrics proposed can easily be expanded to accommodate other topological properties of different biological networks. [ABSTRACT FROM AUTHOR]
- Published
- 2016
- Full Text
- View/download PDF
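The core idea in the DegreeCox abstract above, replacing a uniform ridge/LASSO penalty with weights derived from network centrality, can be written compactly. The following is a generic sketch of a centrality-weighted penalized Cox partial log-likelihood, not the exact DegreeCox objective; the L2 form of the penalty and the weights w_j (e.g., a normalized degree of gene j in the co-expression network) are assumptions for illustration.

```latex
\hat{\beta} \;=\; \arg\min_{\beta}\; -\ell(\beta) \;+\; \lambda \sum_{j=1}^{p} w_j\,\beta_j^{2},
\qquad
\ell(\beta) \;=\; \sum_{i:\,\delta_i = 1}\Bigl[\, x_i^{\top}\beta \;-\; \log\!\!\sum_{k:\,t_k \ge t_i}\! \exp\bigl(x_k^{\top}\beta\bigr) \Bigr]
```

Here ℓ is the Cox partial log-likelihood, δ_i the event indicator, t_i the observed time, and w_j the centrality-derived weight that makes the penalty network-aware instead of uniform across genes.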
3. GRISOTTO: A greedy approach to improve combinatorial algorithms for motif discovery with prior knowledge.
- Author
-
Carvalho, Alexandra M. and Oliveira, Arlindo L.
- Subjects
- *
ALGORITHMS , *NUCLEIC acids , *LEAVENING agents , *FERMENTATION , *DEOXYRIBOSE , *DNA - Abstract
Background: Position-specific priors (PSPs) have been used with success to boost EM and Gibbs sampler-based motif discovery algorithms. PSP information has been computed from different sources, including orthologous conservation, DNA duplex stability, and nucleosome positioning. Prior information has not yet been exploited in the context of combinatorial algorithms. Moreover, priors have been used only independently, and the gain of combining priors from different sources has not yet been studied. Results: We extend RISOTTO, a combinatorial algorithm for motif discovery, by post-processing its output with a greedy procedure that uses prior information. PSPs from different sources are combined into a scoring criterion that guides the greedy search procedure. The resulting method, called GRISOTTO, was evaluated over 156 yeast TF ChIP-chip sequence-sets commonly used to benchmark prior-based motif discovery algorithms. Results show that GRISOTTO is at least as accurate as twelve other state-of-the-art approaches for the same task, even without combining priors. Furthermore, by considering combined priors, GRISOTTO is considerably more accurate than the state-of-the-art approaches for the same task. We also show that PSPs improve GRISOTTO's ability to retrieve motifs from mouse ChIP-seq data, indicating that the proposed algorithm can be applied to data from a different technology and for a higher eukaryote. Conclusions: The conclusions of this work are twofold. First, post-processing the output of combinatorial algorithms by incorporating prior information leads to a very efficient and effective motif discovery method. Second, combining priors from different sources is even more beneficial than considering them separately. [ABSTRACT FROM AUTHOR]
- Published
- 2011
- Full Text
- View/download PDF
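As a rough illustration of the "combine priors from different sources into one scoring criterion" step described in the abstract above, the sketch below merges several position-specific priors in log space and ranks candidate motif start positions. The equal-weight log-sum combination rule and the function names are assumptions for illustration only, not GRISOTTO's actual criterion.

```python
import numpy as np

def combine_priors(priors, weights=None):
    """Combine several position-specific priors (each a 1-D array over
    candidate motif start positions) into one score per position.
    Assumed rule for illustration: a weighted sum of log-priors."""
    priors = [np.asarray(p, dtype=float) for p in priors]
    if weights is None:
        weights = np.ones(len(priors))
    return sum(w * np.log(p + 1e-12) for w, p in zip(weights, priors))

def top_candidates(priors, k=5):
    """Return the k best-scoring start positions under the combined prior."""
    score = combine_priors(priors)
    return np.argsort(score)[::-1][:k]

# toy example: a conservation-based prior and a nucleosome-occupancy prior
conservation = np.array([0.10, 0.70, 0.20, 0.90, 0.05])
nucleosome_free = np.array([0.30, 0.80, 0.10, 0.60, 0.20])
print(top_candidates([conservation, nucleosome_free], k=2))
```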
4. CONSTANT TIME CLASH DETECTION IN PROTEIN FOLDING.
- Author
-
BUGALHO, MIGUEL M. F. and OLIVEIRA, ARLINDO L.
- Subjects
- *
PROTEIN folding , *PROTEIN conformation , *STEREOCHEMISTRY , *ALGORITHMS , *MOLECULAR structure , *PHYSICAL & theoretical chemistry - Abstract
Applications for the manipulation of molecular structures are usually computationally intensive. Problems like protein docking or ab-initio protein folding need to frequently determine if two atoms in the structure collide. Therefore, an efficient algorithm for this problem, usually referred to as clash detection, can greatly improve the application's efficiency. This work focuses mainly on the ab-initio protein folding problem. A naive approach for the clash problem, the one most commonly used by molecular structure programs, consists in calculating the distance between every pair of atoms. We propose an efficient data structure that uses a three-dimensional array to store the atoms' positions. We compare the proposed data structure with one of the best known general data structures for this type of problem (the SAT tree) and with the naive approach. While the naive approach takes time linear in the number of atoms to verify if a new atom clashes with any previously placed atoms, the proposed data structure takes constant time to perform the same verification. The SAT tree takes logarithmic time for the same task. The results show that the proposed data structure surpasses the other techniques for any protein size. The proposed data structure takes nearly half of the time of the SAT data structure and close to a fifth of the time of the naive approach for the larger proteins. We believe that this data structure can improve existing molecular structure applications by decreasing the computational cost of clash detection. The data structure presented in this work can be used for any protein structure clash verification, as long as the atoms that need to be checked are kept in the 3D array. This data structure is particularly useful when manipulating large sets of atoms, for example, in applications like loop prediction, structure refinement of large proteins, and protein docking. [ABSTRACT FROM AUTHOR]
- Published
- 2009
- Full Text
- View/download PDF
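The constant-time check described above can be sketched with a uniform 3-D grid (a dictionary-backed spatial hash here, standing in for the paper's 3-D array): with a cell edge no smaller than the clash distance, a new atom only has to be compared against atoms stored in its own cell and the 26 neighbouring cells. The cell size, clash threshold, and class names below are illustrative assumptions, not the paper's implementation.

```python
from itertools import product

class ClashGrid:
    """Uniform grid for O(1) clash checks against previously placed atoms."""
    def __init__(self, clash_dist=2.0):
        self.clash_dist = clash_dist
        self.cell = clash_dist          # cell edge >= clash distance
        self.grid = {}                  # (ix, iy, iz) -> list of (x, y, z)

    def _key(self, p):
        return tuple(int(c // self.cell) for c in p)

    def clashes(self, p):
        """True if p is closer than clash_dist to any stored atom."""
        ix, iy, iz = self._key(p)
        d2 = self.clash_dist ** 2
        for dx, dy, dz in product((-1, 0, 1), repeat=3):
            for q in self.grid.get((ix + dx, iy + dy, iz + dz), ()):
                if sum((a - b) ** 2 for a, b in zip(p, q)) < d2:
                    return True
        return False

    def add(self, p):
        self.grid.setdefault(self._key(p), []).append(p)

g = ClashGrid(clash_dist=2.0)
g.add((0.0, 0.0, 0.0))
print(g.clashes((1.0, 0.5, 0.0)))   # True: closer than the clash distance
print(g.clashes((5.0, 5.0, 5.0)))   # False
```

Because atoms cannot overlap, each occupied cell can hold only a bounded number of atoms, so the 27-cell scan is a constant amount of work per query, which is the property the abstract exploits.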
5. A compressed self-index using a Ziv-Lempel dictionary.
- Author
-
Russo, Luís M. S. and Oliveira, Arlindo L.
- Subjects
- *
INDEXING (Machine-shop practice) , *MACHINE-shop practice , *HYPERSPACE , *DATA transmission systems , *CULTURAL policy , *DATABASE management , *DATA compression , *CODING theory , *DIGITAL electronics - Abstract
A compressed full-text self-index for a text T, of size u, is a data structure used to search for patterns P, of size m, in T, that requires reduced space, i.e. space that depends on the empirical entropy (Hk or H0) of T, and is, furthermore, able to reproduce any substring of T. In this paper we present a new compressed self-index able to locate the occurrences of P in O((m + occ) log u) time, where occ is the number of occurrences. The fundamental improvement over previous LZ78-based indexes is the reduction of the search time dependency on m from O(m²) to O(m). To achieve this result we point out the main obstacle to linear-time algorithms based on LZ78 data compression, and expose and explore the nature of a recurrent structure in LZ-indexes, the T78 suffix tree. We show that our method is very competitive in practice by comparing it against other state-of-the-art compressed indexes. [ABSTRACT FROM AUTHOR]
- Published
- 2008
- Full Text
- View/download PDF
6. Inference of regular languages using state merging algorithms with search.
- Author
-
Bugalho, Miguel and Oliveira, Arlindo L.
- Subjects
- *
GENETIC algorithms , *NP-complete problems , *COMPUTATIONAL complexity , *ARTIFICIAL intelligence - Abstract
Abstract: State merging algorithms have emerged as the solution of choice for the problem of inferring regular grammars from labeled samples, a known NP-complete problem of great importance in the grammatical inference area. These methods derive a small deterministic finite automaton from a set of labeled strings (the training set), by merging parts of the acceptor that corresponds to this training set. Experimental and theoretical evidence has shown that the generalization ability exhibited by the resulting automata is highly correlated with the number of states in the final solution. As originally proposed, state merging algorithms do not perform search. This means that they are fast, but it also means that they are limited by the quality of the heuristics they use to select the states to be merged. Sub-optimal choices lead to automata that have many more states than needed and exhibit poor generalization ability. In this work, we survey the existing approaches that generalize state merging algorithms by using search to explore the tree that represents the space of possible sequences of state mergings. By using heuristic-guided search in this space, many possible state merging sequences can be considered, leading to smaller automata and improved generalization ability, at the expense of increased computation time. We present comparisons of existing algorithms showing that, in widely accepted benchmarks, the quality of the derived solutions is improved by applying this type of search. However, we also point out that existing algorithms are not powerful enough to solve the more complex instances of the problem, leaving open the possibility that better and more powerful approaches need to be designed. [Copyright Elsevier]
- Published
- 2005
- Full Text
- View/download PDF
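State merging algorithms start from the prefix tree acceptor (PTA) of the labeled sample and then merge compatible states; the search surveyed above explores different merge orders. The sketch below only builds the PTA from positive and negative strings, which is the common starting point of these algorithms; the representation and names are illustrative assumptions, and the merging and search steps themselves are omitted.

```python
class PTA:
    """Prefix tree acceptor: one state per distinct prefix of the sample."""
    def __init__(self):
        self.trans = {0: {}}        # state -> {symbol: next state}
        self.label = {0: None}      # state -> True (accept), False (reject), None
        self._next = 1

    def add(self, string, positive):
        state = 0
        for sym in string:
            if sym not in self.trans[state]:
                self.trans[state][sym] = self._next
                self.trans[self._next] = {}
                self.label[self._next] = None
                self._next += 1
            state = self.trans[state][sym]
        self.label[state] = positive

pta = PTA()
for s in ("ab", "abab"):      # positive sample
    pta.add(s, True)
for s in ("a", "ba"):         # negative sample
    pta.add(s, False)
print(len(pta.trans), "states before any merging")
```

State merging then repeatedly folds one state onto another while preserving determinism, keeping a merge only if no accepting state collides with a rejecting one; the surveyed algorithms differ in how they search over the tree of possible merge sequences.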
7. On the Problem of Gate Assignment Under Different Rise and Fall Delays.
- Author
-
Oliveira, Arlindo L. and Murgai, Rajeev
- Subjects
- *
LOGIC devices , *DELAY lines , *MATHEMATICAL optimization - Abstract
In most libraries, gate parameters such as the pin-to-pin intrinsic delays, load-dependent coefficients, and input pin capacitances have different values for rising and falling signals. Most performance optimization algorithms, however, assume a single value for each parameter. It is known that, under the load-independent delay model, the gate assignment (or resizing) problem is solvable in time polynomial in the circuit size when a single value is assumed for each parameter (Kukimoto et al., 1998). We show that, in the presence of different rise and fall parameter values, this problem is NP-complete even for chain and tree topology circuits under the simple load-independent delay model (Murgai, 1999). However, we also show that, for tree circuits, the problem is not NP-complete in the strong sense, and we propose a dynamic programming algorithm that solves it exactly in pseudopolynomial time. More specifically, we show that the problem can be solved in time proportional to the size of the circuit, the number of choices available in the library for each gate, and the delay of the circuit. We also present a straightforward way of extending this algorithm to general directed acyclic networks. We present experimental results on a set of benchmark problems using a standard commercial library and show that our algorithm generates provably optimum delays for 69 out of 73 circuits. We also compare our technique with two approaches traditionally used to solve this problem in industry and academia, and show that it performs better than both. Interestingly, both traditional approaches also yield delays that are, in general, not far from the optimum. [ABSTRACT FROM AUTHOR]
- Published
- 2003
- Full Text
- View/download PDF
8. Techniques for the Creation of Digital Watermarks in Sequential Circuit Designs.
- Author
-
Oliveira, Arlindo L.
- Subjects
- *
WATERMARKS , *SEQUENTIAL processing (Computer science) , *INTELLECTUAL property - Abstract
Presents information on a study which proposed a methodology for the creation of digital watermarks on the state transition graph (STG) of synchronous sequential circuits. Concept of digital watermarking; Approaches to the STG technique in watermarking; Algorithm for watermarking STG; Possible attacks on the method.
- Published
- 2001
- Full Text
- View/download PDF
9. Exact Minimization of Binary Decision Diagrams Using Implicit Techniques.
- Author
-
Oliveira, Arlindo L. and Carloni, Luca P.
- Subjects
- *
ALGORITHMS , *COMPUTER arithmetic - Abstract
Presents a study that proposed an exact algorithm for the problem of binary decision diagram (BDD) minimization. Heuristic algorithms for the problem; Details on BDD; Complexity of the problem.
- Published
- 1998
- Full Text
- View/download PDF
10. Learning the dynamics of realistic models of C. elegans nervous system with recurrent neural networks.
- Author
-
Barbulescu, Ruxandra, Mestre, Gonçalo, Oliveira, Arlindo L., and Silveira, Luís Miguel
- Subjects
- *
RECURRENT neural networks , *NERVOUS system , *CAENORHABDITIS elegans , *STANDARD deviations , *NEURAL circuitry - Abstract
Given the inherent complexity of the human nervous system, insight into the dynamics of brain activity can be gained from studying smaller and simpler organisms. While some of the potential target organisms are simple enough that their behavioural and structural biology might be well-known and understood, others might still lead to computationally intractable models that require extensive resources to simulate. Since such organisms frequently act only as proxies to further our understanding of underlying phenomena or functionality, often one is not interested in the detailed evolution of every single neuron in the system. Instead, it is sufficient to observe the subset of neurons that captures the effect that the profound nonlinearities of the neuronal system have in response to different stimuli. In this paper, we consider the well-known nematode Caenorhabditis elegans and investigate the possibility of generating lower-complexity models that capture the system's dynamics with low error, using only measured or simulated input-output information. Such models are often termed black-box models. We show how the nervous system of C. elegans can be modelled and simulated with data-driven models using different neural network architectures. Specifically, we target the use of state-of-the-art recurrent neural network architectures such as Long Short-Term Memory and Gated Recurrent Units and compare these architectures in terms of their properties and their accuracy (Root Mean Square Error), as well as the complexity of the resulting models. We show that Gated Recurrent Unit models with a hidden layer size of 4 are able to accurately reproduce the system's response to very different stimuli. We furthermore explore the relative importance of their inputs as well as scalability to more scenarios. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
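The abstract above reports that GRU models with a hidden layer of size 4 reproduce the simulated neuronal responses well. Below is a minimal PyTorch sketch of such a black-box surrogate; the input/output dimensionalities, the single linear readout, the toy data, and the training loop are assumptions for illustration, not the architecture or data used in the paper.

```python
import torch
import torch.nn as nn

class GRUSurrogate(nn.Module):
    """Black-box surrogate: stimulus sequence -> observed neuron responses."""
    def __init__(self, n_inputs=1, n_outputs=3, hidden_size=4):
        super().__init__()
        self.gru = nn.GRU(n_inputs, hidden_size, batch_first=True)
        self.readout = nn.Linear(hidden_size, n_outputs)

    def forward(self, stimulus):          # (batch, time, n_inputs)
        h, _ = self.gru(stimulus)
        return self.readout(h)            # (batch, time, n_outputs)

model = GRUSurrogate()
opt = torch.optim.Adam(model.parameters(), lr=1e-2)
mse = nn.MSELoss()

# toy traces standing in for simulated input-output data
x = torch.randn(8, 200, 1)                       # 8 stimuli, 200 time steps
y = torch.tanh(torch.cumsum(x, dim=1)) * torch.ones(1, 1, 3)

for _ in range(50):
    opt.zero_grad()
    loss = mse(model(x), y)
    loss.backward()
    opt.step()
print("RMSE:", loss.sqrt().item())               # RMSE is the metric reported
```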
11. Deep Learning-Based Extraction of Biomarkers for the Prediction of the Functional Outcome of Ischemic Stroke Patients.
- Author
-
Oliveira, Gonçalo, Fonseca, Ana Catarina, Ferro, José, and Oliveira, Arlindo L.
- Subjects
- *
ISCHEMIC stroke , *STROKE patients , *FUNCTIONAL status , *DEEP learning , *BIOMARKERS , *DEATH forecasting - Abstract
Accurately predicting functional outcomes in stroke patients remains challenging yet clinically relevant. While brain CTs provide prognostic information, their practical value for outcome prediction is unclear. We analyzed a multi-center cohort of 743 ischemic stroke patients (<72 h onset), including their admission brain NCCT and CTA scans as well as their clinical data. Our goal was to predict the patients' future functional outcome, measured by the 3-month post-stroke modified Rankin Scale (mRS), dichotomized into good (mRS ≤ 2) and poor (mRS > 2). To this end, we developed deep learning models to predict the outcome from CT data only, and models that incorporate other patient variables. Three deep learning architectures were tested in the image-only prediction, achieving 0.779 ± 0.005 AUC. In addition, we created a model fusing imaging and tabular data by feeding the output of a deep learning model trained to detect occlusions on CT angiograms into our prediction framework, which achieved an AUC of 0.806 ± 0.082. These findings highlight how further refinement of prognostic models incorporating both image biomarkers and clinical data could enable more accurate outcome prediction for ischemic stroke patients. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
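To make the "fusing imaging and tabular data" step in the abstract above concrete, the sketch below concatenates an imaging-derived feature vector (a placeholder standing in for CT-based deep features or the occlusion-model output) with clinical variables and feeds them to a small classifier for the dichotomized mRS outcome. The dimensions, layer sizes, and names are illustrative assumptions, not the authors' model.

```python
import torch
import torch.nn as nn

class FusionClassifier(nn.Module):
    """Late fusion of imaging features and tabular clinical variables."""
    def __init__(self, n_img_feat=128, n_clinical=10):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_img_feat + n_clinical, 64),
            nn.ReLU(),
            nn.Linear(64, 1),          # logit for poor outcome (mRS > 2)
        )

    def forward(self, img_feat, clinical):
        return self.net(torch.cat([img_feat, clinical], dim=1))

model = FusionClassifier()
img_feat = torch.randn(4, 128)     # e.g. pooled CNN features from NCCT/CTA
clinical = torch.randn(4, 10)      # e.g. age, admission scores, onset time
prob_poor = torch.sigmoid(model(img_feat, clinical)).squeeze(1)
print(prob_poor)
```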
12. Using Information Interaction to Discover Epistatic Effects in Complex Diseases.
- Author
-
Anunciação, Orlando, Vinga, Susana, and Oliveira, Arlindo L.
- Subjects
- *
DISEASE complications , *HUMAN genetic variation , *EPISTASIS (Genetics) , *LOCUS (Genetics) , *SINGLE nucleotide polymorphisms , *COMPETITION (Biology) , *GENE regulatory networks - Abstract
It is widely agreed that complex diseases are typically caused by the joint effects of multiple genetic variations, rather than a single one. These genetic variations may show stronger effects when considered together than when considered individually, a phenomenon known as epistasis or multilocus interaction. In this work, we explore the applicability of information interaction to discover pairwise epistatic effects related to complex diseases. We start by showing that traditional approaches such as classification methods or greedy feature selection methods (such as the Fleuret method) do not perform well on this problem. We then compare our information interaction method with BEAM and SNPHarvester on artificial datasets simulating epistatic interactions and show that our method is more powerful at detecting pairwise epistatic interactions than its competitors. We show results of the application of the information interaction method to the WTCCC breast cancer dataset. Our results are validated using permutation tests. We were able to find 89 statistically significant pairwise interactions with a p-value lower than . Even though many recent algorithms have been designed to find epistasis with low marginals, we observed that all (except one) of the SNPs involved in statistically significant interactions have moderate or high marginals. We also report that the interactions found in this work were not present in the STRING gene-gene interaction network. [ABSTRACT FROM AUTHOR]
- Published
- 2013
- Full Text
- View/download PDF
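For reference, the pairwise information interaction that the abstract above relies on is usually defined, for two SNPs X1 and X2 and a phenotype Y, as the gain in mutual information obtained by considering the pair jointly rather than separately. The notation below is a standard textbook form, stated here as an assumption rather than copied from the paper.

```latex
II(X_1; X_2; Y) \;=\; I(X_1, X_2;\, Y) \;-\; I(X_1; Y) \;-\; I(X_2; Y)
\;=\; I(X_1; X_2 \mid Y) \;-\; I(X_1; X_2)
```

A positive value indicates that the two loci carry synergistic information about the phenotype beyond their marginal effects, which is the kind of signal used to flag candidate epistatic pairs.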
13. Discriminative Learning of Bayesian Networks via Factorized Conditional Log-Likelihood.
- Author
-
Carvalho, Alexandra M., Roos, Teemu, Oliveira, Arlindo L., and Myllymäki, Petri
- Subjects
- *
MACHINE learning , *BAYESIAN analysis , *PARAMETER estimation , *APPROXIMATION theory , *CLASSIFICATION , *EMPIRICAL research , *PERFORMANCE evaluation - Abstract
We propose an efficient and parameter-free scoring criterion, the factorized conditional log-likelihood (f̂CLL), for learning Bayesian network classifiers. The proposed score is an approximation of the conditional log-likelihood criterion. The approximation is devised in order to guarantee decomposability over the network structure, as well as efficient estimation of the optimal parameters, achieving the same time and space complexity as the traditional log-likelihood scoring criterion. The resulting criterion has an information-theoretic interpretation based on interaction information, which exhibits its discriminative nature. To evaluate the performance of the proposed criterion, we present an empirical comparison with state-of-the-art classifiers. Results on a large suite of benchmark data sets from the UCI repository show that f̂CLL-trained classifiers achieve at least as good accuracy as the best compared classifiers, using significantly less computational resources. [ABSTRACT FROM AUTHOR]
- Published
- 2011
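The quantity being approximated in the abstract above is the conditional log-likelihood of the class given the attributes: for a Bayesian network B and data D = {(x(i), c(i))}, it is the standard criterion below. The factorized approximation itself (f̂CLL) is paper-specific and not reproduced here.

```latex
\mathrm{CLL}(B \mid D) \;=\; \sum_{i=1}^{N} \log P_B\!\left(c^{(i)} \mid x^{(i)}\right)
\;=\; \sum_{i=1}^{N} \Bigl[\, \log P_B\!\left(c^{(i)}, x^{(i)}\right) \;-\; \log \sum_{c'} P_B\!\left(c', x^{(i)}\right) \Bigr]
```

Unlike the joint log-likelihood, this criterion does not decompose over the network structure, which is why a decomposable surrogate such as f̂CLL is attractive for structure learning.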
14. TFRank: network-based prioritization of regulatory associations underlying transcriptional responses.
- Author
-
Gonçalves, Joana P., Francisco, Alexandre P., Mira, Nuno P., Teixeira, Miguel C., Sá-Correia, Isabel, Oliveira, Arlindo L., and Madeira, Sara C.
- Subjects
- *
GENETIC transcription , *GENE expression , *GENETIC regulation , *BINDING sites , *GRAPH theory , *BREAST tumors , *METASTASIS - Abstract
Motivation: Uncovering mechanisms underlying gene expression control is crucial to understand complex cellular responses. Studies in gene regulation often aim to identify regulatory players involved in a biological process of interest, either transcription factors coregulating a set of target genes or genes eventually controlled by a set of regulators. These are frequently prioritized with respect to a context-specific relevance score. Current approaches rely on relevance measures accounting exclusively for direct transcription factor–target interactions, namely overrepresentation of binding sites or target ratios. Gene regulation has, however, intricate behavior, with overlapping, indirect effects that should not be neglected. In addition, the rapid accumulation of regulatory data already enables the prediction of large-scale networks suitable for higher-level exploration by methods based on graph theory. A paradigm shift is thus emerging, where isolated and constrained analyses will likely be replaced by whole-network, systemic-aware strategies. Results: We present TFRank, a graph-based framework to prioritize regulatory players involved in transcriptional responses within the regulatory network of an organism, whereby every regulatory path containing genes of interest is explored and incorporated into the analysis. TFRank selected important regulators of yeast adaptation to stress induced by quinine and acetic acid, which were missed by a direct-effect approach. Notably, they reportedly confer resistance toward the chemicals. In a preliminary study in human, TFRank unveiled regulators involved in breast tumor growth and metastasis when applied to genes whose expression signatures correlated with short interval to metastasis. Availability: Prototype at http://kdbio.inesc-id.pt/software/tfrank/. Contact: jpg@kdbio.inesc-id.pt; sara.madeira@ist.utl.pt. Supplementary Information: Supplementary data are available at Bioinformatics online. [ABSTRACT FROM AUTHOR]
- Published
- 2011
15. QUANTITATIVE MODELING OF THE SACCHAROMYCES CEREVISIAE FLR1 REGULATORY NETWORK USING AN S-SYSTEM FORMALISM.
- Author
-
CALÇADA, DULCE, VINGA, SUSANA, FREITAS, ANA T., and OLIVEIRA, ARLINDO L.
- Subjects
- *
SACCHAROMYCES cerevisiae , *MATHEMATICAL models , *FUNGICIDES , *TRANSCRIPTION factors , *GENETICS , *DIFFERENTIAL equations , *MATHEMATICAL decoupling - Abstract
In this study we address the problem of finding a quantitative mathematical model for the genetic network regulating the stress response of the yeast Saccharomyces cerevisiae to the agricultural fungicide mancozeb. An S-system formalism was used to model the interactions of a five-gene network encoding four transcription factors (Yap1, Yrr1, Rpn4 and Pdr3) regulating the transcriptional activation of the FLR1 gene. Parameter estimation was accomplished by decoupling the resulting system of nonlinear ordinary differential equations into a larger nonlinear algebraic system, and using the Levenberg-Marquardt algorithm to fit the model's predictions to experimental data. The introduction of constraints in the model, related to the putative topology of the network, was explored. The results show that forcing the network connectivity to adhere to this topology did not lead to better results than those obtained using an unrestricted network topology. Overall, the modeling approach achieved partial success when trained on the non-mutant datasets, although further work is required to obtain more accurate predictions of the time courses. [ABSTRACT FROM AUTHOR]
- Published
- 2011
- Full Text
- View/download PDF
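For readers unfamiliar with the formalism named above, an S-system models each species X_i with one production and one degradation power-law term; the display below is the standard general form (the gene-specific details of the FLR1 model are in the paper and are not reproduced here).

```latex
\frac{dX_i}{dt} \;=\; \alpha_i \prod_{j=1}^{n} X_j^{\,g_{ij}} \;-\; \beta_i \prod_{j=1}^{n} X_j^{\,h_{ij}},
\qquad i = 1, \dots, n
```

Parameter estimation as described in the abstract fits the rate constants α_i, β_i and the kinetic orders g_ij, h_ij; the decoupling step mentioned there turns the ODE system into algebraic equations, typically by using estimated slopes in place of the derivatives.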
16. Haplotype inference with pseudo-Boolean optimization.
- Author
-
Graça, Ana, Marques-Silva, João, Lynce, Inês, and Oliveira, Arlindo
- Subjects
- *
NUCLEOTIDE sequence , *CHROMOSOME polymorphism , *PARSIMONIOUS models , *LINEAR programming , *BRANCH & bound algorithms - Abstract
The fast development of sequencing techniques in the recent past has created an urgent need for efficient and accurate haplotype inference tools. Besides being a crucial issue in genetics, haplotype inference is also a challenging computational problem. Among others, pure parsimony is a viable modeling approach to solve the problem of haplotype inference, and also an interesting NP-hard problem in itself. Recently, the introduction of SAT-based methods, including pseudo-Boolean optimization (PBO) methods, has produced very efficient solvers. This paper provides a detailed description of RPoly, a PBO approach for the haplotype inference by pure parsimony (HIPP) problem. Moreover, an extensive evaluation of existing HIPP solvers, on a comprehensive set of instances, confirms that RPoly is currently the most efficient and robust HIPP approach. [ABSTRACT FROM AUTHOR]
- Published
- 2011
- Full Text
- View/download PDF
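To fix ideas about "haplotype inference by pure parsimony", one classic 0-1 formulation (in the spirit of Gusfield's exponential-size ILP, not RPoly's more compact pseudo-Boolean encoding described in the paper) is sketched below: x_h selects haplotype h from a candidate set H, and s_{i,{a,b}} selects the haplotype pair {a,b} used to explain genotype g_i.

```latex
\min \sum_{h \in H} x_h
\quad \text{s.t.} \quad
\sum_{\{a,b\}\ \text{explains}\ g_i} s_{i,\{a,b\}} = 1 \;\; \forall i,
\qquad
s_{i,\{a,b\}} \le x_a,\quad s_{i,\{a,b\}} \le x_b,
\qquad x_h,\ s_{i,\{a,b\}} \in \{0,1\}
```

The objective minimizes the number of distinct haplotypes used, while the constraints force every genotype to be explained by exactly one selected pair of selected haplotypes.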
17. Efficient alignment of pyrosequencing reads for re-sequencing applications.
- Author
-
Fernandes, Francisco, da Fonseca, Paulo G. S., Russo, Luis M. S., Oliveira, Arlindo L., and Freitas, Ana T.
- Subjects
- *
NUCLEOTIDE sequence , *DNA , *GENOMES , *ALGORITHMS , *GENETICS - Abstract
Background: Over the past few years, new massively parallel DNA sequencing technologies have emerged. These platforms generate massive amounts of data per run, greatly reducing the cost of DNA sequencing. However, these techniques also raise important computational difficulties, mostly due to the huge volume of data produced, but also because of some of their specific characteristics, such as read length and sequencing errors. Among the most critical problems is that of efficiently and accurately mapping reads to a reference genome in the context of re-sequencing projects. Results: We present an efficient method for the local alignment of pyrosequencing reads produced by the GS FLX (454) system against a reference sequence. Our approach explores the characteristics of the data in these re-sequencing applications and uses state-of-the-art indexing techniques combined with a flexible seed-based approach, leading to a fast and accurate algorithm that needs very little user parameterization. An evaluation performed using real and simulated data shows that our proposed method outperforms a number of mainstream tools on the quantity and quality of successful alignments, as well as on execution time. Conclusions: The proposed methodology was implemented in a software tool called TAPyR - Tool for the Alignment of Pyrosequencing Reads - which is publicly available from http://www.tapyr.net. [ABSTRACT FROM AUTHOR]
- Published
- 2011
- Full Text
- View/download PDF
18. An analysis of the positional distribution of DNA motifs in promoter regions and its biological relevance.
- Author
-
Casimiro, Ana C., Vinga, Susana, Freitas, Ana T., and Oliveira, Arlindo L.
- Subjects
- *
ALGORITHMS , *DISTRIBUTION (Probability theory) , *DNA , *PROMOTERS (Genetics) , *NUCLEOTIDE sequence , *SACCHAROMYCES cerevisiae - Abstract
Background: Motif-finding algorithms have improved in their ability to use computationally efficient methods to detect patterns in biological sequences. However, the posterior classification of the output still suffers from some limitations, which makes it difficult to assess the biological significance of the motifs found. Previous work has highlighted the existence of positional bias of motifs in the DNA sequences, which might indicate not only that the pattern is important, but also provide hints about the positions where these patterns occur preferentially. Results: We propose to integrate position uniformity tests and over-representation tests to improve the accuracy of the classification of motifs. Using artificial data, we have compared three different statistical tests (Chi-Square, Kolmogorov-Smirnov and a Chi-Square bootstrap) to assess whether a given motif occurs uniformly in the promoter region of a gene. Using the test that performed best on this dataset, we proceeded to study the positional distribution of several well-known cis-regulatory elements in the promoter sequences of different organisms (S. cerevisiae, H. sapiens, D. melanogaster, E. coli and several dicotyledonous plants). The results show that position conservation is relevant for the transcriptional machinery. Conclusion: We conclude that many biologically relevant motifs appear heterogeneously distributed in the promoter region of genes, and therefore that non-uniformity is a good indicator of biological relevance and can be used to complement the over-representation tests commonly used. In this article we present the results obtained for the S. cerevisiae data sets. [ABSTRACT FROM AUTHOR]
- Published
- 2008
- Full Text
- View/download PDF
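A position-uniformity test of the kind compared in the abstract above can be run in a few lines. The sketch below applies the Kolmogorov-Smirnov test to motif start positions rescaled to [0, 1]; the promoter length, the positions, and the significance level are made-up example values, and the paper's chi-square and bootstrap variants are not shown.

```python
import numpy as np
from scipy import stats

promoter_len = 800
# hypothetical start positions of one motif across several promoters
positions = np.array([612, 655, 643, 700, 590, 671, 688, 640, 660, 702])

# Kolmogorov-Smirnov test against the uniform distribution on [0, 1]
pvalue = stats.kstest(positions / promoter_len, "uniform").pvalue
print(f"KS p-value: {pvalue:.3g}")
if pvalue < 0.05:
    print("positional bias detected: candidate biologically relevant motif")
```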