437 results
Search Results
2. Learning Theories for Artificial Intelligence Promoting Learning Processes
- Author
-
Gibson, David, Kovanovic, Vitomir, Ifenthaler, Dirk, Dexter, Sara, and Feng, Shihui
- Abstract
This paper discusses a three-level model that synthesizes and unifies existing learning theories to model the roles of artificial intelligence (AI) in promoting learning processes. The model, drawn from developmental psychology, computational biology, instructional design, cognitive science, complexity and sociocultural theory, includes a causal learning mechanism that explains how learning occurs and works across micro, meso and macro levels. The model also explains how information gained through learning is aggregated, or brought together, as well as dissipated, or released and used within and across the levels. Fourteen roles for AI in education are proposed, aligned with the model's features: four roles at the individual or micro level, four roles at the meso level of teams and knowledge communities and six roles at the macro level of cultural historical activity. Implications for research and practice, evaluation criteria and a discussion of limitations are included. Armed with the proposed model, AI developers can focus their work with learning designers, researchers and practitioners to leverage the proposed roles to improve individual learning, team performance and building knowledge communities.
- Published
- 2023
3. Can Textbook Annotations Serve as an Early Predictor of Student Learning?
- Author
-
Winchell, Adam, Mozer, Michael, Lan, Andrew, Grimaldi, Phillip, and Pashler, Harold
- Abstract
When engaging with a textbook, students are inclined to highlight key content. Although students believe that highlighting and subsequent review of the highlights will further their educational goals, the psychological literature provides no evidence of benefits. Nonetheless, a student's choice of text for highlighting may serve as a window into their mental state--their level of comprehension, grasp of the key ideas, reading goals, etc. We explore this hypothesis via an experiment in which 198 participants read sections from a college-level biology text, briefly reviewed the text, and then took a quiz on the material. During initial reading, participants were able to highlight words, phrases, and sentences, and these highlights were displayed along with the complete text during the subsequent review. Consistent with past research, the amount of highlighted material is unrelated to quiz performance. However, our main goal is to examine highlighting as a data source for inferring student understanding. We explored multiple representations of the highlighting patterns and tested Bayesian linear regression and neural network models, but we found little or no relationship between a student's highlights and quiz performance. Our long-term goal is to design digital textbooks that serve not only as conduits of information into the mind of the reader, but also allow us to draw inferences about the reader at a point where interventions may increase the effectiveness of the material. [For the full proceedings, see ED593090.]
- Published
- 2018
4. Machine Learning and Hebrew NLP for Automated Assessment of Open-Ended Questions in Biology
- Author
-
Ariely, Moriah, Nazaretsky, Tanya, and Alexandron, Giora
- Abstract
Machine learning algorithms that automatically score scientific explanations can be used to measure students' conceptual understanding, identify gaps in their reasoning, and provide them with timely and individualized feedback. This paper presents the results of a study that uses Hebrew NLP to automatically score student explanations in Biology according to fine-grained analytic grading rubrics that were developed for formative assessment. The experimental results show that our algorithms achieve a high level of agreement with human experts, on par with previous work on automated assessment of scientific explanations in English, and that approximately 500 examples are typically enough to build reliable scoring models. The main contribution is twofold. First, we present a conceptual framework for constructing analytic grading rubrics for scientific explanations, which are composed of dichotomous categories that generalize across items. These categories are designed to support automated guidance, but can also be used to provide a composite score. Second, we apply this approach in a new context -- Hebrew, which belongs to a group of languages known as Morphologically-Rich. In languages of this group, among them also Arabic and Turkish, each input token may consist of multiple lexical and functional units, making them particularly challenging for NLP. This is the first study on automatic assessment of scientific explanations (and more generally, of open-ended questions) in Hebrew, and among the first to do so in Morphologically-Rich Languages.
- Published
- 2023
5. Meta-Analysis of EMF-Induced Pollution by COVID-19 in Virtual Teaching and Learning with an Artificial Intelligence Perspective
- Author
-
Das, Sanjita, Srivastava, Shilpa, Tripathi, Aprna, and Das, Saumya
- Abstract
Concerns about the health effects of frequent exposure to electromagnetic fields (EMF) emitted from mobile towers and handsets have been raised because of the gradual increase in the use of cell phones and the frequent installation of mobile towers. The present study targets the detrimental effects of EMF radiation on various biological systems, mainly due to the online teaching and learning process, through suppression of the immune system. During the COVID-19 pandemic, the increased use of the internet for online education and online office work led to more detrimental effects of EMF radiation. The incorporation of soft computing techniques in EMF radiation research is also presented, together with a literature review focusing on the use of soft computing techniques in the domain of EMF radiation. An online survey was conducted targeting Indian academic stakeholders (specifically teachers, students and parents, termed the population in the paper) to analyze awareness of the biohazards of EMF exposure.
- Published
- 2022
6. Exploring What Is Encoded in Distributional Word Vectors: A Neurobiologically Motivated Analysis
- Author
-
Utsumi, Akira
- Abstract
The pervasive use of distributional semantic models or word embeddings for both cognitive modeling and practical application is because of their remarkable ability to represent the meanings of words. However, relatively little effort has been made to explore what types of information are encoded in distributional word vectors. Knowing the internal knowledge embedded in word vectors is important for cognitive modeling using distributional semantic models. Therefore, in this paper, we attempt to identify the knowledge encoded in word vectors by conducting a computational experiment using Binder et al.'s (2016) featural conceptual representations based on neurobiologically motivated attributes. In an experiment, these conceptual vectors are predicted from text-based word vectors using a neural network and linear transformation, and prediction performance is compared among various types of information. The analysis demonstrates that abstract information is generally predicted more accurately by word vectors than perceptual and spatiotemporal information, and specifically, the prediction accuracy of cognitive and social information is higher. Emotional information is also found to be successfully predicted for abstract words. These results indicate that language can be a major source of knowledge about abstract attributes, and they support the recent view that emphasizes the importance of language for abstract concepts. Furthermore, we show that word vectors can capture some types of perceptual and spatiotemporal information about concrete concepts and some relevant word categories. This suggests that language statistics can encode more perceptual knowledge than often expected.
- Published
- 2020
7. Reflective Writing about the Utility Value of Science as a Tool for Increasing STEM Motivation and Retention -- Can AI Help Scale Up?
- Author
-
Beigman Klebanov, Beata, Burstein, Jill, Harackiewicz, Judith M., Priniski, Stacy J., and Mulholland, Matthew
- Abstract
The integration of subject matter learning with reading and writing skills takes place in multiple ways. Students learn to read, interpret, and write texts in the discipline-relevant genres. However, writing can be used not only for the purposes of practice in professional communication, but also as an opportunity to reflect on the learned material. In this paper, we address a writing intervention--Utility Value (UV) intervention--that has been shown to be effective for promoting interest and retention in STEM subjects in laboratory studies and field experiments. We conduct a detailed investigation into the potential of natural language processing technology to support evaluation of such writing at scale: We devise a set of features that characterize UV writing across different genres, present common themes, and evaluate UV scoring models using essays on known and new biology topics. The automated UV scoring results are, we believe, promising, especially for the personal essay genre.
- Published
- 2017
8. The International Conference on Intelligent Biology and Medicine (ICIBM) 2018: genomics with bigger data and wider applications.
- Author
-
Wu Z, Yan J, Wang K, Liu X, Guo Y, Zhi D, Ruan J, and Zhao Z
- Subjects
- Humans, Artificial Intelligence, Biology methods, Medicine methods
- Abstract
The sixth International Conference on Intelligent Biology and Medicine (ICIBM) took place in Los Angeles, California, USA on June 10-12, 2018. This conference featured eleven regular scientific sessions, four tutorials, one poster session, four keynote talks, and four eminent scholar talks. The scientific program covered a wide range of topics from bench to bedside, including 3D Genome Organization, reconstruction of large scale evolution of genomes and gene functions, artificial intelligence in biological and biomedical fields, and precision medicine. Both method development and application in genomic research continued to be a main component in the conference, including studies on genetic variants, regulation of transcription, genetic-epigenetic interaction at both single cell and tissue level and artificial intelligence. Here, we write a summary of the conference and also briefly introduce the four high quality papers selected to be published in BMC Genomics that cover novel methodology development or innovative data analysis.
- Published
- 2019
9. Deep learning enables the atomic structure determination of the Fanconi Anemia core complex from cryoEM
- Author
-
Shabih Shakeel, Frank DiMaio, Daniel P. Farrell, David Baker, Anna Lauko, Ivan Anishchenko, and Lori A. Passmore
- Subjects
Computer science, Protein subunit, fanconi anemia core complex, Interstrand crosslink, Computational biology, Biochemistry, Convolutional neural network, 03 medical and health sciences, 0302 clinical medicine, Fanconi anemia, medicine, Atomic model, General Materials Science, Protein secondary structure, 030304 developmental biology, Physics, 0303 health sciences, Crystallography, biology, business.industry, Deep learning, deep learning, General Chemistry, Condensed Matter Physics, medicine.disease, Research Papers, Ubiquitin ligase, cryoem, QD901-999, biology.protein, Protein folding, Artificial intelligence, distance predictions, business, Model building, 030217 neurology & neurosurgery
- Abstract
This paper describes a method for determining an atomic model of a protein complex using moderate-resolution cryoEM data and distance predictions from deep learning. Cryo-electron microscopy of protein complexes often leads to moderate resolution maps (4–8 Å), with visible secondary-structure elements but poorly resolved loops, making model building challenging. In the absence of high-resolution structures of homologues, only coarse-grained structural features are typically inferred from these maps, and it is often impossible to assign specific regions of density to individual protein subunits. This paper describes a new method for overcoming these difficulties that integrates predicted residue distance distributions from a deep-learned convolutional neural network, computational protein folding using Rosetta, and automated EM-map-guided complex assembly. We apply this method to a 4.6 Å resolution cryoEM map of Fanconi Anemia core complex (FAcc), an E3 ubiquitin ligase required for DNA interstrand crosslink repair, which was previously challenging to interpret as it comprises 6557 residues, only 1897 of which are covered by homology models. In the published model built from this map, only 387 residues could be assigned to the specific subunits with confidence. By building and placing into density 42 deep-learning-guided models containing 4795 residues not included in the previously published structure, we are able to determine an almost-complete atomic model of FAcc, in which 5182 of the 6557 residues were placed. The resulting model is consistent with previously published biochemical data, and facilitates interpretation of disease-related mutational data. We anticipate that our approach will be broadly useful for cryoEM structure determination of large complexes containing many subunits for which there are no homologues of known structure.
- Published
- 2020
10. Clever Zone - An Interactive Mobile Learning Aid for Advanced Level Biology Students in Sri Lanka.
- Author
-
Chandula, Yasith, Jayashantha, Chaduni Nethmini, Udayantha, Nipuna, Jayasekara, Vishwa, Kasthuriarachchi, Sanvitha, and Rajendran, Karthiga
- Subjects
ARTIFICIAL intelligence, TECHNOLOGICAL innovations, INFORMATION technology, MACHINE learning
- Abstract
This research paper discusses the mobile application developed as a learning aid for A/L Biology students to enhance their learning experiences in the field of biology, specifically focusing on microbes, animal classification, and human body systems. The main feature of this includes an Artificial Intelligence (AI) chatbot that utilizes natural language processing and image recognition technologies to help students understand key biological topics. Extensive research was conducted to align the app's content with the Advanced Level Biology syllabus, ensuring its relevance to the curriculum. The app offers a user-friendly interface with engaging visuals and a carefully curated knowledge base, allowing students to explore and study independently. User research with A/L Biology students demonstrated that the app significantly improved comprehension and memory of the subject matter, while the chatbot's ability to provide accurate information and foster interactive learning was highly rated by participants. Overall, the A/L Biology mobile app demonstrates the potential of mobile technology and AI in revolutionizing biology education, providing an easy and interesting platform for students to improve their learning outcomes and achieve academic goals. By utilizing machine learning for various functions, A/L Biology students can greatly benefit from enhanced learning of complex subjects with an accuracy of more than 90% in every learning category. [ABSTRACT FROM AUTHOR]
- Published
- 2023
11. Visual discrimination and resolution in freshwater stingrays (Potamotrygon motoro)
- Author
-
Martha M M Daniel, Laura Alvermann, Vera Schluessel, and Imke Böök
- Subjects
Brightness, Visual perception, Visual acuity, genetic structures, Physiology, Behavioral cognition, Fresh Water, Stimulus (physiology), Discrimination Learning, Behavioral Neuroscience, Cognition, Memory, Orientation, medicine, Learning, Animals, Ecology, Evolution, Behavior and Systematics, Visual resolution, Potamotrygon, Original Paper, Elasmobranch, biology, Behavior, Animal, business.industry, Shape, Pattern recognition, Memory retention, biology.organism_classification, Visual discrimination, Visual Perception, Animal Science and Zoology, Artificial intelligence, medicine.symptom, Psychology, business, Elasmobranchii
- Abstract
Potamotrygon motoro has been shown to use vision to orient in a laboratory setting and has been successfully trained in cognitive behavioral studies using visual stimuli. This study explores P. motoro’s visual discrimination abilities in the context of two-alternative forced-choice experiments, with a focus on shape and contrast, stimulus orientation, and visual resolution. Results support that stingrays are able to discriminate stimulus-presence and -absence, overall stimulus contrasts, two forms, horizontal from vertical stimulus orientations, and different colors that also vary in brightness. Stingrays tested in visual resolution experiments demonstrated a range of visual acuities from
- Published
- 2020
12. Humanization of antibodies using a machine learning approach on large-scale repertoire data
- Author
-
Claire Marks, Alissa M Hummer, Charlotte M. Deane, and Mark Chin
- Subjects
0301 basic medicine, Statistics and Probability, AcademicSubjects/SCI01060, Computer science, medicine.drug_class, Monoclonal antibody, Machine learning, computer.software_genre, Biochemistry, 03 medical and health sciences, 0302 clinical medicine, Variable domain, medicine, Set (psychology), Molecular Biology, Supplementary data, biology, business.industry, Scale (chemistry), Immunogenicity, Repertoire, Original Papers, Computer Science Applications, Computational Mathematics, 030104 developmental biology, Computational Theory and Mathematics, biology.protein, Artificial intelligence, Antibody, business, Sequence Analysis, computer, 030215 immunology
- Abstract
Motivation: Monoclonal antibody (mAb) therapeutics are often produced from non-human sources (typically murine), and can therefore generate immunogenic responses in humans. Humanization procedures aim to produce antibody therapeutics that do not elicit an immune response and are safe for human use, without impacting efficacy. Humanization is normally carried out in a largely trial-and-error experimental process. We have built machine learning classifiers that can discriminate between human and non-human antibody variable domain sequences using the large amount of repertoire data now available. Results: Our classifiers consistently outperform the current best-in-class model for distinguishing human from murine sequences, and our output scores exhibit a negative relationship with the experimental immunogenicity of existing antibody therapeutics. We used our classifiers to develop a novel, computational humanization tool, Hu-mAb, that suggests mutations to an input sequence to reduce its immunogenicity. For a set of therapeutic antibodies with known precursor sequences, the mutations suggested by Hu-mAb show substantial overlap with those deduced experimentally. Hu-mAb is therefore an effective replacement for trial-and-error humanization experiments, producing similar results in a fraction of the time. Availability and implementation: Hu-mAb (humanness scoring and humanization) is freely available to use at opig.stats.ox.ac.uk/webapps/humab. Supplementary information: Supplementary data are available at Bioinformatics online.
- Published
- 2021
13. Positional encoding in cotton-top tamarins (Saguinus oedipus)
- Author
-
Natalie Shelton-May, Jessica R. Rogge, Elisabetta Versace, and Andrea Ravignani
- Subjects
0106 biological sciences, Male, Similarity (geometry), Artificial grammar learning, Computer science, Movement, Rule learning, Experimental and Cognitive Psychology, Cotton-top tamarins, Relative position, 010603 evolutionary biology, 01 natural sciences, 050105 experimental psychology, Task (project management), 03 medical and health sciences, 0302 clinical medicine, Generalization (learning), Encoding (memory), Animals, Learning, 0501 psychology and cognitive sciences, 050102 behavioral science & comparative psychology, Non-adjacent dependency, Ecology, Evolution, Behavior and Systematics, Mathematics, Original Paper, biology, business.industry, 05 social sciences, Pattern recognition, biology.organism_classification, Saguinus oedipus, Positional rule, Female, Artificial intelligence, business, Absolute position, Saguinus, Reinforcement, Psychology, 030217 neurology & neurosurgery
- Abstract
Strategies used in artificial grammar learning can shed light on the abilities of different species to extract regularities from the environment. In the A(X)nB rule, A and B items are linked but assigned to different positional categories and separated by distractor items. Open questions are how widespread the ability to extract positional regularities from A(X)nB patterns is, which strategies are used to encode positional regularities, and whether individuals exhibit preferences for absolute or relative position encoding. We used visual arrays to investigate whether cotton-top tamarins (Saguinus oedipus) can learn this rule and which strategies they use. After training on a subset of exemplars, half of the tested monkeys successfully generalized to novel combinations. These tamarins discriminated between categories of tokens with different properties (A, B, X) and detected a positional relationship between non-adjacent items even in the presence of novel distractors. Generalization, though, was incomplete, since we observed a failure with items that during training had always been presented in reinforced arrays. The pattern of errors revealed that successful subjects used visual similarity with training stimuli to solve the task, and that tamarins extracted the relative position of As and Bs rather than their absolute position, similar to what has been observed in other species. Relative position encoding appears to be the default strategy in different tasks and taxa.
- Published
- 2019
14. Combining artificial intelligence: deep learning with Hi-C data to predict the functional effects of non-coding variants
- Author
-
Xiang-He Meng, Hong-Mei Xiao, and Hong-Wen Deng
- Subjects
Statistics and Probability, Quantitative Trait Loci, Genome-wide association study, Biology, Biochemistry, Polymorphism, Single Nucleotide, 03 medical and health sciences, 0302 clinical medicine, Deep Learning, Artificial Intelligence, Molecular Biology, 030304 developmental biology, Genetic association, Sequence (medicine), 0303 health sciences, Mechanism (biology), business.industry, Deep learning, Original Papers, Computer Science Applications, Chromatin, Computational Mathematics, Computational Theory and Mathematics, Expression quantitative trait loci, Artificial intelligence, business, 030217 neurology & neurosurgery, Reference genome, Genome-Wide Association Study
- Abstract
Motivation: Although genome-wide association studies (GWASs) have identified thousands of variants for various traits, the causal variants and the mechanisms underlying the significant loci are largely unknown. In this study, we aim to predict non-coding variants that may functionally affect translation initiation through long-range chromatin interaction. Results: By incorporating the Hi-C data, we propose a novel and powerful deep learning model of artificial intelligence to classify interacting and non-interacting fragment pairs and predict the functional effects of sequence alteration of single nucleotide on chromatin interaction and thus on gene expression. The changes in chromatin interaction probability between the reference sequence and the altered sequence reflect the degree of functional impact for the variant. The model was effective and efficient with the classification of interacting and non-interacting fragment pairs. The predicted causal SNPs that had a larger impact on chromatin interaction were more likely to be identified by GWAS and eQTL analyses. We demonstrate that an integrative approach combining artificial intelligence (deep learning) with high throughput experimental evidence of chromatin interaction leads to prioritizing the functional variants in disease- and phenotype-related loci and thus will greatly expedite uncovering the biological mechanism underlying the association identified in genomic studies. Availability and implementation: Source code used in data preparing and model training is available at the GitHub website (https://github.com/biocai/DeepHiC). Supplementary information: Supplementary data are available at Bioinformatics online.
- Published
- 2020
15. CUP-AI-Dx: A tool for inferring cancer tissue of origin and molecular subtype using RNA gene-expression data and artificial intelligence
- Author
-
Atara Posner, Andrew D Pattison, Joshy George, Carolyn A. Paisie, Stephen B. Fox, Yue Zhao, Kanwal Pratap Singh Raghav, Sheng Li, Honey V. Reddi, Anthony J. Gill, R. Krishna Murthy Karuturi, Ziwei Pan, Jens Rueter, Shiva Balachander, Richard W. Tothill, Sandeep Namburi, and William F. Flynn
- Subjects
0301 basic medicine, Microarray, Cancer-of-unknown-primary, Cell of origin, Inception model, lcsh:Medicine, Convolutional neural network, Genomics, Computational biology, Biology, General Biochemistry, Genetics and Molecular Biology, Workflow, 03 medical and health sciences, 0302 clinical medicine, Breast cancer, Artificial Intelligence, Databases, Genetic, Machine learning, Biomarkers, Tumor, Carcinoma, medicine, Humans, Neoplasm Metastasis, Gene, Cancer, Hyperparameter, lcsh:R5-920, lcsh:R, Computational Biology, Reproducibility of Results, Deep learning, General Medicine, TCGA, medicine.disease, Classification, Random forest, 030104 developmental biology, 030220 oncology & carcinogenesis, Neoplasms, Unknown Primary, RNA, Cell-of-origin, Neural Networks, Computer, lcsh:Medicine (General), Algorithms, Software, Research Paper
- Abstract
Background: Cancer of unknown primary (CUP), representing approximately 3-5% of all malignancies, is defined as metastatic cancer where a primary site of origin cannot be found despite a standard diagnostic workup. Because knowledge of a patient's primary cancer remains fundamental to their treatment, CUP patients are significantly disadvantaged and most have a poor survival outcome. Developing robust and accessible diagnostic methods for resolving cancer tissue of origin, therefore, has significant value for CUP patients. Methods: We developed an RNA-based classifier called CUP-AI-Dx that utilizes a 1D Inception convolutional neural network (1D-Inception) model to infer a tumor's primary tissue of origin. CUP-AI-Dx was trained using the transcriptional profiles of 18,217 primary tumours representing 32 cancer types from The Cancer Genome Atlas project (TCGA) and International Cancer Genome Consortium (ICGC). Gene expression data was ordered by gene chromosomal coordinates as input to the 1D-CNN model, and the model utilizes multiple convolutional kernels with different configurations simultaneously to improve generality. The model was optimized through extensive hyperparameter tuning, including different max-pooling layers and dropout settings. For 11 tumour types, we also developed a random forest model that can classify the tumour's molecular subtype according to prior TCGA studies. The optimised CUP-AI-Dx tissue of origin classifier was tested on 394 metastatic samples from 11 tumour types from TCGA and 92 formalin-fixed paraffin-embedded (FFPE) samples representing 18 cancer types from two clinical laboratories. The CUP-AI-Dx molecular subtype model was also tested on independent ovarian and breast cancer microarray datasets. Findings: CUP-AI-Dx identifies the primary site with an overall top-1-accuracy of 98.54% in cross-validation and 96.70% on a test dataset. When applied to two independent clinical-grade RNA-seq datasets generated from two different institutes from the US and Australia, our model predicted the primary site with a top-1-accuracy of 86.96% and 72.46% respectively. Interpretation: The CUP-AI-Dx predicts tumour primary site and molecular subtype with high accuracy and therefore can be used to assist the diagnostic work-up of cancers of unknown primary or uncertain origin using a common and accessible genomics platform. Funding: NIH R35 GM133562, NCI P30 CA034196, Victorian Cancer Agency Australia.
- Published
- 2020
16. Web- and Artificial Intelligence–Based Image Recognition For Sperm Motility Analysis: Verification Study
- Author
-
Ju-Ton Hsieh, Yuan-Hung Pong, Hong-Chiang Chang, Vincent F.S. Tsai, and Bin Zhuang
- Subjects
Infertility, endocrine system, Computer applications to medicine. Medical informatics, R858-859.7, Motility, Health Informatics, Semen analysis, Biology, smartphone, semen analysis, Male infertility, 03 medical and health sciences, 0302 clinical medicine, Health Information Management, medicine, Computer vision, reproductive and urinary physiology, home sperm test, Sperm motility, 030304 developmental biology, Original Paper, 0303 health sciences, 030219 obstetrics & reproductive medicine, medicine.diagnostic_test, business.industry, urogenital system, cloud computing, Motile sperm, artificial intelligence, medicine.disease, Sperm, telemedicine, Artificial intelligence, business, Treatment monitoring
- Abstract
Background: Human sperm quality fluctuates over time. Therefore, it is crucial for couples preparing for natural pregnancy to monitor sperm motility. Objective: This study verified the performance of an artificial intelligence–based image recognition and cloud computing sperm motility testing system (Bemaner, Createcare) composed of microscope and microfluidic modules and designed to adapt to different types of smartphones. Methods: Sperm videos were captured and uploaded to the cloud with an app. Analysis of sperm motility was performed by an artificial intelligence–based image recognition algorithm, and the results were then displayed. According to the number of motile sperm in the vision field, 47 (deidentified) videos of sperm were scored using 6 grades (0-5) by a male-fertility expert with 10 years of experience. Pearson product-moment correlation was calculated between the grades and the results (concentration of total sperm, concentration of motile sperm, and motility percentage) computed by the system. Results: Good correlation was demonstrated between the grades and results computed by the system for concentration of total sperm (r=0.65, P Conclusions: This smartphone-based sperm motility test (Bemaner) accurately measures motility-related parameters and could potentially be applied toward the following fields: male infertility detection, sperm quality test during preparation for pregnancy, and infertility treatment monitoring. With frequent at-home testing, more data can be collected to help make clinical decisions and to conduct epidemiological research.
- Published
- 2020
17. Deep learning-based clustering robustly identified two classes of sepsis with both prognostic and predictive values
- Author
-
Lifeng Xing, Pengpeng Chen, Huiqing Ge, Qing Pan, Yucai Hong, and Zhongheng Zhang
- Subjects
Male, Research paper, lcsh:Medicine, Computational biology, Biology, General Biochemistry, Genetics and Molecular Biology, Sepsis, Endotype, Deep Learning, Predictive Value of Tests, medicine, Cluster Analysis, Humans, Cluster analysis, Gene expression omnibus, lcsh:R5-920, business.industry, Deep learning, Gene Expression Profiling, lcsh:R, Computational Biology, General Medicine, Autoencoder, medicine.disease, Prognosis, Predictive value, Gene expression profiling, ROC Curve, Cohort, Female, Artificial intelligence, Disease Susceptibility, business, lcsh:Medicine (General), Transcriptome, Algorithms, Biomarkers
- Abstract
Background: Sepsis is a heterogeneous syndrome, and an individualized management strategy is the key to successful treatment. Genome-wide expression profiling has been utilized for identifying subclasses of sepsis, but the clinical utility of these subclasses has been limited by classification instability and the lack of a robust class prediction model with extensive external validation. The study aimed to develop a parsimonious class model for the prediction of class membership and validate the model for its prognostic and predictive capability in external datasets. Methods: The Gene Expression Omnibus (GEO) and ArrayExpress databases were searched from inception to April 2020. Datasets containing whole blood gene expression profiling in adult sepsis patients were included. An autoencoder was used to extract representative features for k-means clustering. Genetic algorithms (GA) were employed to derive a parsimonious 5-gene class prediction model. The class model was then applied to external datasets (n = 780) to evaluate its prognostic and predictive performance. Findings: A total of 12 datasets involving 1613 patients were included. Two classes were identified in the discovery cohort (n = 685). Class 1 was characterized by immunosuppression with higher mortality than class 2 (21.8% [70/321] vs. 12.1% [44/364]; p < 0.01 for Chi-square test). A 5-gene class model (C14orf159, AKNA, PILRA, STOM and USP4) was developed with GA. In external validation cohorts, the 5-gene class model (AUC: 0.707; 95% CI: 0.664 – 0.750) performed better in predicting mortality than the sepsis response signature (SRS) endotypes (AUC: 0.610; 95% CI: 0.521 – 0.700), and performed equivalently to the APACHE II score (AUC: 0.681; 95% CI: 0.595 – 0.767). In the dataset E-MTAB-7581, the use of hydrocortisone was associated with increased risk of mortality (OR: 3.15 [1.13, 8.82]; p = 0.029) in class 2. The effect was not statistically significant in class 1 (OR: 1.88 [0.70, 5.09]; p = 0.211).
Interpretation: Our study identified two classes of sepsis that showed different mortality rates and responses to hydrocortisone therapy. Class 1 was characterized by immunosuppression with a higher mortality rate than class 2. We further developed a 5-gene class model to predict class membership. Funding: The study was funded by the National Natural Science Foundation of China (Grant No. 81901929).
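The mortality comparison reported in the Findings (70/321 vs. 44/364, Chi-square test) can be checked with a short worked example. The sketch below computes the Pearson chi-square statistic for a 2x2 table without continuity correction; whether the authors applied a correction is not stated, so that choice is an assumption, as are the function and variable names.

```python
def chi_square_2x2(a, b, c, d):
    """Pearson chi-square statistic (no continuity correction) for [[a, b], [c, d]]."""
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    observed = [a, b, c, d]
    # Expected counts under independence of row and column.
    expected = [row1 * col1 / n, row1 * col2 / n, row2 * col1 / n, row2 * col2 / n]
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Counts taken from the abstract: deaths / patients in each class.
deaths_1, total_1 = 70, 321
deaths_2, total_2 = 44, 364

mortality_1 = 100 * deaths_1 / total_1
mortality_2 = 100 * deaths_2 / total_2

chi2 = chi_square_2x2(deaths_1, total_1 - deaths_1, deaths_2, total_2 - deaths_2)

# Critical value of the chi-square distribution with 1 degree of freedom
# at the 0.01 significance level.
CRITICAL_P01_DF1 = 6.635

print(f"class 1 mortality: {mortality_1:.1f}%")
print(f"class 2 mortality: {mortality_2:.1f}%")
print(f"chi-square = {chi2:.2f}, exceeds 0.01 critical value: {chi2 > CRITICAL_P01_DF1}")
```

The statistic exceeds the 0.01 critical value for one degree of freedom, which agrees with the reported p < 0.01.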
- Published
- 2020
18. Classification and Recall With Binary Hyperdimensional Computing: Tradeoffs in Choice of Density and Mapping Characteristics.
- Author
-
Kleyko, Denis, Rahimi, Abbas, Rachkovskij, Dmitri A., Osipov, Evgeny, and Rabaey, Jan M.
- Subjects
ARTIFICIAL neural networks ,MACHINE learning ,ARTIFICIAL intelligence
- Abstract
Hyperdimensional (HD) computing is a promising paradigm for future intelligent electronic appliances operating at low power. This paper discusses tradeoffs in selecting the parameters of binary HD representations when applied to pattern recognition tasks. Particular design choices include the density of representations and strategies for mapping data from the original representation. It is demonstrated that for the considered pattern recognition tasks (using synthetic and real-world data) both sparse and dense representations behave nearly identically. This paper also discusses implementation peculiarities which may favor one type of representation over the other. Finally, the capacity of representations of various densities is discussed. [ABSTRACT FROM AUTHOR]
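To make the idea of a binary HD representation concrete, the following minimal sketch builds random binary hypervectors with a chosen density, bundles three of them by component-wise majority vote, and compares Hamming distances. All names and parameter values are illustrative assumptions rather than the paper's setup; majority-vote bundling is the usual choice for dense vectors (density 0.5), and sparse representations typically need a different bundling rule.

```python
import random


def random_hypervector(dim, density, rng):
    """Binary vector of length dim in which each bit is 1 with probability density."""
    return [1 if rng.random() < density else 0 for _ in range(dim)]


def bundle(vectors):
    """Component-wise majority vote over an odd number of binary vectors."""
    threshold = len(vectors) / 2
    return [1 if sum(bits) > threshold else 0 for bits in zip(*vectors)]


def hamming_distance(a, b):
    """Number of positions at which two binary vectors differ."""
    return sum(x != y for x, y in zip(a, b))


rng = random.Random(0)   # fixed seed so the sketch is reproducible
DIM = 10_000             # illustrative dimensionality
DENSITY = 0.5            # dense representation: about half of the bits are 1

items = [random_hypervector(DIM, DENSITY, rng) for _ in range(3)]
prototype = bundle(items)
unrelated = random_hypervector(DIM, DENSITY, rng)

dist_to_member = hamming_distance(prototype, items[0])
dist_to_unrelated = hamming_distance(prototype, unrelated)
print(dist_to_member, dist_to_unrelated)
```

The bundled prototype stays much closer to each of its members than to an unrelated random vector, which is the property that classification and recall in HD computing rely on.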
- Published
- 2018
- Full Text
- View/download PDF
19. New methods of removing debris and high-throughput counting of cyst nematode eggs extracted from field soil
- Author
-
Upender Kalwa, Gregory L. Tylka, Santosh Pandey, Elizabeth Wlezien, and Christopher Legner
- Subjects
0106 biological sciences ,Sucrose ,Microfluidics ,Soybean cyst nematode ,Holography ,Video Recording ,Centrifugation ,Disaccharides ,01 natural sciences ,Quantitative Biology - Quantitative Methods ,Machine Learning ,Soil ,Filter Paper ,Computer software ,Quantitative Methods (q-bio.QM) ,Video recording ,0303 health sciences ,Multidisciplinary ,biology ,Heterodera ,Organic Compounds ,Image and Video Processing (eess.IV) ,Laboratory Equipment ,Horticulture ,Separation Processes ,Chemistry ,Physical Sciences ,embryonic structures ,Engineering and Technology ,Medicine ,Fluidics ,Algorithms ,Research Article ,Density Gradient Centrifugation ,Computer and Information Sciences ,Soil test ,Imaging Techniques ,Science ,Carbohydrates ,Equipment ,Research and Analysis Methods ,Computer Software ,03 medical and health sciences ,Deep Learning ,Artificial Intelligence ,Field soil ,FOS: Electrical engineering, electronic engineering, information engineering ,Animals ,Tylenchoidea ,030304 developmental biology ,Ovum ,Organic Chemistry ,Chemical Compounds ,Electrical Engineering and Systems Science - Image and Video Processing ,biology.organism_classification ,Debris ,Nematode ,FOS: Biological sciences ,Parasitology ,Software ,010606 plant biology & botany
- Abstract
The soybean cyst nematode (SCN), Heterodera glycines, is the most damaging pathogen of soybeans in the United States. To assess the severity of nematode infestations in the field, SCN egg population densities are determined. Cysts (dead females) of the nematode must be extracted from soil samples and then ground to extract the eggs within. Sucrose centrifugation is commonly used to separate debris from suspensions of extracted nematode eggs. We present a method using OptiPrep as a density gradient medium with improved separation and recovery of extracted eggs compared to the sucrose centrifugation technique. In addition, computerized methods were developed to automate the identification and counting of nematode eggs from the processed samples. In one approach, a high-resolution scanner was used to take static images of extracted eggs and debris on filter papers, and a deep learning network was trained to identify and count the eggs among the debris. In the second approach, a lensless imaging setup was developed using off-the-shelf components, and the processed egg samples were passed through a microfluidic flow chip made from double-sided adhesive tape. Holographic videos were recorded of the passing eggs and debris, and the videos were reconstructed and processed by a custom software program to obtain egg counts. The performance of the software programs for egg counting was characterized with SCN-infested soil collected from two farms, and the results using these methods were compared with those obtained through manual counting.
- Published
- 2019
20. Is it the time of autophagy fine-tuners for neuroprotection?
- Author
-
David Romeo-Guitart, Sara Marmolejo-Martínez-Artesero, and Caty Casas
- Subjects
0301 basic medicine ,Acamprosate ,Biology ,Neuroprotection ,Rats, Sprague-Dawley ,03 medical and health sciences ,Phosphatidylinositol 3-Kinases ,PARP1 ,Sirtuin 1 ,Neuronal damage ,Artificial Intelligence ,Ribavirin ,medicine ,Autophagy ,Animals ,SIRT1/AKT/FOXO3a ,neonatal motoneurons ,Molecular Biology ,Cells, Cultured ,Motor Neurons ,030102 biochemistry & molecular biology ,Neurodegeneration ,Forkhead Box Protein O3 ,Cell Biology ,medicine.disease ,Autophagic Punctum ,Rats ,Disease Models, Animal ,Drug Combinations ,030104 developmental biology ,Neuroprotective Agents ,neuroprotection ,NeuroHeal ,Neuroscience ,Research Paper ,Signal Transduction
- Abstract
Rationale: Protective mechanisms allow healthy neurons to cope with diverse stresses. Excessive damage as well as aging can lead to defective functioning of these mechanisms. We recently designed NeuroHeal using artificial intelligence with the goal of bolstering endogenous neuroprotective mechanisms. Understanding the key nodes involved in neuroprotection will allow us to identify even more effective strategies for treatment of neurodegenerative diseases. Methods: We used a model of peripheral nerve axotomy in rat pups, which induces retrograde apoptotic death of motoneurons. Nursing mothers received treatment with vehicle, NeuroHeal, or NeuroHeal plus nicotinamide, an inhibitor of sirtuins, and the pups were analyzed by immunohistochemistry, electron microscopy, and immunoblotting. In vitro, the post-translational status of proteins of interest was detailed using organotypic spinal cord cultures and genetic modifications in cell lines to unravel the neuroprotective mechanisms involved. Results: We found that the concomitant activation of the NAD+-dependent deacetylase SIRT1 and the PI3K/AKT signaling pathway converges to increase the presence of deacetylated and phosphorylated FOXO3a, a transcription factor, in the nucleus. This favors the activation of autophagy, a pro-survival process, and prevents pro-apoptotic PARP1/2 cleavage. Major conclusion: NeuroHeal is a neuroprotective agent for neonatal motoneurons that fine-tunes autophagy by converging on the SIRT1/AKT/FOXO3a axis. NeuroHeal is a combination of repurposed drugs, which makes it ready for prospective pediatric use.
- Published
- 2020
21. Tumour budding, poorly differentiated clusters, and T-cell response in colorectal cancer
- Author
-
David J. Papke, Tomotaka Ugai, Shanshan Shi, Jeffrey A. Meyerhardt, Melissa Zhao, Annacarolina da Silva, Hongmei Nan, Jonathan A. Nowak, Marios Giannakis, Shuji Ogino, Naohiko Akimoto, Tyler S. Twombly, Andrew T. Chan, Xuehong Zhang, Mai Chan Lau, Mingyang Song, Koichiro Haruki, Simeng Gu, Juha P. Väyrynen, Kenji Fujiyoshi, Kota Arima, Jennifer Borowsky, Kana Wu, Jochen K. Lennerz, Junko Kishikawa, and Charles S. Fuchs
- Subjects
0301 basic medicine ,Research paper ,Colorectal cancer ,epithelial mesenchymal transition ,lcsh:Medicine ,PDCs, poorly differentiated clusters ,artificial intelligence, clinical outcomes ,0302 clinical medicine ,HPFS, Health Professionals Follow-up Study ,PCR, polymerase chain reaction ,Cytotoxic T cell ,host-tumour interaction ,lcsh:R5-920 ,Tissue microarray ,ITBCC, International Tumour Budding Consensus Conference ,TNM, tumour, node, and metastases ,General Medicine ,MSI, microsatellite instability ,artificial intelligence ,Prognosis ,clinical outcomes ,FFPE, formalin-fixed paraffin-embedded ,CMS, consensus molecular subtype ,030220 oncology & carcinogenesis ,AJCC, American Joint Committee on Cancer ,Adenocarcinoma ,lcsh:Medicine (General) ,Colorectal Neoplasms ,Biology ,EMT, epithelial mesenchymal transition ,PTPRC ,General Biochemistry, Genetics and Molecular Biology ,03 medical and health sciences ,Molecular pathological epidemiology ,medicine ,TMA, tissue microarray ,NHS, Nurses’ Health Study ,Humans ,LINE-1, long-interspersed nucleotide element-1 ,adenocarcinoma ,lcsh:R ,Microsatellite instability ,IPW, inverse probability weighting ,medicine.disease ,HR, hazard ratio ,CI, confidence interval ,OR, odds ratio ,030104 developmental biology ,molecular pathological epidemiology ,CIMP, CpG island methylator phenotype ,Cancer research ,biology.protein ,SD, standard deviation ,CD8
- Abstract
Background/Objectives: Tumour budding and poorly differentiated clusters (PDC) represent forms of tumour invasion. We hypothesised that T-cell densities (reflecting adaptive anti-tumour immunity) might be inversely associated with tumour budding and PDC in colorectal carcinoma. Methods: Utilising 915 colon and rectal carcinomas in two U.S.-wide prospective cohort studies, and multiplex immunofluorescence combined with machine learning algorithms, we assessed CD3, CD4, CD8, CD45RO (PTPRC), and FOXP3 co-expression patterns in lymphocytes. Tumour budding and PDC at invasive fronts were quantified by digital pathology and image analysis using the International Tumour Budding Consensus Conference criteria. Using covariate data of 4,420 incident colorectal cancer cases, inverse probability weighting (IPW) was integrated with multivariable logistic regression analysis that assessed the association of T-cell subset densities with tumour budding and PDC while adjusting for selection bias due to tissue availability and potential confounders, including microsatellite instability status. Findings: Tumour budding counts were inversely associated with density of CD3+CD8+ [lowest vs. highest: multivariable odds ratio (OR), 0.50; 95% confidence interval (CI), 0.35–0.70; Ptrend < 0.001] and CD3+CD8+CD45RO+ cells (lowest vs. highest: multivariable OR, 0.44; 95% CI, 0.31–0.63; Ptrend < 0.001) in the tumour epithelial region. Tumour budding levels were associated with higher colorectal cancer-specific mortality (multivariable hazard ratio, 2.13; 95% CI, 1.57–2.89; Ptrend < 0.001) in Cox regression analysis. There were no significant associations of PDC with T-cell subsets. Interpretation: Tumour epithelial naïve and memory cytotoxic T cell densities are inversely associated with tumour budding at invasive fronts, suggesting that cytotoxic anti-tumour immunity suppresses tumour microinvasion.
- Published
- 2020
22. Enzyme promiscuity prediction using hierarchy-informed multi-label classification
- Author
-
Michael C. Hughes, Soha Hassoun, and Gian Marco Visani
- Subjects
FOS: Computer and information sciences ,Statistics and Probability ,Computer Science - Machine Learning ,AcademicSubjects/SCI01060 ,Computer science ,Machine learning ,computer.software_genre ,Biochemistry ,Machine Learning (cs.LG) ,03 medical and health sciences ,Cell Behavior (q-bio.CB) ,Code (cryptography) ,Molecule ,Molecular Biology ,030304 developmental biology ,Multi-label classification ,chemistry.chemical_classification ,0303 health sciences ,Hierarchy (mathematics) ,biology ,business.industry ,Systems Biology ,030302 biochemistry & molecular biology ,Frame (networking) ,Enzyme Commission number ,Original Papers ,Computer Science Applications ,Computational Mathematics ,Enzyme ,Computational Theory and Mathematics ,chemistry ,FOS: Biological sciences ,biology.protein ,Quantitative Biology - Cell Behavior ,Enzyme promiscuity ,Artificial intelligence ,business ,computer
- Abstract
As experimental efforts are costly and time consuming, computational characterization of enzyme capabilities is an attractive alternative. We present and evaluate several machine-learning models to predict which of 983 distinct enzymes, as defined via Enzyme Commission (EC) numbers, are likely to interact with a given query molecule. Our data consist of enzyme-substrate interactions from the BRENDA database. Some interactions are attributed to natural selection and involve the enzyme's natural substrates. The majority of the interactions, however, involve non-natural substrates, thus reflecting promiscuous enzymatic activities. We frame this enzyme promiscuity prediction problem as a multi-label classification task. We maximally utilize inhibitor and unlabelled data to train prediction models that can take advantage of known hierarchical relationships between enzyme classes. We report that a hierarchical multi-label neural network, EPP-HMCNF, is the best model for solving this problem, outperforming k-nearest neighbors similarity-based and other machine learning models. We show that inhibitor information during training consistently improves predictive power, particularly for EPP-HMCNF. We also show that all promiscuity prediction models perform worse under a realistic data split than under a random data split, and worse on non-natural substrates than on natural substrates. We provide Python code for EPP-HMCNF and other models in a repository termed EPP (Enzyme Promiscuity Prediction) at https://github.com/hassounlab/EPP. Presented as a poster at the 2019 Machine Learning for Computational Biology Symposium, Vancouver, CA. Accepted for publication, Bioinformatics, Jan 22, 2021.
- Published
- 2020
23. Quantitative scoring of epithelial and mesenchymal qualities of cancer cells using machine learning and quantitative phase imaging
- Author
-
Thanh Nguyen, Vy Bui, George Nehmetallah, Van K. Lam, Christopher B. Raub, Lin-Ching Chang, and Byung Min Chung
- Subjects
Paper ,Cell ,Biomedical Engineering ,Gingiva ,Holography ,epithelial ,Breast Neoplasms ,Biology ,mesenchymal ,Machine learning ,computer.software_genre ,01 natural sciences ,Imaging ,010309 optics ,Biomaterials ,Machine Learning ,Breast cancer ,0103 physical sciences ,medicine ,Humans ,support vector machine ,quantitative phase ,Fibroblast ,business.industry ,Mesenchymal stem cell ,Cancer ,Epithelial Cells ,Mesenchymal Stem Cells ,Fibroblasts ,medicine.disease ,Phenotype ,Atomic and Molecular Physics, and Optics ,Electronic, Optical and Magnetic Materials ,medicine.anatomical_structure ,Cell culture ,Cancer cell ,cancer cells ,MCF-7 Cells ,Female ,Artificial intelligence ,business ,computer ,Algorithms
- Abstract
Significance: We introduce an application of machine learning trained on optical phase features of epithelial and mesenchymal cells to grade cancer cells’ morphologies, relevant to evaluation of cancer phenotype in screening assays and clinical biopsies. Aim: Our objective was to determine quantitative epithelial and mesenchymal qualities of breast cancer cells through an unbiased, generalizable, and linear score covering the range of observed morphologies. Approach: Digital holographic microscopy was used to generate phase height maps of noncancerous epithelial (Gie-No3B11) and fibroblast (human gingival) cell lines, as well as MDA-MB-231 and MCF-7 breast cancer cell lines. Several machine learning algorithms were evaluated as binary classifiers of the noncancerous cells that graded the cancer cells by transfer learning. Results: Epithelial and mesenchymal cells were classified with 96% to 100% accuracy. Breast cancer cells had scores in between the noncancer scores, indicating both epithelial and mesenchymal morphological qualities. The MCF-7 cells skewed toward epithelial scores, while MDA-MB-231 cells skewed toward mesenchymal scores. Linear support vector machines (SVMs) produced the most distinct score distributions for each cell line. Conclusions: The proposed epithelial–mesenchymal score, derived from linear SVM learning, is a sensitive and quantitative approach for detecting epithelial and mesenchymal characteristics of unknown cells based on well-characterized cell lines. We establish a framework for rapid and accurate morphological evaluation of single cells and subtle phenotypic shifts in imaged cell populations.
- Published
- 2020
24. Quantitative single-cell transcriptomics
- Author
-
Wolfgang Enard, Ines Hellmann, Christoph Ziegenhain, Swati Parekh, and Beate Vieth
- Subjects
0301 basic medicine ,Normalization (statistics) ,Differential expression analysis ,DNA, Complementary ,Process (engineering) ,Single cell transcriptomics ,power analysis ,Cell Separation ,Biology ,Machine learning ,computer.software_genre ,Biochemistry ,03 medical and health sciences ,transcriptomics ,0302 clinical medicine ,Genetics ,Molecular Biology ,Research question ,Data processing ,single-cell RNA-seq ,business.industry ,Sequence Analysis, RNA ,Gene Expression Profiling ,General Medicine ,Molecular network ,030104 developmental biology ,normalization ,differential expression analysis ,Papers ,Benchmark (computing) ,Artificial intelligence ,Single-Cell Analysis ,business ,computer ,030217 neurology & neurosurgery
- Abstract
Single-cell RNA sequencing (scRNA-seq) is currently transforming our understanding of biology, as it is a powerful tool to resolve cellular heterogeneity and molecular networks. Over 50 protocols have been developed in recent years, and data processing and analysis tools are also evolving fast. Here, we review the basic principles underlying the different experimental protocols and how to benchmark them. We also review and compare the essential methods to process scRNA-seq data, from mapping, filtering, normalization and batch correction to basic differential expression analysis. We hope that this helps readers choose appropriate experimental and computational methods for the research question at hand.
- Published
- 2018
25. Machine learning for classifying tuberculosis drug-resistance from DNA sequencing data
- Author
-
Daniel J. Wilson, A. Sarah Walker, Timothy M Walker, Zamin Iqbal, E. Grace Smith, Tim E. A. Peto, Yang Yang, Tingting Zhu, Derrick W. Crook, Katherine E. Niehaus, and David A. Clifton
- Subjects
0301 basic medicine ,Statistics and Probability ,Ofloxacin ,Tuberculosis ,030106 microbiology ,Moxifloxacin ,Antitubercular Agents ,Drug resistance ,Microbial Sensitivity Tests ,Machine learning ,computer.software_genre ,Biochemistry ,Mycobacterium tuberculosis ,Machine Learning ,03 medical and health sciences ,Ciprofloxacin ,Tuberculosis, Multidrug-Resistant ,medicine ,Isoniazid ,Humans ,Molecular Biology ,Ethambutol ,biology ,business.industry ,Sequence Analysis, DNA ,Pyrazinamide ,medicine.disease ,biology.organism_classification ,bacterial infections and mycoses ,Original Papers ,3. Good health ,Computer Science Applications ,Computational Mathematics ,030104 developmental biology ,Computational Theory and Mathematics ,Streptomycin ,Artificial intelligence ,Rifampin ,business ,computer ,Sequence Analysis ,Rifampicin ,medicine.drug
- Abstract
Motivation: Correct and rapid determination of Mycobacterium tuberculosis (MTB) resistance against available tuberculosis (TB) drugs is essential for the control and management of TB. Conventional molecular diagnostic tests assume that the presence of any well-studied single nucleotide polymorphism is sufficient to cause resistance, which yields low sensitivity for resistance classification. Summary: Given the availability of DNA sequencing data from MTB, we developed machine learning models for a cohort of 1839 UK bacterial isolates to classify MTB resistance against eight anti-TB drugs (isoniazid, rifampicin, ethambutol, pyrazinamide, ciprofloxacin, moxifloxacin, ofloxacin, streptomycin) and to classify multi-drug resistance. Results: Compared to the previous rules-based approach, the sensitivities from the best-performing models increased by 2-4% for isoniazid, rifampicin and ethambutol to 97% (P < 0.01), respectively; for ciprofloxacin and multi-drug resistant TB, they increased to 96%. For moxifloxacin and ofloxacin, sensitivities increased by 12 and 15% from 83 and 81% based on existing known resistance alleles to 95% and 96% (P < 0.01), respectively. In particular, our models improved sensitivities compared to the previous rules-based approach by 15 and 24% to 84 and 87% for pyrazinamide and streptomycin (P < 0.01), respectively. The best-performing models increase the area under the ROC curve by 10% for pyrazinamide and streptomycin (P < 0.01), and 4–8% for other drugs (P < 0.01). Availability and implementation: The details of source code are provided at http://www.robots.ox.ac.uk/~davidc/code.php. Supplementary information: Supplementary data are available at Bioinformatics online.
- Published
- 2017
26. CIPPN: computational identification of protein pupylation sites by using neural network
- Author
-
Zhu-Hong You, Wenzheng Bao, and De-Shuang Huang
- Subjects
0301 basic medicine ,disease ,030102 biochemistry & molecular biology ,Artificial neural network ,business.industry ,Systems biology ,Protein pupylation ,Biology ,Machine learning ,computer.software_genre ,Field (computer science) ,post translational modification ,03 medical and health sciences ,Identification (information) ,030104 developmental biology ,Information engineering ,Oncology ,classification ,Key (cryptography) ,Posttranslational modification ,Artificial intelligence ,business ,computer ,Research Paper
- Abstract
Wenzheng Bao 1,*, Zhu-Hong You 2,* and De-Shuang Huang 1. 1 Institute of Machine Learning and Systems Biology, School of Electronics and Information Engineering, Tongji University, Shanghai, China. 2 Xinjiang Technical Institutes of Physics and Chemistry, Chinese Academy of Science, Urumqi 830011, China. * The first two authors should be regarded as joint first authors. Correspondence to: De-Shuang Huang, email: dshuang@tongji.edu.cn. Keywords: disease; post translational modification; classification. Received: July 14, 2017. Accepted: September 03, 2017. Published: November 06, 2017.
Abstract: Recently, experiments revealed pupylation to be a signal for the selective regulation of proteins in several serious human diseases. As one of the most significant post-translational modifications in biology and disease, pupylation can play a key role in regulating the biological processes of various diseases. Effective identification of this type of modification will help in understanding how proteins perform their biological functions and the underlying molecular mechanism, which is the foundation of drug design. Existing algorithms for identifying such modified sites often have defects, such as low accuracy and long running times. In this research, the pupylation site identification model, CIPPN, demonstrates better performance than other existing approaches in this field. The proposed predictor achieves an Acc value of 89.12 and an Mcc value of 0.7949 in 10-fold cross-validation tests on the Pupdb database (http://cwtung.kmu.edu.tw/pupdb). Significantly, the algorithm not only investigates the sequential, structural and evolutionary hallmarks around pupylation sites but also compares the differences of pupylation from the environmental, conservative and functional characterization of substrates. Therefore, the proposed feature description approach and algorithm prove useful for further experimental investigation of pupylation site identification.
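The two metrics quoted in the abstract, accuracy (Acc) and the Matthews correlation coefficient (Mcc), can both be computed from a binary confusion matrix. The sketch below uses the standard textbook definitions; the function names and the confusion-matrix counts are hypothetical and do not reproduce the CIPPN results.

```python
import math


def accuracy(tp, tn, fp, fn):
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + tn + fp + fn)


def matthews_corrcoef(tp, tn, fp, fn):
    """Matthews correlation coefficient for a binary confusion matrix."""
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        # Common convention when a row or column of the matrix is empty.
        return 0.0
    return (tp * tn - fp * fn) / denominator


# Hypothetical counts for predicted pupylation sites versus non-sites.
tp, tn, fp, fn = 90, 80, 20, 10
acc = accuracy(tp, tn, fp, fn)
mcc = matthews_corrcoef(tp, tn, fp, fn)
print(f"Acc = {100 * acc:.2f}, Mcc = {mcc:.4f}")
```

Unlike accuracy, Mcc uses all four cells of the confusion matrix, so it remains informative when modified and unmodified sites are imbalanced.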
- Published
- 2017
27. Computational tools for plant small RNA detection and categorization
- Author
-
Frank Johannes, Lionel Morgado, and Bioinformatics
- Subjects
Paper ,Small RNA ,sRNA function prediction ,0206 medical engineering ,TARGET PREDICTION ,02 engineering and technology ,Biology ,Machine learning ,computer.software_genre ,MIRNA ,CLASSIFICATION ,Machine Learning ,03 medical and health sciences ,sRNA sequencing ,RNA, Small Interfering ,Molecular Biology ,DNA METHYLATION ,030304 developmental biology ,0303 health sciences ,IDENTIFICATION ,business.industry ,Sequence Analysis, RNA ,MICRORNA ,Cellular Regulation ,Computational Biology ,High-Throughput Nucleotide Sequencing ,sRNA structural features ,Plants ,ARABIDOPSIS ,ddc ,Plant development ,MicroRNAs ,SEQ ,Categorization ,RNA, Plant ,WEB SERVER ,NATURAL ANTISENSE TRANSCRIPTS ,Identification (biology) ,Artificial intelligence ,business ,computer ,small RNA categorization ,020602 bioinformatics ,Algorithms ,Software ,Information Systems
- Abstract
Small RNAs (sRNAs) are important short-length molecules with regulatory functions essential for plant development and plasticity. High-throughput sequencing of total sRNA populations has revealed that the largest share of sRNA remains uncategorized. To better understand the role of sRNA-mediated cellular regulation, it is necessary to create accurate and comprehensive catalogues of sRNA and their sequence features, a task that currently relies on nontrivial bioinformatic approaches. Although a large number of computational tools have been developed to predict features of sRNA sequences, these tools are mostly dedicated to microRNAs and none integrates the functionalities necessary to describe units from all sRNA pathways thus far discovered in plants. Here, we review the different classes of sRNA found in plants and describe available bioinformatics tools that can help in their detection and categorization.
- Published
- 2019
28. Antibody Complementarity Determining Region Design Using High-Capacity Machine Learning
- Author
-
Ziheng Wang, Michael E. Birnbaum, Stefan Ewert, Haoyang Zeng, Brandon Carter, Ge Liu, Jonas Schilz, David K. Gifford, Jonas Mueller, and Geraldine Horny
- Subjects
Statistics and Probability ,Phage display ,Computer science ,030303 biophysics ,Complementarity determining region ,Machine learning ,computer.software_genre ,Biochemistry ,Human Immunoglobulin G ,Antibodies ,Machine Learning ,03 medical and health sciences ,0302 clinical medicine ,Code (cryptography) ,Molecule ,Humans ,Molecular Biology ,030304 developmental biology ,0303 health sciences ,Protein therapeutics ,biology ,business.industry ,High capacity ,Accession number (bioinformatics) ,Antigen binding ,Complementarity Determining Regions ,Original Papers ,Structural Bioinformatics ,In vitro ,3. Good health ,Computer Science Applications ,Computational Mathematics ,Computational Theory and Mathematics ,030220 oncology & carcinogenesis ,Path (graph theory) ,biology.protein ,Artificial intelligence ,Antibody ,business ,computer
- Abstract
Motivation: The precise targeting of antibodies and other protein therapeutics is required for their proper function and the elimination of deleterious off-target effects. Often the molecular structure of a therapeutic target is unknown and randomized methods are used to design antibodies without a model that relates antibody sequence to desired properties. Results: Here, we present Ens-Grad, a machine learning method that can design complementarity determining regions of human Immunoglobulin G antibodies with target affinities that are superior to candidates derived from phage display panning experiments. We also demonstrate that machine learning can improve target specificity by the modular composition of models from different experimental campaigns, enabling a new integrative approach to improving target specificity. Our results suggest a new path for the discovery of therapeutic molecules by demonstrating that predictive and differentiable models of antibody binding can be learned from high-throughput experimental data without the need for target structural data. Availability and implementation: Sequencing data of the phage panning experiment are deposited at NIH’s Sequence Read Archive (SRA) under the accession number SRP158510. We make our code available at https://github.com/gifford-lab/antibody-2019. Supplementary information: Supplementary data are available at Bioinformatics online.
- Published
- 2019
- Full Text
- View/download PDF
29. Dual Laplacian regularized matrix completion for microRNA-disease associations prediction
- Author
-
Xiaofeng Sha, Chang Tang, Hua Zhou, Xiao Zheng, and Yanming Zhang
- Subjects
Male ,Lymphoma ,Biology ,Machine learning ,computer.software_genre ,Sensitivity and Specificity ,Cross-validation ,Task (project management) ,03 medical and health sciences ,Matrix (mathematics) ,0302 clinical medicine ,Semantic similarity ,Humans ,Computer Simulation ,Genetic Predisposition to Disease ,Molecular Biology ,030304 developmental biology ,0303 health sciences ,Matrix completion ,business.industry ,Computational Biology ,Prostatic Neoplasms ,Cell Biology ,Prognosis ,Kidney Neoplasms ,Term (time) ,MicroRNAs ,Early Diagnosis ,030220 oncology & carcinogenesis ,Colonic Neoplasms ,Colon neoplasm ,Prostate neoplasm ,Female ,Artificial intelligence ,business ,computer ,Algorithms ,Research Paper
- Abstract
Since many miRNA-disease associations have been verified, it is meaningful to discover more miRNA-disease associations to support the diagnosis and prevention of complex human diseases. However, it is not practical to identify potential associations using traditional biological experimental methods since the process is expensive and time consuming. Therefore, it is necessary to develop efficient computational methods to accomplish this task. In this work, we introduce a matrix completion model with dual Laplacian regularization (DLRMC) to infer unknown miRNA-disease associations in heterogeneous omics data. Specifically, DLRMC transforms the task of miRNA-disease association prediction into a matrix completion problem, in which the potential missing entries of the miRNA-disease association matrix are calculated; the missing associations can then be obtained from the prediction scores after the completion procedure. Meanwhile, the miRNA functional similarity and the disease semantic similarity are fully exploited in completing the miRNA-disease association matrix by means of a dual Laplacian regularization term. In the experiments, we conducted global and local Leave-One-Out Cross Validation (LOOCV) and case studies to evaluate the efficacy of DLRMC on the human miRNA-disease association dataset obtained from the HMDDv2.0 database. The AUCs of DLRMC are 0.9174 and 0.8289 in global LOOCV and local LOOCV, respectively, which significantly outperform a variety of previous methods. In addition, in the case studies on four significant diseases related to human health, namely Colon Neoplasms, Kidney Neoplasms, Lymphoma and Prostate Neoplasms, 90%, 92%, 92% and 94% of the top 50 predicted miRNAs, respectively, have been confirmed.
- Published
- 2019
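As a rough illustration of the kind of objective the DLRMC abstract above describes, matrix completion with two Laplacian regularization terms is often written in the generic form below. This is a sketch of a common formulation, not necessarily the exact objective used by the authors; the symbols $M$, $X$, $\Omega$, $L_m$, $L_d$, $\lambda$, $\alpha$ and $\beta$ are all assumed notation.

```latex
\min_{X \in \mathbb{R}^{n_m \times n_d}}
  \tfrac{1}{2}\,\bigl\lVert P_{\Omega}(X - M) \bigr\rVert_F^2
  + \lambda\,\lVert X \rVert_*
  + \alpha\,\operatorname{tr}\!\bigl(X^{\top} L_m X\bigr)
  + \beta\,\operatorname{tr}\!\bigl(X L_d X^{\top}\bigr)
```

Here $M$ is assumed to be the known association matrix with $n_m$ miRNA rows and $n_d$ disease columns, $P_{\Omega}$ keeps only the observed entries, $\lVert X \rVert_*$ is the nuclear norm that encourages a low-rank completion, and $L_m$ and $L_d$ are graph Laplacians built from the miRNA functional similarity and the disease semantic similarity, so that similar miRNAs and similar diseases receive similar prediction scores.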
30. IRWRLDA: improved random walk with restart for lncRNA-disease association prediction
- Author
-
Guiying Yan, Dun-Wei Gong, Zhu-Hong You, and Xing Chen
- Subjects
0301 basic medicine ,lncRNAs ,Normal Distribution ,random walk with restart ,Machine learning ,computer.software_genre ,Set (abstract data type) ,03 medical and health sciences ,0302 clinical medicine ,Chen ,Semantic similarity ,Similarity (network science) ,Beijing ,Databases, Genetic ,cancer ,Humans ,Computer Simulation ,Genetic Predisposition to Disease ,China ,Probability ,disease ,Leukemia ,Models, Statistical ,biology ,business.industry ,Computational Biology ,Reproducibility of Results ,Random walk ,biology.organism_classification ,Probability vector ,030104 developmental biology ,Oncology ,030220 oncology & carcinogenesis ,Area Under Curve ,Colonic Neoplasms ,RNA, Long Noncoding ,Artificial intelligence ,business ,computer ,Algorithms ,Research Paper - Abstract
Xing Chen (1), Zhu-Hong You (2), Gui-Ying Yan (3, 4), Dun-Wei Gong (1). (1) School of Information and Electrical Engineering, China University of Mining and Technology, Xuzhou, 221116, China; (2) School of Computer Science and Technology, China University of Mining and Technology, Xuzhou, 221116, China; (3) Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing, 100190, China; (4) National Center for Mathematics and Interdisciplinary Sciences, Chinese Academy of Sciences, Beijing, 100190, China. Correspondence to: Xing Chen (xingchen@amss.ac.cn) and Zhu-Hong You (zhuhongyou@gmail.com). Keywords: lncRNAs, disease, cancer, random walk with restart. Received: January 20, 2016. Accepted: July 06, 2016. Published: August 09, 2016.
In recent years, accumulating evidence has shown that the dysregulation of lncRNAs is associated with a wide range of human diseases. It is necessary and feasible to analyze known lncRNA-disease associations, predict potential lncRNA-disease associations, and provide the most probable lncRNA-disease pairs for experimental validation. Considering the limitations of traditional Random Walk with Restart (RWR), the model of Improved Random Walk with Restart for LncRNA-Disease Association prediction (IRWRLDA) was developed to predict novel lncRNA-disease associations by integrating known lncRNA-disease associations, disease semantic similarity, and various lncRNA similarity measures. The novelty of IRWRLDA lies in the incorporation of lncRNA expression similarity and disease semantic similarity to set the initial probability vector of the RWR. Therefore, IRWRLDA could be applied to diseases without any known related lncRNAs. IRWRLDA significantly improved on previous classical models, with reliable AUCs of 0.7242 and 0.7872 on two known lncRNA-disease association datasets downloaded from the lncRNADisease database, respectively. Further case studies of colon cancer and leukemia were implemented for IRWRLDA, and 60% of lncRNAs in the top 10 prediction lists have been confirmed by recent experimental reports.
- Published
- 2016
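The random walk with restart that the IRWRLDA abstract above builds on can be sketched in a few lines. This is a minimal, generic RWR under stated assumptions, not the authors' implementation: the function name, the `restart` value of 0.5 and the toy three-node graph are all hypothetical, and the initial vector `p0` is simply passed in, whereas IRWRLDA derives it from similarity measures.

```python
def random_walk_with_restart(W, p0, restart=0.5, tol=1e-10, max_iter=1000):
    """Iterate p <- (1 - restart) * W p + restart * p0 until convergence.

    W is a column-stochastic transition matrix given as a list of rows;
    p0 is the initial (restart) probability vector.
    """
    n = len(p0)
    p = list(p0)
    for _ in range(max_iter):
        nxt = [
            (1 - restart) * sum(W[i][j] * p[j] for j in range(n)) + restart * p0[i]
            for i in range(n)
        ]
        # Stop once the L1 change between successive iterates is negligible.
        if sum(abs(a - b) for a, b in zip(nxt, p)) < tol:
            return nxt
        p = nxt
    return p


# Toy path graph 0 - 1 - 2; each column sums to 1 (hypothetical data).
W = [
    [0.0, 0.5, 0.0],
    [1.0, 0.0, 1.0],
    [0.0, 0.5, 0.0],
]
p0 = [1.0, 0.0, 0.0]  # all restart mass on node 0
scores = random_walk_with_restart(W, p0)
```

The stationary scores rank nodes by proximity to the seed, which is how RWR-style methods order candidate associations; changing `p0` changes which nodes the walk favours.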
31. Predicting microRNA-disease associations using bipartite local models and hubness-aware regression
- Author
-
Jun-Yan Cheng, Xing Chen, and Jun Yin
- Subjects
0301 basic medicine ,Jaccard index ,Carcinoma, Hepatocellular ,Lung Neoplasms ,Esophageal Neoplasms ,Stability (learning theory) ,Disease ,Biology ,Machine learning ,computer.software_genre ,Cross-validation ,Standard deviation ,03 medical and health sciences ,Neoplasms ,Humans ,Genetic Predisposition to Disease ,Molecular Biology ,Computational model ,Models, Genetic ,business.industry ,Liver Neoplasms ,Computational Biology ,Cell Biology ,Regression ,Regression, Psychology ,MicroRNAs ,030104 developmental biology ,Bipartite graph ,Artificial intelligence ,business ,computer ,Research Paper - Abstract
The development and progression of numerous complex human diseases have been confirmed to be associated with microRNAs (miRNAs) by various experimental and clinical studies. Predicting potential miRNA-disease associations can help us understand the underlying molecular and cellular mechanisms of diseases and promote the development of disease treatment and diagnosis. Given the high cost of conventional experimental verification, a new computational method for miRNA-disease association prediction is an efficient and economical alternative. Since previous computational models ignored the hubness phenomenon, we presented a novel computational model of Bipartite Local models and Hubness-Aware Regression for MiRNA-Disease Association prediction (BLHARMDA). In this method, we first used known miRNA-disease associations to calculate the Jaccard similarity between miRNAs and between diseases, then utilized a modified kNN model in the bipartite local model method. As a result, we effectively alleviated the detriments from 'bad' hubs. BLHARMDA obtained AUCs of 0.9141 and 0.8390 in the global and local leave-one-out cross validation, respectively, which outperformed most of the previous models and demonstrated the high prediction performance of BLHARMDA. In addition, the standard deviation of 0.0006 in 5-fold cross validation confirmed our model's prediction stability, and the averaged prediction accuracy of 0.9120 showed the high precision of our model. To further evaluate our model's accuracy, we implemented BLHARMDA on three typical human diseases in three different types of case studies. As a result, 49 (Esophageal Neoplasms), 50 (Lung Neoplasms) and 50 (Carcinoma Hepatocellular) out of the top 50 related miRNAs were validated by recent experimental discoveries.
- Published
- 2018
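The Jaccard similarity step mentioned in the BLHARMDA abstract above can be illustrated as follows. This is only a sketch of the standard Jaccard index applied to association sets; the miRNA and disease names are made-up placeholders, and the bipartite local model and hubness-aware regression steps are not shown.

```python
def jaccard(a, b):
    """Jaccard index |a & b| / |a | b| of two sets (0.0 if both are empty)."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Hypothetical known associations: miRNA -> set of associated diseases.
associations = {
    "miR-A": {"lung", "liver"},
    "miR-B": {"liver", "colon"},
    "miR-C": {"kidney"},
}

# Pairwise miRNA-miRNA similarity derived only from shared diseases.
similarity = {
    (m1, m2): jaccard(d1, d2)
    for m1, d1 in associations.items()
    for m2, d2 in associations.items()
}
```

Transposing the mapping (disease -> set of miRNAs) and reusing `jaccard` would give the corresponding disease-disease similarity.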
32. Development and application of a machine learning algorithm for classification of elasmobranch behaviour from accelerometry data
- Author
-
Samuel H. Gruber, Alexander C. Hansell, Lauran R. Brewster, Michael Elliott, Ian G. Cowx, Jonathan J. Dale, Nicholas M. Whitney, Tristan L. Guttridge, and Adrian C. Gleiss
- Subjects
0106 biological sciences ,Original Paper ,Ecology ,biology ,Artificial neural network ,business.industry ,010604 marine biology & hydrobiology ,Aquatic Science ,biology.organism_classification ,Logistic regression ,Headshaking ,Accelerometer ,Machine learning ,computer.software_genre ,010603 evolutionary biology ,01 natural sciences ,Random forest ,Negaprion brevirostris ,14. Life underwater ,Gradient boosting ,Artificial intelligence ,business ,computer ,Classifier (UML) ,Ecology, Evolution, Behavior and Systematics - Abstract
Discerning behaviours of free-ranging animals allows for quantification of their activity budget, providing important insight into ecology. Over recent years, accelerometers have been used to unveil the cryptic lives of animals. The increased ability of accelerometers to store large quantities of high resolution data has prompted a need for automated behavioural classification. We assessed the performance of several machine learning (ML) classifiers to discern five behaviours performed by accelerometer-equipped juvenile lemon sharks (Negaprion brevirostris) at Bimini, Bahamas (25°44′N, 79°16′W). The sharks were observed to exhibit chafing, burst swimming, headshaking, resting and swimming in a semi-captive environment and these observations were used to ground-truth data for ML training and testing. ML methods included logistic regression, an artificial neural network, two random forest models, a gradient boosting model and a voting ensemble (VE) model, which combined the predictions of all other (base) models to improve classifier performance. The macro-averaged F-measure, an indicator of classifier performance, showed that the VE model improved overall classification (F-measure 0.88) above the strongest base learner model, gradient boosting (0.86). To test whether the VE model provided biologically meaningful results when applied to accelerometer data obtained from wild sharks, we investigated headshaking behaviour, as a proxy for prey capture, in relation to the variables: time of day, tidal phase and season. All variables were significant in predicting prey capture, with predations most likely to occur during early evening and less frequently during the dry season and high tides. These findings support previous hypotheses from sporadic visual observations. Electronic supplementary material: The online version of this article (10.1007/s00227-018-3318-y) contains supplementary material, which is available to authorized users.
- Published
- 2018
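The macro-averaged F-measure used in the abstract above to compare classifiers can be computed as in the sketch below. The function name and the toy labels are illustrative assumptions and are unrelated to the study's data; the metric itself is the standard unweighted mean of per-class F1 scores.

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over every class seen in either list."""
    labels = sorted(set(y_true) | set(y_pred))
    f1_scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0.0 when the class never occurs.
        f1_scores.append(2 * tp / denom if denom else 0.0)
    return sum(f1_scores) / len(f1_scores)


# Toy behaviour labels (hypothetical).
y_true = ["rest", "rest", "swim", "swim", "burst", "burst"]
y_pred = ["rest", "swim", "swim", "swim", "burst", "rest"]
score = macro_f1(y_true, y_pred)
```

Because every class contributes equally, a rare behaviour that is classified poorly lowers the macro average as much as a common one would, which is presumably why it suits imbalanced behaviour data.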
33. An efficient and robust hybrid method for segmentation of zebrafish objects from bright-field microscope images
- Author
-
Yuanhao Guo, Zhan Xiong, and Fons J. Verbeek
- Subjects
0301 basic medicine ,Microscope ,Level set method ,animal structures ,Computer science ,High-throughput imaging ,ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION ,02 engineering and technology ,law.invention ,03 medical and health sciences ,law ,Special Issue Paper ,0202 electrical engineering, electronic engineering, information engineering ,Zebrafish segmentation ,Hybrid method ,Segmentation ,Transparency (data compression) ,Mean-shift ,Representation (mathematics) ,Zebrafish ,Bright-field microscope ,biology ,business.industry ,fungi ,Pattern recognition ,biology.organism_classification ,Mean shift algorithm ,Computer Science Applications ,030104 developmental biology ,Hardware and Architecture ,Pattern recognition (psychology) ,020201 artificial intelligence & image processing ,Computer Vision and Pattern Recognition ,Artificial intelligence ,business ,Software - Abstract
Accurate segmentation of zebrafish from bright-field microscope images is crucial to many applications in the life sciences. Early zebrafish stages are used, and in these stages the zebrafish is partially transparent. This transparency leads to edge ambiguity as is typically seen in the larval stages. Therefore, segmentation of zebrafish objects from images is a challenging task in computational bio-imaging. Popular computational methods fail to segment the relevant edges, which subsequently results in inaccurate measurements and evaluations. Here we present a hybrid method to accomplish accurate and efficient segmentation of zebrafish specimens from bright-field microscope images. We employ the mean shift algorithm to augment the colour representation in the images. This improves the discrimination of the specimen from the background and provides a segmentation candidate retaining the overall shape of the zebrafish. A distance-regularised level set function is initialised from this segmentation candidate and fed to an improved level set method, such that we can obtain another segmentation candidate which preserves the explicit contour of the object. The two candidates are fused using heuristics, and the hybrid result is refined to represent the contour of the zebrafish specimen. We have applied the proposed method to two typical datasets. From experiments, we conclude that the proposed hybrid method improves both efficiency and accuracy of the segmentation of the zebrafish specimen. The results are going to be used for high-throughput applications with zebrafish.
- Published
- 2018
34. Requirements for coregistration accuracy in on-scalp MEG
- Author
-
Lauri Parkkonen, Matti Stenroos, Joonas Iivanainen, Rasmus Zetter, Department of Neuroscience and Biomedical Engineering, Aalto-yliopisto, and Aalto University
- Subjects
Beamforming ,Magnetometer ,Computer science ,media_common.quotation_subject ,Acoustics ,0206 medical engineering ,02 engineering and technology ,030218 nuclear medicine & medical imaging ,law.invention ,03 medical and health sciences ,0302 clinical medicine ,law ,Position (vector) ,biology.animal ,medicine ,Humans ,Contrast (vision) ,Computer vision ,Radiology, Nuclear Medicine and imaging ,Sensitivity (control systems) ,media_common ,Original Paper ,Brain Mapping ,Scalp ,biology ,Radiological and Ultrasound Technology ,medicine.diagnostic_test ,business.industry ,Orientation (computer vision) ,Optically-pumped magnetometer ,Brain ,Magnetoencephalography ,020601 biomedical engineering ,SQUID ,Dipole ,Neurology ,Artificial intelligence ,Neurology (clinical) ,Anatomy ,business ,030217 neurology & neurosurgery ,Coregistration - Abstract
Recent advances in magnetic sensing have made on-scalp magnetoencephalography (MEG) possible. In particular, optically-pumped magnetometers (OPMs) have reached sensitivity levels that enable their use in MEG. In contrast to the SQUID sensors used in current MEG systems, OPMs do not require cryogenic cooling and can thus be placed within millimetres from the head, enabling the construction of sensor arrays that conform to the shape of an individual’s head. To properly estimate the location of neural sources within the brain, one must accurately know the position and orientation of sensors in relation to the head. With the adaptable on-scalp MEG sensor arrays, this coregistration becomes more challenging than in current SQUID-based MEG systems that use rigid sensor arrays. Here, we used simulations to quantify how accurately one needs to know the position and orientation of sensors in an on-scalp MEG system. The effects that different types of localisation errors have on forward modelling and source estimates obtained by minimum-norm estimation, dipole fitting, and beamforming are detailed. We found that sensor position errors generally have a larger effect than orientation errors and that these errors affect the localisation accuracy of superficial sources the most. To obtain similar or higher accuracy than with current SQUID-based MEG systems, RMS sensor position and orientation errors should be $< 4\,\mathrm{mm}$
- Published
- 2017
35. A single-array preprocessing method for estimating full-resolution raw copy numbers from all Affymetrix genotyping arrays including GenomeWideSNP 5 & 6
- Author
-
Terence P. Speed, Pratyaksha Wirapati, and Henrik Bengtsson
- Subjects
Statistics and Probability ,Genotype ,Gene Dosage ,Biology ,Polymorphism, Single Nucleotide ,Biochemistry ,Online analysis ,Humans ,Preprocessor ,Base Pairing ,Molecular Biology ,Genotyping ,Oligonucleotide Array Sequence Analysis ,Genetics ,Supplementary data ,Alternative methods ,Chromosomes, Human, Pair 10 ,Genome, Human ,business.industry ,Pattern recognition ,Genome Analysis ,Original Papers ,Computer Science Applications ,Computational Mathematics ,R package ,Open source ,ROC Curve ,Computational Theory and Mathematics ,Artificial intelligence ,business ,Smoothing - Abstract
Motivation: High-resolution copy-number (CN) analysis has in recent years gained much attention, not only for the purpose of identifying CN aberrations associated with a certain phenotype, but also for identifying CN polymorphisms. In order for such studies to be successful and cost effective, the statistical methods have to be optimized. We propose a single-array preprocessing method for estimating full-resolution total CNs. It is applicable to all Affymetrix genotyping arrays, including the recent ones that also contain non-polymorphic probes. A reference signal is only needed at the last step when calculating relative CNs. Results: As with our method for earlier generations of arrays, this one controls for allelic crosstalk, probe affinities and PCR fragment-length effects. Additionally, it also corrects for probe sequence effects and co-hybridization of fragments digested by multiple enzymes that takes place on the latest chips. We compare our method with Affymetrix's CN5 method and the dChip method by assessing how well they differentiate between various CN states at the full resolution and various amounts of smoothing. Although CRMA v2 is a single-array method, we observe that it performs as well as or better than alternative methods that use data from all arrays for their preprocessing. This shows that it is possible to do online analysis in large-scale projects where additional arrays are introduced over time. Availability: A bounded-memory implementation that can process any number of arrays is available in the open source R package aroma.affymetrix. Contact: hb@stat.berkeley.edu Supplementary information: Supplementary data are available at Bioinformatics online.
- Published
- 2017
36. Inference of the Human Polyadenylation Code
- Author
-
Andrew Delong, Brendan J. Frey, and Michael K. K. Leung
- Subjects
0301 basic medicine ,Statistics and Probability ,Untranslated region ,Polyadenylation ,Genomics ,Computational biology ,Biology ,Biochemistry ,Genome ,Conserved sequence ,03 medical and health sciences ,0302 clinical medicine ,Humans ,Structural motif ,3' Untranslated Regions ,Molecular Biology ,Gene ,030304 developmental biology ,Genetics ,Regulation of gene expression ,0303 health sciences ,business.industry ,Genome, Human ,Three prime untranslated region ,Deep learning ,Genome Analysis ,Original Papers ,Computer Science Applications ,Computational Mathematics ,030104 developmental biology ,Gene Expression Regulation ,Computational Theory and Mathematics ,Human genome ,Artificial intelligence ,business ,Poly A ,030217 neurology & neurosurgery - Abstract
Processing of transcripts at the 3’-end involves cleavage at a polyadenylation site followed by the addition of a poly(A)-tail. By selecting which polyadenylation site is cleaved, alternative polyadenylation enables genes to produce transcript isoforms with different 3’-ends. To facilitate the identification and treatment of disease-causing mutations that affect polyadenylation and to understand the underlying regulatory processes, a computational model that can accurately predict polyadenylation patterns based on genomic features is desirable. Previous works have focused on identifying candidate polyadenylation sites and classifying sites which may be tissue-specific. What is lacking is a predictive model of the underlying mechanism of site selection, competition, and processing efficiency in a tissue-specific manner. We develop a deep learning model that trains on 3’-end sequencing data and predicts tissue-specific site selection among competing polyadenylation sites in the 3’ untranslated region of the human genome. Two neural network architectures are evaluated: one built on hand-engineered features, and another that directly learns from the genomic sequence. The hand-engineered features include polyadenylation signals, cis-regulatory elements, n-mer counts, nucleosome occupancy, and RNA-binding protein motifs. The direct-from-sequence model is inferred without prior knowledge on polyadenylation, based on a convolutional neural network trained with genomic sequences surrounding each polyadenylation site as input. Both models are trained using the TensorFlow library. The proposed polyadenylation code can predict site selection among competing polyadenylation sites in different tissues. Importantly, it does so without relying on evolutionary conservation. The model can distinguish pathogenic from benign variants that appear near annotated polyadenylation sites in ClinVar and inspect the genome to find candidate polyadenylation sites. We also provide an analysis on how different features affect the model’s performance.
- Published
- 2017
37. Influence of Heartwood on Wood Density and Pulp Properties Explained by Machine Learning Techniques
- Author
-
António J.A. Santos, Carla Iglesias, Helena Pereira, Ofélia Anjos, and Javier J. González Martínez
- Subjects
0106 biological sciences ,Heartwood ,engineering.material ,Kappa number ,Machine learning ,computer.software_genre ,01 natural sciences ,Multi-Layer Perceptron (MLP) ,support vector machines ,multi-layer perceptron ,010608 biotechnology ,Linear regression ,Acacia melanoxylon ,heartwood ,pulp properties ,Multiple Linear Regression ,CART ,Support Vector Machines (SVM) ,Mathematics ,040101 forestry ,Pulp properties ,biology ,business.industry ,Pulp (paper) ,Pulpwood ,Forestry ,04 agricultural and veterinary sciences ,lcsh:QK900-989 ,15. Life on land ,Perceptron ,biology.organism_classification ,Regression ,Support vector machine ,engineering ,lcsh:Plant ecology ,0401 agriculture, forestry, and fisheries ,Artificial intelligence ,business ,computer - Abstract
The aim of this work is to develop a tool to predict some pulp properties, e.g., pulp yield, Kappa number, ISO brightness (ISO 2470:2008), fiber length and fiber width, using the sapwood and heartwood proportion in the raw material. For this purpose, Acacia melanoxylon trees were collected from four sites in Portugal. Percentage of sapwood and heartwood, area and the stem eccentricity (in N-S and E-W directions) were measured on transversal stem sections of A. melanoxylon R. Br. The relative position of the samples with respect to the total tree height was also considered as an input variable. Different configurations were tested until the maximum correlation coefficient was achieved. A classical mathematical technique (multiple linear regression) and machine learning methods (classification and regression trees, multi-layer perceptron and support vector machines) were tested. Classification and regression trees (CART) was the most accurate model for the prediction of pulp ISO brightness (R = 0.85). The other parameters could be predicted with fair results (R = 0.64–0.75) by CART. Hence, the proportion of heartwood and sapwood is a relevant parameter for pulping and pulp properties, and should be taken as a quality trait when assessing a pulpwood resource.
- Published
- 2017
38. Functionally guided alignment of protein interaction networks for module detection
- Author
-
Waqar Ali and Charlotte M. Deane
- Subjects
Statistics and Probability ,Theoretical computer science ,Source code ,media_common.quotation_subject ,Structural alignment ,Sequence alignment ,Biology ,Biochemistry ,Protein sequencing ,Sequence Analysis, Protein ,Protein Interaction Mapping ,Databases, Protein ,Molecular Biology ,Alignment-free sequence analysis ,media_common ,Clustering coefficient ,Multiple sequence alignment ,business.industry ,Systems Biology ,Computational Biology ,Proteins ,Pattern recognition ,Original Papers ,Computer Science Applications ,Computational Mathematics ,Computational Theory and Mathematics ,Graph (abstract data type) ,Artificial intelligence ,business ,Sequence Alignment ,Algorithms - Abstract
Motivation: Functional module detection within protein interaction networks is a challenging problem due to the sparsity of data and presence of errors. Computational techniques for this task range from purely graph theoretical approaches involving single networks to alignment of multiple networks from several species. Current network alignment methods all rely on protein sequence similarity to map proteins across species. Results: Here we carry out network alignment using a protein functional similarity measure. We show that using functional similarity to map proteins across species improves network alignment in terms of functional coherence and overlap with experimentally verified protein complexes. Moreover, the results from functional similarity-based network alignment display little overlap ( Availability: Program binaries and source code is freely available at http://www.stats.ox.ac.uk/research/bioinfo/resources Contact: ali@stats.ox.ac.uk Supplementary Information: Supplementary data are available at Bioinformatics online.
- Published
- 2016
39. Large-scale machine learning for metagenomics sequence classification
- Author
-
Pierre Mahé, Maud Tournoud, Kevin Vervier, Jean-Baptiste Veyrieras, Jean-Philippe Vert, BioMerieux SA, Bioinformatics Research Department, bioMérieux, Centre de Bioinformatique (CBIO), MINES ParisTech - École nationale supérieure des mines de Paris, Université Paris sciences et lettres (PSL)-Université Paris sciences et lettres (PSL), Cancer et génome: Bioinformatique, biostatistiques et épidémiologie d'un système complexe, and Université Paris sciences et lettres (PSL)-Université Paris sciences et lettres (PSL)-Institut Curie [Paris]-Institut National de la Santé et de la Recherche Médicale (INSERM)
- Subjects
0301 basic medicine ,FOS: Computer and information sciences ,computer.software_genre ,Biochemistry ,Quantitative Biology - Quantitative Methods ,Machine Learning (cs.LG) ,[INFO.INFO-AI]Computer Science [cs]/Artificial Intelligence [cs.AI] ,Machine Learning ,Computational Engineering, Finance, and Science (cs.CE) ,Software ,[STAT.ML]Statistics [stat]/Machine Learning [stat.ML] ,Statistics - Machine Learning ,Computer Science - Computational Engineering, Finance, and Science ,Quantitative Methods (q-bio.QM) ,ComputingMilieux_MISCELLANEOUS ,DNA sequencing theory ,Original Papers ,[SDV.BIBS]Life Sciences [q-bio]/Quantitative Methods [q-bio.QM] ,Computer Science Applications ,Computational Mathematics ,Computational Theory and Mathematics ,Data mining ,Sequence Analysis ,Algorithms ,Statistics and Probability ,Sample (statistics) ,Machine Learning (stat.ML) ,Biology ,Machine learning ,DNA sequencing ,03 medical and health sciences ,[INFO.INFO-LG]Computer Science [cs]/Machine Learning [cs.LG] ,[SDV.BBM.GTP]Life Sciences [q-bio]/Biochemistry, Molecular Biology/Genomics [q-bio.GN] ,Quantitative Biology - Genomics ,Molecular Biology ,Genomics (q-bio.GN) ,business.industry ,Scale (chemistry) ,Sequence Analysis, DNA ,Computer Science - Learning ,030104 developmental biology ,Metagenomics ,FOS: Biological sciences ,Metagenome ,Noise (video) ,Artificial intelligence ,business ,computer ,Reference genome - Abstract
Motivation: Metagenomics characterizes the taxonomic diversity of microbial communities by sequencing DNA directly from an environmental sample. One of the main challenges in metagenomics data analysis is the binning step, where each sequenced read is assigned to a taxonomic clade. Because of the large volume of metagenomics datasets, binning methods need fast and accurate algorithms that can operate with reasonable computing requirements. While standard alignment-based methods provide state-of-the-art performance, compositional approaches that assign a taxonomic class to a DNA read based on the k-mers it contains have the potential to provide faster solutions. Results: We propose a new rank-flexible machine learning-based compositional approach for taxonomic assignment of metagenomics reads and show that it benefits from increasing the number of fragments sampled from reference genomes to tune its parameters, up to a coverage of about 10, and from increasing the k-mer size to about 12. Tuning the method involves training machine learning models on about 10^8 samples in 10^7 dimensions, which is out of reach of standard software but can be done efficiently with modern implementations for large-scale machine learning. The resulting method is competitive in terms of accuracy with well-established alignment and composition-based tools for problems involving a small to moderate number of candidate species and for reasonable amounts of sequencing errors. We show, however, that machine learning-based compositional approaches are still limited in their ability to deal with problems involving a greater number of species and more sensitive to sequencing errors. We finally show that the new method outperforms the state-of-the-art in its ability to classify reads from species of lineage absent from the reference database and confirm that compositional approaches achieve faster prediction times, with a gain of 2–17 times with respect to the BWA-MEM short read mapper, depending on the number of candidate species and the level of sequencing noise. Availability and implementation: Data and codes are available at http://cbio.ensmp.fr/largescalemetagenomics. Contact: pierre.mahe@biomerieux.com Supplementary information: Supplementary data are available at Bioinformatics online.
- Published
- 2016
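The k-mer representation that compositional methods like the one in the abstract above rely on can be sketched very simply. This is a generic illustration, not the authors' pipeline; the function name and the example read are hypothetical.

```python
from collections import Counter


def kmer_counts(read, k):
    """Count every overlapping k-mer (substring of length k) in a DNA read."""
    return Counter(read[i:i + k] for i in range(len(read) - k + 1))


# Overlapping 2-mers of a short made-up read: AC, CG, GT, TA, AC.
counts = kmer_counts("ACGTAC", 2)

# Size of the feature space over the alphabet {A, C, G, T} for k = 12.
n_features = 4 ** 12
```

Each read becomes a sparse vector with one coordinate per possible k-mer; with k = 12 that is 4**12 = 16,777,216 coordinates, which gives a sense of why large-scale learning implementations are needed.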
40. Quantifying the world and its webs: mathematical discrete vs continua in knowledge construction
- Author
-
Giuseppe Longo, Longo, Giuseppe, Centre Cavaillès, La République des savoirs : Lettres, Sciences, Philosophie, Collège de France (CdF (institution))-Centre National de la Recherche Scientifique (CNRS)-Département de Philosophie - ENS Paris, École normale supérieure - Paris (ENS Paris), Université Paris sciences et lettres (PSL)-Université Paris sciences et lettres (PSL)-École normale supérieure - Paris (ENS Paris), Université Paris sciences et lettres (PSL)-Université Paris sciences et lettres (PSL)-Collège de France (CdF (institution))-Centre National de la Recherche Scientifique (CNRS)-Département de Philosophie - ENS Paris, and Université Paris sciences et lettres (PSL)-Université Paris sciences et lettres (PSL)
- Subjects
Big Data ,Mathematical structures ,Sociology and Political Science ,Computer science ,Super-recursive algorithm ,Big data ,Short paper ,0507 social and economic geography ,Network ,[MATH] Mathematics [math] ,[INFO] Computer Science [cs] ,Information theory ,Computer ,symbols.namesake ,[SHS.PHIL] Humanities and Social Sciences/Philosophy ,[INFO]Computer Science [cs] ,Control Theory ,Letter to Turing ,Biology ,Turing ,computer.programming_language ,Lettre à Turing ,business.industry ,05 social sciences ,Alan Turing ,050301 education ,General Social Sciences ,Informatique ,Structures mathématiques ,Turing tarpit ,symbols ,Artificial intelligence ,Mathematical structure ,business ,Biologie ,050703 geography ,0503 education ,Mathematical economics ,computer - Abstract
As a mathematician, I will focus on the consequences on knowledge construction of the very “mathematical structures” the new technologies of information are based on. The claim is that the use of discrete state (digital) devices both as mathematical models and as a knowledge paradigm in science and humanities is far from neutral. It will be then possible for the reader to develop some consequences of how the cultural and social relations may be affected by these technologies and their networks. In particular, these networks provide tools for knowledge as well as an image of the world; but, by their peculiar mathematical structure, the “causal relations” of phenomena, in all areas of knowledge, are often redesigned according to the relations proposed by the digital networks and their internal causality. I will discuss these issues in the informal style of a “personal letter” to Alan Turing. But let's first informally introduce some mathematical bases for this distant dialogue with the founding father of our computational universe...
- Published
- 2016
41. FERAL: Network-based classifier with application to breast cancer outcome prediction
- Author
-
Amin Allahyar and Jeroen de Ridder
- Subjects
Statistics and Probability ,Ismb/Eccb 2015 Proceedings Papers Committee July 10 to July 14, 2015, Dublin, Ireland ,Breast Neoplasms ,Biology ,Machine learning ,computer.software_genre ,Biochemistry ,Group lasso ,Patient care ,Breast cancer ,Protein Interaction Mapping ,medicine ,Humans ,Disease ,Gene Regulatory Networks ,Molecular Biology ,business.industry ,Gene Expression Profiling ,Prognosis ,medicine.disease ,3. Good health ,Computer Science Applications ,Computational Mathematics ,Pathway information ,Computational Theory and Mathematics ,Female ,Artificial intelligence ,business ,Outcome prediction ,Classifier (UML) ,computer - Abstract
Motivation: Breast cancer outcome prediction based on gene expression profiles is an important strategy for personalized patient care. To improve performance and consistency of discovered markers of the initial molecular classifiers, network-based outcome prediction methods (NOPs) have been proposed. In spite of the initial claims, recent studies revealed that neither performance nor consistency can be improved using these methods. NOPs typically rely on the construction of meta-genes by averaging the expression of several genes connected in a network that encodes protein interactions or pathway information. In this article, we expose several fundamental issues in NOPs that impede prediction power and the consistency of discovered markers and that obscure biological interpretation. Results: To overcome these issues, we propose FERAL, a network-based classifier that hinges upon the Sparse Group Lasso, which performs simultaneous selection of marker genes and training of the prediction model. An important feature of FERAL, and a significant departure from existing NOPs, is that it uses multiple operators to summarize genes into meta-genes. This gives the classifier the opportunity to select the most relevant meta-gene for each gene set. Extensive evaluation revealed that the discovered markers are markedly more stable across independent datasets. Moreover, interpretation of the marker genes detected by FERAL reveals valuable mechanistic insight into the etiology of breast cancer. Availability and implementation: All code is available for download at: http://homepage.tudelft.nl/53a60/resources/FERAL/FERAL.zip. Contact: j.deridder@tudelft.nl Supplementary information: Supplementary data are available at Bioinformatics online.
- Published
- 2015
42. IMPACT OF ARTIFICIAL INTELLIGENCE IN MEDICAL IMAGING.
- Author
-
PANDIAN, S. ATHEENA MILAGI, MURUGAN, RASHIKA, KUMAR, N. SRI MANOJ, SUDHERSON, M., and SAHIL, S. MOHAMMAD
- Subjects
ARTIFICIAL intelligence ,COMPUTER-assisted image analysis (Medicine) ,DIAGNOSTIC imaging ,COMPUTER algorithms ,SENSITIVITY & specificity (Statistics) - Abstract
Artificial Intelligence (AI) is a cutting-edge technology that analyzes complex data using computer algorithms. Diagnostic imaging is one of the most promising clinical uses of AI, and increasing effort is being put toward optimizing its functionality to make a wide range of clinical problems easier to identify and quantify. Research employing computer-aided diagnostics has demonstrated exceptional precision, sensitivity, and specificity in identifying minute radiographic irregularities, which has promise for enhancing public health. However, lesion identification is often used to define result assessment in AI imaging research, neglecting the nature and biological aggressiveness of a lesion. This might lead to a distorted portrayal of AI's performance. Some AI imaging studies evaluate clinically significant results, whereas others compute sensitivity and specificity to quantify diagnostic accuracy. Though AI frequently picks up on small changes in images, more significant outcome factors include newly discovered advanced disease, illnesses that need to be treated, or circumstances that might have an impact on long-term survival. AI-based research should concentrate on clinically significant events since they have a significant impact on quality of life, such as symptoms, the requirement for disease-modifying medication, and death. Numerous studies have demonstrated that AI outperforms normal reading in terms of specificity and recall rates; nevertheless, the kind and biological aggressiveness of a lesion are often overlooked in the estimation of accuracy and sensitivity. [ABSTRACT FROM AUTHOR]
- Published
- 2024
43. A Biologically Inspired Automatic System for Media Quality Assessment.
- Author
-
Zhang, Luming, Hong, Richang, Nie, Liqiang, and Hong, Chaoqun
- Subjects
ARTIFICIAL intelligence ,COMPUTER vision ,FEATURE extraction ,SUPERVISED learning ,ALGORITHMS - Abstract
Photo aesthetic quality evaluation is a challenging task in artificial intelligence systems. In this paper, we propose a biologically inspired aesthetic descriptor that mimics humans sequentially perceiving visually/semantically salient regions in a photo. (In general, visually salient regions are perceived by low-level visual features, such as the high contrast between the foreground and the background objects, while semantically salient regions are perceived by high-level visual features such as human faces.) In particular, a weakly supervised learning paradigm is developed to project the local image descriptors into a low-dimensional semantic space. Then, each graphlet can be described by multiple types of visual features, both low-level and high-level. Since humans usually perceive only a few salient regions in a photo, a sparsity-constrained graphlet ranking algorithm is proposed that seamlessly integrates both the low-level and the high-level visual cues. Top-ranked graphlets are those visually/semantically prominent local aesthetic descriptors in a photo. They are sequentially linked into a path that simulates the human active viewing process. Finally, we learn a probabilistic aesthetic measure based on such actively viewing paths (AVPs) from the training photos. Experimental results show that: 1) the AVPs are 87.65% consistent with real human gaze shifting paths, as verified by the eye-tracking data and 2) our aesthetic measure outperforms many of its competitors. [ABSTRACT FROM AUTHOR]
- Published
- 2016
44. World on Data Perspective.
- Author
-
Nasution, Mahyuddin K. M.
- Subjects
COVID-19 pandemic ,ARTIFICIAL intelligence ,COVID-19 vaccines - Abstract
It is not simple to consider the world from only one side, but analyzing all sides can cloud comprehension without reaching the deep insight found at the core. In a word as a whole, there is potential for telling the whole world in one word, i.e., data, leading to interpretations as phenomena and paradigms at the core of this review. The tug of war between the two sides explains that data represent the world, or vice versa, and present a fundamental view that systems or subsystems frame the world, even though they are encoded and composed of culture, rules, or approaches such as the threshold of democracy. When the COVID-19 pandemic posed a threat, human efforts contributed to finding potential answers to questions presented by the world: what, who, where, when, why, and how (5 wh); a calling in the form of a challenge, where facts show something. All these questions resulted in research, education, and service activities, with their respective data frameworks producing results. This paper aims to reveal the meaning of the outcomes through an observation from an outside perspective. Therefore, like COVID-19 and its vaccines, the assertion of convexity and concave contradictions in the treatment of data leads to a mutually conjugate treatment of data. In this regard, statistics and artificial intelligence play separate and complementary roles. [ABSTRACT FROM AUTHOR]
- Published
- 2022
45. Artificial Intelligence for Biology.
- Author
-
Hassoun, Soha, Jefferson, Felicia, Shi, Xinghua, Stucky, Brian, Wang, Jin, and Rosa, Epaminondas
- Subjects
ARTIFICIAL intelligence ,BIOLOGY ,MACHINE learning ,LIFE sciences ,TWENTY-first century - Abstract
Despite efforts to integrate research across different subdisciplines of biology, the scale of integration remains limited. We hypothesize that future generations of Artificial Intelligence (AI) technologies specifically adapted for biological sciences will help enable the reintegration of biology. AI technologies will allow us not only to collect, connect, and analyze data at unprecedented scales, but also to build comprehensive predictive models that span various subdisciplines. They will make possible both targeted (testing specific hypotheses) and untargeted discoveries. AI for biology will be the cross-cutting technology that will enhance our ability to do biological research at every scale. We expect AI to revolutionize biology in the 21st century much like statistics transformed biology in the 20th century. The difficulties, however, are many, including data curation and assembly, development of new science in the form of theories that connect the subdisciplines, and new predictive and interpretable AI models that are more suited to biology than existing machine learning and AI techniques. Development efforts will require strong collaborations between biological and computational scientists. This white paper provides a vision for AI for Biology and highlights some challenges. [ABSTRACT FROM AUTHOR]
- Published
- 2021
46. Biology and technology: guarding, nurturing, and governing life.
- Author
-
Keane, Michael and Han, Xiao
- Subjects
RISK perception ,BIOLOGY ,ARTIFICIAL intelligence ,MASSIVE open online courses - Abstract
The COVID-19 pandemic has changed perceptions of human mortality. In order to guard people's lives, many governments in Asia have deployed sophisticated data algorithms to track movement. The theme of this special issue takes its inspiration from two Chinese words: weisheng and yangsheng. [Extracted from the article]
- Published
- 2021
47. DEEP LEARNING ANALYSIS ON THE RESULTING IMPACTS OF WEEKLY LOAD TRAINING ON STUDENTS' BIOLOGICAL SYSTEM.
- Author
-
Jiangui Peng and Jianzheng Xu
- Subjects
BIOLOGICAL systems ,MACHINE learning ,DEEP learning ,IMPACT loads ,ATHLETE training ,ARTIFICIAL intelligence - Abstract
Copyright of Revista Brasileira de Medicina do Esporte is the property of Redprint Editora Ltda. and its content may not be copied or emailed to multiple sites or posted to a listserv without the copyright holder's express written permission. However, users may print, download, or email articles for individual use. This abstract may be abridged. No warranty is given about the accuracy of the copy. Users should refer to the original published version of the material for the full abstract. (Copyright applies to all Abstracts.)
- Published
- 2023
48. Effects of Gender on Perception and Interpretation of Video Game Character Behavior and Emotion.
- Author
-
Desai, Neesha, Zhao, Richard, and Szafron, Duane
- Abstract
Gender in video games is a popular topic. However, the focus is usually on how gender is portrayed within games. In this paper, we examine the effects of players' gender on the perception of virtual character behavior and emotion based on the results of two user studies involving story-based games. The first study compared players' perception of virtual character behaviors. We analyzed perceived differences both by gender and by gaming experience. In this study, we found that female gamers were more appreciative of complex behaviors than male gamers. In the second study, we examined the influence of gender on players' ability to identify the emotion being displayed by a virtual character. We found that most emotions were identified comparably, with the exception of anger. Female players were significantly better at identifying angry characters compared to male players. We also investigated perception differences between emotions expressed by male and female virtual characters, but we did not identify any statistically significant differences. Overall, the studies suggest that there are differences in how male and female players perceive virtual characters, and if game designers want players to perceive these characters in a certain way, they should consider the gender of targeted players. [ABSTRACT FROM PUBLISHER]
- Published
- 2017
49. Crew exploration vehicle (CEV) attitude control using a neural–immunology/memory network.
- Author
-
Weng, Liguo, Xia, Min, Wang, Wei, and Liu, Qingshan
- Subjects
ARTIFICIAL satellite attitude control systems ,ROVING vehicles (Astronautics) ,ARTIFICIAL neural networks ,NEXT generation networks - Abstract
This paper addresses the problem of the crew exploration vehicle (CEV) attitude control. CEVs are NASA's next-generation human spaceflight vehicles, and they use reaction control system (RCS) jet engines for attitude adjustment, which calls for control algorithms for firing the small propulsion engines mounted on vehicles. In this work, the resultant CEV dynamics combines both actuation and attitude dynamics. Therefore, it is highly nonlinear and even coupled with significant uncertainties. To cope with this situation, a neural–immunology/memory network is proposed. It is inspired by the human memory and immune systems. The control network does not rely on precise system dynamics information. Furthermore, the overall control scheme has a simple structure and demands much less computation as compared with most existing methods, making it attractive for real-time implementation. The effectiveness of this approach is also verified via simulation. [ABSTRACT FROM PUBLISHER]
- Published
- 2015
50. AI, in the Transforming of Education Throughout the World.
- Author
-
Nathaniel-Supple, Sebastián, Rojas-Quiceno, Guillermo, and Palacio-Ureche, Remedios Catalina
- Subjects
SCIENCE education ,ARTIFICIAL intelligence ,DIGITAL transformation ,LITERATURE reviews ,BIOLOGY education - Abstract
Copyright of Revista de Ingenierías Interfaces is the property of Revista de Ingenierias INTERFACES and its content may not be copied or emailed to multiple sites or posted to a listserv without the copyright holder's express written permission. However, users may print, download, or email articles for individual use. This abstract may be abridged. No warranty is given about the accuracy of the copy. Users should refer to the original published version of the material for the full abstract. (Copyright applies to all Abstracts.)
- Published
- 2024
Discovery Service for Jio Institute Digital Library