76 results on "Feng, Zeny"
Search Results
52. Generalized genetic association study with samples of related individuals
- Author
Feng, Zeny, Wong, William W. L., Gao, Xin, and Schenkel, Flavio
- Published
- 2011
53. Genetic Variants of Nogo-66 Receptor with Possible Association to Schizophrenia Block Myelin Inhibition of Axon Growth
- Author
Budel, Stéphane, Padukkavidana, Thihan, Liu, Betty P., Feng, Zeny, Hu, Fenghua, Johnson, Sam, Lauren, Juha, Park, James H., McGee, Aaron W., Liao, Ji, Stillman, Althea, Kim, Ji-Eun, Yang, Bao-Zhu, Sodi, Stefano, Gelernter, Joel, Zhao, Hongyu, Hisama, Fuki, Arnsten, Amy F. T., and Strittmatter, Stephen M.
- Published
- 2008
54. No association between schizophrenia and polymorphisms of the PlexinA2 gene in Chinese Han Trios
- Author
Budel, Stephane, Shim, Sang-ohk, Feng, Zeny, Zhao, Hongyu, Hisama, Fuki, and Strittmatter, Stephen M.
- Published
- 2008
55. Estimating the proportion of true null hypotheses using the pattern of observed p-values.
- Author
Tong, Tiejun, Feng, Zeny, Hilton, Julia S., and Zhao, Hongyu
- Subjects
Gene expression, False discovery rate, Maximum likelihood statistics, Mean square algorithms, Least squares
- Abstract
Estimating the proportion of true null hypotheses, π0, has attracted much attention in the recent statistical literature. Besides its apparent relevance for a set of specific scientific hypotheses, an accurate estimate of this parameter is key for many multiple testing procedures. Most existing methods for estimating π0 in the literature are motivated by the assumption that test statistics are independent, which often does not hold in practice. Simulations indicate that, in the presence of dependence among test statistics, most existing estimators can perform poorly, mainly because their variance increases. In this paper, we propose several data-driven methods for estimating π0 that incorporate the distribution pattern of the observed p-values as a practical approach to addressing potential dependence among test statistics. Specifically, we use a linear fit to give a data-driven estimate of the proportion of true-null p-values in (λ, 1] over the whole range [0, 1], instead of using the expected proportion 1−λ. We find that the proposed estimators may substantially decrease the variance of the estimated true null proportion and thus improve overall performance. [ABSTRACT FROM PUBLISHER]
- Published
- 2013
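The abstract contrasts the proposed linear-fit estimators with the usual approach of dividing by the expected null proportion 1−λ. For reference only, a minimal Python sketch of that standard Storey-type estimator, π0(λ) ≈ #{p_i > λ} / (m(1 − λ)), follows. The function name `storey_pi0` and the default λ = 0.5 are illustrative assumptions; the paper's own data-driven (linear-fit) estimators are not reproduced here.

```python
def storey_pi0(pvalues, lam=0.5):
    """Standard Storey-type estimate of the proportion of true nulls.

    Under the null, p-values are Uniform(0, 1), so a fraction (1 - lam) of
    true-null p-values is expected to fall in (lam, 1].
    """
    if not 0.0 <= lam < 1.0:
        raise ValueError("lam must lie in [0, 1)")
    m = len(pvalues)
    if m == 0:
        raise ValueError("need at least one p-value")
    # Number of p-values exceeding the tuning parameter lam
    w = sum(1 for p in pvalues if p > lam)
    # A proportion cannot exceed one, so cap the estimate
    return min(1.0, w / (m * (1.0 - lam)))
```

Because the estimate depends on a single tuning value λ, it can be unstable when p-values are dependent, which is the weakness the abstract targets.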
56. The Relationship between Single Nucleotide Polymorphisms in Taste Receptor Genes, Taste Function and Dietary Intake in Preschool-Aged Children and Adults in the Guelph Family Health Study.
- Author
Chamoun, Elie, Carroll, Nicholas A., Duizer, Lisa M., Qi, Wenjuan, Feng, Zeny, Darlington, Gerarda, Duncan, Alison M., Haines, Jess, and Ma, David W.L.
- Abstract
Taste is a fundamental determinant of food selection, and inter-individual variations in taste perception may be important risk factors for poor eating habits and obesity. Characterizing differences in taste perception and their influences on dietary intake may lead to an improved understanding of obesity risk and a potential to develop personalized nutrition recommendations. This study explored associations between 93 single nucleotide polymorphisms (SNPs) in sweet, fat, bitter, salt, sour, and umami taste receptors and psychophysical measures of taste. Forty-four families from the Guelph Family Health Study participated, including 60 children and 65 adults. Saliva was collected for genetic analysis, and parents completed a three-day food record for their children. Parents underwent tests of suprathreshold sensitivity (ST) and taste preference (PR) for sweet, fat, salt, umami, and sour, as well as a phenylthiocarbamide (PTC) taste status test. Children underwent PR tests and a PTC taste status test. Analysis of SNPs and psychophysical measures of taste yielded 23 significant associations in parents and 11 in children. After adjusting for multiple hypothesis testing, rs713598 in the TAS2R38 bitter taste receptor gene and rs236514 in the KCNJ2 sour taste-associated gene remained significantly associated with PTC ST and sour PR in parents, respectively. In children, rs173135 in KCNJ2 and rs4790522 in the TRPV1 salt taste-associated gene remained significantly associated with sour and salt taste PRs, respectively. A multiple trait analysis of PR and nutrient composition of diet in the children revealed that rs9701796 in the TAS1R2 sweet taste receptor gene was associated with both sweet PR and percent energy from added sugar in the diet. These findings provide evidence that for bitter, sour, salt, and sweet taste, certain genetic variants are associated with taste function and may be implicated in eating patterns. (Support was provided by the Ontario Ministry of Agriculture, Food, and Rural Affairs). [ABSTRACT FROM AUTHOR]
- Published
- 2018
57. Finding the Missing Piece: Development and Applications of a Data-driven Strategy for Imputation of Mixed-type Trait Datasets
- Author
May, Jacqueline A., Adamowicz, Sarah J., and Feng, Zeny
- Subjects
Phylogenetics, Traits, Imputation
- Abstract
Missing values are a prevalent issue in trait datasets and present methodological challenges for researchers. Data may be more complete for larger, charismatic species and less available for smaller-bodied species or those inhabiting understudied regions, which may result in biased inferences when datasets are reduced to species or groups with complete data. Imputation methods offer an alternative to complete-case analysis as they estimate the missing values using observed data, thereby retaining the sample size and statistical power of the study. Phylogenetic imputation methods build upon this concept by co-opting the phylogenetic signal in trait data to improve imputation performance. However, current guidelines for imputation are limited to select taxa and numerical and/or simulated data, and the performances of imputation methods using real, mixed-type (numerical and categorical) trait data are untested. To address these issues, the first study presents a real data-driven simulation strategy for imputation method selection for a given mixed-type dataset. The strategy entails missingness simulations, performance evaluations of candidate methods with and without phylogeny, and application of the best-suited method to test the impact of imputation on target dataset distributions and characteristics. Results indicate that a data-driven method selection approach reduces imputation error and preserves important dataset properties. The second study applies this strategy to datasets of diverse vertebrate groups, testing the performances of imputation methods using real mixed-type biological and environmental traits for 21 taxonomic orders. The comparatively strong performance of Random Forest imputation is apparent across the dataset types, and phylogenetic information appears generally advantageous. However, the exceptions (where imputation performs poorly) are striking and underscore the importance of a data-driven approach toward method selection. 
The third study illustrates the impact of imputation on biological inferences using a molecular evolution case study. Correlates of molecular rates in fishes are investigated using a complete-case dataset and datasets imputed with and without phylogeny. The role that method choice plays is further illuminated here, as imputed values have a considerable impact on derived inferences. Overall, this thesis contributes a novel strategy and suggestions for trait imputation that span diverse data types at unprecedented taxonomic levels, facilitating new research directions.
Food from Thought: Agricultural Systems for a Healthy Planet Initiative program funded by the Government of Canada through the Canada First Research Excellence Fund; Natural Sciences and Engineering Research Council of Canada; Genome Canada and Ontario Genomics; Ontario Ministry of Economic Development, Job Creation and Trade
- Published
- 2023
58. ATQ: Alarm time quality, an evaluation metric for assessing timely epidemic detection models within a school absenteeism-based surveillance system
- Author
Vanderkruk, Kayla, Feng, Zeny, and Deeth, Lorna
- Subjects
Simulation study, Epidemic detection, Absenteeism surveillance system, Influenza, Evaluation metric
- Abstract
Model-based school absenteeism surveillance systems have been proposed to raise seasonal influenza epidemic alarms. Previous studies used metrics such as false alarm rate (FAR) and accumulated days delayed for model evaluation and selection; however, they could not optimize alarm accuracy and timeliness simultaneously. In this study, we developed a metric, alarm time quality (ATQ), that evaluates both aspects at once by assessing alarms on a gradient: alarms raised incrementally before or after an optimal time were still informative, but penalized. Summary statistics of ATQ, namely average alarm time quality (AATQ) and first alarm time quality (FATQ), were used as model selection criteria. Alarms raised by ATQ-selected and FAR-selected logistic regression models were compared. Daily school absenteeism and laboratory-confirmed influenza data collected by Wellington-Dufferin-Guelph Public Health were used for demonstration. A simulation study representative of Wellington-Dufferin-Guelph was conducted for further evaluation. ATQ-selected models were found to raise alarms that were timelier than those of the FAR-selected model.
2022-04-25
- Published
- 2021
59. Regularized Regression Methods and Neural Networks for Modeling Fish Population Health with Water Quality Variables in the Athabasca Oil Sands Region
- Author
McMillan, Patrick, Deeth, Lorna, and Feng, Zeny
- Subjects
Oil Sands, Bayesian Hyperparameter Optimization, Variable Selection, Neural Network, Sentinel Fish Populations, Environmental Monitoring
- Abstract
This thesis aims to develop statistical models for fish population health measures, including adjusted trout-perch body weight, gonad weight, and liver weight, using climate, environmental, and water quality variables measured in the Athabasca River. To identify relevant variables, we considered three variable selection techniques: stepwise regression, the lasso, and the elastic net (EN). The lasso and EN generally produced regression models with better performance for each response. Uranium (U), tungsten, tellurium (Te), pH, molybdenum (Mo), and antimony were found to be important for at least one response. Uranium, Te, and Mo had relatively large coefficients in both the adjusted gonad and liver weight models, suggesting that they may influence the development of trout-perch organs. Neural networks (NNs) were considered to improve the prediction accuracy of the fish population endpoints. The NNs outperformed the regularization techniques in predicting adjusted body weight, but not adjusted gonad or liver weight.
- Published
- 2021
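The lasso and elastic net both shrink regression coefficients through a penalty, and the lasso can set some of them exactly to zero, which is what makes it usable for variable selection. As an illustration only (not the thesis's code), a minimal pure-Python sketch of the lasso fitted by cyclic coordinate descent follows. It assumes centred predictors and a centred response, so no intercept is fitted; the names `lasso_cd` and `soft_threshold` are hypothetical. An elastic net would scale the soft-threshold and add a ridge term to the denominator of each coordinate update.

```python
def soft_threshold(z, gamma):
    """S(z, gamma) = sign(z) * max(|z| - gamma, 0)."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0


def lasso_cd(X, y, lam, n_iter=200):
    """Minimize (1 / (2n)) * ||y - X b||^2 + lam * ||b||_1 by coordinate descent.

    X is a list of n rows with p centred columns; y is a centred list of n values.
    """
    n, p = len(X), len(X[0])
    b = [0.0] * p
    # (1/n) * sum_i x_ij^2 for each column j
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    resid = list(y)  # residuals y - X b (b starts at zero)
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue  # constant column carries no information
            # Correlation of column j with the partial residual that excludes j
            rho = sum(X[i][j] * resid[i] for i in range(n)) / n + col_sq[j] * b[j]
            new_bj = soft_threshold(rho, lam) / col_sq[j]
            delta = new_bj - b[j]
            if delta != 0.0:
                # Keep the residuals in sync with the updated coefficient
                for i in range(n):
                    resid[i] -= X[i][j] * delta
                b[j] = new_bj
    return b
```

With `lam=0` the updates converge to ordinary least squares; increasing `lam` drives weaker predictors to exactly zero.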
60. Imputation of Missing Data in Chronic Hepatitis C Patient Utility Data
- Author
Amores, Angelica and Feng, Zeny
- Subjects
missing data, longitudinal, imputation, hepatitis C, simulation
- Abstract
A longitudinal study was conducted to measure the impact of treatment with direct-acting antiviral agents on the Health-Related Quality of Life (HR-QoL) among patients diagnosed with Chronic Hepatitis C. EQ-5D measurements were recorded before treatment, mid-treatment and at two timepoints following treatment. This thesis provides recommendations for dealing with missing EQ-5D measurements in the data and proposes a strategy for selecting an imputation method for item non-response and unit non-response missingness. A simulation study is conducted on a nearly complete subset of the data to compare the performance of several imputation methods based on prediction accuracy. Results show that fully conditional specification (FCS) with predictive mean matching and FCS with a linear mixed effects model (FCS-LMM) were the most suitable imputation methods for item non-response and unit non-response missingness, respectively. The FCS-LMM method was selected to impute the missing values in the original longitudinal dataset.
- Published
- 2021
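Predictive mean matching (PMM) imputes each missing value with a value actually observed in a "donor" case whose predicted mean is close to that of the incomplete case. The sketch below is a simplified, single-variable version under stated assumptions: one fully observed predictor `x`, missing outcomes coded as `None`, an ordinary least squares fit, and a random draw among the `k` nearest donors. Full FCS/MICE implementations also draw the regression parameters from their posterior and cycle over several incomplete variables, which is omitted here; `pmm_impute` is a hypothetical name.

```python
import random


def pmm_impute(x, y, k=3, seed=None):
    """Simplified predictive mean matching (PMM) for one incomplete variable.

    x: fully observed predictor values; y: outcome values with None for missing.
    Returns a new list in which each None is replaced by an observed y value.
    """
    rng = random.Random(seed)
    obs = [(xi, yi) for xi, yi in zip(x, y) if yi is not None]
    if len(obs) < 2:
        raise ValueError("need at least two observed outcomes")

    # Ordinary least squares fit of y on x using the complete cases only
    mx = sum(xi for xi, _ in obs) / len(obs)
    my = sum(yi for _, yi in obs) / len(obs)
    sxx = sum((xi - mx) ** 2 for xi, _ in obs)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in obs)
    slope = sxy / sxx if sxx > 0 else 0.0
    intercept = my - slope * mx

    def predict(v):
        return intercept + slope * v

    # Each potential donor: (predicted mean, actually observed value)
    donors = [(predict(xi), yi) for xi, yi in obs]

    imputed = []
    for xi, yi in zip(x, y):
        if yi is not None:
            imputed.append(yi)
            continue
        target = predict(xi)
        # The k donors whose predicted means are closest to the target
        nearest = sorted(donors, key=lambda d: abs(d[0] - target))[:k]
        # Impute a real observed value drawn at random from those donors
        imputed.append(rng.choice(nearest)[1])
    return imputed
```

Because every imputed value is a real observation, PMM keeps imputations within the observed range, which suits bounded scores such as EQ-5D utilities.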
61. Phylodynamic and Transmission Network Individual Level Infectious Disease Models
- Author
Angevaare, Justin, Feng, Zeny, and Deardon, Rob
- Subjects
Transmission Network, genetic structures, Epidemic Model, sense organs, Markov Chain Monte Carlo, Phylodynamics, eye diseases, Infectious Disease Epidemiology
- Abstract
The individual level model (ILM) framework of Deardon et al. (2010) outlines the incorporation of individual-specific risk factors into infectious disease models. ILMs represent individual-specific disease state transitions and allow for investigation of hypotheses regarding overall risk to individuals. Such investigations are relevant in the development of projections and control policies while considering population heterogeneity. We extend the ILM framework to allow for competing risks of disease transmission with Transmission Network ILMs (TN-ILMs). The data requirements of TN-ILMs include the typically latent transmission times and transmission network, so we present TN-ILMs along with Bayesian data augmentation methods to infer TN-ILM parameters jointly with these latent data. Our Markov Chain Monte Carlo based inference strategy for TN-ILMs is implemented in Pathogen.jl, a high-performance, highly flexible statistical software package in the Julia language. Pathogen.jl supports simulation, inference, and visualization of epidemics from Susceptible-Infected (SI), Susceptible-Exposed-Infected (SEI), Susceptible-Infected-Removed (SIR), and Susceptible-Exposed-Infected-Removed (SEIR) TN-ILMs. Applications of TN-ILMs using Pathogen.jl are presented for the 1861 Hagelloch measles outbreak (Pfeilsticker, 1863; Oesterle, 1992) and an experimental tomato spotted wilt virus outbreak (Hughes et al. 1997). We further extend TN-ILMs to full phylodynamic ILMs. Phylodynamics is the combined study of disease spread and evolution. Phylodynamic approaches are most appropriate when dense genetic sampling of the pathogen has been conducted during an outbreak, and when evolutionary and epidemiological processes occur on a similar time scale. With the phylodynamic ILM extension, we can jointly infer disease transmission times, transmission network, pathogen phylogeny, and the phylodynamic ILM parameters.
We contrast a fully phylodynamic approach to one that incorporates genetic distances as a dyadic covariate in various TN-ILMs, and show that phylodynamic ILMs offer improved event time and transmission network inference, at a significantly increased computational cost. This research was funded via a Highly Qualified Personnel (HQP) scholarship from the Ontario Ministry of Agriculture, Food and Rural Affairs (OMAFRA) / University of Guelph Partnership, as well as from Dr. Zeny Feng's and Dr. Rob Deardon's Natural Sciences and Engineering Research Council of Canada (NSERC) Discovery Grants.
- Published
- 2020
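In the ILM framework, the infection risk to a susceptible individual is built from the contributions of currently infectious individuals plus a background term. The sketch below assumes a purely spatial model with a power-law kernel, so the infectious pressure from j on i is alpha * d_ij^(-beta) and epsilon is a background rate. It returns the probability of infection over one unit of time with these rates held fixed and, in the competing-risks spirit of TN-ILMs, the probability that each infectious individual (or the background, keyed `None`) is the source of a given infection. It is a Python illustration, not the Pathogen.jl API, and all function and parameter names are assumptions.

```python
import math


def pairwise_rates(i, infectious, coords, alpha, beta):
    """Infectious pressure from each infectious j on susceptible i,
    using an assumed power-law kernel alpha * d_ij ** (-beta).
    Coincident locations (d_ij = 0) are not supported."""
    xi, yi = coords[i]
    rates = {}
    for j in infectious:
        d = math.hypot(coords[j][0] - xi, coords[j][1] - yi)
        rates[j] = alpha * d ** (-beta)
    return rates


def infection_probability(i, infectious, coords, alpha, beta, epsilon=0.0):
    """P(i infected within one time unit) = 1 - exp(-(sum_j rate_ij + epsilon))."""
    total = sum(pairwise_rates(i, infectious, coords, alpha, beta).values())
    return 1.0 - math.exp(-(total + epsilon))


def source_probabilities(i, infectious, coords, alpha, beta, epsilon=0.0):
    """Competing risks: given that i is infected, each infectious j (or the
    background, key None) is the source with probability proportional to its
    rate. Requires a positive total rate."""
    rates = pairwise_rates(i, infectious, coords, alpha, beta)
    total = sum(rates.values()) + epsilon
    probs = {j: r / total for j, r in rates.items()}
    probs[None] = epsilon / total
    return probs
```

Inferring the latent transmission network amounts to sampling these source assignments jointly with the model parameters, which the thesis does by MCMC data augmentation.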
62. Gut Bacteriophage Alterations after Fecal Microbiota Transplantation for Recurrent or Refractory Clostridioides difficile Infection
- Author
Niergarth, Jessmyn Adriane, Feng, Zeny, and Kim, Peter T.
- Subjects
FMT, virome, bacteriophage, Clostridioides difficile infection, Clostridioides difficile, phageome, viruses, gut, fecal microbiota transplantation, rCDI, bioinformatics, CDI
- Abstract
Clostridioides difficile infection (CDI) is a concern for health care providers around the world because CDI can be acquired nosocomially and has high rates of treatment failure and recurrence (rCDI). Alternative therapeutic options have been explored, including fecal microbiota transplantation (FMT). FMT is a promising alternative to antibiotics that has been shown to achieve high cure rates for rCDI. Its mechanism of success is not fully understood and could involve gut bacteriophages (phages), so we investigated gut phage changes after FMT treatment of recurrent or refractory CDI. To achieve this objective, we purified DNA phages from fecal samples of rCDI patients treated with FMT and from FMT donor samples. We created an in-house bioinformatics pipeline to preprocess raw metagenomic shotgun sequencing phage reads and identify phages from the fecal samples. We proposed a three-step statistical analysis procedure to analyze the association between health-related quality of life (HRQoL) measures and phage abundances. We also explored the transition patterns of phage communities in patients from pre-FMT to post-FMT. We found that the Shannon diversity and the proportion of reads mapping to donor phage contigs increased, and the Caudovirales:Microviridae ratio decreased, in patients after FMT. These results suggest that donor phages were engrafted into or augmented in patients via FMT, and that these changes led to a gut phageome more similar to that of a healthy individual. Our regularized mixed effects regression model for joint selection of dsDNA phages associated with Bodily Pain produced a final model that retained 36 phages as covariates. These phages are potential targets for future work and included five Leuconostoc phages, six Lactococcus phages, and one C. difficile phage. Leuconostoc and Lactococcus bacteria are associated with food, so our findings suggest that patient diet should be controlled for in future work.
Within phages predicted to infect Proteobacteria, there was a relatively high abundance of phages predicted to infect Gammaproteobacteria, a bacterial class with a high proportion of pathogenic species. FMT donors were screened for pathogenic gut bacteria, but to further ensure the safety of FMTs, Gammaproteobacteria and their associated phages in FMTs could be studied further.
- Published
- 2020
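The reported increase in Shannon diversity refers to the standard index H = −Σ p_i ln p_i, where p_i is the proportion of reads assigned to phage i. A minimal sketch on raw read counts follows; the natural logarithm and the absence of any rarefaction or normalization step are assumptions, since the abstract does not state how the thesis computed the index, and `shannon_diversity` is a hypothetical name.

```python
import math


def shannon_diversity(counts):
    """Shannon index H = -sum(p_i * ln p_i) over taxa with nonzero counts."""
    total = sum(counts)
    if total <= 0:
        raise ValueError("counts must sum to a positive number")
    # Zero counts contribute nothing (p * ln p -> 0 as p -> 0)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)
```

H is zero when a single phage accounts for all reads and reaches ln(S) when S phages are equally abundant, so a rise after FMT indicates a richer or more even phage community.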
63. A Novel Bioinformatics Pipeline for 18S rRNA Amplicon-based Detection of Protozoan Parasites in Shellfish
- Author
DeMone, Catherine, Feng, Zeny, and Shapiro, Karen
- Subjects
18S rRNA, shellfish, zoonotic protozoa, parasitic diseases, metabarcoding, food and beverages, bioinformatics
- Abstract
Zoonotic, pathogenic protozoa are a serious public health concern. Three common species, Cryptosporidium parvum, Giardia enterica, and Toxoplasma gondii, have been detected in commercial oysters. Current methods of detecting protozoa in shellfish are not standardized, and few are able to simultaneously identify multiple species of interest. Here we present a bioinformatics pipeline to process 18S rRNA amplicons extracted from oyster matrices for the purpose of detecting protozoan pathogens. The pipeline was successfully applied for detection of G. enterica cysts and C. parvum and T. gondii oocysts spiked into whole oyster homogenates and hemolymph. These results indicate that 18S rRNA metabarcoding coupled with the validated pipeline could be tested for monitoring wild oysters contaminated with protozoan pathogens. While this study focused on detecting three parasites of interest, the multispecies identification abilities of this method make it an ideal screening tool for a broad range of protozoan pathogens in shellfish.
Natural Sciences and Engineering Research Council (NSERC), Canada Excellence Research Chairs (CERC), University of Guelph Department of Mathematics and Statistics, University of Guelph Bioinformatics graduate program
- Published
- 2019
64. Incorporating Contact Network Uncertainty in Individual Level Models of Infectious Disease within a Bayesian Framework
- Author
Almutiry, Waleed, Feng, Zeny, and Deardon, Rob
- Subjects
Markov chain Monte Carlo, likelihood inflating sampling algorithm, population Monte Carlo approximate Bayesian computation, likelihood approximation, Bayesian statistics, degree distribution, contact network
- Abstract
Individual-level infectious disease models enable the study of transmission mechanisms of infectious disease while accounting for heterogeneity within the population. Beyond covariates, such heterogeneity is often best modelled through a contact network or a series of networks. However, the contact network and the exact times of infection (and removal) for individuals are often completely or partially unobserved. In this thesis, we account for such data uncertainty by incorporating a large amount of missing information (contact network and event times) into a continuous-time individual-level modelling framework. The main focus of this thesis is the effect of incorporating contact network uncertainty on the performance of the models within a Bayesian framework. A secondary focus is enabling fast inference when fitting these models to epidemic data sets with a large amount of missing information. We start by introducing our R package EpiILMCT, which allows users to study the spread of infectious disease using spatial- and/or network-based continuous-time ILMs. We then investigate the performance of network-based ILMs in analyzing small epidemic data sets under different levels of contact network uncertainty, along with uncertainty in individual-level event histories, using data-augmented Markov chain Monte Carlo (MCMC). We also consider the incorporation of global-level contact network information through observation models based on knowledge of either the degree distribution or the total number of connections in the network. Then, we explore the use of approximate Bayesian computation population Monte Carlo (ABC-PMC) methods for fitting such models to both simulated data and data from the UK 2001 foot-and-mouth disease epidemic.
Finally, we introduce an approach to approximate full model inference by partitioning the population into a number of spatial clusters in which the contact network is divided into a series of isolated cluster-constrained sub-networks.
Saudi government through the Saudi cultural bureau in Canada.
- Published
- 2018
65. Fitting Generalized Zero-Inflated Poisson Regression Mixture Models to Bacteria Microbiome Data
- Author
Chen, Siyu and Feng, Zeny
- Subjects
bacteria microbiome data, GZIP regression mixture model
- Abstract
Gut microbial dysbiosis contributes to the risk of colorectal cancer, so it is important to study the gut mucosal microbiome. Gut bacteria microbiome data feature excess zeros and overdispersion, which limit the usefulness of traditional Poisson regression models for this kind of count data. We propose the use of the generalized zero-inflated Poisson (GZIP) regression mixture model for analyzing such data. Fitting a mixture model requires specifying the number of components in a given population, but this number is unknown. In this thesis, the Bayesian information criterion (BIC) is used to identify a preferred model among candidates with pre-specified numbers of components. The EM algorithm is used to estimate parameters, and the performance of the models is assessed by simulation studies. The GZIP mixture model is applied to gut bacteria microbiome data from a colorectal cancer study. We consider only the carcinoma and healthy groups as a health state covariate and select the best-fitting GZIP model for each bacterial genus from models with two, three, or four components. Some special cases in which the proposed methods could not be applied are also discussed.
- Published
- 2018
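The GZIP regression mixture builds on the ordinary zero-inflated Poisson (ZIP) model, in which a count is a structural zero with probability π and otherwise follows a Poisson(μ) distribution. As a building block only, the sketch below computes the ZIP log-likelihood with an assumed log link for μ and logit link for π, plus the BIC = −2ℓ + k ln n used to compare models with different numbers of components. The EM algorithm and the mixture components themselves are not shown, and all names are hypothetical.

```python
import math


def zip_logpmf(y, mu, pi):
    """Log P(Y = y) for a zero-inflated Poisson with mean mu > 0 and
    structural-zero probability 0 <= pi < 1."""
    if y == 0:
        # A zero can be structural or a Poisson zero
        return math.log(pi + (1.0 - pi) * math.exp(-mu))
    return math.log(1.0 - pi) - mu + y * math.log(mu) - math.lgamma(y + 1)


def zip_regression_loglik(ys, X, Z, beta, gamma):
    """ZIP regression log-likelihood with assumed links:
    log(mu_i) = x_i . beta and logit(pi_i) = z_i . gamma."""
    ll = 0.0
    for y, x, z in zip(ys, X, Z):
        mu = math.exp(sum(b * v for b, v in zip(beta, x)))
        pi = 1.0 / (1.0 + math.exp(-sum(g * v for g, v in zip(gamma, z))))
        ll += zip_logpmf(y, mu, pi)
    return ll


def bic(loglik, n_params, n_obs):
    """Bayesian information criterion; smaller values are preferred."""
    return -2.0 * loglik + n_params * math.log(n_obs)
```

In a mixture with G components, each component would have its own (beta, gamma), and BIC penalizes the extra parameters that come with larger G.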
66. Generalized linear regression model with LASSO, group LASSO, and sparse group LASSO regularization methods for finding bacteria associated with colorectal cancer using microbiome data
- Author
Bak, Stephen, Dang, Sanjeena, and Feng, Zeny
- Subjects
regularization, data, colon, sparse, cancer, multinomial, regression, group, LASSO, Microbiome, binomial
- Abstract
With ever-increasing advancements in microbiome sequencing technologies, the need for efficient statistical modelling of these systems has become apparent. Most microbiome data are sparse, which creates problems for many conventional statistical analysis methods. For example, in the study of Nakatsu et al. (2015), 16S ribosomal RNA sequencing data were collected from the colon tissue of healthy, carcinoma-affected, and adenoma-affected subjects. One wishes to identify bacteria that are associated with the outcome of the three health states. The ordinary binomial or multinomial regression model would fail to perform a meaningful analysis due to the large number of taxa and the sparsity of the taxonomic counts. In this thesis, we attempt to solve these problems by applying LASSO, group LASSO, and sparse group LASSO regularization to the multinomial and binomial regression models. Raw microbiome sequencing reads from the study of Nakatsu et al. (2015) are obtained from the NCBI Sequence Read Archive. The software "mothur" is used to preprocess the sequence data and cluster them into Operational Taxonomic Units (OTUs), and OTU counts are obtained for each taxon. We find that, in general, similar bacteria are chosen for the healthy and adenoma phenotypes, and different bacteria are chosen for the carcinoma phenotype. Proteobacteria are more often selected under the normal phenotype, whereas Fusobacterium are more often selected under the carcinoma phenotype. The adenoma phenotype generally resembles the bacteria from the other two phenotypes, but with different coefficients.
- Published
- 2017
67. Improving Credit Classification Using Machine Learning Techniques
- Author
Lazure, Adam, Kim, Peter, and Feng, Zeny
- Subjects
credit risk, predictive mean matching, support vector machine, multiple imputation by chained equations, random forest
- Abstract
The quantification of credit risk is an ever-expanding topic of discussion in finance. Risk management is necessary to prevent economic loss. A popular method of risk management is the use of statistical techniques in conjunction with machine learning. This thesis takes a unique machine learning approach to credit classification. In particular, it conducts a missing information simulation study on German credit data and makes use of the random forest (RF), support vector machine (SVM), multiple imputation by chained equations (MICE), and predictive mean matching (PMM) methodologies. Results indicate that using MICE in tandem with PMM can be an optimal imputation method in the context of credit risk data.
- Published
- 2017
68. An association test based on the mixture of zero-inflated Poisson regression models for detecting differential microbial abundance in case-control studies
- Author
Zhu, Maoyu and Feng, Zeny
- Subjects
differential microbial abundance, case-control studies, zero-inflated Poisson regression mixture model
- Abstract
Motivation: Human microbial communities play an important role in human health and disease because human metabolism, nutrient intake, and energy generation fall under the influence of these communities. Association analysis relating the relative abundances in these communities to status-related outcomes can provide essential information that helps us understand the impact that changes in the relative abundance profile can have on disease status. Proper testing of overdispersed and zero-inflated microbiome data is challenging, and existing methods fail to pinpoint the degree of association. Results: In this thesis, we propose a likelihood ratio test for the association between the relative abundance of bacteria and a disease covariate in microbiome data, using a generalized zero-inflated Poisson regression mixture model. Simulation studies show that the likelihood ratio statistic, which tests the null hypothesis that the bacterial counts of healthy individuals and individuals with disease follow the same distribution against the alternative that their distributions differ, converges to a chi-square distribution. The power of the likelihood ratio test is also evaluated in our simulation study. Applying our proposed method to real microbiome data showed that the associated bacteria at the genus level have different count distributions in healthy individuals and individuals with carcinoma. Our proposed method provides a useful tool for identifying differential taxonomic abundances underlying different disease statuses.
- Published
- 2017
69. A new bioinformatics pipeline to reveal the correlates of molecular evolutionary rates in ray-finned fishes
- Author
May, Jacqueline, Adamowicz, Sarah, and Feng, Zeny
- Subjects
molecular evolution, R programming language, DNA barcoding, bioinformatics, fishes, molecular evolutionary rates, biodiversity
- Abstract
This thesis entails a multivariable investigation of molecular rate correlates in ray-finned fishes through development of a bioinformatics pipeline. The pipeline first matches data for 32 ecological traits with evolutionary rate measurements of the mitochondrial cytochrome c oxidase subunit I (COI) barcode region for over 6000 fish species. Linear regression analyses are then performed to identify those traits that contribute most to molecular rate variation, accounting for phylogenetic non-independence. The utility of the pipeline for other researchers and the potential for further molecular rate applications are then discussed. The results indicate that biological traits such as age at maturity, longevity, and body size are more general predictors of fish COI evolution rates than environmental factors such as temperature. This thesis showcases the use of bioinformatics tools to analyze different types of biological data and emphasizes the usage of multi-parameter studies to identify the most important sources of molecular rate variation. This research was funded by NSERC Discovery Grants to Dr. Sarah J. Adamowicz and Dr. Zeny Feng.
- Published
- 2017
70. Cluster analysis of microbiome data via mixtures of Dirichlet-multinomial regression models
- Author
Neish, Drew, Feng, Zeny, and Dang, Sanjeena
- Subjects
Dirichlet-multinomial, microbiome, cluster analysis, finite mixture models
- Abstract
The human gut microbiome is a source of genetic and metabolic diversity, and exploring the relationship between biological/environmental covariates and the resulting taxonomic composition of the gut microbial community is an active area of research. A Dirichlet-multinomial regression framework was previously suggested to model this relationship, but it did not account for the latent group structure observed among microbiome samples that share similar biota compositions (known as enterotypes). Here, a finite mixture of Dirichlet-multinomial regression models is proposed and illustrated to account for the enterotype structure and to allow a probabilistic investigation of the relationship between bacterial abundance and biological/environmental covariates within each inferred enterotype. Furthermore, finite mixtures of regression models that incorporate the concomitant effect of the covariates on the mixing proportions are also proposed and examined within the Dirichlet-multinomial framework.
- Published
- 2015
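Each component of the proposed mixture is a Dirichlet-multinomial (DM) regression. The sketch below shows the DM log-probability of a count vector and the log-likelihood of one sample under a G-component mixture, assuming a log link α_k = exp(x·β_gk) for each taxon k in component g and fixed mixing weights; the concomitant-variable version, where the weights depend on covariates, is not shown. Function names and the nested list layout of `betas` are assumptions for illustration.

```python
import math


def dm_logpmf(y, alpha):
    """Log-pmf of the Dirichlet-multinomial distribution for counts y
    with concentration parameters alpha (all alpha_k > 0)."""
    n = sum(y)
    a = sum(alpha)
    out = math.lgamma(n + 1) + math.lgamma(a) - math.lgamma(n + a)
    for yk, ak in zip(y, alpha):
        out += math.lgamma(yk + ak) - math.lgamma(yk + 1) - math.lgamma(ak)
    return out


def mixture_loglik_one(y, x, weights, betas):
    """Log-likelihood of one sample under a mixture of DM regressions.

    weights: G positive mixing proportions summing to one.
    betas: for each component g, a list of K coefficient vectors (one per taxon).
    """
    comps = []
    for w, beta_g in zip(weights, betas):
        # Assumed log link: alpha_k = exp(x . beta_gk)
        alpha = [math.exp(sum(b * v for b, v in zip(beta_gk, x))) for beta_gk in beta_g]
        comps.append(math.log(w) + dm_logpmf(y, alpha))
    # log-sum-exp over components for numerical stability
    m = max(comps)
    return m + math.log(sum(math.exp(c - m) for c in comps))
```

The posterior probability that a sample belongs to component g is exp(comps[g] − total), which is how samples would be assigned to inferred enterotypes.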
71. Individual-level Models for use with Incomplete Infectious Disease Data and Related Topics
- Author
Bifolchi, Nadia, Deardon, Rob, and Feng, Zeny
- Subjects
Incomplete Infectious Disease Data, Spatial Approximation of Contact Networks, Risk-based Surveillance and Control, Bayesian Framework, Individual-level Models
- Abstract
Individual-level models (ILMs) of infectious disease transmission can incorporate individual-level covariate information and thus account for heterogeneity within the population. The amount of data required to parameterize these models, and the inherent uncertainty in collecting infection history data, can lead to large amounts of missing or incomplete information. This thesis contains three chapters describing work on using individual-level models with incomplete infectious disease data. Infectious disease is generally spread via complex individual-level interactions. Together, these interactions across the full population form the contact network, but this network is often unobserved. In Chapter 2, a simulation study is used to determine the effect of using spatial information as a proxy for more complex network information within ILMs fitted for predictive epidemic modelling. Infectious disease models are frequently employed to predict disease spread and determine optimal strategies for disease control. In Chapter 3, a simulation study is used to examine how effectively risk-based surveillance/control strategies minimize the number of infected farms. An outbreak of an emergent strain of swine influenza within the southern Ontario pork industry is used as an example. Each farm's risk is estimated using an ILM fitted to varying degrees of available data. Limited resources are also considered by restricting the number of available tests, and various schemes for allocating these testing resources (e.g. by farm production type) are compared. In Chapter 4, several parameterizations of ILMs are proposed to better account for data left unobserved by the choice of sampling/surveillance scheme. This research is carried out through a simulation study, with infectious disease data collected under various sampling/surveillance scenarios.
These sampling schemes are defined by two factors: the number of farms observed (sample size) and the time interval between consecutive observations. Models are parameterized to better account for time-varying epidemic strength and the effect of temporal discretization. The final chapter, Chapter 5, examines the extent to which proximity to cattle and weather events in Alberta predispose human populations to E. coli O157 disease. This work was supported by the Ontario Ministry of Agriculture, Food and Rural Affairs (OMAFRA)/University of Guelph Highly Qualified Personnel Scholarship, Bioniche Life Sciences Inc., and the Natural Sciences and Engineering Research Council of Canada (NSERC), and was carried out on equipment funded by the Canada Foundation for Innovation.
- Published
- 2015
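To make the spatial-proxy idea in Chapters 2 and 3 concrete, the sketch below uses a commonly cited spatial ILM form in which a susceptible farm i becomes infected in the next time step with probability 1 - exp(-(α Σ_{j∈I(t)} d_ij^(-β) + ε)), where I(t) is the set of infectious farms, α is susceptibility, β is the distance-decay power, and ε is a background infection term. This form, the function names, and the idea of ranking farms by risk to allocate a limited number of tests are assumptions for illustration; the thesis may parameterize its ILMs and allocation schemes differently.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def infection_prob(farm: Point, infectious: Sequence[Point],
                   alpha: float, beta: float, eps: float = 0.0) -> float:
    """Probability a susceptible farm is infected in the next time step.

    Assumed spatial ILM: pressure = alpha * sum_j d_ij ** (-beta), plus eps.
    Farms are assumed to have distinct locations, so every d_ij > 0.
    """
    pressure = sum(alpha * math.dist(farm, j) ** (-beta) for j in infectious)
    return 1.0 - math.exp(-(pressure + eps))


def risk_ranking(susceptible: Sequence[Point], infectious: Sequence[Point],
                 alpha: float, beta: float, eps: float = 0.0) -> List[int]:
    """Indices of susceptible farms ordered from highest to lowest infection risk."""
    probs = [infection_prob(f, infectious, alpha, beta, eps) for f in susceptible]
    return sorted(range(len(susceptible)), key=lambda i: probs[i], reverse=True)
```

Under this hypothetical scheme, a budget of m tests would go to the first m indices returned by `risk_ranking`; replacing distance with observed network contacts would turn the spatial kernel back into a network ILM.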
72. Early Prediction of Seasonal Influenza using School Absenteeism
- Author
- Stanley, Anu, Deardon, Rob, and Feng, Zeny
- Subjects
- School Absenteeism, Syndromic Surveillance, Influenza Surveillance
- Abstract
Syndromic surveillance uses non-traditional health-related data to detect regularly occurring or emerging infectious disease outbreaks. A school absenteeism surveillance system using an arbitrary 10% absenteeism threshold has been in place at Wellington-Dufferin-Guelph Public Health (WDGPH) since February 2008. The primary focus of this thesis is to refine the current methods to allow early detection of seasonal influenza outbreaks in the community. Surveillance systems were developed that link real outbreaks, defined by aggregated hospital data within the WDG area, to the school absenteeism data. We used the moving average (MA), exponentially weighted moving average (EWMA) and logistic regression (LR) to compute a unique baseline for each school on a given day, and compared their false alarm rate (FAR) and accumulated days delay (ADD) with those of the steady baseline currently used by WDGPH. This study concludes that WDGPH's current method appears inferior to the surveillance systems implemented in this thesis.
- Published
- 2014
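The contrast between a steady 10% threshold and a school-specific EWMA baseline can be sketched as follows. This is a minimal illustration, assuming a smoothing weight `lam`, an alarm rule of "today's rate exceeds the EWMA baseline by more than `k` running standard deviations", and a short warm-up period; these parameters and the function names are hypothetical, not the thesis's specification.

```python
import math
from typing import List, Sequence


def ewma_alarms(rates: Sequence[float], lam: float = 0.3, k: float = 2.0,
                warmup: int = 5) -> List[bool]:
    """Flag days whose absenteeism rate exceeds an EWMA baseline plus k running SDs."""
    alarms: List[bool] = []
    mean, var = rates[0], 0.0
    for t, r in enumerate(rates):
        # Compare today's rate with a baseline built only from earlier days.
        alarms.append(t >= warmup and r > mean + k * math.sqrt(var))
        # Then fold today into the exponentially weighted mean and variance.
        diff = r - mean
        mean += lam * diff
        var = (1 - lam) * (var + lam * diff * diff)
    return alarms


def fixed_alarms(rates: Sequence[float], threshold: float = 0.10) -> List[bool]:
    """Steady-threshold rule: alarm whenever the rate exceeds a fixed cut-off."""
    return [r > threshold for r in rates]


# Made-up daily absenteeism proportions for one school.
rates = [0.04, 0.05, 0.04, 0.05, 0.04, 0.05, 0.04, 0.09]
print(ewma_alarms(rates))
print(fixed_alarms(rates))
```

With these toy rates, the EWMA rule flags only the final day (9% absenteeism), while the fixed 10% rule raises no alarm at all, which illustrates why an adaptive, school-specific baseline can reduce detection delay. Evaluating such rules properly would require the FAR and ADD comparisons against hospital-defined outbreaks described in the abstract.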
73. Mixed Effects Models and their Applications
- Author
- Wang, Weiqiang, Feng, Zeny, and Deardon, Rob
- Subjects
- genome-wide association study, Mixed Effects Models, longitudinal traits
- Abstract
Owing to their flexibility and ease of application, mixed effects models have been widely used in medical studies. The first piece of my doctoral work involves modeling multiple events and their times to event using mixed effects models. Many diseases progress toward multiple outcomes after onset, and patients may or may not be susceptible to the outcomes of interest. Analyzing such disease data is challenging because statistical models must account for uncertainty about which patients are susceptible, correlation among the multiple outcomes, and correlation among the times of occurrence within each patient. We propose a mixed effects model nested within a mixture model to address these issues. An EM algorithm is used to estimate the parameters, and analytical forms of the standard errors of the estimated parameters are derived using Louis' formula. Simulation studies are conducted to assess the performance of the proposed method. In the second paper of my thesis, a two-step method is proposed to analyze the genetic association between a single nucleotide polymorphism (SNP) and multiple longitudinal traits. In the first step, mixed effects models are fitted to each longitudinal trait, focusing on the estimated random effects. In the second step, we test the genetic association between each SNP and the multiple estimated random effects simultaneously. The method is validated and evaluated through simulation studies and applied to the Framingham Heart Study data. It is shown to be more powerful in detecting genetic pleiotropic effects than taking the union of tests on each trait individually. The third paper focuses on identifying gene-environment interactions. A random slope is introduced into the mixed effects model to capture the genetic interactions with environmental factors, and both the random intercepts and the random slopes are treated as phenotypes to be tested. The method is validated and evaluated through simulation studies and a real data application.
- Published
- 2014
74. A generalized simultaneous genetic association test on multiple traits
- Author
- McDonald, Michael and Feng, Zeny
- Subjects
- Phenotypic value of traits, Genetic markers, Multiple traits, Genetic association test, Quasi-likelihood scoring
- Abstract
Genetic association studies explore relationships between genetic markers (e.g., a single nucleotide polymorphism, SNP) and the phenotypic values of traits. A marker is said to be associated with a trait (or traits) of interest if different genotypes at that marker are accompanied by different phenotypic values of the trait. To date, methods for association studies have focused on association with a single trait at a time. In this paper, we present a novel method, based on a generalized quasi-likelihood scoring (GQLS) approach, for testing multiple traits simultaneously. Our quasi-likelihood scoring method for multiple traits (QLSM) also accommodates samples containing related individuals. Simulations are used to evaluate the efficiency of the QLSM model. The method is applied to the analysis of Canadian Holstein cattle data. The data set consists of ten thousand SNPs for more than eight hundred bulls with over twenty estimated breeding values (EBVs), which are considered important economic traits in animal breeding and dairy science.
- Published
- 2010
75. Significance analysis of microarrays applied to heifer data
- Author
- Tey, Jasper, Feng, Zeny, and Ali, Ayesha
- Subjects
- missing values, gene expression data, fasting, heifers, imputation, Significance Analysis of Microarrays software, feeding
- Abstract
The imputation of missing values in gene expression data is a practical problem in microarray data analysis. In this thesis, Significance Analysis of Microarrays (SAM) software is used to analyze the difference between two conditions, feeding and fasting, in over 8000 genes from heifers. To reduce the 'noise' caused by missing values, several data filtering schemes are considered, in which genes, heifers, or both with fewer than 80% non-missing values are removed. A modification of SAM's default K-nearest neighbour imputation procedure, called 'split imputation', is also proposed. The results show that combining data filtering with split imputation on the heifer data can improve the performance of SAM with respect to its ability to control the false discovery rate (FDR) and the number of genes found significant.
- Published
- 2009
76. Summary of contributions to GAW15 Group 13: candidate gene association studies.
- Author
- de Andrade M, Allen AS, Brinza D, Cheng R, Da Y, de Vries AR, Ewhida A, Feng Z, Jung H, Hsieh HJ, Köhler K, Liu Y, Liu-Mares W, Luan J, Marquard V, Nolte IM, Oh S, Platt A, Qin X, Yoo YJ, Yuan A, Tian X, and Won S
- Subjects
- Epistasis, Genetic, Haplotypes, Humans, Polymorphism, Single Nucleotide, Arthritis, Rheumatoid genetics
- Abstract
Here we summarize the contributions to Group 13 of the Genetic Analysis Workshop 15, held in St. Pete Beach, Florida, on November 12-14, 2006. The focus of this group was to identify candidate genes associated with rheumatoid arthritis or surrogate outcomes. The association methods proposed in this group were diverse, ranging from better-known approaches, such as logistic regression for single nucleotide polymorphism (SNP) analysis and haplotype sharing tests, to methods less familiar to genetic epidemiologists, such as machine learning and visualization methods. The majority of papers analyzed Genetic Analysis Workshop 15 Problems 2 (rheumatoid arthritis data) and 3 (simulated data). The main points highlighted by this group's analyses were: (1) haplotype-based statistics can be more powerful than single-SNP analysis for risk-locus localization; (2) considering linkage disequilibrium block structure in haplotype analysis may reduce the likelihood of false-positive results; and (3) visual representation of genetic models for continuous covariates may help identify SNPs associated with the underlying quantitative trait loci. (© 2007 Wiley-Liss, Inc.)
- Published
- 2007