508 results for "Sample size determination"
Search Results
2. Sample Size Determination for Bayesian Hierarchical Models Commonly Used in Psycholinguistics
- Author
-
Vasishth, Shravan, Yadav, Himanshu, Schad, Daniel J., and Nicenboim, Bruno
- Published
- 2023
3. Statistical Power and Bayesian Assurance in Clinical Trial Design
- Author
-
Chen, Ding-Geng, Chen, Jenny K., Chen, Jiahua, Series Editor, Chen, Ding-Geng (Din), Series Editor, Zhao, Yichuan, editor, and Chen, Ding-Geng, editor
- Published
- 2018
4. Investigation of Occupational Noise-Induced Hearing Loss of Underground Coal Mines
- Author
-
Erol, İlknur
- Published
- 2022
5. Applications of Probability of Study Success in Clinical Drug Development
- Author
-
Wang, Ming-Dauh, Chen, Jiahua, Series editor, Chen, Ding-Geng (Din), Series editor, Chen, Zhen, editor, Liu, Aiyi, editor, Qu, Yongming, editor, Tang, Larry, editor, Ting, Naitee, editor, and Tsong, Yi, editor
- Published
- 2015
6. Sample Size and Power
- Author
-
Lee, Hang
- Published
- 2014
7. Laboratory Values for Pediatric Endocrinology
- Author
-
Dennis M. Styne
- Subjects
medicine.medical_specialty ,Glycated albumin ,Adult patients ,Pediatric endocrinology ,Sample size determination ,Computer science ,medicine ,Medical physics ,Intensive care medicine ,Absolute minimum ,Test (assessment) - Abstract
This list includes some of the laboratory reference ranges used most frequently in pediatric endocrinology, with standards, sample preparation, and sample sizes from Quest Diagnostics/Nichols Institute and Esoterix/Labcorp. Other laboratories perform these tests and a few also have pediatric standards (e.g., ARUP). However, we do not recommend using a laboratory that has not established pediatric standards, because the lower detectable limits, or the accurate area of the laboratory's standard curves, might be well above pediatric values. Laboratories that are set up for pediatric endocrine samples will accept smaller volumes than are requested of adult patients. The minimum volumes listed here can be used for smaller children, but a duplicate test may not be available to check results when such small quantities are used; consult the laboratory for details of the most recent collection size, storage, and transport requirements. You must check your results against the standards of the laboratory you choose, as the laboratories recommend, because reference ranges may have changed since the publication of this book; the following is only a guide.
- Published
- 2023
8. Evaluation of the Cepheid Xpert C. difficile diagnostic assay: an update meta-analysis
- Author
-
Yuanyuan Bai, Yueling Wang, Wenjun Chu, Yan Jin, Zhen Song, and Yingying Hao
- Subjects
medicine.medical_specialty ,Receiver operating characteristic ,Clinical Microbiology - Research Paper ,business.industry ,Clostridioides difficile ,Diagnostic Tests, Routine ,Nucleic acid amplification technique ,Cochrane Library ,Microbiology ,Likelihood ratios in diagnostic testing ,Sensitivity and Specificity ,Confidence interval ,Nucleic acid amplification techniques ,Meta-analysis ,ROC Curve ,Sample size determination ,Internal medicine ,Media Technology ,Diagnostic odds ratio ,Clostridium Infections ,Medicine ,Humans ,business - Abstract
Background Accurate and rapid diagnosis of Clostridium difficile infection (CDI) is critical for effective patient management and for implementing infection control measures to prevent transmission. Objectives We updated our previous meta-analysis to provide a more reliable evidence base for the clinical use of the Cepheid Xpert C. difficile (Xpert CD) assay. Methods We searched PubMed, EMBASE, the Cochrane Library, the Chinese National Knowledge Infrastructure (CNKI), and the Chinese Biomedical Literature Database (CBM) to identify studies according to predetermined criteria. STATA 13.0 software was used to analyze sensitivity, specificity, positive likelihood ratio, negative likelihood ratio, diagnostic odds ratio, and area under the summary receiver operating characteristic curve (AUC). QUADAS-2 was used to assess the quality of included studies with RevMan 5.2. Heterogeneity in accuracy measures was tested with the Spearman correlation coefficient and chi-square. Meta-regressions and subgroup analyses were performed to identify potential sources of heterogeneity. Model diagnostics were used to evaluate the veracity of the data. Results A total of 26 studies were included in the meta-analysis. The pooled sensitivity (95% confidence interval [CI]) was 0.97 (0.95–0.98), and specificity was 0.96 (0.95–0.97). The AUC was 0.99 (0.98–1.00). Model diagnostics confirmed the robustness of the meta-analysis results. Significant heterogeneity was still observed when most of the accuracy measures of the selected studies were pooled. Meta-regression and subgroup analyses showed that sample size and type, ethnicity, and disease prevalence might be conspicuous sources of heterogeneity. Conclusions This updated meta-analysis showed that the Xpert CD assay has good accuracy for detecting CDI. However, the diagnosis of CDI must combine clinical presentation with diagnostic testing to better determine whether the patient actually has CDI, and inclusion of preanalytical parameters and clinical outcomes in study design would provide a more objective evidence base.
- Published
- 2021
9. Lung Nodule Classification Using Biomarkers, Volumetric Radiomics, and 3D CNNs
- Author
-
Jayalakshmi Mangalagiri, Sumeet Menon, Phuong Nguyen, David Chapman, Kushal T. Mehta, and Arshita Jain
- Subjects
FOS: Computer and information sciences ,Computer Science - Machine Learning ,Lung Neoplasms ,Computer science ,Computer Vision and Pattern Recognition (cs.CV) ,Computer Science - Computer Vision and Pattern Recognition ,02 engineering and technology ,Malignancy ,Convolutional neural network ,Article ,030218 nuclear medicine & medical imaging ,Machine Learning (cs.LG) ,03 medical and health sciences ,0302 clinical medicine ,Radiomics ,0202 electrical engineering, electronic engineering, information engineering ,medicine ,FOS: Electrical engineering, electronic engineering, information engineering ,Humans ,Radiology, Nuclear Medicine and imaging ,Lung ,Radiological and Ultrasound Technology ,Contextual image classification ,business.industry ,Image and Video Processing (eess.IV) ,CNNs ,Solitary Pulmonary Nodule ,Pattern recognition ,Electrical Engineering and Systems Science - Image and Video Processing ,medicine.disease ,Hybrid algorithm ,Computer Science Applications ,Random forest ,LIDC ,Sample size determination ,Benign ,Biomarker (medicine) ,Radiographic Image Interpretation, Computer-Assisted ,020201 artificial intelligence & image processing ,Artificial intelligence ,Neural Networks, Computer ,Lung cancer ,business ,Tomography, X-Ray Computed ,Biomarkers - Abstract
We present a hybrid algorithm to estimate lung nodule malignancy that combines imaging biomarkers from radiologists' annotations with image classification of CT scans. Our algorithm employs a 3D Convolutional Neural Network (CNN) as well as a Random Forest in order to combine CT imagery with biomarker annotation and volumetric radiomic features. We analyze and compare the performance of the algorithm using only imagery, only biomarkers, combined imagery + biomarkers, combined imagery + volumetric radiomic features, and finally the combination of imagery + biomarkers + volumetric features in order to classify the suspicion level of nodule malignancy. The National Cancer Institute (NCI) Lung Image Database Consortium (LIDC) IDRI dataset is used to train and evaluate the classification task. We show that incorporating semi-supervised learning by means of K-Nearest Neighbors (KNN) can increase the available training sample size of LIDC-IDRI, thereby further improving the accuracy of malignancy estimation for most of the models tested, although there is no significant improvement from KNN semi-supervised learning when image classification with CNNs and volumetric features is combined with descriptive biomarkers. Unexpectedly, we also show that a model using image biomarkers alone is more accurate than one that combines biomarkers with volumetric radiomics, 3D CNNs, and semi-supervised learning. We discuss the possibility that this result may be influenced by cognitive bias in LIDC-IDRI, because malignancy estimates were recorded by the same radiologist panel as the biomarkers, as well as future work to incorporate pathology information for a subset of study participants. (This paper has been submitted to the Journal of Digital Imaging (JDI 2020). The poster of this paper received 2nd prize for the Research Poster Award. Link: https://siim.org/page/20m_p_lung_node_malignancy)
- Published
- 2021
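The entry above describes enlarging the LIDC-IDRI training set with KNN-based semi-supervised learning before classification. The paper's actual pipeline (3D CNNs and radiomics on CT volumes) is not reproduced here; the sketch below only illustrates the general KNN pseudo-labeling idea on synthetic feature vectors, and the confidence cutoff and all names are illustrative assumptions.

```python
# Hedged sketch: KNN pseudo-labeling to enlarge a training set, then a
# Random Forest on the combined data. Synthetic features stand in for the
# radiomic/biomarker features; the 0.9 confidence cutoff is an assumption.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)

# Toy data: 200 labeled and 800 unlabeled 16-dimensional feature vectors.
X_lab = rng.normal(size=(200, 16))
y_lab = (X_lab[:, 0] + 0.5 * X_lab[:, 1] > 0).astype(int)
X_unlab = rng.normal(size=(800, 16))

# Fit KNN on the labeled pool and pseudo-label only the confident points.
knn = KNeighborsClassifier(n_neighbors=7).fit(X_lab, y_lab)
proba = knn.predict_proba(X_unlab)
confident = proba.max(axis=1) >= 0.9
X_aug = np.vstack([X_lab, X_unlab[confident]])
y_aug = np.concatenate([y_lab, proba[confident].argmax(axis=1)])

# Final classifier trained on the augmented sample.
clf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_aug, y_aug)
print(f"added {confident.sum()} pseudo-labeled cases out of {len(X_unlab)}")
```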
10. Associations between the built environment and physical activity among adults with low socio-economic status in Canada: a systematic review
- Author
-
Christine M. Friedenreich, Anna Consoli, Chelsea Christie, Paul E. Ronksley, Gavin R. McCormack, and Jennifer E. Vena
- Subjects
Adult ,Built environment ,Canada ,Socio-economic status ,03 medical and health sciences ,quartier ,0302 clinical medicine ,Residence Characteristics ,Environmental health ,Humans ,030212 general & internal medicine ,Socioeconomic status ,Neighbourhood (mathematics) ,Exercise ,Poverty ,statut socioéconomique ,environnement bâti ,030505 public health ,Data collection ,Physical activity ,Public Health, Environmental and Occupational Health ,General Medicine ,Activité physique ,Geography ,Sample size determination ,Walkability ,Household income ,Observational study ,Systematic Review ,0305 other medical science ,Neighbourhood - Abstract
To synthesize the literature on associations between the built environment and physical activity among adults with low socio-economic status (SES) in Canada. Using a pre-specified study protocol (PROSPERO ID: CRD42019117894), we searched seven databases from inception to November 2018 for peer-reviewed quantitative studies that (1) included adults with low SES living in Canada and (2) estimated the association between self-reported or objectively measured built characteristics and self-reported or objectively measured physical activity. Study quality was assessed using the Quality Assessment Tool for Observational Cohort and Cross-Sectional Studies. Findings were synthesized using a narrative approach. Of the 8338 citations identified by our search, seven studies met the inclusion criteria. Most studies included adults living in one province (Alberta, British Columbia, Ontario, or Quebec), with one study including a national sample. All studies were cross-sectional, and none controlled for residential self-selection. Sampling designs and data collection strategies were heterogeneous. Sample sizes ranged between 78 and 37,241 participants. Most studies measured SES using household income. Street connectivity, greenness, destination density, and walkability were positively associated with physical activity. Relative to the objectively measured built environment, associations between the self-reported built environment and physical activity were less consistent. Studies were of fair to good quality. Findings suggest that the neighbourhood built environment is associated with physical activity among adults with low SES in Canada. More rigorous study designs are needed to determine whether or not the built environment and physical activity are causally related within this vulnerable population.
- Published
- 2020
11. Medium-coverage DNA sequencing in the design of the genetic association study
- Author
-
Hong-Wen Deng, Hui Shen, Chao Xu, and Ruiyuan Zhang
- Subjects
0303 health sciences ,Polymorphism, Genetic ,Sequence analysis ,Extramural ,030305 genetics & heredity ,Computational biology ,Sequence Analysis, DNA ,Biology ,Sensitivity and Specificity ,DNA sequencing ,Deep sequencing ,Article ,Large sample ,03 medical and health sciences ,Sample size determination ,Genetics ,Humans ,Genetics (clinical) ,Genetic association ,Type I and type II errors ,Genetic association study ,Genome-Wide Association Study - Abstract
DNA sequencing is a widely used tool in genetic association studies. Sequencing cost remains a major concern in sequencing-based studies, although the application of next-generation sequencing has dramatically decreased the sequencing cost and increased efficiency. The choice of sequencing depth and the sequencing sample size will largely determine the final study investment and performance. Many studies have been conducted to find a cost-effective design of sequencing depth that can achieve a certain sequencing accuracy using minimal sequencing cost. The strategies previously studied can be classified into two groups: (1) single-stage to sequence all the samples using either high (>~30×) or low (
- Published
- 2020
12. Power and Sample Size
- Author
-
Elizabeth Garrett-Mayer
- Subjects
business.industry ,Sample size determination ,Statistics ,Medicine ,business ,Power (physics)
- Published
- 2022
13. Foundations of Machine Learning-Based Clinical Prediction Modeling: Part III—Model Evaluation and Other Points of Significance
- Author
-
Julius M Kernbach and Victor E. Staartjes
- Subjects
Platt scaling ,Binary classification ,Brier score ,Sample size determination ,Calibration (statistics) ,business.industry ,Statistics ,Medicine ,Imputation (statistics) ,F1 score ,business ,Missing data - Abstract
Various available metrics to describe model performance in terms of discrimination (area under the curve (AUC), accuracy, sensitivity, specificity, positive predictive value, negative predictive value, F1 score) and calibration (slope, intercept, Brier score, expected/observed ratio, Estimated Calibration Index, Hosmer-Lemeshow goodness-of-fit) are presented. Recalibration is introduced, with Platt scaling and isotonic regression as proposed methods. We also discuss considerations regarding the sample size required for optimal training of clinical prediction models, explaining why low sample sizes lead to unstable models and offering the common rule of thumb of at least ten patients per class per input feature, as well as some more nuanced approaches. Missing data treatment and model-based imputation, instead of mean, mode, or median imputation, are also discussed. We explain how data standardization is important in pre-processing and how it can be achieved using, e.g., centering and scaling. One-hot encoding is discussed: categorical features with more than two levels must be encoded as multiple features to avoid wrong assumptions. Regarding binary classification models, we discuss how to select a sensible predicted probability cutoff using the closest-to-(0,1) criterion based on AUC, or based on the clinical question (rule-in or rule-out). Extrapolation is also discussed.
- Published
- 2021
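Entry 13 lists discrimination and calibration metrics and the closest-to-(0,1) criterion for choosing a probability cutoff. The sketch below illustrates a few of these computations with scikit-learn on synthetic predictions; it is a minimal illustration rather than the chapter's own code, and the logistic recalibration used for the calibration slope/intercept is a common convention, not something stated in the abstract.

```python
# Hedged sketch: discrimination (AUC), calibration (Brier score, slope/intercept
# via logistic recalibration), and the closest-to-(0,1) cutoff on the ROC curve.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score, roc_curve, brier_score_loss

rng = np.random.default_rng(1)
y_true = rng.integers(0, 2, size=1000)
# Synthetic predicted probabilities, mildly informative.
p_hat = np.clip(0.3 * y_true + 0.35 + rng.normal(scale=0.15, size=1000), 0.01, 0.99)

print("AUC  :", roc_auc_score(y_true, p_hat))
print("Brier:", brier_score_loss(y_true, p_hat))

# Calibration slope/intercept: regress the outcome on the logit of the prediction.
logit = np.log(p_hat / (1 - p_hat)).reshape(-1, 1)
recal = LogisticRegression().fit(logit, y_true)
print("calibration slope/intercept:", recal.coef_[0][0], recal.intercept_[0])

# Closest-to-(0,1) criterion: cutoff minimizing distance to the perfect corner.
fpr, tpr, thresholds = roc_curve(y_true, p_hat)
best = np.argmin(np.sqrt(fpr**2 + (1 - tpr)**2))
print("chosen probability cutoff:", thresholds[best])
```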
14. Pharmacogenomics of sulfonylureas in type 2 diabetes mellitus; a systematic review
- Author
-
Leyla Karkhaneh, Shahrzad Mohseni, Fatemeh Bandarian, Ozra Tabatabaei-Malazy, and Bagher Larijani
- Subjects
education.field_of_study ,medicine.medical_specialty ,biology ,business.industry ,Endocrinology, Diabetes and Metabolism ,Population ,Type 2 Diabetes Mellitus ,Type 2 diabetes ,Review Article ,medicine.disease ,ABCC8 ,Sample size determination ,Pharmacogenomics ,Internal medicine ,Internal Medicine ,biology.protein ,Medicine ,SNP ,Personalized medicine ,business ,education - Abstract
PURPOSE: Genetic factors play a role in response to a target medication (personalized medicine). This study aimed to systematically review the available evidence on the relationship between gene variants and therapeutic response to sulfonylureas in type 2 diabetes. METHODS: An extensive search was done in Scopus, PubMed, and Web of Science with a specific search strategy, from the beginning of each database until January 1, 2021. After sending records to EndNote software and removing duplicates, the remaining documents were screened by title and abstract. Full texts of the remaining documents were assessed after removing unrelated records. Required data were extracted and records were categorized according to the gene/SNP studied. RESULTS: Finally, 26 studies with 9170 T2DM patients with a mean age of 59.47 ± 6.67 years (range 49.7–75.2) remained. The largest contributions were from China, Slovakia, and Greece, and the most frequently studied genes were CYP2C9, KCNJ11, and both KCNQ1 and ABCC8, with 10, 7, and 4 articles, respectively. The most commonly investigated variants were rs1799853 and rs1057910 (each with seven studies), rs5219 (six studies), and CYP2C9*1 (four articles). Studies of each gene obtained differing positive or negative results and were not consistent. CONCLUSION: Considering the heterogeneity among sulfonylurea pharmacogenomic studies in methods, sample size, population, gene/variant studied, and outcomes, these studies are not conclusive and further studies are needed.
- Published
- 2021
15. A Method of Text Sample Size Adaptation Based on Ontological and Cognitive Analysis
- Author
-
Anna Klimenko, Iakov Korovin, and Irina Safronenkova
- Subjects
Cognitive map ,Basis (linear algebra) ,business.industry ,Computer science ,Feature vector ,media_common.quotation_subject ,Cognition ,computer.software_genre ,Sample size determination ,Stylometry ,Quality (business) ,Artificial intelligence ,business ,Adaptation (computer science) ,computer ,Natural language processing ,media_common - Abstract
This paper considers the latency of a cognitive assistant system that functions on the basis of stylometry. The conducted research allows us to conclude that the system latency can be decreased by adapting the text sample size. The first approach to this is to correlate the text samples with the data source types and the methods of forming the text feature vector. In addition, users' feedback on the quality of the cognitive assistant's functioning can be incorporated into the text sample size determination. In this paper, a new method of text sample size adaptation is proposed and discussed. It includes two approaches: the first is ontological analysis based on the data source type attribute, and the second is cognitive analysis of users' feedback. The combination of these approaches makes it possible to decrease the text sample size and thus the time needed to form the text feature vector.
- Published
- 2021
16. A Practical and Economical Bayesian Approach to Gas Price Prediction
- Author
-
TingFang Lee and ChihYun Chuang
- Subjects
Mathematical optimization ,symbols.namesake ,Computer science ,Sample size determination ,Bayesian probability ,symbols ,Timeline ,Base (topology) ,Database transaction ,Gaussian process ,Oracle ,Block (data storage) - Abstract
On the Ethereum network, it is challenging to determine a gas price that ensures a transaction will be included in a block within a user's required timeline without overpaying. One way of addressing this problem is through gas price oracles that use historical block data to recommend gas prices. However, when transaction volumes increase rapidly, these oracles often underestimate or overestimate the price. In this paper, we demonstrate how Gaussian process models can predict the distribution of the minimum price in an upcoming block when transaction volumes are increasing. This is effective because these processes account for time correlations between blocks. We performed an empirical analysis using the Gaussian process model on historical block data and compared its performance with the GasStation-Express and Geth gas price oracles. The results suggest that when transaction volumes fluctuate greatly, the Gaussian process model offers a better estimation. Further, we demonstrated that GasStation-Express and Geth can be improved by using a smaller, properly pre-processed training sample. Based on the empirical analysis, we recommend a gas price oracle made up of a hybrid model consisting of both the Gaussian process and GasStation-Express. This oracle provides better efficiency, accuracy, and cost.
- Published
- 2021
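Entry 16 models the minimum gas price of upcoming blocks with a Gaussian process over recent block history. The paper's actual features, kernel, and the hybrid with GasStation-Express are not given in the abstract, so the sketch below is only a generic GP-regression illustration on a synthetic block-price series; the kernel choice and window length are assumptions.

```python
# Hedged sketch: Gaussian process regression of per-block minimum gas price on
# block index, giving a predictive mean and uncertainty band for the next block.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

rng = np.random.default_rng(2)
blocks = np.arange(200).reshape(-1, 1)
# Synthetic minimum gas price (gwei): slow trend + bursts + noise.
min_price = (30 + 0.05 * blocks.ravel() + 5 * np.sin(blocks.ravel() / 15)
             + rng.normal(scale=2.0, size=200))

kernel = 1.0 * RBF(length_scale=10.0) + WhiteKernel(noise_level=1.0)  # assumed kernel
gp = GaussianProcessRegressor(kernel=kernel, normalize_y=True).fit(blocks, min_price)

next_block = np.array([[200]])
mean, std = gp.predict(next_block, return_std=True)
print(f"predicted next-block minimum: {mean[0]:.1f} gwei (+/- {2 * std[0]:.1f})")
```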
17. A normative chart for cognitive development in a genetically selected population
- Author
-
Fiksinski, Ania M., Bearden, Carrie E., Bassett, Anne S., Kahn, René S., Zinkstok, Janneke R., Hooper, Stephen R., Tempelaar, Wanda, McDonald-McGinn, Donna, Swillen, Ann, Emanuel, Beverly, Morrow, Bernice, Gur, Raquel, Chow, Eva, van den Bree, Marianne, Vermeesch, Joris, Warren, Stephen, Owen, Michael, van Amelsvoort, Therese, Eliez, Stephan, Gothelf, Doron, Arango, Celso, Kates, Wendy, Simon, Tony, Murphy, Kieran, Repetto, Gabriela, Suner, Damian Heine, Vicari, Stefano, Cubells, Joseph, Armando, Marco, Philip, Nicole, Campbell, Linda, Garcia-Minaur, Sixto, Schneider, Maude, Shashi, Vandana, Vorstman, Jacob, Breetvelt, Elemi J., RS: MHeNs - R2 - Mental Health, and Psychiatrie & Neuropsychologie
- Subjects
Adult ,DISORDERS ,Population ,Stress-related disorders Donders Center for Medical Neuroscience [Radboudumc 13] ,CHILDHOOD ,CHILDREN ,ADULTHOOD ,Article ,03 medical and health sciences ,0302 clinical medicine ,PSYCHOSIS ,Cognition ,Chart ,medicine ,Cognitive development ,DiGeorge Syndrome ,Humans ,22Q11.2 DELETION SYNDROME ,BRAIN ,education ,Association (psychology) ,Pharmacology ,Intelligence Tests ,education.field_of_study ,DECLINE ,medicine.disease ,RETT-SYNDROME ,030227 psychiatry ,Psychiatry and Mental health ,Schizophrenia ,Sample size determination ,Normative ,Psychology ,030217 neurology & neurosurgery ,BEHAVIOR ,Clinical psychology - Abstract
Certain pathogenic genetic variants impact neurodevelopment and cause deviations from typical cognitive trajectories. Understanding variant-specific cognitive trajectories is clinically important for informed monitoring and identifying patients at risk for comorbid conditions. Here, we demonstrate a variant-specific normative chart for cognitive development for individuals with 22q11.2 deletion syndrome (22q11DS). We used IQ data from 1365 individuals with 22q11DS to construct variant-specific normative charts for cognitive development (Full Scale, Verbal, and Performance IQ). This allowed us to calculate Z-scores for each IQ datapoint. Then, we calculated the change between first and last available IQ assessments (delta Z-IQ-scores) for each individual with longitudinal IQ data (n = 708). We subsequently investigated whether using the variant-specific IQ-Z-scores would decrease the required sample size to detect an effect with schizophrenia risk, as compared to standard IQ-scores. The mean Z-IQ-scores for FSIQ, VIQ, and PIQ were close to 0, indicating that participants had IQ-scores as predicted by the normative chart. The mean delta-Z-IQ-scores were equally close to 0, demonstrating a good fit of the normative chart and indicating that, as a group, individuals with 22q11DS show a decline in IQ-scores as they grow into adulthood. Using variant-specific IQ-Z-scores resulted in a 30% decrease of the required sample size, as compared to the standard IQ-based approach, to detect the association between IQ-decline and schizophrenia (p
- Published
- 2021
18. The Effective Sample Size of EHR-Derived Cohorts Under Biased Sampling
- Author
-
Carolyn Lou, Rebecca A. Hubbard, and Blanca E. Himes
- Subjects
Selection bias ,education.field_of_study ,business.industry ,media_common.quotation_subject ,Population ,Sampling (statistics) ,Sample (statistics) ,Simple random sample ,Confidence interval ,Sample size determination ,health services administration ,Statistics ,Medicine ,business ,education ,health care economics and organizations ,Sampling bias ,media_common - Abstract
Electronic Health Records (EHRs) have become a popular data source for conducting observational studies of health outcomes. One advantage of using EHR-derived data for biomedical and epidemiologic research is the ability to efficiently construct large cohorts, providing access to “big data” in healthcare. For example, the U.S. Food and Drug Administration’s Sentinel System, which is composed of EHR and administrative claims data, includes over 100 million people, constituting approximately one-third of the U.S. population. Although the sample size of EHR-derived cohorts can be very large, EHR data arise through a complex, non-random sampling process that can induce bias when using such data to obtain parameter estimates that are meant to be representative of an underlying population. In the U.S.A., where most health insurance is employment-based, insured populations are often non-representative of uninsured populations, and thus, insurance status, as well as health literacy and healthcare-seeking behavior, is associated with representation in EHRs. As a result, the non-random sampling mechanism that gives rise to EHR data can induce significant bias in parameter estimates derived from EHR-based studies relative to the underlying population parameters. Here, we derive formulas for the mean-squared error of an EHR-derived sample as a function of the strength of association between a health outcome of interest, the sampling process, and an underlying unobserved covariate. We also provide a formula for the effective sample size of an EHR-derived cohort defined as the sample size of a simple random sample with equivalent mean-squared error to an EHR-derived sample arising from a biased sampling mechanism. The effective sample size allows for assessment of the advantage of using an EHR-derived sample as opposed to conducting a more traditional, designed observational study, taking into account both the number of patients and the biased sampling mechanism. Through simulation studies, we demonstrate the magnitude of bias induced in EHR-based parameter estimates under varying sample selection mechanisms, and we demonstrate how the effective sample size can be used to compute confidence intervals that account for the biased sampling scheme. We conclude that attention to biased sampling is necessary to avoid erroneous inference due to the large sample size and complex, non-random provenance of EHR-derived data, when the goal of a study is to use EHR-derived data to capture parameter estimates that are representative of an underlying population.
- Published
- 2021
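Entry 18 defines the effective sample size of a biased EHR cohort as the size of a simple random sample whose mean-squared error matches that of the biased sample. The chapter's closed-form expressions are not reproduced here; the sketch below only illustrates that definition by simulation, and the outcome model, selection mechanism, and sample sizes are all made-up assumptions.

```python
# Hedged sketch: estimate the MSE of a mean from a biased (covariate-driven)
# sample, then solve for the simple-random-sample size with the same MSE.
import numpy as np

rng = np.random.default_rng(3)
N, n_ehr, reps = 100_000, 5_000, 500

u = rng.normal(size=N)                          # unobserved covariate
y = 1.0 + 0.8 * u + rng.normal(size=N)          # health outcome
p_select = 1 / (1 + np.exp(-(-2.0 + 1.0 * u)))  # selection probability depends on u

biased_means = []
for _ in range(reps):
    selected = rng.random(N) < p_select
    sample = rng.choice(y[selected], size=n_ehr, replace=False)
    biased_means.append(sample.mean())

mse_biased = np.mean((np.array(biased_means) - y.mean()) ** 2)
# For a simple random sample of size m, MSE ~ Var(y)/m, so the equivalent m is:
n_eff = y.var() / mse_biased
print(f"MSE of biased sample : {mse_biased:.4f}")
print(f"effective sample size: {n_eff:.0f} (nominal n = {n_ehr})")
```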
19. Analyzing and Prioritizing Usability Issues in Games
- Author
-
Amna Idrees, Umair Rehman, Helmut Hlavacs, Hassan Ilahi, Muhammad Umair Shah, and Amir Zaib Abbasi
- Subjects
Player experience ,Business goals ,Sample size determination ,Computer science ,business.industry ,Heuristic evaluation ,Observational study ,Usability ,business ,Data science ,Focus group - Abstract
Identifying and addressing usability issues is a vital part of improving player experiences in games. This paper delves into different methods that can be applied to shortlist usability issues in games. We specifically discuss observational studies, think-aloud protocols, questionnaires, surveys, user interviews, focus groups, and heuristic evaluations. The paper also discusses approaches to prioritize usability issues in games through severity ratings. Severity ratings allow researchers to prioritize issues in terms of their impact on player experience and other factors, including business goals, cost of redesigns, etc. We also discuss how the player’s usability data can be analyzed using different approaches. Next, we discuss sources of biases in usability studies, ways of determining optimal sample size and approaches to reducing evaluator effect. Finally, we describe approaches to ideating design solutions to address usability concerns in games.
- Published
- 2021
20. Preliminary Analysis of Human Error Prediction Model by Using Biological Information
- Author
-
Ryota Matsubara, Midori Sugaya, Muhammad Nur Adilin Mohd Anuardi, and Yuto Saito
- Subjects
business.industry ,Computer science ,Human error ,Cognition ,Workload ,Machine learning ,computer.software_genre ,Logistic regression ,Sample size determination ,Artificial intelligence ,Limit (mathematics) ,business ,Construct (philosophy) ,computer ,Stroop effect - Abstract
The aging of the population forces people to act beyond their limits. For instance, an activity such as driving, which demands sustained mental concentration, can lead to a serious accident from a simple mistake caused by overwork. It is therefore crucial to prevent such accidents. Many researchers focus on biological information to predict errors, because human error is closely related to a person's cognitive condition, such as stress and discomfort. However, existing studies on human error prediction models have not conducted a detailed analysis and have not considered individual differences. Therefore, the purpose of this study is to analyze biological information immediately before and after the occurrence of human error in order to construct a prediction model for human error that accounts for individual differences. In this study, we developed a Stroop task to serve as the mental workload and measured the subjects' biological information. As a result, we propose 10 s as the time interval before and after consecutive occurrences of human error for better analysis. In addition, the biological information measured from all subjects suggested that pNN10 can be considered a predictive indicator of human error occurrence. However, other biological signals showed varied results, so our next step is to consider individual differences by increasing the sample size. In addition, logistic regression will be considered as the machine learning method for constructing the human error prediction model.
- Published
- 2021
21. Evaluation of Linguistic Questionnaire
- Author
-
Rédina Berkachy
- Subjects
Data set ,Fuzzy rule ,Computer science ,Sample size determination ,Fuzzy number ,Sample (statistics) ,Fuzzy control system ,Missing data ,Defuzzification ,Linguistics - Abstract
Linguistic questionnaires have gained considerable attention in recent decades. They are a prominent tool used in many fields to convey people's opinions on different subjects. In this chapter, we propose a model for the assessment of linguistic questionnaires describing the global and individual evaluations, in which both sample weights and missingness are allowed. For the problem of missingness, we show a method based on readjustment of the weights. We should clarify that the proposed approach is not a correction for missingness in the sample as widely known in survey statistics. We give the expressions of the individual and global evaluations, followed by a description of the indicators of the information rate related to missing answers. The model is then illustrated by a numerical application related to the Finanzplatz data set. The objective of this empirical application is to show that the obtained individual evaluations can be treated like any data set in the classical theory. In addition, we remark empirically that the obtained distributions tend to be normally distributed. Afterward, we perform different analyses by simulation on the statistical measures of these distributions. We compare the individual evaluations with respect to a variety of distances, in order to see the influence of the symmetry of the modelling fuzzy numbers and of the sample sizes on different statistical measures. We close the chapter with a comparison between the evaluations from the defended model and those obtained through a usual fuzzy system using different defuzzification operators. Interesting findings of this chapter are that the corresponding statistical measures are independent of the sample sizes, and that the use of traditional fuzzy rule-based systems is not always the most convenient tool when non-symmetrical modelling shapes are used, in contrast to the defended approach. In such situations, our approach based on the individual evaluations is recommended.
- Published
- 2021
22. Simulation Experiments on Markov-Modulated Linear Regression Model
- Author
-
Ilya Jackson and Nadezda Spiridovska
- Subjects
Markov chain ,Markov additive process ,Consistency (statistics) ,Estimation theory ,Sample size determination ,Linear regression ,Estimator ,Applied mathematics ,Constant (mathematics) ,Mathematics - Abstract
The Markov-modulated linear regression (MMLR) model is a special case of Markov-additive processes. The model assumes that unknown regression coefficients depend on an external state of the environment, while the regressors remain constant. The MMLR model differs from other switching models by a new analytic approach to parameter estimation and by known transition intensities between the states of the Markov component. This paper analyses the statistical properties of the MMLR model's estimator based on simulated data. The research considers the influence of the sample parameters (e.g., sample size and distribution of the initial data), as well as the influence of estimation method details (e.g., different weight matrices in OLS), on the consistency of model estimates.
- Published
- 2021
23. Meta-Analysis and Meta-Regression: An Alternative to Multilevel Analysis When the Number of Countries Is Small
- Author
-
Liefbroer, A.C., Zoutewelle-Terovan, M., and Netherlands Interdisciplinary Demographic Institute (NIDI)
- Subjects
Estimation ,small clusters ,Computer science ,05 social sciences ,Multilevel model ,050401 social sciences methods ,Variance (accounting) ,cross-national ,01 natural sciences ,meta-analysis ,010104 statistics & probability ,Standard error ,0504 sociology ,Sample size determination ,Comparative research ,Meta-analysis ,meta-regression ,Econometrics ,multilevel analysis ,Meta-regression ,0101 mathematics - Abstract
Hierarchically nested data structures are often analyzed by means of multilevel techniques. A common situation in cross-national comparative research is data on two levels, with information on individuals at level 1 and on countries at level 2. However, when dealing with few level-2 units (e.g. countries), results from multilevel models may be unreliable due to estimation bias (e.g. underestimated standard errors, unreliable country-level variance estimates). This chapter provides a discussion of multilevel modeling inaccuracies when using a small level-2 sample size, as well as a list of available alternative analytic tools for analyzing such data. However, as many of these alternatives remain unfeasible in practice for testing hypotheses central to cross-national comparative research, the aim of this chapter is to propose and illustrate a new technique – the 2-step meta-analytic approach – that is reliable for the analysis of nested data with few level-2 units. In addition, this method is highly graphical and accessible to the average social scientist (not skilled in advanced simulation techniques).
- Published
- 2021
24. Effects of Sample Size in the Determination of the True Number of Haplogroups or ESUs Within a Species with Phylogeographic and Conservation Purposes: The Case of Cebus albifrons in Ecuador, and the Kinkajous and Coatis Throughout Latin America
- Author
-
Christian Miguel Pinto, Maria Fernanda Jaramillo, Joseph Mark Shostell, Sebastián Sánchez-Castillo, María Ignacia Castillo, and Manuel Ruiz-García
- Subjects
Mitochondrial DNA ,Phylogeography ,biology ,Sample size determination ,Range (biology) ,biology.animal ,Cebus albifrons ,Zoology ,Nasua ,Kinkajou ,biology.organism_classification ,Haplogroup - Abstract
The geographical assignment of individuals, or tissues, seized from illegal trafficking and hunting is relevant for the conservation of many species. For this, the real number of genetically differentiated groups within a species should be determined, in order to know from where the specimens were illegally extracted or where seized and rehabilitated specimens should be released. This determination is also crucial for studying the evolutionary history of the species. In the current work, we show, by means of three examples, that sample size is more important than the number of genes or markers studied in determining the total number of well-differentiated genetic groups. The examples concern the number of groups detected for the white-fronted capuchins (Cebus albifrons) in Ecuador, and the number of well-differentiated groups throughout Latin America for the kinkajou (Potos flavus) and for the different species of coatis (Nasua and Nasuella). In all cases, larger sample sizes with fewer genes detected more genetically differentiated groups than did smaller sample sizes with entire mitogenomes. Therefore, with regard to the geographical assignment of specimens seized from illegal trafficking, it is better to obtain larger sample sizes covering the most extensive geographical range possible, even if they include just one or a few mitochondrial genes, than to rely on smaller sample sizes with entire mitogenomes. Furthermore, we take into consideration that analyses of entire mitogenomes are more costly and require higher DNA quality than a few mitochondrial genes.
- Published
- 2021
25. Finite Sample Smeariness on Spheres
- Author
-
Stephan Huckemann, Benjamin Eltzner, and Shayan Hundrieser
- Subjects
05 social sciences ,Mathematical analysis ,Type (model theory) ,Physics::Classical Physics ,Condensed Matter::Disordered Systems and Neural Networks ,01 natural sciences ,Nominal size ,010104 statistics & probability ,Distribution (mathematics) ,Dimension (vector space) ,Sample size determination ,0502 economics and business ,0101 mathematics ,Random variable ,050205 econometrics ,Statistical hypothesis testing ,Mathematics ,Quantile - Abstract
Finite Sample Smeariness (FSS) has recently been discovered. It means that the distribution of sample Fréchet means of rather unsuspicious underlying random variables can behave as if it were smeary for quite large regimes of finite sample sizes. In effect, classical quantile-based statistical testing procedures do not preserve nominal size; they reject too often under the null hypothesis. Suitably designed bootstrap tests, however, correct for FSS. On the circle it has been known that arbitrarily sized FSS is possible, and that all distributions with a non-vanishing density feature FSS. These results are extended to spheres of arbitrary dimension. In particular, all rotationally symmetric distributions, not necessarily supported on the entire sphere, feature FSS of Type I. While on the circle there is also FSS of Type II, it is conjectured that this is not possible on higher-dimensional spheres.
- Published
- 2021
26. Investigating the Robustness of Reading Difficulty Models for Russian Educational Texts
- Author
-
Alexey Sorokin and Ulyana Isaeva
- Subjects
business.industry ,media_common.quotation_subject ,Sample (statistics) ,Feature selection ,computer.software_genre ,Readability ,Sample size determination ,Reading (process) ,ComputingMethodologies_DOCUMENTANDTEXTPROCESSING ,Feature (machine learning) ,Relevance (information retrieval) ,Artificial intelligence ,Set (psychology) ,business ,computer ,Natural language processing ,media_common - Abstract
Recent papers on Russian readability suggest several formulas aimed at evaluating text reading difficulty for learners of different ages. However, little is known about individual formulas for school subjects and how they perform compared to existing universal readability formulas. Our goal is to study the impact of the subject both on model quality and on the importance of individual features. We trained 4 linear regression models: an individual formula for each of 3 school subjects (Biology, Literature, and Social Studies) and a universal formula for all 3 subjects. The dataset was created from schoolbook texts, randomly sampled into pseudo-texts of 500 sentences, and was split into train and test sets in the ratio of 75 to 25. As previous papers on Russian readability do not provide proper feature selection, we suggested a set of 32 features that are potentially relevant to text difficulty in Russian. For every model, features were selected from this set based on their importance. The results show that all the single-subject formulas outperform the universal model and previously developed readability formulas. Experiments with other sample sizes (200 and 900 sentences per sample) confirm these results. This is because feature importances vary significantly among the subjects. The suggested readability models might be beneficial in school education for evaluating text relevance for learners and adjusting texts to target difficulty levels.
- Published
- 2021
27. Forest Vegetation Sampling and Analysis
- Author
-
Gautam Kumar Das
- Subjects
Biomass (ecology) ,Similarity (network science) ,Sample size determination ,Environmental science ,Forest vegetation ,Sampling (statistics) ,Forestry ,Vegetation ,Relative species abundance ,Stock (geology) - Abstract
Analysis of vegetation characteristics and their occurrences, including timber and non-timber species, by random sampling, estimation of biomass stock and carbon content, and determination of wood volume are considered for several forest patches of West Bengal. To determine the required optimum sample size in the forests, including the nature of the vegetation pattern, a survey was undertaken for statistical analysis of the sampled data collected from 27 forests of West Bengal. Sampled data are analyzed in three separate phases applying probability measures of statistical methods. The analysis reveals that the greater the number of survey spots, the lower the required optimum sample size. Results obtained from the statistical analysis show likely indications and positive trends that help in understanding the vegetation categories, the types of dominant timber trees, and the stem diameters of other forest areas in the state. Similarity measurement of such timber trees determines the properties of communities and helps suggest whether the communities may be classified together or need to be separated. Beyond similarity measurement, biomass stock mapping of uprooted timber trees helps in estimating the carbon content of dead wood in the community forest created under the social forestry scheme.
- Published
- 2021
28. Fixed and Random-Effects Models for Meta-Analysis
- Author
-
Ravisha Srinivasjois
- Subjects
Randomized controlled trial ,law ,Sample size determination ,Meta-analysis ,Statistics ,Forest plot ,Fixed effects model ,Variance (accounting) ,Random effects model ,Selection (genetic algorithm) ,law.invention ,Mathematics - Abstract
Results of a randomised controlled trial (RCT) may differ from those of other similar RCTs despite best efforts in study design and conduct. This is because some heterogeneity is inevitable: no two individuals are identical, and responses to interventions vary. A meta-analysis of 'more or less similar' studies generates a more reliable summary estimate to better predict the true population effect because of the improved power and precision. Meta-analysis involves assigning a 'weight' to each included study based on various factors, including the sample size and the observed variance. The weight assigned to each study differs based on the model chosen to generate the pooled effect estimate. Judging the effect of heterogeneity on the results of included studies is crucial for selecting the right model for meta-analysis, and the choice of model affects the summary estimate. This chapter covers the key assumptions, characteristics, and rationale for selection of the fixed-effect and random-effects models for analysis.
- Published
- 2021
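Entry 28 contrasts fixed-effect and random-effects weighting in meta-analysis. As a companion, here is a minimal numpy sketch of the standard inverse-variance fixed-effect pooled estimate and the DerSimonian-Laird random-effects estimate; the chapter itself may use different estimators, and the study effect sizes below are invented.

```python
# Hedged sketch: inverse-variance (fixed-effect) pooling vs. DerSimonian-Laird
# random-effects pooling for a handful of made-up study effect sizes.
import numpy as np

effect = np.array([0.30, 0.10, 0.45, 0.25, 0.05])   # study effect estimates
var = np.array([0.02, 0.05, 0.03, 0.01, 0.04])      # their within-study variances

# Fixed effect: weights are inverse variances.
w = 1 / var
fixed = np.sum(w * effect) / np.sum(w)

# DerSimonian-Laird between-study variance tau^2.
k = len(effect)
Q = np.sum(w * (effect - fixed) ** 2)
tau2 = max(0.0, (Q - (k - 1)) / (np.sum(w) - np.sum(w**2) / np.sum(w)))

# Random effects: weights incorporate tau^2.
w_re = 1 / (var + tau2)
random_eff = np.sum(w_re * effect) / np.sum(w_re)

print(f"fixed effect  : {fixed:.3f} (SE {np.sqrt(1 / np.sum(w)):.3f})")
print(f"tau^2         : {tau2:.3f}")
print(f"random effects: {random_eff:.3f} (SE {np.sqrt(1 / np.sum(w_re)):.3f})")
```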
29. Estimation Methods for Item Factor Analysis: An Overview
- Author
-
Yunxiao Chen and Siliang Zhang
- Subjects
Structure (mathematical logic) ,Multivariate statistics ,Computer science ,business.industry ,Machine learning ,computer.software_genre ,Set (abstract data type) ,Sample size determination ,Statistical inference ,Artificial intelligence ,business ,Categorical variable ,computer ,Factor analysis - Abstract
Item factor analysis (IFA) refers to the factor models and statistical inference procedures for analyzing multivariate categorical data. IFA techniques are commonly used in the social and behavioral sciences for analyzing item-level response data. Such models summarize and interpret the dependence structure among a set of categorical variables by a small number of latent factors. In this chapter, we review the IFA modeling technique and commonly used IFA models. Then we discuss estimation methods for IFA models and their computation, with a focus on the situation where the sample size, the number of items, and the number of factors are all large. Existing statistical software for IFA is surveyed. The chapter concludes with suggestions for practical applications of IFA methods and a discussion of future directions.
- Published
- 2021
30. Privacy Amplification via Iteration for Shuffled and Online PNSGD
- Author
-
Matteo Sordello, Zhiqi Bu, and Jinshuo Dong
- Subjects
Scheme (programming language) ,Noise ,Stochastic gradient descent ,Sample size determination ,Computer science ,Convergence (routing) ,Line (geometry) ,Differential privacy ,computer ,Algorithm ,Online setting ,computer.programming_language - Abstract
In this paper, we consider the framework of privacy amplification via iteration, originally proposed by Feldman et al. and subsequently simplified by Asoodeh et al. in their analysis via the contraction coefficient. This line of work focuses on the privacy guarantees obtained by the projected noisy stochastic gradient descent (PNSGD) algorithm with hidden intermediate updates. A limitation of the existing literature is that only early-stopped PNSGD has been studied, while no result has been proved for the more widely used PNSGD applied to a shuffled dataset. Moreover, no scheme has yet been proposed for decreasing the injected noise when new data are received in an online fashion. In this work, we first prove a privacy guarantee for shuffled PNSGD, which is investigated asymptotically when the noise is fixed for each sample size n but reduced at a predetermined rate as n increases, in order to achieve convergence of the privacy loss. We then analyze the online setting and provide a faster decaying scheme for the magnitude of the injected noise that also guarantees convergence of the privacy loss.
- Published
- 2021
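Entry 30 studies projected noisy stochastic gradient descent (PNSGD) with hidden intermediate updates. The privacy analysis itself is not reproduced; the sketch below only shows the mechanics of one shuffled PNSGD pass for a simple least-squares objective, with the loss, step size, noise scale, and projection radius all chosen for illustration.

```python
# Hedged sketch: one shuffled pass of projected noisy SGD (PNSGD); each update
# adds Gaussian noise and projects back onto an L2 ball. All constants assumed.
import numpy as np

rng = np.random.default_rng(4)
n, d = 500, 5
X = rng.normal(size=(n, d))
y = X @ np.array([1.0, -0.5, 0.3, 0.0, 0.2]) + rng.normal(scale=0.1, size=n)

def project(w, radius=2.0):
    """Project w onto the L2 ball of the given radius."""
    norm = np.linalg.norm(w)
    return w if norm <= radius else w * (radius / norm)

w = np.zeros(d)
eta, sigma = 0.05, 0.1          # step size and per-step noise scale (assumed)
for i in rng.permutation(n):    # shuffled single pass over the data
    grad = (X[i] @ w - y[i]) * X[i]
    w = project(w - eta * grad + rng.normal(scale=sigma * eta, size=d))

print("noisy projected estimate:", np.round(w, 3))
```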
31. Optimizing Regularized Multiple Linear Regression Using Hyperparameter Tuning for Crime Rate Performance Prediction
- Author
-
Costin Badica and Alexandra Vultureanu-Albisi
- Subjects
Hyperparameter ,Variables ,Computer science ,media_common.quotation_subject ,02 engineering and technology ,Overfitting ,Regularization (mathematics) ,Sample size determination ,020204 information systems ,Linear regression ,0202 electrical engineering, electronic engineering, information engineering ,Performance prediction ,Preprocessor ,020201 artificial intelligence & image processing ,Algorithm ,media_common - Abstract
Multiple linear regression is a well-known technique used to experimentally investigate the relationship between one dependent variable and multiple independent variables. However, fitting this model has problems, for example when the sample size is large, and the results of traditional estimation methods can consequently be misleading. Regularization or shrinkage techniques have therefore been proposed to estimate the model in this case. In this work, we propose a methodology to build a crime rate performance prediction model using multiple linear regression methods with regularization. Our methodology consists of three major steps: i) analyzing and preprocessing the dataset; ii) optimizing the model using k-fold cross-validation and hyperparameter tuning; iii) comparing the performance of different models using accuracy metrics. The obtained results show that the model built using lasso regression outperforms the other constructed models.
- Published
- 2021
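Entry 31's methodology (preprocess, k-fold cross-validation with hyperparameter tuning, compare regularized models) can be illustrated with scikit-learn. The sketch below uses a synthetic dataset rather than the crime-rate data and tunes only a lasso pipeline; the paper's actual feature set, search grid, and metrics are not given in the abstract.

```python
# Hedged sketch: tune the regularization strength of a lasso regression with
# k-fold cross-validation inside a scaling pipeline, then score on a test split.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_regression(n_samples=400, n_features=20, noise=10.0, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)

pipe = Pipeline([("scale", StandardScaler()), ("lasso", Lasso(max_iter=10_000))])
grid = {"lasso__alpha": np.logspace(-3, 1, 20)}       # assumed search grid

search = GridSearchCV(pipe, grid, cv=5, scoring="neg_mean_squared_error")
search.fit(X_tr, y_tr)

print("best alpha:", search.best_params_["lasso__alpha"])
print("test R^2  :", search.best_estimator_.score(X_te, y_te))
```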
32. Testing Equality of Mean Vectors with Block-Circular and Block Compound-Symmetric Covariance Matrices
- Author
-
Carlos A. Coelho
- Subjects
Set (abstract data type) ,Work (thermodynamics) ,Block structure ,Sample size determination ,Likelihood-ratio test ,Structure (category theory) ,Block (permutation group theory) ,Applied mathematics ,Covariance ,Mathematics - Abstract
While the likelihood ratio test (LRT) for the equality of mean vectors when no particular structure is assumed for the covariance matrices is a well-known and well-studied test, the same is not true when some structure, namely a block structure, is assumed for the covariance matrices. In the present work, the author obtains the expressions for the LRT statistics to test the equality of mean vectors when the covariance matrices are assumed to be block-circular or block compound-symmetric, and it is shown that in most cases the distributions of these statistics actually have closed finite-form representations. For the other cases, families of near-exact distributions are developed and their performance is then numerically assessed. It is shown that these families of near-exact distributions lie very close to the exact distribution, even for very small samples, and that they have an asymptotic behavior not only for increasing sample sizes but also for increasing numbers of populations involved and increasing numbers of sets of variables and variables in each set.
- Published
- 2021
33. Comparison of Outlier Detection Methods in NEAT Design
- Author
-
Daniel Jurich and Chunyan Liu
- Subjects
Rasch model ,Mean squared error ,Sample size determination ,Flagging ,Equating ,Outlier ,Statistics ,Anomaly detection ,Statistic ,Mathematics - Abstract
In equating practice, the existence of outliers among the anchor items can degrade equating accuracy and threaten the validity of test scores. This study used simulation to compare the performance of three outlier detection methods when conducting equating: the t-test method, the logit difference method, and the robust z statistic. The investigated factors included sample size, proportion of outliers, direction of item difficulty drift, and group difference. Overall, across all simulated conditions, the t-test method outperformed the other methods in terms of sensitivity in flagging true outliers, specificity in flagging true non-outliers, bias of the translation constant, and the root mean square error of the estimated examinee ability.
- Published
- 2021
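Entry 33 compares outlier-flagging rules for anchor items. One of them, the robust z statistic, is commonly computed by standardizing the between-form difference in item difficulties with the median and 0.74 times the interquartile range; the sketch below follows that common convention with a |z| > 1.96 flagging rule, which may differ in details from the study's implementation, and the item difficulties are invented.

```python
# Hedged sketch: robust z statistic for anchor-item drift. The difficulty
# difference of each anchor item between old and new forms is standardized
# with the median and 0.74*IQR; items with |z| above a cutoff are flagged.
import numpy as np

# Invented Rasch difficulties for 10 anchor items on two test forms.
b_old = np.array([-1.2, -0.8, -0.3, 0.0, 0.2, 0.5, 0.9, 1.1, 1.4, 1.8])
b_new = np.array([-1.1, -0.9, -0.2, 0.1, 0.2, 0.6, 1.0, 1.2, 2.3, 1.7])

d = b_new - b_old
q1, q3 = np.percentile(d, [25, 75])
robust_z = (d - np.median(d)) / (0.74 * (q3 - q1))

cutoff = 1.96  # commonly used flagging threshold (assumption)
for i, z in enumerate(robust_z):
    flag = "FLAG" if abs(z) > cutoff else "ok"
    print(f"item {i + 1:2d}: diff {d[i]:+.2f}, robust z {z:+.2f}  {flag}")
```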
34. Application of Two-Stage Sampling in Sampling Inspection
- Author
-
Tao Hong and Jinzhuo Chen
- Subjects
Characteristic function (convex analysis) ,Sampling inspection ,Sampling scheme ,Sample size determination ,Statistics ,Two stage sampling ,Sampling (statistics) ,Mathematics - Abstract
Conditioning on the quantities of the samples observed in incoming inspection, this paper gives the sampling characteristic function and algorithm of a two-stage sampling scheme. Based on a comparison of operating characteristic curves, we found that the sample size of the sampling scheme proposed in this paper is one-tenth of that of the original scheme, and the sampling performance is better, reducing the inspection cost and improving the efficiency of sampling inspection.
- Published
- 2021
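Entry 34 compares operating characteristic (OC) curves of a single-stage plan and a two-stage (double) sampling plan. The paper's specific plans and its conditioning argument are not reproduced; the sketch below only computes textbook OC curves for an assumed single plan (n, c) and an assumed double plan (n1, c1, r1, n2, c2) using the binomial distribution.

```python
# Hedged sketch: operating characteristic curves for a single sampling plan
# (n, c) and a double sampling plan (n1, c1, r1, n2, c2), all values assumed.
from scipy.stats import binom

def oc_single(p, n=125, c=3):
    """P(accept lot) for a single plan: accept if defects <= c."""
    return binom.cdf(c, n, p)

def oc_double(p, n1=50, c1=1, r1=4, n2=50, c2=3):
    """P(accept): accept if d1 <= c1; reject if d1 >= r1; otherwise draw n2
    more items and accept if d1 + d2 <= c2."""
    pa = binom.cdf(c1, n1, p)
    for d1 in range(c1 + 1, r1):
        pa += binom.pmf(d1, n1, p) * binom.cdf(c2 - d1, n2, p)
    return pa

print(" p     single   double")
for p in [0.005, 0.01, 0.02, 0.04, 0.06, 0.08]:
    print(f"{p:.3f}   {oc_single(p):.3f}    {oc_double(p):.3f}")
```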
35. Finite Sample Performance of Traditional Estimators
- Author
-
Aygul Zagidullina
- Subjects
Sample size determination ,Statistics ,Estimator ,Magnitude (mathematics) ,Sample (statistics) ,Data dimension ,Multivariate statistical ,Mathematics - Abstract
We demonstrate that well-known multivariate statistical techniques perform poorly and become misleading when the data dimension p is comparable in magnitude to, or larger than, the sample size n.
- Published
- 2021
36. A Robustness Evaluation of Homogeneity Test of Covariance Matrices
- Author
-
Rauf Ahmad
- Subjects
Trace (linear algebra) ,Dimension (vector space) ,Sample size determination ,Robustness (computer science) ,Homogeneity (statistics) ,Statistics ,Covariance ,Null hypothesis ,Test (assessment) ,Mathematics - Abstract
Box’s M test is most commonly used to test homogeneity of covariance matrices, and is a default test in most statistical software. The test, however, is known to be sensitive to non-normality and overly sensitive to departures from the null hypothesis. Among many competitors, Nagao’s trace test is considered the best alternative to the Box’s likelihood-ratio test. To evaluate Box’s test for its robustness to non-normality, a detailed simulation study is carried out, with an additional objective being to study the test’s behavior for increasing dimension, although keeping the dimension less than the sample size so that the test can be computed. For comparison, Nagao’s alternative is also included. The most important finding of the study substantiates the conjecture that Box’s test seriously lacks robustness to non-normality. Further, its performance also worsens with even moderately increasing dimension. Nagao’s test shares the same problems in both cases.
- Published
- 2021
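Entry 36 evaluates Box's M test of covariance homogeneity. The test is not built into scipy, so the sketch below computes the classical statistic and its chi-square approximation from the usual textbook formulas on simulated groups; the chapter's own simulation design is not reproduced, and the formulas should be checked against a standard reference before serious use.

```python
# Hedged sketch: Box's M test for equality of covariance matrices across k
# groups, using the standard chi-square approximation. Data are simulated.
import numpy as np
from scipy.stats import chi2

rng = np.random.default_rng(5)
groups = [rng.normal(size=(40, 3)), rng.normal(size=(55, 3)),
          1.3 * rng.normal(size=(50, 3))]           # third group has larger spread

p = groups[0].shape[1]
k = len(groups)
ns = np.array([g.shape[0] for g in groups])
N = ns.sum()

covs = [np.cov(g, rowvar=False) for g in groups]    # unbiased (n_i - 1) covariances
S_pool = sum((n - 1) * S for n, S in zip(ns, covs)) / (N - k)

M = (N - k) * np.log(np.linalg.det(S_pool)) \
    - sum((n - 1) * np.log(np.linalg.det(S)) for n, S in zip(ns, covs))
c = (np.sum(1 / (ns - 1)) - 1 / (N - k)) * (2 * p**2 + 3 * p - 1) \
    / (6 * (p + 1) * (k - 1))
stat = (1 - c) * M
df = p * (p + 1) * (k - 1) / 2

print(f"Box's M = {M:.2f}, chi2 approx = {stat:.2f}, df = {df:.0f}, "
      f"p-value = {chi2.sf(stat, df):.4f}")
```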
37. Alzheimer’s Brain Network Analysis Using Sparse Learning Feature Selection
- Author
-
Edwin R. Hancock, Lixin Cui, Yue Wang, Lu Bai, and Lichi Zhang
- Subjects
Elastic net regularization ,Identification (information) ,Lasso (statistics) ,Discriminative model ,Feature (computer vision) ,Sample size determination ,Computer science ,business.industry ,Pattern recognition ,Feature selection ,Pairwise comparison ,Artificial intelligence ,business - Abstract
Accurate identification of Mild Cognitive Impairment (MCI) based on resting-state functional Magnetic Resonance Imaging (RS-fMRI) is crucial for reducing the risk of developing Alzheimer's disease (AD). In the literature, functional connectivity (FC) is often used to extract brain network features. However, estimating FC remains challenging because RS-fMRI data are often high-dimensional and small in sample size. Although various Lasso-type sparse learning feature selection methods have been adopted to identify the most discriminative features for brain disease diagnosis, they suffer from two common drawbacks. First, the Lasso is unstable and not very satisfactory for the high-dimensional, small-sample-size problem. Second, existing Lasso-type feature selection methods do not simultaneously encapsulate the joint correlations between pairwise features and the target, the correlations between pairwise features, and the joint feature interactions into the feature selection process, and thus may lead to suboptimal solutions. To overcome these issues, we propose a novel sparse learning feature selection method for MCI classification in this work. It unifies the above measures into a minimization problem associated with a least-squares error and an Elastic Net regularizer. Experimental results demonstrate that the diagnostic accuracy for MCI subjects can be significantly improved using our proposed feature selection method.
- Published
- 2021
38. How the Business Intelligence in the New Startup Performance in UAE During COVID-19: The Mediating Role of Innovativeness
- Author
-
Ahmad Ibrahim Aljumah, Muhammad Alshurideh, and Mohammed T. Nuseir
- Subjects
Response rate (survey) ,2019-20 coronavirus outbreak ,Knowledge management ,Coronavirus disease 2019 (COVID-19) ,business.industry ,Sample size determination ,Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) ,0502 economics and business ,05 social sciences ,Business intelligence ,050211 marketing ,business ,050203 business & management - Abstract
The current study empirically investigates the impact of business intelligence on new start-up performance in the UAE during COVID-19. The study also examines the mediating role of innovativeness in the relationship between business intelligence and new start-up performance in the UAE. Questionnaires were distributed to 250 respondents to obtain the required information for further analyses; 210 of the 250 questionnaires were returned, giving a response rate of 84%. The data analysis relied on path modeling because of the exploratory nature of the study. The results indicated that all the paths are significant at a p-value of less than 0.05. The findings will be helpful for policymakers and researchers in formulating policy concerning business intelligence, innovation, and start-up performance in the UAE.
- Published
- 2021
39. External Factors Influencing Performance of Listed Firms on Dar es Salaam Stock Exchange
- Author
-
B. Mwenda, Dickson Pastory, and Benson Otieno Ndiege
- Subjects
Variables ,Return on assets ,Exchange rate ,Stock exchange ,Sample size determination ,media_common.quotation_subject ,Econometrics ,Control variable ,Business ,Interest rate ,media_common ,Panel data - Abstract
The prosperity and performance of a firm depend on its external environment. However, in Tanzania little is known about the external factors influencing firm performance. The aim of this study was to determine the external factors influencing the performance of listed firms on the Dar es Salaam Stock Exchange. The study employed an ex post facto research design. The sample consisted of 14 listed firms observed from 2011 to 2018. The study employed panel data: secondary data for the dependent variable were obtained from firms' audited annual reports, while data for the independent variables were obtained from the National Bureau of Statistics and the Bank of Tanzania. The dependent variable (firm performance) was measured by return on assets (ROA), while the independent variables were the exchange rate, gross domestic product, lending interest rate, inflation rate, and corporate tax rate, with firm size and firm age as control variables. Two models were used in the analysis: Model 1 included all variables, while Model 2 dropped the control variables (firm size and age). The Generalized Least Squares method was used for estimation and hypothesis testing. The findings indicated that the inflation rate was positively significant for ROA at p < 0.05 and p < 0.1 in Models 1 and 2 respectively, while the corporate tax rate was positively significant at p < 0.01 in both models. The exchange rate was negatively significant at p < 0.01 in both models, while firm age had a positive significant influence on ROA at p < 0.01 in Model 1. These results have implications for firms, since their survival, growth, and performance depend heavily on their interactions with the surrounding environment. The study recommends that regulatory bodies and the Bank of Tanzania formulate policy frameworks to regulate and control the effects of external factors in order to improve firm performance.
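The type of model described (ROA regressed on macro factors plus firm-level controls, with and without the controls) can be condensed into a short statsmodels illustration. The generated panel, variable names, and pooled least-squares fit below are placeholders, not the authors' data or exact panel specification.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(2)

# Placeholder panel: 14 firms observed over 8 years, macro regressors shared across firms.
firms, years = 14, 8
df = pd.DataFrame({
    "exchange_rate": np.tile(rng.normal(2300, 100, years), firms),
    "gdp_growth":    np.tile(rng.normal(6.5, 1.0, years), firms),
    "lending_rate":  np.tile(rng.normal(16, 2, years), firms),
    "inflation":     np.tile(rng.normal(5, 1.5, years), firms),
    "tax_rate":      np.tile(rng.normal(30, 1, years), firms),
    "firm_size":     rng.normal(12, 2, firms * years),
    "firm_age":      rng.integers(5, 60, firms * years).astype(float),
})
# Placeholder response (ROA); in the study it comes from audited annual reports.
df["roa"] = (0.02 * df["inflation"] - 0.00001 * df["exchange_rate"]
             + rng.normal(0, 0.02, firms * years))

X = sm.add_constant(df.drop(columns="roa"))
model_1 = sm.GLS(df["roa"], X).fit()                                   # all variables
model_2 = sm.GLS(df["roa"], X.drop(columns=["firm_size", "firm_age"])).fit()  # no controls
print(model_1.summary().tables[1])
```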
- Published
- 2021
40. Random Dimension Low Sample Size Asymptotics
- Author
-
Vladimir V. Ulyanov and Gerd Christoph
- Subjects
Normal distribution ,Simplex ,Dimension (vector space) ,Laplace's method ,Student's t-distribution ,Sample size determination ,Mathematical analysis ,Scaling ,Vector space ,Mathematics - Abstract
In a first investigation of high-dimensional low-sample-size (HDLSS) asymptotics, Hall, Marron and Neeman (2005) discovered a surprisingly rigid geometric structure. A sample of size k taken from the standard m-dimensional normal distribution is, for large m, close to the vertices of a regular simplex with k vertices in m-dimensional vector space. This follows from the analysis of three geometric statistics: the length of an observation, the distance between any two independent observations, and the angle between these vectors. We generalize and refine these results by constructing second-order Chebyshev-Edgeworth expansions under the assumption that the data dimension is random and different scaling factors are chosen.
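The three geometric statistics mentioned are easy to check numerically. The simulation below is my own illustration of the limiting geometry, not the authors' expansions: after scaling by the square root of the dimension, lengths concentrate near 1, pairwise distances near sqrt(2), and pairwise angles near 90 degrees.

```python
import numpy as np

rng = np.random.default_rng(3)
k = 5  # sample size

for m in (10, 100, 10000):  # data dimension
    X = rng.standard_normal((k, m))
    lengths = np.linalg.norm(X, axis=1) / np.sqrt(m)
    i, j = np.triu_indices(k, 1)                       # all pairs of observations
    dists = np.linalg.norm(X[i] - X[j], axis=1) / np.sqrt(m)
    cosines = np.sum(X[i] * X[j], axis=1) / (
        np.linalg.norm(X[i], axis=1) * np.linalg.norm(X[j], axis=1))
    angles = np.degrees(np.arccos(cosines))
    print(f"m={m:6d}: length ~ {lengths.mean():.3f}, "
          f"distance ~ {dists.mean():.3f} (sqrt(2) = {np.sqrt(2):.3f}), "
          f"angle ~ {angles.mean():.1f} deg")

# As m grows, the k points behave like the vertices of a regular simplex:
# equal lengths, equal pairwise distances, and right angles at the origin.
```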
- Published
- 2021
41. Feasibility and Pilot Studies
- Author
-
Kenneth E. Freedland, Peter G. Kaufmann, and Lynda H. Powell
- Subjects
Protocol (science) ,medicine.medical_specialty ,business.industry ,Consolidated Standards of Reporting Trials ,law.invention ,Clinical trial ,Randomized controlled trial ,Sample size determination ,law ,Credibility ,Medicine ,Medical physics ,business ,Trial methodology - Abstract
This chapter defines feasibility and pilot studies according to the 2010 CONSORT guidelines. Feasibility studies are always linked to a planned clinical trial. Pilot studies are a subset of feasibility studies that use the same randomized design as the planned clinical trial but are conducted to answer questions about the feasibility of the protocol. Pilot studies are distinct from miniature efficacy trials: although both are conducted on a small number of participants and use a randomized design, pilot studies focus on feasibility, whereas miniature efficacy trials claim to provide estimates of efficacy. The evolving literature on behavioral trial methodology has documented a variety of problems that result from estimating efficacy with underpowered miniature efficacy trials. The recommendations are to prepare for a definitive behavioral clinical trial by making a strong case for its credibility, plausibility, and feasibility, and by estimating effect sizes using the targets needed to achieve clinically significant benefit. The popular practice of conducting miniature efficacy trials that are loosely linked to a planned behavioral trial, calling them pilot studies, drawing conclusions about potential efficacy, and using the results to estimate sample size should be discontinued.
- Published
- 2021
42. Statistical Analysis of Test Results of Metal-Composite Compounds Under Action of Shear
- Author
-
I. I. Sorokina, M. V. Astahov, and Ekaterina V. Slavkina
- Subjects
Normal distribution ,Shear (sheet metal) ,business.product_category ,Sample size determination ,business.industry ,Homogeneity (statistics) ,Shear force ,Structural engineering ,Bartlett's test ,business ,Fastener ,Random variable ,Mathematics - Abstract
Polymer composite materials have become widespread because they can be used in the repair and modernization of existing products, as well as in the manufacture of new parts in the repair shop. The connection of such assemblies with the metal parts of a structure is the least studied aspect. At present, the advantages of combined adhesive-pin joints that do not violate the integrity of the composite fibers are evident. In this study, such a connection was made using lance-shaped fasteners, whose geometry affects the joint being studied. The influence of the orientation of the fastener plane in relation to the applied load is considered. A statistical analysis of the measured force values was carried out in stages: elimination of outliers, establishing the distribution law of the random variable, checking the samples for homogeneity, and comparing mean values. Based on probability paper, the normal distribution is accepted as the distribution law. The homogeneity test was performed using Bartlett's test, a choice justified by the sample size and the normal distribution of the response. Statistical processing revealed an increase in the failure load of the samples. For the samples with the fastener planes arranged in parallel at an angle of \(45^\circ\) to the shear force, the withstand force is 15-30% higher than in the rest of the control groups. Processing of the results yielded an empirical coefficient reflecting the influence of the geometry of the embedded fastener's location on the performance of the metal-composite joint under shear load.
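The staged analysis described (normality check, homogeneity check with Bartlett's test, then comparison of means) maps directly onto scipy. The failure-load samples below are synthetic placeholders, not the measured data.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)

# Placeholder failure-load samples (kN) for three fastener orientations.
group_45deg = rng.normal(12.0, 1.0, 10)   # planes at 45 deg to the shear force
group_0deg  = rng.normal(10.0, 1.0, 10)
group_90deg = rng.normal(10.2, 1.1, 10)
groups = [group_45deg, group_0deg, group_90deg]

# 1. Normality of each sample (the paper uses probability paper;
#    Shapiro-Wilk is a common numerical counterpart).
for name, g in zip(("45 deg", "0 deg", "90 deg"), groups):
    print(name, "Shapiro-Wilk p =", round(stats.shapiro(g).pvalue, 3))

# 2. Homogeneity of variances with Bartlett's test (appropriate under normality).
print("Bartlett p =", round(stats.bartlett(*groups).pvalue, 3))

# 3. Comparison of mean failure loads, e.g. one-way ANOVA.
print("ANOVA p =", round(stats.f_oneway(*groups).pvalue, 4))
```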
- Published
- 2021
43. Performance Measures in Discrete Supervised Classification
- Author
-
Anabela Marques and Ana Sousa Ferreira
- Subjects
Measure (data warehouse) ,Index (economics) ,Receiver operating characteristic ,Computer science ,business.industry ,Regression analysis ,Mutual information ,Machine learning ,computer.software_genre ,Field (computer science) ,Sample size determination ,Artificial intelligence ,Association (psychology) ,business ,computer - Abstract
The evaluation of results in cluster analysis appears frequently in the literature, and a variety of evaluation measures have been proposed. By contrast, in supervised classification, particularly in the discrete case, the evaluation of results is a relatively scarce subject in the literature. This is the motivation underlying this study. The evaluation of the performance of any supervised classification model is generally based on the number of cases correctly or incorrectly predicted by the model. However, these measures can lead to a misleading evaluation when the data are not balanced. More recently, other types of measures have been studied, such as association or agreement coefficients, the Huberty index, mutual information, and ROC curves. Exploratory studies were conducted to understand the relationship between each measure and data characteristics, namely sample size, class balance, and class separability. To this end, simulated data were used together with a Beta regression model for the models' performance.
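Several of the measures named are available in scikit-learn. The snippet below evaluates a toy classifier on a deliberately imbalanced dataset to show how plain accuracy can mislead while agreement- and information-based measures are more informative; it illustrates the measures themselves, not the authors' Beta-regression analysis, and all data are synthetic.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (accuracy_score, balanced_accuracy_score,
                             cohen_kappa_score, mutual_info_score, roc_auc_score)
from sklearn.model_selection import train_test_split

# Imbalanced two-class problem: roughly 95% of cases in one class.
X, y = make_classification(n_samples=1000, weights=[0.95], flip_y=0.05,
                           random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
pred = clf.predict(X_te)
proba = clf.predict_proba(X_te)[:, 1]

print("accuracy          ", accuracy_score(y_te, pred))      # inflated by imbalance
print("balanced accuracy ", balanced_accuracy_score(y_te, pred))
print("Cohen's kappa     ", cohen_kappa_score(y_te, pred))   # agreement coefficient
print("mutual information", mutual_info_score(y_te, pred))
print("ROC AUC           ", roc_auc_score(y_te, proba))
```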
- Published
- 2021
44. Exploring Pilot Workload During Professional Pilot Primary Training and Development: A Feasibility Study
- Author
-
Stephen Belt, Srikanth Gururajan, Nithil Kumar Bollock, Gajapriya Tamilselvan, and Yan Gai
- Subjects
Computer science ,Aviation ,business.industry ,Sample size determination ,Control (management) ,Applied psychology ,Workload ,Metric (unit) ,Flight training ,business ,Training and development ,Flight simulator - Abstract
Workload is an effective analytical attribute that helps to evaluate a pilot’s performance while operating an aircraft, yet it is an under-researched construct in aviation. In this foundational study, we sought to use electroencephalogram (EEG) and flight simulator performance data to explore the relationship between the mental and physical workload of pilots as they completed routine flight activities. The study focused on two specific metrics: the EEG response (sensory inhibition and attention) and a physical workload metric derived from flight control activity and the deviation from reference pitch and bank attitudes. Five pilots participated in the study, each completing five sessions in an Advanced Aviation Training Device (AATD). The results were inconclusive but seemed to indicate trends that were reasonably linked to each pilot’s skill profile. A more complete and nuanced understanding of how mental and physical workload relate to pilot activity may be obtained from additional research with a larger sample size and a broader range of performance metrics and assessment strategies.
- Published
- 2021
45. Planning for Precision and Power
- Author
-
Hans-Michael Kaltenbach
- Subjects
Power analysis ,Sample size determination ,Computer science ,Blocking (statistics) ,Algorithm ,Mean difference ,Power (physics) - Abstract
We provide an introduction to power analysis based on a two-sample problem with normally distributed errors. We consider the impact of balancing the size of treatment groups, and methods for increasing precision and power by blocking or sub-sampling that do not require increasing the sample size. We then discuss the determination of sample sizes to achieve a desired precision for a difference of means, and the problem of power analysis for testing such a difference based on the normal and the t-distribution. We also provide some practical shorthand formulas for quick approximate sample size calculations.
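The abstract does not say which shorthand formulas the chapter presents, but a widely used one for this setting is Lehr's rule of thumb, roughly 16 divided by the squared standardized difference per group at alpha = 0.05 and 80% power; statsmodels gives the exact t-based answer for comparison. Treat the sketch below as a generic illustration with assumed parameters.

```python
from statsmodels.stats.power import TTestIndPower

effect_size = 0.5          # standardized mean difference, delta / sigma
alpha, power = 0.05, 0.80

# Exact per-group sample size for a balanced two-sample t-test.
n_exact = TTestIndPower().solve_power(effect_size=effect_size, alpha=alpha,
                                      power=power, ratio=1.0)

# Shorthand: n per group ~ 16 / Delta^2 (for alpha = 0.05 and 80% power).
n_shorthand = 16 / effect_size**2

print(f"exact t-test answer : {n_exact:.1f} per group")
print(f"16/Delta^2 shorthand: {n_shorthand:.1f} per group")
```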
- Published
- 2021
46. Survey Research Major Methodological Flaws: Caveat Lector
- Author
-
Joseph A. Balogun
- Subjects
Medical education ,media_common.quotation_subject ,education ,Survey research ,Social issues ,humanities ,Rigour ,Readability ,Sample size determination ,Perception ,Survey instrument ,Psychology ,Web-based calculator ,media_common - Abstract
Survey research is a structured and systematic descriptive method of scientific inquiry frequently used to gauge patients' and practitioners' knowledge, perceptions, and attitudes on professional, health, and social issues, and to determine the prevalence of diseases and the primary risk factors that cause them. For the findings from a survey study to be externally valid, the researcher must address the major methodological problems at the planning stage of the investigation. This chapter seeks to fill the perceived knowledge and skills gaps by providing details on how to achieve cross-cultural and conceptual equivalence of questionnaires translated from another language and how to ascertain the readability of a survey instrument. It also examines the use of web-based resources to calculate the optimum sample size in survey research. The judicious application of the information in this chapter will improve the rigour of survey research published in medical and allied health journals.
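Web-based sample size calculators of the kind mentioned typically implement Cochran's formula for a proportion with a finite population correction. The re-implementation below shows the arithmetic; the 95% confidence level, 5% margin of error, p = 0.5, and population of 1,200 are assumed values for illustration only.

```python
from math import ceil
from scipy.stats import norm

def survey_sample_size(population, margin=0.05, confidence=0.95, p=0.5):
    """Cochran's sample size for estimating a proportion, with a finite
    population correction (the formula most web calculators use)."""
    z = norm.ppf(1 - (1 - confidence) / 2)       # e.g. 1.96 for 95% confidence
    n0 = z**2 * p * (1 - p) / margin**2          # infinite-population size
    n = n0 / (1 + (n0 - 1) / population)         # finite population correction
    return ceil(n)

print(survey_sample_size(population=1200))       # about 292 respondents
```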
- Published
- 2021
47. Possible Factors Which May Impact Kernel Equating of Mixed-Format Tests
- Author
-
Marie Wiberg and Jorge González
- Subjects
Standard error ,Sample size determination ,Kernel (statistics) ,Item response theory ,Equating ,Statistics ,Item discrimination ,behavioral disciplines and activities ,Mathematics ,Test (assessment) - Abstract
Mixed-format tests contain items with different formats, such as dichotomously scored and polytomously scored items. The aim of this study was to examine the impact of item discrimination, sample size, and the proportion of polytomously scored items on item response theory (IRT) kernel equating of mixed-format tests under the equivalent groups design. A simulation study was performed to examine this aim. The results showed that the percent relative errors were low and stable for all conditions, whereas differences in standard errors and equated values were found for the conditions with different sample sizes and item discriminations. In addition, the standard errors were higher when the proportion of polytomously scored items in the test was higher.
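At its core, kernel equating continuizes each form's discrete score distribution with a Gaussian kernel and maps scores through the inverse of the reference form's continuized distribution function. The sketch below is a deliberately simplified illustration of that idea: it omits the log-linear presmoothing, the variance-preserving adjustment, and the IRT step used in the study, and the score distributions are made up.

```python
import numpy as np
from scipy.stats import norm

def kernel_cdf(scores, probs, h):
    """Gaussian-kernel continuization of a discrete score distribution
    (simplified: no presmoothing, no variance-preserving shrinkage)."""
    return lambda x: np.sum(probs * norm.cdf((x - scores) / h))

# Placeholder score probabilities for two forms X and Y with scores 0..10.
scores = np.arange(11)
probs_x = np.array([1, 2, 4, 7, 10, 12, 14, 16, 15, 12, 7], float)
probs_x /= probs_x.sum()
probs_y = np.array([2, 3, 5, 9, 12, 14, 15, 14, 12, 9, 5], float)
probs_y /= probs_y.sum()

F_x = kernel_cdf(scores, probs_x, h=0.6)
F_y = kernel_cdf(scores, probs_y, h=0.6)

# Equated score e_Y(x) = F_Y^{-1}(F_X(x)), inverted numerically on a fine grid.
grid = np.linspace(-1, 11, 2401)
F_y_grid = np.array([F_y(g) for g in grid])
for x in scores:
    e_y = np.interp(F_x(x), F_y_grid, grid)
    print(f"score {x:2d} on form X  ->  {e_y:5.2f} on form Y")
```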
- Published
- 2021
48. Nonparametric Model-Based Estimators for the Cumulative Distribution Function of a Right Censored Variable in a Small Area
- Author
-
Casanova, Sandrine, Leconte, Eve, Daouia, Abdelaati, Ruiz-Gazen, Anne, Toulouse School of Economics (TSE), Université Toulouse 1 Capitole (UT1), and Université Fédérale Toulouse Midi-Pyrénées-Université Fédérale Toulouse Midi-Pyrénées-École des hautes études en sciences sociales (EHESS)-Centre National de la Recherche Scientifique (CNRS)-Institut National de Recherche pour l’Agriculture, l’Alimentation et l’Environnement (INRAE)
- Subjects
Cumulative distribution function ,05 social sciences ,Nonparametric statistics ,Estimator ,[SHS.ECO]Humanities and Social Sciences/Economics and Finance ,01 natural sciences ,Quantile regression ,010104 statistics & probability ,Sample size determination ,0502 economics and business ,Statistics ,Covariate ,0101 mathematics ,050205 econometrics ,Quantile ,Variable (mathematics) ,Mathematics - Abstract
In survey analysis, the estimation of the cumulative distribution function (cdf) is of great interest, as it facilitates the derivation of mean and median estimators for both populations and sub-populations (i.e., domains). We focus on small domains and consider the case where the response variable is right censored. Under this framework, we propose a nonparametric model-based estimator that extends the cdf estimator of Casanova (2012) to the censored case: it uses auxiliary information in the form of a continuous covariate and relies on nonparametric quantile regression. We then use simulations to compare the constructed estimator with the model-based cdf estimator of Casanova and Leconte (2015) and the Kaplan–Meier estimator (Kaplan and Meier 1958), both of which use only information contained within the domain: the quantile-based estimator performs better than the other two for very small domain sample sizes. Access times to the first job for young female graduates in the Occitania region are used to illustrate the new methodology.
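The Kaplan-Meier benchmark used in the comparison can be reproduced with the third-party lifelines package (an assumption of this sketch, not a tool named by the authors); the estimated cdf is one minus the estimated survival function. The durations below are placeholders, not the Occitania access-to-employment data.

```python
import numpy as np
from lifelines import KaplanMeierFitter

rng = np.random.default_rng(5)

# Placeholder right-censored durations (e.g. months until the first job).
true_times = rng.exponential(12, size=40)
censor_times = rng.exponential(20, size=40)
durations = np.minimum(true_times, censor_times)
observed = (true_times <= censor_times).astype(int)   # 1 = event, 0 = censored

kmf = KaplanMeierFitter().fit(durations, event_observed=observed)
cdf = 1 - kmf.survival_function_["KM_estimate"]        # F(t) = 1 - S(t)
print(cdf.head())
print("Estimated median access time:", kmf.median_survival_time_)
```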
- Published
- 2021
49. Comparative Analysis of Sampling Methods for Data Quality Assessment
- Author
-
Hong Liu, Sameer Karali, and Jongyeop Kim
- Subjects
Data cleansing ,Computer science ,media_common.quotation_subject ,Sampling (statistics) ,computer.software_genre ,Work (electrical) ,Sample size determination ,Data quality ,Statistics ,Credibility ,Quality (business) ,Productivity ,computer ,media_common - Abstract
Data quality assessment is an integral part of maintaining the quality of a system. The purpose of such an assessment is to make data easier to read and use, and it is also a prerequisite for data analysis. Low-quality data can be an obstacle to fast analysis and is one of the causes of financial problems for countries, companies, and hospitals. Jesmeen et al. [6] and Laranjeiro et al. [7] pointed out that poor-quality data costs approximately $13.3 million per organization and $3 trillion per year for the entire US economy. Poor data not only affects financial resources but also negatively impacts efficiency, productivity, and credibility. Therefore, data quality assessment has become one of the most attractive technologies for helping to improve data quality. In this work, we use data samples to assess overall data quality. First, the sample size is determined and a control dataset is created to validate the proposed method. Next, four sampling methods are used to obtain samples of different sizes. The sampling results are then compared across three quality dimensions using various statistical methods.
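The comparison described, drawing samples with several schemes and checking a quality estimate against a control dataset, can be mocked up with pandas. The quality dimension checked below is the missing-value (completeness) rate, and the dataset, sample size, and the three sampling schemes shown are placeholders rather than the authors' choices.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(6)

# Control dataset: 10,000 records with a known 8% missing-value rate.
N = 10_000
df = pd.DataFrame({"value": rng.normal(size=N),
                   "region": rng.choice(["A", "B", "C"], size=N, p=[0.6, 0.3, 0.1])})
df.loc[rng.random(N) < 0.08, "value"] = np.nan
true_rate = df["value"].isna().mean()

n = 500  # sample size, determined up front

simple = df.sample(n, random_state=0)                                   # simple random
systematic = df.iloc[::N // n].head(n)                                  # every k-th record
stratified = df.groupby("region", group_keys=False).sample(frac=n / N,  # proportional strata
                                                           random_state=0)

for name, s in [("simple", simple), ("systematic", systematic), ("stratified", stratified)]:
    est = s["value"].isna().mean()
    print(f"{name:10s}: estimated missing rate {est:.3f} (true {true_rate:.3f})")
```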
- Published
- 2021
50. Understanding the Influences of Cognitive Biases on Financial Decision Making During Normal and COVID-19 Pandemic Situation in the United Arab Emirates
- Author
-
Rand Al-Dmour, S. F. Shah, Ahmed Al-Dmour, and Muhammad Alshurideh
- Subjects
Finance ,050208 finance ,business.industry ,05 social sciences ,Anchoring ,Behavioral economics ,Cognitive bias ,Sample size determination ,Loss aversion ,0502 economics and business ,Pandemic ,Herding ,050207 economics ,business ,Psychology ,Overconfidence effect - Abstract
The purpose of the study is to identify the effects of behavioral/psychological factors, i.e., overconfidence, anchoring bias, loss aversion, and the herding effect, on financial decision making, in both a normal situation (NS) and the COVID-19 pandemic uncertain situation (CVD-19), considered separately. The paper used a qualitative method based on semi-structured interviews (virtual and physical), and all fifteen interviewees were based in the United Arab Emirates. The results show that in the NS all the factors have a positive significant relationship with financial decision making. In the CVD-19 uncertain situation, on the other hand, the majority of the factors have a negative effect on financial decision making, except for overconfidence, which shows a positive effect. The limitations were the time constraint, the limited set of factors, and the fact that CVD-19 itself is a stressful environment in which people prefer not to participate in interviews. Finally, a future research direction is to increase the sample size and the number of factors to understand the impact of financial decisions on performance.
- Published
- 2021