44 results on '"spurious correlations"'
Search Results
2. Towards generalizable face forgery detection via mitigating spurious correlation
- Author
-
Bai, Ningning, Wang, Xiaofeng, Han, Ruidong, Hou, Jianpeng, Wang, Qin, and Pang, Shanmin
- Published
- 2025
3. Enhancing Robustness of Over-Parameterized Models via Feature Reweighting Using Logit-Wise Mixup
- Author
-
Jo, Woo-Seok, Ju, Yeong-Joon, and Lee, Seong-Whan (editors: Wallraven, Christian, Liu, Cheng-Lin, and Ross, Arun; series editor: Goos, Gerhard; founding editor: Hartmanis, Juris; editorial board: Bertino, Elisa, Gao, Wen, Steffen, Bernhard, and Yung, Moti)
- Published
- 2025
4. Adaptive Bias Discovery for Learning Debiased Classifier
- Author
-
Bae, Jun-Hyun, Lee, Minho, and Jung, Heechul (editors: Cho, Minsu, Laptev, Ivan, Tran, Du, Yao, Angela, and Zha, Hongbin; series editor: Goos, Gerhard; founding editor: Hartmanis, Juris; editorial board: Bertino, Elisa, Gao, Wen, Steffen, Bernhard, and Yung, Moti)
- Published
- 2025
5. From Fake to Real: Pretraining on Balanced Synthetic Images to Prevent Spurious Correlations in Image Recognition
- Author
-
Qraitem, Maan, Saenko, Kate, and Plummer, Bryan A. (editors: Leonardis, Aleš, Ricci, Elisa, Roth, Stefan, Russakovsky, Olga, Sattler, Torsten, and Varol, Gül; series editor: Goos, Gerhard; founding editor: Hartmanis, Juris; editorial board: Bertino, Elisa, Gao, Wen, Steffen, Bernhard, and Yung, Moti)
- Published
- 2025
6. Constructing Concept-Based Models to Mitigate Spurious Correlations with Minimal Human Effort
- Author
-
Kim, Jeeyung, Wang, Ze, and Qiu, Qiang (editors: Leonardis, Aleš, Ricci, Elisa, Roth, Stefan, Russakovsky, Olga, Sattler, Torsten, and Varol, Gül; series editor: Goos, Gerhard; founding editor: Hartmanis, Juris; editorial board: Bertino, Elisa, Gao, Wen, Steffen, Bernhard, and Yung, Moti)
- Published
- 2025
7. Resurrecting failed environment-recruitment relationships: Revisiting cod recruitment in light of the poor-recruitment paradigm.
- Author
-
Gross, Julie M. and Hoenig, John M.
- Subjects
-
*RECRUITMENT (Population biology), *ATLANTIC cod, *LIFE sciences, *SPRING, *AUTUMN
- Abstract
Numerous environmental drivers (e.g., temperature, salinity) have been proposed as controls to recruitment. However, these relationships often become discredited over time as further data become available. It remains unclear if these environment-recruitment relationships are spurious results of examining many variables or if environmental drivers' influences on recruitment change over time as other variables exert control, e.g., with shifting climate. We propose that it is of value to re-examine discredited environment-recruitment relationships using the poor-recruitment paradigm (Gross et al. 2022, Fish. Res. 252, 106329). This approach examines whether extreme environmental conditions are associated with poor recruitment, the idea being that non-extreme conditions are uninformative in terms of predicting recruitment. This allows one to detect patterns of poor recruitment regardless of an environment-recruitment relationship's perceived (lack of) significance. We apply the poor-recruitment paradigm approach to various stocks of Atlantic cod (Gadus morhua) in the North Atlantic for both historical environment-recruitment data from a meta-analysis (Myers 1998, Rev. Fish Biol. Fisher. 8, 285) and for recent environment-recruitment data from the 2023 stock assessments for stocks in the Gulf of Maine and Georges Bank in the northwest Atlantic Ocean. From re-examining nine historical data sets, we find poor recruitment can be predicted from previously discredited or uninformative environmental variables involving temperature, salinity and zooplankton abundance. For recent cod data, we examine three stocks each with six environmental variables measured during two seasons (spring and autumn). We find that temperature continues to have predictive value for poor recruitment, but zooplankton abundance is less informative and does not always follow the poor-recruitment paradigm. 
These findings suggest that (1) these predictors continue to be important over time for cod recruitment and (2) that for the recent regime, temperature may be a better predictor of poor recruitment than zooplankton abundance. [ABSTRACT FROM AUTHOR]
- Published
- 2025
8. VulCausal: Robust Vulnerability Detection Using Neural Network Models from a Causal Perspective
- Author
-
Kuang, Hongyu, Zhang, Jingjing, Yang, Feng, Zhang, Long, Huang, Zhijian, and Yang, Lin (editors: Cao, Cungeng, Chen, Huajun, Zhao, Liang, Arshad, Junaid, Asyhari, Taufiq, and Wang, Yonghao; series editor: Goos, Gerhard; founding editor: Hartmanis, Juris; editorial board: Bertino, Elisa, Gao, Wen, Steffen, Bernhard, and Yung, Moti)
- Published
- 2024
9. Label-aware debiased causal reasoning for Natural Language Inference
- Author
-
Kun Zhang, Dacao Zhang, Le Wu, Richang Hong, Ye Zhao, and Meng Wang
- Subjects
Natural language inference, Spurious correlations, Debiased reasoning, Causal effect, Electronic computers. Computer science, QA75.5-76.95
- Abstract
Recently, researchers have argued that the impressive performance of Natural Language Inference (NLI) models is largely due to spurious correlations in the training data, which make models vulnerable and poor at generalizing. Some work has made preliminary debiasing attempts by developing data-driven interventions or model-level debiased learning. Despite the progress, existing debiasing methods either suffer from the high cost of data annotation or require elaborate design to identify biased factors. Through detailed investigation and data analysis, we argue that label information can provide meaningful guidance for identifying these spurious correlations in training data, a point that has not received enough attention. Thus, we design a novel Label-aware Debiased Causal Reasoning Network (LDCRN). Specifically, according to the data analysis, we first build a causal graph to describe causal relations and spurious correlations in NLI. Then, we employ an NLI model (e.g., RoBERTa) to calculate the total causal effect of input sentences on labels. Meanwhile, we design a novel label-aware biased module to model spurious correlations and calculate their causal effect in a fine-grained manner. Debiasing is realized by subtracting this causal effect from the total causal effect. Finally, extensive experiments on two well-known NLI datasets and multiple human-annotated challenging test sets are conducted to demonstrate the superiority of LDCRN. Moreover, we have developed novel challenging test sets based on MultiNLI to facilitate the community.
- Published
- 2024
10. Are Vision Transformers Robust to Spurious Correlations?
- Author
-
Ghosal, Soumya Suvra and Li, Yixuan
- Subjects
-
*TRANSFORMER models, *ARTIFICIAL neural networks
- Abstract
Deep neural networks may be susceptible to learning spurious correlations that hold on average but not in atypical test samples. With the recent emergence of vision transformer (ViT) models, it remains unexplored how spurious correlations manifest in such architectures. In this paper, we systematically investigate the robustness of different transformer architectures to spurious correlations on three challenging benchmark datasets. Our study reveals that for transformers, larger models and more pre-training data significantly improve robustness to spurious correlations. Key to their success is the ability to generalize better from the examples where spurious correlations do not hold. Further, we perform extensive ablations and experiments to understand the role of the self-attention mechanism in providing robustness under spuriously correlated environments. We hope that our work will inspire future research on further understanding the robustness of ViT models to spurious correlations. [ABSTRACT FROM AUTHOR]
- Published
- 2024
11. Distributionally Robust Optimization and Invariant Representation Learning for Addressing Subgroup Underrepresentation: Mechanisms and Limitations
- Author
-
Kumar, Nilesh, Shrestha, Ruby, Li, Zhiyuan, and Wang, Linwei (editors: Wesarg, Stefan, Puyol Antón, Esther, Baxter, John S. H., Erdt, Marius, Drechsler, Klaus, Oyarzun Laura, Cristina, Freiman, Moti, Chen, Yufei, Rekik, Islem, Eagleson, Roy, Feragen, Aasa, King, Andrew P., Cheplygina, Veronika, Ganz-Benjaminsen, Melani, Ferrante, Enzo, Glocker, Ben, Moyer, Daniel, and Petersen, Eikel; founding editors: Goos, Gerhard and Hartmanis, Juris; editorial board: Bertino, Elisa, Gao, Wen, Steffen, Bernhard, and Yung, Moti)
- Published
- 2023
12. Probing Spurious Correlations in Popular Event-Based Rumor Detection Benchmarks
- Author
-
Wu, Jiaying and Hooi, Bryan (editors: Amini, Massih-Reza, Canu, Stéphane, Fischer, Asja, Guns, Tias, Kralj Novak, Petra, and Tsoumakas, Grigorios; founding editors: Goos, Gerhard and Hartmanis, Juris; editorial board: Bertino, Elisa, Gao, Wen, Steffen, Bernhard, and Yung, Moti)
- Published
- 2023
13. Feature purify: An examination of spurious correlations in high-entropy alloys
- Author
-
Yue Pan, Hua Hou, Xiaolong Pei, and Yuhong Zhao
- Subjects
High-entropy alloys, Spurious correlations, Dimensionality reduction, Materials of engineering and construction. Mechanics of materials, TA401-492
- Abstract
Patterns in datasets, while key to successful predictions, can also be misleading. In this study, a feature filtering workflow based on interpretable dimensionality reduction techniques was developed to diagnose spurious correlations in a high-entropy alloy dataset with phase structure and hardness labels. It is found that the presumed linear relationship between valence electron concentration (VEC) and hardness in high-entropy alloys is spurious: it is strongly influenced by the constituent elements of the alloy system and determined by properties related to atomic radius. In addition, electron work function (w) and cohesive energy (Ec) have similar relationships with hardness, which indicates that these electronic features should be excluded from hardness prediction. Hence, this process can serve as a preliminary step in feature selection to mitigate the influence of non-causal features on prediction.
- Published
- 2024
14. Does Mathematical Coupling Matter to the Acute to Chronic Workload Ratio? A Case Study From Elite Sport.
- Author
-
Coyne, Joseph O. C., Nimphius, Sophia, Newton, Robert U., and Haff, G. Gregory
- Subjects
SPORTS injuries risk factors, BASKETBALL, MATHEMATICS, INDUSTRIAL psychology, RISK assessment, STATISTICS, WEIGHT lifting, DATA analysis, EFFECT sizes (Statistics), SPORTS events, PHYSICAL training & conditioning
- Abstract
Purpose: Criticisms of the acute to chronic workload ratio (ACWR) have been that the mathematical coupling inherent in the traditional calculation of the ACWR results in a spurious correlation. The purposes of this commentary are (1) to examine how mathematical coupling causes spurious correlations and (2) to use a case study from actual monitoring data to determine how mathematical coupling affects the ACWR. Methods: Training and competition workload (TL) data were obtained from international-level open-skill (basketball) and closed-skill (weightlifting) athletes before their respective qualifying tournaments for the 2016 Olympic Games. Correlations between acute TL, chronic TL, and the ACWR as coupled and uncoupled variations were examined. These variables were also compared using both rolling averages and exponentially weighted moving averages to account for any potential benefits of one calculation method over another. Results: Although there were some significant differences between coupled and uncoupled chronic TL and ACWR data, the effect sizes of these differences were almost all trivial (g = 0.04-0.21). Correlations ranged from r = .55 to .76, .17 to .53, and .88 to .99 for acute to chronic TL, acute to uncoupled chronic TL, and ACWR to uncoupled ACWR, respectively. Conclusions: There may be low risk of mathematical coupling causing spurious correlations in the TL-injury-risk relationship. Varying levels of correlation seem to exist naturally between acute and chronic TL variables regardless of coupling. The trivial to small effect sizes and large to nearly perfect correlations between coupled and uncoupled ACWRs also imply that mathematical coupling may have little effect on either calculation method, if practitioners choose to apply the ACWR for TL monitoring purposes. [ABSTRACT FROM AUTHOR]
- Published
- 2019
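Note on entry 14: the mechanism of mathematical coupling named in the abstract can be illustrated with a small simulation. The sketch below is not the authors' analysis and uses no athlete data. It assumes a 7-day acute window, a 28-day chronic window, and independent, uniformly distributed daily loads; the function and constant names are hypothetical. Under these assumptions the acute load is part of the coupled chronic load, so the two correlate at about 0.5 in expectation even though the daily loads are unrelated, while the uncoupled chronic load (the 21 days before the acute window) shows no such correlation.

```python
import math
import random

ACUTE_DAYS = 7     # assumed acute window length (not stated in the abstract)
CHRONIC_DAYS = 28  # assumed chronic window length (not stated in the abstract)


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def simulate_acwr_coupling(n_blocks=4000, seed=0):
    """Return (r_coupled, r_uncoupled) for independent 28-day load blocks."""
    rng = random.Random(seed)
    acute, chronic_coupled, chronic_uncoupled = [], [], []
    for _ in range(n_blocks):
        # Independent daily loads: any correlation found is purely arithmetic.
        loads = [rng.uniform(0, 1000) for _ in range(CHRONIC_DAYS)]
        acute.append(sum(loads[-ACUTE_DAYS:]) / ACUTE_DAYS)
        # Coupled: the chronic window contains the acute week.
        chronic_coupled.append(sum(loads) / CHRONIC_DAYS)
        # Uncoupled: the chronic window excludes the acute week.
        chronic_uncoupled.append(
            sum(loads[:-ACUTE_DAYS]) / (CHRONIC_DAYS - ACUTE_DAYS)
        )
    return pearson(acute, chronic_coupled), pearson(acute, chronic_uncoupled)


r_coupled, r_uncoupled = simulate_acwr_coupling()
print(f"acute vs coupled chronic:   r = {r_coupled:.2f}")
print(f"acute vs uncoupled chronic: r = {r_uncoupled:.2f}")
```

Real training loads are autocorrelated, which is consistent with the abstract's observation that acute and chronic loads correlate to some degree regardless of coupling; the independent-load assumption here only isolates the arithmetic effect.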
15. Checking Robustness of Representations Learned by Deep Neural Networks
- Author
-
Szyc, Kamil, Walkowiak, Tomasz, and Maciejewski, Henryk (editors: Dong, Yuxiao, Kourtellis, Nicolas, Hammer, Barbara, and Lozano, Jose A.; founding editors: Goos, Gerhard and Hartmanis, Juris; editorial board: Bertino, Elisa, Gao, Wen, Steffen, Bernhard, Woeginger, Gerhard, and Yung, Moti)
- Published
- 2021
16. Mitigating Spurious Correlations for Self-supervised Recommendation
- Author
-
Lin, Xin-Yu, Xu, Yi-Yan, Wang, Wen-Jie, Zhang, Yang, and Feng, Fu-Li
- Published
- 2023
17. Detecting Spurious Correlations With Sanity Tests for Artificial Intelligence Guided Radiology Systems
- Author
-
Usman Mahmood, Robik Shrestha, David D. B. Bates, Lorenzo Mannelli, Giuseppe Corrias, Yusuf Emre Erdi, and Christopher Kanan
- Subjects
deep learning, computed tomography, bias, validation, spurious correlations, artificial intelligence, Medicine, Public aspects of medicine, RA1-1270, Electronic computers. Computer science, QA75.5-76.95
- Abstract
Artificial intelligence (AI) has been successful at solving numerous problems in machine perception. In radiology, AI systems are rapidly evolving and show progress in guiding treatment decisions, diagnosing, localizing disease on medical images, and improving radiologists' efficiency. A critical component to deploying AI in radiology is to gain confidence in a developed system's efficacy and safety. The current gold standard approach is to conduct an analytical validation of performance on a generalization dataset from one or more institutions, followed by a clinical validation study of the system's efficacy during deployment. Clinical validation studies are time-consuming, and best practices dictate limited re-use of analytical validation data, so it is ideal to know ahead of time if a system is likely to fail analytical or clinical validation. In this paper, we describe a series of sanity tests to identify when a system performs well on development data for the wrong reasons. We illustrate the sanity tests' value by designing a deep learning system to classify pancreatic cancer seen in computed tomography scans.
- Published
- 2021
18. Constructing Causal Networks Through Regressions: A Tutorial.
- Author
-
Alemi, Farrokh
- Subjects
-
*HOSPITAL admission & discharge, *HOSPITAL emergency services, *CASE studies, *MULTIVARIATE analysis, *ARTIFICIAL neural networks, *PATIENTS, *REGRESSION analysis, *RISK management in business, *SOFTWARE architecture, *ADVERSE health care events, *STATISTICAL models, *ROOT cause analysis
- Abstract
Background: Significant progress has been made in the practice of conducting causal analysis using network models. Despite this progress, there is limited evidence that hospital risk managers are using these analytical models. Objective: This article introduces the causal network, its related concepts, and methods of analysis. The article demonstrates how hospital risk managers can use existing regression software to construct a causal network and identify root causes of an adverse event. Methods: Causal networks depict cause and effect in a set of variables. In this context, causes are strong correlations that meet 3 additional criteria: (1) causes occur prior to effects, (2) there is an articulated mechanism for how causes lead to effects, and (3) the association between cause and effect is not spurious, meaning the association persists even after other variables are statistically controlled for (a method of analysis called counterfactual). A causal network can be constructed through repeated use of least absolute shrinkage and selection operator (LASSO) regression. In the proposed regressions, the response variable is any variable in the data. The independent variables are variables that occur prior to the response variable. By design, the statistically significant coefficients in the time-constrained LASSO regressions identify "direct" causes of the response variables. When direct causes of all variables are identified, then the entire network model, including root causes, has been specified. In the final step, the parameters of the network model (ie, strength of causal associations) are estimated by fitting the network structure to the available data. We demonstrate these concepts through fitting a network model to simulated data for causes of excessive boarding in emergency departments. Results: The network (involving 12 causes, over 4 periods, and 1 sentinel event) was accurately recovered from the simulated case reports. 
The recovered network did not differ from the original network used to simulate the data in any of the 156 possible links. The recovered network allowed the identification of root and direct causes. It showed that hospital occupancy rate, and not emergency department efficiency, was the root cause of excessive emergency department boarding. Discussion: Causal networks can provide insights into root and direct causes of an adverse event. These models provide empirical tests of causes of adverse events. We encourage the use of these methods by hospital risk managers. [ABSTRACT FROM AUTHOR]
- Published
- 2020
19. An Investigation of Adaptive Radius for the Covariance Localization in Ensemble Data Assimilation
- Author
-
Xiang Xing, Bainian Liu, Weimin Zhang, Jianping Wu, Xiaoqun Cao, and Qunbo Huang
- Subjects
ensemble data assimilation, spurious correlations, adaptive covariance localization, Naval architecture. Shipbuilding. Marine engineering, VM1-989, Oceanography, GC1-1581
- Abstract
The covariance matrix estimated in ensemble data assimilation always suffers from filter collapse because of the spurious correlations induced by the finite ensemble size. The localization technique is applied to ameliorate this issue and has been suggested to be effective. In this paper, an adaptive scheme for Schur product covariance localization is proposed, which is easy and efficient to implement in ensemble data assimilation frameworks. A Gaussian-shaped taper function is selected as the localization taper function for the Schur product in the adaptive localization scheme, and the localization radius is obtained adaptively through a certain criterion of correlations with the background ensembles. An idealized Lorenz96 model with an ensemble Kalman filter is first examined, showing that the adaptive localization scheme helps to significantly reduce the spurious correlations in the small ensemble at low computational cost and provides accurate covariances that are similar to those derived from a much larger ensemble. The investigations of adaptive localization radius reveal that the optimal radius is model-parameter-dependent, vertical-level-dependent and nearly flow-dependent with weather scenarios in a realistic model; for example, the radius of the model parameter zonal wind is generally larger than that of temperature. The adaptivity of the localization scheme is also illustrated in the ensemble framework and shows that the adaptive scheme has a positive effect on the assimilated analysis, as does well-tuned localization.
- Published
- 2021
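Note on entry 19: Schur product localization with a Gaussian-shaped taper, as named in the abstract, can be sketched in a few lines. This is a minimal illustration and not the paper's adaptive scheme: the radius is fixed by hand, where the paper selects it adaptively. The state size, ensemble size, distance measure (index difference) and all function names are assumptions made for the example. The ensemble members are independent noise, so every off-diagonal entry of the sample covariance is spurious by construction.

```python
import math
import random


def sample_covariance(ensemble):
    """Unbiased sample covariance of an ensemble (list of state vectors)."""
    m = len(ensemble)
    n = len(ensemble[0])
    mean = [sum(member[i] for member in ensemble) / m for i in range(n)]
    return [
        [
            sum((mem[i] - mean[i]) * (mem[j] - mean[j]) for mem in ensemble)
            / (m - 1)
            for j in range(n)
        ]
        for i in range(n)
    ]


def gaussian_taper(n, radius):
    """Gaussian-shaped taper: 1 on the diagonal, decaying with |i - j|."""
    return [
        [math.exp(-((i - j) ** 2) / (2.0 * radius ** 2)) for j in range(n)]
        for i in range(n)
    ]


def localize(cov, taper):
    """Schur (element-wise) product of a covariance matrix and a taper."""
    return [
        [c * t for c, t in zip(cov_row, taper_row)]
        for cov_row, taper_row in zip(cov, taper)
    ]


def demo(n=40, members=10, radius=2.0, seed=0):
    """Build a small-ensemble covariance and its localized counterpart."""
    rng = random.Random(seed)
    # Independent standard-normal components: the true covariance is the
    # identity, so off-diagonal sample covariances are sampling noise.
    ensemble = [[rng.gauss(0, 1) for _ in range(n)] for _ in range(members)]
    cov = sample_covariance(ensemble)
    return cov, localize(cov, gaussian_taper(n, radius))


cov, localized = demo()
print(f"raw       cov[0][20] = {cov[0][20]: .3e}")
print(f"localized cov[0][20] = {localized[0][20]: .3e}")
```

The taper leaves the variances on the diagonal untouched and shrinks covariances between distant state components toward zero, which is how localization suppresses the long-range spurious correlations of a small ensemble.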
20. Further arguments that ability tilt correlations are spurious: A reply to Coyle (2022)
- Author
-
Sorjonen, Kimmo, Ingre, Michael, Nilsonne, Gustav, and Melin, Bo
- Abstract
Ability tilt refers to a within-individual difference between two abilities, e.g. a difference between math and verbal ability. Coyle and colleagues have demonstrated correlations between ability tilts and measures of the constituent abilities. We have previously pointed out that such measures may be spurious as the tilt variable is dependent on the constituent abilities. We have further shown that reported tilt associations are inconsistent with simulations including non-spurious tilt-effects, and concluded that tilt-correlations demonstrated by Coyle and colleagues are spurious. In a recent paper, Coyle responded with a series of arguments, including that the validity of tilt correlations is supported by their agreement with theoretical predictions, and that the analyses we used in our previous critique (regression effects) differ from tilt-correlations. Here, we advance the discussion by responding to the arguments put forward by Coyle. We show that the difference between regression effects and correlations is not material to the validity of our argument. Furthermore, we discuss the relation of tilt correlations to theory, and show that many empirical tilt-correlations, e.g. between the birth rate – death rate difference and fertility in US states, can be observed although such correlations can hardly be explained by differential investment theories. Therefore, we maintain that tilt correlations are spurious and that they offer little support for theories concerning the development of intelligence.
- Published
- 2023
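Note on entry 20: the abstract's point that a tilt variable depends on its constituent abilities can be shown with a short simulation. This sketch is not the authors' analysis; it assumes two independent, standard-normal ability scores and hypothetical names throughout. Because tilt is defined as the difference between the two scores, it correlates with each of them at about 1/sqrt(2) ≈ 0.71 in absolute value under these assumptions, even though no tilt effect was built into the data.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def simulate_tilt(n=20000, seed=0):
    """Correlate a difference score (tilt) with its own constituents."""
    rng = random.Random(seed)
    # Two abilities generated independently of each other (an assumption).
    math_ability = [rng.gauss(0, 1) for _ in range(n)]
    verbal_ability = [rng.gauss(0, 1) for _ in range(n)]
    # Tilt as a within-individual difference between the two abilities.
    tilt = [m - v for m, v in zip(math_ability, verbal_ability)]
    return {
        "math_vs_verbal": pearson(math_ability, verbal_ability),
        "tilt_vs_math": pearson(tilt, math_ability),
        "tilt_vs_verbal": pearson(tilt, verbal_ability),
    }


for name, r in simulate_tilt().items():
    print(f"{name}: r = {r:+.2f}")
```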
21. Generalization Lessons from Biomedical Relation Extraction using Pretrained Transformer Models
- Author
-
Elangovan, Aparna
- Abstract
Curating structured knowledge for storing in biomedical knowledge databases, requires human experts to annotate relationships, thus making maintenance of these databases expensive and difficult to scale to the large quantities of information presented in scientific publications. It is challenging to ensure that the information is comprehensive and up-to-date. Hence, we investigate the generalization capabilities of state-of-the-art natural language processing (NLP) techniques to automate relation extraction to aid human curation. In NLP, deep learning-based architectures, in particular pretrained transformer models with millions of parameters enabling them to achieve state of the art (SOTA) performance, have been dominating leaderboards on public benchmark datasets, usually achieved by fine-tuning pretrained transformer models on the target dataset task. In our research, we investigate the generalizability of such SOTA models – fine-tuned pretrained transformer models – in biomedical relation extraction for real-world applications where the performance expectations of these models need to be applicable beyond the official test sets. While our experiments focus on the current SOTA models, our findings have broader implications on generalization of NLP models and their performance evaluations. We ask the following research questions: 1. How generalizable are fine-tuned pretrained transformer models in biomedical relation extraction? 2. What factors lead to poor generalizability despite high test set performance of fine-tuned pretrained transformer models? 3. How can we improve qualitative aspects of the training data to improve real-world generalization performance of fine-tuned pretrained transformer models? The contributions are: 1) We identify a large performance gap compared to the test set when a SOTA fine-tuned pretrained transformer model is applied at large scale. This substantial generalization gap has neither been verified nor reported in prior large scale b
- Published
- 2023
22. Why genome-wide associations with cognitive ability measures are probably spurious.
- Author
-
Richardson, Ken and Jones, Michael C.
- Subjects
-
*SINGLE nucleotide polymorphisms, *COGNITIVE ability, *GENETIC models
- Abstract
Much time and effort, as well as funding, is being devoted to Genome Wide Association Studies (GWAS) for identifying genetic causes of variation (single nucleotide polymorphisms or SNPs) in human cognitive abilities and educational attainments (CA and EA). After years of finding only very weak associations, usually failing to replicate, attention has turned to aggregates of otherwise non-significant SNPs (called polygenic scores, or PGS) and some associations with traits are now being reported. Here we show how, in the context of CA and EA as approximation measures, spurious correlations in GWAS/PGS can arise in a number of ways, particularly from genetic population structure. We review recent studies suggesting that attempts to control for such confounds have been quite inadequate, and also criticize the underlying statistical assumptions and genetic model. [ABSTRACT FROM AUTHOR]
- Published
- 2019
23. Contrastive variational information bottleneck for aspect-based sentiment analysis.
- Author
-
Chang, Mingshan, Yang, Min, Jiang, Qingshan, and Xu, Ruifeng
- Subjects
-
*SENTIMENT analysis, *GENERALIZATION, *STATISTICAL correlation, *FORECASTING
- Abstract
Deep learning techniques have dominated the literature on aspect-based sentiment analysis (ABSA), achieving state-of-the-art performance. However, deep models generally suffer from spurious correlations between input features and output labels, which significantly hurts the robustness and generalization capability. In this paper, we propose to reduce spurious correlations for ABSA, via a novel Contrastive Variational Information Bottleneck framework (called CVIB). The proposed CVIB framework is composed of an original network and a self-pruned network, and these two networks are optimized simultaneously via contrastive learning. Concretely, we employ the Variational Information Bottleneck (VIB) principle to learn an informative and compressed network (self-pruned network) from the original network, which discards the superfluous patterns or spurious correlations between input features and prediction labels. Then, self-pruning contrastive learning is devised to pull together semantically similar positive pairs and push away dissimilar pairs, where the representations of the anchor learned by the original and self-pruned networks respectively are regarded as a positive pair while the representations of two different sentences within a mini-batch are treated as a negative pair. To verify the effectiveness of our CVIB method, we conduct extensive experiments on five benchmark ABSA datasets. The experimental results show that our approach achieves better performance than the strong competitors in terms of overall prediction performance, robustness, and generalization. [ABSTRACT FROM AUTHOR]
- Published
- 2024
24. Apprentissage par transfert pour la détection des abus de langage [Transfer learning for abusive language detection]
- Author
-
Bose, Tulika (additional names listed: Irina Illina and Dominique Fohr; affiliations: Speech Modeling for Facilitating Oral-Based Communication (MULTISPEECH), Inria Nancy - Grand Est; Institut National de Recherche en Informatique et en Automatique (Inria); Department of Natural Language Processing & Knowledge Discovery (LORIA - NLPKD); Laboratoire Lorrain de Recherche en Informatique et ses Applications (LORIA); Université de Lorraine (UL); Centre National de la Recherche Scientifique (CNRS); reference: ANR-15-IDEX-0004, LUE, Isite LUE (2015))
- Subjects
Domain adaptation, Adaptation au domaine, Neighborhood framework, Transfer learning, Topic modeling, [INFO.INFO-AI]Computer Science [cs]/Artificial Intelligence [cs.AI], Abusive language, [INFO.INFO-TT]Computer Science [cs]/Document and Text Processing, Langage abusif, Cadre de voisinage, [INFO.INFO-LG]Computer Science [cs]/Machine Learning [cs.LG], Apprentissage par transfert, Corrélations parasites, [INFO]Computer Science [cs], Spurious correlations, Modélisation thématique
- Abstract
The proliferation of social media, despite its multitude of benefits, has led to the increased spread of abusive language. Such language, being typically hurtful, toxic, or prejudiced against individuals or groups, requires timely detection and moderation by online platforms. Deep learning models for detecting abusive language have displayed great levels of in-corpus performance but underperform substantially outside the training distribution. Moreover, they require a considerable amount of expensive labeled data for training.This strongly encourages the effective transfer of knowledge from the existing annotated abusive language resources that may have different distributions to low-resource corpora. This thesis studies the problem of transfer learning for abusive language detection and explores various solutions to improve knowledge transfer in cross-corpus scenarios.First, we analyze the cross-corpus generalizability of abusive language detection models without accessing the target during training. We investigate if combining topic model representations with contextual representations can improve generalizability. The association of unseen target comments with abusive language topics in the training corpus is shown to provide complementary information for a better cross-corpus transfer.Secondly, we explore Unsupervised Domain Adaptation (UDA), a type of transductive transfer learning, with access to the unlabeled target corpus. Some popular UDA approaches from sentiment classification are analyzed for cross-corpus abusive language detection. We further adapt a BERT model variant to the unlabeled target using the Masked Language Model (MLM) objective. While the latter improves the cross-corpus performance, the other UDA methods perform sub-optimally. 
Our analysis reveals their limitations and emphasizes the need for effective adaptation methods suited to this task. As our third contribution, we propose two domain adaptation (DA) approaches using feature attributions, which are post-hoc model explanations. In particular, we study the problem of spurious corpus-specific correlations that restrict the generalizability of classifiers for detecting hate speech, a sub-category of abusive language. While the previous approaches rely on a manually curated list of terms, we automatically extract and penalize the terms causing spurious correlations. Our dynamic approaches improve the cross-corpus performance over previous works both independently and in combination with pre-defined dictionaries. Finally, we consider transferring knowledge from a resource-rich source to a low-resource target with fewer labeled instances, across different online platforms. A novel training strategy is proposed, which allows flexible modeling of the relative proximity of neighbors retrieved from the resource-rich corpus to learn the amount of transfer. We incorporate neighborhood information with Optimal Transport, which permits exploiting the embedding space geometry. By aligning the joint embedding and label distributions of neighbors, substantial improvements are obtained in low-resource hate speech corpora.
- Published
- 2023
25. Principal component analyses for integrated ecosystem assessments may primarily reflect methodological artefacts.
- Author
-
Planque, Benjamin and Arneberg, Per
- Subjects
-
*PRINCIPAL components analysis, *MULTIVARIATE analysis, *MARINE ecology, *STATISTICAL correlation, *BIG data
- Abstract
Multivariate analyses constitute an integral part of today's marine integrated ecosystem assessments (IEAs). Principal component analysis (PCA) is one of the most common of these techniques, and the method has been used repeatedly to summarize the dynamics of marine ecosystems. There seems to be little recognition of the potential pitfalls associated with performing PCA on time-series that are autocorrelated and/or non-stationary. We investigate how the descriptive performance of PCAs may be affected by the structure of the underlying time-series and question whether such analyses can provide useful summaries of ecosystem trajectories. For this purpose, we reanalyse four datasets from the Barents, Norwegian, Baltic, and North Seas. We compare the results with those obtained from simulated datasets that share similar trend and autocorrelation properties, but in which the variables are unrelated. We show that most of the patterns revealed by the PCA can emerge from random time-series and that the fraction of the variance that cannot be accounted for by random processes is minimal. The Norwegian Sea dataset is a pathological case in which the variance explained by the first two components exceeds what would be expected from randomly simulated time-series by only 2%. We conclude that outputs from explorative multivariate analyses provide very little insight into ecosystem status, trajectories and functioning. IEA groups need to be equipped with methods that can provide better insight into how marine ecosystems function, the drivers of their changes and their possible future trajectories. [ABSTRACT FROM AUTHOR]
- Published
- 2018
- Full Text
- View/download PDF
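Illustrative note for record 25: the surrogate-data comparison described in the abstract above can be sketched in a few lines. This is a minimal sketch under stated assumptions, not the authors' procedure: the series are plain random walks and white noise (hypothetical stand-ins for the "similar trend and autocorrelation properties" of the real datasets), the dimensions (10 variables, 40 time steps) are arbitrary, and the eigenvalues are found by simple power iteration so that only the Python standard library is needed.

```python
import math
import random


def pearson(a, b):
    """Pearson correlation of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)


def leading_eigenpair(m, iters=200):
    """Power iteration for the largest eigenvalue of a symmetric PSD matrix."""
    p = len(m)
    v = [1.0 + 0.01 * i for i in range(p)]
    for _ in range(iters):
        w = [sum(m[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    lam = sum(v[i] * m[i][j] * v[j] for i in range(p) for j in range(p))
    return lam, v


def top_two_share(series):
    """Share of total variance carried by PC1 + PC2 of the correlation matrix."""
    p = len(series)
    corr = [[pearson(a, b) for b in series] for a in series]
    share = 0.0
    for _ in range(2):
        lam, v = leading_eigenpair(corr)
        share += lam
        # Deflate: remove the component just found before searching for the next.
        corr = [[corr[i][j] - lam * v[i] * v[j] for j in range(p)] for i in range(p)]
    return share / p  # the trace of a correlation matrix equals p


def random_walk(t, rng):
    """Non-stationary, strongly autocorrelated series."""
    x, out = 0.0, []
    for _ in range(t):
        x += rng.gauss(0, 1)
        out.append(x)
    return out


def white_noise(t, rng):
    """Series with no autocorrelation."""
    return [rng.gauss(0, 1) for _ in range(t)]


def mean_share(generator, n_vars=10, t=40, reps=20, seed=0):
    """Average PC1 + PC2 share over several sets of mutually unrelated series."""
    rng = random.Random(seed)
    shares = [top_two_share([generator(t, rng) for _ in range(n_vars)])
              for _ in range(reps)]
    return sum(shares) / reps


walk_share = mean_share(random_walk)
noise_share = mean_share(white_noise)
print(f"PC1+PC2 share, unrelated random walks: {walk_share:.2f}")
print(f"PC1+PC2 share, unrelated white noise:  {noise_share:.2f}")
```

In both cases the variables are generated independently, so any difference in the share of variance captured by the first two components comes from the autocorrelation structure alone. That share is the kind of baseline against which an observed PCA result would need to be compared.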
26. Spurious correlations in research on ability tilt
- Author
-
Sorjonen, Kimmo, Nilsonne, Gustav, Ingre, Michael, and Melin, Bo
- Abstract
Ability tilt refers to a within-individual difference between two abilities (X-Y), e.g. differences between tech and verbal or verbal and math abilities. Studies have found associations between ability tilts and their constituent abilities (X or Y). Here we show that such associations may be spurious due to the non-independence of the two measures. Using data from the 1997 National Longitudinal Survey of Youth (NLSY97), we find that associations between ability and ability tilt may simply be due to more positive associations between two measures of the same or similar abilities compared to two measures of different or dissimilar abilities. This finding calls into question theoretical interpretations that have proposed that ability tilt correlations are due to differential investment of time and effort in one ability at the expense of the other ability.
- Published
- 2022
- Full Text
- View/download PDF
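Illustrative note for record 26: the non-independence of a difference score and its constituents, mentioned in the abstract above, can be seen in a small simulation. This is a simplified sketch, not the authors' NLSY97 analysis: X and Y are assumed to be independent standard normal scores, so any correlation between the tilt X - Y and X or Y is produced purely by how the tilt is constructed. For independent scores with equal variance the expected values are about +0.71 and -0.71.

```python
import math
import random


def pearson(a, b):
    """Pearson correlation of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)


def simulate_tilt(n=20000, seed=0):
    """Correlate a tilt score (X - Y) with each of its independent constituents."""
    rng = random.Random(seed)
    x = [rng.gauss(0, 1) for _ in range(n)]  # hypothetical ability X
    y = [rng.gauss(0, 1) for _ in range(n)]  # hypothetical ability Y, independent of X
    tilt = [a - b for a, b in zip(x, y)]
    return pearson(tilt, x), pearson(tilt, y)


r_tilt_x, r_tilt_y = simulate_tilt()
print(f"corr(tilt, X) = {r_tilt_x:.2f}, corr(tilt, Y) = {r_tilt_y:.2f}")
```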
27. Ecoinformatics (Big Data) for Agricultural Entomology: Pitfalls, Progress, and Promise.
- Author
-
Rosenheim, Jay A. and Gratton, Claudio
- Subjects
-
*AGRICULTURAL informatics, *BIG data, *ENTOMOLOGY
- Abstract
Ecoinformatics, as defined in this review, is the use of preexisting data sets to address questions in ecology. We provide the first review of ecoinformatics methods in agricultural entomology. Ecoinformatics methods have been used to address the full range of questions studied by agricultural entomologists, enabled by the special opportunities associated with data sets, nearly all of which have been observational, that are larger and more diverse and that embrace larger spatial and temporal scales than most experimental studies do. We argue that ecoinformatics research methods and traditional, experimental research methods have strengths and weaknesses that are largely complementary. We address the important interpretational challenges associated with observational data sets, highlight common pitfalls, and propose some best practices for researchers using these methods. Ecoinformatics methods hold great promise as a vehicle for capitalizing on the explosion of data emanating from farmers, researchers, and the public, as novel sampling and sensing techniques are developed and digital data sharing becomes more widespread. [ABSTRACT FROM AUTHOR]
- Published
- 2017
- Full Text
- View/download PDF
28. Detecting Spurious Correlations with Sanity Tests for Artificial Intelligence Guided Radiology Systems
- Author
-
Lorenzo Di Cesare Mannelli, Christopher Kanan, Usman Mahmood, Yusuf E. Erdi, Robik Shrestha, Giuseppe Corrias, and David D. B. Bates
- Subjects
FOS: Computer and information sciences ,medicine.medical_specialty ,Computer Science - Machine Learning ,bias ,Generalization ,Computer science ,Best practice ,Computer Vision and Pattern Recognition (cs.CV) ,Biomedical Engineering ,Computer Science - Computer Vision and Pattern Recognition ,Medicine (miscellaneous) ,Health Informatics ,Machine Learning (stat.ML) ,Machine perception ,030218 nuclear medicine & medical imaging ,Machine Learning (cs.LG) ,03 medical and health sciences ,0302 clinical medicine ,Statistics - Machine Learning ,Component (UML) ,medicine ,FOS: Electrical engineering, electronic engineering, information engineering ,Spurious relationship ,Original Research ,validation ,business.industry ,Deep learning ,Image and Video Processing (eess.IV) ,deep learning ,computed tomography ,Gold standard (test) ,QA75.5-76.95 ,Electrical Engineering and Systems Science - Image and Video Processing ,artificial intelligence ,3. Good health ,Computer Science Applications ,ComputingMethodologies_PATTERNRECOGNITION ,Software deployment ,030220 oncology & carcinogenesis ,Electronic computers. Computer science ,Digital Health ,Medicine ,Artificial intelligence ,Radiology ,spurious correlations ,Public aspects of medicine ,RA1-1270 ,business
- Abstract
Artificial intelligence (AI) has been successful at solving numerous problems in machine perception. In radiology, AI systems are rapidly evolving and show progress in guiding treatment decisions, diagnosing, localizing disease on medical images, and improving radiologists' efficiency. A critical component to deploying AI in radiology is to gain confidence in a developed system's efficacy and safety. The current gold standard approach is to conduct an analytical validation of performance on a generalization dataset from one or more institutions, followed by a clinical validation study of the system's efficacy during deployment. Clinical validation studies are time-consuming, and best practices dictate limited re-use of analytical validation data, so it is ideal to know ahead of time if a system is likely to fail analytical or clinical validation. In this paper, we describe a series of sanity tests to identify when a system performs well on development data for the wrong reasons. We illustrate the sanity tests' value by designing a deep learning system to classify pancreatic cancer seen in computed tomography scans.
- Published
- 2021
29. The Clinical Relevance of the Percentage Flow-Mediated Dilation Index.
- Author
-
Atkinson, Greg and Batterham, Alan
- Abstract
In 2010, the American College of Cardiology Foundation and American Heart Association could not recommend brachial artery percentage flow-mediated dilation (FMD%) for risk assessment of coronary artery disease (CAD) in asymptomatic adults. We aimed to scrutinise past and recently published findings regarding FMD% in this same context of clinical utility and conclude that (1) the question of whether brachial FMD% is a suitable substitute for coronary vasodilation is addressed by method agreement statistics rather than the correlation coefficients that have been reported in past studies. Also, the much-repeated view that brachial FMD% and coronary vasodilation are 'closely related' is not entirely justified, even before the influence of baseline lumen diameters on this relationship is accounted for; (2) along with the specialist training and the considerable time (≥1 h) that is required for the FMD% protocol, the error in individual measurements and population reference ranges is too large for clinical decisions to be robust on individual patients; (3) many interventions that are proposed to change FMD% also change baseline artery diameter, which can bias estimates of any intervention effects on the flow-mediated response per se, and (4) the FMD% index generates spurious correlations between shear rate, artery diameter and endothelial function, which may help to explain the apparent paradoxes of FMD% being higher in obese people and lower in athletes. In conclusion, the clinical relevance of brachial artery flow-mediated dilation is unclear at present. The dependence of the chosen index, FMD%, on initial artery size has contributed to this lack of clarity. [ABSTRACT FROM AUTHOR]
- Published
- 2015
- Full Text
- View/download PDF
30. Multicollinearity in spatial genetics: separating the wheat from the chaff using commonality analyses.
- Author
-
Prunier, J. G., Colyn, M., Legendre, X., Nimon, K. F., and Flamand, M. C.
- Subjects
-
*ANIMAL genetic engineering, *SPATIAL behavior, *MULTICOLLINEARITY, *REGRESSION analysis, *ANALYSIS of variance
- Abstract
Direct gradient analyses in spatial genetics provide unique opportunities to describe the inherent complexity of genetic variation in wildlife species and are the object of many methodological developments. However, multicollinearity among explanatory variables is a systemic issue in multivariate regression analyses and is likely to cause serious difficulties in properly interpreting results of direct gradient analyses, with the risk of erroneous conclusions, misdirected research and inefficient or counterproductive conservation measures. Using simulated data sets along with linear and logistic regressions on distance matrices, we illustrate how commonality analysis (CA), a detailed variance-partitioning procedure that was recently introduced in the field of ecology, can be used to deal with nonindependence among spatial predictors. By decomposing model fit indices into unique and common (or shared) variance components, CA allows identifying the location and magnitude of multicollinearity, revealing spurious correlations and thus thoroughly improving the interpretation of multivariate regressions. Despite a few inherent limitations, especially in the case of resistance model optimization, this review highlights the great potential of CA to account for complex multicollinearity patterns in spatial genetics and identifies future applications and lines of research. We strongly urge spatial geneticists to systematically investigate commonalities when performing direct gradient analyses. [ABSTRACT FROM AUTHOR]
- Published
- 2015
- Full Text
- View/download PDF
31. An alternative approach for testing for linear association for two independent stationary AR(1) processes.
- Author
-
Agiakloglou, Christos and Tsimpanos, Apostolos
- Subjects
ECONOMICS, STATISTICS, MONTE Carlo method, STATIONARY processes, STOCHASTIC processes
- Abstract
Spurious correlations occur when two independent time series are found to be correlated according to the typical statistical procedure for testing the null hypothesis of zero correlation in the population. Using a Monte Carlo analysis, this study examines the spurious correlation phenomenon for two independent stationary AR(1) processes and it finds that if an alternative testing procedure is applied, spurious behaviour is eliminated using the variance of the sample correlation coefficient of these two series, suggested by Bartlett (1935). [ABSTRACT FROM AUTHOR]
- Published
- 2012
- Full Text
- View/download PDF
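Illustrative note for record 31: the phenomenon described in the abstract above can be reproduced with a short Monte Carlo run. This is a sketch under stated assumptions, not the authors' experiment. Both series share the same known AR(1) coefficient `phi` (in practice it would have to be estimated), a normal critical value of 1.96 stands in for the usual t-test, and the adjusted variance uses the large-sample expression (1 + phi1*phi2) / (n * (1 - phi1*phi2)) for two independent AR(1) processes, which is the form usually attributed to Bartlett.

```python
import math
import random


def pearson(a, b):
    """Pearson correlation of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)


def ar1(n, phi, rng):
    """Stationary AR(1) series x_t = phi * x_{t-1} + e_t with unit-variance noise."""
    x = rng.gauss(0, 1.0 / math.sqrt(1.0 - phi ** 2))  # draw from the stationary law
    out = []
    for _ in range(n):
        x = phi * x + rng.gauss(0, 1)
        out.append(x)
    return out


def rejection_rates(phi=0.7, n=200, trials=1000, seed=1):
    """Share of independent AR(1) pairs declared 'correlated' at the nominal 5% level."""
    rng = random.Random(seed)
    z = 1.96
    se_naive = 1.0 / math.sqrt(n)  # assumes serially independent observations
    se_adjusted = math.sqrt((1 + phi * phi) / (n * (1 - phi * phi)))
    naive = adjusted = 0
    for _ in range(trials):
        r = pearson(ar1(n, phi, rng), ar1(n, phi, rng))
        naive += abs(r) > z * se_naive
        adjusted += abs(r) > z * se_adjusted
    return naive / trials, adjusted / trials


naive_rate, adjusted_rate = rejection_rates()
print(f"naive test rejects:    {naive_rate:.1%}")
print(f"adjusted test rejects: {adjusted_rate:.1%}")
```

With these settings the naive test should reject far more often than the nominal 5%, while the variance-adjusted test should stay close to it.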
32. Statistical comparisons of heavy metal pollutants between seven regions of the Polish exclusive economic zone.
- Author
-
Renner, Ross
- Subjects
HEAVY metal toxicology, POLLUTANTS, ECONOMIC zones (Law of the sea), ANALYSIS of covariance, ECONOMIC zoning
- Abstract
This paper addresses three intractable difficulties associated with the statistical analysis of compositional data, such as percentages or ppm. These are: (1) that such data do not follow multivariate normal distributions, thus rendering standard parametric statistical tests and estimation procedures inappropriate, (2) the covariance/correlation coefficients between specific pairs of components are determined in whole or in part by the presence or absence of other components, and (3) the negative bias property. That is, at least one covariance, and therefore at least one correlation, must be negative, hence the remaining correlations are prevented from ranging freely between −1 and +1. It follows that correlation coefficients formed from compositional data are not only not absolute, but also frequently spurious. Standard multivariate procedures based on them are unreliable, and intrinsic associations between components inferred from strong positive correlations in particular are potentially false. In a recent 2009 paper, it was reported that 59 surface sediment samples from 7 regions in the Polish exclusive economic zone had been chemically analyzed for 16 elements. Enrichment factors together with crude correlation coefficients between selected elements were presented. All these quantities were computed from the initial raw compositional data resulting from the chemical analyses. In this paper, a statistical procedure is presented which is distinctly different to the enrichment factor computations based on the same raw compositional data. The procedure generates a log-ratio measure of the abundance of each element in each of the seven regions, thus enabling comparisons of relative levels of pollution between the regions.
Although the two techniques are quite unrelated, it is shown that in general, extremely high or low measures of the relative abundances in the regions are associated with correspondingly high or low values of the enrichment factors in the same regions that were reported in the 2009 paper. That is, the statistical analysis confirms the results of the enrichment factor data in the identification of the most to the least polluted regions. In an additional analysis, the residue term was excluded from each sediment sample by rescaling the 16 element concentrations to sum to 100%, thus forming 59 residue-free sub-compositions. Crude correlation coefficients were computed for pairs of elements of this sub-compositional data. These revealed that certain correlations based on the initial raw data that were reported in the 2009 paper for the same pairs of elements, were not only inconsistent, but sometimes also contradictory. Such contradictions imply that intrinsic geochemical element associations inferred in that paper from such correlations were false. [ABSTRACT FROM AUTHOR]
- Published
- 2012
- Full Text
- View/download PDF
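Illustrative note for record 32: the negative bias property described in the abstract above can be demonstrated with synthetic data. This is a minimal sketch, not the paper's procedure: three hypothetical components are drawn independently and then rescaled to sum to 1 (the closure step). The centred log-ratio function `clr` is one common log-ratio transform, shown only as an example of the family; the specific log-ratio measure used in the paper is not reproduced here.

```python
import math
import random


def pearson(a, b):
    """Pearson correlation of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)


def close(parts):
    """Closure: rescale positive parts so that they sum to 1."""
    total = sum(parts)
    return [x / total for x in parts]


def clr(composition):
    """Centred log-ratio transform: log of each part minus the mean log."""
    logs = [math.log(x) for x in composition]
    mean_log = sum(logs) / len(logs)
    return [v - mean_log for v in logs]


rng = random.Random(0)
# Three independent, positive "abundances" per sample (hypothetical data).
raw = [[rng.uniform(1.0, 2.0) for _ in range(3)] for _ in range(5000)]
closed = [close(row) for row in raw]

r_raw = pearson([row[0] for row in raw], [row[1] for row in raw])
r_closed = pearson([row[0] for row in closed], [row[1] for row in closed])
print(f"correlation before closure: {r_raw:+.2f}")
print(f"correlation after closure:  {r_closed:+.2f}")
print("clr of first sample:", [round(v, 3) for v in clr(closed[0])])
```

The first two components are independent by construction, yet after closure their correlation is pushed towards -0.5, because the three proportions are forced to sum to a constant.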
33. Improving the Ensemble Estimate of the Kalman Gain by Bootstrap Sampling.
- Author
-
Yanfen Zhang and Oliver, Dean S.
- Subjects
-
*KALMAN filtering, *ESTIMATION theory, *STATISTICAL correlation, *STATISTICAL bootstrapping, *MULTILEVEL models
- Abstract
Using a small ensemble size in the ensemble Kalman filter methodology is efficient for updating numerical reservoir models but can result in poor updates following spurious correlations between observations and model variables. The most common approach for reducing the effect of spurious correlations on model updates is multiplication of the estimated covariance by a tapering function that eliminates all correlations beyond a prespecified distance. Distance-dependent tapering is not always appropriate, however. In this paper, we describe efficient methods for discriminating between the real and the spurious correlations in the Kalman gain matrix by using the bootstrap method to assess the confidence level of each element from the Kalman gain matrix. The new method is tested on a small linear problem, and on a water flooding reservoir history matching problem. For the water flooding example, a small ensemble size of 30 was used to compute the Kalman gain in both the screened EnKF and standard EnKF methods. The new method resulted in significantly smaller root mean squared errors of the estimated model parameters and greater variability in the final updated ensemble. [ABSTRACT FROM AUTHOR]
- Published
- 2010
- Full Text
- View/download PDF
34. SOME REMARKS ON PARTIAL AND SPURIOUS CORRELATIONS.
- Author
-
Neruda, Boris
- Subjects
-
*STATISTICAL correlation, *RESEARCH methodology, *MATHEMATICAL statistics, *ANALYSIS of variance, *COMPUTER software
- Abstract
Estimating correlation coefficients has become an easy task now that convenient computer software is available, with the result that numerous publications present whole batteries of correlation coefficient matrices. However, the effects of confounding variables are not always recognized or searched for, and false conclusions are therefore drawn from otherwise important scientific studies. This study offers a tool that allows researchers to identify in advance those correlation coefficients that are likely candidates for bias by a third variable. This saves time, because only the targeted coefficients then need to be recalculated. Furthermore, tabulated data are given that allow partial correlation coefficients to be estimated roughly and quickly. [ABSTRACT FROM AUTHOR]
- Published
- 2005
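Illustrative note for record 34: the effect of a third variable on a correlation coefficient, as discussed in the abstract above, is usually quantified with the first-order partial correlation. The sketch below uses the standard textbook formula; it is not the author's tool or tables, and the input values are invented for illustration.

```python
import math


def partial_correlation(r_xy, r_xz, r_yz):
    """First-order partial correlation of X and Y with Z held constant."""
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


# Hypothetical example: X and Y correlate at 0.50, but both depend on Z.
r_xy, r_xz, r_yz = 0.50, 0.80, 0.60
r_xy_given_z = partial_correlation(r_xy, r_xz, r_yz)
print(f"raw r(X,Y) = {r_xy:.2f}, partial r(X,Y | Z) = {r_xy_given_z:.3f}")
```

In this made-up case almost all of the raw correlation is accounted for by the shared dependence on Z, which is the pattern that marks a coefficient as a candidate for recalculation.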
35. Pitfalls of normalization of marine geochemical data using a common divisor
- Author
-
Van der Weijden, Cornelis H.
- Subjects
-
*TRACE elements, *GEOCHEMISTRY, *SEDIMENTS
- Abstract
Normalization of trace element contents of sediment through division by the content of an immobile element – usually aluminum – is common practice in marine geochemical studies. This might appear to be a simple way of correcting for dilution by sedimentary phases barren of a particular trace element – for instance carbonates or silica – and for comparison with its content in a standard clay or shale. The purpose of this paper is to revive the awareness of the pitfalls of this practice by giving numerical examples of the often-unexpected results of normalization. Taking a statistical perspective, it is first shown that uncorrelated variables acquire spurious correlations when normalized. But normalization can also increase, decrease, change sign of, or even blur the correlations between unmodified variables. Next, a number of realistic scenarios are worked out to show that the correlations between normalized element contents still suffer from the closure effect. Only in a few simple cases it is possible to extract, from the normalized data, realistic estimates of trace element contents in distinct sedimentary phases, such as organic matter. This is ascertained from data on copper contents in sediment from the Black Sea and Arabian Sea. When the coefficient of variation (i.e. standard deviation divided by the mean) of the aluminum data is relatively low and much lower than for the original values of the trace elements, comparisons of correlations can equally well or even better be made on the basis of the unmodified values. Then, the comparison of trace element data with their values for standard shales is discussed. The inherent problem is that the composition of the commonly used standard shale and, consequently, the reference values of normalized elements are not necessarily representative of the local or regional sediments in the study area. 
Also, this method falls short of identifying the processes responsible for enrichment or diminution in trace element contents. Finally, consideration is given to some aspects of a proper use of Al normalization. [Copyright © Elsevier]
- Published
- 2002
- Full Text
- View/download PDF
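Illustrative note for record 35: the statement in the abstract above that uncorrelated variables acquire spurious correlations when divided by a common variable can be checked numerically. This is a sketch with invented numbers, not the paper's worked examples: `x` and `y` stand for two trace-element contents and `z` for the common divisor (for example aluminum), all drawn independently. The divisor is deliberately given a much larger coefficient of variation than the numerators, which is the situation in which the effect is strongest.

```python
import math
import random


def pearson(a, b):
    """Pearson correlation of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    va = sum((u - ma) ** 2 for u in a)
    vb = sum((v - mb) ** 2 for v in b)
    return cov / math.sqrt(va * vb)


rng = random.Random(0)
n = 5000
x = [rng.uniform(9.0, 11.0) for _ in range(n)]  # hypothetical trace element 1
y = [rng.uniform(9.0, 11.0) for _ in range(n)]  # hypothetical trace element 2
z = [rng.uniform(1.0, 3.0) for _ in range(n)]   # hypothetical common divisor

r_raw = pearson(x, y)
r_normalized = pearson([a / c for a, c in zip(x, z)],
                       [b / c for b, c in zip(y, z)])
print(f"r(x, y)     = {r_raw:+.2f}")
print(f"r(x/z, y/z) = {r_normalized:+.2f}")
```

Swapping the ranges so that the divisor varies little relative to the numerators makes the induced correlation shrink, which matches the abstract's remark about the role of the coefficient of variation of the aluminum data.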
36. Spurious correlations in research on the effects of specific cognitive abilities.
- Author
-
Sorjonen, Kimmo and Melin, Bo
- Subjects
-
*COGNITIVE ability, *VERBAL ability, *DATA analysis, *TEST scoring, *TIME perception, *COGNITIVE Abilities Test
- Abstract
Studies on the effect of non-g ability residuals have often employed double adjustment for general cognitive ability (g), as they have calculated the ability residuals adjusting for g and then calculated the effect of the non-g residuals while adjusting for g. The present simulations demonstrate that the double adjustments may result in spurious negative associations between the non-g residual on one cognitive ability, e.g. verbal ability, and variables with a positive association with another ability, e.g. SAT math and math ability. In analyses of data from the 1997 National Longitudinal Survey of Youth (NLSY97), the negative associations between non-g residuals on verbal and math ability and aptitude test scores on the other ability vanished when not double adjusting for g. This indicates that the observed negative associations may be spurious and not due to differential investment of time and effort in one ability at the expense of the other ability, as suggested in the literature. Researchers of the effects of specific abilities are recommended to validate their findings and interpretations with analyses not double adjusting for g. • Negative associations involving non-g ability residuals have been identified. • These associations may be spurious and due to double adjustment for g. • Claims that associations are due to differential investment can be questioned. • Effects of specific abilities should be verified by not double adjusting for g. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
37. An Investigation of Adaptive Radius for the Covariance Localization in Ensemble Data Assimilation.
- Author
-
Xing, Xiang, Liu, Bainian, Zhang, Weimin, Wu, Jianping, Cao, Xiaoqun, and Huang, Qunbo
- Subjects
KALMAN filtering, SCHUR functions, COVARIANCE matrices, WEATHER, TEMPERATURE
- Abstract
The covariance matrix estimated from the ensemble data assimilation always suffers from filter collapse because of the spurious correlations induced by the finite ensemble size. The localization technique is applied to ameliorate this issue, which has been suggested to be effective. In this paper, an adaptive scheme for Schur product covariance localization is proposed, which is easy and efficient to implement in the ensemble data assimilation frameworks. A Gaussian-shaped taper function is selected as the localization taper function for the Schur product in the adaptive localization scheme, and the localization radius is obtained adaptively through a certain criterion of correlations with the background ensembles. An idealized Lorenz96 model with an ensemble Kalman filter is firstly examined, showing that the adaptive localization scheme helps to significantly reduce the spurious correlations in the small ensemble with low computational cost and provides accurate covariances that are similar to those derived from a much larger ensemble. The investigations of adaptive localization radius reveal that the optimal radius is model-parameter-dependent, vertical-level-dependent and nearly flow-dependent with weather scenarios in a realistic model; for example, the radius of model parameter zonal wind is generally larger than that of temperature. The adaptivity of the localization scheme is also illustrated in the ensemble framework and shows that the adaptive scheme has a positive effect on the assimilated analysis as the well-tuned localization. [ABSTRACT FROM AUTHOR]
- Published
- 2021
- Full Text
- View/download PDF
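Illustrative note for record 37: Schur product covariance localization with a Gaussian-shaped taper, as named in the abstract above, can be sketched as follows. This is a minimal sketch under stated assumptions: the state is a one-dimensional grid of 20 independent variables, the ensemble has 10 members, and the localization radius is fixed by hand at 2 grid points. The adaptive choice of radius, which is the paper's contribution, is not reproduced.

```python
import math
import random


def sample_covariance(ensemble):
    """Sample covariance matrix of an ensemble given as a list of state vectors."""
    n, p = len(ensemble), len(ensemble[0])
    means = [sum(member[i] for member in ensemble) / n for i in range(p)]
    return [[sum((m[i] - means[i]) * (m[j] - means[j]) for m in ensemble) / (n - 1)
             for j in range(p)] for i in range(p)]


def gaussian_taper(p, radius):
    """Taper matrix whose entries decay with grid separation |i - j|."""
    return [[math.exp(-((i - j) ** 2) / (2.0 * radius ** 2)) for j in range(p)]
            for i in range(p)]


def localize(cov, taper):
    """Schur (element-wise) product of the covariance and the taper."""
    p = len(cov)
    return [[cov[i][j] * taper[i][j] for j in range(p)] for i in range(p)]


def distant_mass(matrix, min_sep):
    """Sum of absolute entries for pairs at least min_sep grid points apart."""
    p = len(matrix)
    return sum(abs(matrix[i][j]) for i in range(p) for j in range(p)
               if abs(i - j) >= min_sep)


rng = random.Random(0)
n_members, n_state = 10, 20
# Independent state variables: every off-diagonal sample covariance is spurious.
ensemble = [[rng.gauss(0, 1) for _ in range(n_state)] for _ in range(n_members)]

raw = sample_covariance(ensemble)
localized = localize(raw, gaussian_taper(n_state, radius=2.0))
print(f"distant covariance mass, raw:       {distant_mass(raw, 6):.3f}")
print(f"distant covariance mass, localized: {distant_mass(localized, 6):.3f}")
```

The variances on the diagonal are left unchanged because the taper equals 1 there, while sample covariances between distant grid points, which are pure sampling noise in this setup, are damped towards zero.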
38. An ensemble Kalman filter implementation based on the Ledoit and Wolf covariance matrix estimator.
- Author
-
Nino-Ruiz, Elias D., Guzman, Luis, and Jabba, Daladier
- Subjects
-
*COVARIANCE matrices, *KALMAN filtering, *GENERAL circulation model, *ATMOSPHERIC circulation, *RANDOM matrices
- Abstract
In this paper, we propose an efficient and practical implementation of the ensemble Kalman filter (EnKF) via the distribution-free Ledoit and Wolf (LW) covariance matrix estimator. Initially, we develop a tractable implementation of the LW estimator in high-dimensional probability spaces such as those found in the context of operational data assimilation. We employ this well-conditioned, full-rank covariance matrix estimator to approximate background error covariances and to mitigate the impact of spurious correlations during assimilation steps. The proposed formulation can be coupled within the EnKF framework to derive a matrix-free implementation via an iterative Woodbury formula. Experimental tests are performed by using an Atmospheric General Circulation Model. The numerical results are compared with those of the EnKF based on the Rao–Blackwell Ledoit and Wolf covariance matrix estimator (EnKF-RBLW), which requires Gaussian assumptions on prior members. The outcomes reveal that the use of the proposed filter can mitigate the impact of spurious correlations during assimilation stages and, moreover, that the proposed method can improve on the results of the EnKF-RBLW as a consequence of the Gaussian relaxation on prior errors. [ABSTRACT FROM AUTHOR]
- Published
- 2021
- Full Text
- View/download PDF
39. SEaCorAl: Identifying and contrasting the regulation-correlation bias in RNA-Seq paired expression data of patient groups.
- Author
-
Petti M, Verrienti A, Paci P, and Farina L
- Subjects
- Algorithms, Humans, RNA-Seq, Sequence Analysis, RNA, Transcriptome, Gene Expression Profiling, Genome
- Abstract
The Cancer Genome Atlas database offers the possibility of analyzing genome-wide expression RNA-Seq cancer data using paired counts, that is, studies where expression data are collected in pairs of normal and cancer cells, by taking samples from the same individual. Correlation of gene expression profiles is the most common analysis to study co-expression groups, which is used to find biological interpretation of -omics big data. The aim of the paper is threefold. Firstly, we show, for the first time, the presence of a "regulation-correlation bias" in RNA-Seq paired expression data, that is, an artifactual link between the expression status (up- or down-regulation) of a gene pair and the sign of the corresponding correlation coefficient. Secondly, we provide a statistical model able to theoretically explain the reasons for the presence of such a bias. Thirdly, we present a bias-removal algorithm, called SEaCorAl, able to effectively reduce bias effects and improve the biological significance of correlation analysis. Validation of the SEaCorAl algorithm is performed by showing a significant increase in the ability to detect biologically meaningful associations of positive correlations and a significant increase of the modularity of the resulting unbiased correlation network. (Copyright © 2021 The Author(s). Published by Elsevier Ltd. All rights reserved.)
- Published
- 2021
- Full Text
- View/download PDF
40. Improving volcanic ash forecasts with ensemble-based data assimilation
- Author
-
Fu, Guangliang, Lin, H.X., Heemink, A.W., and Delft University of Technology
- Subjects
high performance computing, aircraft data, Data assimilation, volcanic ash forecast, spurious correlations, satellite data
- Abstract
The 2010 Eyjafjallajökull volcano eruption had serious consequences for civil aviation. This has initiated a lot of research on volcanic ash forecasting in recent years. For forecasting the volcanic ash transport after eruption onset, a volcanic ash transport and diffusion model (VATDM) needs to be run with Eruption Source Parameters (ESPs) such as plume height and mass eruption rate as input, and with data assimilation techniques to continuously improve the forecast. Reliable and accurate ash measurements are crucial for providing successful ash cloud advice. In the first phase of this research work, simulated aircraft-based volcanic ash measurements are assimilated into a transport model to identify the potential benefit of this kind of observation in an assimilation system. The results show that assimilating aircraft-based measurements can improve the state of ash clouds, and can provide an improved forecast. We also show that for advice on the aeroplane flying level, aircraft-based measurements should preferably be taken at this level. Furthermore, it is shown that in order to give acceptable advice to aviation decision makers, accurate knowledge about uncertainties of ESPs and measurements is of great importance. The forecast accuracy of distal volcanic ash clouds is important for providing valid aviation advice during volcanic ash eruptions. However, because the distal part of a volcanic ash plume is far from the volcano, the influence of eruption information on this part becomes rather indirect and uncertain, resulting in inaccurate volcanic ash forecasts in these distal areas. In this thesis, we use real-life aircraft in situ observations, measured in the North-West part of Germany during the 2010 Eyjafjallajökull eruption, in an ensemble-based data assimilation system to investigate the potential improvement on the forecast accuracy with regard to the distal volcanic ash plume.
We show that the error of the analyzed volcanic ash state can be significantly reduced by assimilating real-life in situ measurements. After assimilation, it is shown that the model-based aviation advice for Germany, the Netherlands and Luxembourg can be improved. We suggest that with suitable aircraft measuring once per day across the distal volcanic ash plume, the description and prediction of volcanic ash clouds in these areas can be improved significantly. Among the data assimilation approaches, the ensemble Kalman filter (EnKF) is a well-known and popular method. A proper covariance localization strategy in the analysis step of EnKF is essential for reducing spurious covariances caused by the finite ensemble size, as shown for this application for assimilation of aircraft in situ measurements. After analyzing the characteristics of the physical forecast error covariances, we present a two-way tracking approach to define the localization matrix for covariance localization. The result shows that the Two-way-tracking Localized EnKF (TL-EnKF) effectively maintains the correctly specified physical covariances and largely reduces the spurious ones. The computational cost of TL-EnKF is also evaluated and is shown to be advantageous for both serial and parallel implementations. Compared to the commonly used distance-based covariance localization, the two-way tracking approach is shown to be more suitable. In addition, the covariance inflation approach is verified as an additional improvement to TL-EnKF to achieve more accurate results. A timely prediction requires that the computations of the data assimilation system can be performed quickly (at least faster than wall-clock time). We therefore investigate strategies for accelerating the data assimilation algorithm. Based on evaluations of the computational time, the analysis step of the assimilation turns out to be the most expensive part. 
After a study on the characteristics of the ensemble ash state, we propose a mask-state algorithm which records the sparsity information of the full ensemble state matrix and transforms the full matrix into a relatively small one. This will reduce the computational cost in the analysis step. Experimental results show the mask-state algorithm significantly speeds up the analysis step. Subsequently, the total amount of computing time for volcanic ash data assimilation is reduced to an acceptable level. The mask-state algorithm is generic and thus can be embedded in any ensemble-based data assimilation framework. Moreover, ensemble-based data assimilation with the mask-state algorithm is promising and flexible, because it implements exactly the standard data assimilation without any approximation and it realizes the satisfying performance without any change of the full model. Infrared satellite measurements of volcanic ash mass loadings are often used as input observations for the assimilation scheme. However, these satellite-retrieved data are often two-dimensional (2D), and cannot easily be combined with a three-dimensional (3D) volcanic ash model to improve the volcanic ash state. By integrating available data including ash mass loadings, cloud top heights and thickness information, we propose a satellite observational operator (SOO) that translates satellite-retrieved 2D volcanic ash mass loadings to 3D concentrations at the top layer of the ash cloud. Ensemble-based data assimilation is used to assimilate the extracted measurements of ash concentrations. The results show that satellite data assimilation can force the volcanic ash state to match the satellite observations, and that it improves the forecast of the ash state. Comparison with highly accurate aircraft in situ measurements shows that the effective duration of the improved volcanic ash forecasts is about a half day.
- Published
- 2017
41. Improving volcanic ash forecasts with ensemble-based data assimilation
- Author
-
Fu, Guangliang (author)
- Abstract
The 2010 Eyjafjallajökull volcano eruption had serious consequences for civil aviation. This has initiated a lot of research on volcanic ash forecasting in recent years. For forecasting the volcanic ash transport after eruption onset, a volcanic ash transport and diffusion model (VATDM) needs to be run with Eruption Source Parameters (ESPs) such as plume height and mass eruption rate as input, and with data assimilation techniques to continuously improve the forecast. Reliable and accurate ash measurements are crucial for providing successful ash cloud advice. In the first phase of this research work, simulated aircraft-based volcanic ash measurements will be assimilated into a transport model to identify the potential benefit of this kind of observation in an assimilation system. The results show that assimilating aircraft-based measurements can improve the state of ash clouds, and can provide an improved forecast. We also show that for advice on the aeroplane flying level, aircraft-based measurements should preferably be taken at this level. Furthermore, it is shown that in order to provide acceptable advice for aviation decision makers, accurate knowledge about uncertainties of ESPs and measurements is of great importance. The forecast accuracy of distal volcanic ash clouds is important for providing valid aviation advice during volcanic ash eruptions. However, because the distal part of a volcanic ash plume is far from the volcano, the influence of eruption information on this part becomes rather indirect and uncertain, resulting in inaccurate volcanic ash forecasts in these distal areas. In this thesis, we use real-life aircraft in situ observations, measured in the North-West part of Germany during the 2010 Eyjafjallajökull eruption, in an ensemble-based data assimilation system to investigate the potential improvement on the forecast accuracy with regard to the distal volcanic ash plume. 
We show that the error of the analyzed volcanic as, Mathematical Physics
- Published
- 2017
42. Does mathematical coupling matter to the acute to chronic workload ratio? A case study from elite sport
- Author
-
Coyne, Joseph O. C., Nimphius, Sophia, Newton, Robert U., and Haff, G. Gregory
- Abstract
Coyne, J. O., Nimphius, S., Newton, R. U., & Haff, G. G. (2019). Does mathematical coupling matter to the acute to chronic workload ratio? A case study from elite sport. International Journal of Sports Physiology and Performance, 14(10), 1447-1454.
43. Costs of plasticity in foraging characteristics of the clonal plant Ranunculus reptans
- Author
-
Mark van Kleunen, Bernhard Schmid, and Markus Fischer
- Subjects
0106 biological sciences ,Genotype ,Agrostis stolonifera ,Range (biology) ,media_common.quotation_subject ,Foraging ,Biology ,Plasticity ,010603 evolutionary biology ,01 natural sciences ,Competition (biology) ,Evolution, Molecular ,foraging ,Magnoliopsida ,03 medical and health sciences ,ddc:570 ,Botany ,Genetics ,developmental stability ,Ecology, Evolution, Behavior and Systematics ,030304 developmental biology ,Plant stem ,media_common ,2. Zero hunger ,0303 health sciences ,Stolon ,Ranunculus reptans ,15. Life on land ,biology.organism_classification ,Adaptation, Physiological ,Clonal growth ,Clone Cells ,Horticulture ,genetic variation ,Trait ,spurious correlations ,General Agricultural and Biological Sciences ,costs of plasticity
- Abstract
In clonal plants, evolution of plastic foraging by increased lengths of leaves and internodes under unfavourable conditions may be constrained by costs and limits of plasticity. We studied costs and limits of plasticity in foraging characteristics in 102 genotypes of the stoloniferous herb Ranunculus reptans. We grew three replicates of each genotype with and three without competition by the naturally co-occurring grass Agrostis stolonifera. We used regression and correlation analyses to investigate potential costs of plasticity in lengths of leaves and stolon internodes, developmental instability costs of these traits, and a developmental range limit of these traits. We used randomization procedures to control for spurious correlations between parameters calculated from the same data. Under competition the number of rosettes, rooted rosettes, and flowers was 58%, 40%, and 61% lower, respectively, than in the absence of competition. Under competition lengths of leaves and stolon internodes were 14% and 6% smaller, respectively, than in the absence of competition. We detected significant costs of plasticity in stolon internode length in the presence of competition when fitness was measured in terms of the number of rosettes and the number of flowers (selection gradients against plasticity were 0.250 and 0.214, respectively). Within-environment variation (SD) in both foraging traits was not positively correlated with the corresponding plasticity, which indicates that there were no developmental instability costs. More plastic genotypes did not have less extreme trait values than less plastic genotypes for both foraging traits, which indicates that there was no developmental range limit. We conclude that in R. reptans costs of plasticity more strongly constrain evolution of foraging in the horizontal plane (i.e., stolon internode length) than in the vertical plane (i.e., leaf length).
44. Costs of Plasticity in Foraging Characteristics of the Clonal Plant Ranunculus reptans
- Author
-
van Kleunen, Mark, Fischer, Markus, and Schmid, Bernhard
- Published
- 2000