1. Seminar: Scalable Preprocessing Tools for Exposomic Data Analysis
- Author
Grady, Stephen K., Dojcsak, Levente, Harville, Emily W., Wallace, Maeve E., Vilda, Dovile, Donneyong, Macarius M., Hood, Darryl B., Valdez, R. Burciaga, Ramesh, Aramandla, Im, Wansoo, Matthews-Juarez, Patricia, Juarez, Paul D., and Langston, Michael A.
- Subjects
Environmental health -- Research, Environment -- Research, Public health -- Research, Data mining -- Methods, Data warehousing/data mining, Environmental issues, Health
- Abstract
BACKGROUND: The exposome serves as a popular framework in which to study exposures from chemical and nonchemical stressors across the life course and the differing roles that these exposures can play in human health. As a result, data relevant to the exposome have been used as a resource in the quest to untangle complicated health trajectories and help connect the dots from exposures to adverse outcome pathways.
OBJECTIVES: The primary aim of this methods seminar is to clarify and review preprocessing techniques critical for accurate and effective external exposomic data analysis. Scalability is emphasized through an application of highly innovative combinatorial techniques coupled with more traditional statistical strategies. The Public Health Exposome is used as an archetypical model. The novelty and innovation of this seminar's focus stem from its methodical, comprehensive treatment of preprocessing and its demonstration of the positive effects preprocessing can have on downstream analytics.
DISCUSSION: State-of-the-art technologies are described for data harmonization and to mitigate noise, which can stymie downstream interpretation, and to select key exposomic features, without which analytics may lose focus. A main task is the reduction of multicollinearity, a particularly formidable problem that frequently arises from repeated measurements of similar events taken at various times and from multiple sources. Empirical results highlight the effectiveness of a carefully planned preprocessing workflow as demonstrated in the context of more highly concentrated variable lists, improved correlational distributions, and enhanced downstream analytics for latent relationship discovery. The nascent field of exposome science can be characterized by the need to analyze and interpret a complex confluence of highly inhomogeneous spatial and temporal data, which may present formidable challenges to even the most powerful analytical tools. A systematic approach to preprocessing can therefore provide an essential first step in the application of modern computer and data science methods.
https://doi.org/10.1289/EHP12901
Introduction: The exposome is generally viewed as a measure of all the environmental exposures an individual has had over a lifetime. The systematic stratagem of capturing these exposures under a [...]
- Published
- 2023