Introduction: Emission sources of volatile organic compounds (VOCs*) are numerous and widespread in both indoor and outdoor environments. Concentrations of VOCs indoors typically exceed outdoor levels, and most people spend nearly 90% of their time indoors. Thus, indoor sources generally contribute the majority of VOC exposures for most people. VOC exposure has been associated with a wide range of acute and chronic health effects; for example, asthma, respiratory diseases, liver and kidney dysfunction, neurologic impairment, and cancer. Although exposures to most VOCs for most persons fall below health-based guidelines, and long-term trends show decreases in ambient emissions and concentrations, a subset of individuals experience much higher exposures that exceed guidelines. Thus, exposure to VOCs remains an important environmental health concern. The present understanding of VOC exposures is incomplete. With the exception of a few compounds, concentration and especially exposure data are limited; and like other environmental data, VOC exposure data can show multiple modes, low and high extreme values, and sometimes a large portion of data below method detection limits (MDLs). Field data also show considerable spatial or interpersonal variability, and although evidence is limited, temporal variability seems high. These characteristics can complicate modeling and other analyses aimed at risk assessment, policy actions, and exposure management. In addition to these analytic and statistical issues, exposure typically occurs as a mixture, and mixture components may interact or jointly contribute to adverse effects. However most pollutant regulations, guidelines, and studies remain focused on single compounds, and thus may underestimate cumulative exposures and risks arising from coexposures. In addition, the composition of VOC mixtures has not been thoroughly investigated, and mixture components show varying and complex dependencies. Finally, although many factors are known to affect VOC exposures, many personal, environmental, and socioeconomic determinants remain to be identified, and the significance and applicability of the determinants reported in the literature are uncertain. To help answer these unresolved questions and overcome limitations of previous analyses, this project used several novel and powerful statistical modeling and analysis techniques and two large data sets. The overall objectives of this project were (1) to identify and characterize exposure distributions (including extreme values), (2) evaluate mixtures (including dependencies), and (3) identify determinants of VOC exposure. METHODS VOC data were drawn from two large data sets: the Relationships of Indoor, Outdoor, and Personal Air (RIOPA) study (1999-2001) and the National Health and Nutrition Examination Survey (NHANES; 1999-2000). The RIOPA study used a convenience sample to collect outdoor, indoor, and personal exposure measurements in three cities (Elizabeth, NJ; Houston, TX; Los Angeles, CA). In each city, approximately 100 households with adults and children who did not smoke were sampled twice for 18 VOCs. In addition, information about 500 variables associated with exposure was collected. The NHANES used a nationally representative sample and included personal VOC measurements for 851 participants. NHANES sampled 10 VOCs in common with RIOPA. Both studies used similar sampling methods and study periods. Specific Aim 1. To estimate and model extreme value exposures, extreme value distribution models were fitted to the top 10% and 5% of VOC exposures. Health risks were estimated for individual VOCs and for three VOC mixtures. Simulated extreme value data sets, generated for each VOC and for fitted extreme value and lognormal distributions, were compared with measured concentrations (RIOPA observations) to evaluate each model's goodness of fit. Mixture distributions were fitted with the conventional finite mixture of normal distributions and the semi-parametric Dirichlet process mixture (DPM) of normal distributions for three individual VOCs (chloroform, 1,4-DCB, and styrene). Goodness of fit for these full distribution models was also evaluated using simulated data. Specific Aim 2. Mixtures in the RIOPA VOC data set were identified using positive matrix factorization (PMF) and by toxicologic mode of action. Dependency structures of a mixture's components were examined using mixture fractions and were modeled using copulas, which address correlations of multiple components across their entire distributions. Five candidate copulas (Gaussian, t, Gumbel, Clayton, and Frank) were evaluated, and the performance of fitted models was evaluated using simulation and mixture fractions. Cumulative cancer risks were calculated for mixtures, and results from copulas and multivariate lognormal models were compared with risks based on RIOPA observations. Specific Aim 3. Exposure determinants were identified using stepwise regressions and linear mixed-effects models (LMMs)., Results: Specific Aim 1. Extreme value exposures in RIOPA typically were best fitted by three-parameter generalized extreme value (GEV) distributions, and sometimes by the two-parameter Gumbel distribution. In contrast, lognormal distributions significantly underestimated both the level and likelihood of extreme values. Among the VOCs measured in RIOPA, 1,4-dichlorobenzene (1,4-DCB) was associated with the greatest cancer risks; for example, for the highest 10% of measurements of 1,4-DCB, all individuals had risk levels above 10(-4), and 13% of all participants had risk levels above 10(-2). Of the full-distribution models, the finite mixture of normal distributions with two to four clusters and the DPM of normal distributions had superior performance in comparison with the lognormal models. DPM distributions provided slightly better fit than the finite mixture distributions; the advantages of the DPM model were avoiding certain convergence issues associated with the finite mixture distributions, adaptively selecting the number of needed clusters, and providing uncertainty estimates. Although the results apply to the RIOPA data set, GEV distributions and mixture models appear more broadly applicable. These models can be used to simulate VOC distributions, which are neither normally nor lognormally distributed, and they accurately represent the highest exposures, which may have the greatest health significance. Specific Aim 2. Four VOC mixtures were identified and apportioned by PMF; they represented gasoline vapor, vehicle exhaust, chlorinated solvents and disinfection byproducts, and cleaning products and odorants. The last mixture (cleaning products and odorants) accounted for the largest fraction of an individual's total exposure (average of 42% across RIOPA participants). Often, a single compound dominated a mixture but the mixture fractions were heterogeneous; that is, the fractions of the compounds changed with the concentration of the mixture. Three VOC mixtures were identified by toxicologic mode of action and represented VOCs associated with hematopoietic, liver, and renal tumors. Estimated lifetime cumulative cancer risks exceeded 10(-3) for about 10% of RIOPA participants. The dependency structures of the VOC mixtures in the RIOPA data set fitted Gumbel (two mixtures) and t copulas (four mixtures). These copula types emphasize dependencies found in the upper and lower tails of a distribution. The copulas reproduced both risk predictions and exposure fractions with a high degree of accuracy and performed better than multivariate lognormal distributions. Specific Aim 3. In an analysis focused on the home environment and the outdoor (close to home) environment, home VOC concentrations dominated personal exposures (66% to 78% of the total exposure, depending on VOC); this was largely the result of the amount of time participants spent at home and the fact that indoor concentrations were much higher than outdoor concentrations for most VOCs. In a different analysis focused on the sources inside the home and outside (but close to the home), it was assumed that 100% of VOCs from outside sources would penetrate the home. Outdoor VOC sources accounted for 5% (d-limonene) to 81% (carbon tetrachloride [CTC]) of the total exposure. Personal exposure and indoor measurements had similar determinants depending on the VOC. Gasoline-related VOCs (e.g., benzene and methyl tert-butyl ether [MTBE]) were associated with city, residences with attached garages, pumping gas, wind speed, and home air exchange rate (AER). Odorant and cleaning-related VOCs (e.g., 1,4-DCB and chloroform) also were associated with city, and a residence's AER, size, and family members showering. Dry-cleaning and industry-related VOCs (e.g., tetrachloroethylene [or perchloroethylene, PERC] and trichloroethylene [TCE]) were associated with city, type of water supply to the home, and visits to the dry cleaner. These and other relationships were significant, they explained from 10% to 40% of the variance in the measurements, and are consistent with known emission sources and those reported in the literature. Outdoor concentrations of VOCs had only two determinants in common: city and wind speed. Overall, personal exposure was dominated by the home setting, although a large fraction of indoor VOC concentrations were due to outdoor sources. City of residence, personal activities, household characteristics, and meteorology were significant determinants. Concentrations in RIOPA were considerably lower than levels in the nationally representative NHANES for all VOCs except MTBE and 1,4-DCB. Differences between RIOPA and NHANES results can be explained by contrasts between the sampling designs and staging in the two studies, and by differences in the demographics, smoking, employment, occupations, and home locations. (ABSTRACT TRUNCATED)