95 results for "Clifford H. Spiegelman"
Search Results
2. Assessing machine learning algorithms on crop yield forecasts using functional covariates derived from remotely sensed data.
- Author
-
Luca Sartore, Arthur Rosales, David M. Johnson, and Clifford H. Spiegelman
- Published
- 2022
- Full Text
- View/download PDF
3. Bayesian Spatial Multivariate Receptor Modeling for Multisite Multipollutant Data.
- Author
-
EunSug Park, Philip K. Hopke, Inyoung Kim, Shuman Tan, and Clifford H. Spiegelman
- Published
- 2018
- Full Text
- View/download PDF
4. Semiparametric Classification of Forest Graphical Models.
- Author
-
Mary Frances Dorn, Amit Moscovich, Boaz Nadler, and Clifford H. Spiegelman
- Published
- 2018
5. Modeling swine population dynamics at a finer temporal resolution
- Author
-
Yijun Wei, Emilola Abayomi, Luca Sartore, Gavin Richard Corral, Seth Riggins, Valbona Bejleri, and Clifford H. Spiegelman
- Subjects
Estimation, Commodity, Population, Management Science and Operations Research, Quarter, General Business, Management and Accounting, Agricultural statistics, Geography, Modeling and Simulation, Temporal resolution, Service (economics), Econometrics, Transaction data - Abstract
The United States Department of Agriculture's National Agricultural Statistics Service (NASS) uses probability surveys of hog owners to estimate quarterly hog inventories in the United States at the national and state levels. NASS also receives data from external sources. A panel of commodity experts forms the Agricultural Statistics Board (ASB). The ASB establishes the NASS official estimates for each quarter by taking into account survey estimates and other relevant sources of information that are available in numerical and non-numerical form. The aim of this article is to propose a method for estimating hog inventories that combines the NASS proprietary survey results, hog transaction data, past ASB panel expert analyses, biological dynamics, and inter-inventory relationship constraints. This approach downscales the official estimates to provide monthly estimates that follow well-defined biological growth patterns. The model developed in this study provides national estimates that may inform the quarterly reports.
- Published
- 2020
- Full Text
- View/download PDF
6. A Markov Chain Monte Carlo-Based Origin Destination Matrix Estimator that is Robust to Imperfect Intelligent Transportation Systems Data.
- Author
-
EunSug Park, Laurence R. Rilett, and Clifford H. Spiegelman
- Published
- 2008
- Full Text
- View/download PDF
7. A Univariate Inequality for Medians
- Author
-
Clifford H. Spiegelman
- Subjects
Combinatorics, Median, Inequality, General Engineering, Univariate, Majorization, Mathematics, Physics and Chemistry - Abstract
An inequality for medians is provided that is an analog of a theorem of Karamata dealing with majorization.
- Published
- 2021
8. The Organizers' Goals
- Author
-
R. L. Watters Jr., J. Sacks, and Clifford H. Spiegelman
- Subjects
General Engineering, Physics and Chemistry - Published
- 2021
9. Make Research Data Public?—Not Always so Simple: A Dialogue for Statisticians and Science Editors
- Author
-
Nell Sedransk, Lawrence H. Cox, Deborah Nolan, Keith Soper, Clifford H. Spiegelman, Linda J. Young, Katrina L. Kelner, Robert A. Moffitt, Ani Thakar, M. Jordan Raddick, Edward J. Ungvarsky, Richard W. Carlson, and Rolf Apweiler
- Published
- 2010
10. Developing Integer Calibration Weights for Census of Agriculture
- Author
-
Luca Sartore, Linda Young, Kelly Toppin, and Clifford H. Spiegelman
- Subjects
Statistics and Probability, Calibration (statistics), Applied Mathematics, Rounding, Census, Agricultural and Biological Sciences (miscellaneous), Integer, Discrete optimization, Statistics, Statistics, Probability and Uncertainty, General Agricultural and Biological Sciences, Coordinate descent, General Environmental Science, Mathematics - Abstract
When conducting a national survey or census, administrative data may be available that can provide reliable values for some of the variables. Survey and census estimates should be consistent with reliable administrative data. Calibration can be used to improve the estimates by further adjusting the survey weights so that estimates of targeted variables honor bounds obtained from administrative data. The commonly used methods of calibration produce non-integer weights. For the Census of Agriculture, estimates of farms are provided as integers to ensure consistent estimates at all aggregation levels; thus, the calibrated weights are rounded to integers. The calibration and rounding procedure used for the 2012 Census of Agriculture produced final weights that were substantially different from the survey weights that had been adjusted for under-coverage, non-response, and misclassification. A new method that calibrates and rounds as a single process is provided. The new method produces integer, calibrated weights that tend to be consistent with more calibration targets and are more correlated with the modeled census weights. In addition, the new method is more computationally efficient. Supplementary materials accompanying this paper appear online.
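As an illustration of the rounding side of the problem described above, the sketch below rounds non-integer calibrated weights to integers with a largest-remainder rule so that a fixed integer total is preserved. It is a minimal, hypothetical stand-in (the function name and toy numbers are invented here), not the single-process calibration-and-rounding method developed in the paper.

```python
import numpy as np

def round_preserving_total(weights, total):
    """Largest-remainder rounding: round calibrated weights to integers
    while keeping their sum equal to a known integer total.
    Illustrative only; not the NASS single-process calibration method."""
    w = np.asarray(weights, dtype=float)
    base = np.floor(w).astype(int)
    shortfall = total - base.sum()
    assert 0 <= shortfall <= len(w), "total must be reachable by rounding"
    # hand the remaining units to the weights with the largest fractional parts
    order = np.argsort(-(w - base))
    base[order[:shortfall]] += 1
    return base

# toy example: weights calibrated to an administrative total of 100 farms
calibrated = np.array([12.4, 7.8, 25.3, 30.9, 23.6])
print(round_preserving_total(calibrated, total=100))   # -> [12  8 25 31 24]
```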
- Published
- 2018
- Full Text
- View/download PDF
11. Bayesian Spatial Multivariate Receptor Modeling for Multisite Multipollutant Data
- Author
-
Philip K. Hopke, Eun Sug Park, Inyoung Kim, Shuman Tan, and Clifford H. Spiegelman
- Subjects
Statistics and Probability, Multivariate statistics, Applied Mathematics, Bayesian probability, Air pollution, Air pollutants, Modeling and Simulation, Environmental science, Data mining - Abstract
For the development of effective air pollution control strategies, it is crucial to identify the sources that are the principal contributors to air pollution and estimate how much each source contributes. Multivariate receptor modeling aims to address these problems by decomposing ambient concentrations of multiple air pollutants into components associated with different source types. With the expanded monitoring efforts that have been established over the past several decades, extensive multivariate air pollution data obtained from multiple monitoring sites (multi-site multi-pollutant data) are now available. Although considerable research has been conducted on modeling multivariate space-time data in other contexts, there has been little research on spatial multivariate receptor models for multi-site, multi-pollutant data. We present a Bayesian spatial multivariate receptor modeling (BSMRM) approach that can incorporate spatial correlations in multi-site, multi-pollutant data into the estimation...
- Published
- 2018
- Full Text
- View/download PDF
12. Mineral preservatives in the wood of Stradivari and Guarneri.
- Author
-
Joseph Nagyvary, Renald N. Guillemette, and Clifford H. Spiegelman
- Subjects
Medicine, Science - Abstract
Following the futile efforts of generations to reach the high standard of excellence achieved by the luthiers in Cremona, Italy, by variations of design and plate tuning, current interest is focused on differences in material properties. The long-standing question of whether the wood of Stradivari and Guarneri was treated with wood preservative materials could be answered only by the examination of wood specimens from the precious antique instruments. In a recent communication (Nature, 2006), we reported on the degradation of the wood polymers in instruments of Stradivari and Guarneri, which could be explained only by chemical manipulations, possibly by preservatives. The aim of the current work was to identify the minerals in the small samples of maple wood that were available to us from the antique instruments. The ashes of wood from one violin and one cello by Stradivari, two violins by Guarneri, one viola by H. Jay, and one violin by Gand-Bernardel were analyzed and compared with a variety of commercial tone woods. The methods of analysis were the following: back-scattered electron imaging, X-ray fluorescence maps for individual elements, wavelength-dispersive spectroscopy, energy-dispersive X-ray spectroscopy, and quantitative microprobe analysis. All four Cremonese instruments showed the unmistakable signs of chemical treatment in the form of chemicals that are not present in natural woods, such as BaSO4, CaF2, borate, and ZrSiO4. In addition, there were also changes in the common wood minerals. Statistical evaluation of 12 minerals by discriminant analysis revealed (a) differences among all four Cremonese instruments, (b) differences between the Cremonese instruments and the French and English antiques, and (c) that only the Cremonese instruments differed from all commercial woods. These findings may explain why all attempts to recreate the Stradivarius from natural wood have failed. There are many obvious implications with regard to how green tone wood should be treated, which could lead to changes in the practice of violin-making. This research should inspire others to analyze more antique violins for their chemical contents.
- Published
- 2009
- Full Text
- View/download PDF
13. Assessment of mobile source contributions in El Paso by PMF receptor modeling coupled with wind direction analysis
- Author
-
Eun Sug Park, David W. Sullivan, Qi Ying, Clifford H. Spiegelman, and Dong Hun Kang
- Subjects
Pollutant, Transportation planning, Multivariate statistics, Environmental Engineering, Meteorology, Air pollution, Wind direction, Pollution, Ambient air, Data quality, Environmental Chemistry, Environmental science, Waste Management and Disposal, Air quality index - Abstract
It is well-known that El Paso is the only border area in Texas that has violated national air quality standards. Mobile source emissions (including vehicle exhaust) contribute significantly to air pollution, along with other sources including industrial, residential, and cross-border. This study aims at separating unobserved vehicle emissions from air-pollution mixtures indicated by ambient air quality data. The level of contributions from vehicle emissions to air pollution cannot be determined by simply comparing ambient air quality data with traffic levels because of the various other contributors to overall air pollution. To estimate contributions from vehicle emissions, researchers employed advanced multivariate receptor modeling called positive matrix factorization (PMF) to analyze hydrocarbon data consisting of hourly concentrations measured from the Chamizal air pollution monitoring station in El Paso. The analysis of hydrocarbon data collected at the Chamizal site in 2008 showed that approximately 25% of measured Total Non-Methane Hydrocarbons (TNMHC) was apportioned to motor vehicle exhaust. Using wind direction analysis, researchers also showed that the motor vehicle exhaust contributions to hydrocarbons were significantly higher when winds blow from the south (Mexico) than those when winds blow from other directions. The results from this research can be used to improve understanding source apportionment of pollutants measured in El Paso and can also potentially inform transportation planning strategies aimed at reducing emissions across the region.
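For readers who want a concrete picture of the factorization step, the sketch below uses scikit-learn's NMF as an unweighted stand-in for PMF on synthetic hourly species concentrations; the data, the number of factors, and the mass-share calculation are assumptions made here, not results from the El Paso study.

```python
import numpy as np
from sklearn.decomposition import NMF

# synthetic hours-by-species matrix of non-negative concentrations
# (in the study these were hourly hydrocarbon measurements at Chamizal)
rng = np.random.default_rng(0)
true_profiles = rng.random((3, 12))             # 3 sources x 12 species
true_contribs = rng.gamma(2.0, 1.0, (500, 3))   # 500 hours x 3 sources
X = true_contribs @ true_profiles + 0.01 * rng.random((500, 12))

# NMF is an unweighted stand-in for PMF: X ~ W H with W, H >= 0
model = NMF(n_components=3, init="nndsvda", max_iter=1000, random_state=0)
W = model.fit_transform(X)   # factor contributions per hour
H = model.components_        # factor profiles (species signatures)

# rough share of total reconstructed mass apportioned to each factor
shares = W.sum(axis=0) * H.sum(axis=1)
print(shares / shares.sum())
```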
- Published
- 2019
14. Absence of Statistical and Scientific Ethos: The Common Denominator in Deficient Forensic Practices
- Author
-
Clifford H. Spiegelman, William A. Tobin, and H. David Sheets
- Subjects
Statistics and Probability, Engineering, Public Administration, Ignorance, Criminology, Validity, Ethos, Applied Mathematics, Common denominator, Demise, Testimonial, Reproducibility, Arson, Forensic science, Hypothesis tests, Criticism, Statistics, Probability and Uncertainty, Social psychology, Design of experiments - Abstract
Comparative Bullet Lead Analysis (CBLA) was discredited as a forensic discipline largely due to the absence of cross-discipline input, primarily metallurgical and statistical, during development and forensic/judicial application of the practice. Of particular significance to the eventual demise of CBLA practice was ignorance of the role of statistics in assessing probative value of claimed bullet “matches” at both the production and retail distribution levels, leading to overstated testimonial claims by expert witnesses. Bitemark comparisons have come under substantial criticism in the last few years, both due to exonerations based on DNA evidence and to research efforts questioning the claimed uniqueness of bitemarks. The fields of fire and arson investigation and of firearm and toolmark comparison are similar to CBLA and bitemarks in the absence of effective statistical support for these practices. The features of the first two disciplines are examined in systemic detail to enhance understanding as to why they became discredited forensic practices, and to identify aspects of the second two disciplines that pose significant concern to critics.
- Published
- 2017
- Full Text
- View/download PDF
15. Forensic bitemark identification: weak foundations, exaggerated claims
- Author
-
Paul C. Giannelli, C. Michael Bowers, Brandon L. Garrett, Nizam Peerwani, M. Bonner Denton, Sandy L. Zabell, D. Michael Risinger, Shari Seidman Diamond, Karen Kafadar, Thomas L. Bohan, David L. Faigman, Alan B. Morrison, Rachel Dioso-Villa, George Sensabaugh, Peter J. Bush, Mary A. Bush, David Korn, William C. Thompson, Jonathan J. Koehler, Jerome P. Kassirer, Lisa Faigman, Joseph L. Peterson, Jennifer L. Mnookin, Barbara E. Bierer, Ross E. Zumwalt, Clifford H. Spiegelman, Thomas D. Albright, Simon A. Cole, Arturo Casadevall, Erin Murphy, Jules Epstein, Hal S. Stern, Edward J. Imwinkelried, Stephen E. Fienberg, Michael J. Saks, James L. Wayman, Allan Jamieson, and Henry T. Greely
- Subjects
Scrutiny, expert evidence, forensic science, Medicine (miscellaneous), Commission, Criminology, Biochemistry, Genetics and Molecular Biology (miscellaneous), Empirical research, Clinical Research, Medicine, admissibility, Foundation (evidence), Applied ethics, Supreme court, bite mark, Forensic identification, Identification (biology), Law - Abstract
Several forensic sciences, especially of the pattern-matching kind, are increasingly seen to lack the scientific foundation needed to justify continuing admission as trial evidence. Indeed, several have been abolished in the recent past. A likely next candidate for elimination is bitemark identification. A number of DNA exonerations have occurred in recent years for individuals convicted based on erroneous bitemark identifications. Intense scientific and legal scrutiny has resulted. An important National Academies review found little scientific support for the field. The Texas Forensic Science Commission recently recommended a moratorium on the admission of bitemark expert testimony. The California Supreme Court has a case before it that could start a national dismantling of forensic odontology. This article describes the (legal) basis for the rise of bitemark identification and the (scientific) basis for its impending fall. The article explains the general logic of forensic identification, the claims of bitemark identification, and reviews relevant empirical research on bitemark identification—highlighting both the lack of research and the lack of support provided by what research does exist. The rise and possible fall of bitemark identification evidence has broader implications—highlighting the weak scientific culture of forensic science and the law's difficulty in evaluating and responding to unreliable and unscientific evidence.
- Published
- 2017
- Full Text
- View/download PDF
16. Dependence among randomly acquired characteristics on shoeprints and their features
- Author
-
Micha Mandel, Yoram Yekutieli, Yaron Shor, Naomi Kaplan Damary, Clifford H. Spiegelman, and Sarena Wiesner
- Subjects
Orientation (computer vision), Computer science, Probabilistic logic, Pattern recognition, Pathology and Forensic Medicine, Data set, Independence test, Crime scene, Artificial intelligence, Suspect, Law
Randomly acquired characteristics (RACs), also known as accidental marks, are random markings on a shoe sole, such as scratches or holes, that are used by forensic experts to compare a suspect's shoe with a print found at the crime scene. This article investigates the relationships among three features of a RAC: its location, shape type and orientation. If these features, as well as the RACs, are independent of each other, a simple probabilistic calculation could be used to evaluate the rarity of a RAC and hence the evidential value of the shoe and print comparison, whereas a correlation among the features would complicate the analysis. Using a data set of about 380 shoes, it is found that RACs and their features are not independent, and moreover, are not independent of the shoe sole pattern. It is argued that some of the dependencies found are caused by the elements of the sole. The results have important implications for the way forensic experts should evaluate the degree of rarity of a combination of RACs.
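The kind of dependence check discussed above can be illustrated with a chi-square test of independence on a contingency table of two RAC features; the table below is hypothetical and is not the study's shoeprint data.

```python
import numpy as np
from scipy.stats import chi2_contingency

# Hypothetical contingency table: RAC shape type (rows) by coarse sole
# location (columns); counts are illustrative, not from the 380-shoe data set.
table = np.array([[40, 25, 15],
                  [30, 50, 20],
                  [10, 15, 45]])

chi2, p, dof, expected = chi2_contingency(table)
print(f"chi2 = {chi2:.1f}, dof = {dof}, p = {p:.3g}")
# a small p-value indicates shape type and location are not independent,
# the kind of dependence among RAC features that the article reports
```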
- Published
- 2017
17. Transportation Statistics and Microsimulation
- Author
-
Laurence R. Rilett, Clifford H. Spiegelman, and Eun Sug Park
- Subjects
Sampling distribution, Overdispersion, Goodness of fit, Statistics, Econometrics, Sample variance, Regression analysis, Simple linear regression, Q–Q plot, Categorical variable, Mathematics - Abstract
Contents:
- Overview: The Role of Statistics in Transportation Engineering: What Is Engineering?; What Is Transportation Engineering?; Goal of the Textbook; Overview of the Textbook; Who Is the Audience for This Textbook?; Relax - Everything Is Fine
- Graphical Methods for Displaying Data: Introduction; Histogram; Box and Whisker Plot; Quantile Plot; Scatter Plot; Parallel Plot; Time Series Plot; Quality Control Plots; Concluding Remarks
- Numerical Summary Measures: Introduction; Measures of Central Tendency; Measures of Relative Standing; Measures of Variability; Measures of Association; Concluding Remarks
- Probability and Random Variables: Introduction; Sample Spaces and Events; Interpretation of Probability; Random Variable; Expectations of Random Variables; Covariances and Correlation of Random Variables; Computing Expected Values of Functions of Random Variables; Conditional Probability; Bayes' Theorem; Concluding Remarks
- Common Probability Distributions: Introduction; Discrete Distributions; Continuous Distributions; Concluding Remarks; Appendix: Table of the Most Popular Distributions in Transportation Engineering
- Sampling Distributions: Introduction; Random Sampling; Sampling Distribution of a Sample Mean; Sampling Distribution of a Sample Variance; Sampling Distribution of a Sample Proportion; Concluding Remarks
- Inferences: Hypothesis Testing and Interval Estimation: Introduction; Fundamentals of Hypothesis Testing; Inferences on a Single Population Mean; Inferences about Two Population Means; Inferences about One Population Variance; Inferences about Two Population Variances; Concluding Remarks; Appendix: Welch (1938) Degrees of Freedom for the Unequal Variance t-Test
- Other Inferential Procedures: ANOVA and Distribution-Free Tests: Introduction; Comparisons of More than Two Population Means; Multiple Comparisons; One- and Multiway ANOVA; Assumptions for ANOVA; Distribution-Free Tests; Conclusions
- Inferences Concerning Categorical Data: Introduction; Tests and Confidence Intervals for a Single Proportion; Tests and Confidence Intervals for Two Proportions; Chi-Square Tests Concerning More Than Two Population Proportions; The Chi-Square Goodness-of-Fit Test for Checking Distributional Assumptions; Conclusions
- Linear Regression: Introduction; Simple Linear Regression; Transformations; Understanding and Calculating R2; Verifying the Main Assumptions in Linear Regression; Comparing Two Regression Lines at a Point and Comparing Two Regression Parameters; The Regression Discontinuity Design (RDD); Multiple Linear Regression; Variable Selection for Regression Models; Additional Collinearity Issues; Concluding Remarks
- Regression Models for Count Data: Introduction; Poisson Regression Model; Overdispersion; Assessing Goodness of Fit of Poisson Regression Models; Negative Binomial Regression Model; Concluding Remarks; Appendix: Maximum Likelihood Estimation
- Experimental Design: Introduction; Comparison of Direct Observation and Designed Experiments; Motivation for Experimentation; A Three-Factor, Two Levels per Factor Experiment; Factorial Experiments; Fractional Factorial Experiments; Screening Designs; D-Optimal and I-Optimal Designs; Sample Size Determination; Field and Quasi-Experiments; Concluding Remarks; Appendix: Choice Modeling of Experiments
- Cross-Validation, Jackknife, and Bootstrap Methods for Obtaining Standard Errors: Introduction; Methods for Standard Error Estimation When a Closed-Form Formula Is Not Available; Cross-Validation; The Jackknife Method for Obtaining Standard Errors; Bootstrapping; Concluding Remarks
- Bayesian Approaches to Transportation Data Analysis: Introduction; Fundamentals of Bayesian Statistics; Bayesian Inference; Concluding Remarks
- Microsimulation: Introduction; Overview of Traffic Microsimulation Models; Analyzing Microsimulation Output; Performance Measures; Concluding Remarks; Appendix: Soft Modeling and Nonparametric Model Building
Homework problems and references appear at the end of each chapter.
- Published
- 2016
- Full Text
- View/download PDF
18. Analysis of experiments in forensic firearms/toolmarks practice offered as support for low rates of practice error and claims of inferential certainty
- Author
-
Clifford H. Spiegelman and William A. Tobin
- Subjects
Forensic science, Philosophy, Law, Statistics, Probability and Uncertainty, Criminology, Certainty, Attribution, Psychology - Abstract
This article critically evaluates experiments used to justify inferences of specific source attribution (‘individualization’) to ‘100% certainty’ and ‘near-zero’ rates of error claimed by firearm toolmark examiners in court testimonies, and suggests approaches for establishing statistical foundations for firearm toolmarks practice that two recent National Academy of Science reports confirm do not currently exist. Issues that should be considered in the earliest stages of statistical foundational development for firearm toolmarks are discussed.
- Published
- 2012
- Full Text
- View/download PDF
19. A nonparametric approach based on a Markov like property for classification
- Author
-
Jeongyoun Ahn, Clifford H. Spiegelman, and Eun Sug Park
- Subjects
Markov chain, Process Chemistry and Technology, Population, Nonparametric statistics, Pattern recognition, Density estimation, Linear discriminant analysis, Computer Science Applications, Analytical Chemistry, Markov property, Artificial intelligence, Marginal distribution, Spectroscopy, Software, Mathematics, Curse of dimensionality - Abstract
We suggest a new approach to classification based on nonparametrically estimated likelihoods. Due to the scarcity of data in high dimensions, full nonparametric estimation of the likelihood functions for each population is impractical. Instead, we propose to build a class of estimated nonparametric candidate likelihood models based on a Markov property and to provide multiple likelihood estimates that are useful for guiding a classification algorithm. Our density estimates require only estimates of one- and two-dimensional marginal distributions, which effectively circumvents the curse of dimensionality. A classification algorithm based on those estimated likelihoods is presented. A modification that uses variable selection on differences in the logs of the estimated marginal densities is also suggested to specifically handle high-dimensional data.
- Published
- 2011
- Full Text
- View/download PDF
20. Population and Temperature Effects on Lucilia sericata (Diptera: Calliphoridae) Body Size and Minimum Development Time
- Author
-
Aaron M. Tarone, David R. Foran, Christine J. Picard, and Clifford H. Spiegelman
- Subjects
Michigan, Molecular Sequence Data, Population, Zoology, Lucilia, California, Genetic variation, Animals, Body Size, Calliphoridae, Forensic entomology, Forensic Pathology, Base Sequence, General Veterinary, Ecology, Diptera, Strain (biology), Pupa, Temperature, West Virginia, Ecological genetics, Infectious Diseases, Insect Science, Parasitology, Entomology - Abstract
Understanding how ecological conditions influence physiological responses is fundamental to forensic entomology. When determining the minimum postmortem interval with blow fly evidence in forensic investigations, using a reliable and accurate model of development is integral. Many published studies vary in results, source populations, and experimental designs. Accordingly, disentangling genetic causes of developmental variation from environmental causes is difficult. This study determined the minimum time of development and pupal sizes of three populations of Lucilia sericata Meigen (Diptera: Calliphoridae; from California, Michigan, and West Virginia) at two temperatures (20 degrees C and 33.5 degrees C). Development times differed significantly between strain and temperature. In addition, California pupae were the largest and fastest developing at 20 degrees C, but at 33.5 degrees C, though they still maintained their rank in size among the three populations, they were the slowest to develop. These results indicate a need to account for genetic differences in development, and genetic variation in environmental responses, when estimating a postmortem interval with entomological data.
- Published
- 2011
- Full Text
- View/download PDF
21. Multi-site assessment of the precision and reproducibility of multiple reaction monitoring–based measurements of proteins in plasma
- Author
-
Helene L. Cardasis, Lei Zhao, Angela M. Jackson, Christopher R. Kinsinger, Richard K. Niles, Jason M. Held, Steven J. Skates, Asokan Mulayath Variyath, Susan E. Abbatiello, David L. Tabb, Daniel C. Liebler, Charles Buck, Tara Hiltke, Mu Wang, Paul A. Rudnick, Terri A. Addona, Jeffrey R. Whiteaker, Ronald K. Blackman, Amanda G. Paulovich, Derek Smith, D. R. Mani, Trenton C. Pulsipher, Lorenzo Vega-Montoto, Mehdi Mesri, Bradford W. Gibson, Asa Wahlander, Michael P. Cusack, David F. Ransohoff, Christoph H. Borchers, Eric B. Johansen, Susan J. Fisher, Simon Allen, Clifford H. Spiegelman, Henry Rodriguez, Steven A. Carr, David M. Bunk, Paul Tempst, Nathan G. Dodder, Fred E. Regnier, Sofia Waldemarson, Hasmik Keshishian, Thomas A. Neubert, N. Leigh Anderson, Steven C. Hall, Birgit Schilling, Jing Li, Tony J. Tegeler, Amy-Joan L. Ham, and Lisa J. Zimmerman
- Subjects
Technology Assessment, Biomedical, Proteome, Protein biomarkers, Biomedical Engineering, Bioengineering, Computational biology, Biology, Bioinformatics, Proteomics, Sensitivity and Specificity, Applied Microbiology and Biotechnology, Mass Spectrometry, Humans, Biomarker discovery, Detection limit, Reproducibility, Selected reaction monitoring, Reproducibility of Results, Blood Proteins, Targeted mass spectrometry, Linear Models, Molecular Medicine, Biomarkers, Blood Chemical Analysis, Biotechnology - Abstract
Verification of candidate biomarkers relies upon specific, quantitative assays optimized for selective detection of target proteins, and is increasingly viewed as a critical step in the discovery pipeline that bridges unbiased biomarker discovery to preclinical validation. Although individual laboratories have demonstrated that multiple reaction monitoring (MRM) coupled with isotope dilution mass spectrometry can quantify candidate protein biomarkers in plasma, reproducibility and transferability of these assays between laboratories have not been demonstrated. We describe a multilaboratory study to assess reproducibility, recovery, linear dynamic range and limits of detection and quantification of multiplexed, MRM-based assays, conducted by NCI-CPTAC. Using common materials and standardized protocols, we demonstrate that these assays can be highly reproducible within and across laboratories and instrument platforms, and are sensitive to low μg/ml protein concentrations in unfractionated plasma. We provide data and benchmarks against which individual laboratories can compare their performance and evaluate new technologies for biomarker verification in plasma.
- Published
- 2009
- Full Text
- View/download PDF
22. Identifying optimal data aggregation interval sizes for link and corridor travel time estimation and forecasting
- Author
-
Byron J. Gajewski, Chang-Ho Choi, Dongjoo Park, Laurence R. Rilett, and Clifford H. Spiegelman
- Subjects
Estimation, Mathematical optimization, Engineering, Mean squared error, Transportation, Interval (mathematics), Variance, Function (mathematics), Development, Traffic flow, Data aggregator, Transport engineering, Civil and Structural Engineering
With the recent increase in the deployment of ITS technologies in urban areas throughout the world, traffic management centers have the ability to obtain and archive large amounts of data on the traffic system. These data can be used to estimate current conditions and predict future conditions on the roadway network. A general solution methodology for identifying the optimal aggregation interval sizes for four scenarios is proposed in this article: (1) link travel time estimation, (2) corridor/route travel time estimation, (3) link travel time forecasting, and (4) corridor/route travel time forecasting. The methodology explicitly considers traffic dynamics and frequency of observations. A formulation based on mean square error (MSE) is developed for each of the scenarios and interpreted from a traffic flow perspective. The methodology for estimating the optimal aggregation size is based on (1) the tradeoff between the estimated mean square error of prediction and the variance of the predictor, (2) the differences between estimation and forecasting, and (3) the direct consideration of the correlation between link travel times for corridor/route estimation and forecasting. The proposed methods are demonstrated using travel time data from Houston, Texas, that were collected as part of the automatic vehicle identification (AVI) component of the Houston TranStar system. It was found that the optimal aggregation size is a function of the application and the traffic condition.
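A minimal sketch of the flavor of such an MSE-based criterion: predict each raw 20-s travel time by the mean of its aggregation bin, computed without that observation, and compare candidate interval sizes. The synthetic data and the leave-one-out criterion are assumptions made here, not the paper's formulation.

```python
import numpy as np

def cv_mse_for_interval(x, m):
    """Leave-one-out MSE when each 20-s observation is predicted by the mean
    of its aggregation bin of m observations (illustrative criterion only)."""
    n_bins = len(x) // m
    x = x[: n_bins * m].reshape(n_bins, m)
    bin_sum = x.sum(axis=1, keepdims=True)
    loo_mean = (bin_sum - x) / (m - 1)        # bin mean excluding x itself
    return float(np.mean((x - loo_mean) ** 2))

rng = np.random.default_rng(2)
t = np.arange(4320)                            # one day of 20-s records
travel_time = 60 + 15 * np.sin(2 * np.pi * t / 4320) + rng.normal(0, 3, t.size)

for m in (3, 15, 45, 180):                     # 1, 5, 15, 60 minute bins
    print(f"{m * 20:>5d}-s interval: CV MSE = {cv_mse_for_interval(travel_time, m):.2f}")
```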
- Published
- 2008
- Full Text
- View/download PDF
23. On the exact Berk-Jones statistics and their $p$-value calculation
- Author
-
Boaz Nadler, Amit Moscovich, and Clifford H. Spiegelman
- Subjects
Statistics and Probability, Mathematics - Statistics Theory (math.ST), Statistics - Computation (stat.CO), Statistics - Methodology (stat.ME), p-value computation, p-value, Range (statistics), Statistic, Rare-weak model, Infimum and supremum, Hypothesis testing, Asymptotically optimal algorithm, Distribution (mathematics), Continuous goodness-of-fit, Statistics, Probability and Uncertainty, 62G10, 62G20, 62-04 (MSC) - Abstract
Continuous goodness-of-fit testing is a classical problem in statistics. Despite having low power for detecting deviations at the tail of a distribution, the most popular test is based on the Kolmogorov-Smirnov statistic. While similar variance-weighted statistics, such as Anderson-Darling and the Higher Criticism statistic, give more weight to tail deviations, as shown in various works, they still mishandle the extreme tails. As a viable alternative, in this paper we study some of the statistical properties of the exact $M_n$ statistics of Berk and Jones. We derive the asymptotic null distributions of $M_n, M_n^+, M_n^-$, and further prove their consistency as well as asymptotic optimality for a wide range of rare-weak mixture models. Additionally, we present a new computationally efficient method to calculate $p$-values for any supremum-based one-sided statistic, including the one-sided $M_n^+,M_n^-$ and $R_n^+,R_n^-$ statistics of Berk and Jones and the Higher Criticism statistic. We illustrate our theoretical analysis with several finite-sample simulations.
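As a concrete companion to the supremum-type statistics discussed above, the sketch below computes the Higher Criticism statistic (one of the comparison statistics named in the abstract) from sorted p-values; the rare-weak toy data are invented here, and this is not the authors' exact Berk-Jones computation or their p-value method.

```python
import numpy as np

def higher_criticism(pvalues, alpha0=0.5):
    """Donoho-Jin Higher Criticism statistic, a supremum-type statistic
    discussed alongside the Berk-Jones statistics in the entry above."""
    p = np.sort(np.asarray(pvalues))
    n = p.size
    i = np.arange(1, n + 1)
    hc = np.sqrt(n) * (i / n - p) / np.sqrt(p * (1 - p))
    keep = (i <= alpha0 * n) & (p > 1.0 / n) & (p < 1 - 1e-12)
    return float(hc[keep].max())

rng = np.random.default_rng(3)
null_p = rng.uniform(size=1000)                       # global null
sparse_p = np.concatenate([rng.uniform(size=990),     # rare-weak alternative
                           rng.uniform(0, 1e-3, size=10)])
print(higher_criticism(null_p), higher_criticism(sparse_p))
```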
- Published
- 2016
- Full Text
- View/download PDF
24. A computation saving Jackknife approach to receptor model uncertainty statements for serially correlated data
- Author
-
Clifford H. Spiegelman and Eun Sug Park
- Subjects
Process Chemistry and Technology, Computation, Bilinear model, Confidence interval, Computer Science Applications, Analytical Chemistry, Standard error, Statistics, Econometrics, Receptor model, Jackknife resampling, Spectroscopy, Software, Independence (probability theory), Mathematics - Abstract
The use of receptor modeling is now a widely accepted approach to modeling air pollution data. The resulting estimates of pollution source profiles have error, and the uncertainties are frequently obtained under an assumption of independence. In addition, traditional bootstrap approaches are very computationally intensive. We present an intuitive jackknife alternative that is much less computationally intensive and that, in simulation examples and on actual data, provides wider confidence intervals and larger standard errors for receptor-model profile estimates than does the bootstrap carried out under the assumption of independence.
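A generic delete-a-block jackknife standard error is sketched below for a serially correlated series; it conveys why a blocked jackknife widens uncertainty statements relative to an independence assumption, but it is not the article's receptor-model-specific jackknife.

```python
import numpy as np

def block_jackknife_se(data, stat, n_blocks=10):
    """Delete-a-block jackknife standard error for serially correlated data.
    Illustrative only; the article develops a jackknife tailored to
    receptor-model profile estimates."""
    data = np.asarray(data)
    blocks = np.array_split(np.arange(len(data)), n_blocks)
    full = stat(data)
    leave_out = np.array([stat(np.delete(data, b)) for b in blocks])
    g = len(blocks)
    pseudo = g * full - (g - 1) * leave_out
    return float(np.sqrt(np.var(pseudo, ddof=1) / g))

# AR(1)-like series: the mean's uncertainty is understated if serial
# correlation is ignored
rng = np.random.default_rng(4)
x = np.zeros(2000)
for t in range(1, x.size):
    x[t] = 0.7 * x[t - 1] + rng.normal()
print("naive SE:          ", x.std(ddof=1) / np.sqrt(x.size))
print("block jackknife SE:", block_jackknife_se(x, np.mean))
```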
- Published
- 2007
- Full Text
- View/download PDF
25. Data Integrity and the Scientific Method: the Case of Bullet Lead Data as Forensic Evidence
- Author
-
Clifford H. Spiegelman and Karen Kafadar
- Subjects
Forensic science, Forensic statistics, Lead, Data integrity, General Medicine
Published in CHANCE, Vol. 19 (Forensics: Special Section on Forensic Statistics), pp. 17-25, 2006.
- Published
- 2006
- Full Text
- View/download PDF
26. Some aspects of multivariate calibration with incomplete designs
- Author
-
Frits H. Ruymgaart, Clifford H. Spiegelman, Sang Joon Lee, and Joseph M. Conny
- Subjects
Multivariate statistics, Calibration (statistics), Process Chemistry and Technology, Univariate model, Inverse, Multivariate calibration, Computer Science Applications, Analytical Chemistry, Statistics, Econometrics, Spectroscopy, Software, Mathematics
There has been some debate about whether inverse or classical calibration methods are superior when there are multivariate predictors and some of them are missing. In this paper, we compare these two methods in the case where the design is not completely known. We develop some general results in the multivariate case and carry out extensive simulations in a univariate model with partly known regressors and several error distributions. These simulations reveal that the methods perform differently depending on the specifics of the model. Neither method, however, turns out to be consistently superior to the other.
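A minimal univariate simulation of the classical-versus-inverse comparison is sketched below; the model, sample sizes, and error level are arbitrary choices made here, and the sketch omits the paper's multivariate, incomplete-design setting.

```python
import numpy as np

rng = np.random.default_rng(5)

def simulate(n=30, n_new=5000, beta0=1.0, beta1=2.0, sigma=0.3):
    """Compare classical and inverse univariate calibration estimators on
    simulated data (illustrative sketch only)."""
    x = np.linspace(0, 1, n)                       # known standards
    y = beta0 + beta1 * x + rng.normal(0, sigma, n)
    b1, b0 = np.polyfit(x, y, 1)                   # classical: regress y on x
    c1, c0 = np.polyfit(y, x, 1)                   # inverse: regress x on y
    x_new = rng.uniform(0, 1, n_new)
    y_new = beta0 + beta1 * x_new + rng.normal(0, sigma, n_new)
    classical = (y_new - b0) / b1                  # invert the fitted line
    inverse = c0 + c1 * y_new                      # predict x directly
    return (np.mean((classical - x_new) ** 2), np.mean((inverse - x_new) ** 2))

print("MSE (classical, inverse):", simulate())
```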
- Published
- 2005
- Full Text
- View/download PDF
27. Factors Affecting Binder Properties between Production and Construction
- Author
-
Charles J. Glover, Amy Epps Martin, Eun Sug Park, Edith Arámbula Mercado, and Clifford H. Spiegelman
- Subjects
Engineering, Building and Construction, Test method, Asphalt pavement, Mechanics of Materials, Asphalt, Storage tank, Dynamic shear rheometer, Forensic engineering, Production (economics), General Materials Science, Performance grade, Process engineering, Quality assurance, Civil and Structural Engineering
The majority of U.S. departments of transportation (DOTs) maintain quality assurance (QA) programs that require asphalt binder testing to verify grade compliance according to Superpave performance grade (PG) specifications. In Texas the binder is tested immediately after binder production, although the goal of QA is to ensure that the binder specified is used during construction. Between production and construction, the material is stored and transferred multiple times before it reaches the construction site or hot mix asphalt (HMA) plant. During this journey, binder properties may change due to many factors, and these changes may have a negative impact on performance. To evaluate which factors are detrimental, a laboratory testing program was conducted to simulate the effects of storage time, storage temperature, contamination, and modification on the dynamic shear rheometer (DSR) performance parameter G*/sin δ after rolling thin film oven (RTFO) aging. A statistical analysis was performed, and the results...
- Published
- 2005
- Full Text
- View/download PDF
28. Correspondence estimation of the source profiles in receptor modeling
- Author
-
Byron J. Gajewski and Clifford H. Spiegelman
- Subjects
Statistics and Probability, Factor analysis, Consistency (statistics), Ecological Modeling, Statistics, Estimator, Contrast (statistics), Asymptotic theory (statistics), Algorithm, Least squares, Metrology, Mathematics
This article considers the estimation of source profiles from pollution data collected at one receptor site. At this receptor site, varying meteorological conditions can cause errors that are possibly a mixture of distributions. A standard estimator utilizes a least squares approach because of its optimal properties under normally distributed errors and consistency under many other distributions. In contrast, we study the behavior of least squares relative to our new approach, which is better suited for dealing with errors having a mixture of distributions. The estimator loses efficiency under normal errors, but in turn gains efficiency in the presence of a mixture of distributions. The new alternative has a tuning constant that determines the level of efficiency, which we show using asymptotic theory for large samples and simulation for small samples. An example from Houston, TX, U.S.A. is considered.
- Published
- 2004
- Full Text
- View/download PDF
29. Locating nearby sources of air pollution by nonparametric regression of atmospheric concentrations on wind direction
- Author
-
Yu-Shuo Chang, Clifford H. Spiegelman, and Ronald C. Henry
- Subjects
Atmospheric Science, Meteorology, Air pollution, Triangulation, Regression analysis, Wind direction, Confidence interval, Nonparametric regression, Gaussian function, Environmental science, Air quality index, General Environmental Science
The relationship of the concentration of air pollutants to wind direction has been determined by nonparametric regression using a Gaussian kernel. The results are smooth curves with error bars that allow for the accurate determination of the wind direction where the concentration peaks, and thus, the location of nearby sources. Equations for this method and associated confidence intervals are given. A nonsubjective method is given to estimate the only adjustable parameter. A test of the method was carried out using cyclohexane data from 1997 at two sites near a heavy industrial region in Houston, Texas, USA. According to published emissions inventories, 70% of the cyclohexane emissions are from one source. Nonparametric regression correctly identified the direction of this source from each site. The location of the source determined by triangulation of these directions was
- Published
- 2002
- Full Text
- View/download PDF
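Relating to entry 29 above, the sketch below is a minimal Nadaraya-Watson regression of concentration on wind direction with a Gaussian kernel applied to the wrapped angle difference; the bandwidth, synthetic source direction, and data are assumptions made here, and the sketch omits the paper's confidence intervals and data-driven bandwidth choice.

```python
import numpy as np

def kernel_regression_on_wind(theta_obs, conc, theta_grid, bandwidth=10.0):
    """Gaussian-kernel (Nadaraya-Watson) regression of concentration on wind
    direction in degrees, using the wrapped circular angle difference."""
    theta_obs = np.asarray(theta_obs, dtype=float)
    conc = np.asarray(conc, dtype=float)
    est = np.empty(len(theta_grid), dtype=float)
    for i, t in enumerate(theta_grid):
        d = (theta_obs - t + 180.0) % 360.0 - 180.0   # wrapped difference
        w = np.exp(-0.5 * (d / bandwidth) ** 2)
        est[i] = np.sum(w * conc) / np.sum(w)
    return est

rng = np.random.default_rng(6)
wind = rng.uniform(0, 360, 2000)
# synthetic source peaking near 70 degrees plus background and noise
conc = (5 + 20 * np.exp(-0.5 * (((wind - 70 + 180) % 360 - 180) / 15) ** 2)
        + rng.normal(0, 2, wind.size))
grid = np.arange(0, 360, 2.0)
smooth = kernel_regression_on_wind(wind, conc, grid)
print("estimated peak direction:", grid[np.argmax(smooth)])
```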
30. Bilinear estimation of pollution source profiles and amounts by using multivariate receptor models
- Author
-
Eun Sug Park, Clifford H. Spiegelman, and Ronald C. Henry
- Subjects
Statistics and Probability, Pollution, Multivariate statistics, Mathematical optimization, Ecological Modeling, Bilinear model, Estimator, Standard error, Non-linear least squares, Econometrics, Identifiability
Multivariate receptor models aim to identify the pollution sources based on multivariate air pollution data. This article is concerned with estimation of the source profiles (pollution recipes) and their contributions (amounts of pollution). The estimation procedures are based on constrained nonlinear least squares methods with the constraints given by nonnegativity and identifiability conditions of the model parameters. We investigate several identifiability conditions that are appropriate in the context of receptor models, and also present new sets of identifiability conditions, which are often reasonable in practice when the other traditional identifiability conditions fail. The resulting estimators are consistent under appropriate identifiability conditions, and standard errors for the estimators are also provided. Simulation and application to real air pollution data illustrate the results.
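To make the bilinear structure concrete, the sketch below fits X ≈ G F with nonnegative G (contributions) and F (profiles) by alternating non-negative least squares on synthetic data; it is a generic illustration under assumptions made here, without the identifiability constraints and standard errors developed in the article.

```python
import numpy as np
from scipy.optimize import nnls

def alternating_nnls(X, k, n_iter=50, seed=0):
    """Fit X ~ G F with G, F >= 0 by alternating non-negative least squares.
    Generic bilinear sketch; not the article's constrained estimators."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    F = rng.random((k, p))
    G = np.zeros((n, k))
    for _ in range(n_iter):
        for i in range(n):                     # source contributions, row by row
            G[i], _ = nnls(F.T, X[i])
        for j in range(p):                     # source profiles, column by column
            F[:, j], _ = nnls(G, X[:, j])
    return G, F

rng = np.random.default_rng(7)
true_F = rng.random((2, 8))
true_G = rng.gamma(2.0, 1.0, (300, 2))
X = true_G @ true_F + 0.01 * rng.random((300, 8))
G, F = alternating_nnls(X, k=2)
print(F.round(2))
```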
- Published
- 2002
- Full Text
- View/download PDF
31. Chemometrics
- Author
-
Philip K. Hopke, Clifford H. Spiegelman, and Kwang-Su Park
- Published
- 2014
- Full Text
- View/download PDF
32. Assessment of source-specific health effects associated with an unknown number of major sources of multiple air pollutants: a unified Bayesian approach
- Author
-
Man Suk Oh, Philip K. Hopke, Eun Sug Park, Clifford H. Spiegelman, Elaine Symanski, and Daikwon Han
- Subjects
Statistics and Probability, Pollution, Estimation, Multivariate statistics, Air Pollutants, Models, Statistical, Fine particulate, Bayesian probability, Uncertainty, Bayes Theorem, General Medicine, Apportionment, Cardiovascular Diseases, Econometrics, Environmental science, Identifiability, Humans, Statistics, Probability and Uncertainty
There has been increasing interest in assessing health effects associated with multiple air pollutants emitted by specific sources. A major difficulty with achieving this goal is that the pollution source profiles are unknown and source-specific exposures cannot be measured directly; rather, they need to be estimated by decomposing ambient measurements of multiple air pollutants. This estimation process, called multivariate receptor modeling, is challenging because of the unknown number of sources and unknown identifiability conditions (model uncertainty). The uncertainty in source-specific exposures (source contributions), as well as uncertainty in the number of major pollution sources and identifiability conditions, has been largely ignored in previous studies. A multipollutant approach that can deal with model uncertainty in multivariate receptor models while simultaneously accounting for parameter uncertainty in estimated source-specific exposures in the assessment of source-specific health effects is presented in this paper. The methods are applied to daily ambient air measurements of the chemical composition of fine particulate matter (PM2.5), weather data, and counts of cardiovascular deaths from 1995 to 1997 for Phoenix, AZ, USA. Our approach for evaluating source-specific health effects yields not only estimates of source contributions along with their uncertainties and associated health-effects estimates but also estimates of model uncertainty (posterior model probabilities) that have been ignored in previous studies. The results from our methods agreed in general with those from the previously conducted workshop/studies on the source apportionment of PM health effects in terms of the number of major contributing sources, estimated source profiles, and contributions. However, some of the adverse source-specific health effects identified in the previous studies were not statistically significant in our analysis, probably because we incorporated into the estimation of the health-effects parameters the uncertainty in estimated source contributions that had been ignored in the previous studies.
- Published
- 2014
33. Calibration Transfer between PDA-Based NIR Spectrometers in the NIR Assessment of Melon Soluble Solids Content
- Author
-
Kerry B. Walsh, Clifford H. Spiegelman, Peter Wolfs, and Colin Victor Greensill
- Subjects
Materials science, Spectrometer, Mean squared error, Near-infrared spectroscopy, Analytical chemistry, Wavelet transform, Photodiode, Chemometrics, Fourier transform, Optics, Instrumentation, Diffraction grating, Spectroscopy
In near-infrared (NIR) spectroscopy, the transfer of predictive models between Fourier transform near-infrared (FT-NIR) and scanning grating-based instruments has been accomplished on relatively dry samples (< 10% water) using various chemometric techniques, for example, slope and bias correction (SBC), direct standardization (DS), piecewise direct standardization (PDS), orthogonal signal correction (OSC), finite impulse response (FIR) filtering and wavelet transform (WT), and application of neural networks. In this study, seven well-known techniques [SBC, DS, PDS, double-window PDS (DWPDS), OSC, FIR, and WT], a photometric response correction and wavelength interpolation method, and a model updating method were assessed in terms of root mean square error of prediction (RMSEP), using Fearn's significance testing, for calibration transfer (standardization) between pairs of spectrometers from a group of four spectrometers for noninvasive prediction of the soluble solids content (SSC) of melon fruit. The spectrometers were diffraction grating-based instruments incorporating photodiode array photodetectors (MMS1, Carl Zeiss, Jena, Germany), used with a standard optical geometry of sample, light source, and spectrometer. A modified WT method performed significantly better than all other standardization methods and on a par with model updating.
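Of the transfer methods listed above, direct standardization (DS) is the easiest to sketch: estimate a linear map (plus an intercept column for a baseline offset) from transfer samples measured on both instruments. The synthetic spectra and the intercept augmentation are assumptions made here, not the study's melon data or its exact DS implementation.

```python
import numpy as np

def direct_standardization(master, slave):
    """Direct standardization: estimate F (with an intercept row for a
    constant baseline offset) so that augmented slave spectra map onto
    master spectra. Generic sketch of one method from the comparison above."""
    A = np.hstack([slave, np.ones((slave.shape[0], 1))])
    F, *_ = np.linalg.lstsq(A, master, rcond=None)
    return F

def apply_transfer(slave, F):
    A = np.hstack([slave, np.ones((slave.shape[0], 1))])
    return A @ F

rng = np.random.default_rng(8)
channels, n_transfer = 50, 200
master = rng.random((n_transfer, channels))
slave = 0.9 * master + 0.05 + 0.005 * rng.normal(size=master.shape)   # gain/offset drift

F = direct_standardization(master, slave)
new_master = rng.random((10, channels))
new_slave = 0.9 * new_master + 0.05 + 0.005 * rng.normal(size=new_master.shape)
print("error before:", np.abs(new_slave - new_master).mean())
print("error after :", np.abs(apply_transfer(new_slave, F) - new_master).mean())
```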
- Published
- 2001
- Full Text
- View/download PDF
34. Intelligent Transportation System Data Archiving: Statistical Techniques for Determining Optimal Aggregation Widths for Inductive Loop Detector Speed Data
- Author
-
Shawn Turner, Clifford H. Spiegelman, Byron J. Gajewski, and William L Eisele
- Subjects
Induction loop, Operations research, Mean squared error, Computer science, Mechanical Engineering, Detector, Real-time computing, Intelligent transportation system, Road traffic, Sufficient statistic, Civil and Structural Engineering
Although most traffic management centers collect intelligent transportation system (ITS) traffic monitoring data from local controllers in 20-s to 30-s intervals, the time intervals for archiving data vary considerably from 1 to 5, 15, or even 60 min. Presented are two statistical techniques that can be used to determine optimal aggregation levels for archiving ITS traffic monitoring data: the cross-validated mean square error and the F-statistic algorithm. Both techniques seek to determine the minimal sufficient statistics necessary to capture the full information contained within a traffic parameter distribution. The statistical techniques were applied to 20-s speed data archived by the TransGuide center in San Antonio, Texas. The optimal aggregation levels obtained by using the two algorithms produced reasonable and intuitive results—both techniques calculated optimal aggregation levels of 60 min or more during periods of low traffic variability. Similarly, both techniques calculated optimal aggregation levels of 1 min or less during periods of high traffic variability (e.g., congestion). A distinction is made between conclusions about the statistical techniques and how the techniques can or should be applied to ITS data archiving. Although the statistical techniques described may not be disputed, there is a wide range of possible aggregation solutions based on these statistical techniques. Ultimately, the aggregation solutions may be driven by nonstatistical parameters such as cost (e.g., “How much do we/the market value the data?”), ease of implementation, system requirements, and other constraints.
- Published
- 2000
- Full Text
- View/download PDF
35. Improving Complex Near-IR Calibrations Using a New Wavelength Selection Algorithm
- Author
-
Michael J. McShane, Brent D. Cameron, Clifford H. Spiegelman, and Gerard L. Coté
- Subjects
Chemistry, Chemometrics, Optics, Sampling (signal processing), Interfacing, Partial least squares regression, Genetic algorithm, Calibration, Instrumentation, Algorithm, Selection algorithm, Spectroscopy
Near-infrared spectroscopy is being considered as a tool for the noninvasive determination of important cell culture media constituents, which would allow frequent, harmless sampling and computer interfacing for closed-loop control. Partial least-squares calibration models for glucose and lactate are constructed for cell culture media and aqueous media comprised of several absorbing species. Wavelength selection, having failed in previous attempts with these data, is shown to reduce the prediction error and the number of required wavelengths when performed with a newly developed "peak-hopping" algorithm. The selection method reduces prediction errors in every case considered here and is extendable to combined calibration models that are built for use with a particular type of sample with the aid of high-quality spectra from simpler mixtures. The new selection algorithm leads to calibrations producing accurate predictions with fewer wavelengths, in support of previous results obtained when it was applied to single-component Raman spectroscopy data. The findings continue to suggest that the algorithm can be used as a simple alternative to the difficult-to-configure genetic algorithm.
- Published
- 1999
- Full Text
- View/download PDF
36. Comparing a new algorithm with the classic methods for estimating the number of factors
- Author
-
Eun Sug Park, Ronald C. Henry, and Clifford H. Spiegelman
- Subjects
Trace (linear algebra), Covariance matrix, Process Chemistry and Technology, Analytic model, Bartlett's test, Cross-validation, Computer Science Applications, Analytical Chemistry, Indicator function, Algorithm, Spectroscopy, Software, Eigenvalues and eigenvectors, Mathematics
This paper presents and compares a new algorithm for finding the number of factors in a data analytic model. After we describe the new method, called NUMFACT, we compare it with standard methods for finding the number of factors to use in a model. The standard methods that we compare NUMFACT with are Malinowski's indicator function, Wold's cross-validation approach, Bartlett's test, scree plots, the rule-of-one, and using the number of factors (eigenvectors) needed to explain 90% of the trace of a correlation matrix. Using a diverse set of real applications, NUMFACT is shown to be the clear method of choice.
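Two of the classical rules that NUMFACT is compared with above are easy to compute from the correlation matrix; the sketch below applies the rule-of-one and the 90%-of-trace rule to synthetic three-factor data. The data and dimensions are invented here, and NUMFACT itself is not implemented.

```python
import numpy as np

def n_factors_simple_rules(X):
    """Two classical rules from the comparison above: count eigenvalues of
    the correlation matrix greater than one ('rule-of-one'), and count the
    eigenvalues needed to explain 90% of the trace."""
    R = np.corrcoef(X, rowvar=False)
    eig = np.sort(np.linalg.eigvalsh(R))[::-1]
    rule_of_one = int(np.sum(eig > 1.0))
    ninety_pct = int(np.searchsorted(np.cumsum(eig) / eig.sum(), 0.90) + 1)
    return rule_of_one, ninety_pct

rng = np.random.default_rng(9)
scores = rng.normal(size=(400, 3))             # three underlying factors
loadings = rng.normal(size=(3, 10))
X = scores @ loadings + 0.5 * rng.normal(size=(400, 10))
print(n_factors_simple_rules(X))
```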
- Published
- 1999
- Full Text
- View/download PDF
37. A novel peak-hopping stepwise feature selection method with application to Raman spectroscopy
- Author
-
Michael J. McShane, Clifford H. Spiegelman, Brent D. Cameron, Massoud Motamedi, and Gerard L. Coté
- Subjects
Chemistry, Sample (statistics), Feature selection, Biochemistry, Analytical Chemistry, Ranking (information retrieval), Chemometrics, Partial least squares regression, Statistics, Genetic algorithm, Environmental Chemistry, Algorithm, Spectroscopy
A new stepwise approach to variable selection for spectroscopy that includes chemical information and attempts to test several spectral regions producing high ranking coefficients has been developed to improve on currently available methods. Existing selection techniques can, in general, be placed into two groups: the first, time-consuming optimization approaches that ignore available information about sample chemistry and require considerable expertise to arrive at appropriate solutions (e.g. genetic algorithms), and the second, stepwise procedures that tend to select many variables in the same area containing redundant information. The algorithm described here is a fast stepwise procedure that uses multiple ranking chains to identify several spectral regions correlated with known sample properties. The multiple-chain approach allows the generation of a final ranking vector that moves quickly away from the initial selection point, testing several areas exhibiting correlation between spectra and composition early in the stepping procedure. Quantitative evidence of the success of this approach as applied to Raman spectroscopy is given in terms of processing speed, number of selected variables, and prediction error in comparison with other selection methods. In this respect, the procedure described here may be considered as a significant evolutionary step in variable selection algorithms.
- Published
- 1999
- Full Text
- View/download PDF
38. Assessment of Partial Least-Squares Calibration and Wavelength Selection for Complex Near-Infrared Spectra
- Author
-
Clifford H. Spiegelman, Gerard L. Coté, and Michael J. McShane
- Subjects
Analyte, Chemistry, Near-infrared spectroscopy, Analytical chemistry, Spectral line, Wavelength, Partial least squares regression, Calibration, Fourier transform infrared spectroscopy, Absorption (electromagnetic radiation), Instrumentation, Spectroscopy
Complex near-infrared (near-IR) spectra of aqueous solutions containing five independently varying absorbing species were collected to assess the ability of partial least-squares (PLS) regression and wavelength selection for calibration and prediction of these species in the presence of each other. It was confirmed that PLS calibration models can successfully predict chemical concentrations of all five chemicals from a single spectrum. It was observed from the PLS spectral loadings that spectral regions containing absorption bands of a single analyte alone were not sufficient for the model to adequately predict the concentration of the analyte because of the high degree of overlap between glucose, lactate, ammonia, glutamate, and glutamine. Three wavelength selection algorithms were applied to the spectra to identify regions containing necessary information, and in each case it was found that nearly the entire spectral range was needed for each determination. The results suggest that wavelength selection does result in a reduction of data points from the full spectrum, but the decrease seen with these near-infrared spectra was less than typically seen in mid-IR or Raman spectra, where peaks are narrower and well separated. As a result of this need for more wavelengths, the engineering of a dedicated system to measure these analytes in complex media such as blood or tissue culture broths by using this near-infrared region (2.0–2.5 μm) is further complicated.
- Published
- 1998
- Full Text
- View/download PDF
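The entry above turns on fitting a single PLS model that predicts all five analyte concentrations from one spectrum. A minimal sketch of that kind of calibration, using scikit-learn's PLSRegression on simulated overlapping absorption bands (the band positions, widths, and sample counts are assumptions for illustration, not the paper's data), follows.

```python
import numpy as np
from sklearn.cross_decomposition import PLSRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(1)
n_samples, n_wavelengths, n_analytes = 80, 300, 5

# simulate overlapping Gaussian absorption bands for five analytes
centers = np.linspace(60, 240, n_analytes)
grid = np.arange(n_wavelengths)
pure = np.exp(-0.5 * ((grid[None, :] - centers[:, None]) / 25.0) ** 2)

C = rng.uniform(0.0, 1.0, size=(n_samples, n_analytes))                  # concentrations
X = C @ pure + rng.normal(scale=0.01, size=(n_samples, n_wavelengths))   # spectra

X_tr, X_te, C_tr, C_te = train_test_split(X, C, random_state=0)
pls = PLSRegression(n_components=8).fit(X_tr, C_tr)
rmse = np.sqrt(np.mean((pls.predict(X_te) - C_te) ** 2, axis=0))
print("RMSE per analyte:", np.round(rmse, 3))
```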
39. Evaluating black boxes: an ad-hoc method for assessing nonparametric and nonlinear curve-fitting estimators
- Author
-
F. Michael Speed and Clifford H. Spiegelman
- Subjects
Statistics and Probability ,Mean squared error ,Modeling and Simulation ,Outlier ,Statistics ,Nonparametric statistics ,Curve fitting ,Estimator ,Extreme point ,Jackknife resampling ,Smoothing ,Mathematics - Abstract
In many areas of science, novel curve-fitting algorithms are recommended and employed, yet users often have little means of discerning whether the algorithms work as advertised. We propose an ad-hoc method for assessing the behavior of these estimators. By modifying a chemical technique called standard addition, we can assess 1) whether the estimator responds sensibly to changes in the model and residual distribution and 2) whether outliers or extreme points are present.
- Published
- 1998
- Full Text
- View/download PDF
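A rough sketch of the standard-addition-style check described above: spike the responses by known amounts and verify that a black-box smoother's fitted values shift by roughly those amounts. The running-mean smoother and the specific additions are my own stand-ins, not the authors' choices.

```python
import numpy as np

def assess_black_box(fit, x, y, additions=(0.5, 1.0, 2.0)):
    """fit(x, y) -> fitted values at x. Returns the mean shift recovered per addition."""
    base = fit(x, y)
    recovered = []
    for a in additions:
        shifted = fit(x, y + a)          # known "standard addition" to the response
        recovered.append(np.mean(shifted - base))
    return np.array(recovered)           # should be close to `additions`

# toy black box: a running-mean smoother
def running_mean(x, y, k=5):
    order = np.argsort(x)
    ys = y[order]
    sm = np.convolve(ys, np.ones(k) / k, mode="same")
    out = np.empty_like(sm)
    out[order] = sm
    return out

rng = np.random.default_rng(2)
x = rng.uniform(0, 10, 100)
y = np.sin(x) + rng.normal(scale=0.2, size=100)
print(assess_black_box(running_mean, x, y))   # expect roughly [0.5, 1.0, 2.0]
```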
40. Variable Selection in Multivariate Calibration of a Spectroscopic Glucose Sensor
- Author
-
Clifford H. Spiegelman, Gerard L. Coté, and Michael J. McShane
- Subjects
Calibration (statistics) ,Chemistry ,010401 analytical chemistry ,Near-infrared spectroscopy ,Analytical chemistry ,Multivariate calibration ,Feature selection ,Regression analysis ,01 natural sciences ,Spectral line ,0104 chemical sciences ,010309 optics ,Wavelength ,0103 physical sciences ,Absorbance spectra ,Biological system ,Instrumentation ,Spectroscopy - Abstract
A variable selection method that reduces prediction bias in partial least-squares regression models was developed and applied to near-infrared absorbance spectra of glucose in pH buffer and cell culture medium. Comparisons between calibration and prediction capability for full spectra and reduced sets were completed. Variable selection resulted in statistically equivalent errors while reducing the number of wavelengths needed to fit the calibration data and predict concentrations from new spectra. Fewer than 25 wavelengths were selected to produce errors statistically equivalent to those yielded by the full set containing over 500 wavelengths. The algorithm correctly chose the glucose absorption peak areas as the information-carrying spectral regions.
- Published
- 1997
- Full Text
- View/download PDF
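The comparison the entry above reports, full-spectrum PLS versus a small selected wavelength set with statistically equivalent errors, can be mimicked on synthetic data as below. The correlation ranking used to pick 25 channels is a simplification for illustration, not the paper's selection algorithm.

```python
import numpy as np
from sklearn.cross_decomposition import PLSRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(3)
n, p = 60, 500
grid = np.arange(p)
band = np.exp(-0.5 * ((grid - 250) / 15.0) ** 2)          # a single glucose-like band
conc = rng.uniform(0, 10, n)
X = np.outer(conc, band) + rng.normal(scale=0.05, size=(n, p))

# rank wavelengths by squared correlation with concentration and keep the top 25
r2 = np.array([np.corrcoef(X[:, j], conc)[0, 1] ** 2 for j in range(p)])
keep = np.argsort(r2)[::-1][:25]

for name, Xm in [("full spectrum", X), ("25 selected", X[:, keep])]:
    score = cross_val_score(PLSRegression(n_components=3), Xm, conc,
                            scoring="neg_root_mean_squared_error", cv=5)
    print(f"{name}: CV RMSE = {-score.mean():.3f}")
```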
41. Asymptotic minimax calibration estimates
- Author
-
Suojin Wang, Clifford H. Spiegelman, and Michael C. Denham
- Subjects
Mathematical optimization ,Calibration (statistics) ,Process Chemistry and Technology ,Multivariate calibration ,Feature selection ,Minimax ,Computer Science Applications ,Analytical Chemistry ,Nonlinear system ,Simple (abstract algebra) ,Partial least squares regression ,Applied mathematics ,Spectroscopy ,Software ,Mathematics - Abstract
This paper gives methods that use measurements from calibrated instruments in an effective and understandable manner. While some chemometric methods such as partial least squares might be considered, the procedures that we use are more transparent. In this paper two simple methods are proposed that use standard and saddlepoint approximations to combine nonlinear estimates from different regions of the instrument response. The asymptotic accuracy of the approximations is discussed. A worked example is given. A simulation study is also reported that supports our recommendations.
- Published
- 1996
- Full Text
- View/download PDF
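As context for the entry above, the sketch below shows the simplest way to combine estimates from different regions of an instrument response: inverse-variance weighting of independent estimates. This is deliberately not the paper's standard or saddlepoint approximation, only a baseline illustrating the kind of pooling involved; the numerical values are hypothetical.

```python
import numpy as np

def combine_estimates(estimates, std_errors):
    """Pool independent estimates x_i with standard errors s_i using weights 1/s_i^2."""
    w = 1.0 / np.asarray(std_errors) ** 2
    pooled = np.sum(w * np.asarray(estimates)) / np.sum(w)
    pooled_se = np.sqrt(1.0 / np.sum(w))
    return pooled, pooled_se

# e.g. region A gives 5.1 +/- 0.30, region B gives 4.8 +/- 0.20 (hypothetical numbers)
est, se = combine_estimates([5.1, 4.8], [0.30, 0.20])
print(f"pooled estimate {est:.2f}, approx 95% CI +/- {1.96 * se:.2f}")
```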
42. Chemometrics
- Author
-
Philip K. Hopke, Clifford H. Spiegelman, and Kwang-Su Park
- Published
- 2012
- Full Text
- View/download PDF
43. Detecting interactions using low dimensional searches in high dimensional data
- Author
-
Clifford H. Spiegelman and C.Y. Wang
- Subjects
Clustering high-dimensional data ,Chemometrics ,Computer science ,Process Chemistry and Technology ,Monte Carlo method ,Data mining ,computer.software_genre ,computer ,Spectroscopy ,Software ,Computer Science Applications ,Analytical Chemistry - Abstract
One important issue in chemometrics is to detect interactions among several factors. In this paper, we propose methods that detect interactions using low dimensional smoothers. Two methods are investigated and compared with the usual least-squares methods via Monte Carlo simulations. In addition, we show, using real data, how the methods affect our decisions.
- Published
- 1994
- Full Text
- View/download PDF
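One way to picture the low-dimensional-search idea above is to fit one-dimensional smoothers additively and then ask whether the residuals still track a product term. The backfitting loop, the crude bin smoother, and the simulated interaction below are my own construction, not the paper's methods.

```python
import numpy as np

def bin_smooth(x, y, n_bins=10):
    """Crude 1-D smoother: quantile-bin means evaluated back at each x."""
    edges = np.quantile(x, np.linspace(0, 1, n_bins + 1))
    idx = np.clip(np.searchsorted(edges, x, side="right") - 1, 0, n_bins - 1)
    means = np.array([y[idx == b].mean() for b in range(n_bins)])
    return means[idx]

rng = np.random.default_rng(4)
n = 400
x1, x2 = rng.uniform(-1, 1, n), rng.uniform(-1, 1, n)
y = np.sin(2 * x1) + x2 ** 2 + 1.5 * x1 * x2 + rng.normal(scale=0.2, size=n)

# backfit an additive model y ~ f1(x1) + f2(x2) with the crude smoother
f1 = np.zeros(n)
f2 = np.zeros(n)
for _ in range(20):
    f1 = bin_smooth(x1, y - y.mean() - f2)
    f2 = bin_smooth(x2, y - y.mean() - f1)
resid = y - y.mean() - f1 - f2

# a residual correlation with the product term flags a missed interaction
print("corr(resid, x1*x2) =", round(np.corrcoef(resid, x1 * x2)[0, 1], 2))
```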
44. Authors' response
- Author
-
Eun Sug Park, Clifford H. Spiegelman, and Ronald C. Henry
- Subjects
Statistics and Probability ,Ecological Modeling - Published
- 2002
- Full Text
- View/download PDF
45. Theoretical Justification of Wavelength Selection in PLS Calibration: Development of a New Algorithm
- Author
-
M.J. Goetz, Clifford H. Spiegelman, Michael J. McShane, Qin Li Yue, Gerard L. Coté, and Massoud Motamedi
- Subjects
Chemistry ,Gaussian ,Analytical Chemistry ,Data set ,Noise ,symbols.namesake ,Ranking ,Outlier ,Statistics ,Calibration ,symbols ,Selection algorithm ,Algorithm ,Selection (genetic algorithm) - Abstract
The mathematical basis of improved calibration through selection of informative variables for partial least-squares calibration has been identified. A theoretical investigation of calibration slopes indicates that including uninformative wavelengths negatively affects calibrations by producing both large relative bias toward zero and small additive bias away from the origin. These theoretical results hold regardless of the noise distribution in the data. Studies confirming this result compare a previously used selection method with a new method that incorporates estimates of spectral residuals and is therefore designed to perform more appropriately on data containing large outlying points. Three different data sets are tested with varying noise distributions. In the first data set, Gaussian and log-normal noise was added to simulated data which included a single peak. Second, near-infrared spectra of glucose in cell culture media taken with an FT-IR spectrometer were analyzed. Finally, dispersive Raman Stokes spectra of glucose dissolved in water were assessed. In every case considered here, improved prediction is produced through selection, but data with different noise characteristics showed varying degrees of improvement depending on the selection method used. The practical results showed that, indeed, including residuals in the ranking criteria improves selection for data with noise distributions resulting in large outliers. It was concluded that careful design of a selection algorithm should include consideration of spectral noise distributions in the input data to increase the likelihood of successful and appropriate selection.
- Published
- 2011
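A hypothetical sketch of the general idea above, folding an estimate of each channel's residual spread into the ranking so that channels with large outlying residuals are demoted. The per-channel straight-line fit and the signal-to-residual score are my own simplifications, not the algorithm developed in the paper.

```python
import numpy as np

def residual_aware_ranking(X, y):
    """Score each channel by |slope| of a straight-line fit on y, penalized by the
    spread of that fit's residuals, so outlier-prone channels are demoted.
    (Illustrative criterion only, not the paper's selection method.)"""
    yc = y - y.mean()
    scores = np.empty(X.shape[1])
    for j in range(X.shape[1]):
        xj = X[:, j] - X[:, j].mean()
        slope = (xj @ yc) / (yc @ yc)
        resid = xj - slope * yc
        scores[j] = abs(slope) / (resid.std() + 1e-12)
    return scores

rng = np.random.default_rng(5)
X = rng.normal(size=(40, 120))
y = X[:, 30] + rng.normal(scale=0.1, size=40)   # channel 30 carries the signal
X[3, 90] += 25.0                                # gross outlying point at channel 90
scores = residual_aware_ranking(X, y)
print("top channels:", np.argsort(scores)[::-1][:5])
```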
46. Applying and developing receptor models to the 1990 El Paso air data: a look at receptor modeling with uncharacterized sources and graphical diagnostics
- Author
-
Clifford H. Spiegelman and Stuart Dattner
- Subjects
Chemistry ,Completeness (order theory) ,Environmental Chemistry ,Sampling (statistics) ,Data mining ,computer.software_genre ,Biochemistry ,computer ,Spectroscopy ,Analytical Chemistry - Abstract
This paper describes ongoing receptor modeling research on airborne species in El Paso, Texas, the product of a six-month collaboration between the authors. It extends the case study reported by Spiegelman and Dattner in 1992. For completeness, the background material is reviewed.
- Published
- 1993
- Full Text
- View/download PDF
47. Plotting aids for multivariate calibration and chemostatistics
- Author
-
Clifford H. Spiegelman
- Subjects
Chemometrics ,Multivariate analysis ,Calibration (statistics) ,Process Chemistry and Technology ,Statistics ,Multivariate calibration ,Spectroscopy ,Software ,Computer Science Applications ,Analytical Chemistry ,Mathematics - Abstract
Spiegelman, C.H., 1992. Plotting aids for multivariate calibration and chemostatistics. Chemometrics and Intelligent Laboratory Systems, 15: 29–38. There are few published procedures for plotting multivariate calibration data. In this paper I give some new plotting techniques that have been useful in my research and consulting: plots for detecting matrix effects and other violations of Beer's law, plots for checking the selectivity of channels and frequencies, and plots that help diagnose the importance of frequencies within a peak.
- Published
- 1992
- Full Text
- View/download PDF
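In the spirit of the selectivity plots mentioned above, the short script below plots the channel-by-channel correlation of synthetic spectra with each constituent's concentration. It is not one of the paper's specific displays, just an assumed, simple example of the kind of diagnostic plot involved.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(7)
grid = np.arange(200)
# two overlapping synthetic bands, one per constituent
bands = np.exp(-0.5 * ((grid[None, :] - np.array([[60.0], [140.0]])) / 12.0) ** 2)
C = rng.uniform(0, 1, size=(30, 2))                       # two constituents
X = C @ bands + rng.normal(scale=0.02, size=(30, 200))    # synthetic spectra

for k in range(2):
    corr = [np.corrcoef(X[:, j], C[:, k])[0, 1] for j in range(200)]
    plt.plot(grid, corr, label=f"constituent {k + 1}")
plt.xlabel("channel")
plt.ylabel("correlation with concentration")
plt.legend()
plt.show()
```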
48. Chemometrics and spectral frequency selection
- Author
-
Clifford H. Spiegelman, Philip J. Brown, and Michael C. Denham
- Subjects
Chemometrics ,Basis (linear algebra) ,Calibration curve ,General Engineering ,Calibration ,Calculus ,Univariate ,Feature selection ,Measure (mathematics) ,Algorithm ,Data reduction ,Mathematics - Abstract
In many fields of science, the simple straight line has received more attention as a basis for calibration than any other form. This is because measuring devices have been mainly univariate and have had calibration curves that were sufficiently linear. As scientific fields become more computationally intensive, they rely on more computer-driven multivariate measurement devices. The number of responses may be large; for example, modern scanning infrared (IR) spectroscopes measure the absorptions or reflectances at a sequence of around one thousand frequencies. Training data may consist of the order of 10 to 100 carefully designed samples for which the true composition is either known by formulation or accurately determined by wet chemistry. In subsequent use, one wishes to predict the true composition from the spectrum. In this paper we develop a variable selection approach that is both simple in concept and computationally easy to implement. Its motivation is the minimization of the width of a confidence interval. The technique for data reduction is illustrated on a mid-IR spectroscopic analysis of a liquid detergent in which the calibrating data consist of 12 observations of absorptions at 1168 frequency channels (responses) corresponding to five chemical ingredients.
- Published
- 1991
- Full Text
- View/download PDF
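The motivation stated above, choosing frequencies to minimize the width of a confidence interval, can be sketched as a forward selection whose criterion is an approximate prediction-interval half-width for the fitted linear calibration. The criterion and the tiny synthetic example below are assumptions for illustration, not the procedure in the paper.

```python
import numpy as np
from scipy import stats

def interval_width(X, y):
    """Approximate 95% prediction-interval half-width for a linear fit of y on X."""
    n, p = X.shape
    A = np.column_stack([np.ones(n), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    dof = n - p - 1
    if dof <= 0:
        return np.inf
    s2 = resid @ resid / dof
    H = A @ np.linalg.pinv(A.T @ A) @ A.T
    lev = np.mean(np.diag(H))
    return stats.t.ppf(0.975, dof) * np.sqrt(s2 * (1.0 + lev))

def forward_select(X, y, n_keep=3):
    """Greedily add the channel that most shrinks the interval half-width."""
    chosen, remaining = [], list(range(X.shape[1]))
    while len(chosen) < n_keep:
        widths = [(interval_width(X[:, chosen + [j]], y), j) for j in remaining]
        _, best_j = min(widths)
        chosen.append(best_j)
        remaining.remove(best_j)
    return chosen

rng = np.random.default_rng(6)
X = rng.normal(size=(12, 50))             # 12 calibration samples, 50 channels
y = 2.0 * X[:, 10] - X[:, 40] + rng.normal(scale=0.1, size=12)
print("selected channels:", forward_select(X, y, n_keep=2))
```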
49. Bias correcting confidence intervals for a nearly common property
- Author
-
Daren B. H. Cline and Clifford H. Spiegelman
- Subjects
Systematic error ,Process Chemistry and Technology ,Confidence interval ,Robust confidence intervals ,Computer Science Applications ,Analytical Chemistry ,Random error ,Statistics ,Credible interval ,Confidence distribution ,Common property ,Algorithm ,Spectroscopy ,Software ,CDF-based nonparametric confidence interval ,Mathematics - Abstract
Cline, D.B.H. and Spiegelman, C.H., 1991. Bias correcting confidence intervals for a nearly common property. Chemometrics and Intelligent Laboratory Systems, 11: 131–136. Confidence intervals are an important tool. Realistic confidence intervals account for both random errors and systematic errors (bias). We improve the usual method for combining random and systematic errors. The new methods are simple and often result in increased accuracy for confidence interval levels.
- Published
- 1991
- Full Text
- View/download PDF
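For orientation only, the snippet below shows the usual way of folding a systematic-error bound into a confidence interval, adding the bias bound to the random-error margin; it is this kind of interval that the paper improves upon, and the replicate values are hypothetical.

```python
import numpy as np
from scipy import stats

def naive_combined_interval(x, bias_bound, level=0.95):
    """Usual combination: half-width = t * s / sqrt(n) + bias bound B.
    (Shown as context only; not the corrected interval from the paper.)"""
    x = np.asarray(x, dtype=float)
    n = len(x)
    half_random = stats.t.ppf(0.5 + level / 2, n - 1) * x.std(ddof=1) / np.sqrt(n)
    half = half_random + bias_bound          # linear addition of random + systematic
    return x.mean() - half, x.mean() + half

measurements = np.array([10.02, 9.98, 10.05, 10.01, 9.97])   # hypothetical replicates
print(naive_combined_interval(measurements, bias_bound=0.03))
```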
50. A statistical method for calibrating flame emission spectrometry which takes account of errors in the calibration standards
- Author
-
Linda Hungwu, Clifford H. Spiegelman, and Robert L. Watters
- Subjects
Observational error ,Calibration curve ,Process Chemistry and Technology ,Data transformation (statistics) ,Studentized residual ,Computer Science Applications ,Analytical Chemistry ,Linear regression ,Outlier ,Statistics ,Calibration ,Influential observation ,Spectroscopy ,Software ,Mathematics - Abstract
Spiegelman, C.H., Watters, R.L. and Hungwu, L., 1991. A statistical method for calibrating flame emission spectrometry which takes account of errors in the calibration standards. Chemometrics and Intelligent Laboratory Systems, 11: 121–130. The determination of potassium in sample solutions using flame emission spectrometry (FES) requires that the calibration function be estimated. Calibration standard solutions of potassium are made in the laboratory and nebulized into the FES instrument. A total of 240 data points were collected from chemical analyses. However, the data come from only eight different values of the standards and are highly correlated within each standard. In the final analysis, a sample size of eight was chosen to estimate the calibration curve. The main goal of this paper is to estimate the calibration function based on these data and then measure the amount of potassium in samples using this calibration function. The secondary goal is to show some of the important exploratory data analysis that should be done in any calibration. Since both the underlying theory of emission spectrometry and the scatter plot of data points suggest a linear relationship between the emission intensity and potassium concentration, a linear regression model is applied to fit these data, and the residuals are examined against the regression assumptions. Because the residuals fail to meet those assumptions, data transformation is then attempted to stabilize their nonconstant variance. However, because the suggested transformation of taking the logarithm or 1/4 power of both x and y is hard to interpret, and because the log transformation would require the addition of an arbitrary constant to the standard values, we proceeded with the untransformed data. Outlier detection was used to find possible outliers. Ten consecutive observations (obs. 201–210) in the data set are potential outliers because they have absolute studentized residuals larger than 2.7. However, influential observation techniques indicate that their effect on the estimation of the calibration curve is not great. In order to help compensate for the error in the calibration standards, we expand the calibration interval estimates. This compensation is important and helps to avoid the rather ad-hoc deletion of unusually influential data from the analysis. We think that a plausible explanation for the outliers is error in the calibration standards. In recognition of the heterogeneity of variance indicated by our data, we perform a weighted least squares type of confidence interval estimation for our calibration curve. The coefficient and standard error estimates are quite close for all weighted cases; in contrast, the unweighted case yields different values for the standard errors. If heteroscedasticity (nonconstant variance of the observations) is ignored, confidence intervals will be too wide at the low end and too narrow at the high end of the calibration curve. At both ends of the calibration curve, each resulting multiple-use calibration confidence interval is somewhat wider than the corresponding single-use calibration confidence interval. Finally, since the measurement errors in the working standards, which have a known finite bound, have been taken into account, the increase in the confidence intervals relative to presumed exact standards is about 0.1%.
- Published
- 1991
- Full Text
- View/download PDF
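A minimal weighted-least-squares calibration sketch in the spirit of the entry above: fit intensity on concentration with weights inversely proportional to an assumed variance, then invert the line for an unknown. The numbers are synthetic, not the FES potassium data, and the weight model is an assumption for the example.

```python
import numpy as np

conc = np.array([0.0, 0.5, 1.0, 2.0, 4.0, 6.0, 8.0, 10.0])      # standards
intensity = 2.0 + 9.8 * conc + np.array([0.0, 0.1, -0.2, 0.3, -0.5, 0.8, -0.9, 1.2])
var = (0.1 + 0.05 * conc) ** 2            # assumed variance growing with concentration
w = 1.0 / var

# weighted straight-line fit: minimize sum w_i (y_i - a - b x_i)^2
W = np.sum(w)
xbar, ybar = np.sum(w * conc) / W, np.sum(w * intensity) / W
b = np.sum(w * (conc - xbar) * (intensity - ybar)) / np.sum(w * (conc - xbar) ** 2)
a = ybar - b * xbar
print(f"fit: intensity = {a:.3f} + {b:.3f} * conc")

# inverse prediction (single-use): estimate the concentration of an unknown sample
y_unknown = 51.0
print("estimated concentration:", round((y_unknown - a) / b, 3))
```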