Commentary: The reliability of telomere length measurements
- Author
Jeremy D. Kark, Simon Verhulst, Athanase Benetos, Mirre J. P. Simons, Troels Steenstrup, Ezra Susser, Pam Factor-Litvak, Abraham Aviv, and Verhulst lab
- Subjects
Reproducibility, education.field_of_study, Correlation coefficient, Epidemiology, Coefficient of variation, Population, Methodology, General Medicine, Statistical power, QPCR, Sample size determination, Statistics, ComputingMethodologies_DOCUMENTANDTEXTPROCESSING, education, GeneralLiterature_REFERENCE(e.g.,dictionaries,encyclopedias,glossaries), Reliability (statistics), Mathematics, Rank correlation
- Abstract
The importance of telomere biology in human disease is increasingly recognized and, in parallel, the use of telomere length (TL) measures is proliferating in epidemiological and clinical studies. Such studies measure leukocyte TL (LTL) using several methodological approaches. Shorter LTL is associated with atherosclerosis [1] and all-cause mortality [2]. Given the increasingly recognized role of TL in human ageing and its related diseases, it is essential to know more about the reliability and validity of TL measurement methods, their comparability and which method is optimal for a specific epidemiological/clinical setting. In an effort to address this knowledge gap, Martin-Ruiz et al. (MR) [3] studied the reliability of TL measurement techniques. They compared the popular qPCR method with the labour-intensive Southern blots (SBs) and single telomere length analysis (STELA). MR concluded that ‘neither technique nor laboratory had strong influence on result variation’, and that ‘Southern blotting and qPCR are similar in their reproducibility’. Unfortunately, for the following reasons, we believe that for epidemiological studies neither conclusion is justified by the data.

Reliability of LTL

Most DNA samples (10/12) used by MR were obtained from human placenta, cell cultures and cancer cells. However, the inter-assay reliability of LTL is the pertinent parameter for epidemiological studies. MR included only two DNA samples from leukocytes and, because these were added in the second round of the study, they could not be used to measure inter-assay reliability of LTL. TL results for human placenta, cultured cells and cancer cells cannot be automatically generalized to LTL reliability, which is the primary concern of epidemiologists. Note also that MR used pooled leukocyte samples from multiple donors, so effects of pooling on assay reliability cannot be excluded. A previous comparison of LTL reliability between the SB and qPCR methods was reported in a study [4] cited by MR.
That study reported a clear difference in inter-assay coefficient of variation (CV) between SB (1.74%) and qPCR (6.54%), using 50 leukocyte DNA samples from individual donors. Moreover, Steenstrup et al. [5] investigated whether apparent LTL elongation in longitudinal studies reflects measurement error or a real biological phenomenon. They found little evidence for LTL elongation over and above the effects expected from measurement error. At the same time, the available data indicated a substantially larger proportion of individuals with apparent LTL elongation in qPCR-based studies than in SB-based studies. In our view, the most parsimonious explanation for this finding is the higher measurement error of the qPCR method.

MR observed that rank correlations between measurements obtained in different laboratories and with different methods were high, reflecting similar rank orders of the observations. However, because different cell types were included, the range of TLs in their study (4.7-9.2 kb) is much wider than the age group-specific range (about 3 kb by direct SBs within age groups) encountered in most epidemiological studies of LTL. This will have considerably inflated the rank correlation beyond what is relevant for LTL in epidemiological studies, contributing to the erroneous conclusion that the SB and qPCR methods yielded similar results.

Sample size and composition

MR used 12 samples. These were measured by two laboratories using SBs, one laboratory using STELA and seven laboratories using qPCR. As both the number of samples and the number of laboratories using techniques other than qPCR were low, the statistical tests used by MR to infer no difference in reliability between methods are underpowered and consequently of limited value.
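The earlier point that a wide TL range inflates rank correlations can be illustrated with a small simulation. The sketch below is not MR's analysis: it assumes uniformly distributed true TLs, two hypothetical laboratories with identical Gaussian measurement error, and an arbitrary error SD of 0.5 kb. Only the two ranges (4.7-9.2 kb, and a 3 kb window standing in for an age group-specific range) and the sample count of 12 are taken from the text; the position of the 3 kb window (5.5-8.5 kb) is an assumption.

```python
import random


def ranks(values):
    """Return 1-based ranks; ties are not handled (continuous data assumed)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = float(rank)
    return result


def pearson(x, y):
    """Plain Pearson correlation between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def mean_rank_correlation(low_kb, high_kb, error_sd_kb,
                          n_samples=12, n_sims=2000, seed=1):
    """Average Spearman correlation between two simulated laboratories."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_sims):
        # True TLs are drawn uniformly from the given range (an assumption).
        true_tl = [rng.uniform(low_kb, high_kb) for _ in range(n_samples)]
        # Each laboratory adds independent error with the same SD.
        lab_a = [t + rng.gauss(0.0, error_sd_kb) for t in true_tl]
        lab_b = [t + rng.gauss(0.0, error_sd_kb) for t in true_tl]
        total += spearman(lab_a, lab_b)
    return total / n_sims


ERROR_SD_KB = 0.5  # hypothetical measurement error, not an estimate from MR

wide = mean_rank_correlation(4.7, 9.2, ERROR_SD_KB)    # range reported for MR
narrow = mean_rank_correlation(5.5, 8.5, ERROR_SD_KB)  # assumed 3 kb window

print(f"mean rank correlation, 4.7-9.2 kb: {wide:.2f}")
print(f"mean rank correlation, 3 kb window: {narrow:.2f}")
```

With the measurement error held fixed, the average rank correlation is lower for the narrower range, so a high correlation across mixed cell types says little about agreement within an age group.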
We are thus left puzzled by the authors’ claim of > 95% power to detect the difference previously reported between inter-assay CVs for LTL using SBs and qPCR in 50 leukocyte DNA samples [4]. MR provide no details of the calculation supporting this statement, nor of the exact difference between inter-assay CVs for which they calculated their statistical power.

Furthermore, the authors combined the two SB laboratories and the one STELA laboratory for comparisons of inter-laboratory CV across methods. We see little scientific justification for this choice, which in effect leaves one with no information specific to either the SB or the STELA technique. For the two leukocyte samples, the inter-laboratory CVs were 6.2% and 6.5% for the SB/STELA laboratories vs 22.2% and 22.2% for the qPCR laboratories (samples K and L, Table 2 in the erratum to MR [6]). These results, albeit from a tiny sample, are consistent with higher measurement error for qPCR than for SB/STELA-based methods. This is not specific to the leukocyte samples; overall, the inter-laboratory CVs were substantially higher when using qPCR (P = 0.001 according to MR). Finally, for the crucial analyses of the inter-assay and intra-assay CVs, the total number of DNA samples was restricted to 5 and 3, respectively, and none of these were from leukocytes.

CV as a measure of reliability

A characteristic of the CV is its dependence on the mean, and hence the implicit assumption when using the CV is heteroscedasticity, i.e. that the measurement error (standard deviation) is proportional to the mean. We examined whether this assumption holds in the results presented by MR. Figure 1 suggests that it holds for SB. There is a negligible correlation between mean and CV, which is not surprising given the logarithmic nature of molecular size ladders on gels [7]. By contrast, Figure 1 suggests that it does not hold for qPCR.
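The assumption behind the CV can be made concrete with a short simulation. The sketch below contrasts two hypothetical error models over the same set of true means: error proportional to the mean (10% of the mean) and constant absolute error (0.10 units). All values, the relative TL units and the number of replicates are illustrative assumptions, not data from MR.

```python
import random
import statistics


def pearson(x, y):
    """Plain Pearson correlation between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def cv_per_sample(true_means, error_sd_for, n_replicates=200, seed=7):
    """Simulate replicate measurements per sample and return each sample's CV."""
    rng = random.Random(seed)
    cvs = []
    for mu in true_means:
        sd = error_sd_for(mu)
        values = [rng.gauss(mu, sd) for _ in range(n_replicates)]
        cvs.append(statistics.stdev(values) / statistics.mean(values))
    return cvs


# Hypothetical relative TL values from 0.5 to 1.5 (arbitrary units).
true_means = [0.5 + 0.1 * i for i in range(11)]

# Model 1: error SD proportional to the mean, the case in which a CV is apt.
proportional_cvs = cv_per_sample(true_means, lambda mu: 0.10 * mu)

# Model 2: constant absolute error SD regardless of the mean.
constant_cvs = cv_per_sample(true_means, lambda mu: 0.10)

corr_constant = pearson(true_means, constant_cvs)

print("proportional error, CV range:",
      f"{min(proportional_cvs):.3f} to {max(proportional_cvs):.3f}")
print("constant error, CV range:",
      f"{min(constant_cvs):.3f} to {max(constant_cvs):.3f}")
print(f"constant error, correlation of mean with CV: {corr_constant:.2f}")
```

Under proportional error the CV stays near 10% at every mean, whereas under constant error the CV falls as the mean rises, giving a strongly negative correlation between mean and CV.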
There is a strong negative correlation between average TL and CV, which implies that the error in qPCR-based TL measurements is not proportional to the mean but is instead closer to a constant (assay-specific) value. Such a finding undermines the CV as a reliability measure for qPCR-based TL studies. Instead, we recommend using the intra-class correlation coefficient, which yields an informative estimate provided that the ‘test’ samples are distributed similarly to the samples in the investigated population.

Figure 1. Coefficient of variation (CV%) between laboratories for SB/STELA vs qPCR plotted against telomere length. Telomere length was standardized per laboratory, dividing the results for all samples by the value obtained for sample G. The X-axis displays the ...

Figure 1 also illustrates the larger range of values obtained with qPCR when compared with SB. MR suggest that the larger ‘dynamic range’ obtained with qPCR compensates for the lower precision of the method. However, when CV values are calculated for SB laboratories alone (i.e. ignoring the STELA results), the inter-laboratory CV is in fact over 40% higher for the qPCR laboratories (paired t-test, t = 2.39, df = 18, P
- Published
- 2015