Inference in science depends on having the right tools to measure, with sufficient accuracy and precision, the phenomena we are attempting to understand. In this issue of Molecular Metabolism, Burnett and Grobe call into question the accuracy of respirometry, the standard method for measuring energy expenditure in animals and humans [1]. This is an important contribution because, in the field of obesity, scientists agree almost without exception that the problem is due to energy imbalance: energy intake is too high, expenditure is too low, or both. This energy balance framework provides a useful starting point for any discussion of fat storage and obesity [2]. However, the level of energy imbalance that can drive fat storage is rather small. An example is given in Ref. [3] of a typical 45-year-old man who might accumulate 0.5 kg of fat over the course of a year, containing around 13.8 MJ of energy, which is only 0.27% of the estimated 5180 MJ of energy expenditure over the same period. Clearly, the tools needed to detect variation in energy expenditure at this level would have to be spectacularly good, and it is widely known that none of the available methods comes close to providing this level of accuracy or precision.

However, other questions in the field may require less stringent methods. Take, for example, the characterisation of a mutant mouse, or the impact of a given dietary exposure or a drug. Here we might expect the impacts on expenditure to be larger, and hence the methodological requirements less taxing. What Burnett and Grobe show is that the techniques we currently have available may not be up to even this less demanding task.

Generally, scientists interested in energy expenditure by animals do not measure it directly. They use an indirect approach called respirometry (or indirect calorimetry), in which gaseous O2 consumption and CO2 production are quantified over time.
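To make the gas-exchange conversion concrete, here is a minimal sketch using the abbreviated Weir equation, one widely used way of turning O2 consumption and CO2 production into heat production. This is an assumption for illustration: the commentary does not state which conversion Burnett and Grobe applied, and the function names and the gas-exchange values below are hypothetical, chosen only to give a rate of the order expected for a mouse.

```python
# Abbreviated Weir equation (no urinary nitrogen term):
#   EE [kcal] = 3.941 * VO2 [L] + 1.106 * VCO2 [L]
# Assumption: this is an illustrative conversion, not necessarily the one
# used in the study under discussion.
KCAL_PER_L_O2 = 3.941
KCAL_PER_L_CO2 = 1.106
JOULES_PER_KCAL = 4184.0


def respiratory_exchange_ratio(vo2_ml_min, vco2_ml_min):
    """Ratio of CO2 produced to O2 consumed; indicates the substrate mix."""
    return vco2_ml_min / vo2_ml_min


def energy_expenditure_watts(vo2_ml_min, vco2_ml_min):
    """Convert gas exchange (mL/min) to heat production in watts (J/s)."""
    kcal_per_min = (KCAL_PER_L_O2 * vo2_ml_min
                    + KCAL_PER_L_CO2 * vco2_ml_min) / 1000.0
    return kcal_per_min * JOULES_PER_KCAL / 60.0


# Hypothetical resting values for a single mouse, in mL/min.
vo2, vco2 = 0.50, 0.42
print(f"RER = {respiratory_exchange_ratio(vo2, vco2):.2f}")
print(f"Energy expenditure = {energy_expenditure_watts(vo2, vco2):.3f} W")
```

Because the CO2 term carries a much smaller coefficient than the O2 term, an error in measured O2 consumption propagates almost directly into the estimated energy expenditure.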
By making several simplifying assumptions, the measured O2 consumption can be converted to heat production (energy expenditure), using the simultaneous CO2 production to diagnose the metabolic substrate being used, and hence the energy equivalence of the respired oxygen. This is done because accurately and directly measuring the heat produced by small animals is very difficult (see discussion in Ref. [4]). The quantities are small: mice typically burn energy at a rate of less than 0.2 W, so a typical 50 W light bulb pumps out heat equivalent to over 250 mice, and the devices that pick it up have to be supersensitive. But this heat doesn't always immediately leave the body. It can be used to heat up the body or, alternatively, excess heat can be released if the body cools down during the measurement period. Mice often have very labile body temperatures when placed in a measurement apparatus like a direct or indirect calorimeter (Figure 1). Converting such temperature changes back to the heat that caused them is complex because it depends on the exact body composition of the animal. Changes of up to 2 °C, requiring about 100 J of energy, are not uncommon in such measurements. If this change happened over, say, an hour (as in Figure 1), the error in the direct heat production would be about 16%. Moreover, if the animal urinates in the chamber, the voided urine cools down to ambient temperature, releasing its stored heat. Half a millilitre of urine cooling from 37 to 20 °C over 10 min would spuriously increase the measured heat production by 36%.

Direct calorimetry is therefore very difficult, and depends on custom-built pieces of kit. This is why indirect calorimetry based on gas exchange has become the standard method: it uses gas-analysis technology developed for many industrial applications, which is extremely accurate, and it is independent of issues like changes in body temperature.
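The back-of-envelope error estimates above can be sketched as follows. The 100 J figure, the urine volume, the temperatures and the durations come from the text; the baseline heat production of 0.17 W (the text says only "less than 0.2 W"), the treatment of urine as water (1 g/mL, 4.18 J per g per °C) and the function names are assumptions made for illustration.

```python
# Assumed baseline heat production of a resting mouse, in watts.
# The text gives only "less than 0.2 W"; 0.17 W is a hypothetical value.
BASELINE_W = 0.17

# Assumption: urine behaves thermally like water.
WATER_SPECIFIC_HEAT_J_PER_G_C = 4.18
WATER_DENSITY_G_PER_ML = 1.0


def spurious_fraction(energy_j, duration_s, baseline_w=BASELINE_W):
    """Heat stored or released (J) over a period, as a fraction of baseline."""
    return (energy_j / duration_s) / baseline_w


def urine_cooling_heat_j(volume_ml, start_c, end_c):
    """Heat released (J) as voided urine cools from start_c to end_c."""
    mass_g = volume_ml * WATER_DENSITY_G_PER_ML
    return mass_g * WATER_SPECIFIC_HEAT_J_PER_G_C * (start_c - end_c)


# Body temperature drift: about 100 J exchanged over one hour.
body_error = spurious_fraction(energy_j=100.0, duration_s=3600.0)

# Urination: 0.5 mL cooling from 37 to 20 degrees C over 10 minutes.
urine_heat = urine_cooling_heat_j(volume_ml=0.5, start_c=37.0, end_c=20.0)
urine_error = spurious_fraction(energy_j=urine_heat, duration_s=600.0)

print(f"Body temperature drift: {body_error:.1%} of baseline")
print(f"Urine cooling: {urine_heat:.1f} J, {urine_error:.1%} of baseline")
```

With this assumed baseline, both results land close to the 16% and 36% quoted above; the exact percentages shift with whatever resting heat production is assumed.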
But it is important to recognise that the assumptions on which respirometry is built are also approximations, and the dynamics of gas flow and mixing can complicate the inference of the energy expenditure that generated the measured gas exchange [4].

Figure 1. Simultaneous traces of energy expenditure (kJ/day) measured by indirect calorimetry (thick black line), activity (counts per 30 s; thin line below thick black line) and body temperature (°C; thin line above thick black line) in four mice ...

So we have two imperfect technologies. One might anticipate that they wouldn't exactly match up. The key is the size of the discrepancy. Burnett and Grobe examined how resting metabolic rate (RMR) changes when mice are placed on a high fat diet (HFD). They measured the changes in two ways. First, they used a custom-built direct calorimeter combined with simultaneous radio telemetry of body temperature changes to eliminate that source of error. Second, they used a continuous flow respirometry system based on the Applied Electrochemistry SA3 analyser. It is important to note here that the respirometry system they used is superior in most respects to the majority of commercially available respirometry equipment, because the analyser uses two sensors to measure background air and excurrent air from the chamber continuously. In many senses, therefore, this is a best-case scenario.

The difference between methods at baseline was 6.85%, with respirometry being consistently lower. On high fat feeding, RMR by respirometry increased, but that by direct calorimetry didn't, narrowing the difference to almost zero. Subsequently shifting from HFD back to chow decreased RMR by 5.93% when measured by respirometry (back to baseline levels) and by 8.85% when measured by direct calorimetry, leaving it considerably reduced relative to baseline. In other words, the discrepancies were enormous relative to the sorts of differences we are interested in measuring. In fact, this isn't the first demonstration of such discrepancies.
The same authors published a similar paper last year in the American Journal of Physiology, comparing the methods in a different context [5], and there are some earlier studies within the past decade [6,7].

What does this mean? There is a strong tendency in such papers to assume that the direct calorimetry measure is correct, i.e. measured without error, and hence that all the error is due to something wrong with the respirometry. The title of Burnett and Grobe's previous paper is ‘Direct calorimetry identifies deficiencies in respirometry…’, and Walsberg and Hoffman's 2005 paper is titled ‘Direct calorimetry reveals large errors in respirometric estimates of energy expenditure’. While the current paper is thankfully less dogmatic about the source of the discrepancy, some bias in this respect still creeps in. For example, there is the inference that respirometry falsely detected an increase in RMR on high fat feeding.

Does this study, and the other studies comparing direct and indirect calorimetry, mean we should pack up our respirometers until companies develop direct calorimetry machines for the commercial market? My own view is that the source of the discrepancy between the methods remains unproven and, given the theoretical problems highlighted above, may lie as much in the direct as in the indirect calorimetry. The inference that respirometry falsely detected an increase in RMR on HFD could equally be interpreted as direct calorimetry failing to detect the change. Attributing the problem to respirometry alone is therefore unfounded. Moreover, surmounting the technological and practical issues in a 24 h direct calorimetry system would be extremely challenging and, as mentioned by Burnett and Grobe, would require surgical implantation of telemetric monitors into all measured animals. Plus, we need to remember why indirect calorimetry came to be the standard method in the first place: because the measurements are more reliable and the technology more accurate.
I therefore don't think it is time to pack up the respirometers. Not yet. Perhaps, not ever. Nonetheless, this study does emphasise that errors in our existing technology may compromise our interpretations. If these errors are random, then this places more emphasis than ever on making sure that such studies have adequate sample sizes to detect the effect sizes of interest, based on an appropriate power analysis [8].
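As a rough illustration of what such a power analysis involves, the sketch below estimates the number of animals per group needed to detect a given difference in mean energy expenditure between two groups. It uses the standard normal-approximation formula for a two-sided, two-sample comparison, which slightly underestimates the sample size an exact t-test calculation would give. It is not necessarily the approach recommended in Ref. [8]. The function name, the 10% between-animal standard deviation and the 7% effect size are hypothetical.

```python
import math
from statistics import NormalDist


def n_per_group(effect, sd, alpha=0.05, power=0.80):
    """Animals per group for a two-sided, two-sample comparison of means.

    Normal approximation: n = 2 * ((z_alpha + z_power) * sd / effect) ** 2.
    `effect` and `sd` must be in the same units (here, % of mean RMR).
    """
    z_alpha = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    z_power = NormalDist().inv_cdf(power)
    n = 2.0 * ((z_alpha + z_power) * sd / effect) ** 2
    return math.ceil(n)


# Hypothetical scenario: between-animal SD of 10% of mean RMR,
# and a difference of 7% of mean RMR that we want to detect.
animals = n_per_group(effect=7.0, sd=10.0)
print(f"Animals needed per group: {animals}")
```

Because the required sample size scales with the square of the ratio of variability to effect size, halving the effect of interest roughly quadruples the number of animals needed.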