Chip to Chiller Experimental Cooling Failure Analysis of Data Centers: The Interaction Between IT and Facility

Authors :
Russell Tipton
Bahgat Sammakia
Mark Seymour
Husam A. Alissa
David Mendo
Kourosh Nemati
Dustin W. Demetriou
Ken Schneebeli
Source :
IEEE Transactions on Components, Packaging and Manufacturing Technology. 6:1361-1378
Publication Year :
2016
Publisher :
Institute of Electrical and Electronics Engineers (IEEE), 2016.

Abstract

Cooling failure in data centers (DCs) is a complex phenomenon because of the many interactions between the cooling infrastructure and the information technology equipment (IT). To fully understand it, a system integration philosophy is vital to the testing and the design of experiments. In this paper, a facility-level DC cooling failure experiment is run and analyzed. An airside cooling failure is introduced to the facility at two different cooling set points, in both open and contained environments. Quantitative instrumentation includes pressure differentials, tile airflow, external contour and discrete air inlet temperature, intelligent platform management interface (IPMI) data, and cooling system data during failure recovery. Qualitative measurements include infrared imaging and airflow visualization via smoke trace. To our knowledge, this is the first experimental study in which an actual multi-aisle facility cooling failure is run with real IT (compute, network, and storage) load in the white space. This establishes a link between variations at the facility level and the response of the central processing unit (CPU). Judged by the external IT inlet temperature sensors, the containment configuration shows a longer available uptime (AU) during failure. However, the IPMI data show the opposite: the available uptime is reduced significantly when internal IT analytics are used in place of the external sensors. The IT power, CPU temperature, and fan speed all reach higher values during the containment failure. This occurs because external impedances form instantaneously in the containment during failure, which renders the contained aisle less resilient than the open aisle. The tradeoffs between power usage effectiveness (PUE), operating expenditure (OPEX), and AU are also explained.
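
As an illustration of how an available uptime figure might be derived from a temperature trace, the sketch below takes AU to be the time from the start of the cooling failure until a sensor first reaches a temperature limit. This definition, the function name `available_uptime`, the 32 °C limit, and all sample values are assumptions made for illustration only; they are not taken from the paper and are not measured data.

```python
from typing import Optional, Sequence


def available_uptime(
    times_s: Sequence[float],
    temps_c: Sequence[float],
    failure_time_s: float,
    limit_c: float,
) -> Optional[float]:
    """Return seconds from failure onset until temps_c first reaches limit_c.

    Returns None if the limit is never reached within the trace.
    Assumes times_s is sorted in ascending order and matches temps_c in length.
    """
    if len(times_s) != len(temps_c):
        raise ValueError("times_s and temps_c must have the same length")
    for t, temp in zip(times_s, temps_c):
        # Only samples at or after the failure count toward uptime.
        if t >= failure_time_s and temp >= limit_c:
            return t - failure_time_s
    return None


# Hypothetical traces (not measured data): an external inlet sensor and an
# internal, IPMI-style reading sampled every 30 s after a failure at t = 0.
times = [0, 30, 60, 90, 120]
external_inlet_c = [20, 22, 25, 29, 33]
internal_ipmi_c = [24, 28, 33, 37, 40]

au_external = available_uptime(times, external_inlet_c, 0, 32.0)
au_internal = available_uptime(times, internal_ipmi_c, 0, 32.0)
```

With these made-up traces, the internal reading crosses the limit earlier than the external one, which mirrors the kind of discrepancy between external sensors and IPMI data that the abstract describes.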

Details

ISSN :
2156-3985 and 2156-3950
Volume :
6
Database :
OpenAIRE
Journal :
IEEE Transactions on Components, Packaging and Manufacturing Technology
Accession number :
edsair.doi...........bbbdd03619921a89167e50c3173f816f