The compaction of information technology equipment and simultaneous increases in processor power consumption are creating challenges for Data Centre managers in ensuring adequate distribution of cool air, removal of hot air and sufficient cooling capacity. This article provides a checklist for assessing potential problems that can adversely affect the cooling environment within a Data Centre.
Introduction
There are significant benefits from the compaction of technical equipment and simultaneous advances in processor power. However, this has also created potential challenges for those responsible for delivering and maintaining proper mission-critical environments.
While the overall total power and cooling capacity designed for a data centre may be adequate, the distribution of cool air to the right areas may not. When more compact IT equipment is housed densely within a single cabinet, or when Data Centre managers contemplate large-scale deployments with multiple racks filled with ultracompact blade servers, the increased power required and heat dissipated must be addressed.
Blade servers take up far less space than traditional rack-mounted servers and offer more processing ability while consuming less power per server. However, they dramatically increase heat density. In designing the cooling system of a Data Centre the objective is to create an unobstructed path from the source of the cooled air to the inlet positions of the servers. Likewise, a clear path needs to be created from the rear exhaust of the servers to the return air duct of the air-conditioning unit.
There are, however, a number of factors that can adversely impact this objective. To ascertain whether there is a problem or potential problem with the cooling infrastructure of a Data Centre, certain checks and measurements must be carried out. This audit will determine the health of the Data Centre in order to avoid temperature-related electronic equipment failure. It can also be used to evaluate the availability of adequate cooling capacity for the future. The current status should be assessed and a baseline established to ensure that subsequent corrective actions result in improvements. This article shows how to identify potential cooling problems in existing Data Centres that will affect the total cooling capacity, the cooling density capacity, and the operating efficiency of a Data Centre.
1. Capacity check
Because each watt of IT power requires 1 watt of cooling, the first step toward providing adequate cooling is to verify that the capacity of the cooling system matches the current and planned power load. The typical cooling system comprises a CRAC (Computer Room Air Conditioner) to deliver the cooled air to the room and a unit mounted externally to reject the heat to atmosphere. Newer forms of CRAC units are appearing on the market that can be positioned closer to (or even inside) the racks in very high-density situations. In some cases, the cooling system may have been oversized to accommodate a projected future heat load. Oversizing the cooling system leads to undesirable energy consumption that can be avoided.
Verify the capacity of the cooling system by finding the model nomenclature on or inside each CRAC unit.
Refer to the manufacturer technical data for capacity values. CRAC unit manufacturers rate system capacity based on the EAT (entering air temperature) and humidity control level. The controller on each unit will display the EAT and relative humidity. Using the technical data, note the sensible cooling capacity for each CRAC.
Likewise, the capacity of the external heat rejection equipment should be equal to or greater than the combined capacity of all the CRACs in the room. In smaller packaged systems the internal and external components are often acquired together from the same manufacturer. In larger systems the heat rejection equipment may have been acquired separately from a different manufacturer. In either case they are most likely sized appropriately; however, an outside contractor should be able to verify this. If the CRAC capacity and heat rejection equipment capacity are different, take the lower rated component for this exercise. (If in doubt when taking measurements, contact the manufacturer or supplier.) This will give you the theoretical maximum cooling capacity of the data centre. It will be seen later in this article that there are a number of factors that can considerably reduce this maximum. The calculated maximum capacity must then be compared with the heat load requirement of the data centre.
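The comparison described above can be sketched in a few lines of Python. This is only an illustration: the function names and all the kW figures are hypothetical and would be replaced with values taken from the manufacturers' technical data and the measured or planned IT load.

```python
def theoretical_max_cooling_kw(crac_sensible_kw, heat_rejection_kw):
    """Return the lower of total CRAC sensible capacity and heat rejection capacity."""
    return min(sum(crac_sensible_kw), heat_rejection_kw)


def capacity_margin_kw(max_cooling_kw, it_load_kw):
    """Positive when cooling exceeds the heat load, negative when it falls short."""
    return max_cooling_kw - it_load_kw


# Hypothetical figures for illustration only.
crac_units_kw = [30.0, 30.0, 30.0]  # sensible capacity of each CRAC, from technical data
heat_rejection_kw = 80.0            # capacity of the external heat rejection equipment
it_load_kw = 72.0                   # current plus planned IT load (1 W of IT = 1 W of cooling)

max_kw = theoretical_max_cooling_kw(crac_units_kw, heat_rejection_kw)
print("Theoretical maximum cooling (kW):", max_kw)
print("Margin over heat load (kW):", capacity_margin_kw(max_kw, it_load_kw))
```

In this made-up example the heat rejection equipment, not the CRACs, is the limiting component, which is exactly the case the "take the lower rated component" rule is meant to catch.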
2. Check CRAC units
If CRAC units in a data centre do not work together in a coordinated fashion they are likely to fall short of their cooling capacity and incur a higher operating cost. CRAC units normally operate in four modes: cooling, heating, humidification and dehumidification. While two of these conditions may occur at the same time (e.g., cooling and dehumidification), all systems within a defined area (4-5 units adjacent to one another) should always be operating in the same mode. Uncoordinated CRAC units operating in opposing modes (e.g. dehumidifying and humidifying), a condition called “demand fighting”, lead to wasted operating costs and a reduction in the cooling capacity. CRAC units should be tested to ensure that measured temperatures (supply & return) and humidity readings are consistent with design values.
Demand fighting can have drastic effects on the efficiency of the CRAC system. If not addressed, this problem can result in a 20-30% reduction in efficiency, which in the best case wastes operating costs and in the worst case causes downtime due to insufficient cooling capacity. Operation of the system within the lower limits of the relative humidity design parameters should be considered for efficiency and cost savings. A slight change in set point toward the lower end of the range can have a dramatic effect on the heat removal capacity and reduce humidifier run time. Changing the relative humidity set point from 50% to 45%, for example, can result in significant operational cost savings.
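As a rough illustration of the mode check, the sketch below flags a group of adjacent units in which opposing modes are active at the same time. The unit names and the mode labels are hypothetical, and treating cooling and heating as an opposing pair is an assumption made here in addition to the humidifying and dehumidifying pair named in the text.

```python
# Pairs of modes treated as opposing. The humidifying/dehumidifying pair comes
# from the text; the cooling/heating pair is an assumption for this sketch.
OPPOSING = [{"humidifying", "dehumidifying"}, {"cooling", "heating"}]


def find_demand_fighting(unit_modes):
    """unit_modes maps a unit name to the set of modes it is currently running.

    Returns the opposing mode pairs that are both active somewhere in the group.
    """
    active = set()
    for modes in unit_modes.values():
        active |= set(modes)
    return [sorted(pair) for pair in OPPOSING if pair <= active]


# Hypothetical group of adjacent units read from their controllers.
group = {
    "CRAC-1": {"cooling", "dehumidifying"},
    "CRAC-2": {"cooling", "humidifying"},
}
print(find_demand_fighting(group))
```

An empty list means no opposing modes were found in the group; any pair returned points to units that should be investigated and recoordinated.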
3. Check and test main cooling circuits
This section requires an understanding of basic air-conditioning equipment. Get your maintenance company or an independent HVAC consultant to check the condition of the chillers (where applicable), pumping systems and primary cooling loops. Ensure that all valves are operating correctly.
Chilled water cooling circuit: The condition of the chilled water loop supply to the CRACs will directly affect the ability of the CRAC to supply proper conditioned air to the room or raised floor plenum. To check the supply temperature, contact your maintenance company or an independent HVAC consultant. As a quick check, the temperature of the piping supply to the CRAC can be used. Using a laser thermometer, measure the supply pipe surface temperature to the CRAC unit. In some cases, gauges may be installed inline with the piping, displaying temperature of the water supply.
Chilled water piping will be insulated from the air stream in order to prevent condensation on the pipe surface. For the most accurate measurement, peel back a section of the insulation and take the measurement directly on the surface of the pipe. If this is not possible, a small section of piping is likely exposed inside the CRAC unit at the inlet to the cooling coil on the left or right side of the coil.
Condenser water circuit (water and glycol cooled): Water and glycol cooled systems utilize a condenser in the CRAC for transferring heat from the CRAC to the water circuit. Condenser water piping will likely not be insulated due to the warmer temperatures of the supply water. Measure the supply pipe surface temperature at the entry point to the CRAC unit. Direct expansion (DX) systems should be checked to ensure that they are fully charged with the proper amount of refrigerant.
Air cooled refrigerant piping: As with water and glycol cooled CRACs, refrigerant charge should be checked for the proper levels. Contact your maintenance company or an independent HVAC consultant to check the condition of refrigerant piping, outdoor heat exchangers and refrigerant charge.
4. Record aisle temperatures
By recording the temperature at various locations between rows of racks, a temperature profile is created which helps diagnose potential cooling problems and ensures that cool air is supplied to critical areas. If the aisles of racks are not properly positioned, hot spots can occur in various locations and may cause multiple equipment failures. Section 9 below describes a best practice for rack layouts. Take room temperatures at strategic positions within the aisles of the data centre. These measuring positions should generally be centred between equipment rows and spaced at approximately one point at every fourth rack position.
Aisle temperature measurement points should be 5 feet (1.5 metres) above the floor. When more sophisticated means of measuring the aisle temperatures are not available this should be considered a minimal measurement. These temperatures should be recorded and compared with the IT equipment manufacturers’ recommended inlet temperatures. When the recommended inlet temperatures of IT equipment are not available, 68-75°F (20-25°C) should be used in accordance with the ASHRAE standard.
Temperatures outside this tolerance can lead to a reduction in system performance, reduced equipment life and unexpected downtime. Note: All the above checks and tests should be carried out quarterly. Temperature checks should be carried out over a 48-hour period during each test to record maximum and minimum levels.
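A minimal sketch of how the recorded aisle readings might be evaluated is shown below. It assumes the readings for each measurement point over the 48-hour period have already been collected in °C; the point labels and values are hypothetical, and the default limits are the 20-25°C fallback range quoted above.

```python
def out_of_range_points(readings_c, low_c=20.0, high_c=25.0):
    """Return {point: (minimum, maximum)} for points that left the allowed range.

    readings_c maps a measurement point label to the list of temperatures (°C)
    recorded at that point during the test period.
    """
    flagged = {}
    for point, temps in readings_c.items():
        lowest, highest = min(temps), max(temps)
        if lowest < low_c or highest > high_c:
            flagged[point] = (lowest, highest)
    return flagged


# Hypothetical readings, one point at every fourth rack position.
readings = {
    "aisle1-rack4": [21.0, 22.5, 23.0],
    "aisle1-rack8": [22.0, 26.5, 24.0],
}
print(out_of_range_points(readings))
```

Keeping the minimum and maximum for each flagged point makes it easier to compare one quarterly check with the next against the established baseline.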
5. Record rack temperatures
Poor air distribution to the front of a rack can cause the hot exhaust air from the equipment to recirculate back into the intakes. This causes some equipment, typically units mounted toward the top of the rack, to overheat and shut down or fail. This step is to verify that the bulk inlet temperatures in the rack are adequate for the equipment installed. Take and record temperatures at the geometric centre of the rack front at bottom, middle and top. When the rack is not fully populated with equipment, measure inlet temperatures at the geometric centre of each piece of equipment. Refer to the guidelines in section 4 for acceptable inlet temperatures. Temperatures not within the guidelines represent a cooling problem for that monitoring point. Monitoring points should be 2 inches (50 mm) off the face of the rack equipment. Monitoring can be accomplished with thermocouples connected to a data collection device. Monitoring points may also be measured by using a laser thermometer for quick verification of temperatures as a minimal method.
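The three-point rack measurement can be summarised as in the sketch below. The function name and the readings are hypothetical, the default limits are the 20-25°C fallback range, and reporting the top-minus-bottom difference as a rough indicator of recirculation is an assumption of this example rather than a threshold given in the article.

```python
def check_rack_inlets(inlets_c, low_c=20.0, high_c=25.0):
    """Evaluate inlet temperatures (°C) taken at the bottom, middle and top of a rack.

    Returns the positions outside the allowed range and the temperature rise
    from the bottom to the top of the rack front.
    """
    out_of_range = sorted(
        position for position, temp in inlets_c.items()
        if not low_c <= temp <= high_c
    )
    rise = inlets_c["top"] - inlets_c["bottom"]
    return out_of_range, rise


# Hypothetical readings taken 50 mm off the face of the equipment.
rack = {"bottom": 21.0, "middle": 23.5, "top": 28.0}
print(check_rack_inlets(rack))
```

A rack whose top inlet is markedly warmer than its bottom inlet is a candidate for the enclosure and airflow inspections described in the later sections.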
6. Check air velocity from floor grilles
It is important to understand that the cooling capacity of the cabinet is directly related to the airflow volume delivery stated in CFM (cubic feet per minute). IT equipment is designed to raise the temperature of the supply air by 20-30°F (11-17°C). Using the equation for heat removal, the amount of airflow required at a given temperature rise can be quickly computed.
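The equation itself is not reproduced in this text. A commonly used form of the sensible heat relation for air at standard sea-level conditions is CFM = (Watts x 3.412) / (1.08 x temperature rise in °F), which simplifies to roughly 3.16 x Watts / temperature rise. The sketch below applies that relation; the function name and the 1 kW example load are hypothetical, and the constants assume standard air density, so results at altitude or unusual conditions will differ.

```python
BTU_PER_HOUR_PER_WATT = 3.412  # 1 W of heat is about 3.412 BTU/h
SENSIBLE_HEAT_FACTOR = 1.08    # BTU/h per CFM per °F for standard sea-level air


def required_cfm(heat_load_watts, delta_t_f):
    """Airflow (CFM) needed to remove heat_load_watts at a rise of delta_t_f (°F)."""
    if delta_t_f <= 0:
        raise ValueError("temperature rise must be positive")
    return heat_load_watts * BTU_PER_HOUR_PER_WATT / (SENSIBLE_HEAT_FACTOR * delta_t_f)


# Airflow for a hypothetical 1 kW load at both ends of the 20-30°F design rise.
for rise_f in (20.0, 30.0):
    print(rise_f, round(required_cfm(1000.0, rise_f), 1))
```

Comparing the required airflow for each cabinet with the volume actually delivered by the floor grille in front of it shows whether that cabinet's heat load can be removed at the design temperature rise.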
7. Visual inspection of enclosures
Unused vertical space within rack enclosures causes the hot air output from equipment to take a “short circuit” back to the inlet of the equipment. This unrestricted cycling of hot air causes the equipment to heat up unnecessarily, which can lead to equipment damage or downtime. Visually examine each rack. Are there any gaps in the U positions? Are CRT monitors being used? Are blanking panels installed in these racks? Is an excess of cabling impeding the airflow? If there are visible gaps in the U positions, blanking panels are not installed, or there is excessive cabling in the rear of the rack, then airflow within the rack will not be optimal.
8. Check air paths below floor
Check sub-floors for cleanliness and/or obstructions. Any dirt and dust present below the raised floor will be blown up through floor grilles and will be drawn into the IT equipment. Floor obstructions such as network and power cables will obstruct airflow and have a negative effect on the cooling supply to the racks. Subsequent addition of racks and servers will result in the installation of more power and network cabling.
Often, when servers and racks are moved or replaced, the redundant cabling is left beneath the floor. A visual inspection of the floor surface should be conducted when a raised floor is utilised for air distribution. Voids, gaps and missing floor tiles have a damaging effect on the static pressure of the floor plenum. The ability to maintain airflow rates from perforated floor tiles will be diminished with the presence of unsealed areas on the raised flooring. Missing floor tiles should be replaced. The floor should consist of solid or perforated floor tiles in every section of the grid. Holes in the raised flooring tiles used for cabling access should be sealed using brush strips or other cable access products. Measurements conducted show that 50-80% of available cold air escapes prematurely through unsealed cable openings.
9. Check aisle and floor tile arrangement
With few exceptions, rack-mounted servers are designed to draw air in at the front and exhaust at the back. With all the racks facing the same way in a row, the hot air from row one is exhausted into the aisle where it will mix with supply or room air and then enter into the front of the racks in row two. As air passes through each consecutive row the IT equipment is subjected to hotter intake air. If all the rows have the cabinets arranged so that the inlets of the servers face the same direction, equipment malfunction is imminent. Configuring the racks in a hot aisle / cold aisle configuration will separate the exhaust air from the server inlets. This will allow the cold supply air from the floor tiles to enter into the cabinets with less mixing. Improper location of the supply and return vents can cause CRAC air to mix with hot exhaust air before reaching the load equipment, giving rise to the performance problems and costs described in the preceding sections. Poorly located delivery or return vents are very common and can erase almost all of the benefit of a hot aisle / cold aisle design.
10. Check placement of CRAC units
The position of the CRAC units relative to the aisle is important for air distribution. Depending on the air distribution architecture, CRAC units should be placed perpendicular to the aisle on either a cold or hot aisle. When using a raised floor for distribution, the CRAC units should be placed at the end of the hot aisles. The hot air return path to the CRAC is then directly down the aisle, without pulling air over the tops of aisles where the opportunity for air to be recirculated is increased. With less mixing of the hot air in the room, the capacity of the CRAC units will be increased by warmer return air temperatures. This could potentially lead to a requirement for fewer units in the room. When a slab floor is used, the CRAC should be placed at the end of the cold aisle. This will distribute the supply air to the front of the cabinets. Some mixing will exist in this configuration and it should be implemented only when low power densities per rack exist.
Conclusion
Routine checks of a data centre’s cooling system can identify potential cooling problems early on to help prevent downtime. Changes in power consumption, IT refreshes and growth can change the amount of heat produced in the data centre. Regular health checks will most likely identify the impact of these changes before they become a major issue. Achieving the proper environment for a given power density can be accomplished by addressing the problems identified through the health checks provided in this article.