In December 2013, Datacenter Dynamics magazine published figures showing that one minute’s data centre downtime costs on average £4,800. As a typical incident lasts 86 minutes, the associated incident cost reaches £412,800, plus the loss of reputation and customers.
Accordingly, data centre operators regard UPS failures as completely unacceptable – but how can they eliminate the outage risk as far as realistically possible? While partly dependent on the UPS’s underlying design and construction quality, the answer owes much to how well the system is monitored and maintained after installation.
These overall, ‘full-lifetime’ considerations underline the vital importance of selecting the right UPS supplier; one that can not only offer balanced advice and UPS hardware that’s well-suited to an application, but also design and reliably implement a professional service plan. The reward is a UPS system that operates reliably and safely for 15 years or more.
A well-organised plan starts by considering the UPS elements most likely to fail, and then develops a strategy to mitigate these failure risks.
UPS ‘failure points’
Faulty batteries typically account for 20% of UPS failures. Manufacturers’ battery life estimates are unreliable for planning, because they are typically based on ideal circumstances: a steady 20°C operating temperature and zero working cycles. Real-world operation involves both cycling and elevated temperatures, and battery life is typically reduced by 50% for every 10°C rise in operating temperature.
UPSs typically contain 12 or more electrolytic capacitors, whose ageing rates are influenced by electrical and thermal stress. As with batteries, the service life rating is just a guideline and cannot be used for accurate planning. Capacitor failure often drives a UPS into bypass mode.
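As a rough illustration of the temperature rule of thumb above, the following Python sketch derates a rated battery life by 50% for every 10°C above the 20°C reference. The function name and the smooth exponential form between 10°C steps are assumptions made for illustration; the sketch ignores cycling and is not a manufacturer's model.

```python
def estimated_battery_life_years(rated_life_years: float,
                                 operating_temp_c: float,
                                 reference_temp_c: float = 20.0) -> float:
    """Apply the 'halve the life for every 10 degC rise' rule of thumb.

    Assumption: the halving is applied continuously (exponentially) between
    10 degC steps, and temperatures below the reference give no bonus life.
    """
    rise = max(0.0, operating_temp_c - reference_temp_c)
    return rated_life_years * 0.5 ** (rise / 10.0)


# Example: a battery rated for 10 years at 20 degC.
for temp in (20, 25, 30, 40):
    print(f"{temp} degC: {estimated_battery_life_years(10, temp):.1f} years")
# 20 degC: 10.0 years
# 25 degC: 7.1 years
# 30 degC: 5.0 years
# 40 degC: 2.5 years
```

Even a sustained 5°C rise removes close to a third of the rated life in this simplified model, which is one reason the nominal figure is a poor basis for replacement planning.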
Fans can also cause an unwanted switch to bypass if they overheat; their lifespan depends on electrical and mechanical quality and specifications. Other failure causes include lightning damage, vibration, blocked air filters causing overheating, input power filters causing cable and choke overheating, and contact failures due to deposit build-up.
A strategic and tactical service plan
An effective service plan encompasses a strategy of regular monitoring and preventative maintenance for the UPS and its batteries, and tactical responses to emergencies when they occur.
Battery monitoring is essential to maximise battery life, which is finite in any case, and to prevent UPS failures. A battery self-test should be run every 30 to 60 days, supplemented by more specialist testing every six months and by impedance testing. Batteries should be replaced once they reach 80% of their theoretical life, and a monitoring system should be deployed to catch unexpected problems early.
Preventative maintenance visits should include physical inspection of the batteries, capacitors and fans. Capacitors should be replaced every 5 to 9 years, and fans every 3 to 4 years. Other components should also be inspected for signs of damage, and air filters checked for blockages. Additionally, firmware upgrades can help to optimise performance.
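The replacement intervals quoted above can be turned into a simple planning check. The Python sketch below is a minimal illustration only: the names `ReplacementWindow`, `replacement_windows` and `components_due` are hypothetical, and flagging a component as soon as it reaches the earliest end of its interval is an assumed, deliberately conservative policy rather than a stated recommendation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReplacementWindow:
    earliest_years: float  # age at which replacement should be considered
    latest_years: float    # age by which replacement should have happened


def replacement_windows(battery_theoretical_life_years: float) -> dict:
    """Build replacement windows from the intervals quoted in the text."""
    battery_age = 0.8 * battery_theoretical_life_years  # renew at 80% of life
    return {
        "battery": ReplacementWindow(battery_age, battery_age),
        "capacitors": ReplacementWindow(5, 9),  # every 5 to 9 years
        "fans": ReplacementWindow(3, 4),        # every 3 to 4 years
    }


def components_due(ages_years: dict, windows: dict) -> list:
    """Return component names whose age has reached the earliest window."""
    return sorted(
        name for name, age in ages_years.items()
        if age >= windows[name].earliest_years
    )


# Example: batteries with a 10-year theoretical life, so renewal at 8 years.
windows = replacement_windows(10)
ages = {"battery": 7.5, "capacitors": 6, "fans": 2}
print(components_due(ages, windows))  # → ['capacitors']
```

A check like this only supplements the physical inspections and monitoring described above, since real components can fail well before their nominal interval.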
The tactical response should comprise trained technicians who are available 24/7 and based close enough to arrive on site within the agreed SLA response times. These personnel should be backed by immediate access to a comprehensive local spare parts inventory, and by more in-depth technical support if required.
The service contract should match each installation’s particular circumstances: the load’s type and size, and its business-criticality. Is 24/7 coverage needed 365 days a year? Should parts and labour be included, or treated as chargeable extras? In practice, battery and capacitor parts and labour are typically excluded.
In KUP’s experience, 70% of its customers choose a fully comprehensive 24/7 service contract with a guaranteed six-hour emergency response. UPS parts and labour, apart from batteries and capacitors, are included, together with two preventative maintenance visits per year scheduled during normal working hours.