Automating the zombie roundup: Where are the weapons?

Part 1. By Andy Lawrence, Vice President of Research for Datacentre Technologies and Eco-Efficient IT at 451 Research.


FOR YEARS NOW, it has been apparent that many datacentres are home to large numbers of servers that do nothing, or very little. This is extraordinarily wasteful, and in any other industry would be considered absurd – it might be compared with an airline flying its aircraft with mannequins sitting in the last ten rows of seats.

The servers were originally put there for good reasons, but when the applications were retired or virtualised elsewhere, or moved to the cloud, or the business unit merged, or the development team finished their work, no one arranged for the removal or redeployment of the servers.

These servers continue to use valuable rack space; have their connectivity maintained and tested, and their software patched; are scanned for viruses; are regularly backed up; and use precious and conditioned datacentre power, supported by costly infrastructure (such as a UPS and a generator).

Often, software licenses are still being paid for a full stack of software as well as for applications. Evidence suggests that zombie servers are quite common even in colocation datacentres, where they are paid for by customers’ finance departments that have no understanding of why these services are needed.

Zombie or comatose servers are a huge waste of resources – energy, maintenance, software and space – but they should be yesterday’s problem. Modern IT is more than capable of identifying and eliminating underused equipment – even more so in virtualised and cloud environments. So far, the adoption of tools and processes for dealing with comatose servers is piecemeal – and the reason lies in the deceptive complexity of dealing with the issue. Ultimately, a combination of the software-driven datacentre and cloud provisioning will likely solve the problem.

In theory, of course, if not always in practice, there are business and technical processes for preventing and locating zombie servers. Many organizations will deny there is an issue. But as the Uptime Institute (a division of The 451 Group) has shown through its Server Roundup initiative, the problem is widespread and far from solved.

In Uptime’s 2014 annual survey, 24% of companies surveyed said they think that at least one in 10 of their servers, and possibly many more, are comatose. But Uptime consultants suspect that those who think they have fewer zombies are probably being complacent; they have estimated that 20% of racked servers do no real work. If extrapolated on an industry-wide scale, this amounts to billions of dollars wasted.

As an example of what can be achieved, Barclays Bank won recognition from Uptime for saving $10m in two years by decommissioning more than 9,000 servers that consumed 2.5MW of power and took up nearly 600 racks; it also freed up 20,000 network ports and 3,000 storage area network ports.
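To put the Barclays figures in per-server terms, the short calculation below divides the quoted totals by the number of servers removed. It is only a rough sketch: the article gives the inputs as approximations ("more than 9,000" servers, "nearly 600" racks), so the results are indicative, and the variable names are illustrative.

```python
# Figures quoted in the article; all are approximate
# ("more than 9,000" servers, "nearly 600" racks).
servers_removed = 9_000
power_saved_watts = 2.5e6      # 2.5MW
racks_freed = 600
savings_usd = 10_000_000       # $10m over two years

watts_per_server = power_saved_watts / servers_removed
servers_per_rack = servers_removed / racks_freed
usd_per_server = savings_usd / servers_removed

print(f"{watts_per_server:.0f} W per server")
print(f"{servers_per_rack:.0f} servers per rack")
print(f"${usd_per_server:,.0f} saved per server over two years")
```

On these rounded inputs, each decommissioned server accounts for a little under 300W of power and roughly $1,100 of savings over the two years.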

In a similar program presented at the Uptime Symposium in 2013, AOL calculated it had saved $5.05m in power, software licenses and maintenance. In a smaller case study presented by award winner Viridity (a software provider later acquired by Schneider) in 2011, a motor manufacturer retired 813 servers, saving $600,000. Sun Life Financial also reported that it had retired 441 servers in 2014. (Not all of these servers are necessarily comatose, but they are candidates for replacement due to underutilization or overprovisioning at the physical level.)

Weapons for zombie hunting
So how can software help? Software vendors have been addressing this problem for some time, and several vendors have, at various points, said their tools are singularly effective at identifying zombies.

One such example is NightWatchman Server, from the UK company 1E, which measured the power consumption of servers and compared this against usage patterns, but the product was not widely successful, and is not actively sold independently. Another supplier, Viridity, modelled server power use with its EnergyCenter product, and by mapping this against server workloads, could identify mismatches. The technology, however, was acquired by Schneider Electric, which has integrated it into its StruxureWare for Data Centers DCIM suite; Schneider will certainly say that its tools can be helpful in identifying zombies, but it is no longer a key marketing message, nor a strong reason why customers buy it.
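The general idea behind such tools, comparing what a server draws against what it actually does, can be sketched in a few lines. This is a minimal illustration only, not how NightWatchman Server or EnergyCenter works internally: the `ServerSample` type, the `find_zombie_candidates` function, the sample data and both thresholds are hypothetical assumptions.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class ServerSample:
    # Hypothetical record: readings collected over the same period.
    name: str
    power_watts: list   # periodic power readings, in watts
    cpu_percent: list   # CPU utilization readings, in percent


def find_zombie_candidates(servers, min_power_watts=100.0, max_cpu_percent=5.0):
    """Return names of servers that draw power but show little activity.

    Both thresholds are assumed values chosen for illustration.
    """
    candidates = []
    for s in servers:
        draws_power = mean(s.power_watts) >= min_power_watts
        never_busy = max(s.cpu_percent) <= max_cpu_percent
        if draws_power and never_busy:
            candidates.append(s.name)
    return candidates


# Made-up sample data for three servers.
fleet = [
    ServerSample("app-01", [210, 250, 240], [35, 60, 48]),
    ServerSample("old-batch-07", [180, 182, 181], [1, 2, 1]),
    ServerSample("spare-03", [0, 0, 0], [0, 0, 0]),
]

print(find_zombie_candidates(fleet))  # ['old-batch-07']
```

A flagged server is only a candidate for investigation. Low CPU use alone does not prove a machine is comatose, which is one reason real deployments need data from several systems.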

Why are these tools not more widely used or, indeed, more widely offered? There are several reasons for this. First, in order for DCIM (datacentre infrastructure management) tools to be effective in identifying and removing zombie servers, a complicated set of tools is needed; these extend beyond core DCIM and ideally need to be integrated with each other. This means that the costs are high and the effort may seem disproportionate to the challenge.

Most organizations do not perceive their zombie problem to be big enough to justify this investment – especially if they think they can solve the problem with a concerted one-off push using manual methods and by the better application of new or existing auditing and other processes. This has been the case for some of the companies that have succeeded in eliminating servers.

A second reason is that many believe zombie servers are the responsibility of the IT teams, and not the responsibility of the facilities teams that manage space, infrastructure and power. While all parts of the organization benefit from removing the equipment, the biggest beneficiaries – the facilities teams – have limited visibility or responsibility.

Finally, there is a belief (and a good argument) that virtualization and cloud technologies will separate the physical and virtual layers, and that, ultimately, the issues of overprovisioning will move to cloud orchestration and capacity management tools, where they will be more easily solved. This may be the case in the long term, but evidence suggests that this is some way off and the problem of comatose servers will still occur to some degree.

One thing is clear – it is possible to eliminate zombie servers by adopting good processes and using the right tools. However, the choice of systems and how they interrelate is far from straightforward. In the second part of this two-part article, in next month’s magazine, we will describe the tools required, identify some suppliers and discuss how they may be integrated.