Infrastructure as Code (IaC) continues to grow and has significant potential benefits for data centre owners and managers, but there can be a big gap between the theory and successful deployment. Having worked both in-house as a DevOps engineer at some of the world’s largest companies and now in a vendor-side role, I have witnessed first-hand what does and does not work.
To help organisations shorten the route to IaC success, I would like to share some best practices, as well as some pitfalls to avoid. But first, it helps to clarify what IaC is: using technology to control software, hardware, network components, data storage, and operating systems. IaC manages and provisions data centre resources through machine-readable definition files rather than through manual configuration of physical hardware.
The resulting advantages can be powerful. IT is deployed faster, and it becomes easier for end users to access what they require, thus reducing the number of requests to IT operations teams. Costs decrease due to less dependency on physical hardware, plus it is easier to update and patch IT systems, even at remote sites. Security and compliance can be drastically improved, because the IT infrastructure is continuously kept in a ‘desired’ state.
However, successful IaC is as much about people and processes as it is about applying technology. The best IaC projects work with current teams and embed themselves in their processes. Involve all stakeholders (existing teams, consultants, vendors, and other relevant in-house teams) in all IaC explorations and plans right from the start. Invite everyone to meetings and regular stand-ups. This is the best way to agree on a scope and focus that has everyone’s buy-in.
Focus and scope
It is best to avoid the temptation to chase big goals quickly in order to show an impressive ROI. Decision makers and other colleagues might be pressuring the IT team for a result that justifies the budget commitment. However, computer configuration is complex, so it makes sense to start small with some of the more achievable wins that still demonstrate ROI. At the same time, show the future vision but emphasise that it must be an incremental journey.
Once the scope and focus have been agreed upon, it is important to set the minimal acceptance criteria (an equivalent to a minimal-viable-product) for code to be delivered to production and, therefore, which tests it must first pass. In addition, be wary of scope creep, such as expanding on the use cases for IaC: as implementation shows its benefits, other teams may want to add extra use cases. While it is important to keep an open mind, it is also essential to stay focused on the end goal or vision.
Of course, introducing IaC into a greenfield site is easier than into a brownfield or heritage one, especially if the latter has a lot of existing configuration drift (where ongoing, often small, changes or misalignments with policy accumulate over time). Furthermore, many organisations are the result of mergers and acquisitions, so there could be multiple configuration standards to deal with.
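To make configuration drift concrete, here is a minimal Python sketch that compares a desired baseline with what a node actually reports. The setting names and values are hypothetical, and real tools gather this data through agents rather than hand-written dictionaries.

```python
from typing import Any


def find_drift(desired: dict[str, Any], actual: dict[str, Any]) -> dict[str, tuple[Any, Any]]:
    """Return {setting: (desired_value, actual_value)} for every mismatch.

    A setting that is missing on one side is reported with None in that position.
    """
    drift = {}
    for key in sorted(desired.keys() | actual.keys()):
        want, have = desired.get(key), actual.get(key)
        if want != have:
            drift[key] = (want, have)
    return drift


# Hypothetical baseline and a node whose configuration has drifted over time.
desired_state = {
    "ntp_server": "ntp.example.internal",
    "ssh_root_login": "no",
    "agent_version": "8.2.0",
}
node_state = {
    "ntp_server": "ntp.example.internal",
    "ssh_root_login": "yes",
    "agent_version": "7.9.1",
    "telnet_enabled": "yes",  # present on the node but not in the baseline
}

for setting, (want, have) in find_drift(desired_state, node_state).items():
    print(f"{setting}: expected {want!r}, found {have!r}")
```

Running the same comparison against baselines inherited from different merged organisations is one simple way to see how far their standards diverge.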
One pattern of IaC adoption is to build up automation in heritage services incrementally to create confidence, for instance by installing software agents on all nodes within the existing infrastructure. This provides better visibility of the current state, making it easier to manage the roll-out progressively, and supplies valuable data for the configuration management database (CMDB).
Next, look at orchestration. There are probably common scripts and tasks across the heritage IT estate that various teams perform manually or semi-automatically. Using an orchestrator, these scripts and tasks can be wrapped to give greater access control and management, plus the option to be triggered on events. In this way, there is no need to rewrite working scripts in a new language, and the toil and risk of using them is reduced.
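The wrapping idea can be sketched in a few lines of Python. This is an illustration only, not how any particular orchestrator is built: the `Task` and `Orchestrator` classes, the role names, and the `disk_usage_high` event are all hypothetical, and a one-line Python command stands in for an existing legacy script.

```python
import subprocess
import sys
from dataclasses import dataclass, field


@dataclass
class Task:
    """An existing script, run unchanged, with access rules and event triggers attached."""
    name: str
    command: list[str]
    allowed_roles: set[str]
    triggers: set[str] = field(default_factory=set)


class Orchestrator:
    def __init__(self) -> None:
        self.tasks: dict[str, Task] = {}
        self.audit_log: list[str] = []

    def register(self, task: Task) -> None:
        self.tasks[task.name] = task

    def _execute(self, task: Task, reason: str) -> str:
        # check=True raises CalledProcessError if the script exits non-zero.
        result = subprocess.run(task.command, capture_output=True, text=True, check=True)
        self.audit_log.append(f"RAN {task.name} ({reason})")
        return result.stdout.strip()

    def run(self, name: str, role: str) -> str:
        """Run a task on request, but only for roles that are allowed to use it."""
        task = self.tasks[name]
        if role not in task.allowed_roles:
            self.audit_log.append(f"DENIED {name} (role={role})")
            raise PermissionError(f"role {role!r} may not run {name!r}")
        return self._execute(task, f"role={role}")

    def on_event(self, event: str) -> dict[str, str]:
        """Run every task subscribed to the event and return each task's output."""
        return {
            task.name: self._execute(task, f"event={event}")
            for task in self.tasks.values()
            if event in task.triggers
        }


orch = Orchestrator()
orch.register(Task(
    name="rotate-logs",
    command=[sys.executable, "-c", "print('logs rotated')"],  # stand-in for a legacy script
    allowed_roles={"ops"},
    triggers={"disk_usage_high"},
))
print(orch.run("rotate-logs", role="ops"))
print(orch.on_event("disk_usage_high"))
print(orch.audit_log)
```

The script itself is never rewritten. The wrapper only decides who may run it, when it runs, and what gets recorded.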
Then look at baseline configuration and find something non-negotiable to start with, such as the versions of application agents that must be kept upgraded to prevent vulnerabilities. This leads naturally to implementing the tools required for automated audit reporting and compliance remediation.
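As a rough illustration of such a baseline check, the Python sketch below audits an inventory of agent versions against a minimum. The node names, version numbers, and the simple dotted-number version format are assumptions made for the example.

```python
def parse_version(text: str) -> tuple[int, ...]:
    """Turn '8.2.0' into (8, 2, 0) so versions compare numerically, not as strings."""
    return tuple(int(part) for part in text.split("."))


def audit_agent_versions(inventory: dict[str, str], minimum: str) -> dict[str, list[str]]:
    """Split nodes into those that meet the minimum agent version and those that do not."""
    floor = parse_version(minimum)
    report: dict[str, list[str]] = {"compliant": [], "needs_upgrade": []}
    for node, version in sorted(inventory.items()):
        bucket = "compliant" if parse_version(version) >= floor else "needs_upgrade"
        report[bucket].append(node)
    return report


# Hypothetical inventory, for example as reported by the agents on each node.
inventory = {"web-01": "8.2.0", "web-02": "7.10.3", "db-01": "8.10.1"}
report = audit_agent_versions(inventory, minimum="8.2.0")
print(report)
```

The `compliant` list is the raw material for an audit report, and the `needs_upgrade` list is the work queue for remediation. Comparing parsed numbers matters here: as plain strings, "8.10.1" would sort before "8.2.0".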
To cloud or not to cloud
Implementing IaC often coincides with a move to the cloud, which can potentially deliver significant benefits, such as greater flexibility, reduced costs (though this is not always the case in reality), access to cloud-specific technologies, and a reduced burden on IT operations. For instance, spreading compilers across availability zones to minimise the risk of data centre failures is easy in the cloud, whereas it is a complex feature to implement in private data centres.
However, once the legacy private data centre environment has been examined ahead of IaC implementation, it is vital to consider how things could differ in the cloud. I have seen two common mistakes in cloud adoption. The first is a wholesale copy of all infrastructure, processes and components, exactly as they work in the private data centre, to the public cloud without first understanding what is a suitable fit.
This can lead to unexpected bills, because inflexible infrastructure has been copied across without considering that the public cloud is based on a rental model. Also, many solutions that work well in a private data centre are better implemented as cloud-native solutions in the public cloud.
The second problem is the opposite: leaving everything behind, which may happen because teams are frustrated with inefficient internal processes and delivery times. However, that means losing the hard-won lessons learned in private data centres, including best practices for audit, configuration, and testing. Teams then have to start from scratch, and issues inevitably surface in the new, fractured environment.
Cost is a significant factor when choosing cloud, but so is whether an organisation wants the benefits of cloud-native features, such as availability sets and load balancers, which allow easy horizontal scaling of infrastructure. The public cloud might also be used to jump-start a new, more cloud-native way of working, in which case keeping cloud and private infrastructure separate is more logical.
Agile and platform engineering
I am also an advocate of Agile and a platform engineering approach to enhance IaC's success. Good Agile sprint practices include having epics, which are then broken down into a small number of focused objectives. Each task should be possible to complete in a regular time-boxed sprint cycle (commonly two weeks). Once the sprint finishes, these new features can be demonstrated to stakeholders. Retrospectives at the end of sprints help ensure that what is being worked on is still effective and that issues are being resolved. This also helps prevent developers from working in isolation.
Platform engineering is another hot industry topic, and — once IaC has been implemented — it makes a major contribution to ongoing IaC success, making it easier for users to access what they need while preventing IT operations teams from being overloaded by requests. Platform engineering means having a platform team responsible for the management of tooling, workflows, and a self-service platform for end users. This platform should be treated as a product and its users as customers, ensuring their needs are met and that the platform continues to evolve to meet those needs.
IaC has much to offer, but while there are some impressive success stories, there are also some problematic or even failed IaC projects. So the fundamentals must all be considered before starting a new IaC transformation: what the aim is, what the current IT environment is, what to take or leave behind, the choice of cloud, and how it will be managed once deployed. And, above all, involve everyone: successful IaC is a team effort.