The cluster sprawl just over the horizon

In recent years, Kubernetes has exploded in popularity among organisations trying to harness the power of cloud native. And it’s been hugely successful, with project teams able to adopt new Kubernetes infrastructure – or clusters – at an incredible pace.

By Tobi Knaup, Co-CEO & Co-Founder at D2iQ.

Monday, 5th October 2020. Posted in AI AIOps by Phil Alsop

For a few years, all has been going well, but many organisations more advanced in their Kubernetes journey are now running into a problem: cluster sprawl. The ease of spinning up new environments with Kubernetes means disparate clusters exist across the organisation, with little standardisation and, as a result, significant waste and risk.

While the problem of sprawl in IT isn’t a new one (a similar problem occurs with virtual machines, for example), the lack of maturity in the Kubernetes market means it’s one most aren’t aware they need to act on. Understanding how to manage cluster sprawl – and how to avoid it in the first place – will be important for businesses scaling their cloud native infrastructure.

What is cluster sprawl?

Not too long ago, applications were simple and limited. Developer teams knew where an application resided because it was typically monolithic and connected to simple middleware and backend data sources, and all components were manually assigned to on-site systems, which often had pet names to make them easy to remember. Today it isn’t quite so simple. To keep pace with the ever-changing digital landscape, organisations are adopting open source and cloud native technologies quickly, and that means more clusters. One team may be building a stack on one cloud provider using its favourite set of tools, while another team is building a different stack on a different cloud provider, using its own favourite tools. And if teams are provisioning and using clusters with different policies, roles, and configurations, the organisation can quickly lose sight of where those clusters exist and how they are being managed. This is cluster sprawl.

Finding yourself in the midst of cluster sprawl is not only a headache from the point of view of managing your infrastructure; it can also lead to security issues and a huge amount of waste. With no centralised governance or visibility into clusters deployed across the organisation, security controls may be inconsistent. That increases the risk of vulnerabilities within applications, makes them more difficult to support, and can lead to compliance, regulatory and IP challenges down the line.

In addition, cluster sprawl wastes resources, because each new cluster brings the overhead of managing a separate set of configurations. When it comes to patching security issues or upgrading versions, a team ends up doing the same work many times over, deploying services and applications repeatedly within and across clusters. On top of that, all configuration and policy management, such as roles and secrets, is repeated, wasting time and creating a greater opportunity for mistakes.

Visibility brings everything together

One of the reasons Kubernetes has become so popular amongst developers is that it allows them to spin up their own environments with ease, enabling them to rapidly deploy code at scale. That same ease is exactly what leads to cluster sprawl. Developers tend to lose that flexibility when their platforms are brought into IT operations, which needs consistent ways of administering clusters, standardised user interfaces, and the ability to manage and obtain insights about its infrastructure. So dealing with cluster sprawl (and preventing it from occurring at all) requires a careful balance between developer flexibility and the need for IT governance.

The first step on this journey is visibility. Organisations need a clear view of all their clusters and workloads at once, through a centralised control plane, so they know what they are dealing with. Not only does this provide an understanding of where clusters are, it also allows IT teams to obtain insights and troubleshoot problems much more quickly. It also provides centralised governance to ensure consistency, security, and performance across the business’s digital footprint, which is key in the long term and as the number of deployments grows. With centralised visibility, mission-critical cluster information can be viewed at a glance and any issues arising within applications can be monitored in one place, without valuable time and resources being lost to troubleshooting. Ideally, this visibility will be in place from the very start of an organisation’s Kubernetes journey. However, because Kubernetes is a nascent technology, many organisations may suddenly find themselves in the weeds. Visibility can, and should, still be achieved with a control plane that offers a bird’s-eye view of the cluster landscape.
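To make the idea of a single view concrete, here is a minimal sketch of the kind of roll-up a centralised control plane might present. It is illustrative only: the `ClusterRecord` type, the `summarise` function and the sample cluster names, teams and versions are all hypothetical, and a real control plane would discover this information from the clusters themselves rather than from a hand-written list.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class ClusterRecord:
    """One row in a hypothetical organisation-wide cluster inventory."""
    name: str
    team: str
    provider: str
    kubernetes_version: str


def summarise(clusters):
    """Roll the inventory up into the at-a-glance counts an operator might want."""
    return {
        "total": len(clusters),
        "by_provider": dict(Counter(c.provider for c in clusters)),
        "by_version": dict(Counter(c.kubernetes_version for c in clusters)),
        "by_team": dict(Counter(c.team for c in clusters)),
    }


# Hypothetical sample data, invented for illustration.
inventory = [
    ClusterRecord("payments-prod", "payments", "aws", "1.18"),
    ClusterRecord("payments-dev", "payments", "aws", "1.16"),
    ClusterRecord("search-prod", "search", "gcp", "1.17"),
    ClusterRecord("legacy-batch", "data", "on-premises", "1.16"),
]

summary = summarise(inventory)
print(summary)
```

Even a summary this simple shows how many distinct providers and Kubernetes versions are in play, which is often the first sign of sprawl.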

Maintaining governance

While visibility provides the initial insight into what you are dealing with, it has to be combined with the creation of policies to ensure everything continues to run smoothly. All organisations will have unique governance and access control requirements based on the type of business they are in, but policies allow admins to assert control over how clusters are being created and run, reducing risk in the environment. For example, organisations need to be able to govern which software is sanctioned and which versions can be used within which projects. This type of version control reduces the potential vulnerability surface area and also helps to deliver support more effectively, by providing a catalogue of software that the organisation has approved and that teams can draw on when needed. Policies are also critical for access control.
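As a rough illustration of the sanctioned-software policy described above, the sketch below checks what is deployed against an approved catalogue. Everything in it is an assumption made for the example: the `APPROVED` catalogue, the `find_violations` function, and the cluster, software and version values are hypothetical and do not reflect any particular product or policy engine.

```python
# Hypothetical catalogue: software name -> set of approved versions.
APPROVED = {
    "nginx": {"1.18", "1.19"},
    "postgres": {"12", "13"},
}


def find_violations(deployments, approved):
    """Return (cluster, software, version, reason) for each deployment outside the catalogue."""
    violations = []
    for cluster, software, version in deployments:
        allowed = approved.get(software)
        if allowed is None:
            violations.append((cluster, software, version, "software not sanctioned"))
        elif version not in allowed:
            violations.append((cluster, software, version, "version not approved"))
    return violations


# Hypothetical deployments observed across clusters.
deployments = [
    ("payments-prod", "nginx", "1.19"),
    ("payments-dev", "nginx", "1.14"),
    ("search-prod", "postgres", "12"),
    ("legacy-batch", "redis", "5"),
]

violations = find_violations(deployments, APPROVED)
for violation in violations:
    print(violation)
```

Keeping the catalogue in one place means the same check can run against every cluster, rather than each team maintaining its own list.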

On the Kubernetes journey, staff may change their roles and responsibilities, which makes it difficult to manage individual logins and account privileges, assess governance risk, and perform compliance checks against industry models and in-house policies. Admins need a simple way to provision Role-Based Access Control (RBAC) that provides flexibility in configuring access as users’ roles within an organisation change. This also balances the need for developer flexibility and IT control by supporting a division of labour across developers, operations and any other necessary roles across the business.
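The value of role-based access is easiest to see when someone changes job. The following is a minimal sketch of the concept, not of Kubernetes’ own RBAC API: the role names, the permission strings, the `user_roles` mapping and the `is_allowed` function are all hypothetical. Permissions attach to roles, so moving a user to a new role is a single change rather than a review of individual privileges.

```python
# Hypothetical roles and the actions each role may perform.
ROLE_PERMISSIONS = {
    "developer": {"deploy", "view-logs"},
    "operator": {"deploy", "view-logs", "upgrade-cluster", "manage-secrets"},
    "auditor": {"view-logs", "view-policies"},
}

# Users are bound to a role, never to individual permissions.
user_roles = {"alice": "developer"}


def is_allowed(user, action):
    """Check an action against the permissions of the user's current role."""
    role = user_roles.get(user)
    return action in ROLE_PERMISSIONS.get(role, set())


before = is_allowed("alice", "upgrade-cluster")

# Alice moves from development to operations: one binding changes.
user_roles["alice"] = "operator"
after = is_allowed("alice", "upgrade-cluster")

print(before, after)
```

Unknown users and unknown roles fall through to an empty permission set, so the default in this sketch is to deny.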

Of course, very few organisations will move to the cloud in one quick sprint, and many will maintain a combination of on-premises and cloud-based infrastructure. Any governance framework therefore has to extend to all aspects of cloud use, and all processes must be standardised across the whole infrastructure, whether it’s on-premises or in the public cloud.

A smoother journey to cloud native

Without intervention, many organisations are going to find themselves dealing with cluster sprawl at some point on their Kubernetes journey. However, with a centralised control plane, oversight can be regained and cluster sprawl eased. With governance over, and lifecycle management of, disparate Kubernetes clusters, admins will be able to maintain multi-cluster health, manage distributed operations, leverage operational insights and retain control of policies without interfering with the development process. Organisations that contain cluster sprawl, and in doing so increase the security, manageability and governance of their enterprise-grade Day 2 operations, will find themselves on a much smoother cloud native journey.
