Preventing unplanned application downtime with machine learning

By Paul Scarrott, EMEA director at Nimble Storage.

With the rise in the consumerisation of technology and society’s desperate ‘need for speed’, employees expect immediate access to their applications and data, at any time, and without interruption. But reality rarely meets such expectations.
 
Recent research commissioned by Nimble Storage revealed that nearly two-thirds of Brits believe that the speed of their work applications significantly impacts their performance. While the time wasted waiting for apps to load is often dismissed as mere seconds, it quickly adds up.
 
And as organisations increasingly progress digital transformation policies, with new projects and processes driven by hundreds or thousands of applications, these wasted moments can have a dramatic impact on a business’s output. Indeed, employees currently experience on average four software delays each day at work, each lasting around seven seconds, which, when measured against the UK’s average hourly wage, costs the British economy a massive £744,235,520 every year.
 
The problem of reacting
 
These issues are caused by an app-data gap: delays in delivering data to the application prevent information from being instantly available and cause processes to slow down. This ultimately results in a performance bottleneck, which not only hurts employee productivity but prevents the entire business from operating effectively.
 
Great responsibility therefore falls on the IT team to remediate such issues: when application performance is impacted by the app-data gap, the team must launch into reactive mode. In the best case, complaints from users start a troubleshooting process which may descend into a vicious circle of finger pointing between the VM, networking, storage and development teams. In the worst case, the downtime leads to a fire drill with an ‘all hands on deck’ approach and sleepless nights.
 
This is a dangerous cycle for IT leaders. With the team constantly reacting to application breakdowns, little opportunity is left to deliver proactive initiatives in partnership with the business. As a result, IT is perceived as a barrier to productivity, rather than championed as a driver of increased competitiveness.
 
Barriers to data velocity 
 
The application breakdowns which cause these delays pose a significant challenge to IT teams. Real forensic work is required if IT leaders are to unravel the maze of issues throughout the organisation’s infrastructure that contribute to delays in the delivery of data to applications.
 
Storage is normally the first suspect for slow app-data delivery. More frequently, however, the app-data gap is the result of complexity across the entire data centre.
 
Recent research conducted by Nimble Storage into the origins of application breakdowns in more than 7,500 companies found that more than half of problems (54%) arose from interoperability issues, misconfiguration, and failures to follow best practice, none of which were related to storage. From these roots, a common chain of events followed: application breakdowns led to the creation of an app-data gap, which then disrupted the delivery of data to user applications.
 
One fundamental reason for this is in the way that data centre infrastructure is purchased. Whether sold from a single vendor or from multiple vendors, most data centre components are designed independently. Indeed, too frequently these “best of breed” technologies are originally built by start-ups seeking to optimise individual functions, rather than to achieve overall infrastructure interoperability.
 
And with these start-ups often acquired by large IT vendors to build up their portfolio, interoperability issues between adjacent products from the same company are not uncommon. And once you add in server and desktop virtualisation, the resulting infrastructure stack is diverse and complex.
 
Data-driven optimisation
 
Optimising across the entire data centre requires IT teams to analyse interactions between components. Machine learning and data science are now increasingly being deployed to harness the big data gathered around the data centre to help remediate these issues. Deploying such solutions, IT teams can:
 
• Analyse the performance metrics gathered from a large volume of ‘healthy’ (high-performing) environments. Creating this baseline helps identify poor performance early, before users perceive an impact (a minimal sketch of this baseline approach follows the list).
 
• Correlate a number of elements across the infrastructure to identify the root cause. By using the time of an event and comparing sensors across the environment, IT teams can identify cause-and-effect relationships (a second sketch below illustrates this).
 
• Prevent problems from arising by highlighting interoperability issues between different releases of different components, based on results observed in other environments, and then recommending actions to avoid conflicts.
 
• Use machine learning to evolve software releases, optimising performance and availability using correlations from across the stack.
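The baseline idea in the first bullet can be made concrete in a few lines of Python. This is a minimal sketch under stated assumptions, not Nimble Storage’s actual implementation: the metric name, baseline values and threshold below are all hypothetical illustrations.

```python
# Minimal sketch of baseline-driven anomaly detection. FLEET_BASELINE and
# is_abnormal are hypothetical names for illustration, not a vendor API.
from statistics import mean

# Baseline derived from a large population of 'healthy' environments:
# mean and standard deviation of application read latency in milliseconds.
FLEET_BASELINE = {"read_latency_ms": (2.0, 0.5)}

def is_abnormal(metric, samples, z_threshold=3.0):
    """Return True when recent samples drift beyond z_threshold standard
    deviations from the healthy-fleet baseline for this metric."""
    base_mean, base_std = FLEET_BASELINE[metric]
    z = (mean(samples) - base_mean) / base_std
    return abs(z) > z_threshold

# A burst of 6-8 ms reads against a 2 ms baseline trips the alert long
# before most users would consciously notice the slowdown.
recent_samples = [6.1, 7.4, 6.8, 8.0, 7.2]
if is_abnormal("read_latency_ms", recent_samples):
    print("Read latency abnormal vs. healthy baseline - investigate app-data gap")
```

The design point is simply that the threshold comes from a population of healthy environments rather than from each customer’s own history, which is what lets problems be flagged before local users feel them.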
 
By flagging potential issues and abnormal behaviour before they arise, recommending steps to bring the environment back to ‘peak health’, and feeding what machine learning uncovers back into ongoing software releases, IT teams can deliver optimal application performance and availability.
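The cause-and-effect correlation described in the second bullet can be sketched in the same spirit: order timestamped events from every layer of the stack and treat the earliest event in a tight time window as the candidate root cause. Again, every event source and message below is invented for illustration; real telemetry schemas will differ.

```python
# Hedged sketch of time-window correlation across infrastructure layers.
from datetime import datetime, timedelta

# (timestamp, layer, message) tuples gathered from sensors across the stack.
events = [
    (datetime(2017, 5, 2, 9, 14, 58), "network", "switch port flap on uplink"),
    (datetime(2017, 5, 2, 9, 15, 1), "storage", "iSCSI session reset"),
    (datetime(2017, 5, 2, 9, 15, 3), "vm", "guest I/O latency spike"),
]

def correlate(events, window=timedelta(seconds=10)):
    """Group events that occur within `window` of one another; the earliest
    event in each group is a candidate root cause for the later symptoms."""
    ordered = sorted(events)
    groups, current = [], [ordered[0]]
    for event in ordered[1:]:
        if event[0] - current[-1][0] <= window:
            current.append(event)
        else:
            groups.append(current)
            current = [event]
    groups.append(current)
    return groups

for group in correlate(events):
    root_ts, root_layer, root_msg = group[0]
    print(f"Candidate root cause: [{root_layer}] {root_msg}")
    for ts, layer, msg in group[1:]:
        print(f"  downstream symptom: [{layer}] {msg}")
```

In practice the grouping would run over far richer telemetry and more sophisticated models, but the principle is the same: correlate by time across layers, and suspect the earliest event rather than the layer where the symptom surfaced.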
 
Fast, uninterrupted applications have become essential to every business process: from enhancing product development and improving customer interaction, to running the back office. And with digital transformation now driving an ever greater number of applications and software-driven services, closing the app-data gap and reducing the barriers to data velocity has become a business-critical task.