Tina Huang is Founder and CTO of Transposit, a platform that bridges the gap between developers and operations with automated human-machine workflows.
Judging by the ubiquitous adoption of the cloud, most organizations have concluded that the trade-off between increased complexity and the benefits of microservices and cloud infrastructure is worth it. In fact, Gartner predicts that nearly two-thirds (65.9%) of application software spending will go toward cloud technologies in 2025, up from 57.7% in 2022.
However, the current approach to managing complexity is ineffective. Organizations have mountains of data and tools, but little or no useful, maintainable documentation and no single place to find what they need. These barriers have hurt reliability, with 62.9% of organizations reporting an increase in service incidents affecting customers. The situation is also burning out talented people, as rapidly increasing complexity outstrips their ability to keep up.
Knowledge capture and the extension of automation are the fundamental mechanisms for making significant progress against today's increasing operational complexity. But current approaches to both are inadequate and too difficult to adopt; they must become incremental and approachable.
An explosion of best-in-breed tools and services
Gartner reported that the SaaS market has grown to $171.9 billion. Yet according to our company's recent report, only 24.7% of respondents say all of their tools are integrated through one tool or platform. It's becoming increasingly difficult to create cohesive toolchains that promote collaboration, visibility, and actionability across teams.
While the best cloud tools and services may address a subset of complexity, today’s professionals often need to further customize tools to fit their environment or adopt additional tools to fill capability gaps. For example, Terraform was built to manage the complexity of infrastructure provisioning. However, organizations are now looking to new SaaS tools to provide self-service infrastructure to people who lack the specialized skills needed to use tools like Terraform.
Even if tools arrived perfectly tailored to an organization's requirements, people would still have to set them up, integrate them with other tools, and develop tool-specific processes. Many organizations have legacy systems that don't work well with new tools. Even newer tools often don't provide the code-level customization needed to fully integrate into the toolchain. Site reliability engineers (SREs) are often tasked with supporting multiple platforms that live in silos, from infrastructure platforms like AWS and Azure to ticketing platforms like Jira and ServiceNow.
Insufficient documentation and clarity
Lack of immediate access to key how, who, when and where information slows down business processes ranging from incident response to infrastructure provisioning. In fact, “poor documentation” ranks as the top barrier to automation, according to our company’s report.
Documentation challenges become even more pronounced when combined with information sprawl. For example, when an SRE receives an alert, there may already be information about that event. However, because it isn't clear where that information resides, SREs are forced to gather it manually or work without it. Reliance on institutional knowledge prevents organizations from scaling processes and teams. The lack of a searchable, updatable knowledge base increases the likelihood that individuals will perform tasks inconsistently, multiplying the risks associated with change.
Today's engineers are drowning in manual labor. The complexity of running a data center governed by low-level hardware and configuration settings has been replaced by equally low-level, complex configuration settings controlled through APIs and software.
In the absence of automation, SREs do massive amounts of low-level manual work that keeps them from higher-value projects. This is especially true when it comes to recording human actions: over half of SREs (56.5%) say they manually enter data into an ITSM tool or another system of record. Manual tasks significantly impact businesses, slowing down processes and forcing organizations to increase headcount to handle the volume of work.
The power of approachable automation
To offset these challenges, teams need an incremental approach to automation that keeps people in the loop and allows business teams to better manage complexity.
Sitting in the middle of existing systems, approachable automation is built around a simple container, an activity, that collects and groups relevant information along with automation of various types. These include:
• Checklists: Simple to-do lists that allow for quick tracking of steps, providing the basis for capturing and evolving knowledge.
• Minimum Viable Automations: Approachable automation encourages small automations that have a significant impact on work, such as analyzing data, creating tickets, or configuring systems. In the beginning, much of the process is manual, and people bridge the gaps between automations. Over time, these small automations are assembled into larger ones as the context becomes better understood.
• Keeping people in the loop: Minimum viable automations are not orchestrated by a workflow system but by people who understand when each one is needed, guided by checklists.
• Automated history: When an activity is used for any business process, a timeline is automatically compiled to detail what happened. Over time, this history provides a framework for understanding when an automation is successful, how automations stack up, and how automations can be made more powerful.
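To make the activity model above more concrete, here is a minimal Python sketch of how the four elements can fit together. It is an illustration under stated assumptions, not the author's product or any real API: the `Activity` and `Step` classes, their fields, and the stubbed lambda actions (stand-ins for real calls such as fetching logs or opening a ticket) are all hypothetical. It shows a checklist of steps, small automations attached to some steps, a person deciding when each step runs, and a timeline recorded automatically as a side effect.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Callable, List, Optional, Tuple


@dataclass
class Step:
    """One checklist item; `action` is an optional minimum viable automation (hypothetical)."""
    description: str
    action: Optional[Callable[[], str]] = None  # None means a person does this step by hand
    done: bool = False


@dataclass
class Activity:
    """Hypothetical container grouping a checklist, its automations, and a timeline."""
    name: str
    steps: List[Step] = field(default_factory=list)
    timeline: List[Tuple[str, str, str]] = field(default_factory=list)

    def _record(self, actor: str, event: str) -> None:
        # Every step is appended to the history automatically; nobody types it in.
        stamp = datetime.now(timezone.utc).isoformat(timespec="seconds")
        self.timeline.append((stamp, actor, event))

    def run_step(self, index: int, person: str) -> None:
        # A person decides when to run each step (human in the loop);
        # an attached automation runs only when that person triggers it.
        step = self.steps[index]
        if step.action is not None:
            result = step.action()
            self._record(person, f"ran automation '{step.description}': {result}")
        else:
            self._record(person, f"completed manually '{step.description}'")
        step.done = True

    def remaining(self) -> List[str]:
        # Checklist view: steps not yet completed, in order.
        return [s.description for s in self.steps if not s.done]


# Usage: the lambdas are placeholders for real integrations (assumed, not real calls).
incident = Activity("High error rate on checkout", steps=[
    Step("Fetch recent error logs", action=lambda: "42 errors in last 5 min"),
    Step("Decide whether to roll back"),
    Step("Create ticket", action=lambda: "ticket created"),
])
incident.run_step(0, "alice")
incident.run_step(1, "alice")
print(incident.remaining())  # ['Create ticket']
for entry in incident.timeline:
    print(entry)
```

Keeping `action` optional is what makes this incremental in the sketch: a step can start life as a manual checklist item and later gain an automation without changing the surrounding process or the way history is recorded.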
When these elements are integrated with existing tools, teams find that many frustrations are addressed.
• Teams starting their automation journey can use the activities and checklists to gradually build a process that works for them.
• Minimum Viable Automations can significantly reduce effort and delay with simple automations that enable many actions to be triggered from collaboration environments like Slack.
• Activities become hubs to which alerts are routed, providing the knowledge reuse and automation that documentation and runbooks are meant to supply.
• APM, observability and ITSM tools are better integrated into the operations process.
Increasing operational complexity can create a crisis if not addressed systematically. Throwing tools and people at the problem may help in the short term, but ultimately doesn't solve the underlying problems. As a developer who has had to own and operate services, I know what it's like to be in a reactive mode, with piles of menial work thrown at you from all sides. Using approachable automation, business teams will be able to free themselves from that constant stress and get back to innovation. In the third installment of this Automation Playbook series, I'll delve into the basics of approachable automation.
The Forbes Technology Council is an invitation-only community for world-class CIOs, CTOs and technology executives.