Enterprise Artificial Intelligence: Building Trusted AI in the Sovereign Cloud
The decade of responsible intelligence has begun — are you ready?
Enterprise AI is hitting a wall: Public models aren’t trained on your business data, but you can’t hand over your organization's proprietary information to a public system. The definitive roadmap for this new reality is Enterprise Artificial Intelligence: Building Trusted AI in the Sovereign Cloud, a new book written by OpenText leaders. Listen now to learn why this book is a must for organizations looking to move from isolated AI experiments to enterprise-grade deployments.
Learn more here: https://www.opentext.com/resources/enterprise-artificial-intelligence-building-trusted-ai
Chapter 11: The Future of EAI and Operations Management
Explore the evolution of Enterprise AI and operations management, and how understanding it is essential for maximizing the benefits of Agentic AI deployments.
Chapter 11. The Future of EAI and Operations Management
Often overlooked, operations management plays a vital role within the enterprise by supporting all business units and driving business growth. As operations practices have transformed over time, AI has become pivotal in this realm. In this chapter, we will explore the evolution of enterprise AI and operations management and how understanding it is essential for maximizing the benefits of agentic AI deployments.
Earlier in this book, we examined the convergence of trusted data and AI in delivering innovative experiences and in operationalizing agentic AI. We also considered its effects on the workforce and within the organization, along with strategies for managing and maintaining an AI workforce. Clear and specific job descriptions and measurable KPIs for both human and digital workforces are essential for success. Assigning the digital workforce simple, logical, small, and well-defined tasks allows it to operate efficiently. When multiple digital agents work together, they can handle more complex tasks. Meanwhile, humans retain oversight, enabling them to quickly identify and address issues or anomalies. This approach is like conducting an orchestra: when each instrument plays its part, the conductor can easily detect if one is out of tune, ensuring a harmonious performance.
The pages ahead focus on all aspects of EAI-driven operations management: essentially, how to keep your infrastructure, platforms, data, human workforce, and now digital agents in harmony 24/7.
Discover how a global leader in healthcare technology improved its operations by implementing a sophisticated automated predictive maintenance platform that leverages advanced AI and machine learning algorithms.
Case study: A global leader in healthcare
Our predictive maintenance system, built on vast amounts of data and advanced AI models, allows us to detect and address potential issues before they impact clinical operations.
This improves the reliability of our equipment and enhances patient outcomes and satisfaction. Principal Architect Service.
A health technology organization faced mounting challenges maintaining its advanced medical imaging systems, the MRI and CT scanners that are vital to patient diagnosis and care. A single MRI unit can log over a million events and produce 200,000 sensor readings each day, spanning tens of thousands of data points. Yet, in such a complex and tightly regulated environment, medical devices take years to develop and certify. While they generate vast amounts of operational data, that data was never structured to enable predictive maintenance.
The transition was not only technical but also operational, because the organization had to rethink existing processes. It required the integration of massive datasets from medical devices, advanced analytics, and machine learning models to predict and prevent potential failures before they could disrupt patient care.
Improving patient health care is the organization's top priority. Using AI, it can process complex datasets efficiently and identify patterns that indicate imminent issues, allowing it to take preventative action well in advance. The company has integrated more than 200 data streams, real-time logs, error reports, and performance metrics into a single data warehouse holding over a decade of history and 1.5 petabytes of continuously refreshed information. Predictive models now mine this vast dataset to spot anomalies early, enabling proactive maintenance and reducing costly equipment downtime by 30%. The system has led to 50% of CT service cases being diagnosed and resolved remotely and an 84% first-time fix rate for on-site equipment issues, enhancing the company's service efficiency and improving overall patient care outcomes.
Defining the future of EAI beyond the technical horizon
The future of operations management will fundamentally hinge on the adoption of AI and its transformative impact on operational experiences. Several key factors are crucial.
1. Transitioning from reactive to autonomous operations. Being reactive is no longer an option in the operations domain. Cyber threats, as well as network and technology complexity, demand 24/7 autonomous operations that detect and act on issues before they impact customers.
2. Evolution of operations management. Over the past two decades, operations has undergone major transformations, becoming far more efficient through advances in data collection, management, and analysis.
3. Core elements of AI and operations. EAI operations depend on five key components: data utilization, intelligence formulation, decision-making processes, human intervention within the operational loop, and a comprehensive feedback life cycle. This holistic approach enables operations teams to navigate the transition from manual to automated methodologies effectively.
4. Application of AI in network and security operations. Agentic AI has the potential to transform network and security operations. Later in this chapter, we'll explore practical use cases that show how agentic AI enhances operational effectiveness.
5. Transformational impacts on operations metrics. The transitions described bring deep changes to core operations metrics such as mean time to restore (MTTR), service availability levels, outage numbers, incident numbers, and the ability to achieve five nines availability, when technology and services are up and running 99.999% of the time, the gold standard in operations.
Adopting AI will bring significant changes and benefits to each of these areas. Now that we've introduced them, we can look at each of the five transitions in detail.
1. From reactive to autonomous operations
The size and scale of networks and enterprise operations have made manual and reactive monitoring things of the past.
Historically, operations teams responded to issues as they arose, leading to a predominantly reactive operational posture. However, in recent years, operations leaders have been striving to enhance their teams' capabilities, transitioning toward a more proactive approach. This involves not only understanding potential incidents but also identifying and mitigating these issues before they escalate into significant outages.
Modern monitoring tools have advanced significantly, enabling the detection of minimal changes in operations, latency, or performance metrics related to specific services or applications. These tools are designed to provide early warnings of potential problems. Similar to fire alarms, they allow teams to act before a minor issue becomes a widespread disruption. Despite these advancements, a substantial segment of operational practices remains heavily reactive.
The evolution toward autonomous operations marks a significant shift in this paradigm. With the integration of EAI, operations teams can attain a more granular understanding of system dynamics, including the complex correlations between various activities and events that contribute to incidents. This deeper insight is crucial not only for enhancing the proactive capabilities of teams but also for implementing self-healing mechanisms within operations. Such autonomous capabilities enable systems to automatically address certain issues, thus alleviating the operational workload. As a result, operations teams can redirect their focus to higher-priority tasks and preventative measures that are essential for mitigating the occurrence of incidents.
The adoption of EAI provides a transformative opportunity to redefine how operations are managed, shifting from a primarily reactive stance to a more strategic and proactive framework that is essential in today's dynamic operational environment.
2. The evolution of operations management
Operations management has traditionally been associated with teams working in dark operational centers, constantly monitoring screens for critical alerts. However, the reality of operations management has significantly evolved. In its early stages, operations relied heavily on manual monitoring and troubleshooting of critical events. The introduction of automation through scripting represented a significant advancement, enabling teams to automate specific tasks, enhance repeatability, and reduce human error.
With the advent of artificial intelligence and the adoption of agentic AI, operations management can now leverage more sophisticated analytical capabilities. EAI tools can effectively assemble and analyze vast amounts of data, facilitating root cause analysis by identifying anomalies and detecting changes in data patterns. This process enables operators to correlate different datasets, which has become increasingly essential in identifying root causes of issues. The ability to quickly pinpoint these causes is critical to minimizing operational downtime.
Furthermore, EAI enhances predictive analytics within operations. By examining trends, activities, timelines, and the relationships between various symptoms, causes, and effects, AI facilitates the forecasting of operational challenges and outcomes. This marks a significant shift in how operations management is approached today and suggests a future in which operations management will be radically different.
The evolving landscape also necessitates changes in the skill sets required for operations management. There is a growing trend of developers joining operations teams in roles such as site reliability engineers (SREs). These professionals use their technical expertise to address incidents, identify root causes, and develop solutions in real time, thereby preventing issues from recurring.
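To make the idea of identifying anomalies and detecting changes in data patterns concrete, here is a minimal sketch, assuming a simple trailing-window z-score rather than whatever models an actual EAI platform would use. The function name `find_anomalies`, the `window` and `threshold` parameters, and the sample latency readings are all hypothetical and chosen only for illustration.

```python
from statistics import mean, stdev


def find_anomalies(readings, window=5, threshold=3.0):
    """Return the indices of readings that deviate sharply from recent history.

    Each reading is compared with the mean and standard deviation of the
    `window` readings before it (a trailing z-score). This is an assumed,
    deliberately simple stand-in for production anomaly detection.
    """
    anomalies = []
    for i in range(window, len(readings)):
        history = readings[i - window:i]
        mu = mean(history)
        sigma = stdev(history)
        if sigma == 0:
            # Perfectly flat history: treat any change at all as anomalous.
            if readings[i] != mu:
                anomalies.append(i)
            continue
        if abs(readings[i] - mu) / sigma > threshold:
            anomalies.append(i)
    return anomalies


# Hypothetical service latency samples in milliseconds, with one spike.
latency_ms = [20, 21, 19, 20, 22, 21, 20, 95, 21, 20]
print(find_anomalies(latency_ms))  # → [7]
```

A trailing window keeps the check cheap enough to run on every new reading, which is the property an early-warning monitor needs; the trade-off is that a single spike inflates the window's spread for a few readings afterward.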
A blend of domain expertise and programming proficiency, augmented by AI, is becoming increasingly pervasive across operations.
Today's operations center also serves as the primary training ground for applying enterprise AI and cutting-edge technologies to solve high-stakes challenges. Organizations that successfully position their operations centers as critical hubs of innovation and talent development are better equipped to attract, train, and retain top performers. These centers become the proving grounds where future leaders hone their development and problem-solving skills, ensuring a robust pipeline of talent ready to take on leadership roles within the organization.
As operations management continues to evolve, it serves as a valuable training ground for senior developers in product and technology companies. These individuals gain first-hand experience with incidents affecting customers and operations, allowing them to apply this knowledge in product development roles. Ultimately, this understanding of advanced operations management will significantly inform their future contributions to the field. Operations, once viewed as an antiquated round-the-clock function focused solely on monitoring issues, has evolved into the central nervous system of the organization, a true center of excellence for innovation.
A South African retailer is doing just this, analyzing data on how its teams use AI to best adapt its processes and optimize performance.
Case study: Leading retailer of consumer goods in Africa
"With AI integrated in our product testing processes, it takes literally two seconds to understand where we are per sprint, per release, per feature level, for every application we test in our omnichannel space." SQA Manager.
With thousands of stores across South Africa and in seven other countries, this leading retailer of consumer goods in Africa manages a massive portfolio of digital omnichannel applications.
The company also has a strong online shopping presence for its grocery, home, and clothing businesses, including a mobile app for ultra-fast local delivery of groceries. Keeping physical stores replenished and digital applications running smoothly is vital in a highly competitive market. Under constant time-to-market pressure, the retailer integrated AI into its processes to speed up product testing and release.
The AI-enabled acceleration of test case creation has opened the way for the retailer to adopt in-sprint and in-release automation. This has enabled it to automate much earlier in its two-week sprints, driving a massive increase in automation coverage from about 65% to about 95%. In just eight weeks, across almost 20 applications tested in its omnichannel space, the retailer has completed four to five releases a week. It has cut cycle times by 43%, increased release frequency 60 times, and improved test coverage with faster insights to speed time to market. With performance outcomes like these, the retailer is investigating the potential of applying AI across additional business functions.
3. Core components of AI-driven operations
As organizations evolve their operations and adopt advanced AI capabilities, several foundational layers must be addressed.
A. The data layer. Building a unified and accessible data layer is critical. Traditional operations often suffered from fragmented data spread across multiple systems, service management tools, and organizational silos. Bringing these datasets together and enabling AI agents to operate across them dramatically improves visibility and detection capabilities. For example, integrating security and network operations allows security logs to be analyzed alongside network data. This broader dataset enables teams to identify and resolve issues or incidents more effectively by seeing the full picture in real time.
B. The intelligence layer.
This is where language models, machine learning, and generative and agentic AI platforms operate. Within this layer, correlations are drawn, knowledge is built, and AI applications begin to work alongside human analysts. Generative AI enhances situational understanding and supports faster, more informed responses within the operations center. Agentic AI applications enable autonomous operations.
C. The decision layer. Analyzing data is one thing; making decisions based on it is another. In the early stages of AI adoption, most operations centers have preferred to have AI present insights while leaving the final decisions to human operators. As maturity grows, organizations are beginning to allow AI systems to make certain predefined or low-risk decisions autonomously. Over time, as AI models and governance structures evolve, these systems will handle more complex, repeatable decisions.
D. Human in the loop and continuous feedback. Even in AI-driven environments, humans remain essential. Their evolving role centers on oversight, contextual understanding, strategy, high-stakes judgment calls, and AI system improvement. Equally critical is the feedback loop. As incidents are resolved and root causes identified, feeding this information back into the system ensures continuous learning, which in turn makes each recovery faster and reduces MTTR. This minimizes repeat incidents and strengthens consistency in response. When operations teams address root causes effectively, incident volume drops, freeing time for proactive, innovative AI work that enhances resilience and efficiency across the operations center.
Working together, these key components enable the AI-driven operations required to effectively support agentic AI.
4. Agentic AI in network and security operations
The role of the network operations center (NOC) operator is one of high stress and tight deadlines and often involves searching for a needle in a haystack. Success is measured by how much time a task takes.
Before the introduction of agentic AI, NOC operators had to search vast databases for correlations and similar past issues that could help solve the problem at hand. The example below demonstrates the benefits of partnering with an agentic AI application. The agent can help search the database to better pinpoint past issues, glean learnings, and then help the operator resolve the current issue not in hours, but in minutes.
5. Transformational impacts on enterprise operations
Consider this. It's 2:47 AM when a global e-commerce NOC detects database latency spikes hammering its payment processing. The monitoring system automatically fires the alert, but agentic AI changes everything.
Time to resolution
Traditional NOC administrator, solo: 4+ hours. The operator manually reviews 20+ alerts, queries multiple log systems individually, searches memory for similar incidents, correlates data sources manually, and troubleshoots by trial and error.
AI agent plus NOC administrator: 14 minutes. The AI correlates all events automatically, searches 847 historical incidents instantly, identifies the root cause with 89% confidence, and presents ranked resolution options. The human approves; the AI executes and monitors.
Clarify. The agent immediately performs automated event correlation across the flood of alerts, distinguishing between the actual cause (a batch job conflict) and all the symptom events (latency spikes, timeout errors, queue backups). Instead of 20 confusing alerts, the operator sees one clear picture: database conflict detected, likely root cause identified.
Analyze. The AI agent queries security logs, network telemetry, application traces, and change management databases, while simultaneously searching through years of historical incident data in a vector database. In seconds, it finds 847 similar patterns and flags an 89% match to two past incidents.
Based on how the symptoms, timing, and system behaviors line up, it surfaces the smoking gun: change ticket CHG-45209 modified a batch job schedule that's now running during peak transaction times.
Diagnose. The AI doesn't just find correlations; it pinpoints the root cause by cross-referencing the batch job timing with transaction volume patterns and shows exactly when things went wrong. It presents the diagnosis: batch job user_analytics_aggregation_v2 conflicting with real-time payment processing, same pattern as incidents 3421 and 11203.
Resolve. Here's where the human comes back in. The AI agent offers three resolution options based on what has actually worked before, ranked by success rate and risk. The operator reviews them, considers the context (it's 2:52 AM, outside maintenance windows), and makes the call: suspend the batch job and set up automatic rollbacks if things go sideways. The AI executes the approved actions while maintaining continuous monitoring.
Total time from alert to resolution? 14 minutes. Without agentic AI, the same process would have taken over four hours. The agentic AI handled the heavy lifting at each stage, including automated correlation, intelligent analysis, precise diagnosis, and guided resolution. But the human stayed in control where it mattered most: approving the fix. That drove a meaningful reduction in MTTR. Not only that, but every incident handled this way enriches the historical database, making future responses even sharper.
Every operations center is measured through a range of key performance indicators (KPIs). Traditional metrics typically focus on service availability, with many organizations striving for five nines (99.999%) uptime, as well as incident volume, including the number and severity of major and minor incidents. Other core measures include MTTR, recovery time objective (RTO), and recovery point objective (RPO). While these metrics are important, they remain largely reactive.
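As a quick illustration of what these traditional measures imply in practice, the arithmetic can be sketched as follows. This is a minimal sketch under stated assumptions: the function names are hypothetical, a year is taken as 365 days, and the sample restore times simply reuse the four-hour and 14-minute figures from the scenario above.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes, ignoring leap years


def downtime_budget_minutes(availability_pct):
    """Maximum downtime per year allowed by an availability target."""
    return MINUTES_PER_YEAR * (1 - availability_pct / 100)


def mean_time_to_restore(restore_minutes):
    """MTTR as the plain average of per-incident restore times."""
    return sum(restore_minutes) / len(restore_minutes)


# Five nines leaves only a few minutes of downtime in an entire year.
print(round(downtime_budget_minutes(99.999), 2))    # → 5.26 (minutes)

# Three nines, by comparison, allows several hours.
print(round(downtime_budget_minutes(99.9) / 60, 2))  # → 8.76 (hours)

# Hypothetical restore times: one manual four-hour incident, one 14-minute one.
print(mean_time_to_restore([240, 14]))               # → 127.0 (minutes)
```

Under these assumptions, a single four-hour incident consumes roughly 45 years of a five nines budget, which is why shaving restore time matters so much. Both the downtime budget and MTTR are computed from incidents that have already happened.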
They assess how well a team responds to disruptions rather than how effectively it prevents them. As AI becomes embedded in operations, organizations must begin to incorporate proactive metrics. For example, how many incidents were detected or mitigated by AI before they escalated? How quickly did agentic AI systems identify early indicators of failure or compromise? What percentage of total incidents are being managed autonomously by AI agents?
Measuring the impact of agentic AI separately helps organizations understand the true transformational effect of adoption. Success should be reflected in measurable improvements such as a reduction in the total number of incidents, particularly major ones; faster time to identify and restore from incidents; and greater operational resilience driven by predictive monitoring and automated response. Generative and agentic AI bring the capabilities to find the needle in the haystack, uncovering root causes quickly and enabling earlier intervention.
The evolution of operations must progress hand in hand with the adoption of AI. Too often, organizations focus on deploying AI models while neglecting to modernize their operations centers. A team relying on manual monitoring and traditional workflows cannot keep pace with an enterprise moving toward intelligent, AI-augmented operations. Moreover, as threat actors increasingly use AI to automate and amplify cyberattacks, security operations centers must evolve in parallel, leveraging AI to maintain parity and defend at machine speed. This transformation is not optional. It is essential for maintaining competitiveness, resilience, and trust in the era of AI-driven enterprise operations.
In this new era, the ability to effectively manage organizational data is what will ultimately empower you to get the most out of AI. Leveraging enterprise AI to manage, govern, and continually optimize data operations is essential.
As operations centers evolve into nerve centers of innovation and resilience, data stewardship and strategy must remain at the core of every executive agenda. Enterprises that treat data not only as a resource but as a strategic asset, and apply AI to maintain its health and availability, will be best positioned to unlock the full transformative potential of AI at scale.
In the following case study, a major manufacturer is unlocking the power of data for insights into its processes to cut costs and reduce its reliance on manual work.
Case study: North Star BlueScope Steel
A subsidiary of Australia-based BlueScope, North Star BlueScope Steel produces and supplies hot rolled steel bands for coil processors, cold rolled strip producers, pipe and tubers, original equipment manufacturers, and steel service centers. Founded in 1997, the company is the largest scrap steel recycler in Ohio, recycling nearly 1.5 million tons of scrap steel every year.
North Star BlueScope Steel needed a more efficient tool to help it more accurately understand its costing data and workflow so the company could use it to engage with customers, conduct market-based analysis, and build purchasing breakdowns. The technology would have to eliminate the manually intensive process and collect data automatically from a variety of sources, including databases in the company's electric arc furnaces (EAF), allowing it to reallocate staff and save on resources, all while better meeting customers' needs.
The company opted to use intelligent data and analytics to automatically access, blend, explore, and analyze data. The solution allows North Star BlueScope Steel to apply algorithms to extracted information to generate a final monthly report, reducing the reliance on manual work. Using data and analytics, the company can compare month-to-month data to analyze how events such as plant delays and bottlenecking might affect profitability.
Embracing the IoT, the company hopes to integrate analytics into data points coming directly from its instruments to analyze electricity consumption, weather patterns, material usage, and steel prices for a better idea of future needs and sales potential.
The Fast Five Download
1. Drive the shift to autonomous operations. Lead your organization to move beyond reactive monitoring by investing in AI-powered, self-healing systems that proactively prevent incidents and optimize performance.
2. Champion human-AI collaboration. Ensure your teams are equipped to work alongside AI, establish governance frameworks, empower human oversight, and create continuous feedback loops to maximize both safety and innovation.
3. Elevate operations centers to innovation hubs. Transform your operations center from a traditional support function into a strategic engine for enterprise innovation and talent development, making it a focal point for AI adoption and experimentation.
4. Make AI adoption a board-level priority. Treat the integration of AI into operations as a critical competitive differentiator. Allocate executive attention and resources to accelerate adoption, strengthen security, and build organizational trust.
5. Redefine success metrics for the AI era. Move beyond traditional reactive KPIs. Implement new metrics that track AI's impact, such as incidents prevented, response speed, and autonomous actions, to accurately measure value and drive continuous improvement.
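The fifth takeaway can be made concrete with a small sketch of how such AI-era metrics might be tallied from incident records. This is only an illustration under assumptions: the `Incident` fields, the `proactive_metrics` function, and the sample records are hypothetical, and a real operations platform would define "prevented" and "autonomous" according to its own governance rules.

```python
from dataclasses import dataclass


@dataclass
class Incident:
    detected_by_ai: bool         # AI flagged it before a human or customer did
    escalated: bool              # it grew into a customer-impacting incident
    resolved_autonomously: bool  # closed by an AI agent without human action


def proactive_metrics(incidents):
    """Summarize the share of incidents prevented or handled by AI."""
    total = len(incidents)
    if total == 0:
        return {"prevented_pct": 0.0, "autonomous_pct": 0.0}
    # "Prevented" here means the AI caught it and it never escalated.
    prevented = sum(1 for i in incidents if i.detected_by_ai and not i.escalated)
    autonomous = sum(1 for i in incidents if i.resolved_autonomously)
    return {
        "prevented_pct": 100 * prevented / total,
        "autonomous_pct": 100 * autonomous / total,
    }


# Four hypothetical incident records from one reporting period.
sample = [
    Incident(detected_by_ai=True, escalated=False, resolved_autonomously=True),
    Incident(detected_by_ai=True, escalated=False, resolved_autonomously=False),
    Incident(detected_by_ai=False, escalated=True, resolved_autonomously=False),
    Incident(detected_by_ai=True, escalated=True, resolved_autonomously=False),
]
print(proactive_metrics(sample))
# → {'prevented_pct': 50.0, 'autonomous_pct': 25.0}
```

Tracking these percentages alongside availability and MTTR, rather than in place of them, keeps the AI contribution visible as a separate trend over time.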