Networks are becoming increasingly complex, making effective and efficient network management a challenge. With new and emerging technologies and the increasing adoption of cloud, users expect faster network speeds and seamless network availability. In addition, security threats are more advanced and agile. As the network incorporates more devices, tools, applications, systems, and now work-from-home users, the complexity escalates. The larger the network, the greater the potential for failure.
Networks now need to be intelligent. Artificial intelligence (AI) and machine learning (ML) will be critical in automating network operations and optimizing the end-user experience. This paper looks at how AI, ML, and automation can change existing network operations and simplify the life of network engineers. It explores how businesses can leverage AI/ML to build a truly self-healing, autonomous network, going beyond automation and work done by bots.
Effective network operations in a new age technology world
With new applications being developed every day, the ‘networking’ stack plays a vital role in ensuring secure, anytime access, regardless of where the applications are hosted: on premises or in the cloud.
The pressure on the network to adapt quickly to new and emerging technologies while ensuring a seamless user experience only amplifies the network’s complexity and the effort it takes to manage it. Myriad issues can come up on a daily basis, depending on the size of the infrastructure and the complexity of the design. A wrong VLAN on a port, or a misconfigured VLAN, can cause multiple ports and circuits to go down, because spanning tree may block the affected ports. A small change, misconfiguration, or incident can have a significant impact on the business.
Robust change management and incident management systems help in avoiding such issues. However, network complexity gives rise to many such scenarios, and a solution does not come easily, as one size does not fit all. For instance, checking a layer 2 issue when there are 10 switches to manage is radically different from doing so in a landscape with 100 switches in a domain. There is no efficient way to simulate such issues, because replicating the existing configuration and device set adds complexity and further requirements.
Simplifying configuration with software-defined networking
Software-defined networking (SDN) has eased configuration by separating the network control plane from the forwarding plane. It has enabled a ‘single pane of glass’, wherein all the nodes can be managed and controlled from a single controller and management dashboard. Traditional configuration is abstracted behind a user-friendly graphical user interface. The time spent configuring routing policies by logging into the command-line interface of each router or switch is dramatically reduced: a policy is defined once in the single user interface, and the controller pushes it to all the nodes where it needs to be configured.
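The push model described above can be illustrated with a minimal in-memory sketch. It is not any vendor's controller API: the `Node` and `Controller` classes, the `push_policy` method, and the policy fields are all hypothetical names chosen for illustration, and the device channel is simulated by a dictionary update.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A hypothetical managed router or switch."""
    name: str
    policies: dict = field(default_factory=dict)

    def apply(self, policy_name: str, policy: dict) -> None:
        # Stand-in for the controller-to-device channel (assumption):
        # here the node simply stores a copy of the policy.
        self.policies[policy_name] = dict(policy)


@dataclass
class Controller:
    """A hypothetical central controller that knows every managed node."""
    nodes: list = field(default_factory=list)

    def push_policy(self, policy_name: str, policy: dict, targets=None) -> list:
        """Apply one policy to each target node; return the names updated.

        If targets is None, the policy goes to every node.
        """
        updated = []
        for node in self.nodes:
            if targets is None or node.name in targets:
                node.apply(policy_name, policy)
                updated.append(node.name)
        return updated


controller = Controller(nodes=[Node("edge-1"), Node("edge-2"), Node("core-1")])
updated = controller.push_policy(
    "prefer-mpls",
    {"match": "voice", "next_hop": "mpls"},
    targets={"edge-1", "edge-2"},
)
print(updated)
```

The point of the sketch is that the policy is defined once and the loop over nodes lives in the controller, rather than in an engineer's sequence of command-line sessions.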
However, even though configuration is simplified, alarms and events have increased in number and become more complex. Consider this: the connectivity from the user to an application, seen as a single entity, has multiple components, such as the LAN (wired or Wi-Fi), the WAN (traditional or software-defined), and the hosting location (the data center or the cloud). Each of these is monitored by its own controller, and each logs alerts and incidents to the organization’s IT Service Management desk.
An application can slow down due to network latency or a high response time from the application server itself. Earlier, these were marked as network issues. Now, with new-generation solutions, we can gain better insight into these issues and classify them into two domains, network and application, thereby ensuring that issues are assigned to the appropriate teams for resolution.
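As a rough illustration of this two-domain classification, the sketch below labels a slowdown from two measurements: the network round-trip time and the application server's response time. The function name `classify_slowdown` and both threshold values are assumptions made for the example, not figures from any product; in practice the thresholds would come from per-application baselines.

```python
def classify_slowdown(network_rtt_ms: float,
                      server_response_ms: float,
                      rtt_threshold_ms: float = 150.0,
                      server_threshold_ms: float = 500.0) -> str:
    """Label a slow transaction by the domain that should investigate it.

    Both thresholds are illustrative defaults (assumption).
    """
    network_slow = network_rtt_ms > rtt_threshold_ms
    server_slow = server_response_ms > server_threshold_ms

    if network_slow and server_slow:
        return "network+application"
    if network_slow:
        return "network"
    if server_slow:
        return "application"
    return "healthy"


# High round-trip time, normal server response: route to the network team.
print(classify_slowdown(network_rtt_ms=220.0, server_response_ms=120.0))
# Normal round-trip time, slow server response: route to the application team.
print(classify_slowdown(network_rtt_ms=40.0, server_response_ms=900.0))
```

A fixed threshold is the simplest possible rule; it is used here only to make the routing decision concrete.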
Similarly, an overlay tunnel may go down because a port on the underlay device is flapping or has gone down. In such scenarios, each endpoint logs its own events: the SD-WAN controller logs an incident for the tunnel going down, and the underlay device logs an event for the port going down. With multiple incidents being logged by multiple controllers and underlay devices, IT operations (ITOps) teams find it very difficult to correlate them. With separate teams handling issues from different components, the correlation becomes even more difficult.
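The correlation problem can be made concrete with a small rule-based sketch. It assumes, hypothetically, that every event carries the name of the underlay device it relates to (that is, the tunnel-to-underlay mapping is already known) and that all sources share a common clock. The `Event` fields, the `correlate` function, and the 30-second window are illustrative choices, not part of any specific ITSM or SD-WAN product.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    source: str       # system that logged the event, e.g. "sdwan-controller"
    device: str       # underlay device the event relates to (assumed known)
    kind: str         # "tunnel_down" or "port_down"
    timestamp: float  # seconds on a clock shared by all sources (assumption)


def correlate(events, window_s: float = 30.0):
    """Pair each tunnel_down event with likely underlay causes.

    A port_down event counts as a likely cause when it is on the same
    device and occurred at most window_s seconds before the tunnel event.
    """
    port_events = [e for e in events if e.kind == "port_down"]
    pairs = []
    for tunnel in (e for e in events if e.kind == "tunnel_down"):
        causes = [
            p for p in port_events
            if p.device == tunnel.device
            and 0 <= tunnel.timestamp - p.timestamp <= window_s
        ]
        pairs.append((tunnel, causes))
    return pairs


events = [
    Event("underlay-nms", "rtr-7", "port_down", 100.0),
    Event("sdwan-controller", "rtr-7", "tunnel_down", 104.0),
    Event("sdwan-controller", "rtr-9", "tunnel_down", 300.0),
]
pairs = correlate(events)
for tunnel, causes in pairs:
    print(tunnel.device, "likely causes:", [c.source for c in causes])
```

In this example the tunnel failure on rtr-7 is linked to the port event logged four seconds earlier, so the two incidents could be merged into one ticket, while the failure on rtr-9 has no matching underlay event and remains a separate incident.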
The focus of the operations teams is on configuration and troubleshooting. The configuration part is now simplified with next-gen solutions, but troubleshooting has become more complex.
Is there a solution that can ease the troubleshooting or issue resolution process as well? The answer is a resounding ‘yes’.
Intelligent network with AI and ML
Networks need to be intelligent to meet the dynamic needs of the digital age. AI and ML play a critical role in enabling this by automating network operations and infusing intelligence into them.