Chaos Engineering: Establishing resilience by embracing chaos

Chaos engineering is the discipline of experimenting on a system in production to build confidence in its ability to withstand turbulent conditions. It helps identify faults and gaps in systems. A basic principle of chaos engineering is to form a hypothesis and test it through experiments. Experiments are kept small in scope and scale, but they run as close to the live system as possible in terms of functionality. The primary objective is to generate new, previously unknown information about how the system as a whole behaves when it reacts to a catastrophic event.

Why chaos engineering is important

Ecosystems are becoming increasingly complex in the digital age. Service outages are costly, and their impact is multi-fold. Traditional testing methods are not enough to guarantee service availability in next-generation systems. Hence, an innovative approach is needed to verify and validate availability in an automated manner.

Chaos engineering addresses these requirements. By identifying failures at both the individual component level and the ecosystem level, it helps minimize the impact of outages.

Chaos engineering is becoming the norm for disaster recovery testing.

The use cases and benefits

It is recommended to run chaos engineering experiments continuously in the environment. Some of the key benefits of chaos engineering are:

Ability to simulate unpredictable user behavior intersecting with unforeseeable events, so that service interruptions can be avoided
Ability to manage complex systems in a structured manner
Leveraging the benefits of automation
Building a knowledge base from system failures

Key resiliency areas addressed by chaos engineering

Chaos engineering addresses the resiliency of key components in any organization (see Figure 1).

Figure 1: Key resiliency areas addressed by chaos engineering

How to adopt chaos engineering

Chaos engineering should be adopted and implemented systematically. Key principles that enable its adoption are:

Create and manage hypotheses and experiments around steady-state behavior
Create varied tests based on real-world incidents
Carry out tests in live systems
Define and create an automated process so that tests run continuously
Start with small simulations to minimize impact
Learn the process for designing chaos engineering experiments
Define and follow a Chaos Maturity Model
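
The idea of a steady-state hypothesis tested by a small experiment can be illustrated with a short Python sketch. This is a minimal illustration under stated assumptions, not a production tool: SimulatedService, success_rate, run_experiment and the 0.99 threshold are hypothetical names and values chosen for this example, and an in-memory list of replicas stands in for a live system. The experiment checks the steady state, injects one small fault (stopping a single replica), measures again, and always rolls the fault back.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Union


@dataclass
class SimulatedService:
    """Toy stand-in for a live service (hypothetical, for illustration only)."""
    # Each entry is True when that replica is running.
    replicas: List[bool] = field(default_factory=lambda: [True, True, True])

    def handle_request(self) -> bool:
        # A request succeeds as long as at least one replica is running.
        return any(self.replicas)


def success_rate(service: SimulatedService, requests: int = 100) -> float:
    """Steady-state metric: fraction of requests that succeed."""
    succeeded = sum(service.handle_request() for _ in range(requests))
    return succeeded / requests


def run_experiment(
    service: SimulatedService, threshold: float = 0.99
) -> Dict[str, Union[str, float]]:
    """Hypothesis: the success rate stays at or above `threshold`
    even when one replica is stopped."""
    baseline = success_rate(service)
    if baseline < threshold:
        # The system is not in its steady state, so do not add more stress.
        return {"status": "skipped", "baseline": baseline}

    victim = 0  # start small: stop exactly one replica
    service.replicas[victim] = False
    try:
        observed = success_rate(service)
    finally:
        # Roll back the injected fault whatever the outcome.
        service.replicas[victim] = True

    status = "passed" if observed >= threshold else "failed"
    return {"status": status, "baseline": baseline, "observed": observed}


if __name__ == "__main__":
    # Three replicas tolerate the loss of one; a single replica does not.
    print(run_experiment(SimulatedService())["status"])
    print(run_experiment(SimulatedService(replicas=[True]))["status"])
```

A "failed" result is the useful one: it reveals a weakness (here, a service with no redundancy) before a real outage does. In a real environment, the fault injection and the metric would come from the organization's own tooling and monitoring.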

How to execute chaos engineering in an automated way

The overall process for implementing chaos engineering in an automated way is shown in Figure 2.

Figure 2: Chaos engineering implementation process
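
As a rough illustration of what running experiments continuously and automatically might look like (an assumed sketch, not a reproduction of the process in Figure 2), the Python below runs a set of experiments round after round and halts as soon as one hypothesis fails, so that the finding can be investigated before more faults are injected. The function run_continuously, its parameters, and the experiment names are hypothetical; each experiment is assumed to be a callable that returns True when its hypothesis holds.

```python
import time
from typing import Callable, Dict, List


def run_continuously(
    experiments: Dict[str, Callable[[], bool]],
    rounds: int,
    interval_seconds: float = 0.0,
) -> List[dict]:
    """Run every experiment once per round and stop at the first failure.

    Returns a log with one entry per executed experiment.
    """
    log: List[dict] = []
    for round_number in range(1, rounds + 1):
        for name, experiment in experiments.items():
            passed = experiment()
            log.append(
                {"round": round_number, "experiment": name, "passed": passed}
            )
            if not passed:
                # Halt automation so the weakness can be analyzed first.
                return log
        if round_number < rounds:
            # Wait between rounds; a real setup might use a scheduler instead.
            time.sleep(interval_seconds)
    return log


if __name__ == "__main__":
    # Hypothetical experiments that always hold, for demonstration only.
    demo = {"stop-one-replica": lambda: True, "add-latency": lambda: True}
    for entry in run_continuously(demo, rounds=2):
        print(entry)
```

Stopping at the first failed hypothesis is a design choice made for this sketch: it keeps the impact small and turns every failure into a reviewed finding rather than noise in a long report.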

Application of chaos engineering

Chaos engineering is not limited to IT and software development. It is widely used across industries and locations, including technology companies, financial companies, educational institutions, media and communication companies, and service providers.

Key categories of chaos engineering adoption:

HR department
Data Center
Business processes
Hands and fleet support
Disaster Recovery team preparedness
Health care and life sciences
Simulation of natural disasters such as earthquakes and floods

Conclusion

Chaos engineering enables industries to increase service availability and resiliency. This proven approach can be applied to any segment as part of disaster management. Currently, Wipro uses chaos engineering as part of its IPs under the SDx umbrella. One of the use cases is AppAnywhere, in which chaos engineering is used to provide resiliency and self-healing capabilities.

To learn more about AppAnywhere or Wipro’s chaos engineering capabilities, connect with us at sdx-cis@wipro.com.
