While various industry players are investing in data management systems, companies worldwide face a growing challenge of system failures and outages, resulting in unhappy customers, brand damage and revenue loss. To illustrate, in March 2013 the U.K. regulator fined a multinational banking and financial services holding company for failing to keep updated information on client objectives, risk profile and risk appetite. The regulator attributed the failure primarily to the bank's data processing systems, which did not allow sufficient client information to be processed, rather than to human error. Such failures can have a long-term impact on companies, including the loss of potential future customers. Among the several contributing factors, failures and delays during data warehouse batch runs are an important area of focus for data managers today. This paper proposes a reference architecture that adds predictive capabilities to operational IT analytics to overcome such challenges. The proposed solution is vendor agnostic and provides a consistent experience across a range of Data Integration (DI) and Business Intelligence (BI) tools.
Data warehouse and application service loss can adversely impact businesses in many ways, such as delays in financial closure leading to liability, penalties for non-compliance in delivering data on time, and downtime for information workers. According to various estimates from studies by industry analyst firms, businesses lose on average between $84,000 and $108,000 (US) for every hour of IT system downtime. Application problems are the single largest source of downtime, causing 30% of annual downtime hours and 32% of downtime cost on average. The leading cause of application downtime is software failure (36% of cost on average), followed by human error (22%). Much of this rising cost is attributed to the complex data warehouse environments that exist in companies today.
The dynamic nature and increased complexity of the data warehouse environment intensify the task of data managers manifold, and troubleshooting in such an environment becomes far tougher. Additionally, a host of other issues such as human error, hardware failure, and natural disasters can disrupt data warehouse availability.
Over the years, innovative solution providers have tried to address these problems. Multiple analytical solutions (Operations Management, or OM, tools) exist today that help data managers monitor and measure the data warehouse environment. These capabilities are provided as features within DI or BI tools, and also within infrastructure monitoring and schedule management tools. The solutions help identify issues, isolate causes and resolve outages, and they also support performance management through IT infrastructure support.
However, these features are restricted to each particular tool, do not take business process aspects into account, and at best provide insight into what has gone wrong rather than what is about to go wrong. They do not constantly and proactively monitor system behaviour or provide real-time insight into system performance and capacity trends. A typical solution today analyses historical data to establish a performance trend only after a failure has occurred, whereas data managers increasingly demand the capability to predict a failure before it occurs. This has created a need for predictive-analytics-driven proactive monitoring of data warehouse processes.
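The proactive monitoring described above can be illustrated with a minimal sketch: rather than investigating after a batch has already breached its window, a predictive monitor watches the trend in run durations and raises a flag before the breach happens. The job data, window size and z-threshold below are illustrative assumptions, not part of any specific vendor tool.

```python
# Minimal sketch of predictive batch monitoring (illustrative only):
# project the next run duration from recent history and flag jobs
# trending toward an SLA breach before it actually occurs.
from statistics import mean, stdev

def predict_breach(history_minutes, sla_minutes, z=2.0):
    """Return True when the projected next run time, estimated as the
    mean of recent runs plus z standard deviations, exceeds the SLA.

    history_minutes: past run durations in minutes, oldest first.
    """
    recent = history_minutes[-10:]      # consider the last 10 runs
    if len(recent) < 3:
        return False                    # not enough data to predict
    projected = mean(recent) + z * stdev(recent)
    return projected > sla_minutes

# A nightly load that has been slowing down steadily, week over week:
runs = [42, 44, 45, 47, 50, 53, 57, 61, 66, 72]
print(predict_breach(runs, sla_minutes=70))   # -> True: warn before the breach
```

A reactive tool would only report this job after it overran its 70-minute window; the trend-based check surfaces the risk while there is still time to act.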
Current OM Tools – Limitations
As discussed above, a host of OM tools and solutions are used by data managers today. However, these solutions have the following limitations:
Current OM Tools – Potential Improvements
The limitations of the existing solutions outlined previously can be addressed through "predictive insight on batch analytics". The capabilities of such a solution are outlined below:
Proposed Reference Architecture
Based on the characteristics outlined previously, below is the proposed reference architecture:
The numbered areas in the above reference architecture are described below:
1. Data Acquisition and Conversion
2. Data Integration
3. Predictive Analytics
4. Alert Interface
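The four numbered areas can be read as stages of one pipeline: acquire raw records from each tool, convert them into a common schema, score them predictively, and surface alerts uniformly. The sketch below shows one way those stages might fit together; all class, field and job names are illustrative assumptions, since the architecture is vendor agnostic and does not prescribe a concrete implementation.

```python
# Hedged sketch of the four reference-architecture areas as a pipeline.
# Names and schemas are hypothetical, not from any specific vendor tool.
from dataclasses import dataclass

# 1. Data Acquisition and Conversion: collect raw run records from each
#    DI/BI tool or job scheduler in its native format.
@dataclass
class RawRecord:
    tool: str        # source system, e.g. a scheduler or DI engine
    payload: dict    # tool-specific log fields

# 2. Data Integration: normalise tool-specific payloads into one
#    vendor-agnostic schema so downstream analytics stay tool-agnostic.
@dataclass
class BatchRun:
    job: str
    duration_min: float
    succeeded: bool

def integrate(raw: RawRecord) -> BatchRun:
    p = raw.payload
    return BatchRun(job=p["job"], duration_min=p["minutes"],
                    succeeded=p.get("status", "OK") == "OK")

# 3. Predictive Analytics: score each run against its history (a simple
#    moving-average baseline stands in for a real predictive model).
def score(run: BatchRun, history: list) -> bool:
    recent = history[-5:]
    baseline = sum(recent) / len(recent) if recent else run.duration_min
    return run.duration_min > 1.5 * baseline   # flag a sharp slowdown

# 4. Alert Interface: present predictions in a uniform format,
#    irrespective of which underlying tool produced the record.
def alert(run: BatchRun, flagged: bool) -> str:
    if flagged:
        return f"WARN {run.job}: {run.duration_min} min is trending above baseline"
    return f"OK {run.job}"

raw = RawRecord(tool="scheduler_x",
                payload={"job": "nightly_load", "minutes": 95, "status": "OK"})
run = integrate(raw)
print(alert(run, score(run, history=[50, 52, 55, 58, 60])))
```

Because stages 3 and 4 operate only on the normalised `BatchRun` schema, the same predictive logic and alert format serve every tool that stage 1 can acquire from, which is what makes the approach tool-agnostic.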
Several benefits accrue to data managers through the new approach:
Reduced Risks: The new architectural approach de-risks batch operations through predictability, early warnings and fast issue resolution, resulting in lower operational downtime and improved reliability of data. In addition, infrastructure capacity utilisation forecasts help plan capacity ahead of demand.
Optimised Costs: The proposed architecture further improves productivity by reducing manual monitoring and troubleshooting efforts resulting in reduced Total Cost of Ownership, lower cost of batch optimisation exercise and improved productivity of the platform due to reduced downtime and improved system availability.
Standardised Operations: Data managers gain standardised batch operation processes across technology stacks, leading to: i) a tool-agnostic approach, where alert management, predictive insights, diagnostics and trend reports are uniform irrespective of the tool used, which also helps reduce system upgrade costs; ii) simplified operational processes; and iii) improved standardisation of IT operations overall.
This proposed architecture for improving operational IT analytics is vendor agnostic and provides a consistent experience across a range of data integration and business intelligence tools. It delivers predictive insights that are business-process aware and helps achieve higher availability and better reliability of data environments. It combines the ability to monitor infrastructure resources, integration tools and job schedules. Data managers will find the proposed solution cost effective, in addition to helping reduce overall operational risk and standardise operations. It will help organisations maintain high customer centricity through improved operational processes.