While various industry players are investing in data management systems, companies across the globe face an increasing challenge of system failures and outages, resulting in unhappy customers, brand damage and revenue loss. To illustrate, in March 2013 the U.K. regulator fined a multinational banking and financial services holding company for failing to keep information on client objectives, risk profile and risk appetite up to date. The regulator found that the failure was attributable not so much to human error as to the bank's data processing systems, which did not allow sufficient client information to be processed. Such failures can have a long-term impact on companies, including the loss of potential future customers. Among the several contributing factors, failures and delays during data warehouse batch runs are an important area of focus for data managers today. This paper proposes a reference architecture to improve operational IT analytics with predictive capabilities to overcome such challenges. The proposed solution is vendor agnostic and provides a consistent experience across a range of Data Integration and Business Intelligence tools.
Data warehouse and application service loss can adversely impact businesses in many ways, such as delayed financial closure leading to liability, penalties for non-compliance in delivering data on time, and downtime for information workers. On average, businesses lose between $84,000 and $108,000 (US) for every hour of IT system downtime, according to various estimates from studies performed by industry analyst firms. Application problems are the single largest source of downtime, causing on average 30% of annual downtime hours and 32% of downtime cost. The leading cause of application downtime is software failure (36% of cost on average), followed by human error (22%). A third contributor to these rising costs is the complex data warehouse environments that exist in companies today. The dynamic nature and complexity of the data warehouse environment stem from:
- Exponential growth of data
- Concurrent source and schema updates
- Decrease in batch load window and increase in real time processing
- Increase in data warehouse usage and demand for greater availability
The increased complexity of the data warehouse environment intensifies the task of data managers manifold, and troubleshooting in such a dynamic environment becomes tougher. Additionally, a host of other issues such as human error, hardware failure and natural disasters can disrupt data warehouse availability.
Over the years, innovative solution providers have tried to address these problems. Multiple analytical solutions (Operations Management, or OM, tools) exist today that help data managers monitor and measure the data warehouse environment. These capabilities are provided by features within Data Integration (DI) or Business Intelligence (BI) tools, as well as within infrastructure monitoring and schedule management tools. The solutions help identify issues, isolate causes and resolve outages, and also support performance management through IT infrastructure support.
However, these features are restricted to each particular tool, do not take the business process aspects into account and, at best, provide insights into what has gone wrong rather than what will go wrong. They do not constantly and proactively monitor system behaviour or provide real-time insights into system performance and capacity trends. A typical solution today analyses historical data to derive a performance trend only after a failure has occurred, whereas data managers increasingly demand the capability to predict a failure before it occurs. This has created a need for predictive-analytics-driven proactive monitoring of data warehouse processes.
Current OM Tools – Limitations
As discussed above, a host of OM tools and solutions are used by data managers today. However, these solutions have the following limitations:
- Issue identification is reactive, not proactive – Current solutions largely take a reactive approach, performing root cause analysis after a system failure has occurred. Data managers, however, require solutions capable of predicting a system failure before it occurs.
- Inconsistent view of business processes – The data environment is ever growing and spans multiple systems, devices, services and applications. Current solutions do not provide the holistic view of business process failures that data managers require.
- Inconsistent experience across tools – Current solutions are limited in their capability to provide a consistent experience in integration, monitoring and data visualisation, limiting data managers' ability to garner a comprehensive view of the data warehouse environment.
Current OM Tools – Potential Improvements
The limitations of the existing solutions outlined previously can be addressed through "predictive insight on batch analytics". The key capabilities of such a solution are outlined below:
- Predictive monitoring capabilities: The solution should be able to proactively monitor the data warehouse environment based on historical as well as real time data processing.
- Failure trend monitoring: The solution should specifically monitor failure trends and predict probable failures during data processing.
- Troubleshooting in real time: The solution should proactively troubleshoot issues, from a business-process or data-centre-level view down to a job or sub-task-level view, facilitating corrections before a system failure occurs.
- Comprehensive view and monitoring of business processes: The solution should provide real-time insights on data warehouse execution environment parameters to identify the impact on business deliverables, which requires a holistic view across those deliverables.
- Integration of tools and technologies in the data warehouse environment: The solution should effectively monitor and capture the environment parameters across various data warehouse tools and technologies to provide a complete view of the data warehouse environment. In summary, the solution should be tool agnostic.
- Consistent visualisation: The solution should provide a consistent user interface and consistent outputs across different data integration, reporting and scheduling tools, enabling data managers to have a uniform view of otherwise heterogeneous data.
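The tool-agnostic and consistent-visualisation capabilities above amount to normalising each tool's native job telemetry into one common schema before analysis. A minimal sketch of that idea (the schema, tool names and payload fields are all hypothetical, chosen only for illustration):

```python
from dataclasses import dataclass

@dataclass
class JobRun:
    """Tool-agnostic record of one batch job execution (hypothetical schema)."""
    tool: str
    job_name: str
    status: str        # normalised to "success" / "failure" / "running"
    duration_sec: float

# Hypothetical adapters mapping each tool's native payload into the
# common schema; the field names are illustrative, not a real API.
def from_tool_a(payload):
    return JobRun("tool_a", payload["jobName"],
                  "success" if payload["rc"] == 0 else "failure",
                  payload["elapsedSec"])

def from_tool_b(payload):
    status_map = {"OK": "success", "ERR": "failure", "RUN": "running"}
    return JobRun("tool_b", payload["name"],
                  status_map[payload["state"]],
                  payload["runtime_ms"] / 1000.0)

runs = [
    from_tool_a({"jobName": "load_sales", "rc": 0, "elapsedSec": 420.0}),
    from_tool_b({"name": "load_hr", "state": "ERR", "runtime_ms": 95000}),
]
failures = [r.job_name for r in runs if r.status == "failure"]
print(failures)  # → ['load_hr']
```

Because every downstream component (failure-trend models, dashboards, alerts) consumes only the common schema, adding a new DI, BI or scheduling tool means writing one adapter rather than reworking the monitoring and visualisation layers.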
Proposed Reference Architecture
Based on the characteristics outlined previously, below is the proposed reference architecture: