If data is the fuel of the digital age, then the technologies that process this data in real time are the engines that drive the modern enterprise. From delivering engaging customer experiences to keeping the cars of the future on the road, tracking our packages in transit, helping us stay healthy and putting the 'smart' in our smart cities and smart homes, stream processing technology increasingly plays a fundamental role behind the scenes in transforming many aspects of modern life. While internet-native companies, forced to deal with scale and data diversity early, have had this capability from the get-go, we now see traditional enterprises taking steps to incorporate it as a core capability in their technology landscape.
Companies follow multiple paths in adopting this technology. Most large organizations already have a batch analytics infrastructure based on the Hadoop ecosystem or a commercial analytics product suite, and are augmenting it with frameworks and tools that support stream processing. In other organizations, the middleware platform groups are leveraging and enhancing their existing event processing and complex event processing (CEP) capabilities. Others, launching IoT programs, adopt IoT platforms that support both integration and real-time processing at scale for the data that flows in from sensors and devices.
Whichever path an organization takes and whatever activities it pursues to drive adoption, our experience of building large-scale streaming analytics solutions shows a few common factors that, when overlooked, cause considerable time and cost overruns.
Availability of events and master data
Integration efforts can account for up to 60% of the total time and cost of implementing real-time insights solutions. If your enterprise applications, ESB and integration platforms already publish all the events the use case requires, great. If not, log files and database changes have to be scraped to identify and surface business events. The good news is that the last couple of years have seen the maturing of frameworks that ease Change Data Capture (CDC) and log file processing. In addition, events have to be combined with master data and reference data to generate impactful inferences. The availability of this data, within the latency that real-time insights demand, is critical; distributed in-memory data grids play a role here.
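As a minimal sketch of the enrichment step described above, the following shows a change event being combined with master data held in memory (standing in for a distributed in-memory data grid). The event shape and the `customers` table are hypothetical, not from any particular product:

```python
# Master data preloaded into memory for low-latency lookups;
# in production this would live in a distributed in-memory data grid.
customers = {
    "C-1001": {"name": "Acme Corp", "segment": "enterprise"},
}

def enrich(event: dict) -> dict:
    """Combine a raw change event with master data before analysis."""
    master = customers.get(event["customer_id"], {})
    return {**event, **master}

order_event = {"customer_id": "C-1001", "order_total": 250.0}
enriched = enrich(order_event)
# `enriched` now carries both the event fields and the customer's segment
```

The point of the sketch is the latency budget: the lookup must happen in memory, inline with event processing, rather than via a round trip to a backing database.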
Discrete vs. continuous event processing
It is important to understand whether the use case needs discrete event processing, where rules and processing logic are applied to every incoming event, or continuous stream processing, which mostly deals with aggregates rather than individual events. The programming models for these two scenarios are different, so it is important to ensure the tool or platform supports the right one. Typical examples of the former style are real-time customer engagement and fraud detection; examples of the latter include predictive maintenance and transportation command centers.
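The contrast between the two programming models can be sketched in a few lines. The fraud threshold and the sliding-window size below are hypothetical placeholders:

```python
from collections import deque

# Discrete event processing: a rule evaluated against every single event.
def fraud_rule(event: dict) -> bool:
    """Flag any individual transaction over a threshold (hypothetical rule)."""
    return event["amount"] > 10_000

# Continuous stream processing: an aggregate maintained over a window,
# where no single event is interesting on its own.
class SlidingAverage:
    """Average of the last `size` sensor readings."""
    def __init__(self, size: int):
        self.window = deque(maxlen=size)

    def add(self, value: float) -> float:
        self.window.append(value)
        return sum(self.window) / len(self.window)

avg = SlidingAverage(size=3)
for reading in (10.0, 20.0, 30.0, 40.0):
    current = avg.add(reading)
# current == 30.0, the average of the last three readings
```

In the first style the unit of work is the event; in the second it is the window, which is why a platform built for one model can be awkward to bend toward the other.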
Testing and monitoring the event processing infrastructure
Given the scale at which these solutions operate (dealing with terabytes of events per day), operations teams need every tool that gives them visibility into the processing and performance of the solution. A scalable logging and reporting solution has to be part of the supporting ecosystem.
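One small piece of that visibility is per-stage throughput counters that a dashboard or reporting tool could scrape. The stage names and report format below are hypothetical, just to illustrate the kind of instrumentation a pipeline should expose:

```python
import time
from collections import Counter

class PipelineMetrics:
    """Minimal per-stage counters for a stream processing pipeline."""
    def __init__(self):
        self.counts = Counter()
        self.start = time.monotonic()

    def record(self, stage: str, n: int = 1) -> None:
        self.counts[stage] += n

    def report(self) -> dict:
        """Snapshot of totals and rates since startup, for a dashboard."""
        elapsed = max(time.monotonic() - self.start, 1e-9)
        return {stage: {"total": count, "per_sec": count / elapsed}
                for stage, count in self.counts.items()}

metrics = PipelineMetrics()
for _ in range(1000):
    metrics.record("ingested")
metrics.record("dropped", 3)
snapshot = metrics.report()
```

In practice such counters would be exported to a metrics store rather than computed in process, but the principle is the same: every stage of the pipeline should be countable and rateable.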
Self-service rule definition for business users
To take advantage of rapid changes in market conditions, business users should be able to define, modify and test rules and processing logic through an easy-to-use interface. A web interface that lets a user define decision tables is a common example.
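The key enabler for such a web interface is representing the decision table as data rather than code, so rows can be edited without a deployment. A minimal sketch, with hypothetical columns and actions:

```python
# A decision table as plain data: each row pairs a condition with an action.
# A web UI would let business users add or edit rows directly.
decision_table = [
    {"if": {"tier": "gold", "cart_value_over": 100}, "then": "offer_free_shipping"},
    {"if": {"tier": "any",  "cart_value_over": 500}, "then": "offer_discount"},
]

def evaluate(event: dict, table: list) -> list:
    """Return the actions of every table row the event matches."""
    actions = []
    for row in table:
        cond = row["if"]
        tier_ok = cond["tier"] in ("any", event["tier"])
        value_ok = event["cart_value"] > cond["cart_value_over"]
        if tier_ok and value_ok:
            actions.append(row["then"])
    return actions

print(evaluate({"tier": "gold", "cart_value": 600}, decision_table))
# -> ['offer_free_shipping', 'offer_discount']
```

Because the table is just data, it can be stored, versioned and hot-reloaded by the processing engine, which is what makes self-service rule changes safe.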
Closed loop from model building to action
For use cases that require predictive analytics, plan for an architecture that directly supports embedding the models generated by analytics tools. In addition, capture enough data to track and tune the performance of those models in production.
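Both halves of that loop can be sketched together: a model exported as coefficients is scored inline on each event, and every prediction is logged so it can later be joined with the actual outcome. The weights, bias and event fields below are invented for illustration:

```python
# Hypothetical linear model exported from an analytics tool.
WEIGHTS = {"temperature": 0.04, "vibration": 0.9}
BIAS = -1.2

def score(event: dict) -> float:
    """Failure-risk score applied to each incoming sensor event."""
    return BIAS + sum(WEIGHTS[key] * event[key] for key in WEIGHTS)

# Every prediction is captured so it can be joined with the eventual
# outcome, closing the loop from model building to model tuning.
prediction_log = []

def process(event: dict) -> None:
    prediction_log.append({"event": event, "risk": score(event)})

process({"temperature": 70.0, "vibration": 0.5})
```

Without the prediction log there is no way to measure drift, so the "capture enough data" half of the advice is as architectural as the embedding itself.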