This approach captures the complete data journey and binds the data elements into a record. Patching data at different points may require application changes to tie the elements back to the record, and issues of differing granularity may surface. The data, however, is rich and is useful when decisions at one step affect the next.
In all cases where fresh data is to be captured, one would need to provision a data store. A UI change or backend code change may also be needed, either to capture the extra information or to enable it to be patched onto the main data.
It is recommended to maintain a parallel data store and not to touch any of the existing application data, as the purposes of the two are quite different. They are likely to store data at different granularities and, most likely, at very different points of the business process flow.
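A minimal sketch of such a parallel store, assuming a SQLite file kept entirely separate from the application database; the table, column, and function names here are illustrative assumptions, not a prescribed schema:

```python
import json
import sqlite3
import time

def init_capture_store(path="capture_store.db"):
    """Provision the parallel store; the application's own tables are untouched."""
    conn = sqlite3.connect(path)
    conn.execute("""
        CREATE TABLE IF NOT EXISTS capture_events (
            record_id   TEXT NOT NULL,   -- ties elements back to one record
            capture_pt  TEXT NOT NULL,   -- where in the process flow it was taken
            captured_at REAL NOT NULL,   -- unix timestamp
            payload     TEXT NOT NULL    -- raw element, stored as JSON
        )""")
    conn.commit()
    return conn

def capture(conn, record_id, capture_pt, payload):
    """Append one raw data element; never mutates application data."""
    conn.execute(
        "INSERT INTO capture_events VALUES (?, ?, ?, ?)",
        (record_id, capture_pt, time.time(), json.dumps(payload)),
    )
    conn.commit()
```

Because the store is append-only and keyed by a record identifier, it can sit at a different granularity and at different process points than the application's own tables.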
Data Grain (Raw vs JPEG)
The capture points can be decided based on the automation requirements. Do you want to click-stream all the web pages, or only capture when ads are clicked? One has to be aware of the various transformations the captured data undergoes across the entire process flow. The pre-ML-modelling phase is usually data greedy, and the ask is typically to get whatever one can get one's hands on. However, the data needs to be rationalized and tied together into the notion of a record. Data capture spread across multiple points requires catering for changing grain and turns into a typical data integration exercise. There should ideally be very little cleansing or standardization of data at this stage; this should by no means be treated as an Extraction, Transformation, Load (ETL) exercise. Unless the transformations are part of the business process flow, the data should be captured raw and its curation left as a pre-ML-modelling activity.
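The tying-together step can be sketched as follows, assuming events of an invented shape captured at different points and grains; no cleansing happens here, only grouping by a shared record identifier:

```python
from collections import defaultdict

# Illustrative raw events from two hypothetical capture points
# (a page view and an ad click), at different grains.
events = [
    {"record_id": "r1", "point": "page_view", "data": {"url": "/home"}},
    {"record_id": "r1", "point": "ad_click",  "data": {"ad": "a42"}},
    {"record_id": "r2", "point": "page_view", "data": {"url": "/search"}},
]

def assemble_records(events):
    """Group raw elements into records by record_id.

    Deliberately no transformation or standardization: curation is
    deferred to the pre-ML-modelling phase.
    """
    records = defaultdict(list)
    for e in events:
        records[e["record_id"]].append({e["point"]: e["data"]})
    return dict(records)
```

The design choice is to keep this step a pure integration (a join on record identity), so the raw grain of each capture point survives for later modelling decisions.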
Managing this data is like managing any data-intensive application. An owner, a data steward, needs to be identified. Governance processes, data quality measurement, and the addition of data elements and their impact are all part of its management. One would choose a data store reflective of the volume and data types involved (audio etc.). The data lifecycle also needs to be managed, since any change in business or context means the historical data must either be patched or discarded. Modelling or automation requirements dictate what data persists: historical vs. current, low grain vs. aggregate, raw vs. cleansed. The model versions would in turn dictate the data versioning as well. Data used for Robotic Process Automation (RPA) purposes should be extracted and persisted for the longer term. The data store should not be utilized for any business reporting; its sole purpose is to support automation.
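One lightweight way to let model versions dictate data versioning is to pin each model version to an immutable data snapshot and record its persistence characteristics. This is only a sketch; the registry shape, field names, and URI are invented for illustration:

```python
# Hypothetical registry mapping a model version to the data snapshot it
# consumed, so historical data can be patched or discarded per version.
dataset_registry = {}

def register_snapshot(model_version, snapshot_uri, grain, form):
    """Pin a model version to an immutable extract and its characteristics."""
    dataset_registry[model_version] = {
        "snapshot": snapshot_uri,  # location of the immutable extract
        "grain": grain,            # "low" vs "aggregate"
        "form": form,              # "raw" vs "cleansed"
    }

# Example: version 1 of a model trained on a raw, low-grain extract.
register_snapshot("model-v1", "s3://automation-extracts/2023-01", "low", "raw")
```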
Much like in any other business application, the data journey needs to be mapped out and held very dear, so that it can be managed as an explicit component.
Operational data implies that one has more control over the data quality, or at least over predicting it. The first and most time-consuming step is data curation, which must be done with the automation technique in mind. The human decision being automated can either be accelerated, implying a probabilistic model, or eliminated, in the case of deterministic models. The data needs to be checked for patterns to assess its fitment for both ML- and RPA-based automation. In the case of RPA fitment, the data merely holds various well-understood and finite variations that can be turned into a dictionary/dataset to look up and achieve the desired result.
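The RPA-fitment case can be sketched as a plain lookup: a finite, well-understood set of input variations mapped to a deterministic outcome. The variations and routing outcomes below are invented for illustration:

```python
# Hypothetical dictionary built from curated operational data: every known
# variation of a document label maps to one deterministic routing decision.
VARIATIONS = {
    "invoice": "route_to_accounts",
    "inv.": "route_to_accounts",
    "bill": "route_to_accounts",
    "po": "route_to_procurement",
    "purchase order": "route_to_procurement",
}

def decide(label):
    """Deterministic lookup; returns None when the variation is unknown."""
    return VARIATIONS.get(label.strip().lower())
```

Because the outcome is a dictionary hit rather than a prediction, this is the "eradicated" (deterministic) flavour of automation, with no probabilistic model involved.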
The application would need to cater for feedback, or a loopback, from the ML/RPA module. Usually one cannot capture all variations, and there is no guarantee that variations will not fan out further; hence, even in RPA cases, there needs to be a fallback option for when an unexpected variation is encountered, requiring user input, which is then logged for further learning.
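That loopback can be sketched as a wrapper around the lookup: unknown variations fall back to a human decision, which is logged so the dictionary (or a model) can learn from it later. The function and log names here are assumptions for illustration:

```python
# Log of variations the automation could not handle; feeds future learning.
fallback_log = []

def decide_with_fallback(label, lookup, ask_user):
    """Automate known variations; route unexpected ones to a human.

    lookup   -- dict of known variation -> decision
    ask_user -- callable that obtains the human decision for an unknown case
    """
    key = label.strip().lower()
    if key in lookup:
        return lookup[key]          # automated path
    decision = ask_user(label)      # fallback: human handles the surprise
    fallback_log.append((key, decision))  # captured for further learning
    return decision
```

A usage example: with `lookup = {"invoice": "route_to_accounts"}`, a call on `"invoice"` is automated, while a call on an unseen label invokes `ask_user` and records the outcome.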
Every human decision made within a software system's flow is a potential automation opportunity. With emerging technologies, one can transform an existing application into a smart one. The key is recognizing, tapping, and exploiting the abundant data that is implicit within an application.
Businesses can benefit from faster-to-realize automation initiatives built on a well-understood landscape. With minimal investment, higher ROI is realizable, subject to the automation's scope and impact. Businesses should not seek to replace manual intervention outright but, in the present-day context, strive to accelerate human decision making and even enhance and enrich the user experience. Both monetary and experiential gains are possible within the application's current boundary (to start with…).