At the Edge: Data Management Functionalities for Intelligent Edge Apps
Fast Data - The Real-time Enterprise
Because digital apps operate in real time, data arrives continuously and must be processed either as streams or as complex events
The rate of data flow has increased manifold due to social media, machine data (e.g. CDRs), IoT, etc. A new class of applications, listening platforms, is being deployed in enterprises, allowing organizations to read streams of social interactions and glean insights from them. This means reading from the firehose of social data providers such as Gnip and DataSift for a specific digital app, around a specific conversation or topic.
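As a hedged illustration, the sketch below reads a streaming HTTP endpoint and yields posts matching a topic. The endpoint URL, query parameter and payload fields are hypothetical; actual providers such as Gnip and DataSift expose their own streaming APIs and authentication schemes:

```python
import json
import requests  # assumes the requests library is installed

# Hypothetical firehose endpoint and topic filter -- illustrative only.
FIREHOSE_URL = "https://firehose.example.com/stream"
TOPIC = "product-launch"

def listen(url: str, topic: str):
    """Read a social firehose as a stream and yield posts matching a topic."""
    with requests.get(url, params={"track": topic}, stream=True, timeout=30) as resp:
        resp.raise_for_status()
        for line in resp.iter_lines():
            if not line:
                continue  # skip keep-alive newlines
            yield json.loads(line)

if __name__ == "__main__":
    for post in listen(FIREHOSE_URL, TOPIC):
        print(post.get("author"), post.get("text"))
```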
Real-time decision making also necessitates the creation of new patterns of data integration (DI) pipelines inside the enterprise that handle:
a. Collection, reading, aggregation and reporting of KPIs and metrics for data stream monitoring (e.g. machine health monitoring, social object monitoring, network operations monitoring). This pattern focuses on fast ingestion of data and reporting.
b. Data transformation of streaming data based on business rules, with ingestion into data lakes or data warehouses (e.g. telecom CDR sessions). This pattern focuses on maintaining the state of incoming data, transforming it once the state is complete and, if required, delivering a new stream of data.
c. Real-time data processing for critical events based on model scoring, interfaced with decision management applications to provide real-time responses (e.g. next best offers based on in-store location data received from iBeacon, personalized micro-campaign launch for the top 10% of customers whose calls dropped more than normal in the last 'n' minutes). This pattern focuses on interfacing with operational applications to support high-volume request-response pipelines.
d. Complex event processing across multiple data streams - monitoring for pre-defined complex events over multiple streams of data by querying the streams continuously to build high-level metrics and detect anomalous events (e.g. fraud detection based on a defined complex event pattern derived from multiple streams of data, machine failure threshold identification based on machine data streams). This pattern focuses on monitoring complex events across streams that individual streams alone cannot compute; a minimal sketch follows this list.
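To make pattern (d) concrete, the sketch below correlates two event streams per customer over a sliding window, raising an alert only when a joint condition holds that neither stream can detect alone. The stream names, threshold and window size are illustrative assumptions, not taken from any specific product:

```python
import time
from collections import defaultdict, deque

WINDOW_SECONDS = 600   # 'n' minutes of history to correlate over (assumed)
DROP_THRESHOLD = 3     # dropped calls that, with a complaint, flag a customer

# Per-customer sliding windows, one per stream.
dropped_calls = defaultdict(deque)
complaints = defaultdict(deque)

def _expire(window: deque, now: float) -> None:
    """Drop timestamps that have slid out of the correlation window."""
    while window and now - window[0] > WINDOW_SECONDS:
        window.popleft()

def on_event(stream: str, customer_id: str, ts: float) -> None:
    """Feed one event from either stream; emit an alert when the joint
    condition holds -- something neither stream can compute alone."""
    target = dropped_calls if stream == "dropped_call" else complaints
    target[customer_id].append(ts)
    _expire(dropped_calls[customer_id], ts)
    _expire(complaints[customer_id], ts)
    if len(dropped_calls[customer_id]) >= DROP_THRESHOLD and complaints[customer_id]:
        print(f"ALERT: customer {customer_id} at risk of churn")

# Example: interleaved events from the two streams for the same customer.
now = time.time()
for i in range(3):
    on_event("dropped_call", "C42", now + i)
on_event("complaint", "C42", now + 5)   # triggers the alert
```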
There are many technologies that can address these fast data requirements, and they are not restricted to the open source realm of Esper, Storm, Kafka, etc. Technologies such as StreamBase, SQLstream, IBM InfoSphere Streams, WSO2, Oracle CEP, Microsoft StreamInsight, Amazon Kinesis, Informatica RulePoint and VoltDB can provide the foundational components required to build these fast data pipeline processing capabilities.
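As one hedged illustration of the foundational components these technologies provide, the following sketch publishes and consumes JSON events with Kafka via the kafka-python client; the broker address and topic name are assumptions for a local setup:

```python
import json
from kafka import KafkaConsumer, KafkaProducer  # assumes kafka-python is installed

BROKER = "localhost:9092"   # assumed local broker
TOPIC = "cdr-events"        # hypothetical topic name

# Producer side: a source system publishes CDR events as JSON.
producer = KafkaProducer(
    bootstrap_servers=BROKER,
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)
producer.send(TOPIC, {"customer_id": "C42", "status": "dropped", "duration_s": 12})
producer.flush()

# Consumer side: a DI pipeline subscribes and processes the stream.
consumer = KafkaConsumer(
    TOPIC,
    bootstrap_servers=BROKER,
    value_deserializer=lambda b: json.loads(b.decode("utf-8")),
    auto_offset_reset="earliest",
)
for msg in consumer:
    print(msg.value)  # hand off to aggregation, scoring, or CEP logic
```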
Digital Applications - Age of Polyglot Storage
Data no longer arrives for analysis in rows and columns. Log files, JSON, XML, etc. stress existing information management (IM) capabilities
In the past, enterprise data for all applications fit neatly into relational tables. Digital apps are built using a new set of data management technologies, chosen based on factors such as availability requirements and ease/rapidity of development: NoSQL databases such as Aerospike, MongoDB, Cassandra, etc., or multi-model storage engines like FoundationDB and ArangoDB. A hypothetical digital app may store its application data in more than one database, e.g. customer information in a relational DB, catalogue information in a document database, and click-stream and mobile session data in Cassandra.
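A minimal sketch of such polyglot persistence follows, assuming a local MongoDB instance, a local Cassandra cluster with the keyspace and table already created, and sqlite3 standing in for the relational store; all table, keyspace and field names are illustrative:

```python
import sqlite3
from pymongo import MongoClient          # assumes pymongo is installed
from cassandra.cluster import Cluster    # assumes cassandra-driver is installed

# Relational store for customer records (sqlite3 stands in for any RDBMS).
rdb = sqlite3.connect("app.db")
rdb.execute("CREATE TABLE IF NOT EXISTS customers (id TEXT PRIMARY KEY, name TEXT)")
rdb.execute("INSERT OR REPLACE INTO customers VALUES (?, ?)", ("C42", "Ada"))
rdb.commit()

# Document store for the product catalogue (assumes a local MongoDB).
catalog = MongoClient("mongodb://localhost:27017")["shop"]["catalog"]
catalog.insert_one({"sku": "SKU-1", "name": "Widget", "attrs": {"colour": "red"}})

# Wide-column store for click-stream events (assumes a local Cassandra
# cluster with keyspace 'app' and table 'clicks' already created).
session = Cluster(["127.0.0.1"]).connect("app")
session.execute(
    "INSERT INTO clicks (customer_id, ts, page) VALUES (%s, toTimestamp(now()), %s)",
    ("C42", "/checkout"),
)
```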
Once these digital apps are deployed, rich interaction data around a given context and business process becomes available to be integrated with information in the data warehouse for a completely new set of insights. From an IM perspective, the data supply chain has to be re-configured with new capabilities to manage this integration:
- New DI patterns that can connect, read and decipher structure (e.g. JSON, XML) to retrieve information from these new-generation databases in real time and integrate it with existing data in the DWH
- Ability to decode structure (metadata) and content (profiling metrics) from this semi-structured and unstructured data and develop metadata lineage (a sketch of this capability follows the list)
- Business-process-oriented end-to-end metadata lineage that addresses specific business processes across Core and Edge, e.g. claims submission, customer onboarding, BCBS 239 in banking
- Ontology build-out to understand and map attributes of rapidly arriving files without human intervention
- Machine-learning-driven metadata context generation for high-velocity data ingestion workloads
- Ability to govern data creation and sharing in digital apps based on data governance policies with respect to data privacy and compliance
- Ability to provide data-quality-as-a-service for data standardization to digital apps in a high-performance setup using concepts of Data Virtualization
- Ability to provide entity resolution as a service, across multi-channel digital apps so that the entity resolution efforts are not duplicated in the enterprise
- Ability to enhance master data entities and relationships with socially available information, enabling connection analytics
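As a minimal sketch of the first two capabilities above (deciphering structure and deriving profiling metrics), the following code flattens incoming JSON documents into dotted column paths and profiles observed types and null counts; the sample records and field names are illustrative:

```python
import json
from collections import Counter

def flatten(doc, prefix=""):
    """Recursively flatten a JSON document into dotted column paths,
    the kind of structure a DWH loader or lineage tool can consume."""
    rows = {}
    for key, value in doc.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            rows.update(flatten(value, path))
        else:
            rows[path] = value
    return rows

def profile(records):
    """Derive simple metadata: observed paths, value types and null counts."""
    types, nulls = Counter(), Counter()
    for rec in records:
        for path, value in flatten(rec).items():
            types[(path, type(value).__name__)] += 1
            if value is None:
                nulls[path] += 1
    return types, nulls

raw = [
    '{"customer": {"id": "C42", "tier": "gold"}, "clicks": 17}',
    '{"customer": {"id": "C7", "tier": null}, "clicks": 3}',
]
types, nulls = profile(json.loads(r) for r in raw)
for (path, typename), count in sorted(types.items()):
    print(f"{path}: {typename} x{count} (nulls: {nulls[path]})")
```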