Looking into the Future of the Classic Data Historian

The incredible velocity of the world’s industrial output has inspired ongoing advancements in operational technology. Both digital transformation and Industry 4.0 are accelerating the demand for industrial systems like DCS, SCADA, RTUs, PLCs, sensors, edge devices, data historians, industrial robots, and 3D printers, and these systems are now generating unprecedented and escalating production volumes, velocities, and efficiencies. However, the data captured by these systems must be properly managed, cleaned, processed, stored, routed, secured, and leveraged.

In the past, data historians -- which are time-series databases -- were located on premises, next to the industrial systems whose sensor data they captured and stored. However, to take advantage of artificial intelligence and big data analytics applications, which are mostly available in cloud environments, this data now needs to be moved, stored, and made searchable in a cloud-based database.

This IIoT evolution is not new to the market -- the concept has existed within the industry for years. However, the demand for the convergence of IT and OT is increasing, and time-series data historians play a major role here. Industrial time series data will give way to complex adaptive systems and multi-processing. The future belongs to nanotech, cloud computing, wireless everything, artificial intelligence (AI)-based machine learning (ML), Big Data, and complex adaptive systems.

Time-series data refers to data that changes with time (e.g., digital sensor readings). A time-series database stores data values together with their timestamps and is built to store large amounts of such data consistently over long periods. Today, time-series data and related technologies make up the fastest-growing segment in the market. As a result, we have witnessed many investments and acquisitions recently, such as AVEVA’s agreement to acquire OSIsoft for $5 billion, announced on August 25, 2020.
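As a rough illustration of the idea, the sketch below keeps timestamped values for a single sensor tag in time order and answers a time-range query. It is a minimal in-memory example, not how any particular historian is implemented; the class name TagSeries and its methods are hypothetical.

```python
from __future__ import annotations

from bisect import bisect_left, bisect_right
from dataclasses import dataclass, field


@dataclass
class TagSeries:
    """Hypothetical in-memory series for one sensor tag, sorted by timestamp."""

    timestamps: list[float] = field(default_factory=list)
    values: list[float] = field(default_factory=list)

    def append(self, ts: float, value: float) -> None:
        # Locate the insert position so late-arriving samples stay in time order.
        i = bisect_right(self.timestamps, ts)
        self.timestamps.insert(i, ts)
        self.values.insert(i, value)

    def query(self, start: float, end: float) -> list[tuple[float, float]]:
        # Return every (timestamp, value) pair with start <= timestamp <= end.
        lo = bisect_left(self.timestamps, start)
        hi = bisect_right(self.timestamps, end)
        return list(zip(self.timestamps[lo:hi], self.values[lo:hi]))


series = TagSeries()
for ts, value in [(0.0, 20.1), (2.0, 20.4), (1.0, 20.2), (3.0, 20.9)]:
    series.append(ts, value)

print(series.query(1.0, 2.0))  # [(1.0, 20.2), (2.0, 20.4)]
```

Keeping samples ordered by time is what makes range queries cheap; production systems add persistence, compression, and indexing on top of the same basic idea.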

Industrial time-series data has gravity, and researchers anticipate that it will grow at a rate of more than 6.90% from 2020 to 2025. Leading public and private cloud platforms, software startups, data lake vendors, control systems, SCADA companies, top-tier visionary investors, and venture capital firms are all rushing to be the vendor/partner/investor of record for this time-series data storage business across the world. In the near future, competition will fuel more innovation, growing the time-series data historian market as a whole.

The future of data historians requires a new way of thinking

The data historian has evolved from simply storing data to acting as a data infrastructure. This means that data collection, storage, or visualization individually -- or even together -- does not make a complete, valuable industrial data management system. The Industry 4.0 evolution requires a complete infrastructure solution capable of integration, archiving, asset modeling, notifications, visualization, analysis, and many other analytical features. The data historian, MES, and ERP might all become part of a data lake (where everything is stored), which we can call the unified namespace. However, a data lake alone will still not be able to supply time-stamped process data at the right time, with the right context, and with proper data integrity.

The future of data historians encompasses much more than a traditional operational data historian -- data historian capabilities are just a sub-function. A more apt name for a system like this would be an “Industrial Data Management System” or something similar. Operational data historians are expensive, challenging to work with, and typically outdated, with limited analysis and visualization capabilities. Additionally, not all data historians are horizontally scalable, and they incur a performance penalty when large volumes of archived data are retrieved. It is also very difficult to contextualize sensor data with other metadata in a data historian, which makes this work lengthy and costly for customers. On the other hand, an operational data historian provides the benefit of interfacing with data collection points with buffer capability (store and forward), as well as compatibility with industrial systems such as DCS, SCADA, and OPC.
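The store-and-forward buffering mentioned above can be sketched roughly as follows: samples are queued at the collection point and forwarded in order, and anything that cannot be delivered stays in the queue until the connection returns. This is a simplified illustration under stated assumptions, not a vendor implementation; the class StoreAndForwardBuffer, the send callback, the tag name TI-101, and the use of ConnectionError to signal an outage are all hypothetical.

```python
from __future__ import annotations

from collections import deque
from typing import Callable

Sample = tuple  # assumed layout: (tag, timestamp, value)


class StoreAndForwardBuffer:
    """Hypothetical collection-side buffer that retries delivery in order."""

    def __init__(self, send: Callable[[Sample], None], max_samples: int = 10_000):
        self._send = send
        # Bounded queue: once full, the oldest buffered sample is dropped first.
        self._pending: deque = deque(maxlen=max_samples)

    def collect(self, sample: Sample) -> None:
        self._pending.append(sample)
        self.flush()

    def flush(self) -> int:
        """Forward buffered samples oldest-first; stop at the first failure."""
        sent = 0
        while self._pending:
            try:
                self._send(self._pending[0])
            except ConnectionError:
                break  # keep the sample and retry on the next flush
            self._pending.popleft()
            sent += 1
        return sent

    @property
    def backlog(self) -> int:
        return len(self._pending)


# Simulated link to the central historian (assumption for the demo).
link = {"up": False}
received = []


def send_to_historian(sample: Sample) -> None:
    if not link["up"]:
        raise ConnectionError("historian unreachable")
    received.append(sample)


buf = StoreAndForwardBuffer(send_to_historian)
buf.collect(("TI-101", 0.0, 20.1))
buf.collect(("TI-101", 1.0, 20.3))
print(buf.backlog)  # 2 samples held while the link is down

link["up"] = True
buf.collect(("TI-101", 2.0, 20.2))
print(buf.backlog, len(received))  # 0 3
```

A sample is removed from the queue only after the send succeeds, so an outage delays data rather than losing it, up to the chosen buffer size.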

The future of the Industrial Data Management System relies on these key technologies:

Digital Twin: The merging of the virtual and physical worlds, providing the capability to model either physical or logical objects with the view of assets or the data associated with them. It enables virtual replication for physical objects and processes, including asset hierarchy.
Blockchain: IDMS (Industrial Data Management System) augmentation in conjunction with Blockchain technology to implement decentralization and robustness in a network to improve asset lifecycles.
Blockchain Security: Protection, verification, and nonrepudiation of critical data for historian streams and connected applications.
Embedded Cyber Security: Ingress, egress, and historian integration systems need to be extremely reliable with the support of robust cybersecurity.
OPC UA Adoption: More collaboration with the OPC Foundation for OPC UA to develop robust, redundant, and scalable UA connectors/interfaces.
IoT Edge Device Integration: Interfacing with edge infrastructure and enabling intelligence at the edge level through ready-to-use interfaces for edge devices and intelligent gateways.
Data Volume: High-speed data collection and retrieval (millions of inputs per second).
Efficient Data Storage: Efficient and easy techniques for time-series data archiving such as compression algorithms.
NoSQL: Simple archival storage in blocks of time, providing easy access to data.
Sensor Level Security: Data security roles defined down to individual data point granularity levels together with Microsoft Windows Integrated Security and fine-grained access control.
Vertical and Horizontal Integration: Ready-to-use (Plug & Play) interfaces/connectors for a wide range of different input sources with third-party applications and business systems. Prebuilt ingress interface, which can be deployed near source systems with buffering (Data Storage) capabilities and noise filtering capabilities (Exception Reporting).
Open-Source Connectivity Support: Support for a wide range of data access technologies, including OPC DA, HDA, and UA, APIs, SQL, programmatic access via a Software Development Kit (SDK), Microsoft's Component Object Model (COM), data connectors, and Web API interfacing.
Quick Ingress and Egress: Fast access for real-time analytics, machine learning, and AI as a singular source.
High Availability and Redundancy: Features that can mirror data across multiple nodes with high availability at each level and function.
Visualization and Alerts: High-quality querying, visualizing, notifying, and alerting capabilities.
Data Enrichment and Cleansing Capabilities: Incorporated data aggregation, automatic data cleansing, interpolation, and data enrichment capabilities.
Data Contextualization: Enough power to easily contextualize time series data with metadata as well as combine different data types and sources.
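Two of the capabilities listed above, noise filtering through exception reporting and interpolation of stored data, can be illustrated together. The sketch below is a simplified deadband filter followed by linear interpolation between the stored points. It is an assumption-laden example rather than any product's algorithm: the function names, the deadband value, and the sample readings are hypothetical, and real historians typically add refinements such as maximum time limits between stored values.

```python
from bisect import bisect_left


def exception_filter(samples, deadband):
    """Keep a sample only if it differs from the last kept value by more than deadband."""
    kept = []
    for ts, value in samples:
        if not kept or abs(value - kept[-1][1]) > deadband:
            kept.append((ts, value))
    return kept


def interpolate(samples, ts):
    """Linearly interpolate a value at ts from time-ordered stored samples."""
    times = [t for t, _ in samples]
    if not samples or ts < times[0] or ts > times[-1]:
        raise ValueError("timestamp outside stored range")
    i = bisect_left(times, ts)
    if times[i] == ts:
        return samples[i][1]  # exact hit on a stored sample
    (t0, v0), (t1, v1) = samples[i - 1], samples[i]
    return v0 + (v1 - v0) * (ts - t0) / (t1 - t0)


# Hypothetical raw readings as (timestamp in seconds, temperature).
raw = [(0, 20.0), (1, 20.1), (2, 20.2), (3, 21.0), (4, 21.1), (5, 22.5)]

stored = exception_filter(raw, deadband=0.5)
print(stored)                  # [(0, 20.0), (3, 21.0), (5, 22.5)]
print(interpolate(stored, 4))  # 21.75
```

Half of the raw readings are discarded as noise, yet a value can still be estimated for any timestamp inside the stored range. The trade-off is accuracy: the interpolated 21.75 differs from the raw reading of 21.1 at the same timestamp, which is why the deadband has to be chosen per signal.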

Finally, Industrial Data Management Systems need to have a solution for solving human talent availability issues in order to help customers. As of today, there are multiple players in this race, but no one has a complete solution for the future of the Industrial Data Management System yet.

Major time series data management market players in 2020 include these organizations:

AVEVA Group (OSIsoft PI, Wonderware InSQL, eDNA, and Citect historians)
Honeywell (PHD)
Aspen Technology (InfoPlus 21)
GE (Proficy & Predix)
Microsoft (Azure Time Series Insights)
Amazon (Amazon Timestream)
Google (Cloud Bigtable)
ABB Historian
InfluxDB
TimescaleDB
OpenTSDB
Graphite
Cassandra
MongoDB
Elasticsearch
Riak
kdb+ (Kx)

References

AVEVA & OSIsoft combine to accelerate the digital transformation of the industrial world, retrieved from https://www.aveva.com/en/info/aveva-agrees-to-acquire-osisoft/
https://www.kenresearch.com/blog/2019/04/increasing-trends-in-the-global-data-historian-market-outlook-ken-research/
https://www.influxdata.com/modern-time-series-platform/
https://kx.com/why-kx/
https://www.controleng.com/articles/the-data-historians-history-told/
National Institute of Standards and Technology (NIST) – Definition of Time Series
http://www.itl.nist.gov/div898/handbook/pmc/section4/pmc41.htm

About the author

Lalit Kumar Pokharana
Principal Domain Consultant, Wipro

Lalit Kumar Pokharana is a principal domain consultant at Wipro and is based in Houston, TX. He has over 13 years of experience in industrial manufacturing, automation, and control, and he specializes in consulting with technical expertise in control systems (PLC, DCS, SCADA), data historians, manufacturing execution systems (MES), and IoT sensors. Recently, he has been providing consultation for Real-Time Data Management, alarm management applications, SCADA, DCS systems, and OSIsoft. He also focuses on developing architectural best practices and working with customers on how they use real-time operational data platforms to transform their business, bridging the gap between operational technology and information technology.