Enterprise IT departments see considerable value in building a service-oriented architecture around their data warehouse environment to empower their internal customers. The arrival of the Internet of Things (IoT) brought a new deluge of data to be processed and used for analytics. As more data is processed and stored, the need for a multi-platform data warehouse environment has emerged. The volume, velocity, and variety of data, together with its potential to drive the organic growth of the business, have caused data platforms to grow ever larger.
Today, data warehouse environments in organizations are on the threshold of fulfilling diverse use cases and providing data to a broad spectrum of consumers, such as business applications, business intelligence tools, data analysts, and data scientists.
Real-time data ingestion and extraction need to become easier, with or without the involvement of IT. Now that analytical platforms offer features such as text analysis and pattern matching, REST is a convenient vehicle for carrying data to, and retrieving data from, the data processing and storage engines.
This paper describes how a RESTful framework can be a cost-effective way to meet the mounting need to serve data in real time.
Heavy dependence on Extract, Transform and Load (ETL) and business intelligence tools has created some fatigue among business users: it takes multiple iterations and a long wait for the business to get the data it needs. Simple, efficient, openly specified approaches such as REST, which builds on the web's most widely used protocol, HTTP, enable fast movement of data.
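As a minimal sketch of REST carrying data out of a storage engine, the Python example below uses only the standard library. An in-memory SQLite table stands in for a warehouse table, and a hypothetical resource URI of the form /tables/<name>?limit=<n> returns rows as JSON over plain HTTP. The table name sales, the URI layout, and the serve and fetch helpers are illustrative assumptions, not part of any product discussed in this paper.

```python
import json
import sqlite3
import threading
import urllib.error
import urllib.parse
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

# Whitelist of exposed tables; SQL cannot bind table names as parameters.
ALLOWED_TABLES = {"sales"}


def make_db() -> sqlite3.Connection:
    """Assumption: an in-memory SQLite table stands in for a warehouse table."""
    conn = sqlite3.connect(":memory:", check_same_thread=False)
    conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
    conn.executemany("INSERT INTO sales VALUES (?, ?)",
                     [("east", 120.0), ("west", 80.5), ("east", 42.0)])
    conn.commit()
    return conn


class DataHandler(BaseHTTPRequestHandler):
    db: sqlite3.Connection  # assigned by serve() before any request arrives

    def do_GET(self) -> None:
        url = urllib.parse.urlsplit(self.path)
        parts = [p for p in url.path.split("/") if p]
        # Hypothetical resource URI: /tables/<name>?limit=<n>
        if len(parts) != 2 or parts[0] != "tables" or parts[1] not in ALLOWED_TABLES:
            self._send(404, {"error": "unknown resource"})
            return
        try:
            limit = int(urllib.parse.parse_qs(url.query).get("limit", ["100"])[0])
        except ValueError:
            self._send(400, {"error": "limit must be an integer"})
            return
        cur = self.db.execute(f"SELECT * FROM {parts[1]} LIMIT ?", (limit,))
        columns = [d[0] for d in cur.description]
        rows = [dict(zip(columns, r)) for r in cur.fetchall()]
        self._send(200, {"table": parts[1], "rows": rows})

    def _send(self, status: int, payload: dict) -> None:
        body = json.dumps(payload).encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format: str, *args) -> None:
        pass  # suppress the default per-request logging to stderr


def serve(port: int = 8000) -> HTTPServer:
    """Start the server on a background thread; port 0 lets the OS pick one."""
    DataHandler.db = make_db()
    server = HTTPServer(("127.0.0.1", port), DataHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def fetch(port: int, path: str) -> tuple:
    """Client side: one plain HTTP GET, returning (status, decoded JSON)."""
    try:
        with urllib.request.urlopen(f"http://127.0.0.1:{port}{path}") as resp:
            return resp.status, json.loads(resp.read())
    except urllib.error.HTTPError as err:
        return err.code, json.loads(err.read())
```

Each request carries everything the server needs (table name and row limit in the URI), so the handler keeps no session state. Swapping SQLite for a real warehouse driver would mainly change make_db and the query call, not the HTTP contract.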
Internet of Things (IoT) and Big Data
Internet-enabled embedded computer chips in products and devices are used primarily for data gathering, giving enterprises details on everything from how efficiently their machines are running to the purchasing habits of their customers.
Without proper data gathering and analysis in place, businesses cannot sort through all the information flowing in from these embedded sensors. In other words, without analytics on the Big Data being captured, the Internet of Things can offer an enterprise little more than noise.
Emergence of Multi-Platform Data Warehouse Environment
The 21st century marked the emergence of data warehousing as a science. The need to process and store data gained traction as the business found uses for it. As more and more data was processed, data appliances became popular. With the arrival of the Internet of Things, data collection and processing were redefined as the amount of data being collected grew exponentially.
Organizations now need multiple platforms to process and store data. Architectural approaches such as the Teradata Unified Data Architecture (UDA) offer many options for building a true multi-platform data warehouse environment. Data of any size can be stored: a data lake keeps data as it arrives, in any format, and a combination of interconnected platforms makes it possible to move data between them. It is now also possible to derive insights from data in real time. Tools such as Teradata QueryGrid help move data between platforms and can retrieve data from different platforms without the user needing to know where the data is stored.
The volume and variety of data correlate directly with the number of processing components required. Conventional batch processing and canned analytics no longer satisfy the new types of users who consume this data, which is why organizations are looking for less formal ways to integrate, store, and access data. The open RESTful approach is one of the technologies that make data integration and extraction easier.
RESTful Web Services
REST defines a set of architectural principles by which one can design web services that focus on a system's resources. A RESTful service should:
• Use HTTP methods explicitly
• Be stateless
• Expose directory structure-like URIs
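To make the three principles concrete, here is a minimal, framework-free Python sketch of a request dispatcher. It assumes a hypothetical in-memory store keyed by collection and resource id, and hypothetical URIs such as /sensors/42. The HTTP method alone selects the operation, each call receives everything it needs as arguments (no session), and resources are addressed by directory-like paths.

```python
import json
from typing import Dict, Optional, Tuple

# Hypothetical store standing in for a data platform:
# {collection: {resource_id: resource_document}}
Store = Dict[str, Dict[str, dict]]


def handle(method: str, path: str, body: str, store: Store) -> Tuple[int, str]:
    """Serve one request; no state is kept between calls (stateless)."""
    # Directory structure-like URI: /<collection> or /<collection>/<id>
    parts = [p for p in path.split("/") if p]
    if len(parts) not in (1, 2):
        return 404, json.dumps({"error": "not found"})
    collection = parts[0]
    resource_id: Optional[str] = parts[1] if len(parts) == 2 else None
    items = store.get(collection, {})

    # The HTTP method, not the URI, says what to do with the resource.
    if method == "GET":
        if resource_id is None:
            return 200, json.dumps(sorted(items))  # list the ids in the collection
        if resource_id not in items:
            return 404, json.dumps({"error": "not found"})
        return 200, json.dumps(items[resource_id])

    if resource_id is None:
        return 405, json.dumps({"error": "method not allowed on a collection"})

    if method == "PUT":
        # Idempotent create-or-replace: 201 if new, 200 if replaced.
        created = resource_id not in items
        store.setdefault(collection, {})[resource_id] = json.loads(body)
        return (201 if created else 200), json.dumps({"id": resource_id})

    if method == "DELETE":
        if resource_id not in items:
            return 404, json.dumps({"error": "not found"})
        del items[resource_id]
        return 204, ""

    return 405, json.dumps({"error": "method not allowed"})
```

A real service would put this dispatcher behind an HTTP server; keeping it a pure function makes the statelessness easy to see and to test.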
Building Real-Time Data Access with a RESTful Framework
Because the platforms in a multi-platform data warehouse environment have different workload capabilities, real-time data ingestion and extraction become more difficult. Suppose there is a requirement to load unstructured data into a multi-platform data warehouse environment and access it in real time. Because the data is unstructured, it makes sense to load it first into Hadoop, which is designed primarily for batch processing. Once the data is cleansed and ready for integration, it makes sense to load it into the enterprise data warehouse (EDW) or integrated data warehouse (IDW), which serves real-time access more efficiently.
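The two-stage flow just described (land raw data in Hadoop, then load only cleansed, integration-ready rows into the EDW or IDW) can be sketched in a few lines of Python. The JSON record format, the required field names, and the in-memory lists standing in for the two platforms are illustrative assumptions. For brevity the sketch cleanses each record immediately; in the flow described, cleansing would run as a batch job on Hadoop before the warehouse load.

```python
import json
from typing import List, Optional, Tuple

# Hypothetical schema the EDW/IDW table expects; not taken from the source.
REQUIRED_FIELDS = ("device_id", "ts", "reading")


def cleanse(raw: str) -> Optional[dict]:
    """Return a conformed row, or None if the record cannot be integrated yet."""
    try:
        doc = json.loads(raw)
    except json.JSONDecodeError:
        return None
    if not isinstance(doc, dict) or any(f not in doc for f in REQUIRED_FIELDS):
        return None
    # Keep only the columns the warehouse table defines.
    return {f: doc[f] for f in REQUIRED_FIELDS}


def route(batch: List[str]) -> Tuple[List[str], List[dict]]:
    """Split a batch into (Hadoop landing records, EDW-ready rows)."""
    hadoop_landing: List[str] = []
    edw_rows: List[dict] = []
    for raw in batch:
        hadoop_landing.append(raw)  # every record lands in Hadoop as-is
        row = cleanse(raw)
        if row is not None:
            edw_rows.append(row)  # only cleansed rows move on to the EDW/IDW
    return hadoop_landing, edw_rows
```

Keeping every raw record in the Hadoop landing area means records that fail cleansing today can be reprocessed later without re-ingesting them.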
The WebHDFS REST API for HDFS (the Hadoop Distributed File System), available in Apache Hadoop and in distributions such as Hortonworks, can be used for real-time data ingestion into HDFS. Because it is a REST API, clients can communicate with Hadoop clusters over plain HTTP. File read and file write calls are redirected to the corresponding DataNodes, so data streams directly between the client and the DataNodes and can use the full bandwidth of the Hadoop cluster [iv].
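The redirected write and read can be sketched with Python's standard library against the WebHDFS REST API. For a CREATE, the NameNode replies with a 307 Temporary Redirect whose Location header names a DataNode, and the client then sends the file bytes to that DataNode; an OPEN is redirected the same way. The NameNode address, HDFS paths, and user name are placeholders, and the sketch assumes simple (non-Kerberos) authentication through the user.name query parameter. The default NameNode HTTP port is 9870 on Hadoop 3 and 50070 on Hadoop 2.

```python
import urllib.error
import urllib.parse
import urllib.request

# Placeholders: NameNode host:port, HDFS paths and user are assumptions.
# Assumes simple authentication (user.name parameter), not Kerberos/SPNEGO.


def webhdfs_url(namenode: str, hdfs_path: str, op: str, **params: str) -> str:
    """Build http://<namenode>/webhdfs/v1/<path>?op=<OP>&<params>."""
    path = urllib.parse.quote("/webhdfs/v1/" + hdfs_path.lstrip("/"))
    query = urllib.parse.urlencode({"op": op, **params})
    return f"http://{namenode}{path}?{query}"


def create_file(namenode: str, hdfs_path: str, data: bytes, user: str) -> int:
    """Write a file in two steps; returns the DataNode's HTTP status (201 on success)."""
    url = webhdfs_url(namenode, hdfs_path, "CREATE",
                      overwrite="true", **{"user.name": user})
    # Step 1: PUT without a body; the NameNode answers 307 with a DataNode URL.
    # urllib does not follow redirects for PUT, so the 307 surfaces as HTTPError.
    try:
        with urllib.request.urlopen(urllib.request.Request(url, method="PUT")):
            pass
    except urllib.error.HTTPError as err:
        if err.code != 307:
            raise
        datanode_url = err.headers["Location"]
    else:
        raise RuntimeError("expected a 307 redirect from the NameNode")
    # Step 2: stream the bytes straight to the DataNode named in Location.
    step2 = urllib.request.Request(
        datanode_url, data=data, method="PUT",
        headers={"Content-Type": "application/octet-stream"})
    with urllib.request.urlopen(step2) as resp:
        return resp.status


def read_file(namenode: str, hdfs_path: str, user: str) -> bytes:
    """OPEN: urllib follows the GET redirect to the DataNode automatically."""
    url = webhdfs_url(namenode, hdfs_path, "OPEN", **{"user.name": user})
    with urllib.request.urlopen(url) as resp:
        return resp.read()
```

On a reachable cluster, create_file("namenode.example.com:9870", "/landing/iot/events.json", payload, "etl") followed by read_file with the same path would round-trip the payload. In both directions the bulk bytes travel between the client and a DataNode rather than through the NameNode, which is what lets WebHDFS use the cluster's full bandwidth.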
Conceptual Representation of Using REST for Multi-Platform Data Warehouse Environment