We all have a propensity to create order out of chaos. This extends to the unruly dimensions of big data, especially when we have such a familiar structure at hand in the ever-faithful Enterprise Data Warehouse (EDW). Most of us feel the need to hammer big data into a mold we can use in our everyday business interactions - but is big data really any different from a data warehouse from which consistent reports are extracted?
To unravel the entanglement of the EDW and big data, we need to look at "time," which is precious to all of us. There are a number of differences between big data and the data that resides in an EDW, but the primary difference is the value of time in each ecosystem.
An EDW consumes source-system data rhythmically, and the vast majority of its output is also cadenced. There is no denying that many industry-standard platforms are shifting toward self-service modes where the data in the warehouse can be queried as needed. Big data, however, is different. The information in the typical data warehouse is well known: metadata, linking tables, primary keys, and so on. Looking at this data differently can produce unique insights, but it is not big data. Big data is data with little or no structure - a mixture of CLOBs and BLOBs - whose value lies in the moment, a time span much shorter than what is found in the EDW.
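To make the contrast concrete, consider a hypothetical example: a warehouse row arrives fully described by its schema, while a big data record may be a free-form CLOB whose structure has to be discovered. A minimal sketch in Python (the log format and field names are illustrative assumptions, not from any particular system):

```python
import json
import re

# A warehouse record: the schema is known up front (metadata, keys, types).
warehouse_row = {"customer_id": 1001, "order_date": "2014-03-02", "amount": 59.99}

# A "big data" record: a semi-structured CLOB whose shape must be probed.
clob = 'ts=1393742401 user=jdoe action=click page=/checkout extra={"device":"mobile"}'

def extract_structure(raw):
    """Pull key=value pairs out of a free-form log line; parse JSON fragments."""
    record = {}
    for key, value in re.findall(r'(\w+)=(\{.*?\}|\S+)', raw):
        if value.startswith("{"):
            record[key] = json.loads(value)  # nested JSON fragment
        else:
            record[key] = value
    return record

print(extract_structure(clob))
```

The warehouse row is usable immediately; the CLOB is worth something only after a transform like this is applied, and its value decays quickly.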
The relationship between the EDW and big data may feel like that of distant cousins, but it should be more akin to brother and sister with the same parent. Looking at the DNA of both ecosystems, the potential of big data can be realized in coexistence with an EDW or data mart architecture. Before data enters the EDW or data mart, it is manipulated, enhanced, and transformed, most commonly by an ETL toolkit. Big data is no different: it needs to be transformed in the moment to be understood. A likely, and probably the best, location for a company's big data platform is running in parallel to the EDW staging area. While a greater level of talent is necessary to decipher big data, many ETL toolkits can be used to explore and eventually refine natural relationships between entities on the big data platform.
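As a sketch of that parallel arrangement, the ETL-style transform below conforms raw events from the big data side and links them to a warehouse dimension, surfacing relationships between entities. All names here (the customer dimension, the event fields) are hypothetical, chosen only to illustrate the pattern:

```python
# Hypothetical sketch: an in-the-moment transform running parallel to EDW staging.
# Raw events are conformed and enriched with warehouse metadata so that
# relationships between entities can be explored.

# A small warehouse dimension, as it might sit in staging (names are illustrative).
customer_dim = {
    "jdoe": {"customer_id": 1001, "segment": "retail"},
    "asmith": {"customer_id": 1002, "segment": "enterprise"},
}

# Raw, loosely structured events from the big data platform.
raw_events = [
    "jdoe|viewed|/products/42",
    "asmith|purchased|/checkout",
    "ghost|viewed|/home",  # no matching dimension row
]

def conform(event):
    """Transform a raw event and enrich it with warehouse dimension data."""
    user, action, page = event.split("|")
    dim = customer_dim.get(user)
    return {
        "user": user,
        "action": action,
        "page": page,
        "customer_id": dim["customer_id"] if dim else None,  # link to EDW entity
        "segment": dim["segment"] if dim else "unknown",
    }

conformed = [conform(e) for e in raw_events]
for row in conformed:
    print(row)
```

The same join-and-conform logic an ETL toolkit applies before the EDW load can, in this arrangement, be pointed at the big data stream to discover which entities line up with the warehouse and which do not.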