There is so much talk about Big Data these days, yet there is no common understanding of what it means or what impact it has on business. The academic and analyst communities have described its attributes, widely known as the 5 V’s: Volume, Velocity, Variety, Veracity and Value. Depending on the context, different people have different perspectives on Big Data.
To understand Big Data, one needs to look at its origins, its evolution over time, and where it is headed. It can be examined from many vantage points, such as technology, process, organizational structure, people, applications and the philosophy of using data. In this post, I focus on the technology perspective.
Based on my 15 years of experience in the areas of BI, DW, Information Management and Analytics, I believe Big Data sits at the crossroads of various technologies coming together in the following categories:
- Category 1 – usually referred to as structured data/information management technologies:
  - Reporting, OLAP, BI, Dashboard/Scorecard, business/data discovery and analytics
  - ETL, Data Integration, EDI, Enterprise Application Integration, Enterprise Business Integration, BAM, rules engines, Complex Event Processing (CEP)
  - Data Quality, Data Migration, Meta Data, Master Data, Data Archival, DW appliance, Data Virtualization technologies
- Category 2 – classified under semi/unstructured data management technologies:
  - Document Management, Content Management, Knowledge Management and Collaboration
  - Search Engines, Text Mining
  - Hadoop & MapReduce family, Streaming, NoSQL
- Category 3 – next generation data types, audio and video:
  - Audio, Speech recognition technologies
  - Image processing, Video Analytics, Geographical Information Systems
Data generated by machines, devices and sensors, as well as syndicated data and social media data, all fall under one or more of the above categories of technologies. Innovation happens at the intersections of these categories: new applications get developed there, paving the way for technology convergence. Technology service providers and system integrators stand to gain the most from understanding and leveraging this.
To give an example in Category 1, formerly independent technologies such as reporting, OLAP, dashboards/scorecards, data discovery and analytical tools are converging into single platforms, the way IBM is integrating them all into its Cognos platform. MS Office is another example of this trend. Most BI tools are incorporating search capabilities, and search tools are incorporating BI capabilities, leading to the convergence of BI (Category 1) and search (Category 2) technologies. Taking this one step further, products like Oracle Endeca, HP Autonomy and Attivio’s Active Intelligence Engine (AIE) are trying to combine features from Categories 1 and 2 by managing both structured and unstructured data.
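For readers who think in code, the idea of convergence across categories can be sketched in a few lines of Python. This is only an illustration: the mapping is abridged from the list in this post, the function names `categories_spanned` and `is_convergent` are my own, and the sample product is hypothetical rather than a description of any real tool.

```python
# Technology-to-category mapping, abridged from the categories listed in this post.
CATEGORY_OF = {
    "reporting": 1,
    "olap": 1,
    "dashboard": 1,
    "etl": 1,
    "search": 2,
    "text mining": 2,
    "hadoop": 2,
    "speech recognition": 3,
    "video analytics": 3,
}


def categories_spanned(capabilities):
    """Return the sorted list of categories that a set of capabilities touches."""
    return sorted({CATEGORY_OF[c] for c in capabilities})


def is_convergent(capabilities):
    """Treat a product as 'convergent' if it spans more than one category."""
    return len(categories_spanned(capabilities)) > 1


# Hypothetical product: a BI tool that has added search capabilities.
bi_with_search = ["reporting", "dashboard", "search"]
print(categories_spanned(bi_with_search))  # [1, 2]
print(is_convergent(bi_with_search))       # True
```

A product that only offers Category 1 capabilities would return a single category and would not count as convergent under this simple definition.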
In the future, we could see products and applications combining structured, unstructured, audio and video data as well. That is when the true potential of Big Data will be realized. What do you think? Do write in with your views.