In this post I would like to round off the history of data series with a quick look at how big data came to be. My previous post discussed the transition from physical data stored on punched cards to data warehouses, which were used to store large amounts of primarily structured data.
The rise of distributed computing
Until the 1990s, computers were used mainly by enterprises and large organizations that needed to archive data and derive business intelligence from it. Through much of the 1980s, many of the machines people sat at were dumb terminals: gateways to a central computer that provided the processing and memory. However, thanks to intuitive, user-friendly operating systems such as Apple's Mac OS and Microsoft Windows, the PC revolution took off in the late 1980s. Computer adoption became mainstream as almost anyone could use a computer for routine tasks such as word processing and spreadsheets, using applications such as Lotus 1-2-3. As adoption spread, these disparate machines needed to be connected so that data could be shared between them and stored more efficiently, especially in businesses, where data storage was still quite expensive. The networking era that followed led to the creation of the browser and search engines, which only fed the growth of data on global networks.
Data on the internet was primarily served up using markup languages such as HTML and, later, XML, which are semi-structured formats. Although a lot of information on the internet and on intranets was still stored as structured data in databases and data warehouses, the velocity and scale of the data generated by users (and by the machines serving up content, in the form of logs) required alternative data containers. Databases and data warehouses could not cope with this variety, because the data had to be transformed to fit a fixed schema before it could be stored. This led to the rise of semi-structured and quasi-structured data, in which data is stored in containers that impose only a little structure.
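To make the distinction concrete, here is a minimal Python sketch. It is not tied to any particular database or product; the column names, the sample records, the log line and the helper functions `parse_log_line` and `to_row` are all invented for illustration. It shows the same kind of web event as a fixed-schema row, as self-describing JSON records whose fields can vary, and as a loosely patterned log line.

```python
import json
import re

# Structured: every record must fit a fixed set of columns (hypothetical schema).
COLUMNS = ("user_id", "page", "status")
structured_row = (42, "/home", 200)

# Semi-structured: self-describing records whose fields can vary between records.
semi_structured = [
    '{"user_id": 42, "page": "/home", "status": 200}',
    '{"user_id": 7, "page": "/search", "status": 200, "query": "big data"}',
]
records = [json.loads(line) for line in semi_structured]

# Quasi-structured: free text with a loose pattern, such as a web server log line.
LOG_PATTERN = re.compile(r'"GET (?P<page>\S+) HTTP/1\.1" (?P<status>\d{3})')
log_line = '10.0.0.1 - - [01/Jan/2010:00:00:01] "GET /home HTTP/1.1" 200 512'


def parse_log_line(line):
    """Return the fields recoverable from a log line, or None if it does not match."""
    match = LOG_PATTERN.search(line)
    if match is None:
        return None
    return {"page": match.group("page"), "status": int(match.group("status"))}


def to_row(record):
    """Force a record into the fixed schema: extra fields are dropped, missing ones become None."""
    return tuple(record.get(column) for column in COLUMNS)
```

In this sketch, `to_row(records[1])` silently drops the `query` field, and `to_row(parse_log_line(log_line))` has no `user_id` to fill in. That is the kind of transformation cost that made fixed-schema stores a poor fit for this data.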
The connected era
Through the 1990s and 2000s, fueled by plummeting storage costs and the widespread use of mobile and connected devices, the volume of data exploded: more data was being generated in a few minutes than had previously been generated in decades. The mobile era heralded data-intensive media such as video, images and audio, which in turn led to large-scale storage of what is now known as unstructured data. As we approached the 2010s, by commonly cited estimates, more than 80% of the data generated for storage was unstructured.
The rapid growth of data and storage led to the rise of alternative storage containers such as NoSQL databases. It also led to data-centric platforms and frameworks (called Big Data platforms) such as Hadoop, alongside parallel-computing standards such as MPI. These platforms allow continuous access to, and processing of, data at large scale to aid datafication, which here means the ability to discover previously unknown or unseen trends and relationships using data. This trend has accelerated further with the explosion in the number of connected, data-collecting devices, known as the Internet of Things (IoT).
As the popular saying goes, information is wealth. As organizations become more competitive, the quality and scope of the intelligence they can derive from data will be a key factor in their success. It is therefore critical that organizations adopt the data mantra sooner rather than later.