Data is an essential asset of every organization. Today, with technological advancements, data comes from countless disparate sources. Experts now point to a 4300% increase in annual data generation by 2020, of which 80% is unstructured data in the form of audio recordings, PDFs and text. This data explosion is creating overwhelming data lakes. IDC suggests that about 90% of this data is 'dark' and unstructured. In fact, companies use a mere 12% of the available data to derive business insights; the rest is simply stored because there is no proper means to access it. This also means that data currently takes a long time to run through its lifecycle.
Gartner research suggests that by 2018, 90% of deployed data lakes will be useless because they are overwhelmed with information. Yet companies spend millions of dollars storing this data. Companies like Paxata, Trifacta and others have identified this growing need for fast data discovery. They provide self-service data preparation tools that 'swim' through huge data lakes to fetch the relevant data, and help analysts by providing clean, standardized and enriched data sets collated from various data sources. According to the New York Times, analysts spend about 80% of their time preparing data. These tools could therefore save analysts a great deal of time and effort. They are dynamic and visual, with strong user interfaces and additional capabilities such as smart data discovery, a built-in semantic library and data quality assurance.
The data preparation market is currently worth about $460 million and is expected to grow at an estimated CAGR of 16.6%. It has been predicted that between now and 2020, spending on self-service visual discovery and data preparation tools will grow 2.5x faster than spending on comparable traditional IT-controlled tools.

There has been a lot of noise about data preparation tools ending the dependency on IT. But is this really possible? Does data preparation shrink the scope of the IT services industry, or does it open new doors of opportunity? In our opinion, IT services will serve as an enabler for these data preparation tools. The tools only provide a platform to collect data from various sources and cleanse it. They cannot replace formal, traditional data governance programs or robust data integration and extraction, transformation and loading (ETL) solutions. Nor do they provide the expert advice needed to derive the right business insights. The services industry has a huge talent pool of SMEs with the domain knowledge and industry experience to help surface the right insights hidden in the data. Another issue is that the data preparation market is crowded, so small and nascent tool vendors will struggle to attract customers while competing with the market leaders. They need the help of system integrators, who have large, trusted client networks, to sell their offerings and gain market share. What are your thoughts?