Using social media as a central element of business growth strategy has become the norm for organizations both small and large. Social media has evolved into one of the main consumer interaction channels, with immense power to capture customer sentiment and generate meaningful business insights.
This article outlines the steps involved in identifying sentiment in social media data.
This applies to all industries, and to the B2C space in particular, where the volume of consumers can be very high. Listening to consumers on a one-to-one basis may be neither cost-effective nor efficient. It becomes even more cumbersome if the industry is subject to seasonality, since consumer responses can vary and the business may not be in a position to scale up to match the change. This can lead to undesired consequences. To address this issue, machine learning techniques like classification and prediction are great options[i].
The first step of the solution is data collection. For social media monitoring, we have limited the scope of this article to Twitter. Twitter allows data to be downloaded in real time for analysis. The parameters that can be downloaded for a tweet include username, message and location, among others. On Twitter, consumers use different techniques to voice their opinions. One of these is the hashtag, a keyword that consumers use to collectively express their opinions on a topic. Hashtags that contain words corresponding to a particular business can be tracked. Tweets containing these hashtags can then be captured in real time and stored locally in a database.
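The collection step can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a production collector: the tweets are made-up dictionaries standing in for a real-time feed (no Twitter API call is made), and the hashtag names, the `store_matching_tweets` helper and the table layout are all hypothetical.

```python
import re
import sqlite3

# Hypothetical hashtags a business might track (assumption, not from the article).
TRACKED_HASHTAGS = {"#acmesupport", "#acmedelivery"}

def extract_hashtags(message):
    """Return the set of lowercase hashtags found in a tweet's text."""
    return {tag.lower() for tag in re.findall(r"#\w+", message)}

def store_matching_tweets(conn, tweets, tracked=TRACKED_HASHTAGS):
    """Insert tweets that mention any tracked hashtag; return how many were stored."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS tweets (username TEXT, message TEXT, location TEXT)"
    )
    stored = 0
    for tweet in tweets:
        if extract_hashtags(tweet["message"]) & tracked:
            conn.execute(
                "INSERT INTO tweets VALUES (?, ?, ?)",
                (tweet["username"], tweet["message"], tweet.get("location")),
            )
            stored += 1
    conn.commit()
    return stored

# Made-up sample records standing in for a real-time feed.
sample_tweets = [
    {"username": "ana", "message": "Parcel is late again #AcmeDelivery", "location": "Leeds"},
    {"username": "raj", "message": "Lovely weather today", "location": "Pune"},
]

conn = sqlite3.connect(":memory:")  # in-memory database, for illustration only
stored_count = store_matching_tweets(conn, sample_tweets)
```

Only the first sample tweet mentions a tracked hashtag, so it is the only one written to the table. In practice the in-memory database would be replaced by a persistent store.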
Once the data has been collected, the next step is data pre-processing. This process consists of a series of steps, namely tokenization followed by stemming or lemmatization. Tokenization is performed first: each word is split from the rest of the sentence. Stemming then removes word suffixes (lemmatization goes a step further and reduces each word to its dictionary form), so that variants of the same word, such as "delayed" and "delays", are treated as a single term. In addition, regular expressions are used to remove special characters and numbers, and stop words are filtered out, as these add little to the analysis.
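A rough sketch of such a pre-processing pipeline is shown here, using only the Python standard library. The stop-word list and the suffix rules are deliberately tiny and illustrative (assumptions, not taken from the article), and the `stem` function is a naive suffix stripper rather than an established algorithm such as Porter's.

```python
import re

# Tiny illustrative stop-word list; a real project would use a fuller one.
STOP_WORDS = {"a", "an", "the", "is", "are", "was", "were", "to", "of",
              "and", "in", "my", "it", "for"}

# Naive suffix rules, checked in order; this is NOT the Porter algorithm.
SUFFIXES = ("ing", "ed", "es", "s")

def tokenize(text):
    """Lowercase the text, drop URLs, and keep only alphabetic runs as tokens.

    Keeping only letters also discards numbers and special characters.
    """
    text = re.sub(r"https?://\S+", " ", text.lower())
    return re.findall(r"[a-z]+", text)

def stem(word):
    """Strip the first matching suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def preprocess(text):
    """Tokenize, remove stop words, then stem what remains."""
    return [stem(token) for token in tokenize(text) if token not in STOP_WORDS]

tokens = preprocess("My 2 parcels were delayed AGAIN!!! #AcmeDelivery")
# tokens == ["parcel", "delay", "again", "acmedelivery"]
```

The hashtag symbol is removed along with the other special characters, but the hashtag word itself is kept as a token, since it often carries the topic of the tweet.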
Following data pre-processing, the next step is to use machine learning algorithms to identify anomalies. Leveraging the knowledge of subject matter experts (SMEs), tweets can be manually classified into appropriate categories. This creates the training data for the model. Once the classifier is trained, incoming opinions are classified accordingly, and the system can be programmed to raise an alert when a tweet falls into a category of concern[ii]. Businesses can then study these tweets, identify the location, person and other details, and take measures to address the issues.
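To make the train-then-alert idea concrete, here is a small sketch of a text classifier. The article does not name a specific algorithm, so multinomial Naive Bayes with add-one smoothing is used purely as an assumption. The labelled examples, the "complaint" and "praise" categories and the `needs_alert` helper are likewise hypothetical stand-ins for SME-labelled data and a real alerting hook.

```python
import math
from collections import Counter, defaultdict

class NaiveBayesClassifier:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # label -> word frequencies
        self.label_counts = Counter()            # label -> number of tweets
        self.vocabulary = set()

    def train(self, labelled_tweets):
        """Accumulate counts from (tokens, label) pairs."""
        for tokens, label in labelled_tweets:
            self.label_counts[label] += 1
            self.word_counts[label].update(tokens)
            self.vocabulary.update(tokens)

    def predict(self, tokens):
        """Return the label with the highest log-probability for the tokens."""
        total = sum(self.label_counts.values())
        vocab_size = len(self.vocabulary)
        best_label, best_score = None, float("-inf")
        for label, count in self.label_counts.items():
            score = math.log(count / total)  # log prior
            label_total = sum(self.word_counts[label].values())
            for token in tokens:
                score += math.log(
                    (self.word_counts[label][token] + 1) / (label_total + vocab_size)
                )
            if score > best_score:
                best_label, best_score = label, score
        return best_label

# Made-up, hand-labelled examples standing in for SME-labelled tweets.
training_data = [
    (["parcel", "late", "again"], "complaint"),
    (["refund", "never", "arrived"], "complaint"),
    (["order", "damaged", "late"], "complaint"),
    (["great", "service", "thanks"], "praise"),
    (["fast", "delivery", "love"], "praise"),
    (["thanks", "quick", "refund"], "praise"),
]

classifier = NaiveBayesClassifier()
classifier.train(training_data)

def needs_alert(tokens, alert_label="complaint"):
    """Hypothetical alert hook: flag tweets classified into the alert category."""
    return classifier.predict(tokens) == alert_label
```

With this toy training set, `needs_alert(["parcel", "late"])` returns `True` and `needs_alert(["great", "thanks"])` returns `False`. A real deployment would need far more labelled tweets, and probably an established library implementation rather than a hand-written class.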