Text Analytics and Twitter: What can Companies Derive from Social Media and what are the Challenges?

Sari Nahmad Data Scientist

So what can companies expect to derive from social media, in this case Twitter? To give a little background, Twitter is an online social networking service with more than 100 million users. Users can post and read “tweets,” which are short messages of up to 140 characters. Companies can home in on these messages to understand public opinion regarding their products and services. Although individual tweets can provide good case examples of public opinion, the strength of Twitter is the sheer volume of tweets that the public continuously generates. Useful analysis of Twitter data is therefore constructed at a summary level (which can of course be broken down by other variables, such as user-specific information like location or age).

The dashboard below, which explores Chicago Park District tweets, reveals some of the kinds of information that can be derived from Twitter. There are four components to the dashboard: Tweet Frequency, Topic, Location, and the Word Cloud. Tweet Frequency shows the number of tweets over time, broken down by sentiment (positive, negative, and neutral). Topic, Location, and the Word Cloud are shown at a weekly level (play around with “Select Week” to see how the discussions evolve). The data spans a 10-week period (August 16 – October 16).

This type of analysis is applicable to many businesses that hold otherwise unused text data in reviews, comments, notes, etc. Businesses can use text analytics to derive deeper insight into their data and use it to supplement typical structured data. Text analytics is used in applications such as business intelligence, predictive analytics, sentiment analysis, information retrieval, indexing, recommendations, customer service (email support), auto labeling, spam filtering, and fraud detection.

Note about using Text Analytics specifically in Twitter:

There are challenges with text analytics in the world of Twitter. These issues do not concern processing or analysis, as one might expect, but rather the choices companies make with regard to promoting (or not) their products and services. This is related to the concepts of precision and recall. We can think of precision as the fraction of the tweets we collect that are relevant to the company or topic, while recall is the fraction of all tweets relevant to the company or topic that we successfully collect.
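To make the two definitions concrete, here is a minimal sketch in Python. The function name `precision_recall` and the tweet IDs are hypothetical and chosen only for illustration; in practice, the set of truly relevant tweets is rarely known in full and would have to be estimated from a hand-labeled sample.

```python
def precision_recall(collected, relevant):
    """Return (precision, recall) for a collection of tweet IDs.

    collected: IDs of the tweets our keywords retrieved
    relevant:  IDs of all tweets truly about the company or topic
               (assumed known here; in practice, a labeled sample)
    """
    collected, relevant = set(collected), set(relevant)
    hits = collected & relevant  # relevant tweets that we actually collected
    precision = len(hits) / len(collected) if collected else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical data: tweets 1 through 10 are truly about the company.
relevant_ids = range(1, 11)

# A broad keyword collects 20 tweets (IDs 3 through 22), 8 of them relevant.
broad = precision_recall(range(3, 23), relevant_ids)

# A specific keyword collects only 5 tweets, all of them relevant.
narrow = precision_recall([1, 2, 3, 4, 5], relevant_ids)

print("broad keyword:    precision=%.2f recall=%.2f" % broad)
print("specific keyword: precision=%.2f recall=%.2f" % narrow)
```

In this made-up example the broad keyword finds more of the relevant tweets but buries them in unrelated ones, while the specific keyword misses some relevant tweets but returns nothing irrelevant.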

If companies desire an accurate representation of public opinion on Twitter, they should aim for high precision and high recall. While recall is important for gaining a large sample of tweets, I argue that precision is the higher priority in order to extract a clear and accurate representation of public opinion. This requires companies first to brainstorm specific keywords and hashtags to market to consumers, and second to engage consumers and ask for feedback. For example, if a food production company wants to track a specific milk product, it should develop a unique keyword such as “companyXmilk” and then promote it (in ads or on the milk bottle itself). In other words, companies should not expect general hashtags or keywords such as “milk” to accurately represent their own customers’ opinions.

Knowing which additional keywords to include or exclude when building your tweet repository is an iterative process and may require subject-matter expertise. As times change and new products are introduced, public opinions change and new conversations are born. Therefore, companies should continue to monitor the Twitter-verse and spot developing trends as they emerge in order to track and analyze them.

Details on the Technical Process:

The process is as follows: first, use the Twitter API to collect (or “listen” to) tweets with relevant keywords. After collecting the tweets, parse the Twitter JSON for the desired variables such as text, user ID, hashtags, location, favorite counts, etc. Then perform sentiment analysis, split the data into positive and negative tweets, and lastly use topic analysis (via LDA) to derive the final topics.
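The parsing and splitting steps can be sketched roughly as follows, using only the Python standard library. The field names (`text`, `user.id`, `user.location`, `entities.hashtags`, `favorite_count`) follow the classic tweet JSON layout, and the sample tweets are invented. The word-list scorer is only a toy stand-in for a real sentiment model, not the method used for the dashboard, and the collection and LDA steps are not shown.

```python
import json

# Toy word lists standing in for a real sentiment model (an assumption
# made for illustration only).
POSITIVE = {"love", "great", "beautiful", "fun"}
NEGATIVE = {"dirty", "closed", "broken", "hate"}


def extract_fields(raw_json):
    """Pull the variables of interest out of one tweet's JSON string."""
    tweet = json.loads(raw_json)
    user = tweet.get("user", {})
    entities = tweet.get("entities", {})
    return {
        "text": tweet.get("text", ""),
        "user_id": user.get("id"),
        "location": user.get("location"),
        "hashtags": [h["text"] for h in entities.get("hashtags", [])],
        "favorite_count": tweet.get("favorite_count", 0),
    }


def sentiment(text):
    """Label text by counting matches against the toy word lists."""
    words = {w.strip(".,!?#@").lower() for w in text.split()}
    score = len(words & POSITIVE) - len(words & NEGATIVE)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


# Invented sample tweets, serialized the way a collector might store them.
raw_tweets = [
    json.dumps({
        "text": "Love the new playground at the park! #ChicagoParks",
        "user": {"id": 101, "location": "Chicago, IL"},
        "entities": {"hashtags": [{"text": "ChicagoParks"}]},
        "favorite_count": 3,
    }),
    json.dumps({
        "text": "The fieldhouse was closed again and the pool is dirty.",
        "user": {"id": 102, "location": None},
        "entities": {"hashtags": []},
        "favorite_count": 0,
    }),
]

records = [extract_fields(raw) for raw in raw_tweets]

# Split the records by sentiment; topic analysis would then run per group.
by_sentiment = {"positive": [], "negative": [], "neutral": []}
for record in records:
    by_sentiment[sentiment(record["text"])].append(record)

for label, group in by_sentiment.items():
    print(label, [r["text"] for r in group])
```

Keeping each tweet as a flat dictionary makes it easy to hand the positive and negative groups to whichever topic-modeling library is used for the LDA step.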