Twitter Sentiment Analysis using Python

This article shows how to analyze the sentiment around any topic by parsing tweets fetched from Twitter using Python.

What is sentiment analysis? 
Sentiment analysis is the process of computationally determining whether a piece of writing is positive, negative or neutral. It is also known as opinion mining, because it derives the opinion or attitude of a speaker.
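To make the idea concrete, here is a toy sketch that labels text by counting words from two small word lists. The word lists and the function name toy_sentiment are invented for this illustration; this is not how TextBlob scores text, only a simplified picture of what "computationally determining" sentiment can mean.

```python
# Toy word lists, invented for this illustration only.
POSITIVE_WORDS = {"good", "great", "love", "amazing", "happy"}
NEGATIVE_WORDS = {"bad", "hate", "awful", "terrible", "sad"}


def toy_sentiment(text):
    """Label text by counting hypothetical positive and negative words."""
    words = text.lower().split()
    positive_hits = sum(word in POSITIVE_WORDS for word in words)
    negative_hits = sum(word in NEGATIVE_WORDS for word in words)
    score = positive_hits - negative_hits
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


print(toy_sentiment("I love this amazing phone"))   # positive
print(toy_sentiment("What a terrible awful day"))   # negative
print(toy_sentiment("The meeting is at noon"))      # neutral
```

A real analyzer handles negation, intensity and context, which is why the program in this article delegates scoring to a library.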



Why sentiment analysis?

Installation:



pip install tweepy
pip install textblob
python -m textblob.download_corpora

Authentication: To fetch tweets through the Twitter API, you need to register an app through your Twitter account. Follow these steps:
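Once the app is registered, the four credentials (consumer key, consumer secret, access token and access token secret) should be kept out of the source file. A minimal sketch, assuming the credentials are stored in environment variables; the variable names and the helper load_credentials are hypothetical choices for this example, not something Twitter or tweepy requires.

```python
import os

# Hypothetical environment variable names; any naming scheme works.
CREDENTIAL_VARS = (
    "TWITTER_CONSUMER_KEY",
    "TWITTER_CONSUMER_SECRET",
    "TWITTER_ACCESS_TOKEN",
    "TWITTER_ACCESS_TOKEN_SECRET",
)


def load_credentials(environ=os.environ):
    """Return the four credentials as a dict, or raise if any is missing."""
    missing = [name for name in CREDENTIAL_VARS if not environ.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {name: environ[name] for name in CREDENTIAL_VARS}


# Demonstration with placeholder values instead of the real environment.
demo_env = {name: "placeholder" for name in CREDENTIAL_VARS}
credentials = load_credentials(demo_env)
print(sorted(credentials))
```

The returned values can then be passed to the authentication handler in place of hard-coded strings.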

Implementation: 




import re
import tweepy
from tweepy import OAuthHandler
from textblob import TextBlob
 
class TwitterClient(object):
    '''
    Generic Twitter Class for sentiment analysis.
    '''
    def __init__(self):
        '''
        Class constructor or initialization method.
        '''
        # keys and tokens from the Twitter Dev Console
        consumer_key = 'XXXXXXXXXXXXXXXXXXXXXXXX'
        consumer_secret = 'XXXXXXXXXXXXXXXXXXXXXXXXXXXX'
        access_token = 'XXXXXXXXXXXXXXXXXXXXXXXXXXXX'
        access_token_secret = 'XXXXXXXXXXXXXXXXXXXXXXXXX'
 
        # attempt authentication
        try:
            # create OAuthHandler object
            self.auth = OAuthHandler(consumer_key, consumer_secret)
            # set access token and secret
            self.auth.set_access_token(access_token, access_token_secret)
            # create tweepy API object to fetch tweets
            self.api = tweepy.API(self.auth)
        except Exception:
            # catch ordinary errors only, not KeyboardInterrupt or SystemExit
            print("Error: Authentication Failed")
 
    def clean_tweet(self, tweet):
        '''
        Utility function to clean tweet text by removing links, special characters
        using simple regex statements.
        '''
        # the pattern must stay in one string literal; a raw string keeps the backslashes intact
        return ' '.join(re.sub(r"(@[A-Za-z0-9]+)|([^0-9A-Za-z \t])|(\w+:\/\/\S+)",
                               " ", tweet).split())
 
    def get_tweet_sentiment(self, tweet):
        '''
        Utility function to classify sentiment of passed tweet
        using textblob's sentiment method
        '''
        # create TextBlob object of passed tweet text
        analysis = TextBlob(self.clean_tweet(tweet))
        # set sentiment
        if analysis.sentiment.polarity > 0:
            return 'positive'
        elif analysis.sentiment.polarity == 0:
            return 'neutral'
        else:
            return 'negative'
 
    def get_tweets(self, query, count = 10):
        '''
        Main function to fetch tweets and parse them.
        '''
        # empty list to store parsed tweets
        tweets = []
 
        try:
            # call twitter api to fetch tweets
            # (Tweepy 4.0 and later rename this method to search_tweets)
            fetched_tweets = self.api.search(q = query, count = count)
 
            # parsing tweets one by one
            for tweet in fetched_tweets:
                # empty dictionary to store required params of a tweet
                parsed_tweet = {}
 
                # saving text of tweet
                parsed_tweet['text'] = tweet.text
                # saving sentiment of tweet
                parsed_tweet['sentiment'] = self.get_tweet_sentiment(tweet.text)
 
                # appending parsed tweet to tweets list
                if tweet.retweet_count > 0:
                    # if tweet has retweets, ensure that it is appended only once
                    if parsed_tweet not in tweets:
                        tweets.append(parsed_tweet)
                else:
                    tweets.append(parsed_tweet)
 
            # return parsed tweets
            return tweets
 
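        except tweepy.TweepError as e:
            # print error (if any); Tweepy 4.0 and later name this
            # exception tweepy.TweepyException
            print("Error : " + str(e))
            # return what was collected so the caller always gets a list
            return tweets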
 
def main():
    # creating object of TwitterClient Class
    api = TwitterClient()
    # calling function to get tweets
    tweets = api.get_tweets(query = 'Donald Trump', count = 200)
 
    # picking positive tweets from tweets
    ptweets = [tweet for tweet in tweets if tweet['sentiment'] == 'positive']
    # percentage of positive tweets
    print("Positive tweets percentage: {} %".format(100*len(ptweets)/len(tweets)))
    # picking negative tweets from tweets
    ntweets = [tweet for tweet in tweets if tweet['sentiment'] == 'negative']
    # percentage of negative tweets
    print("Negative tweets percentage: {} %".format(100*len(ntweets)/len(tweets)))
    # percentage of neutral tweets
    print("Neutral tweets percentage: {} %".format(
        100*(len(tweets) - (len(ntweets) + len(ptweets)))/len(tweets)))
 
    # printing first 10 positive tweets
    print("\n\nPositive tweets:")
    for tweet in ptweets[:10]:
        print(tweet['text'])
 
    # printing first 10 negative tweets
    print("\n\nNegative tweets:")
    for tweet in ntweets[:10]:
        print(tweet['text'])
 
if __name__ == "__main__":
    # calling main function
    main()

Here is what a sample output looks like when the above program is run:

Positive tweets percentage: 22 %
Negative tweets percentage: 15 %


Positive tweets:
RT @JohnGGalt: Amazing—after years of attacking Donald Trump the media managed
to turn #InaugurationDay into all about themselves.
#MakeAme…
RT @vooda1: CNN Declines to Air White House Press Conference Live YES! 
THANK YOU @CNN FOR NOT LEGITIMI…
RT @Muheeb_Shawwa: Donald J. Trump's speech sounded eerily familiar...
POTUS plans new deal for UK as Theresa May to be first foreign leader to meet new 
president since inauguration 
.@realdonaldtrump #Syria #Mexico #Russia & now #Afghanistan. 
Another #DearDonaldTrump Letter worth a read @AJEnglish 


Negative tweets:
RT @Slate: Donald Trump’s administration: “Government by the worst men.” 
RT @RVAwonk: Trump, Sean Spicer, etc. all lie for a reason. 
Their lies are not just lies. Their lies are authoritarian propaganda.  
RT @KomptonMusic: Me: I hate corn 
Donald Trump: I hate corn too
Me: https://t.co/GPgy8R8HB5
It's ridiculous that people are more annoyed at this than Donald Trump's sexism.
RT @tony_broach: Chris Wallace on Fox news right now talking crap 
about Donald Trump news conference it seems he can't face the truth either…
RT @fravel: With False Claims, Donald Trump Attacks Media on Crowd Turnout 
Aziz Ansari Just Hit Donald Trump Hard In An Epic Saturday Night Live Monologue
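The percentage lines printed by main() can also be computed in one pass. A minimal sketch, assuming each parsed tweet is a dictionary with a 'sentiment' key as built in get_tweets(); the helper name sentiment_percentages and the sample data are hypothetical and not part of the original program. It returns zeros for an empty list, which avoids a division by zero when no tweets come back.

```python
from collections import Counter

LABELS = ("positive", "negative", "neutral")


def sentiment_percentages(tweets):
    """Return the percentage of each sentiment label in a list of parsed tweets."""
    if not tweets:
        # No tweets fetched: report zeros rather than dividing by zero.
        return {label: 0.0 for label in LABELS}
    counts = Counter(tweet["sentiment"] for tweet in tweets)
    return {label: 100 * counts[label] / len(tweets) for label in LABELS}


# Made-up parsed tweets, shaped like the dictionaries get_tweets() builds.
sample = [
    {"text": "great day", "sentiment": "positive"},
    {"text": "awful day", "sentiment": "negative"},
    {"text": "a day", "sentiment": "neutral"},
    {"text": "lovely day", "sentiment": "positive"},
]
print(sentiment_percentages(sample))
# {'positive': 50.0, 'negative': 25.0, 'neutral': 25.0}
```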

We follow these 3 major steps in our program:

Now, let us try to understand the above piece of code:

fetched_tweets = self.api.search(q = query, count = count)
analysis = TextBlob(self.clean_tweet(tweet))
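Before TextBlob sees a tweet, the text is cleaned, and afterwards the polarity is mapped to a label. The sketch below reproduces those two steps as standalone functions using only the standard library. The function polarity_label is a hypothetical helper that mirrors the thresholds in get_tweet_sentiment(), and the polarity numbers passed to it are made up rather than produced by TextBlob.

```python
import re

# Same pattern as clean_tweet(): @mentions, any character that is not a
# letter, digit, space or tab, and links of the form scheme://...
CLEAN_PATTERN = re.compile(r"(@[A-Za-z0-9]+)|([^0-9A-Za-z \t])|(\w+:\/\/\S+)")


def clean_tweet(tweet):
    """Replace mentions, links and special characters with spaces, then squeeze whitespace."""
    return " ".join(CLEAN_PATTERN.sub(" ", tweet).split())


def polarity_label(polarity):
    """Map a polarity score to the labels used in this article."""
    if polarity > 0:
        return "positive"
    if polarity == 0:
        return "neutral"
    return "negative"


raw = "RT @KomptonMusic: Me: I hate corn https://t.co/GPgy8R8HB5"
print(clean_tweet(raw))        # RT Me I hate corn
print(polarity_label(0.5))     # positive
print(polarity_label(0.0))     # neutral
print(polarity_label(-0.8))    # negative
```

Note that the cleaning step also strips apostrophes and hashtag symbols, so "Trump's" becomes "Trump s" and "#Syria" becomes "Syria".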

Full Code Explanation:

  1. The code begins by importing re for regular expressions, tweepy for access to the Twitter API, and TextBlob for sentiment scoring.
  2. The TwitterClient class wraps authentication, tweet cleaning, sentiment classification and tweet fetching.
  3. The constructor holds the consumer key, consumer secret, access token and access token secret obtained from the Twitter Dev Console.
  4. It passes the consumer key and secret to OAuthHandler, sets the access token and secret with set_access_token(), and creates a tweepy.API object that is used for all later requests.
  5. If any of these steps raises an exception, the message "Error: Authentication Failed" is printed.
  6. clean_tweet() uses re.sub() to replace @mentions, links and every character that is not a letter, digit, space or tab with a space, then collapses the remaining whitespace with split() and join().
  7. get_tweet_sentiment() builds a TextBlob object from the cleaned text and reads its sentiment.polarity value.
  8. A polarity greater than 0 is labelled 'positive', a polarity equal to 0 is labelled 'neutral', and anything else is labelled 'negative'.
  9. get_tweets() calls self.api.search() with the query and count to fetch matching tweets.
  10. For each fetched tweet, it builds a dictionary holding the tweet text and its sentiment label.
  11. If the tweet has retweets, the dictionary is appended only when an identical one is not already in the list, so the same retweeted text is not counted twice; otherwise it is appended directly.
  12. A tweepy.TweepError raised during the search is caught and printed.
  13. main() creates a TwitterClient and fetches tweets for the query 'Donald Trump' with count set to 200.
  14. It then separates the positive and negative tweets into two lists, prints the percentage of each, and derives the neutral percentage from the remainder.
  15. Finally, it prints the text of up to ten positive and ten negative tweets.
