Facebook Sentiment Analysis using Python

This article walks through sentiment analysis of Facebook comments using VADER. Many government institutions and companies need to understand the feedback and comments their customers leave on social media such as Facebook.

What is sentiment analysis?

Sentiment analysis is a branch of machine learning that analyzes text to determine the opinion or attitude it expresses. Many companies use it to gauge feedback from their customers.

Why should we use sentiment analysis?

  • Invaluable Marketing:
    Companies and product owners can use sentiment analysis to gauge the demand for their products from customers' comments and feedback.

  • Identifying key emotional triggers:
    In psychology and other medical treatment institutions, sentiment analysis can be used to detect whether an individual's emotional state is normal or abnormal, and the recorded data can inform decisions about the person's health.



  • Politics:
    In politics, candidates can use sentiment analysis to measure how well people accept them and to estimate their political standing. It can also be used to predict election results.

  • Education:
    Universities, colleges, and other higher education institutions can use sentiment analysis to understand their students' feedback and comments, and use it to revise or improve their curriculum.

Installations in Anaconda

  • NLTK: a library for processing human (natural) language.
    Installation using conda:

    conda install -c anaconda nltk

    Installation using pip:

    pip install nltk

  • NumPy: a Python package for scientific computing.
    Installation using conda:

    conda install -c conda-forge numpy

    Installation using pip:

    pip install numpy

  • Pandas: a Python module for data preprocessing and analysis.
    Installation using conda:

    conda install -c anaconda pandas

    Installation using pip:

    pip install pandas

  • Matplotlib: a Python module for data visualization and 2D plotting.
    Installation using conda:

    conda install -c conda-forge matplotlib

    Installation using pip:

    pip install matplotlib
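
After installing, it can help to confirm that Python can actually find the four packages. The following is a minimal sketch using only the standard library; the helper name installation_status is hypothetical and not part of any of these packages.

```python
from importlib.util import find_spec

# Import names of the four packages listed above.
REQUIRED = ["nltk", "numpy", "pandas", "matplotlib"]

def installation_status(packages):
    """Map each package name to True if Python can locate it, else False."""
    return {name: find_spec(name) is not None for name in packages}

status = installation_status(REQUIRED)
for name, found in status.items():
    print(f"{name}: {'installed' if found else 'missing'}")
```

Any package reported as missing can be installed with the conda or pip command listed for it above.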
    

Authentication

There are several ways to fetch Facebook comments:

  • Facebook Graph API
  • Direct download from Facebook
  • Download from a dataset provider site

Among the above methods, we downloaded a Facebook comment dataset from Kaggle, a popular dataset provider. The code uses kindle.txt, a file of Facebook comments about the Amazon Kindle. To analyze your own comments, save them in a plain text file with one comment per line and use that file in place of kindle.txt.
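
If you do not have a dataset at hand, a small text file is enough to try things out. The sketch below writes a few made-up comments to a temporary file, one per line, and reads them back. The file name sample_comments.txt and the comments themselves are hypothetical; the encoding matches the one used for kindle.txt.

```python
import os
import tempfile

# Hypothetical sample comments; replace these with your own data.
sample_comments = [
    "i love my kindle",
    "the battery died after a week",
    "it arrived on monday",
]

# Write one comment per line, using the same encoding as kindle.txt.
path = os.path.join(tempfile.gettempdir(), "sample_comments.txt")
with open(path, "w", encoding="ISO-8859-2") as f:
    f.write("\n".join(sample_comments))

# Read the file back, skipping blank lines.
with open(path, encoding="ISO-8859-2") as f:
    comments = [line.strip() for line in f if line.strip()]

print(comments)
```

Skipping blank lines avoids scoring empty strings when the file ends with a newline.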

Below is the implementation.


import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import nltk
from nltk.sentiment.vader import SentimentIntensityAnalyzer
from nltk.tokenize import sent_tokenize, word_tokenize
from nltk.tokenize import PunktSentenceTokenizer
from nltk.stem.porter import PorterStemmer
from nltk.stem.wordnet import WordNetLemmatizer

# The NLTK resources used below (e.g. punkt, wordnet, the POS tagger
# and vader_lexicon) must be fetched once with nltk.download().

# Read the whole comment file into one string.
with open('kindle.txt', encoding='ISO-8859-2') as f:
    text = f.read()

# Train a Punkt sentence tokenizer on the text and split it into sentences.
sent_tokenizer = PunktSentenceTokenizer(text)
sents = sent_tokenizer.tokenize(text)

# Tokenize into words and into sentences with the pre-trained tokenizers.
print(word_tokenize(text))
print(sent_tokenize(text))

# Stemming: reduce each token to its stem.
porter_stemmer = PorterStemmer()
nltk_tokens = nltk.word_tokenize(text)

for w in nltk_tokens:
    print("Actual: %s Stem: %s" % (w, porter_stemmer.stem(w)))

# Lemmatization: map each token to its dictionary form.
wordnet_lemmatizer = WordNetLemmatizer()

for w in nltk_tokens:
    print("Actual: %s Lemma: %s" % (w, wordnet_lemmatizer.lemmatize(w)))

# Part-of-speech tagging of the tokens.
print(nltk.pos_tag(nltk_tokens))

# Score each comment (one per line) with VADER.
sid = SentimentIntensityAnalyzer()
tokenizer = nltk.data.load('tokenizers/punkt/english.pickle')

with open('kindle.txt', encoding='ISO-8859-2') as f:
    for text in f.read().split('\n'):
        print(text)
        scores = sid.polarity_scores(text)
        for key in sorted(scores):
            print('{0}: {1}, '.format(key, scores[key]), end='')
        # End the score line before the next comment is printed.
        print()



Output:

Here is sample output of the code:
['i', 'love', 'my', 'kindle']
['i love my kindle']
Actual: i Stem: i
Actual: love Stem: love
Actual: my Stem: my
Actual: kindle Stem: kindl
Actual: i Lemma: i
Actual: love Lemma: love
Actual: my Lemma: my
Actual: kindle Lemma: kindle
[('i', 'NN'), ('love', 'VBP'), ('my', 'PRP$'), ('kindle', 'NN')]
i love my kindle
compound: 0.6369, neg: 0.0, neu: 0.323, pos: 0.677,

We follow these major steps in our program:

  • Download (fetch) the Facebook comments from Kaggle and save them in text format.
  • Preprocess the data with the NLTK library: first tokenize the text, then stem and lemmatize the tokens.
  • Parse the comments using VADER, which rates each comment as positive, negative or neutral.
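
The program prints raw scores rather than a single label per comment. One way to turn a compound score into a label is sketched below. It assumes the commonly used VADER cutoffs of +0.05 and -0.05, which are a convention and not something this article's code applies; the function name label_from_compound is hypothetical, and apart from 0.6369 (the score in the sample output) the example scores are made up.

```python
from collections import Counter

def label_from_compound(compound, threshold=0.05):
    """Label a compound score; the 0.05 cutoff is an assumed convention."""
    if compound >= threshold:
        return "positive"
    if compound <= -threshold:
        return "negative"
    return "neutral"

# 0.6369 is the sample comment's score; the other values are hypothetical.
compounds = [0.6369, -0.4215, 0.0, 0.02]
labels = [label_from_compound(c) for c in compounds]

# Count how many comments fall into each class.
summary = Counter(labels)
print(labels)
print(summary)
```

In the full program, the value passed in would be scores['compound'] from polarity_scores().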

Now, let us try to understand the above piece of code:

  • First we open the file kindle.txt, which was downloaded from Kaggle and saved on the local disk, and read its contents:

    with open('kindle.txt', encoding='ISO-8859-2') as f:
        text = f.read()

  • After reading the file, we preprocess the text by tokenizing, stemming, and then lemmatizing it:
    • Tokenize the text, i.e. split it into sentences and words.

      sent_tokenizer = PunktSentenceTokenizer(text)
      sents = sent_tokenizer.tokenize(text)
      print(word_tokenize(text))
      print(sent_tokenize(text))

    • Stem and lemmatize the tokens to normalize the text:
      1) For stemming we use the PorterStemmer class:

      from nltk.stem.porter import PorterStemmer
      porter_stemmer = PorterStemmer()
      nltk_tokens = nltk.word_tokenize(text)
      for w in nltk_tokens:
          print("Actual: %s Stem: %s" % (w, porter_stemmer.stem(w)))

      2) For lemmatizing we use the WordNetLemmatizer class:

      from nltk.stem.wordnet import WordNetLemmatizer
      wordnet_lemmatizer = WordNetLemmatizer()
      nltk_tokens = nltk.word_tokenize(text)
      for w in nltk_tokens:
          print("Actual: %s Lemma: %s" % (w, wordnet_lemmatizer.lemmatize(w)))

  • Tag each token with its part of speech (POS). The tags make it possible to select only significant tokens such as adjectives, adverbs, and verbs; the code here prints the tagged tokens.

    text = nltk.word_tokenize(text)
    print(nltk.pos_tag(text))

  • Pass each comment to a sentiment intensity analyzer, which scores the Facebook comments as positive, negative or neutral.
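
The program tags the tokens but does not filter them. A minimal sketch of such a filter is shown here, assuming the Penn Treebank tag set that appears in the sample output (tags starting with JJ for adjectives, RB for adverbs, VB for verbs). The helper name keep_significant is hypothetical, and the input is the tagged output printed by the program rather than a live call to nltk.pos_tag().

```python
# Penn Treebank tag prefixes: JJ* adjectives, RB* adverbs, VB* verbs.
SIGNIFICANT_PREFIXES = ("JJ", "RB", "VB")

def keep_significant(tagged_tokens, prefixes=SIGNIFICANT_PREFIXES):
    """Keep only (token, tag) pairs whose tag starts with one of the prefixes."""
    return [(tok, tag) for tok, tag in tagged_tokens if tag.startswith(prefixes)]

# Tagged tokens copied from the program's sample output.
tagged = [('i', 'NN'), ('love', 'VBP'), ('my', 'PRP$'), ('kindle', 'NN')]
significant = keep_significant(tagged)
print(significant)  # [('love', 'VBP')]
```

For the sample comment, only the verb "love" survives the filter, which is also the word that carries the sentiment.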

Here is how the VADER sentiment analyzer works:

  • VADER is a lexicon- and rule-based analyzer. Its sentiment lexicon is a list of lexical features (e.g., words) that are labeled according to their semantic orientation as either positive or negative.
  • The analyzer reports not only whether a sentiment is positive or negative but also how positive or negative it is.
  • We use the polarity_scores() method to obtain the polarity scores for a given sentence.
    We then print the intensity and polarity of each comment:

    sid = SentimentIntensityAnalyzer()
    tokenizer = nltk.data.load('tokenizers/punkt/english.pickle')
    with open('kindle.txt', encoding='ISO-8859-2') as f:
        for text in f.read().split('\n'):
            print(text)
            scores = sid.polarity_scores(text)
            for key in sorted(scores):
                print('{0}: {1}, '.format(key, scores[key]), end='')
            print()

    Let us look at what the scores mean, using the output of the above code:

    i love my kindle
    compound: 0.6369, neg: 0.0, neu: 0.323, pos: 0.677
  • The positive (pos), negative (neg) and neutral (neu) scores represent the proportion of the text that falls into each category, so they add up to 1. Our sentence was rated about 68% positive, 32% neutral and 0% negative.

  • The compound score is the sum of the lexicon ratings of the words in the text, normalized to lie between -1 (extremely negative) and +1 (extremely positive).

  • Finally, the sentiment scores of each comment are printed.
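
The arithmetic behind these scores can be checked by hand. The sketch below assumes the normalization usually described for VADER, x / sqrt(x² + α) with α = 15, and assumes that "love" is the only scored word in the sample comment with a lexicon rating of 3.2. Neither value is stated in this article, so treat both as assumptions; the function name normalize is used here for illustration.

```python
import math

def normalize(raw_sum, alpha=15):
    """Squash an unbounded rating sum into the open interval (-1, 1)."""
    return raw_sum / math.sqrt(raw_sum * raw_sum + alpha)

# Assumption: "love" is the only scored word, with a lexicon rating of 3.2.
compound = normalize(3.2)
print(round(compound, 4))

# Proportions copied from the sample output; they should add up to 1.
scores = {"neg": 0.0, "neu": 0.323, "pos": 0.677}
total = scores["neg"] + scores["neu"] + scores["pos"]
print(total)
```

Under these assumptions the result lands very close to the compound score of 0.6369 shown in the sample output, and the three proportions sum to 1.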


