
Facebook Sentiment Analysis using Python

This article walks through Facebook sentiment analysis using VADER. Nowadays, many government institutions and companies need to understand their customers' feedback and comments on social media platforms such as Facebook.



What is sentiment analysis?

Sentiment analysis is a branch of machine learning and natural language processing that analyzes text to determine the opinion or attitude it expresses. Many companies now use it to gauge feedback from their customers automatically.
 

Why should we use sentiment analysis?

Reading every comment manually does not scale. Sentiment analysis lets an organization automatically estimate whether feedback about its products or services is positive, negative, or neutral.

Installations in Anaconda

Install the required packages with either conda or pip; you only need one command from each pair below.

 
conda install -c anaconda nltk
pip install nltk
conda install -c conda-forge numpy
pip install numpy
conda install -c anaconda pandas
pip install pandas
conda install -c conda-forge matplotlib
pip install matplotlib 
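Besides the packages themselves, NLTK needs its data files (tokenizer models and the VADER lexicon), which are downloaded separately. A one-time download step, assuming a working internet connection, might look like this:

```python
# NLTK's corpora and models ship separately from the package itself.
# These are the standard NLTK resource names used by the code below.
import nltk

for resource in [
    "punkt",                       # Punkt tokenizer models (sent/word tokenize)
    "vader_lexicon",               # lexicon used by SentimentIntensityAnalyzer
    "wordnet",                     # data for WordNetLemmatizer
    "averaged_perceptron_tagger",  # model for nltk.pos_tag
]:
    nltk.download(resource, quiet=True)
```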

Authentication

There are several ways to fetch Facebook comments, for example through the Facebook Graph API or by downloading a ready-made comment dataset.

Here we use the second option: a comment dataset downloaded from the Kaggle website. The code below reads kindle.txt, a file of Facebook comments about the Amazon Kindle. You can analyze your own comments instead by saving them to a plain text file, one comment per line.
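If you do not have the Kaggle dataset at hand, you can create a small kindle.txt yourself to try the code (the comment lines below are made up for illustration):

```python
# Create a tiny stand-in for the dataset: a few made-up comments,
# one per line, written in the encoding the article's code reads with.
sample_comments = [
    "i love my kindle",
    "the screen is too small for reading",
    "battery life is great",
]

with open("kindle.txt", "w", encoding="ISO-8859-2") as f:
    f.write("\n".join(sample_comments))
```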
Below is the implementation. 




import nltk
from nltk.sentiment.vader import SentimentIntensityAnalyzer
from nltk.tokenize import sent_tokenize, word_tokenize
from nltk.tokenize import PunktSentenceTokenizer
from nltk.stem.porter import PorterStemmer
from nltk.stem.wordnet import WordNetLemmatizer


# Read the comment file
with open('kindle.txt', encoding='ISO-8859-2') as f:
    text = f.read()

# Tokenize the text into sentences and words
sent_tokenizer = PunktSentenceTokenizer(text)
sents = sent_tokenizer.tokenize(text)

print(word_tokenize(text))
print(sent_tokenize(text))

# Stemming: reduce each word to its stem
porter_stemmer = PorterStemmer()
nltk_tokens = nltk.word_tokenize(text)

for w in nltk_tokens:
    print("Actual: %s Stem: %s" % (w, porter_stemmer.stem(w)))

# Lemmatization: reduce each word to its dictionary form
wordnet_lemmatizer = WordNetLemmatizer()
nltk_tokens = nltk.word_tokenize(text)

for w in nltk_tokens:
    print("Actual: %s Lemma: %s" % (w, wordnet_lemmatizer.lemmatize(w)))

# Part-of-speech tagging
text = nltk.word_tokenize(text)
print(nltk.pos_tag(text))

# Score each comment line with VADER
sid = SentimentIntensityAnalyzer()

with open('kindle.txt', encoding='ISO-8859-2') as f:
    for text in f.read().split('\n'):
        print(text)
        scores = sid.polarity_scores(text)
        for key in sorted(scores):
            print('{0}: {1}, '.format(key, scores[key]), end='')
        print()

Output: 

Here is a sample output of the code:
['i', 'love', 'my', 'kindle']
['i love my kindle']
Actual: i Stem: i
Actual: love Stem: love
Actual: my Stem: my
Actual: kindle Stem: kindl
Actual: i Lemma: i
Actual: love Lemma: love
Actual: my Lemma: my
Actual: kindle Lemma: kindle
[('i', 'NN'), ('love', 'VBP'), ('my', 'PRP$'), ('kindle', 'NN')]
i love my kindle
compound: 0.6369, neg: 0.0, neu: 0.323, pos: 0.677,
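The compound value is a single normalized score in [-1, 1]. The VADER authors recommend the conventional thresholds of compound >= 0.05 for positive and compound <= -0.05 for negative, with neutral in between. A small helper (the function name is ours) makes that mapping explicit:

```python
def classify(compound):
    """Map a VADER compound score to a sentiment label
    using the conventional +/-0.05 thresholds."""
    if compound >= 0.05:
        return "positive"
    if compound <= -0.05:
        return "negative"
    return "neutral"

print(classify(0.6369))  # the score for "i love my kindle" -> positive
```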

We follow these major steps in our program:

1. Read the comment file (kindle.txt).
2. Tokenize the text into sentences and words.
3. Stem and lemmatize the tokens.
4. Tag each token with its part of speech.
5. Score each comment with the VADER sentiment analyzer.

Now, let us try to understand the above piece of code:

First, we open kindle.txt and read its contents into memory:

with open('kindle.txt', encoding='ISO-8859-2') as f:
    text = f.read()

Next, we split the text into sentences and words:

sent_tokenizer = PunktSentenceTokenizer(text)
sents = sent_tokenizer.tokenize(text)
print(word_tokenize(text))
print(sent_tokenize(text))

Stemming reduces each word to its root form:

from nltk.stem.porter import PorterStemmer
porter_stemmer = PorterStemmer()
nltk_tokens = nltk.word_tokenize(text)
for w in nltk_tokens:
    print("Actual: %s Stem: %s" % (w, porter_stemmer.stem(w)))

Lemmatization maps each word to its dictionary form:

from nltk.stem.wordnet import WordNetLemmatizer
wordnet_lemmatizer = WordNetLemmatizer()
nltk_tokens = nltk.word_tokenize(text)
for w in nltk_tokens:
    print("Actual: %s Lemma: %s" % (w, wordnet_lemmatizer.lemmatize(w)))

Part-of-speech tagging labels each token with its grammatical role:

text = nltk.word_tokenize(text)
print(nltk.pos_tag(text))

Here is how the VADER sentiment analyzer works:

sid = SentimentIntensityAnalyzer()
with open('kindle.txt', encoding='ISO-8859-2') as f:
    for text in f.read().split('\n'):
        print(text)
        scores = sid.polarity_scores(text)
        for key in sorted(scores):
            print('{0}: {1}, '.format(key, scores[key]), end='')
        print()

For each line, VADER returns four scores: neg, neu, and pos (the proportions of negative, neutral, and positive content) and compound (a single normalized score for the whole line). For example:

i love my kindle
compound: 0.6369, neg: 0.0, neu: 0.323, pos: 0.677 
