Sentiment Classification Using BERT
  • Last Updated : 02 Sep, 2020

BERT (Bidirectional Encoder Representations from Transformers) was proposed by researchers at Google AI Language in 2018. Although its main aim was to improve the understanding of the meaning of queries related to Google Search, BERT has become one of the most important and complete architectures for a wide range of natural language tasks, producing state-of-the-art results on sentence-pair classification, question answering, and more. For more details on the architecture, see the previous article on BERT.


One of the most important features of BERT is its adaptability: it can perform different NLP tasks with state-of-the-art accuracy after fine-tuning (similar to the transfer learning used in computer vision). For this purpose, the paper also proposed task-specific architectures. In this post, we use BERT for a single-sentence classification task, specifically the architecture used for the CoLA (Corpus of Linguistic Acceptability) binary classification task. In the previous post about BERT, we discussed the BERT architecture in detail, but let's recap some of its important details:

[Figure: BERT single sentence classification task]
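In run_classifier.py, the single-sentence classifier is a thin layer on top of BERT. It takes the pooled output, which is derived from the final hidden state of the [CLS] token. It then applies one dense layer to get one logit per label and converts the logits to probabilities with a softmax. The pure-Python sketch below shows only that last step. It uses a hypothetical 4-dimensional vector and made-up weights in place of BERT BASE's 768 dimensions, and it omits dropout and training.

```python
import math
from typing import List

def classification_head(cls_vector: List[float],
                        weights: List[List[float]],
                        bias: List[float]) -> List[float]:
    """Toy classification head: logits = W @ h_cls + b, followed by softmax.

    weights has one row per label (here 2 labels: negative, positive).
    """
    logits = [sum(w * h for w, h in zip(row, cls_vector)) + b
              for row, b in zip(weights, bias)]
    # Subtract the largest logit before exponentiating, for numerical stability
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical 4-dim [CLS] representation and weights (BERT BASE uses 768 dims)
h_cls = [0.2, -1.0, 0.5, 0.3]
W = [[0.1, 0.4, -0.2, 0.0],    # row for label 0 (negative)
     [-0.3, -0.5, 0.6, 0.2]]   # row for label 1 (positive)
b = [0.0, 0.1]

probs = classification_head(h_cls, W, b)
predicted_label = probs.index(max(probs))
print(probs, predicted_label)
```

Here the logits work out to -0.48 for label 0 and 0.9 for label 1, so the sketch predicts label 1 (positive).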

BERT was proposed in two versions:

  • BERT (BASE): 12 encoder layers, 12 bidirectional self-attention heads, and 768 hidden units.
  • BERT (LARGE): 24 encoder layers, 16 bidirectional self-attention heads, and 1024 hidden units.
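The two sizes share the same per-head width because the hidden size is split evenly across the attention heads. BERT LARGE uses 16 heads, so each head is still 64-dimensional. The feed-forward layer inside each encoder block is four times the hidden size: 3072 for BASE and 4096 for LARGE, matching the intermediate_size in the released bert_config.json files. Below is a small sketch of that arithmetic. BertSize is a hypothetical name and is not part of the BERT code.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class BertSize:
    name: str
    layers: int   # number of encoder blocks (L)
    hidden: int   # hidden size (H)
    heads: int    # number of self-attention heads (A)

    @property
    def head_dim(self) -> int:
        # Each head works on an H / A dimensional slice of the hidden state
        return self.hidden // self.heads

    @property
    def ffn_dim(self) -> int:
        # Feed-forward inner size is 4 * H in both published configurations
        return 4 * self.hidden

BASE = BertSize("BERT BASE", layers=12, hidden=768, heads=12)
LARGE = BertSize("BERT LARGE", layers=24, hidden=1024, heads=16)

for cfg in (BASE, LARGE):
    print(cfg.name, cfg.head_dim, cfg.ffn_dim)
# BERT BASE 64 3072
# BERT LARGE 64 4096
```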

For the TensorFlow implementation, Google provides two variants of both BERT BASE and BERT LARGE: Uncased and Cased. In the uncased variant, text is lowercased (and accent markers are stripped) before WordPiece tokenization.
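To make "lowercased before WordPiece tokenization" concrete, here is a simplified sketch of greedy longest-match-first WordPiece splitting, the scheme BERT's tokenizer uses for each word. The vocabulary is a made-up toy; the real vocab.txt has about 30k entries. BERT's punctuation splitting and accent stripping are left out. The functions wordpiece and uncased_tokenize are hypothetical names.

```python
from typing import List, Set

def wordpiece(word: str, vocab: Set[str], unk: str = "[UNK]") -> List[str]:
    """Greedy longest-match-first WordPiece split of a single word."""
    tokens = []
    start = 0
    while start < len(word):
        end = len(word)
        piece = None
        # Try the longest remaining substring first, shrinking until a vocab hit
        while start < end:
            candidate = word[start:end]
            if start > 0:
                candidate = "##" + candidate  # continuation pieces carry a ## prefix
            if candidate in vocab:
                piece = candidate
                break
            end -= 1
        if piece is None:
            return [unk]  # the word cannot be built from the vocabulary
        tokens.append(piece)
        start = end
    return tokens

def uncased_tokenize(text: str, vocab: Set[str]) -> List[str]:
    # Uncased models lowercase first; here words are split on whitespace only
    out = []
    for word in text.lower().split():
        out.extend(wordpiece(word, vocab))
    return out

toy_vocab = {"the", "movie", "was", "un", "##believ", "##able"}
print(uncased_tokenize("The movie was Unbelievable", toy_vocab))
# ['the', 'movie', 'was', 'un', '##believ', '##able']
```

Because the text is lowercased first, "The" and "Unbelievable" match lowercase vocabulary entries. A cased model would need separate entries for the capitalized forms.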


  • First, clone the BERT GitHub repository. It contains the run_classifier.py script used later, which makes the setup easier.


!git clone /google-research/bert.git
Cloning into 'bert'...
remote: Enumerating objects: 340, done.
remote: Total 340 (delta 0), reused 0 (delta 0), pack-reused 340
Receiving objects: 100% (340/340), 317.20 KiB | 584.00 KiB/s, done.
Resolving deltas: 100% (185/185), done.
  • Next, download the BERT BASE (uncased) model with the following commands and unzip it into the working directory (or another location of your choice).


# Download the pretrained BERT BASE (uncased) model, then unzip it
!wget /bert_models/2018_10_18/
!unzip
   creating: uncased_L-12_H-768_A-12/
  inflating: uncased_L-12_H-768_A-12/bert_model.ckpt.meta  
  inflating: uncased_L-12_H-768_A-12/  
  inflating: uncased_L-12_H-768_A-12/vocab.txt  
  inflating: uncased_L-12_H-768_A-12/bert_model.ckpt.index  
  inflating: uncased_L-12_H-768_A-12/bert_config.json  
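The unzipped folder name appears to encode the model size: L is the number of layers, H the hidden size, and A the number of attention heads. Here those values are 12, 768, and 12, which matches BERT BASE. The hypothetical helper below reads those numbers back with a regular expression. This is a quick sanity check that you downloaded the variant you intended.

```python
import re
from typing import Dict

def parse_bert_dir_name(name: str) -> Dict[str, object]:
    """Read casing, L (layers), H (hidden size) and A (heads) from a folder name."""
    m = re.fullmatch(r"(uncased|cased)_L-(\d+)_H-(\d+)_A-(\d+)", name)
    if m is None:
        raise ValueError(f"unexpected folder name: {name!r}")
    casing, layers, hidden, heads = m.groups()
    return {"uncased": casing == "uncased",
            "layers": int(layers), "hidden": int(hidden), "heads": int(heads)}

print(parse_bert_dir_name("uncased_L-12_H-768_A-12"))
# {'uncased': True, 'layers': 12, 'hidden': 768, 'heads': 12}
```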
  • We will use TensorFlow 1.x, because the scripts in the original BERT repository are written for it. Google Colab provides a magic command, %tensorflow_version, to switch between versions.


%tensorflow_version 1.x
TensorFlow 1.x selected.
  • Now, import the modules needed for this project: NumPy, pandas, scikit-learn, and Keras from TensorFlow's built-in modules. These come preinstalled in Colab; if you use your own environment, make sure they are installed.


import os
import re
import numpy as np
import pandas as pd
from sklearn.preprocessing import LabelEncoder
from sklearn.model_selection import train_test_split
import tensorflow as tf
from tensorflow import keras
import csv
from sklearn import metrics
  • Now load the IMDB sentiment dataset and do some preprocessing before training. The loading code below follows a TensorFlow Hub text-classification tutorial.


# Load all reviews in a directory into a DataFrame: the review text goes into
# "sentence" and the star rating parsed from the file name goes into "sentiment"
def load_directory_data(directory):
  data = {}
  data["sentence"] = []
  data["sentiment"] = []
  for file_path in os.listdir(directory):
    with tf.gfile.GFile(os.path.join(directory, file_path), "r") as f:
      data["sentence"].append(f.read())
      # File names have the form <id>_<rating>.txt
      data["sentiment"].append(re.match(r"\d+_(\d+)\.txt", file_path).group(1))
  return pd.DataFrame.from_dict(data)
# Merge positive and negative examples, add a polarity column and shuffle.
def load_dataset(directory):
  pos_df = load_directory_data(os.path.join(directory, "pos"))
  neg_df = load_directory_data(os.path.join(directory, "neg"))
  pos_df["polarity"] = 1
  neg_df["polarity"] = 0
  return pd.concat([pos_df, neg_df]).sample(frac = 1).reset_index(drop = True)
# Download and process the dataset files.
def download_and_load_datasets(force_download = False):
  dataset = tf.keras.utils.get_file(
      fname = "aclImdb.tar.gz",
      origin = "/data/sentiment/aclImdb_v1.tar.gz",
      extract = True)
  train_df = load_dataset(os.path.join(os.path.dirname(dataset), 
                                       "aclImdb", "train"))
  test_df = load_dataset(os.path.join(os.path.dirname(dataset), 
                                      "aclImdb", "test"))
  return train_df, test_df
train, test = download_and_load_datasets()
train.shape, test.shape
Downloading data from
84131840/84125825 [==============================] - 8s 0us/step
((25000, 3), (25000, 3))
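The sentiment column holds the star rating parsed from each file name, because IMDB review files are named <id>_<rating>.txt. The polarity column comes from whether the file sits in pos/ or neg/. Below is a stdlib-only sketch of that loading logic. It uses plain open() instead of tf.gfile and a throwaway directory with two made-up reviews. load_directory is a hypothetical helper, not the tutorial's function.

```python
import os
import re
import tempfile
from typing import List, Tuple

FILENAME_RE = re.compile(r"\d+_(\d+)\.txt")

def load_directory(directory: str, polarity: int) -> List[Tuple[str, int, int]]:
    """Return (review text, star rating, polarity) for each review file in directory."""
    rows = []
    for file_name in sorted(os.listdir(directory)):
        match = FILENAME_RE.fullmatch(file_name)
        if match is None:
            continue  # skip anything that is not an <id>_<rating>.txt review file
        with open(os.path.join(directory, file_name), encoding="utf-8") as f:
            rows.append((f.read(), int(match.group(1)), polarity))
    return rows

# Build a tiny fake split folder with the same pos/ and neg/ layout as aclImdb
with tempfile.TemporaryDirectory() as root:
    samples = {"pos": {"0_9.txt": "Great film."}, "neg": {"1_2.txt": "Dull plot."}}
    for sub, files in samples.items():
        os.makedirs(os.path.join(root, sub))
        for name, text in files.items():
            with open(os.path.join(root, sub, name), "w", encoding="utf-8") as f:
                f.write(text)
    data = (load_directory(os.path.join(root, "pos"), 1)
            + load_directory(os.path.join(root, "neg"), 0))

print(data)
# [('Great film.', 9, 1), ('Dull plot.', 2, 0)]
```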
  • This dataset contains 50k reviews, 25k each for training and testing. We sample 5k reviews from each of the train and test sets. Both the train and test DataFrames contain the three columns listed below.


# sample 5k datapoints for both train and test
train = train.sample(5000)
test = test.sample(5000)
# List columns of train and test data
train.columns, test.columns
(Index(['sentence', 'sentiment', 'polarity'], dtype='object'),
 Index(['sentence', 'sentiment', 'polarity'], dtype='object'))
  • Next, convert the data into the format that BERT's run_classifier.py expects for training and prediction, using pandas DataFrames. The required columns are:
    • guid: an ID for the row. Required for both training and test data.
    • label: 0 for negative or 1 for positive sentiment. Required for the training (and dev) data only.
    • alpha: a dummy column that the CoLA data format expects but that is not used for text classification; every row holds the same letter ('a').
    • text: the review text to be classified. Required for both training and test data.
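Concretely, each training or dev line has four tab-separated fields and no header. The test file has a header row followed by two fields per line. The sketch below writes both layouts with Python's csv module instead of pandas so that the raw lines are visible. to_train_tsv and to_test_tsv are hypothetical helpers, and the reviews are made up.

```python
import csv
import io
from typing import List, Tuple

def to_train_tsv(reviews: List[Tuple[str, int]]) -> str:
    """Four columns, no header: guid, label, alpha (dummy), text."""
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    for guid, (text, label) in enumerate(reviews):
        # Flatten newlines so each review stays on a single TSV row
        writer.writerow([guid, label, "a", text.replace("\n", " ")])
    return buf.getvalue()

def to_test_tsv(texts: List[str]) -> str:
    """Two columns with a header row: guid, text."""
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    writer.writerow(["guid", "text"])
    for guid, text in enumerate(texts):
        writer.writerow([guid, text.replace("\n", " ")])
    return buf.getvalue()

train_tsv = to_train_tsv([("Loved it", 1), ("Boring", 0)])
test_tsv = to_test_tsv(["Not bad at all"])
print(train_tsv, end="")
print(test_tsv, end="")
```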


# Convert training data into BERT format
train_bert = pd.DataFrame({
  'guid': range(len(train)),
  'label': train['polarity'],
  'alpha': ['a']*train.shape[0],
  'text': train['sentence'].replace(r'\n', ' ', regex = True)
})
# Convert test data into BERT format
bert_test = pd.DataFrame({
  'guid': range(len(test)),
  'text': test['sentence'].replace(r'\n', ' ', regex = True)
})
# Preview the first rows of each
print(train_bert.head())
print(bert_test.head())
guid    label    alpha    text
14930    0    1    a    William Hurt may not be an American matinee id...
1445    1    1    a    Rock solid giallo from a master filmmaker of t...
16943    2    1    a    This movie surprised me. Some things were "cli...
6391    3    1    a    This film may seem dated today, but remember t...
4526    4    0    a    The Twilight Zone has achieved a certain mytho...
guid    text
20010    0    One of Alfred Hitchcock's three greatest films...
16132    1    Hitchcock once gave an interview where he said...
24947    2    I had nothing to do before going out one night...
5471    3    tell you what that was excellent. Dylan Moran ...
21075    4    I watched this show until my puberty but still...
  • Now, split the training data into train and dev sets, and save them together with the test set as TSV files in a folder (here IMDB_dataset), because run_classifier.py reads its datasets in TSV format.


# Split the data into train and validation (dev) sets
bert_train, bert_val = train_test_split(train_bert, test_size = 0.1)
# Create the data folder, then save the train, dev and test files into it
os.makedirs('bert/IMDB_dataset', exist_ok = True)
# Train and dev files have no header; the test file needs one
bert_train.to_csv('bert/IMDB_dataset/train.tsv', sep = '\t', index = False, header = False)
bert_val.to_csv('bert/IMDB_dataset/dev.tsv', sep = '\t', index = False, header = False)
bert_test.to_csv('bert/IMDB_dataset/test.tsv', sep = '\t', index = False, header = True)
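On the reading side, the CoLA processor in run_classifier.py takes the label from column 1 and the text from column 3 of each train/dev row. For the test file, it skips the first line as a header, takes the text from column 1, and assigns the placeholder label "0". Its TSV reader also disables quote handling (it passes quotechar=None). The sketch below is a simplified re-implementation of that column logic for illustration only, not the library's code. read_cola_style is a hypothetical name.

```python
import csv
import io
from typing import List, Tuple

def read_cola_style(tsv_text: str, is_test: bool) -> List[Tuple[str, str]]:
    """Return (text, label) pairs, picking columns the way the CoLA processor does."""
    # Quote handling is disabled, so quote characters stay in the text as-is
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t", quoting=csv.QUOTE_NONE)
    examples = []
    for i, row in enumerate(reader):
        if is_test:
            if i == 0:
                continue  # only the test file has a header row
            examples.append((row[1], "0"))  # text in column 1; label is a placeholder
        else:
            examples.append((row[3], row[1]))  # text in column 3, label in column 1
    return examples

train_rows = read_cola_style("0\t1\ta\tLoved it\n1\t0\ta\tBoring\n", is_test=False)
test_rows = read_cola_style("guid\ttext\n0\tNot bad at all\n", is_test=True)
print(train_rows)  # [('Loved it', '1'), ('Boring', '0')]
print(test_rows)   # [('Not bad at all', '0')]
```

One likely side effect follows from this. By default, pandas' to_csv wraps any review that contains a double quote in quotes and doubles the inner quotes. Because quote handling is off on the reading side, those extra characters would reach BERT as part of the text. That is minor noise rather than an error.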
  • In this step, we train the model with the following command. To execute shell commands in Colab, prefix them with !. The run_classifier.py script fine-tunes the model using the given arguments. Due to time and resource constraints, we train for only 3 epochs.


# Most of the arguments here are self-explanatory, but some need explaining:
# task_name: this is binary classification, so we reuse the CoLA task ("cola") discussed above
# vocab_file: a vocab file (vocab.txt) that maps WordPiece tokens to IDs
# init_checkpoint: the TensorFlow checkpoint to start from; here, the downloaded BERT BASE model
# max_seq_length: maximum number of WordPiece tokens per review (including [CLS] and [SEP]); longer reviews are truncated
# bert_config_file: contains the model's hyperparameter settings
!python bert/run_classifier.py \
  --task_name=cola \
  --do_train=true \
  --do_eval=true \
  --data_dir=/content/bert/IMDB_dataset \
  --vocab_file=/content/uncased_L-12_H-768_A-12/vocab.txt \
  --bert_config_file=/content/uncased_L-12_H-768_A-12/bert_config.json \
  --init_checkpoint=/content/uncased_L-12_H-768_A-12/bert_model.ckpt \
  --max_seq_length=64 \
  --train_batch_size=8 \
  --learning_rate=2e-5 \
  --num_train_epochs=3.0 \
  --output_dir=/content/bert_output/ \
  --do_lower_case=True \
  --save_checkpoints_steps=10000
# Last few lines
INFO:tensorflow:***** Eval results *****
I0713 06:06:28.966619 139722620139392] ***** Eval results *****
INFO:tensorflow:  eval_accuracy = 0.796
I0713 06:06:28.966814 139722620139392]   eval_accuracy = 0.796
INFO:tensorflow:  eval_loss = 0.95403963
I0713 06:06:28.967138 139722620139392]   eval_loss = 0.95403963
INFO:tensorflow:  global_step = 1687
I0713 06:06:28.967317 139722620139392]   global_step = 1687
INFO:tensorflow:  loss = 0.95741796
I0713 06:06:28.967507 139722620139392]   loss = 0.95741796
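The global_step = 1687 in the log can be checked by hand. run_classifier.py computes the number of training steps as int(num_train_examples / train_batch_size * num_train_epochs). It computes the warmup steps as that number times --warmup_proportion, which defaults to 0.1. The sketch below redoes that arithmetic for this run. The function name num_train_steps is hypothetical.

```python
def num_train_steps(num_examples: int, batch_size: int, epochs: float) -> int:
    # Mirrors the truncating int() used for the step count (no rounding up)
    return int(num_examples / batch_size * epochs)

n_sampled = 5000
n_dev = n_sampled // 10         # test_size=0.1 holds out 10% as dev.tsv
n_train = n_sampled - n_dev     # rows written to train.tsv
steps = num_train_steps(n_train, batch_size=8, epochs=3.0)
warmup = int(steps * 0.1)       # assumes the default --warmup_proportion of 0.1
print(n_dev, n_train, steps, warmup)   # 500 4500 1687 168
print(round(0.796 * n_dev))            # 398
```

The last line shows that eval_accuracy = 0.796 corresponds to 398 correct predictions out of the 500 dev reviews.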
  • Now we use the test data to evaluate the model with the following command. run_classifier.py saves the predicted class probabilities for each test example to test_results.tsv in the output directory.


# Predict on test.tsv
# Here we use the fine-tuned checkpoint saved at the final training step (1687) as the initial model
!python bert/run_classifier.py \
  --task_name=cola \
  --do_predict=true \
  --data_dir=/content/bert/IMDB_dataset \
  --vocab_file=/content/uncased_L-12_H-768_A-12/vocab.txt \
  --bert_config_file=/content/uncased_L-12_H-768_A-12/bert_config.json \
  --init_checkpoint=/content/bert_output/model.ckpt-1687 \
  --max_seq_length=128 \
  --output_dir=/content/bert_output/
INFO:tensorflow:Restoring parameters from /content/bert_output/model.ckpt-1687
I0713 06:08:22.372014 140390020667264] Restoring parameters from /content/bert_output/model.ckpt-1687
INFO:tensorflow:Running local_init_op.
I0713 06:08:23.801442 140390020667264] Running local_init_op.
INFO:tensorflow:Done running local_init_op.
I0713 06:08:23.859703 140390020667264] Done running local_init_op.
2020-07-13 06:08:24.453814: I tensorflow/stream_executor/platform/default/] Successfully opened dynamic library
INFO:tensorflow:prediction_loop marked as finished
I0713 06:10:02.280455 140390020667264] prediction_loop marked as finished
INFO:tensorflow:prediction_loop marked as finished
I0713 06:10:02.280870 140390020667264] prediction_loop marked as finished
  • The code below reads test_results.tsv, takes the class with the highest predicted probability for each row of the test data, and stores it in a list.


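import csv
# Each row of test_results.tsv holds one probability per class (label 0, label 1)
label_results = []
with open('/content/bert_output/test_results.tsv') as file:
    rows = csv.reader(file, delimiter = "\t")
    for row in rows:
        data_1 = [float(i) for i in row]
        # The predicted label is the index of the most probable class
        label_results.append(data_1.index(max(data_1)))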
  • The code below calculates accuracy and F1-score.


print("Accuracy", metrics.accuracy_score(test['polarity'], label_results))
print("F1-Score", metrics.f1_score(test['polarity'], label_results))
Accuracy 0.8548
F1-Score 0.8496894409937888
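For reference, both metrics reduce to simple counts over the confusion matrix. The sketch below computes them from scratch for a binary task, treating label 1 (positive) as the positive class, which is what metrics.f1_score uses by default. The function name and the input lists are made up for illustration.

```python
from typing import List, Tuple

def accuracy_and_f1(y_true: List[int], y_pred: List[int]) -> Tuple[float, float]:
    """Binary accuracy and F1 for the positive class (label 1), from confusion counts."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    accuracy = (tp + tn) / len(y_true)
    # F1 = 2PR / (P + R), which simplifies to 2TP / (2TP + FP + FN)
    denom = 2 * tp + fp + fn
    f1 = 2 * tp / denom if denom else 0.0
    return accuracy, f1

acc, f1 = accuracy_and_f1([1, 0, 1, 1, 0], [1, 0, 0, 1, 1])
print(acc, f1)  # 0.6 0.6666666666666666
```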
  • We achieved about 85% accuracy and F1-score on the IMDB reviews test sample after fine-tuning BERT (BASE) for just 3 epochs, which is a good result. Training for more epochs may improve the accuracy further.

