BERT stands for Bidirectional Encoder Representations from Transformers and was proposed by researchers at Google AI Language in 2018. Although its original aim was to improve the understanding of the meaning of queries in Google Search, BERT has become one of the most important and versatile architectures for natural language tasks, producing state-of-the-art results on sentence-pair classification, question answering, and more. For more details on the architecture, please look at this article.
One of the most important features of BERT is its adaptability to different NLP tasks with state-of-the-art accuracy (similar to the transfer learning we use in computer vision). Accordingly, the paper also proposed architectures for different tasks. In this post, we will use the BERT architecture for single-sentence classification, specifically the architecture used for the CoLA (Corpus of Linguistic Acceptability) binary classification task. In the previous post about BERT, we discussed the architecture in detail, but let's recap some of its important details:
BERT was proposed in two versions:
- BERT (BASE): 12 layers of encoder stack with 12 bidirectional self-attention heads and 768 hidden units.
- BERT (LARGE): 24 layers of encoder stack with 16 bidirectional self-attention heads and 1024 hidden units.
For the TensorFlow implementation, Google has provided two variants of both BERT BASE and BERT LARGE: Uncased and Cased. In the uncased version, letters are lowercased before WordPiece tokenization.
- First, we need to clone the BERT GitHub repo to make the setup easier.
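In Colab this is a single shell command (the repository is google-research/bert):

```
!git clone https://github.com/google-research/bert.git
```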
```
Cloning into 'bert'...
remote: Enumerating objects: 340, done.
remote: Total 340 (delta 0), reused 0 (delta 0), pack-reused 340
Receiving objects: 100% (340/340), 317.20 KiB | 584.00 KiB/s, done.
Resolving deltas: 100% (185/185), done.
```
- Now, we need to download the BERT (BASE) model using the following link and unzip it into the working directory (or the desired location).
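Something like the following should work in Colab (the URL below is assumed to be the official Google storage location of the 2018-10-18 uncased base release):

```
# Download the uncased BERT (BASE) checkpoint and unzip it
!wget https://storage.googleapis.com/bert_models/2018_10_18/uncased_L-12_H-768_A-12.zip
!unzip uncased_L-12_H-768_A-12.zip
```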
```
Archive:  uncased_L-12_H-768_A-12.zip
   creating: uncased_L-12_H-768_A-12/
  inflating: uncased_L-12_H-768_A-12/bert_model.ckpt.meta
  inflating: uncased_L-12_H-768_A-12/bert_model.ckpt.data-00000-of-00001
  inflating: uncased_L-12_H-768_A-12/vocab.txt
  inflating: uncased_L-12_H-768_A-12/bert_model.ckpt.index
  inflating: uncased_L-12_H-768_A-12/bert_config.json
```
- We will be using the TensorFlow 1.x version, since the BERT repo is written against it. In Google Colab, the %tensorflow_version magic can switch between versions.
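In a Colab cell this is a single magic line:

```
%tensorflow_version 1.x
```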
```
TensorFlow 1.x selected.
```
- Now, we will import the modules necessary for this project: NumPy, pandas, scikit-learn, and Keras from TensorFlow's built-in modules. These come preinstalled on Colab; make sure to install them in your own environment.
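A plausible set of imports for the steps that follow (the exact list is an assumption; everything here ships preinstalled on Colab):

```python
import os
import re

import numpy as np
import pandas as pd
import tensorflow as tf
from tensorflow import keras
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, f1_score
```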
- Now we will load the IMDB sentiment dataset and do some preprocessing before training. For loading the IMDB dataset, we will follow this TensorFlow Hub tutorial.
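A sketch of the loading code, adapted from the pattern in that tutorial (tf.gfile is the TF 1.x file API; the resulting dataframes have the sentence, sentiment, and polarity columns shown in the outputs below):

```python
import os
import re
import pandas as pd
import tensorflow as tf

def load_directory_data(directory):
    # Read every review file in a directory into a dataframe
    data = {"sentence": [], "sentiment": []}
    for file_path in os.listdir(directory):
        with tf.gfile.GFile(os.path.join(directory, file_path), "r") as f:
            data["sentence"].append(f.read())
            # File names look like 123_8.txt, where 8 is the star rating
            data["sentiment"].append(re.match(r"\d+_(\d+)\.txt", file_path).group(1))
    return pd.DataFrame.from_dict(data)

def load_dataset(directory):
    # Merge positive and negative reviews, add a 0/1 polarity label, shuffle
    pos_df = load_directory_data(os.path.join(directory, "pos"))
    neg_df = load_directory_data(os.path.join(directory, "neg"))
    pos_df["polarity"] = 1
    neg_df["polarity"] = 0
    return pd.concat([pos_df, neg_df]).sample(frac=1).reset_index(drop=True)

# Download the archive and build the train/test dataframes
dataset = tf.keras.utils.get_file(
    fname="aclImdb.tar.gz",
    origin="http://ai.stanford.edu/~amaas/data/sentiment/aclImdb_v1.tar.gz",
    extract=True)
base_dir = os.path.join(os.path.dirname(dataset), "aclImdb")
train = load_dataset(os.path.join(base_dir, "train"))
test = load_dataset(os.path.join(base_dir, "test"))
train.shape, test.shape
```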
```
Downloading data from http://ai.stanford.edu/~amaas/data/sentiment/aclImdb_v1.tar.gz
84131840/84125825 [==============================] - 8s 0us/step
((25000, 3), (25000, 3))
```
- This dataset contains 50,000 reviews: 25,000 each for training and testing. We will sample 5,000 reviews from each of the train and test sets. Both sets contain the 3 columns listed below.
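A minimal sketch of the sampling step (the sample size comes from the text; the random_state is an arbitrary choice for reproducibility):

```python
# Keep 5,000 reviews from each split to reduce training time
train = train.sample(5000, random_state=42)
test = test.sample(5000, random_state=42)

train.columns, test.columns
```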
```
(Index(['sentence', 'sentiment', 'polarity'], dtype='object'),
 Index(['sentence', 'sentiment', 'polarity'], dtype='object'))
```
- Now, we need to convert the data into the specific format required by the BERT model for training and prediction; for that we will use pandas dataframes (a sketch follows the column list below). These are the columns required by BERT:
- guid: An ID for the row. Required for both train and test data.
- label: The class label, 0 for negative or 1 for positive sentiment. Required only for training data.
- alpha: A dummy column that is not used for classification but is expected by BERT's input pipeline during training.
- text: The review text of the data point that needs to be classified. Required for both train and test data.
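A sketch of this conversion, following the widely used pattern for the CoLA format (the polarity column holds the 0/1 label and sentence holds the review text, per the columns above; newlines are stripped so the .tsv file stays one row per example):

```python
import pandas as pd

# Training data: guid, label, alpha (dummy), text
train_bert = pd.DataFrame({
    'guid': range(len(train)),
    'label': train['polarity'],
    'alpha': ['a'] * len(train),
    'text': train['sentence'].replace(r'\n', ' ', regex=True),
})

# Test data: only guid and text are needed
test_bert = pd.DataFrame({
    'guid': range(len(test)),
    'text': test['sentence'].replace(r'\n', ' ', regex=True),
})

print(train_bert.head())
print('-----')
print(test_bert.head())
```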
```
       guid  label alpha                                               text
14930     0      1     a  William Hurt may not be an American matinee id...
1445      1      1     a  Rock solid giallo from a master filmmaker of t...
16943     2      1     a  This movie surprised me. Some things were "cli...
6391      3      1     a  This film may seem dated today, but remember t...
4526      4      0     a  The Twilight Zone has achieved a certain mytho...
-----
       guid                                               text
20010     0  One of Alfred Hitchcock's three greatest films...
16132     1  Hitchcock once gave an interview where he said...
24947     2  I had nothing to do before going out one night...
5471      3  tell you what that was excellent. Dylan Moran ...
21075     4  I watched this show until my puberty but still...
```
- Now, we split the data into three parts (train, dev, and test), save them as .tsv files, and place them in a folder (here "IMDB Dataset"). This is because run_classifier.py requires the dataset in .tsv format.
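A minimal sketch of this step; the 90/10 train/dev split is an assumption, though it is consistent with the 1687 training steps in the log further below. Note that the CoLA processor in run_classifier.py reads train.tsv and dev.tsv without a header row, and test.tsv with one:

```python
import os
from sklearn.model_selection import train_test_split

# Hold out 10% of the training data as a dev set
bert_train, bert_dev = train_test_split(train_bert, test_size=0.1, random_state=42)

os.makedirs('IMDB Dataset', exist_ok=True)
bert_train.to_csv('IMDB Dataset/train.tsv', sep='\t', index=False, header=False)
bert_dev.to_csv('IMDB Dataset/dev.tsv', sep='\t', index=False, header=False)
test_bert.to_csv('IMDB Dataset/test.tsv', sep='\t', index=False, header=True)
```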
- In this step, we train the model with the following command; to execute bash commands on Colab, we prefix them with a ! sign. The run_classifier.py file trains the model with the given flags. Due to time and resource constraints, we will run it for only 3 epochs.
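A hedged reconstruction of the training command. All flags below exist in run_classifier.py; the paths assume Colab's /content working directory, and the batch size of 8 is an inference, since int(4500 train examples / 8 × 3 epochs) = 1687, matching the global_step in the log below.

```
!python bert/run_classifier.py \
  --task_name=cola \
  --do_train=true \
  --do_eval=true \
  --data_dir="/content/IMDB Dataset" \
  --vocab_file=/content/uncased_L-12_H-768_A-12/vocab.txt \
  --bert_config_file=/content/uncased_L-12_H-768_A-12/bert_config.json \
  --init_checkpoint=/content/uncased_L-12_H-768_A-12/bert_model.ckpt \
  --max_seq_length=128 \
  --train_batch_size=8 \
  --learning_rate=2e-5 \
  --num_train_epochs=3.0 \
  --output_dir=/content/bert_output/
```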
```
# Last few lines
INFO:tensorflow:***** Eval results *****
I0713 06:06:28.966619 139722620139392 run_classifier.py:923] ***** Eval results *****
INFO:tensorflow:  eval_accuracy = 0.796
I0713 06:06:28.966814 139722620139392 run_classifier.py:925]   eval_accuracy = 0.796
INFO:tensorflow:  eval_loss = 0.95403963
I0713 06:06:28.967138 139722620139392 run_classifier.py:925]   eval_loss = 0.95403963
INFO:tensorflow:  global_step = 1687
I0713 06:06:28.967317 139722620139392 run_classifier.py:925]   global_step = 1687
INFO:tensorflow:  loss = 0.95741796
I0713 06:06:28.967507 139722620139392 run_classifier.py:925]   loss = 0.95741796
```
- Now we will use the test data to evaluate our model with the following bash script. The script saves the predictions into a .tsv file.
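A matching prediction command (again a sketch; the fine-tuned checkpoint name model.ckpt-1687 comes from the training log, and run_classifier.py writes the class probabilities to test_results.tsv in the output directory):

```
!python bert/run_classifier.py \
  --task_name=cola \
  --do_predict=true \
  --data_dir="/content/IMDB Dataset" \
  --vocab_file=/content/uncased_L-12_H-768_A-12/vocab.txt \
  --bert_config_file=/content/uncased_L-12_H-768_A-12/bert_config.json \
  --init_checkpoint=/content/bert_output/model.ckpt-1687 \
  --max_seq_length=128 \
  --output_dir=/content/bert_output/
```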
```
INFO:tensorflow:Restoring parameters from /content/bert_output/model.ckpt-1687
I0713 06:08:22.372014 140390020667264 saver.py:1284] Restoring parameters from /content/bert_output/model.ckpt-1687
INFO:tensorflow:Running local_init_op.
I0713 06:08:23.801442 140390020667264 session_manager.py:500] Running local_init_op.
INFO:tensorflow:Done running local_init_op.
I0713 06:08:23.859703 140390020667264 session_manager.py:502] Done running local_init_op.
2020-07-13 06:08:24.453814: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10
INFO:tensorflow:prediction_loop marked as finished
I0713 06:10:02.280455 140390020667264 error_handling.py:101] prediction_loop marked as finished
INFO:tensorflow:prediction_loop marked as finished
I0713 06:10:02.280870 140390020667264 error_handling.py:101] prediction_loop marked as finished
```
- The code below takes the maximum prediction for each row of test data and stores it in a list.
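Something like the following, assuming the predictions landed in /content/bert_output/test_results.tsv (one tab-separated probability per class, no header row):

```python
import pandas as pd

# Each row holds the probability of each class; the argmax is the predicted label
results = pd.read_csv('/content/bert_output/test_results.tsv', sep='\t', header=None)
predictions = results.idxmax(axis=1).tolist()
```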
- The code below calculates accuracy and F1-score.
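A minimal sketch, comparing the predictions against the polarity column of the sampled test dataframe (valid because test.tsv was written in the same row order):

```python
from sklearn.metrics import accuracy_score, f1_score

true_labels = test['polarity'].tolist()
print('Accuracy', accuracy_score(true_labels, predictions))
print('F1-Score', f1_score(true_labels, predictions))
```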
```
Accuracy 0.8548
F1-Score 0.8496894409937888
```
- We achieved 85% accuracy and F1-score on the IMDB reviews dataset while training BERT (BASE) for just 3 epochs, which is quite a good result. Training for more epochs would likely improve the accuracy further.