Generating Word Cloud in R Programming
  • Last Updated : 28 Jul, 2020

A word cloud is a data visualization technique for text data in which the size of each word indicates its frequency or importance. Significant textual data points can be highlighted using a word cloud. Word clouds are widely used for analyzing data from social networking websites.

Why Word Cloud?

Reasons to use word clouds to present text data:

  • Word clouds add simplicity and clarity. The most frequently used keywords stand out better in a word cloud.
  • Word clouds are a potent communication tool. They are easy to understand, easy to share, and impactful.
  • Word clouds are more visually engaging than tabular data.

Implementation in R

Here are the steps to create a word cloud in R.

Step 1: Create a Text File

Copy and paste the text into a plain text file (e.g., file.txt) and save the file.
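
If no text is at hand, a small sample file can also be created from R itself. The sketch below is only an illustration: the sample sentences and the location of file.txt in a temporary directory are assumptions, not part of the original example.

```r
# Made-up sample text; replace it with your own content.
sample_text <- c(
  "R is a language for statistical computing and graphics.",
  "Word clouds show frequent words in a larger font.",
  "Text mining in R often starts with a plain text file."
)

# Write the lines to a plain text file in a temporary directory.
file_path <- file.path(tempdir(), "file.txt")
writeLines(sample_text, file_path)

# Read the file back to confirm that it was saved correctly.
text_check <- readLines(file_path)
print(length(text_check))
```

A file written this way can later be read with readLines(file_path), without choosing it interactively.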

Step 2: Install and Load the Required Packages

# install the required packages
install.packages("tm")           # for text mining
install.packages("SnowballC")    # for text stemming
install.packages("wordcloud")    # word-cloud generator
install.packages("RColorBrewer") # color palettes
  
# load the packages
library("tm")
library("SnowballC")
library("wordcloud")
library("RColorBrewer")
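
Calling install.packages() on every run downloads the packages again. One possible pattern, sketched below, is to check first which packages are missing. The helper name missing_packages is hypothetical; requireNamespace() is a base R function.

```r
# Packages used in this article.
required <- c("tm", "SnowballC", "wordcloud", "RColorBrewer")

# Hypothetical helper: return the packages that are not yet installed.
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

to_install <- missing_packages(required)

# Install only what is missing (uncomment to actually install).
# if (length(to_install) > 0) install.packages(to_install)
print(to_install)
```

The install line is left commented out so that the snippet has no side effects until you decide to run it.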

Step 3: Text Mining
  • Load the Text:
    The text is loaded using the Corpus() function from the text mining (tm) package. A corpus is a collection of documents.



    1. Start by importing the text file created in Step 1:
      To import a file saved locally on your computer, type the following R code. You will be asked to choose the text file interactively.

      text = readLines(file.choose())

    2. Load the data as a corpus:

      # VectorSource() treats each element of the
      # character vector (each line) as a document;
      # Corpus() builds the corpus from that source
      docs = Corpus(VectorSource(text))

    3. Text transformation:
      Transformations are applied with the tm_map() function, for example to replace special characters such as “@”, “#”, and “/” with spaces.

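      # Transformer that replaces a pattern with a space
      toSpace = content_transformer(
                    function (x, pattern)
                    gsub(pattern, " ", x))
      # Apply each replacement to the result of the
      # previous one so that no change is lost
      docs1 = tm_map(docs, toSpace, "/")
      docs1 = tm_map(docs1, toSpace, "@")
      docs1 = tm_map(docs1, toSpace, "#")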

  • Cleaning the Text:
    The tm_map() function is also used to clean the text, for example to convert it to lower case, remove numbers (removeNumbers), strip unnecessary white space (stripWhitespace), or remove common stopwords.

    # Convert the text to lower case
    docs1 = tm_map(docs1, 
            content_transformer(tolower))
      
    # Remove numbers
    docs1 = tm_map(docs1, removeNumbers)
      
    # Remove white spaces
    docs1 = tm_map(docs1, stripWhitespace)
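
The description of the cleaning step mentions stopword removal, and SnowballC is loaded for stemming, but neither appears in the cleaning code. A minimal sketch of both steps follows, assuming English text; the two sample sentences are made up for illustration. removeWords, stopwords(), removePunctuation, and stemDocument are functions from the tm package.

```r
library(tm)
library(SnowballC)

# Made-up sample documents, already in lower case.
sample_docs <- Corpus(VectorSource(c(
  "the clouds are moving over the hills",
  "a cloud is forming and the wind is rising"
)))

# Remove common English stopwords such as "the" and "is".
cleaned <- tm_map(sample_docs, removeWords, stopwords("english"))

# Remove punctuation marks.
cleaned <- tm_map(cleaned, removePunctuation)

# Reduce words to their stems (uses SnowballC).
cleaned <- tm_map(cleaned, stemDocument)

# Collapse the gaps left behind by the removed words.
cleaned <- tm_map(cleaned, stripWhitespace)

print(content(cleaned))
```

Whether stemming helps depends on the text: stems are not always readable words, so it may be preferable to skip that step when the cloud is meant for presentation.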

Step 4: Build a Term-Document Matrix

A term-document matrix is a table containing the frequency of each word. Row names are terms (words) and column names are documents. The function TermDocumentMatrix() from the tm package can be used as follows.

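# Build the matrix from the cleaned corpus (docs1)
dtm = TermDocumentMatrix(docs1)
m = as.matrix(dtm)
# Total frequency of each word, most frequent first
v = sort(rowSums(m), decreasing = TRUE)
d = data.frame(word = names(v), freq = v)
# Show the ten most frequent words
head(d, 10)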
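
To see what the matrix contains, it can help to build one from a tiny corpus. The sketch below uses two made-up documents; the object names toy, toy_tdm, toy_m, and toy_freq are chosen only for this illustration.

```r
library(tm)

# Two made-up documents for illustration.
toy <- Corpus(VectorSource(c("red apple red", "green apple")))

# Rows are terms, columns are documents.
toy_tdm <- TermDocumentMatrix(toy)
toy_m <- as.matrix(toy_tdm)
print(toy_m)

# Total frequency of each term across both documents.
toy_freq <- sort(rowSums(toy_m), decreasing = TRUE)
print(toy_freq)
```

Note that, with its default settings, TermDocumentMatrix() ignores terms shorter than three characters, so very short words do not appear in the result.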

Step 5: Generate the Word Cloud

The importance of words can be illustrated as a word cloud as follows.

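wordcloud(words = d$word, 
          freq = d$freq,
          min.freq = 1,          # drop words below this frequency
          max.words = 200,       # plot at most 200 words
          random.order = FALSE,  # plot most frequent words first
          rot.per = 0.35,        # share of words rotated by 90 degrees
          colors = brewer.pal(8, "Dark2"))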
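
The layout of a word cloud is partly random, and by default the plot goes to the screen. The sketch below shows one way to make the result reproducible and save it to an image file. The data frame demo, the seed value, the image size, and the file name wordcloud.png are all assumptions made for this example.

```r
library(wordcloud)
library(RColorBrewer)

# Made-up frequency table standing in for the data frame d.
demo <- data.frame(
  word = c("data", "cloud", "text", "word", "mining"),
  freq = c(10, 8, 6, 4, 2)
)

# Fix the random seed so the layout is the same on every run.
set.seed(1234)

# Draw the cloud into a PNG file instead of the screen.
out_file <- file.path(tempdir(), "wordcloud.png")
png(out_file, width = 800, height = 800)
wordcloud(words = demo$word,
          freq = demo$freq,
          min.freq = 1,
          random.order = FALSE,
          colors = brewer.pal(8, "Dark2"))
dev.off()
```

If some words are missing from the image, a larger canvas or a smaller scale argument may give wordcloud() enough room to place them.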

The complete code for the word cloud in R is given below.

# R program to illustrate
# Generating word cloud
  
# Install the required packages
install.packages("tm")           # for text mining
install.packages("SnowballC")    # for text stemming
install.packages("wordcloud")    # word-cloud generator
install.packages("RColorBrewer") # color palettes
   
# Load the packages
library("tm")
library("SnowballC")
library("wordcloud")
library("RColorBrewer")
  
# To choose the text file
text = readLines(file.choose())
  
# VectorSource() treats each element of the
# character vector (each line) as a document;
# Corpus() builds the corpus from that source
docs = Corpus(VectorSource(text))
  
# Text transformation
toSpace = content_transformer(
              function (x, pattern)
              gsub(pattern, " ", x))
docs1 = tm_map(docs, toSpace, "/")
docs1 = tm_map(docs1, toSpace, "@")
docs1 = tm_map(docs1, toSpace, "#")
# Inspect the transformed corpus
inspect(docs1)
  
# Cleaning the Text
docs1 = tm_map(docs1, content_transformer(tolower))
docs1 = tm_map(docs1, removeNumbers)
docs1 = tm_map(docs1, stripWhitespace)
  
# Build a term-document matrix
dtm = TermDocumentMatrix(docs1)
m = as.matrix(dtm)
v = sort(rowSums(m), 
         decreasing = TRUE)
d = data.frame(word = names(v),
               freq = v)
head(d, 10)
  
# Generate the Word cloud
wordcloud(words = d$word, 
          freq = d$freq,
          min.freq = 1,
          max.words = 200,
          random.order = FALSE, 
          rot.per = 0.35,
          colors = brewer.pal(8, "Dark2"))

Output:
[Image: output screen]

Advantages of Word Clouds

  • Word clouds help in analyzing customer and employee feedback.
  • Word clouds help in identifying new SEO keywords to target.
  • Word clouds are effective visualization tools. They present text data in a simple and clear format.
  • Word clouds are great communication tools. They are handy for anyone wishing to communicate a basic insight.

Drawbacks of Word Clouds

  • Word clouds are not suitable for every situation.
  • The data should be optimized for context before it is visualized.
  • Word clouds typically fail to provide the actionable insights needed to improve and grow a business.