Generating Word Cloud in R Programming

Last Updated : 29 Jul, 2021

Word Cloud is a data visualization technique used for representing text data in which the size of each word indicates its frequency or importance. Significant textual data points can be highlighted using a word cloud. Word clouds are widely used for analyzing data from social network websites.

Why Word Cloud?

The reasons one should use word clouds to present the text data are:

Word clouds add simplicity and clarity. The most used keywords stand out better in a word cloud
Word clouds are a potent communication tool. They are easy to understand, to be shared, and are impactful.
Word clouds are visually engaging than a table data.

Implementation in R

Here are steps to create a word cloud in R Programming.

Step 1: Create a Text File

Copy and paste the text in a plain text file (e.g:file.txt) and save the file.

Step 2: Install and Load the Required Packages

Python3

# install the required packages 
install.packages("tm")           # for text mining 
install.packages("SnowballC")    # for text stemming 
install.packages("wordcloud")    # word-cloud generator 
install.packages("RColorBrewer") # color palettes 
  
# load the packages 
library("tm") 
library("SnowballC") 
library("wordcloud") 
library("RColorBrewer") 

Step 3: Text Mining

Load the Text:
The text is loaded using Corpus() function from text mining(tm) package. Corpus is a list of a document.

Start by importing text file created in step 1:
To import the file saved locally in your computer, type the following R code. You will be asked to choose the text file interactively.
Python3
text = readLines(file.choose())
Load the data as a corpus:
Python3
# VectorSource() function

# creates a corpus of

# character vectors

docs = Corpus(VectorSource(text))

Text transformation:
Transformation is performed using tm_map() function to replace, for example, special characters from the text like “@”, “#”, “/”.

Python3

toSpace = content_transformer 
             (function (x, pattern) 
              gsub(pattern, " ", x)) 
docs1 = tm_map(docs, toSpace, "/") 
docs1 = tm_map(docs, toSpace, "@") 
docs1 = tm_map(docs, toSpace, "#") 

Cleaning the Text:
The tm_map() function is used to remove unnecessary white space, to convert the text to lower case, to remove common stopwords. Numbers can be removed using removeNumbers.

Python3

# Convert the text to lower case 
docs1 = tm_map(docs1,  
        content_transformer(tolower)) 
  
# Remove numbers 
docs1 = tm_map(docs1, removeNumbers) 
  
# Remove white spaces 
docs1 = tm_map(docs1, stripWhitespace) 

Step 4: Build a term-document Matrix

Document matrix is a table containing the frequency of the words. Column names are words and row names are documents. The function TermDocumentMatrix() from text mining package can be used as follows.

Python3

dtm = TermDocumentMatrix(docs) 
m = as.matrix(dtm) 
v = sort(rowSums(m), decreasing = TRUE) 
d = data.frame(word = names(v), freq = v) 
head(d, 10) 

Step 5: Generate the Word Cloud

The importance of words can be illustrated as a word cloud as follows.

Python3

wordcloud(words = d$word,  
          freq = d$freq, 
          min.freq = 1,  
          max.words = 200, 
          random.order = FALSE,  
          rot.per = 0.35,  
          colors = brewer.pal(8, "Dark2")) 

The complete code for the word cloud in R is given below.

Python3

# R program to illustrate 
# Generating word cloud 
  
# Install the required packages 
install.packages("tm")           # for text mining 
install.packages("SnowballC")    # for text stemming 
install.packages("wordcloud")    # word-cloud generator 
install.packages("RColorBrewer") # color palettes 
   
# Load the packages 
library("tm") 
library("SnowballC") 
library("wordcloud") 
library("RColorBrewer") 
  
# To choose the text file 
text = readLines(file.choose()) 
  
# VectorSource() function  
# creates a corpus of  
# character vectors 
docs = Corpus(VectorSource(text))    
  
# Text transformation 
toSpace = content_transformer( 
              function (x, pattern) 
              gsub(pattern, " ", x)) 
docs1 = tm_map(docs, toSpace, "/") 
docs1 = tm_map(docs, toSpace, "@") 
docs1 = tm_map(docs, toSpace, "#") 
strwrap(docs1) 
  
# Cleaning the Text 
docs1 = tm_map(docs1, content_transformer(tolower)) 
docs1 = tm_map(docs1, removeNumbers) 
docs1 = tm_map(docs1, stripWhitespace) 
  
# Build a term-document matrix 
dtm = TermDocumentMatrix(docs) 
m = as.matrix(dtm) 
v = sort(rowSums(m),  
         decreasing = TRUE) 
d = data.frame(word = names(v), 
               freq = v) 
head(d, 10) 
  
# Generate the Word cloud 
wordcloud(words = d$word,  
          freq = d$freq, 
          min.freq = 1,  
          max.words = 200, 
          random.order = FALSE,  
          rot.per = 0.35,  
          colors = brewer.pal(8, "Dark2")) 

Output:

output screen

Advantages of Word Clouds

Analyzing customer and employee feedback.
Identifying new SEO keywords to target.
Word clouds are killer visualisation tools. They present text data in a simple and clear format
Word clouds are great communication tools. They are incredibly handy for anyone wishing to communicate a basic insight

Drawbacks of Word Clouds

Word Clouds are not perfect for every situation.
Data should be optimized for context.
Word clouds typically fail to give the actionable insights that needs to improve and grow the business.

Suggest improvement

Fligner-Killeen Test in R Programming

Bootstrap Confidence Interval with R Programming

Share your thoughts in the comments

Generating Word Cloud in R Programming

Why Word Cloud?

Implementation in R

Python3

Python3

Python3

Python3

Python3

Python3

Python3

Python3

Advantages of Word Clouds

Drawbacks of Word Clouds

Please Login to comment...

Similar Reads

What kind of Experience do you want to share?