Data visualization with R and ggplot2
ggplot2 is a free, open-source R package for data visualization that implements the Grammar of Graphics. Created by Hadley Wickham, it is one of the most widely used plotting packages in R.
A ggplot2 plot is built up from several layers. The layers are as follows:
Building blocks of layers with the grammar of graphics
- Data: The dataset to be plotted.
- Aesthetics: The mapping of variables in the data onto visual attributes such as x-axis, y-axis, color, fill, size, labels, alpha, shape, line width, and line type.
- Geometries: The shapes used to display the data, such as points, lines, histograms, bars, and boxplots.
- Facets: Panels arranged in rows and columns, each showing a subset of the data.
- Statistics: Transformations applied to the data before plotting, such as binning, smoothing, and descriptive or inferential summaries.
- Coordinates: The space onto which the data is mapped, such as Cartesian, fixed, or polar, along with axis limits.
- Themes: All non-data ink, such as fonts, backgrounds, and gridlines.
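To see how the layers in the list above fit together, here is a minimal sketch that uses each of them once in a single plot. The choice of variables, the y-axis range, and `theme_minimal()` are illustrative assumptions, not part of the original walkthrough.

```r
library(ggplot2)

# Sketch: one plot that touches each layer in turn
p <- ggplot(data = mtcars,                      # data
            aes(x = wt, y = mpg)) +             # aesthetics
  geom_point() +                                # geometry
  facet_grid(. ~ cyl) +                         # facets
  stat_smooth(method = "lm", se = FALSE) +      # statistics
  coord_cartesian(ylim = c(10, 35)) +           # coordinates
  theme_minimal()                               # theme

print(p)
```

Each `+` adds one component, so any of these lines can be removed or swapped without touching the others.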
Dataset Used
mtcars (Motor Trend Car Road Tests) comprises fuel consumption and 10 aspects of automobile design and performance for 32 automobiles. It ships with base R in the datasets package, so it is available without installing anything extra.
R
# Installing and loading dplyr (optional: mtcars itself ships with base R)
install.packages("dplyr")
library(dplyr)

# Summary of the dataset
summary(mtcars)
Output:
      mpg             cyl             disp             hp
 Min.   :10.40   Min.   :4.000   Min.   : 71.1   Min.   : 52.0
 1st Qu.:15.43   1st Qu.:4.000   1st Qu.:120.8   1st Qu.: 96.5
 Median :19.20   Median :6.000   Median :196.3   Median :123.0
 Mean   :20.09   Mean   :6.188   Mean   :230.7   Mean   :146.7
 3rd Qu.:22.80   3rd Qu.:8.000   3rd Qu.:326.0   3rd Qu.:180.0
 Max.   :33.90   Max.   :8.000   Max.   :472.0   Max.   :335.0
      drat             wt             qsec             vs
 Min.   :2.760   Min.   :1.513   Min.   :14.50   Min.   :0.0000
 1st Qu.:3.080   1st Qu.:2.581   1st Qu.:16.89   1st Qu.:0.0000
 Median :3.695   Median :3.325   Median :17.71   Median :0.0000
 Mean   :3.597   Mean   :3.217   Mean   :17.85   Mean   :0.4375
 3rd Qu.:3.920   3rd Qu.:3.610   3rd Qu.:18.90   3rd Qu.:1.0000
 Max.   :4.930   Max.   :5.424   Max.   :22.90   Max.   :1.0000
       am              gear            carb
 Min.   :0.0000   Min.   :3.000   Min.   :1.000
 1st Qu.:0.0000   1st Qu.:3.000   1st Qu.:2.000
 Median :0.0000   Median :4.000   Median :2.000
 Mean   :0.4062   Mean   :3.688   Mean   :2.812
 3rd Qu.:1.0000   3rd Qu.:4.000   3rd Qu.:4.000
 Max.   :1.0000   Max.   :5.000   Max.   :8.000
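The summary treats every column as numeric, including columns such as cyl and am that are really categories. The sketch below inspects those columns and recodes them as factors in a copy of the data. The name `cars` is a hypothetical working copy introduced only for illustration; the labels for `am` follow the dataset's help page (0 = automatic, 1 = manual).

```r
# cyl, am, and gear are stored as plain numbers in mtcars
str(mtcars[, c("cyl", "am", "gear")])

# Hypothetical working copy with the categorical columns recoded as factors
cars <- mtcars
cars$cyl <- factor(cars$cyl)
cars$am  <- factor(cars$am, levels = c(0, 1), labels = c("automatic", "manual"))

levels(cars$cyl)
table(cars$am)
```

Wrapping a column in `factor()` inside `aes()` achieves the same effect for a single plot without modifying the data.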
Example of ggplot2 package in R Programming
We build visualizations of the mtcars dataset, which covers 32 car models and 11 attributes, using ggplot2 layers.
Data Layer:
In the data layer we define the source of the information to be visualized. Let's use the built-in mtcars dataset with the ggplot2 package.
R
# Loading package
library(ggplot2)

# Data layer
ggplot(data = mtcars)
Output: A blank gray panel, since no aesthetics or geometries have been specified yet.
Aesthetic Layer:
Here we map variables in the dataset onto aesthetics: hp to the x-axis, mpg to the y-axis, and disp to color.
R
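# Aesthetic layer
ggplot(data = mtcars, aes(x = hp, y = mpg, col = disp))
Output: An empty plot with hp on the x-axis and mpg on the y-axis. No data is drawn yet because no geometry has been added.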
Geometric layer:
The geometric layer controls how the data is displayed, using geometries such as points, lines, histograms, bars, and boxplots.
R
# Geometric layer
ggplot(data = mtcars, aes(x = hp, y = mpg, col = disp)) +
  geom_point()
Output: A scatter plot of mpg against hp, with each point shaded by disp on a continuous color scale.
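A common point of confusion at this stage is the difference between mapping an aesthetic to a variable and setting it to a fixed value. The sketch below contrasts the two; the color name and point size are arbitrary choices for illustration.

```r
library(ggplot2)

# Mapping: colour varies with a variable, so it goes inside aes()
p_mapped <- ggplot(mtcars, aes(x = hp, y = mpg)) +
  geom_point(aes(colour = disp))

# Setting: every point gets the same fixed colour, so it goes outside aes()
p_fixed <- ggplot(mtcars, aes(x = hp, y = mpg)) +
  geom_point(colour = "steelblue", size = 3)

print(p_mapped)
print(p_fixed)
```

Only the mapped version produces a legend, because only there does color carry information from the data.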
Geometric layer: adding size, color, and shape, then plotting a histogram
R
# Adding size
ggplot(data = mtcars, aes(x = hp, y = mpg, size = disp)) +
  geom_point()

# Adding color and shape
ggplot(data = mtcars, aes(x = hp, y = mpg, col = factor(cyl), shape = factor(am))) +
  geom_point()

# Histogram plot
ggplot(data = mtcars, aes(x = hp)) +
  geom_histogram(binwidth = 5)
Output: Three plots: a scatter plot with point size scaled by disp, a scatter plot with color by cyl and shape by am, and a histogram of hp with a bin width of 5.
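Bars and boxplots are named among the geometries but not demonstrated above. The following sketch shows one plausible use of each on the same dataset; the grouping variables are chosen here only for illustration.

```r
library(ggplot2)

# Boxplot: distribution of mpg within each cylinder group
p_box <- ggplot(mtcars, aes(x = factor(cyl), y = mpg)) +
  geom_boxplot()

# Bar chart: geom_bar() counts the rows in each group by default
p_bar <- ggplot(mtcars, aes(x = factor(cyl), fill = factor(am))) +
  geom_bar(position = "dodge")

print(p_box)
print(p_bar)
```

Both use `factor(cyl)` so that the number of cylinders is treated as a category rather than a continuous axis.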
Facet Layer:
The facet layer splits the data into subsets and displays each subset in its own panel of the same plot. Here we separate rows according to transmission type and columns according to the number of cylinders.
R
# Facet layer
p <- ggplot(data = mtcars, aes(x = hp, y = mpg, shape = factor(cyl))) +
  geom_point()

# Separate rows according to transmission type
p + facet_grid(am ~ .)

# Separate columns according to cylinders
p + facet_grid(. ~ cyl)
Output: Two plots: one with two panels stacked in rows (am = 0 and am = 1), and one with three panels side by side (cyl = 4, 6, and 8).
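The two directions can also be combined in one grid, and a single variable can be wrapped across several rows. The sketch below assumes the same scatter plot as a base; `label_both` and the choice of `gear` for the wrapped version are illustrative.

```r
library(ggplot2)

p <- ggplot(mtcars, aes(x = hp, y = mpg)) +
  geom_point()

# Rows by transmission, columns by cylinders, with "variable: value" strip labels
p_grid <- p + facet_grid(am ~ cyl, labeller = label_both)

# facet_wrap() lays out the panels of a single variable and wraps them into rows
p_wrap <- p + facet_wrap(~ gear, ncol = 2)

print(p_grid)
print(p_wrap)
```

`facet_grid()` suits two crossed variables, while `facet_wrap()` is usually more compact when there is only one.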
Statistics layer:
In this layer, we transform the data using statistics such as binning, smoothing, and descriptive summaries. Here we add a linear regression line to a scatter plot.
R
# Statistics layer
ggplot(data = mtcars, aes(x = hp, y = mpg)) +
  geom_point() +
  stat_smooth(method = "lm", col = "red")
Output: A scatter plot of mpg against hp with a red linear regression line and a shaded confidence band.
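Smoothing is only one kind of statistic. As a sketch of a descriptive summary, the example below overlays the mean mpg of each cylinder group on the raw points. It assumes ggplot2 3.3.0 or later, where `stat_summary()` takes the `fun` argument; the colors and sizes are arbitrary.

```r
library(ggplot2)

p <- ggplot(mtcars, aes(x = factor(cyl), y = mpg)) +
  geom_point(alpha = 0.4) +
  # Descriptive statistic: mean mpg per cylinder group, drawn as a red point
  stat_summary(fun = mean, geom = "point", colour = "red", size = 4)

print(p)

# The same group means computed directly, for comparison
group_means <- tapply(mtcars$mpg, mtcars$cyl, mean)
print(round(group_means, 2))
```

The values printed by `tapply()` should match the heights of the red points.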
Coordinates layer:
In this layer, the data coordinates are mapped onto the plane of the graphic. We can adjust the axes and change the spacing of the displayed data to control the plot dimensions.
R
# Coordinates layer: control plot dimensions
ggplot(data = mtcars, aes(x = wt, y = mpg)) +
  geom_point() +
  stat_smooth(method = "lm", col = "red") +
  scale_y_continuous("mpg", limits = c(2, 35), expand = c(0, 0)) +
  scale_x_continuous("wt", limits = c(0, 25), expand = c(0, 0)) +
  coord_equal()
Output: A scatter plot of mpg against wt with a red regression line, drawn with the specified axis limits and a 1:1 aspect ratio.
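Polar coordinates appear in the list of coordinate systems but are not shown above. A minimal sketch, assuming a stacked bar of cylinder counts as the starting point, turns that bar into a pie chart with `coord_polar()`:

```r
library(ggplot2)

# A single stacked bar of counts per cylinder group
p_bar <- ggplot(mtcars, aes(x = factor(1), fill = factor(cyl))) +
  geom_bar(width = 1)

# Mapping the y-axis (the counts) to the angle bends the bar into a pie
p_pie <- p_bar + coord_polar(theta = "y")

print(p_pie)
```

The data and the geometry are unchanged; only the coordinate system differs between the two plots.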
Using coord_cartesian() to zoom in:
R
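# Add coord_cartesian() to zoom in without dropping data
# Color is mapped on the points only, so the smoother fits a single curve
ggplot(data = mtcars, aes(x = wt, y = hp)) +
  geom_point(aes(col = am)) +
  geom_smooth() +
  coord_cartesian(xlim = c(3, 6))
Output: A scatter plot of hp against wt with a smoothed curve, zoomed in to wt values between 3 and 6.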
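The reason to zoom with `coord_cartesian()` rather than with scale limits is that scale limits discard out-of-range observations before any statistic is computed. The sketch below contrasts the two approaches using a linear fit; the variable names are introduced here for illustration only.

```r
library(ggplot2)

base <- ggplot(mtcars, aes(x = wt, y = hp)) +
  geom_point() +
  geom_smooth(method = "lm")

# Zoom only: the regression is still fitted to all 32 cars
p_zoom <- base + coord_cartesian(xlim = c(3, 6))

# Scale limits: cars outside the range are dropped before fitting,
# and ggplot2 warns about the removed rows when the plot is drawn
p_clip <- base + scale_x_continuous(limits = c(3, 6))

# Number of cars that fall inside the scale limits
n_kept <- sum(mtcars$wt >= 3 & mtcars$wt <= 6)
print(n_kept)
```

Because the second plot fits its line to fewer observations, its slope and confidence band can differ from the first even though both show the same x-range.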
Theme Layer:
This layer controls the finer points of display like the font size and background color properties.
Example 1: Theme layer – element_rect() function
R
# Theme layer
ggplot(data = mtcars, aes(x = hp, y = mpg)) +
  geom_point() +
  facet_grid(. ~ cyl) +
  theme(plot.background = element_rect(fill = "black", colour = "gray"))
Output: A scatter plot faceted by cyl, drawn on a black plot background with a gray border.
Example 2: Theme layer – theme_gray() function
R
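# Theme layer: theme_gray() is the default ggplot2 theme
ggplot(data = mtcars, aes(x = hp, y = mpg)) +
  geom_point() +
  facet_grid(am ~ cyl) +
  theme_gray()
Output: A scatter plot faceted into a grid with am in rows and cyl in columns, drawn with the default gray theme.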
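The theme layer is also where font sizes are controlled, which the two examples above do not show. The following sketch adjusts text size together with the panel background; the title text and all specific values are arbitrary choices for illustration.

```r
library(ggplot2)

p <- ggplot(mtcars, aes(x = hp, y = mpg)) +
  geom_point() +
  labs(title = "Fuel economy vs. horsepower") +
  theme(
    text             = element_text(size = 14),        # base font size
    plot.title       = element_text(face = "bold"),    # bold title
    panel.background = element_rect(fill = "white", colour = "gray50"),
    panel.grid.minor = element_blank()                 # hide minor gridlines
  )

print(p)
```

Settings passed to `theme()` override the corresponding elements of whichever complete theme is active, so this can be layered on top of `theme_gray()` or any other.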
ggplot2 provides many types of visualizations, and its functions accept many more parameters than shown here, which gives fine control over how data is displayed. Many packages also integrate with ggplot2 to make visualizations interactive or animated.