Dataexplorer in R

Last Updated : 28 Aug, 2023

Data explorer is a part of modeling and it is a package in R programming. It is used for data analysis. This package, which might be what we are referring to, is designed to provide a convenient interface to explore and visualize data, especially for initial exploratory data analysis (EDA) tasks.

Installation of Data Explorer

R

#Insralling required package
  install.packages("DataExplorer")
#load libraries 
  library(DataExplorer)

Here we are taking the dataset: penguins, from the Palmer penguins package, and also loading the package for usage by typing the following commands:

R

#taking data set and libraries for that dataset
install.packages("palmerpenguins")
library(palmerpenguins)

Exploratory Data Analysis:

The introduce () function gives the basic information about our dataset i.e., penguins.

R

#Gives the basic details about dataset
str(penguins)

Output:

tibble [344 × 8] (S3: tbl_df/tbl/data.frame)
 $ species          : Factor w/ 3 levels "Adelie","Chinstrap",..: 1 1 1 1 1 1 1 1 1 1 ...
 $ island           : Factor w/ 3 levels "Biscoe","Dream",..: 3 3 3 3 3 3 3 3 3 3 ...
 $ bill_length_mm   : num [1:344] 39.1 39.5 40.3 NA 36.7 39.3 38.9 39.2 34.1 42 ...
 $ bill_depth_mm    : num [1:344] 18.7 17.4 18 NA 19.3 20.6 17.8 19.6 18.1 20.2 ...
 $ flipper_length_mm: int [1:344] 181 186 195 NA 193 190 181 195 193 190 ...
 $ body_mass_g      : int [1:344] 3750 3800 3250 NA 3450 3650 3625 4675 3475 4250 ...
 $ sex              : Factor w/ 2 levels "female","male": 2 1 1 NA 1 2 1 2 NA NA ...
 $ year             : int [1:344] 2007 2007 2007 2007 2007 2007 2007 2007 2007 2007 ...

R

#visualize the dataset
plot_intro(penguins)

Output:

To visualize the data which is shown above introduce the () function.

Plot the Missing values:

R

#missing plot values 
plot_missing(penguins)

Output:

ing

Continuous value columns with Plot_Histogram()

R

#visualize of data in histograms
plot_histogram(penguins)

Output:

ing

Correlation Plots

R

#heatmap visualisation
plot_correlation(penguins,type = "all")

Output:

Visualizes the heatmap, in this, we use the argument called type=” ” .,

Data Report of Data Explorer

create_report() is used to create the report on a dataset. and this will generate a file on our computer.

R

#creating Report of Data
create_report(
  penguins,
  output_file = "report_example.html",
  output_dir = getwd(),
  config = configure_report(),
  report_title = "Data Report"
)

From this, we get a report in HTML format that will show the complete information of the data.

Here,

data: the dataset which you want to work with
output_file: The name of the output file.
output_dir: location where the rendered
config: configuration of output
report_title: Title of the report.

Adding a ggplot2 Theme to the dataset

ggtheme() adds a ggplot2 theme, to the plot.

For example, we are taking the theme_minimal() theme.

As well title() adds title to the plot.

R

#adds theme title and theme to the plot
plot_intro(penguins,
           title = "Missing Penguin Data Plot Title",
           ggtheme = theme_minimal())

Output:

New-Doc-2023-07-19-224016_1(2)

Extra Theme Configuration

theme_config() is used to customize the elements in the plot.

plot. title is used to add color to the plot.

R

# plot_intro() with a theme and title
plot_intro(
  penguins,
  ggtheme = theme_minimal(),
  title = "A Plot Title",
  theme_config = theme(plot.title = element_text(color = "orange"))
)

Output:

First, install the Dataexplorer from CRAN:

R

#install packages
 install.packages("DataExplorer")

Report:

To get the report of dataset: air quality we have to use create _ report.

R

#installing library 
library(DataExplorer)
#report creation 
create_report(airquality)

From this, we get a report in HTML format that will show the complete information of the data.

R

# View basic description for airquality data
 introduce(airquality)

Output:

  rows columns discrete_columns continuous_columns all_missing_columns
1  153       6                0                  6                   0
  total_missing_values complete_rows total_observations memory_usage
1                   44           111                918         6376

Visual representation of our dataset

R

# Plot basic description for airquality data
 plot_intro(airquality)

Output:

New-Doc-2023-07-23-154416_1

Missing values representation

R

# View missing value distribution for airquality data
 plot_missing(airquality)

Output:

New-Doc-2023-07-23-154620_1

Histogram representation

R

#View histogram of all continuous variables
 plot_histogram(diamonds)

Output:

New-Doc-2023-07-23-154851_1

Heatmap representation

R

# View overall correlation heatmap
 plot_correlation(diamonds)

Output:

New-Doc-2023-07-23-155129_1

Bivariate continuous distribution using ‘cut’.

R

#View bivariate continuous distribution based on `cut`
 plot_boxplot(diamonds, by = "cut")

Output:

New-Doc-2023-07-23-155553_1

Estimated continuous distribution

R

# View estimated density distribution of all continuous variables
 plot_density(diamonds)

Output:

New-Doc-2023-07-23-155834_1

Suggest improvement

Write Data Into Excel Using R

Share your thoughts in the comments

Dataexplorer in R

Installation of Data Explorer

R

R

Exploratory Data Analysis:

R

R

Plot the Missing values:

R

Continuous value columns with Plot_Histogram()

R

Correlation Plots

R

Data Report of Data Explorer

R

Adding a ggplot2 Theme to the dataset

R

Extra Theme Configuration

R

First, install the Dataexplorer from CRAN:

R

R

R

Visual representation of our dataset

R

Missing values representation

R

Histogram representation

R

Heatmap representation

R

Bivariate continuous distribution using ‘cut’.

R

Estimated continuous distribution

R

Please Login to comment...

What kind of Experience do you want to share?