
Descriptive Statistics in Julia

Julia is well suited to data analysis. Its standard library and package ecosystem provide functions for descriptive statistics, which summarize the main characteristics of a dataset, such as its center and spread, and give a quick overview of it.

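Packages required for performing Descriptive Statistics in Julia:

Distributions: for random variable creation
StatsBase: for basic statistical operations
CSV: for reading and writing CSV files
DataFrames: for creation of data structures
StatsPlots: for representing various plots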

Steps to perform Descriptive Statistics in Julia:

Step 1: Installing Required Packages



The following commands can be used to install the required packages:

using Pkg
Pkg.add("Distributions")
Pkg.add("StatsBase")
Pkg.add("CSV")
Pkg.add("DataFrames")
Pkg.add("StatsPlots")

Step 2: Importing the Required Packages




# Descriptive Statistics in Julia
# Importing required packages 
# to perform descriptive statistics
  
# For random variable creation
using Distributions 
  
# For basic statistical operations
using StatsBase
  
# For reading and writing CSV files
using CSV 
  
# For creation of Data Structures 
using DataFrames  
  
# For representing various plots
using StatsPlots

Step 3: Creating Simulated Data (Random Variables)

Let’s create variables filled with random data values.
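Random data changes on every run, which makes results hard to compare. The following is a minimal sketch, using only Julia's standard library, of how the same kind of data could be made reproducible with a seeded generator. The seed value 42 and the lowercase variable names are arbitrary choices for illustration and are not part of the original example.

```julia
using Random

# Seeded generator; the seed 42 is an arbitrary illustrative choice
rng = MersenneTwister(42)

# 100 ages drawn uniformly from the integers 10 to 95
age = rand(rng, 10:95, 100)

# 100 blood groups, each group equally likely
blood_grp = rand(rng, ["A", "B", "O", "AB"], 100)

# A fresh generator with the same seed reproduces the same ages
rng_again = MersenneTwister(42)
age_again = rand(rng_again, 10:95, 100)

println(age == age_again)
```

Passing the generator explicitly keeps the example independent of the global random state.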

Example:




# Descriptive Statistics in Julia
# Importing required packages 
# to perform descriptive statistics
  
# For random variable creation
using Distributions 
  
# For basic statistical operations
using StatsBase
  
# For reading and writing CSV files
using CSV 
  
# For creation of Data Structures 
using DataFrames  
  
# For representing various plots
using StatsPlots 
  
# Uniform Distribution
Age = rand(10:95, 100);  
  
# Blood groups drawn with equal probability (unweighted)
BloodGrp = rand(["A", "B", "O", "AB"], 100);

Step 4: Performing Descriptive statistics

The common statistical functions in Julia include mean(), median(), var(), and std(), which compute the mean, median, variance, and standard deviation of the data respectively. By default, var() and std() return the sample versions, which divide by n - 1. For a fuller summary in a single call, the StatsBase package provides describe() and summarystats().
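To make these functions concrete, here is a small worked sketch on a fixed vector, using only the Statistics standard library. The five values are made up for illustration so the results can be checked by hand: the sum is 220, and the squared deviations from the mean sum to 980.

```julia
using Statistics

# Hypothetical ages chosen so the arithmetic is easy to verify
ages = [23, 35, 47, 51, 64]

m = mean(ages)                        # 220 / 5
med = median(ages)                    # middle value of the sorted data
v_sample = var(ages)                  # 980 / (5 - 1), the default
v_pop = var(ages; corrected = false)  # 980 / 5
s_pop = std(ages; corrected = false)  # square root of v_pop

println((m, med, v_sample, v_pop, s_pop))
```

The corrected keyword switches between the sample (n - 1) and population (n) denominators.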

Example:




# Descriptive Statistics in Julia
# Importing required packages 
# to perform descriptive statistics
  
# For random variable creation
using Distributions  
  
# For basic statistical operations
using StatsBase
  
# For reading and writing CSV files
using CSV  
  
# For creation of Data Structures  
using DataFrames  
  
# For representing various plots
using StatsPlots  
  
# Uniform Distribution
Age = rand(10:95, 100);  
  
# Blood groups drawn with equal probability (unweighted)
BloodGrp = rand(["A", "B", "O", "AB"], 100); 
  
# mean of Age variable
mean(Age)
  
# median of Age variable
median(Age)
  
# Variance of Age variable
var(Age)
  
# Standard deviation of Age variable
std(Age)
  
# Descriptive statistics of Age variable
describe(Age)
  
# summarystats function excludes type
summarystats(Age)


Step 5: Creating Data Frames from the Simulated Data

Simulated data should be stored in data frame objects so that it can be inspected, filtered, and manipulated easily.
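Selecting rows from a data frame relies on broadcast comparisons that produce Boolean masks. The sketch below shows the idea on plain vectors with no packages, so the masks can be seen directly. The values and variable names are hypothetical and chosen only for illustration.

```julia
# Hypothetical columns, kept short so the masks are easy to read
age = [34, 91, 58, 12, 95]
blood_grp = ["A", "AB", "O", "AB", "B"]

# Element-wise comparison yields one Bool per row
is_ab = blood_grp .== "AB"
over_90 = age .> 90

# Indexing with a mask keeps only the rows where it is true
age_ab = age[is_ab]
grp_over_90 = blood_grp[over_90]

# Masks can be combined element-wise with .&
age_both = age[is_ab .& over_90]

println((age_ab, grp_over_90, age_both))
```

The same kind of mask can be used as the row index of a data frame.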

Example:




# Descriptive Statistics in Julia
# Importing required packages 
# to perform descriptive statistics
  
# For random variable creation
using Distributions  
  
# For basic statistical operations
using StatsBase
  
# For reading and writing CSV files
using CSV  
  
# For creation of Data Structures  
using DataFrames  
  
# For representing various plots
using StatsPlots  
  
# Uniform Distribution
Age = rand(10:95, 100);  
  
# Blood groups drawn with equal probability (unweighted)
BloodGrp = rand(["A", "B", "O", "AB"], 100);  
  
# Creation of data frame
DF = DataFrame(AGE = Age, BGRP = BloodGrp);
  
# number of rows and columns
size(DF)
  
# First 5 rows
first(DF, 5)

# Last 5 rows
last(DF, 5)

# Selecting specific data only
# Rows in which BGRP is "AB"
DFAB = DF[DF.BGRP .== "AB", :]

# Rows in which AGE is greater than 50
DF50 = DF[DF.AGE .> 50, :]


Step 6: Descriptive Statistics using DataFrame Objects
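The grouped summaries in this step follow a split-apply-combine pattern: split the rows by blood group, apply a statistic to each group, and combine the results. As a rough sketch of what that computes, here is the same idea in plain Julia with a dictionary, using only the standard library. The data and names are hypothetical.

```julia
using Statistics

# Hypothetical data: one age and one blood group per row
age = [20, 30, 40, 50, 60, 70]
blood_grp = ["A", "B", "A", "O", "B", "A"]

# Split: collect the ages that belong to each blood group
groups = Dict{String, Vector{Int}}()
for (g, a) in zip(blood_grp, age)
    push!(get!(groups, g, Int[]), a)
end

# Apply and combine: one count and one mean per group
counts = Dict(g => length(v) for (g, v) in groups)
means = Dict(g => mean(v) for (g, v) in groups)

println(counts)
println(means)
```

A data frame library performs these steps in one call and returns the result as a table.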

Example:




# Descriptive Statistics in Julia
# Importing required packages 
# to perform descriptive statistics
  
# For random variable creation
using Distributions  
  
# For basic statistical operations
using StatsBase
  
# For reading and writing CSV files
using CSV  
  
# For creation of Data Structures  
using DataFrames  
  
# For representing various plots
using StatsPlots  
  
# Uniform Distribution
Age = rand(10:95, 100);  
  
# Blood groups drawn with equal probability (unweighted)
BloodGrp = rand(["A", "B", "O", "AB"], 100);  
  
# Creation of data frame
DF = DataFrame(AGE = Age, BGRP = BloodGrp);
  
# Perform descriptive statistics of data frame
describe(DF)


Example:




# Descriptive Statistics in Julia
# Importing required packages 
# to perform descriptive statistics
  
# For random variable creation
using Distributions  
  
# For basic statistical operations
using StatsBase
  
# For reading and writing CSV files
using CSV  
  
# For creation of Data Structures  
using DataFrames  
  
# For representing various plots
using StatsPlots  
  
# Uniform Distribution
Age = rand(10:95, 100);  
  
# Blood groups drawn with equal probability (unweighted)
BloodGrp = rand(["A", "B", "O", "AB"], 100);  
  
# Creation of data frame
DF = DataFrame(AGE = Age, BGRP = BloodGrp);
  
# Counting the number of rows
# with blood groups A, B, O, AB
combine(groupby(DF, :BGRP), nrow => :Total)

# Reporting the size (rows and columns)
# of each blood group's sub-data frame
combine(groupby(DF, :BGRP), sdf -> (Rows = size(sdf, 1), Cols = size(sdf, 2)))


Example:




# Descriptive Statistics in Julia
# Importing required packages 
# to perform descriptive statistics
  
# For random variable creation
using Distributions  
  
# For basic statistical operations
using StatsBase
  
# For reading and writing CSV files
using CSV  
  
# For creation of Data Structures  
using DataFrames  
  
# For representing various plots
using StatsPlots  
  
# Uniform Distribution
Age = rand(10:95, 100);  
  
# Blood groups drawn with equal probability (unweighted)
BloodGrp = rand(["A", "B", "O", "AB"], 100);  
  
# Creation of data frame
DF = DataFrame(AGE = Age, BGRP = BloodGrp);
  
# Mean AGE of Blood groups A, B, AB, O
combine(groupby(DF, :BGRP), :AGE => mean => :MeanAge)

# Using the describe function
# we can get the complete descriptive statistics of each group
for sdf in groupby(DF, :BGRP)
    println("Blood group: ", first(sdf.BGRP))
    describe(sdf.AGE)
end


Step 7: Visualizing Data using Plots

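The StatsPlots package provides the @df macro, which lets plotting functions refer to the columns of a data frame by symbol. The following examples draw a density plot of AGE grouped by blood group and a box plot of AGE.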
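A box plot is built from a handful of summary numbers: the median, the first and third quartiles, and the extremes. As a sketch of where those numbers come from, the block below computes them with the Statistics standard library on a small made-up vector. The values are hypothetical, and quantile() is used with its default interpolation, so a plotting package may place whiskers differently.

```julia
using Statistics

# Hypothetical ages, already sorted for readability
ages = [12, 25, 31, 40, 47, 53, 68, 74, 90]

q1 = quantile(ages, 0.25)   # first quartile
q2 = quantile(ages, 0.50)   # median
q3 = quantile(ages, 0.75)   # third quartile
iqr_width = q3 - q1         # interquartile range, the height of the box

lo, hi = extrema(ages)      # smallest and largest observation

println((lo, q1, q2, q3, hi, iqr_width))
```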

Example:




# Descriptive Statistics in Julia
# Importing required packages 
# to perform descriptive statistics
  
# For random variable creation
using Distributions  
  
# For basic statistical operations
using StatsBase
  
# For reading and writing CSV files
using CSV  
  
# For creation of Data Structures  
using DataFrames  
  
# For representing various plots
using StatsPlots  
  
# Uniform Distribution
Age = rand(10:95, 100);  
  
# Blood groups drawn with equal probability (unweighted)
BloodGrp = rand(["A", "B", "O", "AB"], 100);  
  
# Creation of data frame
DF = DataFrame(AGE = Age, BGRP = BloodGrp);
  
# Plotting density plot
@df DF density(
   :AGE,
   group = :BGRP,
   xlab = "Age",
   ylab = "Distribution"    
)


Example:




# Descriptive Statistics in Julia
# Importing required packages to perform descriptive statistics
  
# For random variable creation
using Distributions  
  
# For basic statistical operations
using StatsBase
  
# For reading and writing CSV files
using CSV  
  
# For creation of Data Structures  
using DataFrames  
  
# For representing various plots
using StatsPlots  
  
# Uniform Distribution
Age = rand(10:95, 100);  
  
# Blood groups drawn with equal probability (unweighted)
BloodGrp = rand(["A", "B", "O", "AB"], 100);  
  
# Creation of data frame
DF = DataFrame(AGE = Age, BGRP = BloodGrp);
  
# Plotting Box plot
# With a single column, the values are drawn along the y-axis
@df DF boxplot(
  :AGE,
  ylab = "Age"
)


