
Introduction to the Pokemon data in R

Besides being a beloved franchise that has attracted millions of fans around the globe, Pokémon is also a rich source of data. In this article, we explore Pokémon characteristics using the R programming language and Alberto Barradas's Pokémon dataset.

Take a look at the magical world of Pokémon with R! The Pokémon data, packed with numbers, types, and evolutions, is a goldmine for data lovers and prospective analysts. We walk through the procedures and use eye-catching visuals to unlock this data's hidden potential.



Pokemon data in R

Dataset Link: Introduction to the Pokemon data in R




library(ggplot2)
library(gridExtra)
library(plotly)
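# Load the data; stringsAsFactors = TRUE keeps text columns as factors,
# matching the output below (R 4.0+ reads them as character by default)
pokemon_data <- read.csv("Pokemon.csv", stringsAsFactors = TRUE)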
# View the first few rows
head(pokemon_data)

Output:



  X.                  Name Type.1 Type.2 Total HP Attack Defense Sp..Atk
1  1             Bulbasaur  Grass Poison   318 45     49      49      65
2  2               Ivysaur  Grass Poison   405 60     62      63      80
3  3              Venusaur  Grass Poison   525 80     82      83     100
4  3 VenusaurMega Venusaur  Grass Poison   625 80    100     123     122
5  4            Charmander   Fire          309 39     52      43      60
6  5            Charmeleon   Fire          405 58     64      58      80
  Sp..Def Speed Generation Legendary
1      65    45          1     False
2      80    60          1     False
3     100    80          1     False
4     120    80          1     False
5      50    65          1     False
6      65    80          1     False




# View the structure of the data
str(pokemon_data)

Output:

'data.frame':    800 obs. of  13 variables:
 $ X.        : int  1 2 3 3 4 5 6 6 6 7 ...
 $ Name      : Factor w/ 800 levels "Abomasnow","AbomasnowMega Abomasnow",..:
 $ Type.1    : Factor w/ 18 levels "Bug","Dark","Dragon",..: 10 10 10 10 7 7 7 7 7 18 ...
 $ Type.2    : Factor w/ 19 levels "","Bug","Dark",..: 15 15 15 15 1 1 9 4 9 1 ...
 $ Total     : int  318 405 525 625 309 405 534 634 634 314 ...
 $ HP        : int  45 60 80 80 39 58 78 78 78 44 ...
 $ Attack    : int  49 62 82 100 52 64 84 130 104 48 ...
 $ Defense   : int  49 63 83 123 43 58 78 111 78 65 ...
 $ Sp..Atk   : int  65 80 100 122 60 80 109 130 159 50 ...
 $ Sp..Def   : int  65 80 100 120 50 65 85 85 115 64 ...
 $ Speed     : int  45 60 80 80 65 80 100 100 100 43 ...
 $ Generation: int  1 1 1 1 1 1 1 1 1 1 ...
 $ Legendary : Factor w/ 2 levels "False","True": 1 1 1 1 1 1 1 1 1 1 ...

The str function displays the structure of the pokemon_data data frame. It prints the name and data type of each column (here integers and factors) along with the first few values, which gives a quick overview of how the data is organised.
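As a small illustration of how column types show up, the sketch below rebuilds a few columns of the first six rows printed earlier as a toy data frame and converts one text column to a factor by hand. The name pokemon_head is made up for this example and is not part of the original code.

```r
# Toy data frame built from the first six rows shown in the head() output.
# pokemon_head is a hypothetical name used only in this sketch.
pokemon_head <- data.frame(
  Name = c("Bulbasaur", "Ivysaur", "Venusaur",
           "VenusaurMega Venusaur", "Charmander", "Charmeleon"),
  Type.1 = c("Grass", "Grass", "Grass", "Grass", "Fire", "Fire"),
  HP = c(45L, 60L, 80L, 80L, 39L, 58L),
  stringsAsFactors = FALSE
)

# Class of each column before any conversion
col_classes <- sapply(pokemon_head, class)
print(col_classes)

# Convert the type column to a factor, then inspect the structure again
pokemon_head$Type.1 <- factor(pokemon_head$Type.1)
str(pokemon_head)
```

After the conversion, str reports Type.1 as a factor with one level per distinct type, in the same style as the full output above.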




# Summary of the data
summary(pokemon_data)

Output:

       X.                             Name         Type.1         Type.2   
 Min.   :  1.0   Abomasnow              :  1   Water  :112           :386  
 1st Qu.:184.8   AbomasnowMega Abomasnow:  1   Normal : 98   Flying  : 97  
 Median :364.5   Abra                   :  1   Grass  : 70   Ground  : 35  
 Mean   :362.8   Absol                  :  1   Bug    : 69   Poison  : 34  
 3rd Qu.:539.2   AbsolMega Absol        :  1   Psychic: 57   Psychic : 33  
 Max.   :721.0   Accelgor               :  1   Fire   : 52   Fighting: 26  
                 (Other)                :794   (Other):342   (Other) :189  
     Total             HP             Attack       Defense      
 Min.   :180.0   Min.   :  1.00   Min.   :  5   Min.   :  5.00  
 1st Qu.:330.0   1st Qu.: 50.00   1st Qu.: 55   1st Qu.: 50.00  
 Median :450.0   Median : 65.00   Median : 75   Median : 70.00  
 Mean   :435.1   Mean   : 69.26   Mean   : 79   Mean   : 73.84  
 3rd Qu.:515.0   3rd Qu.: 80.00   3rd Qu.:100   3rd Qu.: 90.00  
 Max.   :780.0   Max.   :255.00   Max.   :190   Max.   :230.00  
                                                                
    Sp..Atk          Sp..Def          Speed          Generation   
 Min.   : 10.00   Min.   : 20.0   Min.   :  5.00   Min.   :1.000  
 1st Qu.: 49.75   1st Qu.: 50.0   1st Qu.: 45.00   1st Qu.:2.000  
 Median : 65.00   Median : 70.0   Median : 65.00   Median :3.000  
 Mean   : 72.82   Mean   : 71.9   Mean   : 68.28   Mean   :3.324  
 3rd Qu.: 95.00   3rd Qu.: 90.0   3rd Qu.: 90.00   3rd Qu.:5.000  
 Max.   :194.00   Max.   :230.0   Max.   :180.00   Max.   :6.000  
                                                                  
 Legendary  
 False:735  
 True : 65 

Summarizing the dataset: The summary function prints descriptive statistics for each column: the minimum, first quartile, median, mean, third quartile and maximum for numeric columns, and frequency counts for factor columns.
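To see the same statistics on a scale that is easy to check by hand, here is a minimal sketch that uses only the Total and Type.1 values of the first six rows printed earlier. The vector names are assumptions for this example, and six rows say nothing about the full dataset.

```r
# Values copied from the first six rows of the head() output
total <- c(318, 405, 525, 625, 309, 405)
type1 <- c("Grass", "Grass", "Grass", "Grass", "Fire", "Fire")

# Minimum, quartiles, median, mean and maximum of one numeric vector
total_summary <- summary(total)
print(total_summary)

# Mean Total for each primary type
mean_by_type <- tapply(total, type1, mean)
print(mean_by_type)
```

On the full data frame, the same idea would apply to pokemon_data$Total and pokemon_data$Type.1.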




# Explore specific columns
# View names of Pokemon
head(pokemon_data$Name) 
# View primary types
head(pokemon_data$Type.1)

Output:

[1] Bulbasaur             Ivysaur               Venusaur             
[4] VenusaurMega Venusaur Charmander            Charmeleon           
800 Levels: Abomasnow AbomasnowMega Abomasnow Abra Absol ... Zygarde50% Forme

[1] Grass Grass Grass Grass Fire  Fire 
18 Levels: Bug Dark Dragon Electric Fairy Fighting Fire Flying ... Water

Printing the values: These lines of code print the first six Pokémon names and their primary types.
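Column access with $ combines naturally with logical filtering. The sketch below is one possible way to pick out names by type, using only the six names and types printed above; names6 and types6 are hypothetical names for this example.

```r
# The six names and primary types shown in the output above
names6 <- c("Bulbasaur", "Ivysaur", "Venusaur",
            "VenusaurMega Venusaur", "Charmander", "Charmeleon")
types6 <- factor(c("Grass", "Grass", "Grass", "Grass", "Fire", "Fire"))

# Keep only the names whose primary type is Fire
fire_names <- names6[types6 == "Fire"]
print(fire_names)

# Count how many of the six rows fall under each type
type_counts <- table(types6)
print(type_counts)
```

With the full data, the equivalent filter would be pokemon_data$Name[pokemon_data$Type.1 == "Fire"].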

Various Visualizations of pokemon data

This line of code draws a histogram of the Attack column of the Pokémon data.




# Histogram of attack values
hist(pokemon_data$Attack)

Output:

[Figure: histogram of Attack values]

This code draws a bar plot of the number of Pokémon for each primary type (Type.1).




library(ggplot2)
library(gridExtra)
library(plotly)
 
 
# Bar plot of Pokemon types
type_distribution <- table(pokemon_data$Type.1)
barplot(type_distribution, main = "Distribution of Pokemon Types",
        xlab = "Type", ylab = "Count", col = rainbow(length(type_distribution)))

Output:

[Figure: bar plot of Pokémon counts by primary type]
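Bars are often easier to compare when they are sorted by height. The following sketch assumes the six most frequent primary types and their counts as reported in the summary output earlier; the remaining types are left out, so this is not the complete distribution.

```r
# Counts of the six most frequent primary types, taken from the summary()
# output and listed alphabetically, the way table() would order them
type_counts <- c(Bug = 69, Fire = 52, Grass = 70,
                 Normal = 98, Psychic = 57, Water = 112)

# Sort from most to least frequent before plotting
sorted_counts <- sort(type_counts, decreasing = TRUE)
print(sorted_counts)

# Draw the sorted bar plot when running in an interactive session
if (interactive()) {
  barplot(sorted_counts, main = "Most Frequent Pokemon Types",
          xlab = "Type", ylab = "Count", las = 2)
}
```

For the full dataset, the same call would be sort(table(pokemon_data$Type.1), decreasing = TRUE).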

This code draws an area plot of HP against Speed, color-coded by primary type, for a random sample of 200 Pokémon. Plotting a sample keeps the computation manageable, and you can adjust the sample size to suit your needs.




# Install and load necessary packages
# install.packages("ggplot2")
library(ggplot2)
library(gridExtra)
library(plotly)
 
# Select a subset of Pokemon for illustration
selected_pokemon <- pokemon_data[sample(1:nrow(pokemon_data), 200), ]
 
# Area plot of HP vs Speed, wrapped in a list for grid.arrange()
scatter_plots <- list(
  ggplot(selected_pokemon, aes(x = HP, y = Speed, color = Type.1)) +
    geom_area()+
    labs(title = "Area Plot: HP vs Speed") +
    theme_minimal()
   
)
grid.arrange(grobs = scatter_plots, ncol = 1)

Output:

[Figure: area plot of HP vs Speed]
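Because sample draws rows at random, the plot changes on every run. One common way to make the sample reproducible is to call set.seed first; the sketch below shows the idea on row indices only, with the seed value 42 and the draw size 5 chosen arbitrarily for this example.

```r
# Fix the random seed so the same row indices are drawn each time
set.seed(42)
first_draw <- sample(1:800, 5)

# Resetting the same seed reproduces the same draw
set.seed(42)
second_draw <- sample(1:800, 5)

print(identical(first_draw, second_draw))
```

Placing a set.seed call before the pokemon_data sampling line would make the sampled plot repeatable in the same way.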

This visualisation gives a quick look at the proportion of legendary to non-legendary Pokémon.




# Pie chart of Legendary Status
legendary_distribution <- table(pokemon_data$Legendary)
pie(legendary_distribution, main = "Proportion of Legendary Pokemon",
    labels = c("Non-Legendary", "Legendary"), col = c("skyblue", "lightcoral"))

Output:

Pie chart visualization
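A pie chart is easier to read when the slices carry percentages. The sketch below assumes the Legendary counts reported in the summary output (735 False, 65 True) and builds percentage labels with prop.table; the variable names are made up for this example.

```r
# Legendary counts as reported by summary(pokemon_data)
legendary_counts <- c(False = 735, True = 65)

# Convert counts to percentages, rounded to one decimal place
legendary_pct <- round(100 * prop.table(legendary_counts), 1)
print(legendary_pct)

# Build slice labels that include the percentage
pie_labels <- paste0(c("Non-Legendary", "Legendary"),
                     " (", legendary_pct, "%)")
print(pie_labels)

# Draw the labelled pie chart when running in an interactive session
if (interactive()) {
  pie(legendary_counts, labels = pie_labels,
      col = c("skyblue", "lightcoral"))
}
```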




# Pie chart of Pokemon generations
generation_distribution <- table(pokemon_data$Generation)
pie(generation_distribution, main = "Proportion of Generations of Pokemon",
    labels = c("Generation 1", "Generation 2", "Generation 3","Generation 4",
               "Generation 5","Generation 6"),
    col = c("blue","pink","yellow","violet","orange","green"))

Output:

Generations of Pokémon




# Sampling all 800 rows keeps every Pokemon and only shuffles the row order
selected_pokemon <- pokemon_data[sample(1:nrow(pokemon_data), 800), ]
 
# Box plot of Sp..Atk vs Sp..Def, colored by primary type
box_plots <- list(
  ggplot(selected_pokemon, aes(x = Sp..Atk, y = Sp..Def, color = Type.1)) +
    geom_boxplot()+
    labs(title = "Box plot: Sp.Attack and Sp.Defense")+
    theme_minimal()
   
)
 
grid.arrange(grobs = box_plots, ncol = 1)

Output:

[Figure: box plot of Sp. Attack and Sp. Defense]




# Select a subset of Pokemon for illustration
selected_pokemon <- pokemon_data[sample(1:nrow(pokemon_data), 30), ]
 
# Scatter plot of Name vs primary type, colored by secondary type
scatter_plots <- list(
  ggplot(selected_pokemon, aes(x = Type.1, y = Name, color = Type.2)) +
    geom_point() +
    labs(title = "Scatter Plot: Name vs Types") +
    theme_minimal()
   
)
grid.arrange(grobs = scatter_plots, ncol = 1)

Output:

[Figure: scatter plot of Name vs Types]




# Sampling all 800 rows keeps every Pokemon and only shuffles the row order
selected_pokemon <- pokemon_data[sample(1:nrow(pokemon_data), 800), ]
 
# Column plot of Speed vs Defense, colored by primary type
scatter_plots <- list(
  ggplot(selected_pokemon, aes(x = Speed, y = Defense, color = Type.1)) +
    geom_col()+
    labs(title = "Column Plot: Speed vs Defense") +
    theme_minimal()
   
)
grid.arrange(grobs = scatter_plots, ncol = 1)

Output:

[Figure: column plot of Speed vs Defense]




# Sampling all 800 rows keeps every Pokemon and only shuffles the row order
selected_pokemon <- pokemon_data[sample(1:nrow(pokemon_data), 800), ]
 
# Step plot of HP vs Attack, colored by primary type
scatter_plots <- list(
  ggplot(selected_pokemon, aes(x = HP, y = Attack, color = Type.1)) +
    geom_step()+
    labs(title = "Step Plot: HP vs Attack") +
    theme_minimal()
   
)
grid.arrange(grobs = scatter_plots, ncol = 1)

Output:

[Figure: step plot of HP vs Attack]

Alberto Barradas's dataset offers a wealth of data, and our basic visualizations gave us a look at the distributions of, and relationships between, Pokémon characteristics. Consider extending the analysis with more visualisations, statistical tests and more complex analyses to explore the dataset further.
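As one possible starting point for such follow-up analysis, the sketch below computes a Pearson correlation between Attack and Defense. It assumes only the six rows printed by head at the start of the article, so the result illustrates the call and is not a finding about the full dataset.

```r
# Attack and Defense values from the first six rows of the head() output
attack  <- c(49, 62, 82, 100, 52, 64)
defense <- c(49, 63, 83, 123, 43, 58)

# Pearson correlation coefficient between the two attributes
attack_defense_cor <- cor(attack, defense)
print(attack_defense_cor)
```

On the full data, cor(pokemon_data$Attack, pokemon_data$Defense) would give the corresponding value for all 800 rows.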

