Introduction to the Pokemon data in R
Last Updated: 02 Feb, 2024
Besides being a beloved franchise with millions of fans around the globe, Pokémon also comes with an enormous amount of data, as we are about to find out. In this article, we explore Pokémon characteristics using the R programming language and Alberto Barradas' Pokémon dataset.
Take a look at the magical world of Pokemon with R! Packed with stats, types, and evolutions, Pokemon data is a goldmine for data lovers and aspiring data scientists. Dive into the procedures below and use eye-catching visuals to unlock this data's hidden potential.
Pokemon data in R
- Each Pokemon's attributes, as listed in the Pokedex, determine its power and potential. They include HP, Attack, Defense, Special Attack, Special Defense, Speed, its types (Type 1 and an optional Type 2), its generation, and whether or not it is legendary.
- Clean the dataset before analysis by removing outliers, inconsistent data, and missing values. This process is known as data wrangling.
- Explore the core of the data with exploratory data analysis (EDA), using histograms, boxplots, and scatter plots to reveal hidden trends, correlations, and anomalies.
- Use clustering analysis to group Pokemon with common traits into distinct groups, such as “fire blasters” or “wall tanks”.
- Train machine learning models to predict evolutions, type matchups, or even the winner of a champion match.
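The clustering idea from the list above can be tried with base R's kmeans(). The following is a minimal sketch, not the article's own method. It assumes the six base-stat columns are named as read.csv() produces them (HP, Attack, Defense, Sp..Atk, Sp..Def, Speed). It standardises those columns with scale() so that no single stat dominates the distance, and it uses an arbitrary k = 4. The helper cluster_pokemon() and the Cluster column are hypothetical names introduced for illustration.

```r
# Minimal k-means sketch (hypothetical helper, not from the original article).
# Assumes the base-stat column names produced by read.csv() on Pokemon.csv.
stat_cols <- c("HP", "Attack", "Defense", "Sp..Atk", "Sp..Def", "Speed")

cluster_pokemon <- function(df, k = 4, seed = 123) {
  # Standardise every stat to mean 0, sd 1 so large-range stats do not dominate
  stats_scaled <- scale(df[, stat_cols])
  # k-means starts from random centres; fix the seed for reproducible clusters
  set.seed(seed)
  fit <- kmeans(stats_scaled, centers = k, nstart = 25)
  # Attach the cluster label to each Pokemon
  df$Cluster <- factor(fit$cluster)
  df
}

# Usage (requires pokemon_data from the loading step):
# clustered <- cluster_pokemon(pokemon_data, k = 4)
# aggregate(clustered[, stat_cols], by = list(Cluster = clustered$Cluster), FUN = mean)
```

Looking at the per-cluster mean stats from the aggregate() call is one way to decide whether a cluster resembles, say, a high-Defense "wall tank" group. The choice of k is a judgement call and is worth varying.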
Dataset Link: Alberto Barradas' Pokémon dataset (Pokemon.csv)
R
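# Load the plotting packages used throughout the article
library(ggplot2)
library(gridExtra)
library(plotly)
# stringsAsFactors = TRUE keeps text columns as factors (the default before R 4.0),
# which matches the str() output shown below
pokemon_data <- read.csv("Pokemon.csv", stringsAsFactors = TRUE)
head(pokemon_data)     # first six rows
str(pokemon_data)      # column names, types, and first values
summary(pokemon_data)  # per-column summary statistics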
|
Output:
X. Name Type.1 Type.2 Total HP Attack Defense Sp..Atk
1 1 Bulbasaur Grass Poison 318 45 49 49 65
2 2 Ivysaur Grass Poison 405 60 62 63 80
3 3 Venusaur Grass Poison 525 80 82 83 100
4 3 VenusaurMega Venusaur Grass Poison 625 80 100 123 122
5 4 Charmander Fire 309 39 52 43 60
6 5 Charmeleon Fire 405 58 64 58 80
Sp..Def Speed Generation Legendary
1 65 45 1 False
2 80 60 1 False
3 100 80 1 False
4 120 80 1 False
5 50 65 1 False
6 65 80 1 False
Output:
'data.frame': 800 obs. of 13 variables:
$ X. : int 1 2 3 3 4 5 6 6 6 7 ...
$ Name : Factor w/ 800 levels "Abomasnow","AbomasnowMega Abomasnow",..:
$ Type.1 : Factor w/ 18 levels "Bug","Dark","Dragon",..: 10 10 10 10 7 7 7 7 7 18 ...
$ Type.2 : Factor w/ 19 levels "","Bug","Dark",..: 15 15 15 15 1 1 9 4 9 1 ...
$ Total : int 318 405 525 625 309 405 534 634 634 314 ...
$ HP : int 45 60 80 80 39 58 78 78 78 44 ...
$ Attack : int 49 62 82 100 52 64 84 130 104 48 ...
$ Defense : int 49 63 83 123 43 58 78 111 78 65 ...
$ Sp..Atk : int 65 80 100 122 60 80 109 130 159 50 ...
$ Sp..Def : int 65 80 100 120 50 65 85 85 115 64 ...
$ Speed : int 45 60 80 80 65 80 100 100 100 43 ...
$ Generation: int 1 1 1 1 1 1 1 1 1 1 ...
$ Legendary : Factor w/ 2 levels "False","True": 1 1 1 1 1 1 1 1 1 1 ...
The str function displays the structure of the pokemon_data data frame. It shows the dimensions (800 observations of 13 variables), followed by the name, data type (integer, numeric, character, or factor), and first few values of each column. This gives a quick overview of how the data is organised.
Output:
X. Name Type.1 Type.2
Min. : 1.0 Abomasnow : 1 Water :112 :386
1st Qu.:184.8 AbomasnowMega Abomasnow: 1 Normal : 98 Flying : 97
Median :364.5 Abra : 1 Grass : 70 Ground : 35
Mean :362.8 Absol : 1 Bug : 69 Poison : 34
3rd Qu.:539.2 AbsolMega Absol : 1 Psychic: 57 Psychic : 33
Max. :721.0 Accelgor : 1 Fire : 52 Fighting: 26
(Other) :794 (Other):342 (Other) :189
Total HP Attack Defense
Min. :180.0 Min. : 1.00 Min. : 5 Min. : 5.00
1st Qu.:330.0 1st Qu.: 50.00 1st Qu.: 55 1st Qu.: 50.00
Median :450.0 Median : 65.00 Median : 75 Median : 70.00
Mean :435.1 Mean : 69.26 Mean : 79 Mean : 73.84
3rd Qu.:515.0 3rd Qu.: 80.00 3rd Qu.:100 3rd Qu.: 90.00
Max. :780.0 Max. :255.00 Max. :190 Max. :230.00
Sp..Atk Sp..Def Speed Generation
Min. : 10.00 Min. : 20.0 Min. : 5.00 Min. :1.000
1st Qu.: 49.75 1st Qu.: 50.0 1st Qu.: 45.00 1st Qu.:2.000
Median : 65.00 Median : 70.0 Median : 65.00 Median :3.000
Mean : 72.82 Mean : 71.9 Mean : 68.28 Mean :3.324
3rd Qu.: 95.00 3rd Qu.: 90.0 3rd Qu.: 90.00 3rd Qu.:5.000
Max. :194.00 Max. :230.0 Max. :180.00 Max. :6.000
Legendary
False:735
True : 65
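Summarizing the dataset: The summary function reports the minimum, first quartile, median, mean, third quartile, and maximum of each numeric column, along with frequency counts for each factor column. For example, the blank Type.2 level (386 Pokémon) corresponds to Pokémon with only one type, and 65 of the 800 Pokémon are legendary.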
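The summary output shows 386 blank Type.2 entries and a Legendary column stored as "True"/"False" text. A light data-wrangling step can make both easier to work with. This is a minimal sketch under a few assumptions: the column names and values match the str() output above, and the helper name clean_pokemon() is hypothetical. The sketch marks an empty Type.2 as missing (NA), converts Legendary to a logical, drops exact duplicate rows, and drops rows with missing base stats.

```r
# Minimal cleaning sketch (hypothetical helper, not from the original article).
# Assumes the column names and "True"/"False" Legendary values shown by str().
clean_pokemon <- function(df) {
  # Pokemon without a secondary type have an empty Type.2; mark it as missing
  type2 <- as.character(df$Type.2)
  type2[type2 == ""] <- NA
  df$Type.2 <- factor(type2)
  # Store Legendary as TRUE/FALSE instead of a "True"/"False" factor
  df$Legendary <- as.character(df$Legendary) == "True"
  # Drop exact duplicate rows
  df <- df[!duplicated(df), ]
  # Keep only rows where all six base stats are present
  stat_cols <- c("HP", "Attack", "Defense", "Sp..Atk", "Sp..Def", "Speed")
  df[complete.cases(df[, stat_cols]), ]
}

# Usage:
# pokemon_clean <- clean_pokemon(pokemon_data)
# colSums(is.na(pokemon_clean))   # missing values per column
```

Outlier removal is left out on purpose. Whether a very high stat is an error or a genuinely strong Pokémon is a judgement call that this sketch does not make.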
R
head (pokemon_data$Name)
head (pokemon_data$Type.1)
|
Output:
[1] Bulbasaur Ivysaur Venusaur
[4] VenusaurMega Venusaur Charmander Charmeleon
800 Levels: Abomasnow AbomasnowMega Abomasnow Abra Absol ... Zygarde50% Forme
[1] Grass Grass Grass Grass Fire Fire
18 Levels: Bug Dark Dragon Electric Fairy Fighting Fire Flying ... Water
Printing the values: These lines print the names of the first six Pokémon and their primary types (Type.1).
Various Visualizations of Pokemon Data
This line of code plots a histogram of the Attack column of the Pokémon data.
R
hist (pokemon_data$Attack)
|
Output:
[Figure: histogram of Attack]
This code draws a bar plot of how many Pokémon have each primary type (Type.1).
R
# Count how many Pokemon have each primary type
type_distribution <- table(pokemon_data$Type.1)
# Base R bar plot with one colour per type
barplot(type_distribution, main = "Distribution of Pokemon Types",
        xlab = "Type", ylab = "Count", col = rainbow(length(type_distribution)))
|
Output:
[Figure: bar plot of the number of Pokémon per primary type]
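A box plot of Defense for each primary type complements the type counts by showing how a stat is spread within each type. This is a minimal base R sketch. It assumes pokemon_data was loaded with read.csv() as above, and plot_defense_by_type() is a hypothetical helper name. The boxes are ordered by median Defense, which is a presentation choice and not part of the original article.

```r
# Sketch: Defense distribution within each primary type (base R graphics).
# Hypothetical helper; assumes Type.1 and Defense columns as in Pokemon.csv.
plot_defense_by_type <- function(df) {
  type1 <- factor(df$Type.1)
  # Median Defense per type, used only to order the boxes
  med <- tapply(df$Defense, type1, median)
  type_ordered <- factor(type1, levels = names(sort(med)))
  # One box per type; las = 2 turns the type labels vertical so all fit
  boxplot(split(df$Defense, type_ordered), las = 2, ylab = "Defense",
          main = "Defense by Primary Type", col = "lightgray")
}

# Usage:
# plot_defense_by_type(pokemon_data)
```

boxplot() also returns its computed statistics invisibly, for example the group names and sizes. That makes it easy to check which types have only a handful of Pokémon.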
This code draws an area plot of HP against Speed for a random sample of 200 Pokémon, with outlines colour-coded by primary type (Type.1). geom_area() stacks the areas of the different types. Plotting a 200-row sample instead of all 800 rows keeps the figure quicker to render and less cluttered, and you can adjust the sample size and plot type to suit your needs.
R
library(ggplot2)
library(gridExtra)
set.seed(42)  # arbitrary seed so the random sample is reproducible
# Plot a random sample of 200 Pokemon to keep the figure quick and readable
selected_pokemon <- pokemon_data[sample(nrow(pokemon_data), 200), ]
area_plots <- list(
  ggplot(selected_pokemon, aes(x = HP, y = Speed, color = Type.1)) +
    geom_area() +   # areas for the different types are stacked
    labs(title = "Area Plot: HP vs Speed") +
    theme_minimal()
)
grid.arrange(grobs = area_plots, ncol = 1)
|
Output:
[Figure: area plot of HP vs Speed]
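To see the relationships between every pair of base stats at once, base R's pairs() can draw a scatter plot matrix, and cor() can quantify the same relationships as Pearson correlations. This is a minimal sketch. It assumes the base-stat column names produced by read.csv(), and explore_stat_pairs() is a hypothetical helper name. Only the plot uses a 200-row sample; the correlations use every row.

```r
# Sketch: scatter plots for every pair of base stats, plus their correlations.
# Hypothetical helper; assumes the column names produced by read.csv().
stat_cols <- c("HP", "Attack", "Defense", "Sp..Atk", "Sp..Def", "Speed")

explore_stat_pairs <- function(df, n = 200, seed = 42) {
  set.seed(seed)                               # reproducible sample
  rows <- sample(nrow(df), min(n, nrow(df)))   # subsample to keep the plot readable
  pairs(df[rows, stat_cols], pch = 20, cex = 0.6,
        main = "Pairwise scatter plots of base stats")
  # Pearson correlation matrix over all rows, rounded to 2 decimals
  round(cor(df[, stat_cols]), 2)
}

# Usage:
# explore_stat_pairs(pokemon_data)
```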
This pie chart gives a quick view of the proportion of legendary to non-legendary Pokémon.
R
legendary_distribution <- table (pokemon_data$Legendary)
pie (legendary_distribution, main = "Proportion of Legendary Pokemon" ,
labels = c ( "Non-Legendary" , "Legendary" ), col = c ( "skyblue" , "lightcoral" ))
|
Output:
[Figure: pie chart of legendary vs non-legendary Pokémon]
This pie chart shows how the Pokémon are split across the six generations.
R
generation_distribution <- table(pokemon_data$Generation)  # count per generation
pie(generation_distribution, main = "Proportion of Generations of Pokemon",
    labels = paste("Generation", names(generation_distribution)),
    col = c("blue", "pink", "yellow", "violet", "orange", "green"))
|
Output:
[Figure: pie chart of Pokémon generations]
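The proportions that the pie charts visualise can also be computed as numbers with prop.table(), which divides each count by the total. This is a small sketch, and category_percentages() is a hypothetical helper name. It works on any column, such as Legendary or Generation.

```r
# Sketch: percentage of rows in each category of a column (hypothetical helper).
category_percentages <- function(x) {
  # table() counts each category; prop.table() turns counts into fractions
  round(100 * prop.table(table(x)), 1)
}

# Usage:
# category_percentages(pokemon_data$Legendary)
# category_percentages(pokemon_data$Generation)
```

Using the counts from the summary output (735 False, 65 True out of 800), the Legendary call gives 91.9% non-legendary and 8.1% legendary.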
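This box plot compares the Special Attack and Special Defense distributions for each primary type.
R
# Stack the two stats into long format so each type gets one box per stat
sp_long <- data.frame(
  Type.1 = rep(pokemon_data$Type.1, 2),
  Stat = rep(c("Sp. Atk", "Sp. Def"), each = nrow(pokemon_data)),
  Value = c(pokemon_data$Sp..Atk, pokemon_data$Sp..Def)
)
box_plots <- list(
  ggplot(sp_long, aes(x = Type.1, y = Value, fill = Stat)) +
    geom_boxplot() +   # box plots need a discrete x axis (the type)
    labs(title = "Box plot: Sp.Attack and Sp.Defense", x = "Primary type") +
    theme_minimal() +
    theme(axis.text.x = element_text(angle = 45, hjust = 1))
)
grid.arrange(grobs = box_plots, ncol = 1)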
|
Output:
[Figure: box plot of Sp. Attack and Sp. Defense]
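This scatter plot shows the primary type (x-axis) of 30 randomly chosen Pokémon (y-axis), coloured by secondary type. An empty Type.2 means the Pokémon has only one type.
R
set.seed(42)  # arbitrary seed for a reproducible sample
selected_pokemon <- pokemon_data[sample(nrow(pokemon_data), 30), ]
scatter_plots <- list(
  ggplot(selected_pokemon, aes(x = Type.1, y = Name, color = Type.2)) +
    geom_point() +
    labs(title = "Scatter Plot: Name vs Types") +
    theme_minimal()
)
grid.arrange(grobs = scatter_plots, ncol = 1)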
|
Output:
[Figure: scatter plot of Name vs Types]
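This column plot shows Defense against Speed, coloured by primary type. geom_col() stacks bars that share an x value, so each column's height is the total Defense of all Pokémon with that Speed.
R
# Sampling all 800 rows only reorders them, so the full data is used directly
column_plots <- list(
  ggplot(pokemon_data, aes(x = Speed, y = Defense, color = Type.1)) +
    geom_col() +
    labs(title = "Column Plot: Speed vs Defense") +
    theme_minimal()
)
grid.arrange(grobs = column_plots, ncol = 1)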
|
Output:
[Figure: column plot of Speed vs Defense]
This step plot traces Attack against HP, with one line per primary type.
R
step_plots <- list(
  ggplot(pokemon_data, aes(x = HP, y = Attack, color = Type.1)) +
    geom_step() +   # one staircase line per type, ordered by HP
    labs(title = "Step Plot: HP vs Attack") +
    theme_minimal()
)
grid.arrange(grobs = step_plots, ncol = 1)
|
Output:
[Figure: step plot of HP vs Attack]
Alberto Barradas' dataset offers a wealth of data, and these basic visualizations give a first look at how Pokémon characteristics are distributed and how they relate to one another. To investigate further, consider extending the analysis with more visualizations, statistical tests, and more complex models.
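As one example of a statistical test, the question "do legendary Pokémon have higher Total stats?" can be checked with t.test(). With two vectors, t.test() runs Welch's two-sample t-test by default, which does not assume equal variances. This is a minimal sketch. compare_total_by_legendary() is a hypothetical helper, and it accepts Legendary either as the "True"/"False" values read from the CSV or as a logical column.

```r
# Sketch: Welch two-sample t-test of Total, legendary vs non-legendary.
# Hypothetical helper; accepts "True"/"False" text or a logical Legendary column.
compare_total_by_legendary <- function(df) {
  legendary <- as.character(df$Legendary) %in% c("True", "TRUE")
  # t.test() with two samples defaults to Welch's test (var.equal = FALSE)
  t.test(df$Total[legendary], df$Total[!legendary])
}

# Usage:
# result <- compare_total_by_legendary(pokemon_data)
# result$estimate   # mean Total: legendary, then non-legendary
# result$p.value
```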