
Analyzing Data in Subsets Using R

Last Updated : 27 Mar, 2024

In this article, we will explore methods for analyzing data in subsets using the R programming language.

How to analyze data in the subsets

Analyzing data means applying methods that reveal insights, patterns, and conclusions in a dataset. Typical activities include computing summary statistics, visualizing data, and identifying trends. Often the question concerns only part of the data, such as one category or one group of rows, so the first step is to extract that subset. R offers several ways to do this, and working on a subset keeps the analysis focused on the relevant rows. Two common methods are:

Analyzing data in subsets by using subset() Function

subset(x, subset, select, ...)

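The subset() function returns the rows of a data frame x that satisfy the logical condition given in the subset argument and, optionally, only the columns named in select. In the example below, we create a data frame and split it into two subsets by Category. Because Value is generated with rnorm(), the numbers differ on every run.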

R
# Example data
data <- data.frame(
  ID = 1:10,
  Category = rep(c("A", "B"), each = 5),
  Value = rnorm(10)
)
print(data)

# Subsetting using subset() function
subset_A <- subset(data, Category == "A")
subset_B <- subset(data, Category == "B")


print("Analyzing the data in subsets")
print(subset_A)           # Print subsets
print(subset_B)

Output:

  ID Category      Value
1   1        A  1.5658719
2   2        A  0.3142731
3   3        A -1.4552153
4   4        A  0.9014216
5   5        A -0.2758858
6   6        B  1.3345081
7   7        B -1.0618629
8   8        B  1.1188082
9   9        B -1.3202145
10 10        B  1.2453632

[1] "Analyzing the data in subsets"
  ID Category      Value
1  1        A  1.5658719
2  2        A  0.3142731
3  3        A -1.4552153
4  4        A  0.9014216
5  5        A -0.2758858

   ID Category     Value
6   6        B  1.334508
7   7        B -1.061863
8   8        B  1.118808
9   9        B -1.320214
10 10        B  1.245363

In the next example, we create another data frame and split it into two subsets by the Name column.

R
# creating data frame
data <- data.frame(
  ID = 1:6,
  Name = rep(c("X", "Y"), each = 3),
  Value = rnorm(6)
)
print(data)

# Subsetting using subset() function
subset_X <- subset(data, Name == "X")
subset_Y <- subset(data, Name == "Y")


print(" Analyzing the data in subsets")
print(subset_X)          
print(subset_Y)

Output:

  ID Name       Value
1  1    X -0.02737704
2  2    X  0.31270382
3  3    X -0.92980339
4  4    Y  0.43035869
5  5    Y  0.30612408
6  6    Y  0.89034199

[1] " Analyzing the data in subsets"
  ID Name       Value
1  1    X -0.02737704
2  2    X  0.31270382
3  3    X -0.92980339

  ID Name     Value
4  4    Y 0.4303587
5  5    Y 0.3061241
6  6    Y 0.8903420
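
The calls above pass only a row condition. The select argument from the signature chooses which columns to keep, and the condition itself can combine several tests. The following is a minimal sketch, assuming a small hypothetical data frame named scores with fixed values (it is not part of the examples above) so that the result is reproducible:

```r
# Hypothetical data frame with fixed values (illustration only)
scores <- data.frame(
  ID = 1:6,
  Name = rep(c("X", "Y"), each = 3),
  Value = c(2.5, -1.0, 0.7, 3.2, -0.4, 1.8)
)

# Keep rows where Name is "X" and Value is positive,
# and return only the ID and Value columns
positive_X <- subset(scores, Name == "X" & Value > 0, select = c(ID, Value))
print(positive_X)

# Analyze the subset: mean of the selected values
mean_positive_X <- mean(positive_X$Value)
print(mean_positive_X)
```

Here the subset holds the rows with ID 1 and 3, and the mean of their values is 1.6.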

Subsetting the data Frame

This method uses logical indexing with square brackets, df[condition, ], to extract the rows that satisfy a condition. In the example below, we create a data frame, extract the male students, and summarize their test scores.

R
# Sample data frame
df <- data.frame(
  student_id = 1:10,
  test_score = c(80, 85, 90, 75, 95, 82, 78, 88, 92, 70),
  gender = c("M", "F", "M", "F", "M", "F", "M", "F", "M", "F")
)

# Subset of male students
male_students <- df[df$gender == "M", ]
print(male_students)

print("Analyzing the data ")
# Summary statistics for male students
summary(male_students$test_score)

Output:

  student_id test_score gender
1          1         80      M
3          3         90      M
5          5         95      M
7          7         78      M
9          9         92      M

[1] "Analyzing the data "
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
     78      80      90      87      92      95 
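
To compare groups without building each subset by hand, base R can apply a function within every group at once. The sketch below reuses the student data from the example above and shows two options, tapply() and aggregate(); the variable names mean_by_gender and agg are chosen here for illustration:

```r
# Same student data as in the example above
df <- data.frame(
  student_id = 1:10,
  test_score = c(80, 85, 90, 75, 95, 82, 78, 88, 92, 70),
  gender = c("M", "F", "M", "F", "M", "F", "M", "F", "M", "F")
)

# Mean test score within each gender subset (named numeric vector)
mean_by_gender <- tapply(df$test_score, df$gender, mean)
print(mean_by_gender)

# The same comparison returned as a data frame, using the formula interface
agg <- aggregate(test_score ~ gender, data = df, FUN = mean)
print(agg)
```

For this data the mean score is 80 for the female students and 87 for the male students.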

In the next example, we create a sales data frame and extract the rows for the Electronics category.

R
# Sample sales data
sales_data <- data.frame(
  transaction_id = 1:24,
  product_category = rep(c("Electronics", "Clothing", "Books"), each = 8),
  sales_amount = c(150, 200, 100, 120, 180, 80, 70, 90, 110, 95, 250, 300, 280, 320,
                   270, 40, 60, 50, 55, 45, 65, 78, 89, 34)
)

# Subset of sales data for Electronics category
electronics_sales <- sales_data[sales_data$product_category == "Electronics", ]

# Displaying the subset
print(electronics_sales)

Output:

  transaction_id product_category sales_amount
1              1      Electronics          150
2              2      Electronics          200
3              3      Electronics          100
4              4      Electronics          120
5              5      Electronics          180
6              6      Electronics           80
7              7      Electronics           70
8              8      Electronics           90
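
Printing the subset is only the first step; the analysis comes from computing on it. As a minimal sketch using the same sales data, the code below totals and averages the Electronics subset and then, as one possible way to repeat the calculation for every category, uses split() with sapply(). The names electronics_total, electronics_mean, by_category, and category_totals are illustrative:

```r
# Same sales data as in the example above
sales_data <- data.frame(
  transaction_id = 1:24,
  product_category = rep(c("Electronics", "Clothing", "Books"), each = 8),
  sales_amount = c(150, 200, 100, 120, 180, 80, 70, 90, 110, 95, 250, 300, 280, 320,
                   270, 40, 60, 50, 55, 45, 65, 78, 89, 34)
)

# Subset of sales data for the Electronics category
electronics_sales <- sales_data[sales_data$product_category == "Electronics", ]

# Total and average sales within the subset
electronics_total <- sum(electronics_sales$sales_amount)
electronics_mean <- mean(electronics_sales$sales_amount)
print(electronics_total)
print(electronics_mean)

# Split the amounts by category, then total each piece
by_category <- split(sales_data$sales_amount, sales_data$product_category)
category_totals <- sapply(by_category, sum)
print(category_totals)
```

The Electronics subset totals 990 with a mean of 123.75; the per-category totals are 476 for Books, 1665 for Clothing, and 990 for Electronics.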

Conclusion

In conclusion, we covered two ways to analyze data in subsets in R: the subset() function and logical indexing with square brackets. Once a subset has been extracted, functions such as summary() can be applied to it.


