Analyzing Data in Subsets Using R

Last Updated : 27 Mar, 2024

In this article, we will explore methods to analyze data in subsets using the R programming language.

How to analyze data in subsets

Analyzing data means applying methods such as computing summary statistics, visualizing data, and identifying trends in order to recognize patterns and draw conclusions from a dataset. R offers several functions for running this kind of analysis on a subset of the data instead of the whole dataset, which keeps the work focused on the rows of interest. Some of the methods are:

Analyzing data in subsets by using subset() Function

Syntax: subset(x, subset, select, ...)

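The subset() function returns the rows of a data frame that satisfy a logical condition. Here x is the data frame, subset is the condition used to pick rows, and select optionally names the columns to keep. In the example below, we create a data frame and split it into two subsets by Category. Because Value is generated with rnorm(), the numbers will differ on every run.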

# Example data
data <- data.frame(
  ID = 1:10,
  Category = rep(c("A", "B"), each = 5),
  Value = rnorm(10)
)
print(data)

# Subsetting using subset() function
subset_A <- subset(data, Category == "A")
subset_B <- subset(data, Category == "B")


print("Analyzing the data in subsets")
print(subset_A)           # Print subsets
print(subset_B)

Output:

  ID Category      Value
1   1        A  1.5658719
2   2        A  0.3142731
3   3        A -1.4552153
4   4        A  0.9014216
5   5        A -0.2758858
6   6        B  1.3345081
7   7        B -1.0618629
8   8        B  1.1188082
9   9        B -1.3202145
10 10        B  1.2453632

[1] "Analyzing the data in subsets"
  ID Category      Value
1  1        A  1.5658719
2  2        A  0.3142731
3  3        A -1.4552153
4  4        A  0.9014216
5  5        A -0.2758858

   ID Category     Value
6   6        B  1.334508
7   7        B -1.061863
8   8        B  1.118808
9   9        B -1.320214
10 10        B  1.245363

The second example applies the same approach to a smaller data frame, this time subsetting by the Name column.

# creating data frame
data <- data.frame(
  ID = 1:6,
  Name = rep(c("X", "Y"), each = 3),
  Value = rnorm(6)
)
print(data)

# Subsetting using subset() function
subset_X <- subset(data, Name == "X")
subset_Y <- subset(data, Name == "Y")


print(" Analyzing the data in subsets")
print(subset_X)          
print(subset_Y)

Output:

  ID Name       Value
1  1    X -0.02737704
2  2    X  0.31270382
3  3    X -0.92980339
4  4    Y  0.43035869
5  5    Y  0.30612408
6  6    Y  0.89034199

[1] " Analyzing the data in subsets"
  ID Name       Value
1  1    X -0.02737704
2  2    X  0.31270382
3  3    X -0.92980339

  ID Name     Value
4  4    Y 0.4303587
5  5    Y 0.3061241
6  6    Y 0.8903420
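
The examples above only print the subsets, and they leave the select argument from the syntax unused. As a minimal sketch, the code below combines two conditions, keeps only some columns with select, and computes a mean per subset. The seed value and the variable names positive_A, mean_A, and mean_B are assumptions made for illustration; they are not part of the original examples.

```r
# Hypothetical data; the fixed seed only makes the random values repeatable
set.seed(42)
data <- data.frame(
  ID = 1:10,
  Category = rep(c("A", "B"), each = 5),
  Value = rnorm(10)
)

# Rows: Category "A" with a positive Value; columns: only ID and Value
positive_A <- subset(data, Category == "A" & Value > 0, select = c(ID, Value))
print(positive_A)

# A summary statistic computed separately on each subset
mean_A <- mean(subset(data, Category == "A")$Value)
mean_B <- mean(subset(data, Category == "B")$Value)
print(c(A = mean_A, B = mean_B))
```

Inside subset(), column names can be written without the data$ prefix, which keeps interactive code short.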

Subsetting the Data Frame

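This method selects rows by placing a logical condition inside square brackets, as in df[condition, ]; leaving the column position empty keeps every column. In the example below, we create a data frame, extract the male students, and summarize their test scores.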

# Sample data frame
df <- data.frame(
  student_id = 1:10,
  test_score = c(80, 85, 90, 75, 95, 82, 78, 88, 92, 70),
  gender = c("M", "F", "M", "F", "M", "F", "M", "F", "M", "F")
)

# Subset of male students
male_students <- df[df$gender == "M", ]
print(male_students)

print("Analyzing the data ")
# Summary statistics for male students
summary(male_students$test_score)

Output:

  student_id test_score gender
1          1         80      M
3          3         90      M
5          5         95      M
7          7         78      M
9          9         92      M

[1] "Analyzing the data "
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
     78      80      90      87      92      95 
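
One difference between the two methods appears when the condition contains missing values. The sketch below uses a hypothetical data frame with one missing gender (the data and the names by_bracket, by_which, and by_subset are assumptions for illustration) to show that bracket indexing returns a row of NAs for each NA in the condition, while which() and subset() drop those rows.

```r
# Hypothetical data: one gender value is missing
df <- data.frame(
  student_id = 1:6,
  test_score = c(80, 85, 90, 75, 95, 82),
  gender = c("M", "F", NA, "F", "M", "F")
)

# Bracket indexing returns an all-NA row where the condition is NA
by_bracket <- df[df$gender == "M", ]
print(nrow(by_bracket))   # 3 rows: two matches plus one all-NA row

# which() keeps only the positions where the condition is TRUE
by_which <- df[which(df$gender == "M"), ]
print(nrow(by_which))     # 2 rows

# subset() also treats NA in the condition as FALSE
by_subset <- subset(df, gender == "M")
print(nrow(by_subset))    # 2 rows
```

If the all-NA row went unnoticed, a later call such as mean(by_bracket$test_score) would return NA, so wrapping the condition in which() is a reasonable habit when a column may contain missing values.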

In the example below, we create a data frame of sales transactions and extract the rows that belong to the Electronics category.

# Sample sales data
sales_data <- data.frame(
  transaction_id = 1:24,
  product_category = rep(c("Electronics", "Clothing", "Books"), each = 8),
  sales_amount = c(150, 200, 100, 120, 180, 80, 70, 90, 110, 95, 250, 300, 280, 320,
                   270, 40, 60, 50, 55, 45, 65, 78, 89, 34)
)

# Subset of sales data for Electronics category
electronics_sales <- sales_data[sales_data$product_category == "Electronics", ]

# Displaying the subset
print(electronics_sales)

Output:

  transaction_id product_category sales_amount
1              1      Electronics          150
2              2      Electronics          200
3              3      Electronics          100
4              4      Electronics          120
5              5      Electronics          180
6              6      Electronics           80
7              7      Electronics           70
8              8      Electronics           90
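
Printing the subset is only the first step. As a short sketch, the Electronics subset can be summarized with sum() and mean(), and the same statistic can be computed for every category at once with tapply(). Note that tapply() is a different technique from the bracket subsetting shown above, added here for comparison, and the names electronics_total, electronics_mean, and mean_by_category are assumptions for illustration.

```r
# Same sales data as in the example above
sales_data <- data.frame(
  transaction_id = 1:24,
  product_category = rep(c("Electronics", "Clothing", "Books"), each = 8),
  sales_amount = c(150, 200, 100, 120, 180, 80, 70, 90, 110, 95, 250, 300, 280, 320,
                   270, 40, 60, 50, 55, 45, 65, 78, 89, 34)
)

# Total and mean sales for the Electronics subset
electronics_sales <- sales_data[sales_data$product_category == "Electronics", ]
electronics_total <- sum(electronics_sales$sales_amount)   # 990
electronics_mean <- mean(electronics_sales$sales_amount)   # 123.75
print(c(total = electronics_total, mean = electronics_mean))

# Mean sales per category, computed without building each subset by hand
mean_by_category <- tapply(sales_data$sales_amount,
                           sales_data$product_category,
                           mean)
print(mean_by_category)
```

Building a subset by hand is convenient when one group needs further work; a grouped call like tapply() is usually shorter when the same statistic is needed for all groups.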

Conclusion

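In this article, we analyzed data in subsets using two approaches: the subset() function and logical indexing with square brackets. Both return a smaller data frame that can then be passed to functions such as summary() for further analysis.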
