R is a programming language widely used by data scientists and analysts. It provides many user-friendly packages and a rich set of built-in functions for data manipulation and transformation. In this article, we will look at three of these functions, sort(), order(), and rank(), and how they are used.
Difference between Sorting, Ordering, and Ranking Functions in R
Function | Description
sort() | Returns the values of a vector arranged in ascending (default) or descending order.
order() | Returns the permutation of indices that would put the data in sorted order, so it can be used to reorder vectors or the rows of a data frame.
rank() | Returns the rank of each element in its original position. Ties are resolved with ties.method, for example 'min' or 'average'.
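To make the distinction concrete, here is a minimal sketch that applies all three functions to the same vector. The vector x is a hypothetical example chosen for illustration; it is not part of the datasets used later.

```r
# Hypothetical vector used only for illustration
x <- c(30, 10, 20)

sorted_values <- sort(x)   # the values themselves, in ascending order
sort_indices  <- order(x)  # positions of the values in sorted order
value_ranks   <- rank(x)   # rank of each value, kept in its original position

print(sorted_values)  # 10 20 30
print(sort_indices)   # 2 3 1
print(value_ranks)    # 3 1 2
```

Reading the results together: the smallest value (10) sits at position 2, so order() starts with 2, while rank() reports that the first element (30) is the third smallest.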
Sorting
Sorting is a fundamental operation in data analysis: it arranges data in a meaningful order so that it is easier to handle. The sort() function arranges the elements of a vector, while the rows of a data frame are rearranged with order(), as the examples below show. Before sorting, it is useful to check whether the data is already sorted.
How to check if the data is sorted?
We can use the is.unsorted() function for this check. It returns TRUE when the data is not sorted in ascending order and FALSE when it is.
R
# Small example data frame
sample_data <- data.frame(
  ID = c(101, 102, 103, 104, 105),
  Value = c(15, 22, 30, 18, 25)
)
cat("Original Data Frame:\n")
print(sample_data)
# is.unsorted() returns TRUE when the values are NOT in ascending order
cat("\nIs 'Value' column unsorted?\n")
print(is.unsorted(sample_data$Value))
|
Output:
Original Data Frame:
ID Value
1 101 15
2 102 22
3 103 30
4 104 18
5 105 25
[1] TRUE
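The output is TRUE because the Value column is not sorted: 30 is followed by 18. is.unsorted() returns FALSE only when the values are already in ascending order.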
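Because is.unsorted() answers the opposite question from "is it sorted?", a short sketch may help. The vectors and the is_sorted() helper below are hypothetical names introduced only for this illustration.

```r
# Hypothetical vectors for illustration
ascending_values <- c(1, 2, 2, 5)
shuffled_values  <- c(15, 22, 30, 18, 25)

print(is.unsorted(ascending_values))  # FALSE: already in ascending order
print(is.unsorted(shuffled_values))   # TRUE: 30 is followed by 18

# Hypothetical helper that flips the result into an "is sorted" check
is_sorted <- function(v) !is.unsorted(v)
print(is_sorted(ascending_values))    # TRUE

# strictly = TRUE treats repeated values as unsorted
strict_check <- is.unsorted(ascending_values, strictly = TRUE)
print(strict_check)                   # TRUE because of the tie 2, 2
```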
Sorting Numeric Vector
Let’s assume we have a fictional dataset of student marks that we want to arrange in order, for example to find the class topper.
R
scores <- c(85, 92, 78, 95, 89)
# sort() returns the values in ascending order by default
sorted_scores <- sort(scores)
print(sorted_scores)
|
Output:
[1] 78 85 89 92 95
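Since the stated goal is to find the topper, sorting in descending order puts the highest mark first. This is a small sketch using the decreasing argument of sort() on the same scores.

```r
scores <- c(85, 92, 78, 95, 89)

# decreasing = TRUE reverses the default ascending order
scores_desc <- sort(scores, decreasing = TRUE)
print(scores_desc)     # 95 92 89 85 78

# The first element is now the highest mark
top_score <- scores_desc[1]
print(top_score)       # 95
```

If only the highest value is needed, max(scores) gives the same answer without sorting.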
Sorting Character Vector
sort() also arranges character vectors in alphabetical order.
R
cities <- c("New York", "London", "Paris", "Tokyo", "Sydney")
# Character vectors are sorted alphabetically
sorted_cities <- sort(cities)
print(sorted_cities)
|
Output:
[1] "London" "New York" "Paris" "Sydney" "Tokyo"
Sorting Data Frame
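sort() does not rearrange the rows of a data frame. Instead, we pass the column to order() and use the resulting indices to select the rows in sorted order.
R
employee_data <- data.frame(
  Name = c("Alice", "Bob", "Charlie", "David"),
  Salary = c(60000, 75000, 50000, 90000)
)
# order() gives the row indices in ascending order of Salary
sorted_employee_data <- employee_data[order(employee_data$Salary), ]
print(sorted_employee_data)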
|
Output:
Name Salary
3 Charlie 50000
1 Alice 60000
2 Bob 75000
4 David 90000
Sorting Laptop Dataset
We will use an external dataset based on Laptop Price.
Loading and understanding the dataset
We will first load the dataset using the read.csv() function; make sure you replace the path below with the actual path of your dataset. The head() function displays the first 6 rows of the dataset.
R
# Replace this path with the location of your copy of the dataset
data <- read.csv('C:\\Users\\GFG19565\\Downloads\\Laptop_price.csv')
head(data)
|
Output:
X Manufacturer Category Screen GPU OS CPU_core Screen_Size_cm CPU_frequency RAM_GB
1 0 Acer 4 IPS Panel 2 1 5 35.560 1.6 8
2 1 Dell 3 Full HD 1 1 3 39.624 2.0 4
3 2 Dell 3 Full HD 1 1 7 39.624 2.7 8
4 3 Dell 4 IPS Panel 2 1 5 33.782 1.6 8
5 4 HP 4 Full HD 2 1 7 39.624 1.8 8
6 5 Dell 3 Full HD 1 1 5 39.624 1.6 8
Storage_GB_SSD Weight_kg Price
1 256 1.60 978
2 256 2.20 634
3 256 2.20 946
4 128 1.22 1244
5 256 1.91 837
6 256 2.20 1016
Checking whether the dataset is sorted
R
cat("\nIs the data frame sorted by Price?\n")
# TRUE only if every price is >= the one before it
print(all(diff(data$Price) >= 0))
# Reorder the rows by ascending Price
sorted_data_price_asc <- data[order(data$Price), ]
cat("\nSorted Data (Ascending Order by Price):\n")
print(sorted_data_price_asc$Price)
|
Output:
Is the data frame sorted by Price?
[1] FALSE
Sorted Data (Ascending Order by Price):
[1] 527 558 616 634 634 685 697 710 723 727 733 735 761 761 761 786
[17] 786 800 808 812 837 837 860 866 876 883 888 888 888 888 892 896
[33] 913 913 922 925 934 935 939 939 939 946 951 951 975 977 978 989
[49] 1000 1002 1003 1010 1013 1016 1023 1053 1053 1054 1057 1066 1068 1075 1085 1089
[65] 1091 1091 1092 1105 1117 1117 1117 1117 1118 1119 1123 1129 1142 1142 1142 1142
[81] 1146 1157 1167 1167 1172 1179 1184 1188 1192 1195 1198 1200 1200 1206 1206 1206
[97] 1208 1213 1219 1219 1236 1241 1244 1244 1245 1251 1255 1256 1268 1269 1269 1283
[113] 1286 1294 1306 1310 1325 1327 1333 1333 1334 1371 1374 1383 1390 1392 1394 1396
[129] 1396 1396 1396 1404 1418 1419 1420 1421 1442 1452 1453 1460 1480 1498 1498 1499
[145] 1501 1507 1513 1515 1518 1523 1524 1531 1541 1544 1548 1561 1562 1598 1607 1611
[161] 1626 1632 1641 1648 1650 1656 1696 1702 1709 1714 1714 1714 1731 1739 1749 1763
[177] 1777 1777 1777 1813 1813 1815 1841 1842 1855 1861 1870 1872 1874 1880 1891 1904
[193] 1904 1904 1905 1950 1950 1953 1983 2006 2012 2031 2069 2082 2095 2095 2096 2096
[209] 2120 2124 2125 2147 2158 2208 2223 2236 2240 2255 2285 2312 2323 2340 2349 2361
[225] 2361 2414 2417 2509 2509 2623 2655 2712 3059 3073 3301 3665 3810 3810
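The check all(diff(x) >= 0) used above asks whether each value is at least as large as the previous one, which matches !is.unsorted(x) for a vector without missing values. The sketch below assumes a short hypothetical prices vector rather than the full dataset.

```r
# Hypothetical prices for illustration (no missing values)
prices <- c(978, 634, 946, 1244)

# diff() gives the change between neighbouring values
step_sizes <- diff(prices)
print(step_sizes)                      # -344 312 298

sorted_by_diff <- all(step_sizes >= 0)
sorted_by_base <- !is.unsorted(prices)
print(sorted_by_diff)                  # FALSE
print(sorted_by_base)                  # FALSE

# After sorting, both checks agree that the data is in order
prices_sorted <- sort(prices)
print(all(diff(prices_sorted) >= 0))   # TRUE
print(!is.unsorted(prices_sorted))     # TRUE
```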
Sorting in Descending Order
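We can also sort by price in descending order to find the most expensive laptops. Negating a numeric column inside order() reverses the sort direction.
R
# -data$Price makes the largest price sort first
sorted_data_price_desc <- data[order(-data$Price), ]
cat("\nSorted Data (Descending Order by Price):\n")
head(sorted_data_price_desc)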
|
Output:
X Manufacturer Category Screen GPU OS CPU_core Screen_Size_cm CPU_frequency RAM_GB
65 64 Asus 1 Full HD 3 1 7 43.942 2.9 16
145 144 Lenovo 3 IPS Panel 3 1 7 43.180 2.8 8
78 77 Dell 5 Full HD 3 1 7 43.942 2.9 16
160 159 Razer 1 Full HD 3 1 7 35.560 2.8 16
181 180 HP 5 Full HD 3 1 7 39.624 2.8 16
122 121 Dell 5 Full HD 3 1 7 39.624 2.8 8
Storage_GB_SSD Weight_kg Price
65 256 3.60 3810
145 256 3.40 3810
78 256 3.42 3665
160 256 1.95 3301
181 256 2.60 3073
122 256 1.78 3059
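Negation only works for numeric columns. As a sketch of the alternative, order() also accepts decreasing = TRUE, which works for character data as well. The price and maker vectors below are hypothetical values chosen for illustration.

```r
# Hypothetical values for illustration
price <- c(978, 634, 946, 1244)
maker <- c("Acer", "Dell", "HP", "Asus")

# Two ways to get descending order for a numeric vector
idx_negated    <- order(-price)
idx_decreasing <- order(price, decreasing = TRUE)
print(idx_negated)     # 4 1 3 2
print(idx_decreasing)  # 4 1 3 2

# -maker would raise an error, so character data needs decreasing = TRUE
idx_maker_desc <- order(maker, decreasing = TRUE)
print(maker[idx_maker_desc])  # "HP" "Dell" "Asus" "Acer"
```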
Sorting Screen Size
R
# Reorder the rows by ascending screen size
sorted_data_screen_size_asc <- data[order(data$Screen_Size_cm), ]
cat("\nSorted Data (Ascending Order by Screen Size):\n")
head(sorted_data_screen_size_asc)
|
Output:
X Manufacturer Category Screen GPU OS CPU_core Screen_Size_cm CPU_frequency RAM_GB
115 114 Lenovo 4 IPS Panel 2 1 5 30.48 2.5 8
162 161 Lenovo 4 IPS Panel 2 1 7 30.48 2.7 8
226 225 Lenovo 4 IPS Panel 2 1 7 30.48 2.7 8
236 235 Lenovo 4 IPS Panel 2 1 5 30.48 2.6 8
85 84 HP 4 Full HD 2 1 7 31.75 2.7 8
187 186 Dell 4 Full HD 2 1 7 31.75 2.8 16
Storage_GB_SSD Weight_kg Price
115 256 1.36 1815
162 256 1.36 2012
226 256 1.36 2096
236 256 1.36 2236
85 256 1.26 1696
187 256 1.18 2361
Ordering
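In R, the order() function returns the indices that would arrange the elements of a vector in sorted order. For a data frame, these indices are used to reorder the rows.
R
x <- c(5, 2, 8, 1, 3)
# Indices of x from the smallest value to the largest
order_result <- order(x)
print(order_result)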
|
Output:
[1] 4 2 5 1 3
The first value, 4, means that the smallest element (1) is at position 4 of x; the last value, 3, means that the largest element (8) is at position 3. order() therefore returns positions, not the sorted values themselves.
Sorting a Data Frame Using Order
R
df <- data.frame(
  ID = c(101, 102, 103, 104),
  Value = c(25, 18, 32, 12)
)
# Use the indices from order() to rearrange the rows by Value
sorted_df <- df[order(df$Value), ]
print(sorted_df)
|
Output:
ID Value
4 104 12
2 102 18
1 101 25
3 103 32
Using Order for Descending Order
order() also accepts the argument decreasing = TRUE, which returns the indices for descending order.
R
x <- c(5, 2, 8, 1, 3)
# Indices of x from the largest value to the smallest
order_desc <- order(x, decreasing = TRUE)
print(order_desc)
|
Output:
[1] 3 1 5 2 4
Difference between Order and Sort
The following example shows the difference between the two functions.
R
x <- c(5, 2, 8, 1, 3)
order_result <- order(x)  # indices of the sorted values
sort_result <- sort(x)    # the sorted values themselves
print("Order Result:")
print(x[order_result])    # index x with the order to get the sorted values
print("Sort Result:")
print(sort_result)
|
Output:
[1] "Order Result:"
[1] 1 2 3 5 8
[1] "Sort Result:"
[1] 1 2 3 5 8
sort() returns the sorted values directly, whereas order() returns the indices of the sorted values. Indexing the original vector with those indices, x[order(x)], gives the same result as sort(x).
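The practical advantage of order() is that its indices can reorder a second, related vector, which sort() cannot do. The following sketch assumes a hypothetical student vector that runs parallel to the scores.

```r
# Hypothetical parallel vectors: student[i] scored scores[i]
student <- c("Asha", "Ben", "Chen", "Dia", "Eli")
scores  <- c(85, 92, 78, 95, 89)

# Indices from the highest score to the lowest
idx <- order(scores, decreasing = TRUE)
print(idx)            # 4 2 5 1 3

# The same indices reorder both vectors consistently
ranked_students <- student[idx]
ranked_scores   <- scores[idx]
print(ranked_students)  # "Dia" "Ben" "Eli" "Asha" "Chen"
print(ranked_scores)    # 95 92 89 85 78
```

This is the same mechanism that df[order(df$Value), ] relies on: the indices select whole rows, so every column stays aligned.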
Ordering Amazon Dataset
In this example, we will use an external dataset, the Amazon Seller - Order Status dataset. You can download it from the Kaggle website: https://www.kaggle.com/datasets/pranalibose/amazon-seller-order-status-prediction
Loading and Exploring Dataset
R
library(readxl)
# Replace this path with the location of your copy of the dataset
data <- read_xlsx("C:\\Users\\GFG19565\\Downloads\\orders_data.xlsx")
head(data)
|
Output:
A tibble: 6 × 12
order_no order_date buyer ship_city ship_state sku description quantity item_total
<chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
1 405-9763961-5211537 Sun, 18 J… Mr. CHANDIGA… CHANDIGARH SKU:… 100% Leath… 1 ₹449.00
2 404-3964908-7850720 Tue, 19 O… Minam PASIGHAT, ARUNACHAL… SKU:… Women's Se… 1 ₹449.00
3 171-8103182-4289117 Sun, 28 N… yati… PASIGHAT, ARUNACHAL… SKU:… Women's Se… 1 ₹449.00
4 405-3171677-9557154 Wed, 28 J… aciya DEVARAKO… TELANGANA SKU:… Pure 100% … 1 NA
5 402-8910771-1215552 Tue, 28 S… Susm… MUMBAI, MAHARASHT… SKU:… Pure Leath… 1 ₹1,099.00
6 406-9292208-6725123 Thu, 17 J… Subi… HOWRAH, WEST BENG… SKU:… Women's Tr… 1 ₹200.00
Order by a Specific Column in Ascending Order
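We can order the rows by the order_date column. Because order_date is stored as character (<chr>), the rows are sorted alphabetically by the text of the date, so entries starting with "Fri" come first, not chronologically.
R
# order_date is character, so this is an alphabetical ordering
data_ordered_date <- data[order(data$order_date), ]
head(data_ordered_date)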
|
Output:
A tibble: 6 × 12
order_no order_date buyer ship_city ship_state sku description quantity item_total
<chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
1 402-8678022-3083562 Fri, 1 Oc… Heena MUMBAI, MAHARASHT… SKU:… 100% Pure … 1 ₹399.00
2 402-6701060-6592325 Fri, 1 Oc… Heena MUMBAI, MAHARASHT… SKU:… Women's Pu… 1 ₹399.00
3 405-4776641-5401922 Fri, 1 Oc… Rath… AHMEDABA… GUJARAT SKU:… Pure 100% … 1 ₹250.00
4 171-7361479-0297146 Fri, 10 D… Amol PUNE, MAHARASHT… SKU:… Women's Se… 4 ₹1,796.00
5 402-2278272-1998728 Fri, 10 D… Dalr… BENGALUR… KARNATAKA SKU:… Women's Se… 1 ₹449.00
6 171-3733329-6916359 Fri, 10 D… Shah… MUMBAI, MAHARASHT… SKU:… Women's Se… 1 ₹449.00
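To order by actual date rather than by text, the strings need to be converted with as.Date() first. The sketch below uses hypothetical date strings in day/month/year form; the full format of the order_date column is not visible in the truncated output above, so the format string here is an assumption and would need to be adapted to the real data.

```r
# Hypothetical date strings (assumed format: day/month/year)
dates_chr <- c("10/12/2021", "01/10/2021", "28/09/2021")

# Ordering the raw text compares characters from left to right
idx_text <- order(dates_chr)
print(dates_chr[idx_text])   # "01/10/2021" "10/12/2021" "28/09/2021"

# Converting to Date first gives a chronological ordering
dates_parsed <- as.Date(dates_chr, format = "%d/%m/%Y")
idx_date <- order(dates_parsed)
print(dates_chr[idx_date])   # "28/09/2021" "01/10/2021" "10/12/2021"
```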
Order by a Specific Column in Descending Order
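We can also order the item_total column in descending order using the order() function. This column is also stored as character, so the values are compared as text: ₹899.00 is placed above larger amounts such as ₹1,099.00 because the character 8 sorts after 1.
R
# item_total is character, so this is a descending text ordering
data_ordered_item_total_desc <- data[order(data$item_total, decreasing = TRUE), ]
head(data_ordered_item_total_desc)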
|
Output:
A tibble: 6 × 12
order_no order_date buyer ship_city ship_state sku description quantity item_total
<chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
1 403-9089686-7304307 Mon, 6 De… J BENGALUR… KARNATAKA SKU:… Stunning W… 1 ₹899.00
2 408-6770537-3774707 Sun, 17 O… Paro… Mumbai, MAHARASHT… SKU:… Women's Se… 2 ₹898.00
3 405-6918787-5602743 Wed, 25 A… Mosin MAHALING… KARNATAKA SKU:… Ultra Slim… 1 ₹649.00
4 402-2054361-4513137 Mon, 6 Se… Rame… JALESWAR, ODISHA SKU:… Ultra Slim… 1 ₹649.00
5 405-1111150-1834754 Sun, 5 Se… Jai HYDERABA… TELANGANA SKU:… Ultra Slim… 1 ₹649.00
6 171-5917046-2682765 Thu, 7 Oc… Anku GUWAHATI, ASSAM SKU:… Ultra Slim… 1 ₹649.00
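To order by amount rather than by text, one option is to strip the currency symbol and thousands separators and convert the result to numeric. This is a minimal sketch on a few hypothetical values in the same style as item_total; the regular expression assumes that only digits and the decimal point carry meaning.

```r
# Hypothetical values in the style of item_total ("\u20b9" is the rupee sign)
item_total <- c("\u20b9449.00", "\u20b91,099.00", "\u20b9899.00")

# Keep only digits and the decimal point, then convert to numeric
amount <- as.numeric(gsub("[^0-9.]", "", item_total))
print(amount)        # 449 1099 899

# Descending order by the numeric amount
idx_amount_desc <- order(amount, decreasing = TRUE)
print(amount[idx_amount_desc])  # 1099 899 449
```

Missing values stay NA after the conversion, and order() places them last by default.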
Order by Multiple Columns
We can also order by multiple columns at once by passing them all to a single order() call: rows are sorted by the first column, and ties are broken by the next one. Here we order ship_state in ascending order and order_date in descending order. Both columns are character, so we pass a vector to decreasing, which requires method = "radix".
R
# Sort by ship_state (ascending), then by order_date (descending) within each state.
# A vector-valued `decreasing` is only supported by method = "radix".
state_date_order <- order(
  data$ship_state, data$order_date,
  decreasing = c(FALSE, TRUE),
  method = "radix"
)
data_ordered_state_date_desc <- data[state_date_order, ]
head(data_ordered_state_date_desc)
|
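Ordering by two keys is easiest to verify on a tiny example. The sketch below uses a hypothetical data frame (orders_demo, with made-up state and day columns) and shows two equivalent ways to sort one key ascending and the other descending in a single order() call.

```r
# Hypothetical data frame for illustration
orders_demo <- data.frame(
  state = c("GOA", "ASSAM", "GOA", "ASSAM"),
  day   = c(3, 1, 7, 5)
)

# Option 1: negate the numeric key that should be descending
idx_negated <- order(orders_demo$state, -orders_demo$day)

# Option 2: per-key directions, which requires method = "radix"
idx_radix <- order(
  orders_demo$state, orders_demo$day,
  decreasing = c(FALSE, TRUE),
  method = "radix"
)

print(idx_negated)  # 4 2 3 1
print(orders_demo[idx_negated, ])
```

Rows are grouped by state first (ASSAM before GOA), and within each state the larger day comes first. Option 2 also works when the descending key is a character column, where negation is not possible.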
Ranking
The rank() function in R computes the rank of each element of a vector, that is, the position the element would occupy if the vector were sorted in ascending order. A basic example:
R
scores <- c(80, 95, 80, 72, 90)
# Default ties.method is "average"
ranked_scores <- rank(scores)
cat("Original Scores:", scores, "\n")
cat("Ranked Scores:", ranked_scores, "\n")
|
Output:
Original Scores: 80 95 80 72 90
Ranked Scores: 2.5 5 2.5 1 4
The tied values (80 in this case) receive the average of the ranks they span (2 and 3), which is 2.5. This is the default behaviour, ties.method = "average".
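rank() assigns rank 1 to the smallest value. For scores, it is often more natural for the highest score to be rank 1; a common way to get this is to rank the negated values, as in this sketch.

```r
scores <- c(80, 95, 80, 72, 90)

# Negating the scores makes the highest score rank first
rank_high_first <- rank(-scores)
print(rank_high_first)      # 3.5 1 3.5 5 2

# ties.method = "min" gives tied scores the same whole-number rank
rank_high_first_min <- rank(-scores, ties.method = "min")
print(rank_high_first_min)  # 3 1 3 5 2
```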
Additional Parameters
The na.last parameter controls how NA values are ranked. With na.last = TRUE, missing values are given the highest ranks, so they are placed last. This helps when the vector contains missing values.
R
scores_with_na <- c(80, 95, NA, 72, 90)
# na.last = TRUE gives the NA the highest rank
ranked_scores_with_na <- rank(scores_with_na, na.last = TRUE)
scores_with_na
ranked_scores_with_na
|
Output:
[1] 80 95 NA 72 90
[1] 2 4 5 1 3
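na.last accepts other values besides TRUE. The sketch below compares them on the same vector: "keep" (the default for rank()) leaves NA in place, FALSE ranks missing values first, and NA drops them from the result.

```r
scores_with_na <- c(80, 95, NA, 72, 90)

rank_keep  <- rank(scores_with_na, na.last = "keep")  # NA stays NA
rank_first <- rank(scores_with_na, na.last = FALSE)   # NA gets rank 1
rank_drop  <- rank(scores_with_na, na.last = NA)      # NA is removed

print(rank_keep)   # 2 4 NA 1 3
print(rank_first)  # 3 5 1 2 4
print(rank_drop)   # 2 4 1 3
```

Note that with na.last = NA the result is shorter than the input, so it no longer lines up position by position with the original vector.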
Ties Handling
We can use the ties.method argument to specify how tied values are ranked.
R
scores <- c(80, 95, 80, 72, 90)
# Each method breaks the tie between the two 80s differently
ranked_scores_first <- rank(scores, ties.method = "first")
ranked_scores_last <- rank(scores, ties.method = "last")
ranked_scores_random <- rank(scores, ties.method = "random")
cat("Ranked Scores (First):", ranked_scores_first, "\n")
cat("Ranked Scores (Last):", ranked_scores_last, "\n")
cat("Ranked Scores (Random):", ranked_scores_random, "\n")
|
Output:
Ranked Scores (First): 2 5 3 1 4
Ranked Scores (Last): 3 5 2 1 4
Ranked Scores (Random): 3 5 2 1 4
ranked_scores_first <- rank(scores, ties.method = "first"): This line ranks the scores using the “first” method. Tied values receive distinct, increasing ranks in their order of appearance, so the first 80 gets rank 2 and the second 80 gets rank 3.
ranked_scores_last <- rank(scores, ties.method = "last"): This line ranks the scores using the “last” method. Tied values receive distinct ranks in reverse order of appearance, so the first 80 gets rank 3 and the second 80 gets rank 2.
ranked_scores_random <- rank(scores, ties.method = "random"): This line ranks the scores using the “random” method. Tied values receive distinct ranks in a random order, so the result can differ between runs.
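The remaining tie-breaking methods give tied values the same rank instead of distinct ones. As a short sketch on the same scores, "min" uses the lowest rank in the tied group, "max" the highest, and "average" the mean of the two.

```r
scores <- c(80, 95, 80, 72, 90)

# The two 80s occupy ranks 2 and 3 in sorted order
rank_min <- rank(scores, ties.method = "min")      # both 80s get 2
rank_max <- rank(scores, ties.method = "max")      # both 80s get 3
rank_avg <- rank(scores, ties.method = "average")  # both 80s get 2.5

cat("Ranked Scores (Min):", rank_min, "\n")      # 2 5 2 1 4
cat("Ranked Scores (Max):", rank_max, "\n")      # 3 5 3 1 4
cat("Ranked Scores (Average):", rank_avg, "\n")  # 2.5 5 2.5 1 4
```

"min" corresponds to the ranking style often used in competitions, where tied competitors share the better position.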
Conclusion
In this article, we understood how to use rank, sort, and order functions in R with the help of different examples. We explored how these functions make data handling and manipulation easier.