In this article, we will see how to sort the rows of a data frame so that they follow the order of the values in a given vector. Two functions can do this:
- match() function (base R)
- left_join() function (dplyr package)
We start with the following sample data frame:

```r
data <- data.frame(x1 = 1:5, x2 = letters[1:5], x3 = 6:10)
data
```

Output:

```
  x1 x2 x3
1  1  a  6
2  2  b  7
3  3  c  8
4  4  d  9
5  5  e 10
```
Vector with the desired ordering:

```r
vec <- c("b", "e", "a", "c", "d")
vec
```

Output:

```
[1] "b" "e" "a" "c" "d"
```
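Method 1: Using the match() Function to Sort a Data Frame According to a Vector:
match() returns a vector of the positions of the (first) matches of its first argument in its second. So match(vec, data$x2) gives, for each element of vec, the row of data whose x2 holds that value, and indexing the rows of data with these positions reorders the data frame.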
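A minimal sketch of what match() returns for the sample data, assuming the data and vec objects defined above (they are recreated here so the snippet runs on its own). The second call illustrates that only the first match is reported when the table contains duplicates.

```r
# Sample data from this article
data <- data.frame(x1 = 1:5, x2 = letters[1:5], x3 = 6:10)
vec <- c("b", "e", "a", "c", "d")

# For each element of vec, the position of its first match in data$x2
pos <- match(vec, data$x2)
print(pos)
# [1] 2 5 1 3 4

# Only the first match is reported when the table contains duplicates
print(match("a", c("b", "a", "a")))
# [1] 2
```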
Syntax: match(x, table, nomatch = NA_integer_, incomparables = NULL)
- x: vector or NULL: the values to be matched. Long vectors are supported.
- table: vector or NULL: the values to be matched against. Long vectors are not supported.
- nomatch: the value to be returned in the case when no match is found. Note that it is coerced to integer.
- incomparables: A vector of values that cannot be matched. Any value in x matching a value in this vector is assigned the nomatch value. For historical reasons, FALSE is equivalent to NULL.
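A call consistent with the output below is sketched here: the positions from match() are used as the row index of data. The extra match() calls are illustrative only and show how the nomatch and incomparables arguments described above behave; the value "z" is a hypothetical entry that does not occur in data$x2.

```r
data <- data.frame(x1 = 1:5, x2 = letters[1:5], x3 = 6:10)
vec <- c("b", "e", "a", "c", "d")

# Reorder rows: row i of the result is the row of data whose x2 equals vec[i]
sorted <- data[match(vec, data$x2), ]
print(sorted)

# nomatch: value returned for elements of x that are not found in table
print(match(c("b", "z"), data$x2))                # [1]  2 NA
print(match(c("b", "z"), data$x2, nomatch = 0L))  # [1] 2 0

# incomparables: values listed here are never matched (they get nomatch)
print(match(c("b", "e"), data$x2, incomparables = "e"))  # [1]  2 NA
```

If vec contained a value that does not appear in data$x2, match() would return NA for it and the indexing step would produce a row filled with NA; dropping those positions first (for example, pos[!is.na(pos)]) is one way to avoid that.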
Output:

```
  x1 x2 x3
2  2  b  7
5  5  e 10
1  1  a  6
3  3  c  8
4  4  d  9
```
As we can see from the above output, the rows now follow the order of the vector (b, e, a, c, d), and each row keeps its original row name.
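Method 2: Using the left_join() Function of the dplyr Package:
First, install the dplyr package (once, with install.packages("dplyr")) and load it with library(dplyr). We can then use left_join() to sort the data frame: we put the vector into a one-column data frame whose column name matches the key column of data, and join data onto it. Because left_join() keeps all rows of its first argument in their original order, the result follows the order of the vector.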
Syntax: left_join(x, y, by = NULL, copy = FALSE, suffix = c(".x", ".y"), ...)
- x, y: the data frames (or tibbles) to join
- by: a character vector of variables to join by. If NULL, the default, *_join() will do a natural join, using all variables with common names across the two tables. A message lists the variables so that you can check they’re right (to suppress the message, simply explicitly list the variables that you want to join).
- copy: If x and y are not from the same data source, and copy is TRUE, then y will be copied into the same src as x. This allows you to join tables across srcs, but it is a potentially expensive operation so you must opt into it.
- suffix: If there are non-joined duplicate variables in x and y, these suffixes will be added to the output to disambiguate them. Should be a character vector of length 2.
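A sketch of the left_join() call that produces the output below, assuming dplyr is installed. The last step is an optional addition, not part of the original method: the join places the key column x2 first, so base R column indexing is used to restore the original column order.

```r
library(dplyr)

data <- data.frame(x1 = 1:5, x2 = letters[1:5], x3 = 6:10)
vec <- c("b", "e", "a", "c", "d")

# One-column data frame holding the desired order, keyed on x2;
# left_join() keeps its rows in order and attaches x1 and x3 from data
sorted <- left_join(data.frame(x2 = vec), data, by = "x2")
print(sorted)

# Optional: put the columns back in the order they have in data
sorted_cols <- sorted[, names(data)]
print(sorted_cols)
```

Unlike the match() approach, the joined result gets fresh row names (1 to 5) instead of keeping the original ones.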
Output:

```
  x2 x1 x3
1  b  2  7
2  e  5 10
3  a  1  6
4  c  3  8
5  d  4  9
```