
Extract unique rows from a matrix using R

Last Updated : 15 Mar, 2024

A matrix is a rectangular arrangement of elements in rows and columns. In the R programming language, the rows hold the horizontal data and the columns hold the vertical data.

Matrix in R

In R, a matrix is created with the matrix() function. Its arguments supply the vector of elements, the number of rows and columns, and whether the elements are filled in row-wise or column-wise.

The basic syntax of matrix() is:

matrix(data, nrow, ncol, byrow = FALSE, dimnames = NULL)

  • data: the input vector of elements that will fill the matrix.
  • nrow: the number of rows in the matrix.
  • ncol: the number of columns in the matrix.
  • byrow: a logical value indicating whether the matrix is filled row by row. The default, FALSE, fills it column by column.
  • dimnames: a list that provides names for the rows and columns.
R
# Create a 6x3 matrix with row and column names
mat <- matrix(c(1, 2, 3, 4, 5, 6, 7, 8, 9, 1, 2, 3, 4, 5, 6, 7, 8, 9), 
              nrow = 6, ncol = 3, 
              dimnames = list(c("Row1", "Row2", "Row3", "Row4", "Row5", "Row6"), 
                              c("Col1", "Col2", "Col3")))
print(mat)

Output:

     Col1 Col2 Col3
Row1    1    7    4
Row2    2    8    5
Row3    3    9    6
Row4    4    1    7
Row5    5    2    8
Row6    6    3    9
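
The example above relies on the default column-wise fill. As a minimal sketch of what the byrow argument changes, the snippet below fills the same six values both ways; the names vals, by_col, and by_row are illustrative only and do not appear elsewhere in this article.

```r
# Same six values, filled two different ways
vals <- 1:6

# Default (byrow = FALSE): values run down each column in turn
by_col <- matrix(vals, nrow = 2, ncol = 3)

# byrow = TRUE: values run across each row in turn
by_row <- matrix(vals, nrow = 2, ncol = 3, byrow = TRUE)

print(by_col)   # first row is 1 3 5
print(by_row)   # first row is 1 2 3
```

Because the fill order decides which values end up in the same row, it also decides which rows can turn out to be duplicates of each other.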

Extracting unique rows from a matrix using R

Method 1: Using unique() function

Now let's change the matrix so that rows 1 and 4 are identical and rows 3 and 6 are identical. We can then use the unique() function to remove the duplicate rows. The generalized syntax of unique() is given below.

unique(x, incomparables, fromLast, nmax, ..., MARGIN)

  • x: a vector, data frame, array, matrix, or NULL.
  • incomparables: a vector of values that cannot be compared. The default, FALSE, means that all values can be compared, and it may be the only value accepted by methods other than the default. It is coerced internally to the same type as x.
  • fromLast: a logical value (TRUE or FALSE) indicating whether duplication is checked starting from the last element. If TRUE, the last (rightmost) of several identical elements is the one that is kept. The default is FALSE.
  • nmax: the maximum number of unique items expected.
  • MARGIN: for arrays and matrices, the margin to hold fixed while comparing. The default, MARGIN = 1, compares rows; MARGIN = 2 compares columns.

The unique() function returns the unique elements of a vector, data frame, or matrix. For a matrix, it treats each row as a single entity and returns the matrix with duplicate rows removed.

R
# Create a 6x3 matrix with row and column names
mat <- matrix(c(1, 2, 3, 1, 5, 3, 7, 8, 9, 7, 2, 9, 4, 5, 6, 4, 8, 6), 
              nrow = 6, ncol = 3, 
              dimnames = list(c("Row1", "Row2", "Row3", "Row4", "Row5", "Row6"), 
                              c("Col1", "Col2", "Col3")))

print(mat)
unique(mat)

Output:

     Col1 Col2 Col3
Row1    1    7    4
Row2    2    8    5
Row3    3    9    6
Row4    1    7    4
Row5    5    2    8
Row6    3    9    6
     Col1 Col2 Col3
Row1    1    7    4
Row2    2    8    5
Row3    3    9    6
Row5    5    2    8
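
The fromLast and MARGIN arguments described earlier are not exercised by the example above. The following is a small sketch, reusing the same matrix, of how they could be tried out; the variable name last_kept is an illustrative choice.

```r
# Same 6x3 matrix as above: rows 1 and 4 match, rows 3 and 6 match
mat <- matrix(c(1, 2, 3, 1, 5, 3, 7, 8, 9, 7, 2, 9, 4, 5, 6, 4, 8, 6),
              nrow = 6, ncol = 3,
              dimnames = list(paste0("Row", 1:6), paste0("Col", 1:3)))

# fromLast = TRUE keeps the last occurrence of each duplicated row,
# so Row4 and Row6 survive instead of Row1 and Row3
last_kept <- unique(mat, fromLast = TRUE)
print(rownames(last_kept))

# MARGIN = 2 compares columns instead of rows; all three columns differ here
print(ncol(unique(mat, MARGIN = 2)))
```

With fromLast = TRUE the surviving rows still appear in their original order; only the choice of which copy to keep changes.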

Method 2: Using duplicated() function

The duplicated() function identifies duplicate elements in a vector, matrix, or data frame. Its basic syntax is as follows.

duplicated(x, incomparables = FALSE, fromLast = FALSE)

  • x: the vector, matrix, or data frame in which duplicates are to be identified.
  • incomparables: an optional vector of values that should be treated as incomparable, meaning they are never reported as duplicates. This argument is rarely used; its default value, FALSE, means all values can be compared.
  • fromLast: a logical value indicating whether duplicates are searched for from the last occurrence. If TRUE, the last occurrence of each repeated element is marked FALSE (not a duplicate) and the earlier ones are marked TRUE; if FALSE (the default), the first occurrence is marked FALSE and the later ones are marked TRUE.
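
To make the effect of fromLast concrete, here is a brief sketch on a small unnamed matrix. Combining both directions with | to flag every row that has a copy anywhere is a common idiom rather than something this article prescribes, and the variable names are illustrative.

```r
# 3x2 matrix whose rows 1 and 3 are identical
mat <- matrix(c(1, 2, 1,
                1, 3, 1), ncol = 2)

first_kept <- duplicated(mat)                   # flags the later copy (row 3)
last_kept  <- duplicated(mat, fromLast = TRUE)  # flags the earlier copy (row 1)

# A row is flagged here if it has an identical row anywhere in the matrix
any_copy <- first_kept | last_kept

print(first_kept)
print(last_kept)
print(any_copy)
```

Subsetting with mat[!any_copy, , drop = FALSE] would then keep only the rows that occur exactly once.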

Determining and removing duplicated rows of a matrix

R
# Create a 3x4 matrix with row and column names
mat <- matrix(c(1, 2, 1, 1, 3, 1, 1, 4, 1, 1, 5, 1), ncol = 4,
              dimnames = list(c("Row1", "Row2", "Row3"), 
                              c("Col1", "Col2", "Col3", "Col4")))
# Printing the matrix
mat

# Now let us create a logical vector which will store boolean quantities
# and each value will correspond to a row in the matrix
duplicated_rows <- duplicated(mat)

# Printing the logical vector
duplicated_rows

# Extracting unique rows from the matrix 
unique_mat <- mat[!duplicated_rows, ]
# Printing the unique matrix
unique_mat

Output:

     Col1 Col2 Col3 Col4
Row1    1    1    1    1
Row2    2    3    4    5
Row3    1    1    1    1
 Row1  Row2  Row3
FALSE FALSE  TRUE
     Col1 Col2 Col3 Col4
Row1    1    1    1    1
Row2    2    3    4    5

  • duplicated(mat): returns a logical vector with one value per row of the matrix. TRUE means the row is identical to a row that occurred earlier; FALSE means the row has not occurred before that index.
  • !duplicated(mat): the negation operator "!" flips TRUE to FALSE and vice versa, so in the resulting vector TRUE marks the rows to keep (first occurrences) and FALSE marks the duplicate rows.
  • mat[!duplicated(mat), ]: uses the logical vector to index the rows of the matrix mat. It selects only the rows whose corresponding value is TRUE, which excludes the duplicate rows.
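
One caveat with the indexing approach: when only a single row survives, R's default subsetting drops the result to a plain vector. The sketch below, using a hypothetical two-row matrix, shows the standard drop = FALSE argument of the [ operator as one way to keep the matrix shape.

```r
# Two identical rows, so only one row survives de-duplication
mat <- matrix(c(1, 1, 2, 2, 3, 3), nrow = 2,
              dimnames = list(c("Row1", "Row2"), c("Col1", "Col2", "Col3")))

# Default subsetting collapses the single remaining row to a named vector
as_vector <- mat[!duplicated(mat), ]

# drop = FALSE keeps the result as a 1 x 3 matrix
as_matrix <- mat[!duplicated(mat), , drop = FALSE]

print(is.matrix(as_vector))
print(dim(as_matrix))
```

unique(mat) does not have this problem, since it returns a matrix even when a single row remains.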

