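In this article, we select variables (columns) of a data frame in the R programming language using the dplyr package.
Dataset in use: a data frame named data1 with three columns (id, name and address) and ten rows, created at the top of each example below.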
Select column with column name
Here we use the select() function to select a column by its name.
Syntax:
select(dataframe, column1, column2, ..., column_n)
Here, dataframe is the input dataframe and column1 to column_n are the names of the columns to be displayed
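As a small sketch of the syntax above, the snippet below uses a shortened three-row version of the article's data frame (an assumption made for brevity) to show two properties of select(): the columns come back in the order they are listed, and the result is a data frame even when only one column is selected. The variable names reordered and single are illustrative only.

```r
library(dplyr)

# Shortened three-row version of the article's data frame (assumed for brevity)
data1 <- data.frame(
  id      = c(1, 2, 3),
  name    = c('sravan', 'ojaswi', 'bobby'),
  address = c('hyd', 'hyd', 'ponnur')
)

# Columns are returned in the order they are listed in the call
reordered <- select(data1, address, id)
print(names(reordered))

# Selecting a single column still gives a data frame, not a plain vector
single <- select(data1, name)
print(class(single))
```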
Example 1: R program to select columns
# load the library
library(dplyr)

# create dataframe with 3 columns: id, name and address
data1 <- data.frame(id = c(1, 2, 3, 4, 5, 6, 7, 1, 4, 2),
                    name = c('sravan', 'ojaswi', 'bobby', 'gnanesh',
                             'rohith', 'pinkey', 'dhanush',
                             'sravan', 'gnanesh', 'ojaswi'),
                    address = c('hyd', 'hyd', 'ponnur', 'tenali',
                                'vijayawada', 'vijayawada',
                                'guntur', 'hyd', 'tenali', 'hyd'))

# select id column from the dataframe by column name
print(select(data1, id))

# select name column from the dataframe by column name
print(select(data1, name))
Output:
Example 2: R program to select multiple columns
# load the library
library(dplyr)

# create dataframe with 3 columns: id, name and address
data1 <- data.frame(id = c(1, 2, 3, 4, 5, 6, 7, 1, 4, 2),
                    name = c('sravan', 'ojaswi', 'bobby', 'gnanesh',
                             'rohith', 'pinkey', 'dhanush',
                             'sravan', 'gnanesh', 'ojaswi'),
                    address = c('hyd', 'hyd', 'ponnur', 'tenali',
                                'vijayawada', 'vijayawada',
                                'guntur', 'hyd', 'tenali', 'hyd'))

# select multiple columns from the dataframe by column name
print(select(data1, id, name, address))
Output:
Select column(s) by position
We can also pass a column position to select() to get that column. Positions start at 1.
Syntax:
select(dataframe, column1_position, column2_position, ..., column_n_position)
where dataframe is the input dataframe and each column position is a column number
To select several adjacent columns, we can use the range operator ":" on their positions
Syntax:
select(dataframe,start_position:end_position)
where dataframe is the input dataframe, start_position is the number of the first column in the range and end_position is the number of the last column in the range (both are included)
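The following sketch, again on a shortened three-row version of the data frame (an assumption for brevity), checks that selecting by a position range gives the same result as naming the corresponding columns. The variable names by_position, by_name and last_two are illustrative only.

```r
library(dplyr)

# Shortened three-row version of the article's data frame (assumed for brevity)
data1 <- data.frame(
  id      = c(1, 2, 3),
  name    = c('sravan', 'ojaswi', 'bobby'),
  address = c('hyd', 'hyd', 'ponnur')
)

# Positions 1 to 2 correspond to the id and name columns
by_position <- select(data1, 1:2)
by_name     <- select(data1, id, name)

# Compare the two results
print(identical(by_position, by_name))

# A range does not have to start at the first column
last_two <- select(data1, 2:3)
print(names(last_two))
```

Selecting by position is compact, but it silently picks different columns if the column order of the data frame changes, so names are usually the safer choice in longer scripts.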
Example 1: R program to select a particular column by column position
# load the library
library(dplyr)

# create dataframe with 3 columns: id, name and address
data1 <- data.frame(id = c(1, 2, 3, 4, 5, 6, 7, 1, 4, 2),
                    name = c('sravan', 'ojaswi', 'bobby', 'gnanesh',
                             'rohith', 'pinkey', 'dhanush',
                             'sravan', 'gnanesh', 'ojaswi'),
                    address = c('hyd', 'hyd', 'ponnur', 'tenali',
                                'vijayawada', 'vijayawada',
                                'guntur', 'hyd', 'tenali', 'hyd'))

# select first column by column position
print(select(data1, 1))

# select third column by column position
print(select(data1, 3))
Output:
Example 2: R program to select multiple columns by positions
# load the library
library(dplyr)

# create dataframe with 3 columns: id, name and address
data1 <- data.frame(id = c(1, 2, 3, 4, 5, 6, 7, 1, 4, 2),
                    name = c('sravan', 'ojaswi', 'bobby', 'gnanesh',
                             'rohith', 'pinkey', 'dhanush', 'sravan',
                             'gnanesh', 'ojaswi'),
                    address = c('hyd', 'hyd', 'ponnur', 'tenali',
                                'vijayawada', 'vijayawada', 'guntur',
                                'hyd', 'tenali', 'hyd'))

# select multiple columns by column position
print(select(data1, 1, 2))
Output:
Example 3: R program to select multiple columns by position with range operator
# load the library
library(dplyr)

# create dataframe with 3 columns: id, name and address
data1 <- data.frame(id = c(1, 2, 3, 4, 5, 6, 7, 1, 4, 2),
                    name = c('sravan', 'ojaswi', 'bobby', 'gnanesh',
                             'rohith', 'pinkey', 'dhanush', 'sravan',
                             'gnanesh', 'ojaswi'),
                    address = c('hyd', 'hyd', 'ponnur', 'tenali',
                                'vijayawada', 'vijayawada', 'guntur',
                                'hyd', 'tenali', 'hyd'))

# select multiple columns by column position with the : operator
print(select(data1, 1:3))
Output:
Select columns whose names contain a substring or match a pattern
Here, we select columns based on a substring or pattern present in the column name, not on the values stored in the column
Method 1: Using contains()
contains() selects the columns whose names contain the given substring
Syntax:
select(dataframe, contains('sub_string'))
Here, dataframe is the input dataframe and sub_string is the literal string to look for in the column names
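One detail worth knowing is that contains() ignores case by default through its ignore.case argument. The sketch below, on a shortened three-row version of the data frame (an assumption for brevity), contrasts the default behaviour with a case-sensitive match. The variable names default_match and case_sensitive are illustrative only.

```r
library(dplyr)

# Shortened three-row version of the article's data frame (assumed for brevity)
data1 <- data.frame(
  id      = c(1, 2, 3),
  name    = c('sravan', 'ojaswi', 'bobby'),
  address = c('hyd', 'hyd', 'ponnur')
)

# Default: ignore.case = TRUE, so 'AD' is compared without regard to case
default_match <- select(data1, contains('AD'))
print(names(default_match))

# Case-sensitive match: no column name contains an upper-case 'AD'
case_sensitive <- select(data1, contains('AD', ignore.case = FALSE))
print(ncol(case_sensitive))
```

When nothing matches, select() returns a data frame with zero columns rather than raising an error.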
Example: R program to select columns based on a substring
# load the library
library(dplyr)

# create dataframe with 3 columns: id, name and address
data1 <- data.frame(id = c(1, 2, 3, 4, 5, 6, 7, 1, 4, 2),
                    name = c('sravan', 'ojaswi', 'bobby', 'gnanesh',
                             'rohith', 'pinkey', 'dhanush', 'sravan',
                             'gnanesh', 'ojaswi'),
                    address = c('hyd', 'hyd', 'ponnur', 'tenali',
                                'vijayawada', 'vijayawada', 'guntur',
                                'hyd', 'tenali', 'hyd'))

# select columns whose names contain 'am' (here: name)
print(select(data1, contains('am')))

# select columns whose names contain 'd' (here: id and address)
print(select(data1, contains('d')))

# select columns whose names contain 'dd' (here: address)
print(select(data1, contains('dd')))
Output:
Method 2: Using matches()
matches() selects the columns whose names match the given regular expression. For a plain substring such as 'am' it gives the same result as contains()
Syntax:
select(dataframe, matches('pattern'))
Here, dataframe is the input dataframe and pattern is a regular expression that is matched against the column names
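The difference from contains() only shows up once the pattern uses regular-expression syntax. The sketch below, on a shortened three-row version of the data frame (an assumption for brevity), uses anchors and the "." wildcard to illustrate this. The variable names are illustrative only.

```r
library(dplyr)

# Shortened three-row version of the article's data frame (assumed for brevity)
data1 <- data.frame(
  id      = c(1, 2, 3),
  name    = c('sravan', 'ojaswi', 'bobby'),
  address = c('hyd', 'hyd', 'ponnur')
)

# '^a' matches names that begin with 'a'
regex_start <- select(data1, matches('^a'))
print(names(regex_start))

# 'd$' matches names that end with 'd'
regex_end <- select(data1, matches('d$'))
print(names(regex_end))

# In matches(), '.' stands for any single character
regex_dot <- select(data1, matches('a.d'))
print(names(regex_dot))

# In contains(), the same string is taken literally, dot included
literal_dot <- select(data1, contains('a.d'))
print(ncol(literal_dot))
```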
Example: R program to select columns based on a pattern
# load the library
library(dplyr)

# create dataframe with 3 columns: id, name and address
data1 <- data.frame(id = c(1, 2, 3, 4, 5, 6, 7, 1, 4, 2),
                    name = c('sravan', 'ojaswi', 'bobby', 'gnanesh',
                             'rohith', 'pinkey', 'dhanush', 'sravan',
                             'gnanesh', 'ojaswi'),
                    address = c('hyd', 'hyd', 'ponnur', 'tenali',
                                'vijayawada', 'vijayawada', 'guntur',
                                'hyd', 'tenali', 'hyd'))

# select columns whose names match 'am' (here: name)
print(select(data1, matches('am')))

# select columns whose names match 'd' (here: id and address)
print(select(data1, matches('d')))

# select columns whose names match 'dd' (here: address)
print(select(data1, matches('dd')))
Output:
Select columns whose names start or end with certain characters
We can also select columns based on the characters their names start or end with.
- starts_with() returns the columns whose names start with the given character or string.
Syntax:
select(dataframe, starts_with('substring'))
where dataframe is the input dataframe and substring is the character or string that the column name starts with
- ends_with() returns the columns whose names end with the given character or string.
Syntax:
select(dataframe, ends_with('substring'))
where dataframe is the input dataframe and substring is the character or string that the column name ends with
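The two helpers can also be combined in a single select() call, in which case the columns picked by each helper are returned together. A minimal sketch on a shortened three-row version of the data frame (an assumption for brevity), with the illustrative variable name both:

```r
library(dplyr)

# Shortened three-row version of the article's data frame (assumed for brevity)
data1 <- data.frame(
  id      = c(1, 2, 3),
  name    = c('sravan', 'ojaswi', 'bobby'),
  address = c('hyd', 'hyd', 'ponnur')
)

# Columns whose names start with 'i', followed by columns whose names end with 'e'
both <- select(data1, starts_with('i'), ends_with('e'))
print(names(both))
```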
Example 1: R program to display columns that start with a character/substring
# load the library
library(dplyr)

# create dataframe with 3 columns: id, name and address
data1 <- data.frame(id = c(1, 2, 3, 4, 5, 6, 7, 1, 4, 2),
                    name = c('sravan', 'ojaswi', 'bobby', 'gnanesh',
                             'rohith', 'pinkey', 'dhanush', 'sravan',
                             'gnanesh', 'ojaswi'),
                    address = c('hyd', 'hyd', 'ponnur', 'tenali',
                                'vijayawada', 'vijayawada', 'guntur',
                                'hyd', 'tenali', 'hyd'))

# select columns whose names start with 'n' (here: name)
print(select(data1, starts_with('n')))

# select columns whose names start with 'add' (here: address)
print(select(data1, starts_with('add')))
Output:
Example 2: R program to select columns that end with a given string or character
# load the library
library(dplyr)

# create dataframe with 3 columns: id, name and address
data1 <- data.frame(id = c(1, 2, 3, 4, 5, 6, 7, 1, 4, 2),
                    name = c('sravan', 'ojaswi', 'bobby', 'gnanesh',
                             'rohith', 'pinkey', 'dhanush', 'sravan',
                             'gnanesh', 'ojaswi'),
                    address = c('hyd', 'hyd', 'ponnur', 'tenali', 'vijayawada',
                                'vijayawada', 'guntur', 'hyd', 'tenali', 'hyd'))

# select columns whose names end with 'ss' (here: address)
print(select(data1, ends_with('ss')))

# select columns whose names end with 'd' (here: id)
print(select(data1, ends_with('d')))
Output:
Select all columns
We can select all the columns in the data frame by using the everything() helper inside select().
Syntax:
select(dataframe,everything())
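On its own, everything() returns the data frame unchanged, so it is more often combined with named columns to move them to the front. The sketch below shows this on a shortened three-row version of the data frame (an assumption for brevity), with the illustrative variable name address_first:

```r
library(dplyr)

# Shortened three-row version of the article's data frame (assumed for brevity)
data1 <- data.frame(
  id      = c(1, 2, 3),
  name    = c('sravan', 'ojaswi', 'bobby'),
  address = c('hyd', 'hyd', 'ponnur')
)

# Put address first, then all remaining columns in their original order
address_first <- select(data1, address, everything())
print(names(address_first))
```

A column that has already been listed is not repeated when everything() fills in the rest.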
Example: R program to select all columns
# load the library
library(dplyr)

# create dataframe with 3 columns: id, name and address
data1 <- data.frame(id = c(1, 2, 3, 4, 5, 6, 7, 1, 4, 2),
                    name = c('sravan', 'ojaswi', 'bobby', 'gnanesh',
                             'rohith', 'pinkey', 'dhanush', 'sravan',
                             'gnanesh', 'ojaswi'),
                    address = c('hyd', 'hyd', 'ponnur', 'tenali',
                                'vijayawada', 'vijayawada', 'guntur',
                                'hyd', 'tenali', 'hyd'))

# select all columns using the everything() helper
print(select(data1, everything()))
Output: