How to randomly select rows from Pandas DataFrame

Let’s discuss how to randomly select rows from a Pandas DataFrame. A random selection of rows can be achieved in several different ways.

Create a simple DataFrame from a dictionary of lists.

# Import pandas package
import pandas as pd
   
# Define a dictionary containing employee data
data = {'Name':['Jai', 'Princi', 'Gaurav', 'Anuj', 'Geeku'],
        'Age':[27, 24, 22, 32, 15],
        'Address':['Delhi', 'Kanpur', 'Allahabad', 'Kannauj', 'Noida'],
        'Qualification':['Msc', 'MA', 'MCA', 'Phd', '10th']}
  
# Convert the dictionary into DataFrame 
df = pd.DataFrame(data)
  
# Display the DataFrame
df

Method #1: Using the sample() method

The sample() method returns a random sample of items from an axis of the object. The result has the same type as the caller: sampling a DataFrame returns a DataFrame, and sampling a Series returns a Series.
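
As a quick check of the return-type rule, here is a minimal sketch. The small `scores` DataFrame is a hypothetical example, separate from the employee data used in this article, and `random_state=0` is an arbitrary seed that only makes the run repeatable.

```python
import pandas as pd

# Hypothetical example data (separate from the employee DataFrame)
scores = pd.DataFrame({'student': ['A', 'B', 'C', 'D'],
                       'score': [81, 67, 92, 75]})

# Sampling a DataFrame returns a DataFrame
sampled_frame = scores.sample(n=2, random_state=0)

# Sampling one column (a Series) returns a Series
sampled_series = scores['score'].sample(n=2, random_state=0)

print(type(sampled_frame).__name__)   # DataFrame
print(type(sampled_series).__name__)  # Series
```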

Example 1: Selecting one row without any parameters

# Select one row at random using sample()
# without passing any parameters (n defaults to 1).
  
# Import pandas package
import pandas as pd
   
# Define a dictionary containing employee data
data = {'Name':['Jai', 'Princi', 'Gaurav', 'Anuj', 'Geeku'],
        'Age':[27, 24, 22, 32, 15],
        'Address':['Delhi', 'Kanpur', 'Allahabad', 'Kannauj', 'Noida'],
        'Qualification':['Msc', 'MA', 'MCA', 'Phd', '10th']}
  
# Convert the dictionary into DataFrame 
df = pd.DataFrame(data)
  
# Select one row at random using sample()
# without passing any parameters
df.sample()

Example 2: Using the n parameter to select n rows at random.

Select n rows at random using sample(n) or sample(n=n). The n rows returned by one call are distinct, and each run generally returns a different selection.

# Get 3 random rows; the 3 rows are distinct,
# and the selection usually changes from run to run
  
# df.sample(3) is equivalent
df.sample(n = 3)

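A small sketch, reusing the employee data, that checks two claims from this example: sample(3) and sample(n=3) make the same selection when given the same seed, and the rows in a single sample never repeat because replace defaults to False. The seed value 1 is an arbitrary assumption used only to make the two calls comparable.

```python
import pandas as pd

data = {'Name': ['Jai', 'Princi', 'Gaurav', 'Anuj', 'Geeku'],
        'Age': [27, 24, 22, 32, 15],
        'Address': ['Delhi', 'Kanpur', 'Allahabad', 'Kannauj', 'Noida'],
        'Qualification': ['Msc', 'MA', 'MCA', 'Phd', '10th']}
df = pd.DataFrame(data)

# sample(3) passes n positionally, so it matches sample(n=3) for the same seed
first = df.sample(3, random_state=1)
second = df.sample(n=3, random_state=1)
print(first.equals(second))  # True

# With replace=False (the default), the 3 rows of one sample are distinct
print(first.index.is_unique)  # True
```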

Example 3: Using the frac parameter.

Instead of a fixed count, you can request a fraction of the axis items with frac. For example, frac=0.5 returns 50% of the rows (the resulting row count is rounded to a whole number).

# Fraction of rows
  
# Here you get 50% of the rows
df.sample(frac = 0.5)

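The number of rows returned for a given frac is the fraction times the axis length, rounded to a whole number. The sketch below uses a hypothetical 10-row DataFrame (`numbers`) so the counts come out exact. It also shows that n and frac cannot be combined, and that a frac above 1 only works together with replace=True; the seed is arbitrary.

```python
import pandas as pd

# Hypothetical 10-row DataFrame (not the employee data)
numbers = pd.DataFrame({'value': range(10)})

# frac = 0.5 of 10 rows -> 5 rows
half = numbers.sample(frac=0.5, random_state=0)
print(len(half))  # 5

# frac > 1 upsamples, which requires replace=True: 2 * 10 -> 20 rows
doubled = numbers.sample(frac=2, replace=True, random_state=0)
print(len(doubled))  # 20

# n and frac are alternatives; passing both raises ValueError
try:
    numbers.sample(n=3, frac=0.5)
    both_rejected = False
except ValueError as err:
    both_rejected = True
    print('ValueError:', err)
```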


Example 4: Sampling from a sample
First select 70% of the rows of df and store them in a new DataFrame df1; then select 50% of the rows of df1.

# fraction of rows
  
# Select 70% of the rows of df (5 rows, so round(0.7 * 5) = 4 rows)
# and store them in a new DataFrame df1
df1 = df.sample(frac =.7)
  
# Now select 50% of the rows of df1
df1.sample(frac =.50)

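Because each step keeps a fraction of what the previous step returned, the fractions compound: roughly 0.7 × 0.5 = 35% of the original rows remain, and sample() keeps the original index labels throughout. A sketch of the same two steps with an arbitrary seed (random_state=0) added for repeatability; the name `df_half` is hypothetical.

```python
import pandas as pd

data = {'Name': ['Jai', 'Princi', 'Gaurav', 'Anuj', 'Geeku'],
        'Age': [27, 24, 22, 32, 15],
        'Address': ['Delhi', 'Kanpur', 'Allahabad', 'Kannauj', 'Noida'],
        'Qualification': ['Msc', 'MA', 'MCA', 'Phd', '10th']}
df = pd.DataFrame(data)

# Step 1: keep 70% of the 5 rows -> round(0.7 * 5) = 4 rows
df1 = df.sample(frac=0.7, random_state=0)

# Step 2: keep 50% of df1 -> round(0.5 * 4) = 2 rows
df_half = df1.sample(frac=0.5, random_state=0)
print(len(df1), len(df_half))  # 4 2

# Original index labels are kept, so each sampled row can be traced back to df
print(set(df_half.index) <= set(df1.index) <= set(df.index))  # True
```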

Example 5: Selecting rows at random with replace=False

The replace parameter controls whether the same row can be selected more than once. Its default value is False, so each row appears at most once and you cannot select more rows than the DataFrame contains.

# DataFrame df1 has only 4 rows
  
# Requesting more than 4 rows with replace = False raises
# ValueError: Cannot take a larger sample than population when 'replace=False'
df1.sample(n = 3, replace = False)

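A minimal sketch of the failure case described above, using the full 5-row employee DataFrame: asking for 6 rows without replacement raises a ValueError, while asking for exactly len(df) rows returns every row in random order. The exact error text may vary between library versions, so the sketch only records whether the error occurred; the seed is arbitrary.

```python
import pandas as pd

data = {'Name': ['Jai', 'Princi', 'Gaurav', 'Anuj', 'Geeku'],
        'Age': [27, 24, 22, 32, 15],
        'Address': ['Delhi', 'Kanpur', 'Allahabad', 'Kannauj', 'Noida'],
        'Qualification': ['Msc', 'MA', 'MCA', 'Phd', '10th']}
df = pd.DataFrame(data)

# Without replacement, asking for more rows than exist fails
try:
    df.sample(n=len(df) + 1, replace=False)  # 6 rows from a 5-row DataFrame
    error_message = None
except ValueError as err:
    error_message = str(err)
print(error_message)

# Asking for exactly len(df) rows is allowed: every row, in random order
everything = df.sample(n=len(df), random_state=0)
```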

Example 6: Using replace=True to select more rows than the DataFrame contains.

# Select more rows than df1 contains (6 > 4) by using replace = True;
# the default is replace = False
df1.sample(n = 6, replace = True)

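With replace=True the same row can appear more than once, and the repeated copies keep the same index label. A sketch using the employee data: drawing 8 rows from 5 must repeat at least one row (pigeonhole principle). Calling reset_index(drop=True) is one way to give the result a fresh 0..n-1 index if duplicate labels are a problem; the seed is arbitrary.

```python
import pandas as pd

data = {'Name': ['Jai', 'Princi', 'Gaurav', 'Anuj', 'Geeku'],
        'Age': [27, 24, 22, 32, 15],
        'Address': ['Delhi', 'Kanpur', 'Allahabad', 'Kannauj', 'Noida'],
        'Qualification': ['Msc', 'MA', 'MCA', 'Phd', '10th']}
df = pd.DataFrame(data)

# replace=True lets n exceed the number of rows
upsampled = df.sample(n=8, replace=True, random_state=0)
print(len(upsampled))  # 8

# 8 draws from 5 rows must repeat at least one row (pigeonhole principle);
# repeated rows carry the same index label
print(upsampled.index.duplicated().any())  # True

# Optional: give the result a fresh 0..7 index
renumbered = upsampled.reset_index(drop=True)
```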

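Example 7: Using weights

The weights parameter gives each row a relative probability of being selected, so rows with larger weights are more likely to be chosen.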

# One weight per row of df1 (4 rows); weights that do not
# sum to 1 are re-normalized automatically
test_weights = [0.2, 0.2, 0.2, 0.4]
  
df1.sample(n = 3, weights = test_weights)

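A weight of 0 means a row can never be drawn, which gives a simple way to see the weights at work. In the sketch below the weight list is a hypothetical choice (one entry per row of the 5-row employee DataFrame). On a DataFrame, weights can also be given as a column name, here Age, so that older employees are more likely to be picked; the seed is arbitrary.

```python
import pandas as pd

data = {'Name': ['Jai', 'Princi', 'Gaurav', 'Anuj', 'Geeku'],
        'Age': [27, 24, 22, 32, 15],
        'Address': ['Delhi', 'Kanpur', 'Allahabad', 'Kannauj', 'Noida'],
        'Qualification': ['Msc', 'MA', 'MCA', 'Phd', '10th']}
df = pd.DataFrame(data)

# Hypothetical weights, one per row; Jai's row gets weight 0
weights = [0, 1, 1, 1, 2]
picked = df.sample(n=3, weights=weights, random_state=0)
print('Jai' in picked['Name'].tolist())  # False: zero-weight rows are never drawn

# A column name can be used as the weights (rows are weighted by Age)
by_age = df.sample(n=2, weights='Age', random_state=0)
```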

Example 8: Using axis

The axis parameter accepts a number or a name (0 or 'index' for rows, 1 or 'columns' for columns). With it, sample() can select columns instead of rows.

# axis accepts a number or a name.

# sample() can also sample columns instead of rows:
# axis = 1 (or 'columns') picks one random column,
# while axis = 0 (the default) picks rows.
df1.sample(axis = 1)

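A sketch contrasting the two axes on the employee data: with axis=1 (or 'columns') every row is kept and only the columns are sampled, while axis=0 (or 'index'), the DataFrame default, samples rows and keeps every column. The seed is arbitrary.

```python
import pandas as pd

data = {'Name': ['Jai', 'Princi', 'Gaurav', 'Anuj', 'Geeku'],
        'Age': [27, 24, 22, 32, 15],
        'Address': ['Delhi', 'Kanpur', 'Allahabad', 'Kannauj', 'Noida'],
        'Qualification': ['Msc', 'MA', 'MCA', 'Phd', '10th']}
df = pd.DataFrame(data)  # 5 rows x 4 columns

# axis=1 (or axis='columns'): sample columns, keep every row
two_columns = df.sample(n=2, axis=1, random_state=0)
print(two_columns.shape)  # (5, 2)

# axis=0 (or axis='index'), the DataFrame default: sample rows, keep every column
two_rows = df.sample(n=2, axis='index', random_state=0)
print(two_rows.shape)  # (2, 4)
```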

Example 9: Using random_state

With a fixed random_state (a seed), sample() always returns the same rows from a given DataFrame, which makes results reproducible. If random_state is None (the default), the selection generally changes from run to run.

# With a given seed, the sample will always draw the same rows.
  
# If random_state is None (the default),
# the rows drawn generally differ
# from run to run.
df1.sample(n = 2, random_state = 2)

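A sketch showing reproducibility on the employee data: two calls with the same integer seed return identical rows, and an integer seed behaves like a freshly created np.random.RandomState with that seed. The seed value 2 matches the example above; any integer works.

```python
import numpy as np
import pandas as pd

data = {'Name': ['Jai', 'Princi', 'Gaurav', 'Anuj', 'Geeku'],
        'Age': [27, 24, 22, 32, 15],
        'Address': ['Delhi', 'Kanpur', 'Allahabad', 'Kannauj', 'Noida'],
        'Qualification': ['Msc', 'MA', 'MCA', 'Phd', '10th']}
df = pd.DataFrame(data)

# Same integer seed -> same rows on every call
a = df.sample(n=2, random_state=2)
b = df.sample(n=2, random_state=2)
print(a.equals(b))  # True

# An integer seed is handled like a new np.random.RandomState with that seed
c = df.sample(n=2, random_state=np.random.RandomState(2))
print(a.equals(c))  # True
```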

 
Method #2: Using NumPy

NumPy can generate random row positions with np.random.choice(), optionally with replacement; passing those positions to iloc selects the corresponding rows.

# Import pandas & Numpy package
import numpy as np
import pandas as pd
   
# Define a dictionary containing employee data
data = {'Name':['Jai', 'Princi', 'Gaurav', 'Anuj', 'Geeku'],
        'Age':[27, 24, 22, 32, 15],
        'Address':['Delhi', 'Kanpur', 'Allahabad', 'Kannauj', 'Noida'],
        'Qualification':['Msc', 'MA', 'MCA', 'Phd', '10th']}
  
# Convert the dictionary into DataFrame 
df = pd.DataFrame(data)
  
# Choose 6 random row positions between 0 and len(df) - 1, with replacement
chosen_idx = np.random.choice(len(df), replace = True, size = 6)
  
df2 = df.iloc[chosen_idx]
  
df2

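The example above draws with replacement, so rows can repeat. For a selection without repeats, here is a sketch using NumPy's newer Generator interface (np.random.default_rng, available in NumPy 1.17 and later) with replace=False. Using default_rng instead of np.random.choice is a substitution, not part of the original method, and the seed 0 is arbitrary.

```python
import numpy as np
import pandas as pd

data = {'Name': ['Jai', 'Princi', 'Gaurav', 'Anuj', 'Geeku'],
        'Age': [27, 24, 22, 32, 15],
        'Address': ['Delhi', 'Kanpur', 'Allahabad', 'Kannauj', 'Noida'],
        'Qualification': ['Msc', 'MA', 'MCA', 'Phd', '10th']}
df = pd.DataFrame(data)

# A seeded Generator (NumPy 1.17+); the seed 0 is arbitrary
rng = np.random.default_rng(0)

# Draw 3 distinct row positions from 0 .. len(df) - 1
positions = rng.choice(len(df), size=3, replace=False)

# iloc selects by position, so this works whatever the index labels are
subset = df.iloc[positions]
print(subset)
```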


