How to randomly select rows from Pandas DataFrame

Last Updated : 01 Dec, 2023

Pandas lets us select rows at random from a DataFrame. In this article, we look at several ways to do that, using the sample() method and NumPy.

Creating Sample Pandas DataFrame

First, we create a sample Pandas DataFrame that is used throughout the article.

Python3

# Import pandas package
import pandas as pd
  
# Define a dictionary containing employee data
data = {'Name':['Jai', 'Princi', 'Gaurav', 'Anuj', 'Geeku'],
        'Age':[27, 24, 22, 32, 15],
        'Address':['Delhi', 'Kanpur', 'Allahabad', 'Kannauj', 'Noida'],
        'Qualification':['Msc', 'MA', 'MCA', 'Phd', '10th']}
 
# Convert the dictionary into DataFrame 
df = pd.DataFrame(data)
 
# Display the whole DataFrame
df

Output:

    Name    Age    Address    Qualification
0    Jai    27    Delhi    Msc
1    Princi    24    Kanpur    MA
2    Gaurav    22    Allahabad    MCA
3    Anuj    32    Kannauj    Phd
4    Geeku    15    Noida    10th

Randomly Select Rows from Pandas DataFrame

A random selection of rows from a Pandas DataFrame can be made in several ways. The ways covered below are:

Using sample() Method
Using parameter n
Using frac parameter
Using Fraction of Rows
Using replace = False
Selecting more than n rows
Using weights
Using axis
Using random_state
Using NumPy

Select rows from Pandas DataFrame Using sample() method

In this example, we use the sample() method to randomly select rows from a Pandas DataFrame. sample() returns a random sample of items from an axis of the object, and the result has the same type as the caller (a DataFrame here). Called without arguments, it returns a single row.

Python3

# Import pandas package
import pandas as pd
 
# Define a dictionary containing employee data
data = {'Name': ['Jai', 'Princi', 'Gaurav', 'Anuj', 'Geeku'],
        'Age': [27, 24, 22, 32, 15],
        'Address': ['Delhi', 'Kanpur', 'Allahabad', 'Kannauj', 'Noida'],
        'Qualification': ['Msc', 'MA', 'MCA', 'Phd', '10th']}
 
# Convert the dictionary into DataFrame
df = pd.DataFrame(data)
 
# Select one row randomly using sample()
# without passing any parameters
df.sample()

Output:

    Name    Age    Address    Qualification
1    Princi    24    Kanpur    MA
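As a small illustration of the "same type as the caller" point, the sketch below calls sample() on a DataFrame and on a single column. It uses a shortened version of the article's data; the variable names row and age are only illustrative.

```python
import pandas as pd

# Shortened version of the article's sample data
df = pd.DataFrame({'Name': ['Jai', 'Princi', 'Gaurav', 'Anuj', 'Geeku'],
                   'Age': [27, 24, 22, 32, 15]})

# Sampling a DataFrame gives back a DataFrame holding one random row
row = df.sample()
print(type(row).__name__, row.shape)

# Sampling a Series (a single column) gives back a Series with one value
age = df['Age'].sample()
print(type(age).__name__, len(age))
```

The row that is drawn changes between runs, but the types and shapes stay the same.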

Randomly Select Rows Using parameter n

Select n rows at random using sample(n) or sample(n=n). Each run draws a new random sample of n rows, so the result usually changes from run to run. Within one sample, no row is repeated.

Python3

# Get 3 random rows;
# each call draws a new random sample
 
# df.sample(3) or
df.sample(n=3)

Output:

    Name    Age    Address    Qualification
2    Gaurav    22    Allahabad    MCA
4    Geeku    15    Noida    10th
3    Anuj    32    Kannauj    Phd
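The following sketch checks two properties of sample(n=3) under the default settings: the three rows are distinct, and each of them comes from the original DataFrame. The data is a shortened version of the article's DataFrame, and picked is an illustrative name.

```python
import pandas as pd

# Shortened version of the article's sample data
df = pd.DataFrame({'Name': ['Jai', 'Princi', 'Gaurav', 'Anuj', 'Geeku'],
                   'Age': [27, 24, 22, 32, 15]})

# Draw 3 rows; replace defaults to False
picked = df.sample(n=3)

# No index label appears twice in the sample
print(picked.index.is_unique)

# Every sampled label exists in the original DataFrame
print(picked.index.isin(df.index).all())
```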

Randomly Select Rows Using frac Parameter

The frac parameter specifies the fraction of axis items to return. For example, with frac=0.5 the sample() method returns 50% of the rows, rounded to a whole number of rows. frac cannot be used together with n.

Python3

# Fraction of rows
 
# Here you get 50% of the rows
df.sample(frac=0.5)

Output:

    Name    Age    Address    Qualification
1    Princi    24    Kanpur    MA
0    Jai    27    Delhi    Msc
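To see how frac translates into a row count, the sketch below compares the length of the DataFrame with the length of the sample. With 5 rows and frac=0.5 the exact share would be 2.5 rows, and the sample holds two rows, which matches the output above. The name half is illustrative.

```python
import pandas as pd

# Shortened version of the article's sample data
df = pd.DataFrame({'Name': ['Jai', 'Princi', 'Gaurav', 'Anuj', 'Geeku'],
                   'Age': [27, 24, 22, 32, 15]})

# Half of 5 rows is 2.5, so the row count has to be rounded
half = df.sample(frac=0.5)

# Compare the number of rows before and after sampling
print(len(df), len(half))
```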

Using Fraction of Rows

First, 70% of the rows of the DataFrame df are sampled and stored in another DataFrame, df1. Then 50% of the rows of df1 are sampled.

Python3

# Fraction of rows
 
# Sample 70% of the rows of df
# and store them in another DataFrame, df1
df1 = df.sample(frac=.7)
 
# Now sample 50% of the rows of df1
df1.sample(frac=.50)

Output:

    Name    Age    Address    Qualification
3    Anuj    32    Kannauj    Phd
1    Princi    24    Kanpur    MA
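A short sketch of the two-step sampling, checking that the second sample is drawn only from the rows kept in the first step. The name df2 is illustrative and does not refer to any DataFrame used elsewhere in the article.

```python
import pandas as pd

# Shortened version of the article's sample data
df = pd.DataFrame({'Name': ['Jai', 'Princi', 'Gaurav', 'Anuj', 'Geeku'],
                   'Age': [27, 24, 22, 32, 15]})

# Step 1: keep 70% of the 5 rows
df1 = df.sample(frac=0.7)

# Step 2: keep 50% of the rows that survived step 1
df2 = df1.sample(frac=0.5)

# Row counts after each step
print(len(df1), len(df2))

# Every row of df2 also belongs to df1
print(df2.index.isin(df1.index).all())
```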

Select Rows Randomly with replace = False

The replace parameter controls whether the same row may be selected more than once. The default value of replace in sample() is False, so you can never select more rows than the DataFrame contains.

Python3

# DataFrame df1 has only 4 rows
 
# Requesting more than 4 rows with replace=False raises an error:
# Cannot take a larger sample than population when 'replace=False'
df1.sample(n=3, replace=False)

Output:

    Name    Age    Address    Qualification
2    Gaurav    22    Allahabad    MCA
1    Princi    24    Kanpur    MA
4    Geeku    15    Noida    10th
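The error mentioned in the comments can be reproduced with the sketch below. Here df1 is simply the first four rows of the data, an assumed stand-in for the article's randomly sampled df1, and message is an illustrative name.

```python
import pandas as pd

# Shortened version of the article's sample data
df = pd.DataFrame({'Name': ['Jai', 'Princi', 'Gaurav', 'Anuj', 'Geeku'],
                   'Age': [27, 24, 22, 32, 15]})

# Assumption: a fixed 4-row stand-in for the article's df1
df1 = df.iloc[:4]

message = None
try:
    # Asking for 5 rows from 4 without replacement cannot succeed
    df1.sample(n=5, replace=False)
except ValueError as err:
    message = str(err)

print(message)
```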

Select More than n Rows

With replace=True, you can select more than n rows, where n is the total number of rows. Some rows then appear more than once in the result.

Python3

# Select more rows than df1 contains by sampling with replacement
# (replace is False by default)
df1.sample(n=6, replace=True)

Output:

    Name    Age    Address    Qualification
2    Gaurav    22    Allahabad    MCA
2    Gaurav    22    Allahabad    MCA
1    Princi    24    Kanpur    MA
2    Gaurav    22    Allahabad    MCA
4    Geeku    15    Noida    10th
1    Princi    24    Kanpur    MA
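When 6 rows are drawn from only 4, at least one row must repeat. The sketch below checks this; df1 is again an assumed fixed stand-in (the first four rows) for the article's df1.

```python
import pandas as pd

# Shortened version of the article's sample data
df = pd.DataFrame({'Name': ['Jai', 'Princi', 'Gaurav', 'Anuj', 'Geeku'],
                   'Age': [27, 24, 22, 32, 15]})

# Assumption: a fixed 4-row stand-in for the article's df1
df1 = df.iloc[:4]

# Draw 6 rows from 4; only possible with replacement
picked = df1.sample(n=6, replace=True)

# At least one index label is repeated
print(picked.index.duplicated().any())

# How often each row was drawn in this run
print(picked.index.value_counts())
```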

Randomly Select Rows from Pandas DataFrame Using weights

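In this example, the rows are selected with probabilities according to the specified weights. If the weights do not sum to 1, they are normalized automatically. A list of weights needs one entry per row of the DataFrame being sampled, so test_weights has four entries because df1 has four rows. Adjust the values in the test_weights list based on your desired probability distribution.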

Python3

# Weights will be re-normalized automatically
test_weights = [0.2, 0.2, 0.2, 0.4]
 
df1.sample(n=3, weights=test_weights)

Output:

    Name    Age    Address    Qualification
2    Gaurav    22    Allahabad    MCA
1    Princi    24    Kanpur    MA
3    Anuj    32    Kannauj    Phd
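Two further uses of weights are sketched below: a weight of zero excludes a row from the draw, and a column name can be passed so that the column's values act as weights. The weight values and the names picked and by_age are illustrative choices, not part of the original example.

```python
import pandas as pd

# Shortened version of the article's sample data
df = pd.DataFrame({'Name': ['Jai', 'Princi', 'Gaurav', 'Anuj', 'Geeku'],
                   'Age': [27, 24, 22, 32, 15]})

# Rows 0 and 1 get weight 0, so they can never be drawn
weights = [0.0, 0.0, 1.0, 1.0, 1.0]
picked = df.sample(n=3, weights=weights)
print(sorted(picked.index))

# Use the 'Age' column as weights: older employees are more likely to be drawn
by_age = df.sample(n=2, weights='Age')
print(by_age)
```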

Randomly Select Rows from Pandas DataFrame Using axis

The axis parameter accepts an axis number or name. With axis=0 (or 'index'), the default for a DataFrame, sample() selects rows; with axis=1 (or 'columns'), it samples columns instead.

Python3

# axis accepts an axis number or name.
 
# axis=0 samples rows (the default);
# axis=1 would sample columns instead.
df1.sample(axis=0)

Output:

    Name    Age    Address    Qualification
3    Anuj    32    Kannauj    Phd
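For comparison with the axis=0 call above, this sketch samples columns by setting axis=1. All rows are kept and two randomly chosen columns are returned; cols is an illustrative name.

```python
import pandas as pd

# The article's sample data
df = pd.DataFrame({'Name': ['Jai', 'Princi', 'Gaurav', 'Anuj', 'Geeku'],
                   'Age': [27, 24, 22, 32, 15],
                   'Address': ['Delhi', 'Kanpur', 'Allahabad', 'Kannauj', 'Noida'],
                   'Qualification': ['Msc', 'MA', 'MCA', 'Phd', '10th']})

# axis=1 (or axis='columns') samples columns instead of rows
cols = df.sample(n=2, axis=1)

# Two of the four column names, chosen at random
print(list(cols.columns))

# All 5 rows are kept
print(cols.shape)
```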

Randomly Select Rows from Pandas DataFrame Using random_state

With a given seed, the sample always fetches the same rows from the same DataFrame. If random_state is None (the default), no seed is fixed by the call and the selected rows generally change from run to run.

Python3

# With a given seed, the sample will always draw the same rows.
 
# If random_state is None (the default), no seed is fixed
# and the rows generally change from run to run.
df1.sample(n=2, random_state=2)

Output:

    Name    Age    Address    Qualification
1    Princi    24    Kanpur    MA
2    Gaurav    22    Allahabad    MCA
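The reproducibility claim can be checked directly, as in the sketch below: two calls with the same random_state on the same DataFrame return identical samples. The seed value 2 follows the example above; first and second are illustrative names.

```python
import pandas as pd

# Shortened version of the article's sample data
df = pd.DataFrame({'Name': ['Jai', 'Princi', 'Gaurav', 'Anuj', 'Geeku'],
                   'Age': [27, 24, 22, 32, 15]})

# Same seed, same DataFrame
first = df.sample(n=2, random_state=2)
second = df.sample(n=2, random_state=2)

# The two samples contain the same rows in the same order
print(first.equals(second))
```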

Select rows from Pandas Using NumPy

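With NumPy, np.random.choice() picks the row positions to include in the random selection, and replacement can be allowed. The chosen positions are then passed to iloc to fetch the rows.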

Python3

# Import the NumPy and pandas packages
import numpy as np
import pandas as pd
 
# Define a dictionary containing employee data
data = {'Name': ['Jai', 'Princi', 'Gaurav', 'Anuj', 'Geeku'],
        'Age': [27, 24, 22, 32, 15],
        'Address': ['Delhi', 'Kanpur', 'Allahabad', 'Kannauj', 'Noida'],
        'Qualification': ['Msc', 'MA', 'MCA', 'Phd', '10th']}
 
# Convert the dictionary into DataFrame
df = pd.DataFrame(data)
 
# Choose 6 row positions at random, with replacement,
# from all len(df) rows
chosen_idx = np.random.choice(len(df), replace=True, size=6)
 
df2 = df.iloc[chosen_idx]
 
df2

Output:

    Name    Age    Address    Qualification
3    Anuj    32    Kannauj    Phd
1    Princi    24    Kanpur    MA
1    Princi    24    Kanpur    MA
0    Jai    27    Delhi    Msc
3    Anuj    32    Kannauj    Phd
0    Jai    27    Delhi    Msc
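As a variation on the example above, the sketch below uses NumPy's newer Generator interface (np.random.default_rng) instead of np.random.choice, and draws positions without replacement. This is a swapped-in alternative rather than the method shown in the article; the seed 0 and the names rng, positions and subset are arbitrary choices.

```python
import numpy as np
import pandas as pd

# Shortened version of the article's sample data
df = pd.DataFrame({'Name': ['Jai', 'Princi', 'Gaurav', 'Anuj', 'Geeku'],
                   'Age': [27, 24, 22, 32, 15]})

# A seeded generator makes the selection repeatable
rng = np.random.default_rng(seed=0)

# Pick 3 distinct row positions out of len(df)
positions = rng.choice(len(df), size=3, replace=False)

# iloc selects rows by position
subset = df.iloc[positions]
print(subset)
```

Passing len(df) rather than a hard-coded number keeps every row eligible if the DataFrame changes size.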
