PySpark – orderBy() and sort()

Last Updated : 06 Jun, 2021

In this article, we will see how to sort a data frame by one or more columns in PySpark. Both orderBy() and sort() can be used for this.

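orderBy() Method:

The orderBy() function sorts the rows of a data frame by the values in one or more columns.

Syntax: DataFrame.orderBy(*cols, **kwargs)

Parameters :

cols: Column names or Column objects to order by, passed as a list or as separate arguments

ascending (keyword argument): Boolean or list of Booleans specifying the sorting order (ascending or descending) of the columns listed in cols; defaults to True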

Return type: Returns a new DataFrame sorted by the specified columns.

Dataframe Creation: Create a new SparkSession object named spark then create a data frame with the custom data.

Python3

# Importing necessary libraries 
from pyspark.sql import SparkSession 
from pyspark.sql import functions as f 
  
# Create a spark session 
spark = SparkSession.builder.appName( 
  'pyspark - example join').getOrCreate() 
  
# Define the data as a list of tuples 
dataframe = [ 
    ("Sam", "Software Engineer", "IND", 10000), 
    ("Raj", "Data Scientist", "US", 41000), 
    ("Jonas", "Sales Person", "UK", 230000), 
    ("Peter", "CTO", "Ireland", 50000), 
    ("Hola", "Data Analyst", "Australia", 111000), 
    ("Ram", "CEO", "Iran", 300000), 
    ("Lekhana", "Advertising", "UK", 250000), 
    ("Thanos", "Marketing", "UIND", 114000), 
    ("Nick", "Data Engineer", "Ireland", 680000), 
    ("Wade", "Data Engineer", "IND", 70000) 
] 
  
# Column names of dataframe 
columns = ["Name", "Job", "Country", "salary"] 
  
# Create the spark dataframe 
df = spark.createDataFrame(data=dataframe, schema=columns) 
  
# Printing the dataframe 
df.show() 

Output :

Example 1: Sorting the data frame by a single column

Sort the data frame by the ascending order of the ‘salary’ of the employees.

Python3

# Order the data by ascending order  
# of salary 
df.orderBy(['salary'], ascending = [True]).show() 
  
# or 
# df.orderBy(f.col("salary").asc()).show() 
  
# or (ascending is the default) 
# df.orderBy(['salary']).show()

Output :

Example 2: Sorting the data frame in decreasing order.

Python3

# Order the data by descending order  
# of salary 
df.orderBy(['salary'], ascending = [False]).show()

Output:

Example 3: Sorting the data frame by more than one column

Sort the data frame by the descending order of ‘Job’ and the ascending order of ‘salary’. When two rows have the same ‘Job’, the tie is resolved by listing those rows in ascending order of ‘salary’.

Python3

# Sort the dataframe by descending order 
# of 'Job' and whenever there is a tie 
# in 'Job', it'll be resolved by ordering  
# based on ascending order of 'salary' 
df.orderBy(f.col("Job").desc(), f.col("salary").asc()).show() 
  
# or 
# df.orderBy(["Job", "salary"], ascending = [False, True]).show() 

Output :

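sort() method:

sort() returns a new data frame sorted by the specified columns. In PySpark, orderBy() is an alias of sort(), so both accept the same arguments.

Syntax:
DataFrame.sort(*cols, **kwargs)

Parameters:
cols: list of Column or column names to sort by
ascending: Boolean or list of Booleans (default True); pass a list to set the order for each column
Null placement is not a keyword argument of sort(); it is chosen per column with Column methods such as asc_nulls_last() or desc_nulls_first().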

Example 1: Sort the data frame by the ascending order of the “Name” of the employee.

Python3

# Sort the dataframe by ascending  
# order of 'Name' 
df.sort(["Name"],ascending = [True]).show() 

Output :

Example 2: Sort the data frame by the descending order of the “Name” of the employee.

Python3

# Sort the dataframe by descending order of 'Name' 
df.sort(["Name"],ascending = [False]).show() 

Output:

Example 3: Sort multiple columns in ascending order.

Python3

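# Sort the dataframe by ascending order of 'Name', 
# then by ascending order of 'salary'; give one flag 
# per column (or a single Boolean for all columns) 
df.sort(["Name","salary"],ascending = [True, True]).show() 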

Output:
