
PySpark – orderBy() and sort()

Last Updated : 06 Jun, 2021

In this article, we will see how to sort a data frame by one or more columns in PySpark using the orderBy() and sort() methods.

orderBy() Method:

The orderBy() method sorts the rows of a data frame by the values of one or more columns.

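Syntax: DataFrame.orderBy(*cols, **kwargs)

Parameters :

  • cols: Column names (strings) or Column expressions to sort by, passed as separate arguments or as a single list
  • ascending: Keyword argument that sets the sort order; a Boolean, or a list of Booleans with one entry per column in cols (default True, i.e. ascending)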

Return type: Returns a new DataFrame sorted by the specified columns.

Dataframe Creation: Create a new SparkSession object named spark then create a data frame with the custom data.

Python3




# Importing necessary libraries
from pyspark.sql import SparkSession
from pyspark.sql import functions as f
  
# Create a spark session
spark = SparkSession.builder.appName(
  'pyspark - example sort').getOrCreate()
  
# Define the rows as a list of tuples
dataframe = [
    ("Sam", "Software Engineer", "IND", 10000),
    ("Raj", "Data Scientist", "US", 41000),
    ("Jonas", "Sales Person", "UK", 230000),
    ("Peter", "CTO", "Ireland", 50000),
    ("Hola", "Data Analyst", "Australia", 111000),
    ("Ram", "CEO", "Iran", 300000),
    ("Lekhana", "Advertising", "UK", 250000),
    ("Thanos", "Marketing", "UIND", 114000),
    ("Nick", "Data Engineer", "Ireland", 680000),
    ("Wade", "Data Engineer", "IND", 70000)
]
  
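# Column names of dataframe. Column name lookup is case-insensitive
# by default, so "Salary" in the examples below resolves to "salary"
columns = ["Name", "Job", "Country", "salary"]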
  
# Create the spark dataframe
df = spark.createDataFrame(data=dataframe, schema=columns)
  
# Printing the dataframe
df.show()


Output :

Example 1: Sorting the data frame by a single column

Sort the data frame by the ascending order of ‘Salary’ of employees in the data frame.

Python3




# Order the data by ascending order 
# of Salary
df.orderBy(['Salary'], ascending = [True]).show()
  
# or
# df.orderBy(f.col("Salary").asc()).show()
  
# or
# df.orderBy(['Salary']).show()


Output :

Example 2: Sorting the data frame in descending order of ‘Salary’

Python3




# Order the data by descending order
# of Salary
df.orderBy(['Salary'], ascending=[False]).show()


Output:

Example 3: Sorting the data frame by more than one column

Sort the data frame by the descending order of ‘Job’ and ascending order of ‘Salary’ of employees in the data frame. When there is a conflict between two rows having the same ‘Job’, then it’ll be resolved by listing rows in the ascending order of ‘Salary’.

Python3




# Sort the dataframe by descending order
# of 'Job' and whenever there is conflict
# in 'Job', it'll be resolved by ordering 
# based on ascending order of 'Salary'
df.orderBy(f.col("Job").desc(),f.col("Salary").asc()).show()
  
# or
# df.orderBy(["Job", "Salary"],ascending = [False, True]).show()


Output :

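sort() Method:

sort() returns a new DataFrame sorted by the specified columns. In PySpark, orderBy() is an alias of sort(), so both accept the same arguments and give the same result.

Syntax: DataFrame.sort(*cols, **kwargs)

Parameters:

  • cols: Column names (strings) or Column expressions to sort by, passed as separate arguments or as a single list
  • ascending: Keyword argument; a Boolean, or a list of Booleans with one entry per column in cols. True (the default) sorts in ascending order, False in descending order

Null placement is not a parameter of sort(); it is controlled per column with Column methods such as asc_nulls_last() and desc_nulls_first().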

Example 1: Sort the data frame by the ascending order of the “Name” of the employee.

Python3




# Sort the dataframe by ascending 
# order of 'Name'
df.sort(["Name"],ascending = [True]).show()


Output :

Example 2: Sort the data frame by the descending order of the “Name” of the employee.

Python3




# Sort the dataframe by descending order of 'Name'
df.sort(["Name"],ascending = [False]).show()


Output:

Example 3: Sort multiple columns in ascending order.

Python3




# Sort the dataframe by ascending order of 'Name', then 'salary'.
# A single Boolean applies to every column; a list must have
# one entry per column
df.sort(["Name", "salary"], ascending=True).show()


Output:


