
PySpark – Order by multiple columns

Last Updated : 19 Dec, 2021

In this article, we are going to see how to order by multiple columns in PySpark DataFrames using Python.

Create the dataframe for demonstration:

Python3
# importing module
import pyspark
  
# importing sparksession from pyspark.sql module
from pyspark.sql import SparkSession
  
# creating sparksession and giving an app name
spark = SparkSession.builder.appName('sparkdf').getOrCreate()
  
# list of employee data
data = [["1", "sravan", "company 1"],
        ["2", "ojaswi", "company 1"],
        ["3", "rohith", "company 2"],
        ["4", "sridevi", "company 1"],
        ["5", "bobby", "company 1"]]
  
# specify column names
columns = ['ID', 'NAME', 'Company']
  
# creating a dataframe from the lists of data
dataframe = spark.createDataFrame(data, columns)
  
dataframe.show()


Output:

+---+-------+---------+
| ID|   NAME|  Company|
+---+-------+---------+
|  1| sravan|company 1|
|  2| ojaswi|company 1|
|  3| rohith|company 2|
|  4|sridevi|company 1|
|  5|  bobby|company 1|
+---+-------+---------+

Ordering by multiple columns means sorting the DataFrame by several columns at once, in ascending or descending order. We can do this using the following methods.

Method 1: Using orderBy()

This function returns the DataFrame sorted by the given columns: rows are ordered by the first column listed, with ties broken by each subsequent column.

Syntax:

  • Ascending order: dataframe.orderBy(['column1', 'column2', …, 'column n'], ascending=True).show()
  • Descending order: dataframe.orderBy(['column1', 'column2', …, 'column n'], ascending=False).show()

where:

  • dataframe is the input PySpark DataFrame
  • ascending=True sorts the dataframe in ascending order
  • ascending=False sorts the dataframe in descending order

Example 1: Sort the PySpark dataframe in ascending order with orderBy().

Python3
# importing module
import pyspark
  
# importing sparksession from pyspark.sql module
from pyspark.sql import SparkSession
  
# creating sparksession and giving an app name
spark = SparkSession.builder.appName('sparkdf').getOrCreate()
  
# list of employee data
data = [["1", "sravan", "company 1"],
        ["2", "ojaswi", "company 1"],
        ["3", "rohith", "company 2"],
        ["4", "sridevi", "company 1"],
        ["5", "bobby", "company 1"]]
  
# specify column names
columns = ['ID', 'NAME', 'Company']
  
# creating a dataframe from the lists of data
dataframe = spark.createDataFrame(data, columns)
  
# sort the dataframe in ascending order by NAME, then ID, then Company
dataframe.orderBy(['NAME', 'ID', 'Company'],
                  ascending=True).show()


Output:

+---+-------+---------+
| ID|   NAME|  Company|
+---+-------+---------+
|  5|  bobby|company 1|
|  2| ojaswi|company 1|
|  3| rohith|company 2|
|  1| sravan|company 1|
|  4|sridevi|company 1|
+---+-------+---------+

Example 2: Sort the PySpark dataframe in descending order with orderBy().

Python3
# importing module
import pyspark
  
# importing sparksession from pyspark.sql module
from pyspark.sql import SparkSession
  
# creating sparksession and giving an app name
spark = SparkSession.builder.appName('sparkdf').getOrCreate()
  
# list of employee data
data = [["1", "sravan", "company 1"],
        ["2", "ojaswi", "company 1"],
        ["3", "rohith", "company 2"],
        ["4", "sridevi", "company 1"],
        ["5", "bobby", "company 1"]]
  
# specify column names
columns = ['ID', 'NAME', 'Company']
  
# creating a dataframe from the lists of data
dataframe = spark.createDataFrame(data, columns)
  
# sort the dataframe in descending order by NAME, then ID, then Company
dataframe.orderBy(['NAME', 'ID', 'Company'],
                  ascending=False).show()


Output:

+---+-------+---------+
| ID|   NAME|  Company|
+---+-------+---------+
|  4|sridevi|company 1|
|  1| sravan|company 1|
|  3| rohith|company 2|
|  2| ojaswi|company 1|
|  5|  bobby|company 1|
+---+-------+---------+

Method 2: Using sort()

sort() behaves the same way: it returns the DataFrame sorted by the given columns, ordering by the first column listed and breaking ties with each subsequent column.

Syntax:

  • Ascending order: dataframe.sort(['column1', 'column2', …, 'column n'], ascending=True).show()
  • Descending order: dataframe.sort(['column1', 'column2', …, 'column n'], ascending=False).show()

where,

  1. dataframe is the input PySpark DataFrame
  2. ascending=True sorts the dataframe in ascending order
  3. ascending=False sorts the dataframe in descending order

Example 1: Sort PySpark dataframe in ascending order

Python3
# importing module
import pyspark
  
# importing sparksession from pyspark.sql module
from pyspark.sql import SparkSession
  
# creating sparksession and giving an app name
spark = SparkSession.builder.appName('sparkdf').getOrCreate()
  
# list of employee data
data = [["1", "sravan", "company 1"],
        ["2", "ojaswi", "company 1"],
        ["3", "rohith", "company 2"],
        ["4", "sridevi", "company 1"],
        ["5", "bobby", "company 1"]]
  
# specify column names
columns = ['ID', 'NAME', 'Company']
  
# creating a dataframe from the lists of data
dataframe = spark.createDataFrame(data, columns)
  
# sort the dataframe in ascending order by NAME, then ID, then Company
dataframe.sort(['NAME', 'ID', 'Company'],
               ascending=True).show()


Output:

+---+-------+---------+
| ID|   NAME|  Company|
+---+-------+---------+
|  5|  bobby|company 1|
|  2| ojaswi|company 1|
|  3| rohith|company 2|
|  1| sravan|company 1|
|  4|sridevi|company 1|
+---+-------+---------+

Example 2: Sort the PySpark dataframe in descending order

Python3
# importing module
import pyspark
  
# importing sparksession from pyspark.sql module
from pyspark.sql import SparkSession
  
# creating sparksession and giving an app name
spark = SparkSession.builder.appName('sparkdf').getOrCreate()
  
# list of employee data
data = [["1", "sravan", "company 1"],
        ["2", "ojaswi", "company 1"],
        ["3", "rohith", "company 2"],
        ["4", "sridevi", "company 1"],
        ["5", "bobby", "company 1"]]
  
# specify column names
columns = ['ID', 'NAME', 'Company']
  
# creating a dataframe from the lists of data
dataframe = spark.createDataFrame(data, columns)
  
# sort the dataframe in descending order by NAME, then ID, then Company
dataframe.sort(['NAME', 'ID', 'Company'],
               ascending=False).show()


Output:

+---+-------+---------+
| ID|   NAME|  Company|
+---+-------+---------+
|  4|sridevi|company 1|
|  1| sravan|company 1|
|  3| rohith|company 2|
|  2| ojaswi|company 1|
|  5|  bobby|company 1|
+---+-------+---------+


