How to Order a PySpark DataFrame by Multiple Columns?
Last Updated: 17 Jun, 2021
In this article, we order a PySpark DataFrame by multiple columns using the orderBy() function. Ordering the rows means arranging them in ascending or descending order. We first create a DataFrame from a nested list and then sort it.
The orderBy() function sorts a DataFrame by one or more columns. By default, it sorts in ascending order.
Syntax: dataframe.orderBy(*cols, ascending=True)
Parameters:
- cols: Column names or Column objects to sort by. A single list of them is also accepted.
- ascending: Boolean, or a list of booleans with one entry per column. True (the default) sorts in ascending order; False sorts in descending order.
Example program to create a DataFrame with student data:
Python3
# Import the entry point for DataFrame functionality
from pyspark.sql import SparkSession

# Create (or reuse) a Spark session
spark = SparkSession.builder.appName('sparkdf').getOrCreate()

# Student data as a nested list; note that one row appears twice
data = [["1", "sravan", "vignan"],
        ["2", "ojaswi", "vvit"],
        ["3", "rohith", "vvit"],
        ["4", "sridevi", "vignan"],
        ["1", "sravan", "vignan"],
        ["5", "gnanesh", "iit"]]

# Column names for the DataFrame
columns = ['student ID', 'student NAME', 'college']

# Build the DataFrame and display it
dataframe = spark.createDataFrame(data, columns)
print("Actual data in dataframe")
dataframe.show()
Output:
Actual data in dataframe
+----------+------------+-------+
|student ID|student NAME|college|
+----------+------------+-------+
|         1|      sravan| vignan|
|         2|      ojaswi|   vvit|
|         3|      rohith|   vvit|
|         4|     sridevi| vignan|
|         1|      sravan| vignan|
|         5|     gnanesh|    iit|
+----------+------------+-------+
Example 1: Sort the DataFrame by two columns in descending order using the orderBy() function
Python3
dataframe.orderBy(['student ID', 'student NAME'],
                  ascending=False).show()
Output:
+----------+------------+-------+
|student ID|student NAME|college|
+----------+------------+-------+
|         5|     gnanesh|    iit|
|         4|     sridevi| vignan|
|         3|      rohith|   vvit|
|         2|      ojaswi|   vvit|
|         1|      sravan| vignan|
|         1|      sravan| vignan|
+----------+------------+-------+
Example 2: Sort the DataFrame by two columns in ascending order using the orderBy() function
Python3
dataframe.orderBy(['student ID', 'student NAME'],
                  ascending=True).show()
Output:
+----------+------------+-------+
|student ID|student NAME|college|
+----------+------------+-------+
|         1|      sravan| vignan|
|         1|      sravan| vignan|
|         2|      ojaswi|   vvit|
|         3|      rohith|   vvit|
|         4|     sridevi| vignan|
|         5|     gnanesh|    iit|
+----------+------------+-------+