How to Order PySpark DataFrame by Multiple Columns?
In this article, we are going to order a PySpark DataFrame by multiple columns using the orderBy() function. Ordering means arranging the rows in ascending or descending order of the values in the chosen columns. We will create the DataFrame from a nested list and then sort it.
The orderBy() function sorts the rows of a DataFrame by one or more columns and returns a new DataFrame. By default, it sorts in ascending order.
Syntax: orderBy(*cols, ascending=True)
Parameters:
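- cols: The column names (strings) or Column objects to sort by, passed as separate arguments or as a single list. Rows are sorted by the first column, and each later column breaks ties in the ones before it.
- ascending: A boolean, or a list of booleans with one entry per column, that selects ascending (True) or descending (False) order. The default is True.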
Example program to create a DataFrame with student data:
Python3
# importing module
import pyspark

# importing sparksession from pyspark.sql module
from pyspark.sql import SparkSession

# creating sparksession and giving an app name
spark = SparkSession.builder.appName('sparkdf').getOrCreate()

# list of students data
data = [["1", "sravan", "vignan"],
        ["2", "ojaswi", "vvit"],
        ["3", "rohith", "vvit"],
        ["4", "sridevi", "vignan"],
        ["1", "sravan", "vignan"],
        ["5", "gnanesh", "iit"]]

# specify column names
columns = ['student ID', 'student NAME', 'college']

# creating a dataframe from the lists of data
dataframe = spark.createDataFrame(data, columns)

print("Actual data in dataframe")

# show dataframe
dataframe.show()
Output:
Actual data in dataframe
+----------+------------+-------+
|student ID|student NAME|college|
+----------+------------+-------+
|         1|      sravan| vignan|
|         2|      ojaswi|   vvit|
|         3|      rohith|   vvit|
|         4|     sridevi| vignan|
|         1|      sravan| vignan|
|         5|     gnanesh|    iit|
+----------+------------+-------+
Example 1: Python program to sort the DataFrame by two columns in descending order using the orderBy() function
Python3
# show dataframe by sorting the dataframe based
# on two columns in descending order using the orderBy() function
dataframe.orderBy(['student ID', 'student NAME'],
                  ascending=False).show()
Output:
+----------+------------+-------+
|student ID|student NAME|college|
+----------+------------+-------+
|         5|     gnanesh|    iit|
|         4|     sridevi| vignan|
|         3|      rohith|   vvit|
|         2|      ojaswi|   vvit|
|         1|      sravan| vignan|
|         1|      sravan| vignan|
+----------+------------+-------+
Example 2: Python program to sort the DataFrame by two columns in ascending order using the orderBy() function
Python3
# show dataframe by sorting the dataframe
# based on two columns in ascending order
# using the orderBy() function
dataframe.orderBy(['student ID', 'student NAME'],
                  ascending=True).show()
Output:
+----------+------------+-------+
|student ID|student NAME|college|
+----------+------------+-------+
|         1|      sravan| vignan|
|         1|      sravan| vignan|
|         2|      ojaswi|   vvit|
|         3|      rohith|   vvit|
|         4|     sridevi| vignan|
|         5|     gnanesh|    iit|
+----------+------------+-------+