How to sort by value in PySpark?
Last Updated :
18 Jul, 2021
In this article, we are going to sort the rows of a PySpark RDD by the values in a chosen column, using the sortBy() and takeOrdered() methods.
Creating RDD for demonstration:
Python3
from pyspark.sql import SparkSession, Row
# Create (or reuse) a Spark session
spark = SparkSession.builder.appName('sparkdf').getOrCreate()
# A list of Row objects; the brackets keep all three rows in one collection
data = [Row(First_name="Sravan", Last_name="Kumar", age=23),
        Row(First_name="Ojaswi", Last_name="Pinkey", age=16),
        Row(First_name="Rohith", Last_name="Devi", age=7)]
# Distribute the rows as an RDD, then collect them back to the driver
rdd = spark.sparkContext.parallelize(data)
rdd.collect()
Output:
[Row(First_name='Sravan', Last_name='Kumar', age=23),
Row(First_name='Ojaswi', Last_name='Pinkey', age=16),
Row(First_name='Rohith', Last_name='Devi', age=7)]
Method 1: Using sortBy()
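sortBy() is an RDD method that sorts the elements by the value a key function returns for each of them. It returns a new RDD, sorted in ascending order by default.
Syntax: rdd.sortBy(keyfunc, ascending=True, numPartitions=None)
keyfunc is usually a lambda expression that selects the column to sort by.
lambda expression: lambda x: x[column_index], where column_index starts at 0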
Example 1: Sort the data by the first column, First_name (index 0)
Python3
rdd.sortBy(lambda x: x[0]).collect()
Output:
[Row(First_name='Ojaswi', Last_name='Pinkey', age=16),
Row(First_name='Rohith', Last_name='Devi', age=7),
Row(First_name='Sravan', Last_name='Kumar', age=23)]
Example 2: Sort the data by the third column, age (index 2)
Python3
rdd.sortBy(lambda x: x[2]).collect()
Output:
[Row(First_name='Rohith', Last_name='Devi', age=7),
Row(First_name='Ojaswi', Last_name='Pinkey', age=16),
Row(First_name='Sravan', Last_name='Kumar', age=23)]
Method 2: Using takeOrdered()
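takeOrdered() is an RDD method that returns the first n elements in ascending order of the value a key function returns, such as the value in a particular column. Unlike sortBy(), it returns a Python list to the driver rather than a new RDD.
Syntax: rdd.takeOrdered(n, lambda expression)
where n is the number of rows to return after sorting
Example: Sort the rows by a particular column using takeOrdered(); with n = 3, all three rows are returned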
Python3
print(rdd.takeOrdered(3, lambda x: x[0]))
print(rdd.takeOrdered(3, lambda x: x[2]))
Output:
[Row(First_name='Ojaswi', Last_name='Pinkey', age=16), Row(First_name='Rohith', Last_name='Devi', age=7), Row(First_name='Sravan', Last_name='Kumar', age=23)]
[Row(First_name='Rohith', Last_name='Devi', age=7), Row(First_name='Ojaswi', Last_name='Pinkey', age=16), Row(First_name='Sravan', Last_name='Kumar', age=23)]