# How to sort by value in PySpark?

Last Updated: 18 Jul, 2021

In this article, we are going to sort the rows of a PySpark RDD by the values in a chosen column.

Creating an RDD for demonstration:


```python
# importing module
from pyspark.sql import SparkSession, Row

# creating sparksession and giving an app name
spark = SparkSession.builder.appName('sparkdf').getOrCreate()

# create a list of 3 Rows, each with 3 columns
data = [Row(First_name="Sravan", Last_name="Kumar", age=23),
        Row(First_name="Ojaswi", Last_name="Pinkey", age=16),
        Row(First_name="Rohith", Last_name="Devi", age=7)]

# create an RDD from the list of rows
rdd = spark.sparkContext.parallelize(data)

# display data
print(rdd.collect())
```

Output:

```
[Row(First_name='Sravan', Last_name='Kumar', age=23),
Row(First_name='Ojaswi', Last_name='Pinkey', age=16),
Row(First_name='Rohith', Last_name='Devi', age=7)]
```

### Method 1: Using sortBy()

sortBy() is an RDD transformation that sorts the elements by a key computed from each element. It returns a new, sorted RDD, so an action such as collect() is needed to see the result.

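Syntax: rdd.sortBy(keyfunc, ascending=True, numPartitions=None)

The first argument is a key function, usually a lambda expression, that returns the value to sort on for each row. To sort on a single column, return that column by its position:

lambda expression: lambda x: x[column_index]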
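Because pyspark.sql.Row is a subclass of tuple, x[2] and x.age refer to the same field of a row. The key-function idea can be tried locally without a Spark cluster: the sketch below uses Python's built-in sorted(), which takes a key function in the same way sortBy() does. It assumes a plain namedtuple called Person as a hypothetical stand-in for Row; it is a local analogue for illustration, not Spark code.

```python
from collections import namedtuple

# Hypothetical stand-in for pyspark.sql.Row: a tuple whose fields can be
# read by position (x[2]) or by name (x.age)
Person = namedtuple("Person", ["First_name", "Last_name", "age"])

people = [
    Person("Sravan", "Kumar", 23),
    Person("Ojaswi", "Pinkey", 16),
    Person("Rohith", "Devi", 7),
]

# The key function returns the value to sort on for each element,
# just like the lambda passed to rdd.sortBy()
by_index = sorted(people, key=lambda x: x[2])
by_name = sorted(people, key=lambda x: x.age)

print([p.First_name for p in by_index])  # ['Rohith', 'Ojaswi', 'Sravan']
print(by_index == by_name)               # True
```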

Example 1: Sort the data by the values in the first column (First_name, index 0)


```python
# sort the data by the values in column 1 (First_name, index 0)
rdd.sortBy(lambda x: x[0]).collect()
```

Output:

```
[Row(First_name='Ojaswi', Last_name='Pinkey', age=16),
Row(First_name='Rohith', Last_name='Devi', age=7),
Row(First_name='Sravan', Last_name='Kumar', age=23)]
```
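Since First_name is a string, Example 1 relies on Python string comparison, which orders characters by Unicode code point, so every uppercase letter sorts before every lowercase one. Lower-casing inside the key, as in rdd.sortBy(lambda x: x[0].lower()), gives case-insensitive order. The local sketch below shows the difference with sorted(); the lower-case name "anil" and the Person namedtuple (a hypothetical stand-in for Row) are made-up illustration data.

```python
from collections import namedtuple

# Hypothetical stand-in for pyspark.sql.Row
Person = namedtuple("Person", ["First_name", "Last_name", "age"])

# Made-up rows; "anil" is deliberately lower-case
people = [
    Person("Sravan", "Kumar", 23),
    Person("Ojaswi", "Pinkey", 16),
    Person("anil", "Rao", 30),
]

# Default string comparison puts all uppercase letters before lowercase ones
case_sensitive = sorted(people, key=lambda x: x[0])

# Lower-casing inside the key gives alphabetical order regardless of case
case_insensitive = sorted(people, key=lambda x: x[0].lower())

print([p.First_name for p in case_sensitive])    # ['Ojaswi', 'Sravan', 'anil']
print([p.First_name for p in case_insensitive])  # ['anil', 'Ojaswi', 'Sravan']
```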

Example 2: Sort the data by the values in the third column (age, index 2)


```python
# sort the data by the values in column 3 (age, index 2)
rdd.sortBy(lambda x: x[2]).collect()
```

Output:

```
[Row(First_name='Rohith', Last_name='Devi', age=7),
Row(First_name='Ojaswi', Last_name='Pinkey', age=16),
Row(First_name='Sravan', Last_name='Kumar', age=23)]
```
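sortBy() also takes an ascending argument: rdd.sortBy(lambda x: x[2], ascending=False).collect() would return the rows from oldest to youngest. A key function can also return a tuple, so that ties on one column are broken by another. The plain-Python sketch below mirrors both ideas with sorted(), where reverse=True plays the role of ascending=False; the extra row "Anil" (sharing age 16 with "Ojaswi") and the Person namedtuple standing in for Row are made-up illustration data.

```python
from collections import namedtuple

# Hypothetical stand-in for pyspark.sql.Row
Person = namedtuple("Person", ["First_name", "Last_name", "age"])

# Made-up sample rows; two share the age 16 to show tie-breaking
people = [
    Person("Sravan", "Kumar", 23),
    Person("Ojaswi", "Pinkey", 16),
    Person("Rohith", "Devi", 7),
    Person("Anil", "Rao", 16),
]

# Descending by age: local counterpart of
# rdd.sortBy(lambda x: x[2], ascending=False)
oldest_first = sorted(people, key=lambda x: x[2], reverse=True)

# Tuple key: sort by age, then by First_name when ages are equal
by_age_then_name = sorted(people, key=lambda x: (x[2], x[0]))

print([p.age for p in oldest_first])               # [23, 16, 16, 7]
print([p.First_name for p in by_age_then_name])    # ['Rohith', 'Anil', 'Ojaswi', 'Sravan']
```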

### Method 2: Using takeOrdered()

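takeOrdered() is an RDD action that returns the first n elements of the RDD in ascending order, optionally ordered by a key function. Unlike sortBy(), it returns a plain Python list to the driver instead of a new RDD.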

Syntax: rdd.takeOrdered(num, key=None)

where num is the number of rows to return after sorting, and key is an optional function, such as lambda x: x[column_index], that returns the value to sort on.
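The contract of takeOrdered() is the same as "sort by key, then keep the first num elements". Python's heapq.nsmallest(n, iterable, key=None) has that same contract, so the local sketch below uses it on plain tuples (First_name, Last_name, age) standing in for Row objects. Negating a numeric key is a common way to get the largest values instead, for example rdd.takeOrdered(2, key=lambda x: -x[2]). This is an illustration of the semantics, not Spark code.

```python
import heapq

# Plain tuples (First_name, Last_name, age) standing in for Row objects
data = [
    ("Sravan", "Kumar", 23),
    ("Ojaswi", "Pinkey", 16),
    ("Rohith", "Devi", 7),
]

# The 2 smallest by age, matching rdd.takeOrdered(2, key=lambda x: x[2])
youngest_two = heapq.nsmallest(2, data, key=lambda x: x[2])

# Negating a numeric key returns the largest values instead
oldest_two = heapq.nsmallest(2, data, key=lambda x: -x[2])

print(youngest_two)  # [('Rohith', 'Devi', 7), ('Ojaswi', 'Pinkey', 16)]
print(oldest_two)    # [('Sravan', 'Kumar', 23), ('Ojaswi', 'Pinkey', 16)]
```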

Example: Sort the rows by a particular column using takeOrdered()


```python
# sort values based on column 1 (First_name) using takeOrdered
print(rdd.takeOrdered(3, lambda x: x[0]))

# sort values based on column 3 (age) using takeOrdered
print(rdd.takeOrdered(3, lambda x: x[2]))
```

Output:

```
[Row(First_name='Ojaswi', Last_name='Pinkey', age=16), Row(First_name='Rohith', Last_name='Devi', age=7), Row(First_name='Sravan', Last_name='Kumar', age=23)]
[Row(First_name='Rohith', Last_name='Devi', age=7), Row(First_name='Ojaswi', Last_name='Pinkey', age=16), Row(First_name='Sravan', Last_name='Kumar', age=23)]
```
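Because takeOrdered() brings its whole result back to the driver, the Spark documentation recommends it only when that result is expected to be small. It also does not need a full distributed sort: each partition only has to keep its own num smallest elements before the partial results are merged. The sketch below imitates that idea on a hypothetical list of in-memory partitions using heapq.nsmallest; take_ordered is a made-up helper name, and this is a simplified model for illustration, not Spark's actual code path.

```python
import heapq
from functools import reduce

# Hypothetical partitions of an RDD of (First_name, Last_name, age) tuples
partitions = [
    [("Sravan", "Kumar", 23), ("Ojaswi", "Pinkey", 16)],
    [("Rohith", "Devi", 7)],
]

def take_ordered(parts, num, key=None):
    """Hypothetical helper: keep at most num items per partition,
    then merge the partial results pairwise, keeping num items each time."""
    partial = [heapq.nsmallest(num, part, key=key) for part in parts]
    return reduce(lambda a, b: heapq.nsmallest(num, a + b, key=key), partial)

# Flatten the partitions to compare against a full sort
all_rows = [row for part in partitions for row in part]

result = take_ordered(partitions, 2, key=lambda x: x[2])
print(result)  # [('Rohith', 'Devi', 7), ('Ojaswi', 'Pinkey', 16)]
print(result == sorted(all_rows, key=lambda x: x[2])[:2])  # True
```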
