Extract First and last N rows from PySpark DataFrame

  • Last Updated : 06 Jun, 2021

In this article, we are going to extract the first N rows and the last N rows from a dataframe using PySpark in Python. To do this, we will first create a sample dataframe.

We have to create a Spark session object with the SparkSession builder, giving it an app name and then calling the getOrCreate() method:

spark = SparkSession.builder.appName('sparkdf').getOrCreate()

Finally, after creating the data as a list of rows, we pass it together with the list of column names to the createDataFrame() method:

dataframe = spark.createDataFrame(data, columns)

Python3




# importing module
import pyspark
  
# importing sparksession from pyspark.sql module
from pyspark.sql import SparkSession
  
# creating sparksession and giving an app name
spark = SparkSession.builder.appName('sparkdf').getOrCreate()
  
# list  of employee data with 5 row values
data = [["1", "sravan", "company 1"],
        ["2", "ojaswi", "company 2"],
        ["3", "bobby", "company 3"],
        ["4", "rohith", "company 2"],
        ["5", "gnanesh", "company 1"]]
  
# specify column names
columns = ['Employee ID', 'Employee NAME', 'Company Name']
  
# creating a dataframe from the lists of data
dataframe = spark.createDataFrame(data, columns)
  
print('Actual data in dataframe')
dataframe.show()

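Output:

Actual data in dataframe
+-----------+-------------+------------+
|Employee ID|Employee NAME|Company Name|
+-----------+-------------+------------+
|          1|       sravan|   company 1|
|          2|       ojaswi|   company 2|
|          3|        bobby|   company 3|
|          4|       rohith|   company 2|
|          5|      gnanesh|   company 1|
+-----------+-------------+------------+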

Extracting first N rows

We can extract the first N rows with several methods, each discussed below with an example:

Method 1: Using head()

This function extracts the top N rows of the given dataframe and returns them as a list of Row objects.

Syntax: dataframe.head(n)

where, 

  • n specifies the number of rows to be extracted from the top
  • dataframe is the dataframe name created from the nested lists using pyspark.

Python3




print("Top 2 rows ")
  
# extract top 2 rows
a = dataframe.head(2)
print(a)
  
print("Top 1 row ")
  
# extract top 1 row
a = dataframe.head(1)
print(a)

Output:



Top 2 rows

[Row(Employee ID='1', Employee NAME='sravan', Company Name='company 1'),

Row(Employee ID='2', Employee NAME='ojaswi', Company Name='company 2')]

Top 1 row

[Row(Employee ID='1', Employee NAME='sravan', Company Name='company 1')]

Method 2: Using first()

This function extracts only the first row of the dataframe and returns it as a single Row object rather than a list.

Syntax: dataframe.first()

  • It doesn’t take any parameter
  • dataframe is the dataframe name created from the nested lists using pyspark

Python3




print("Top row ")
  
# extract top  row
a = dataframe.first()
print(a)

Output:



Top row  

Row(Employee ID='1', Employee NAME='sravan', Company Name='company 1')

Method 3: Using show() 

show() displays the dataframe from top to bottom. It prints the rows to the console and does not return them.

Syntax: dataframe.show(n)

where,

  • dataframe is the input dataframe
  • n is the number of rows to be displayed from the top; if n is not specified, only the first 20 rows are printed

Python3




# show() function to get 
# 2 rows
dataframe.show(2)

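Output:

+-----------+-------------+------------+
|Employee ID|Employee NAME|Company Name|
+-----------+-------------+------------+
|          1|       sravan|   company 1|
|          2|       ojaswi|   company 2|
+-----------+-------------+------------+
only showing top 2 rows
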

Extracting Last N rows

Extracting the last rows means getting the last N rows from the given dataframe. For this, we use the tail() function, which returns the last N rows as a list of Row objects. tail() is available from Spark 3.0 onward.



Syntax: dataframe.tail(n)

where,

  • n is the number of rows to get from the end
  • dataframe is the input dataframe

Example:

Python3




print("Last 2 rows ")
  
# extract last 2 rows
a = dataframe.tail(2)
print(a)
  
print("Last 1 row ")
  
# extract last 1 row
a = dataframe.tail(1)
print(a)

Output:

Last 2 rows

[Row(Employee ID='4', Employee NAME='rohith', Company Name='company 2'),

Row(Employee ID='5', Employee NAME='gnanesh', Company Name='company 1')]

Last 1 row

[Row(Employee ID='5', Employee NAME='gnanesh', Company Name='company 1')]
