How to Check if PySpark DataFrame is empty?

In this article, we will check whether a PySpark DataFrame is empty.

First, let's create an empty DataFrame with a defined schema.

Python3

# import modules
from pyspark.sql import SparkSession
from pyspark.sql.types import StructType, StructField, StringType

# defining schema
schema = StructType([
    StructField('COUNTRY', StringType(), True),
    StructField('CITY', StringType(), True),
    StructField('CAPITAL', StringType(), True)
])

# Create Spark Object
spark = SparkSession.builder.appName("TestApp").getOrCreate()

# Create Empty DataFrame with Schema.
df = spark.createDataFrame([], schema)

# Show schema and data
df.printSchema()
df.show(truncate=False)

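Output:

root
 |-- COUNTRY: string (nullable = true)
 |-- CITY: string (nullable = true)
 |-- CAPITAL: string (nullable = true)

+-------+----+-------+
|COUNTRY|CITY|CAPITAL|
+-------+----+-------+
+-------+----+-------+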

Checking whether the DataFrame is empty

There are multiple ways to check:

Method 1: isEmpty()

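In PySpark 3.3.0 and later, the DataFrame has an isEmpty() method that returns True when the DataFrame has no rows and False otherwise. On older versions, the same check can be made by looking at the first row with head(1) or first(), or by calling isEmpty() on the underlying RDD. If the variable is None rather than an empty DataFrame, calling isEmpty() on it raises an AttributeError, which is the Python counterpart of the NullPointerException seen in Scala and Java.

Note: In PySpark, calling df.head() or df.first() on an empty DataFrame returns None, and df.head(1) returns an empty list. The java.util.NoSuchElementException: next on empty iterator exception is raised by the Scala API, not by PySpark.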

Python3

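# head(1) returns a list of at most one Row; an empty list means no rows
print(len(df.head(1)) == 0)
# first() takes no arguments and returns None for an empty DataFrame
print(df.first() is None)
# the underlying RDD has its own isEmpty() method
print(df.rdd.isEmpty())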

Output:

True
True
True
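
Because the available calls differ between Spark versions, the check can be wrapped in a small helper. The following is only a minimal sketch: the name is_empty is hypothetical (it is not part of PySpark), and it assumes that DataFrame.isEmpty() is present on PySpark 3.3.0 and later and that head(1) returns a list on every version.

```python
def is_empty(df):
    """Return True if the DataFrame `df` has no rows.

    `is_empty` is a hypothetical helper name, not a PySpark function.
    """
    # DataFrame.isEmpty() is available in PySpark 3.3.0 and later.
    if hasattr(df, "isEmpty"):
        return df.isEmpty()
    # Fallback for older versions: head(1) returns a list holding
    # at most one Row, so an empty list means there are no rows.
    return len(df.head(1)) == 0
```

With the DataFrame created at the start of this article, is_empty(df) would be expected to return True.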

Method 2: count()

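count() computes the number of rows across all partitions on all nodes. It therefore triggers a full Spark job and can be expensive on a large DataFrame when all we need to know is whether any row exists.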

Code:

Python3

print(df.count() > 0)
print(df.count() == 0)

Output:

False
True
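
Since count() scans the whole DataFrame, one common way to reduce the work is to cap the DataFrame at a single row before counting. The sketch below assumes only the standard DataFrame.limit() and DataFrame.count() methods; the function name is_empty_by_count is hypothetical and chosen for illustration.

```python
def is_empty_by_count(df):
    """Return True if `df` has no rows, counting at most one row.

    `is_empty_by_count` is a hypothetical helper name, not a PySpark function.
    """
    # limit(1) caps the result at a single row, so count() can only
    # return 0 (empty) or 1 (at least one row exists).
    return df.limit(1).count() == 0
```

For the empty DataFrame used in this article, is_empty_by_count(df) would be expected to return True, the same result as df.count() == 0.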
