How to Check if a PySpark DataFrame Is Empty?
In this article, we are going to check whether a PySpark DataFrame is empty or not.
First, let's create an empty DataFrame.
Python3
from pyspark.sql import SparkSession
from pyspark.sql.types import StructType, StructField, StringType

# Schema with three nullable string columns
schema = StructType([
    StructField('COUNTRY', StringType(), True),
    StructField('CITY', StringType(), True),
    StructField('CAPITAL', StringType(), True)
])

spark = SparkSession.builder.appName("TestApp").getOrCreate()

# Create a DataFrame that has the schema above but no rows
df = spark.createDataFrame([], schema)
df.printSchema()
df.show(truncate=False)
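Output:
root
 |-- COUNTRY: string (nullable = true)
 |-- CITY: string (nullable = true)
 |-- CAPITAL: string (nullable = true)

+-------+----+-------+
|COUNTRY|CITY|CAPITAL|
+-------+----+-------+
+-------+----+-------+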
Checking whether the DataFrame is empty
There are multiple ways to check this:
Method 1: isEmpty()
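The isEmpty() method returns True when the DataFrame has no rows and False otherwise. In PySpark, DataFrame.isEmpty() is available from version 3.3.0; on older versions you can call isEmpty() on the underlying RDD (df.rdd.isEmpty()) or check whether df.head(1) returns an empty list. Note that isEmpty() must be called on an actual DataFrame: if the variable holds None instead of an empty DataFrame, the call fails (with an AttributeError in Python, or a NullPointerException on a null reference in Scala).
Note: in Scala, calling head() or first() on an empty DataFrame throws java.util.NoSuchElementException: next on empty iterator. In PySpark, df.head() and df.first() return None instead, and df.head(1) returns an empty list.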
Python3
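print(len(df.head(1)) == 0)  # head(1) returns a list of at most one Row
print(df.first() is None)    # first() returns None on an empty DataFrame
print(df.rdd.isEmpty())      # isEmpty() on the underlying RDD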
Output:
True
True
True
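The checks above can be wrapped in a small reusable function. The sketch below is a hypothetical helper (the name is_empty is our own, not part of PySpark): it uses DataFrame.isEmpty() when the installed PySpark provides it (3.3.0 and later) and otherwise falls back to len(df.head(1)) == 0. Treating None as empty is an assumption made here to avoid an AttributeError when the variable is None; drop that branch if a missing DataFrame should be reported as an error instead.

```python
from typing import TYPE_CHECKING, Optional

if TYPE_CHECKING:  # imported only for type hints; not executed at runtime
    from pyspark.sql import DataFrame


def is_empty(df: Optional["DataFrame"]) -> bool:
    """Return True if df has no rows (hypothetical helper, not a PySpark API).

    Assumption: None is treated as empty instead of raising AttributeError.
    """
    if df is None:
        return True
    # DataFrame.isEmpty() is available from PySpark 3.3.0 onward
    if hasattr(df, "isEmpty"):
        return df.isEmpty()
    # Older versions: fetch at most one row; an empty list means no rows
    return len(df.head(1)) == 0
```

With the empty df created earlier, print(is_empty(df)) prints True.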
Method 2: count()
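The count() method returns the number of rows, which it computes by scanning every partition on every node. Because it cannot stop after finding the first row, it is more expensive than the checks in Method 1 on large DataFrames.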
Code:
Python3
print(df.count() > 0)   # True if the DataFrame has at least one row
print(df.count() == 0)  # True if the DataFrame has no rows
Output:
False
True
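Because count() scans every partition, a common variant is to limit the DataFrame to one row before counting. The helper below is a hypothetical sketch (is_empty_via_count is our own name); it relies only on the standard DataFrame.limit() and DataFrame.count() methods. How much work limit(1) saves depends on the query plan, so treat it as a lighter alternative to a plain count() rather than a guaranteed constant-time check; head(1) from Method 1 is usually the simpler choice.

```python
from typing import TYPE_CHECKING

if TYPE_CHECKING:  # imported only for type hints; not executed at runtime
    from pyspark.sql import DataFrame


def is_empty_via_count(df: "DataFrame") -> bool:
    """Return True if df has no rows (hypothetical helper, not a PySpark API)."""
    # limit(1) keeps at most one row, so count() never has to
    # total every row of the DataFrame
    return df.limit(1).count() == 0
```

For the empty df above, print(is_empty_via_count(df)) prints True.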
Last Updated: 30 May, 2021