How to create an empty PySpark DataFrame?

Last Updated : 11 Aug, 2021

In this article, we are going to see how to create an empty PySpark DataFrame. An empty PySpark DataFrame is a DataFrame that contains no data; it may or may not specify a schema.

Creating an empty RDD without schema

We'll first create an empty RDD and then convert it into a DataFrame by passing an empty schema.

The emptyRDD() method creates an RDD without any data.
The createDataFrame() method creates a PySpark DataFrame from the specified data and schema.

Code:

Python3

# Import necessary libraries
from pyspark.sql import SparkSession
from pyspark.sql.types import StructType
 
# Create a spark session
spark = SparkSession.builder.appName('Empty_Dataframe').getOrCreate()
 
# Create an empty RDD
emp_RDD = spark.sparkContext.emptyRDD()
 
# Create empty schema
columns = StructType([])
 
# Create an empty DataFrame from the empty RDD and the empty schema
data = spark.createDataFrame(data=emp_RDD,
                             schema=columns)
 
# Print the dataframe
print('Dataframe :')
data.show()
 
# Print the schema
print('Schema :')
data.printSchema()

Output:

Dataframe :
++
||
++
++

Schema :
root

Creating an empty RDD with schema

Sometimes the file we expect to process never arrives. Downstream code still expects a DataFrame with the right columns, so we create an empty one manually with the appropriate schema.

Specify the schema of the DataFrame with the columns 'Name', 'Age' and 'Gender'.
Create a DataFrame from an empty RDD with the expected schema.

Code:

Python3

# Import necessary libraries
from pyspark.sql import SparkSession
from pyspark.sql.types import StructType, StructField, StringType
 
# Create a spark session
spark = SparkSession.builder.appName('Empty_Dataframe').getOrCreate()
 
# Create an empty RDD
emp_RDD = spark.sparkContext.emptyRDD()
 
# Create the expected schema
columns = StructType([StructField('Name', StringType(), True),
                      StructField('Age', StringType(), True),
                      StructField('Gender', StringType(), True)])

# Create an empty DataFrame from the empty RDD and the expected schema
df = spark.createDataFrame(data=emp_RDD,
                           schema=columns)
 
# Print the dataframe
print('Dataframe :')
df.show()
 
# Print the schema
print('Schema :')
df.printSchema()

Output:

Dataframe :
+----+---+------+
|Name|Age|Gender|
+----+---+------+
+----+---+------+

Schema :
root
 |-- Name: string (nullable = true)
 |-- Age: string (nullable = true)
 |-- Gender: string (nullable = true)

Creating an empty dataframe without schema

Create an empty schema named columns.
Pass an empty list ([]) as data and columns as schema to the createDataFrame() method.

Code:

Python3

# Import necessary libraries
from pyspark.sql import SparkSession
from pyspark.sql.types import StructType
 
# Create a spark session
spark = SparkSession.builder.appName('Empty_Dataframe').getOrCreate()
 
# Create an empty schema
columns = StructType([])
 
# Create an empty DataFrame with the empty schema
df = spark.createDataFrame(data=[],
                           schema=columns)
 
# Print the dataframe
print('Dataframe :')
df.show()
 
# Print the schema
print('Schema :')
df.printSchema()

Output:

Dataframe :
++
||
++
++

Schema :
root

Creating an empty dataframe with schema

Specify the schema of the DataFrame with the columns 'Name', 'Age' and 'Gender'.
Pass an empty list ([]) as data and columns as schema to the createDataFrame() method.

Code:

Python3

# Import necessary libraries
from pyspark.sql import SparkSession
from pyspark.sql.types import StructType, StructField, StringType
 
# Create a spark session
spark = SparkSession.builder.appName('Empty_Dataframe').getOrCreate()
 
# Create the expected schema
columns = StructType([StructField('Name', StringType(), True),
                      StructField('Age', StringType(), True),
                      StructField('Gender', StringType(), True)])

# Create an empty DataFrame with the expected schema
df = spark.createDataFrame(data=[],
                           schema=columns)
 
# Print the dataframe
print('Dataframe :')
df.show()
 
# Print the schema
print('Schema :')
df.printSchema()

Output:

Dataframe :
+----+---+------+
|Name|Age|Gender|
+----+---+------+
+----+---+------+

Schema :
root
 |-- Name: string (nullable = true)
 |-- Age: string (nullable = true)
 |-- Gender: string (nullable = true)
