Open In App

How to create an empty PySpark DataFrame ?

In this article, we are going to see how to create an empty PySpark dataframe. Empty Pysaprk dataframe is a dataframe containing no data and may or may not specify the schema of the dataframe.

Creating an empty RDD without schema

We’ll first create an empty RDD by specifying an empty schema.



Code:




# Import necessary libraries
from pyspark.sql import SparkSession
from pyspark.sql.types import *
 
# Create a spark session
spark = SparkSession.builder.appName('Empty_Dataframe').getOrCreate()
 
# Create an empty RDD
emp_RDD = spark.sparkContext.emptyRDD()
 
# Create empty schema
columns = StructType([])
 
# Create an empty RDD with empty schema
data = spark.createDataFrame(data = emp_RDD,
                             schema = columns)
 
# Print the dataframe
print('Dataframe :')
data.show()
 
# Print the schema
print('Schema :')
data.printSchema()

Output: 



Dataframe :
++
||
++
++

Schema :
root

Creating an emptyRDD with schema

It is possible that we will not get a file for processing. However, we must still manually create a DataFrame with the appropriate schema.

Code:




# Import necessary libraries
from pyspark.sql import SparkSession
from pyspark.sql.types import *
 
# Create a spark session
spark = SparkSession.builder.appName('Empty_Dataframe').getOrCreate()
 
# Create an empty RDD
emp_RDD = spark.sparkContext.emptyRDD()
 
# Create an expected schema
columns = StructType([StructField('Name',
                                  StringType(), True),
                    StructField('Age',
                                StringType(), True),
                    StructField('Gender',
                                StringType(), True)])
 
# Create an empty RDD with expected schema
df = spark.createDataFrame(data = emp_RDD,
                           schema = columns)
 
# Print the dataframe
print('Dataframe :')
df.show()
 
# Print the schema
print('Schema :')
df.printSchema()

Output :

Dataframe :
+----+---+------+
|Name|Age|Gender|
+----+---+------+
+----+---+------+

Schema :
root
 |-- Name: string (nullable = true)
 |-- Age: string (nullable = true)
 |-- Gender: string (nullable = true)

Creating an empty dataframe without schema

Code:




# Import necessary libraries
from pyspark.sql import SparkSession
from pyspark.sql.types import *
 
# Create a spark session
spark = SparkSession.builder.appName('Empty_Dataframe').getOrCreate()
 
# Create an empty schema
columns = StructType([])
 
# Create an empty dataframe with empty schema
df = spark.createDataFrame(data = [],
                           schema = columns)
 
# Print the dataframe
print('Dataframe :')
df.show()
 
# Print the schema
print('Schema :')
df.printSchema()

Output:

Dataframe :
++
||
++
++

Schema :
root

Creating an empty dataframe with schema

Code:




# Import necessary libraries
from pyspark.sql import SparkSession
from pyspark.sql.types import *
 
# Create a spark session
spark = SparkSession.builder.appName('Empty_Dataframe').getOrCreate()
 
# Create an expected schema
columns = StructType([StructField('Name',
                                  StringType(), True),
                    StructField('Age',
                                StringType(), True),
                    StructField('Gender',
                                StringType(), True)])
 
# Create a dataframe with expected schema
df = spark.createDataFrame(data = [],
                           schema = columns)
 
# Print the dataframe
print('Dataframe :')
df.show()
 
# Print the schema
print('Schema :')
df.printSchema()

Output :

Dataframe :
+----+---+------+
|Name|Age|Gender|
+----+---+------+
+----+---+------+

Schema :
root
 |-- Name: string (nullable = true)
 |-- Age: string (nullable = true)
 |-- Gender: string (nullable = true)

Article Tags :