How to create an empty PySpark DataFrame?

  • Last Updated : 11 Aug, 2021

In this article, we are going to see how to create an empty PySpark DataFrame. An empty PySpark DataFrame is a DataFrame that contains no rows; it may or may not have a schema (column names and types) defined.

Creating an empty DataFrame from an empty RDD without schema

We’ll first create an empty RDD and then convert it to a DataFrame by specifying an empty schema.

  • emptyRDD() method creates an RDD without any data.
  • createDataFrame() method creates a PySpark DataFrame from the specified data and schema.

Code:

Python3




# Import necessary libraries
from pyspark.sql import SparkSession
from pyspark.sql.types import StructType
 
# Create a spark session
spark = SparkSession.builder.appName('Empty_Dataframe').getOrCreate()
 
# Create an empty RDD
emp_RDD = spark.sparkContext.emptyRDD()
 
# Create empty schema
columns = StructType([])
 
# Create an empty DataFrame from the empty RDD with an empty schema
data = spark.createDataFrame(data = emp_RDD,
                             schema = columns)
 
# Print the dataframe
print('Dataframe :')
data.show()
 
# Print the schema
print('Schema :')
data.printSchema()

Output: 



Dataframe :
++
||
++
++

Schema :
root

Creating an empty DataFrame from an empty RDD with schema

An expected input file may sometimes be missing or empty. Downstream code that relies on specific columns still needs a DataFrame with the appropriate schema, so we create one manually.

  • Specify the schema as a StructType with the string columns Name, Age and Gender.
  • Create an empty DataFrame from the empty RDD with the expected schema.

Code:

Python3




# Import necessary libraries
from pyspark.sql import SparkSession
from pyspark.sql.types import StructType, StructField, StringType
 
# Create a spark session
spark = SparkSession.builder.appName('Empty_Dataframe').getOrCreate()
 
# Create an empty RDD
emp_RDD = spark.sparkContext.emptyRDD()
 
# Create an expected schema
columns = StructType([
    StructField('Name', StringType(), True),
    StructField('Age', StringType(), True),
    StructField('Gender', StringType(), True)
])
 
# Create an empty DataFrame from the empty RDD with the expected schema
df = spark.createDataFrame(data = emp_RDD,
                           schema = columns)
 
# Print the dataframe
print('Dataframe :')
df.show()
 
# Print the schema
print('Schema :')
df.printSchema()

Output:

Dataframe :
+----+---+------+
|Name|Age|Gender|
+----+---+------+
+----+---+------+

Schema :
root
 |-- Name: string (nullable = true)
 |-- Age: string (nullable = true)
 |-- Gender: string (nullable = true)

Creating an empty dataframe without schema

  • Create an empty schema, StructType([]), and store it in columns.
  • Pass an empty list ([]) as data and columns as schema to the createDataFrame() method.

Code:

Python3




# Import necessary libraries
from pyspark.sql import SparkSession
from pyspark.sql.types import StructType
 
# Create a spark session
spark = SparkSession.builder.appName('Empty_Dataframe').getOrCreate()
 
# Create an empty schema
columns = StructType([])
 
# Create an empty dataframe with empty schema
df = spark.createDataFrame(data = [],
                           schema = columns)
 
# Print the dataframe
print('Dataframe :')
df.show()
 
# Print the schema
print('Schema :')
df.printSchema()

Output:

Dataframe :
++
||
++
++

Schema :
root

Creating an empty dataframe with schema

  • Specify the schema as a StructType with the string columns Name, Age and Gender, and store it in columns.
  • Pass an empty list ([]) as data and columns as schema to the createDataFrame() method.

Code:

Python3




# Import necessary libraries
from pyspark.sql import SparkSession
from pyspark.sql.types import StructType, StructField, StringType
 
# Create a spark session
spark = SparkSession.builder.appName('Empty_Dataframe').getOrCreate()
 
# Create an expected schema
columns = StructType([
    StructField('Name', StringType(), True),
    StructField('Age', StringType(), True),
    StructField('Gender', StringType(), True)
])
 
# Create a dataframe with expected schema
df = spark.createDataFrame(data = [],
                           schema = columns)
 
# Print the dataframe
print('Dataframe :')
df.show()
 
# Print the schema
print('Schema :')
df.printSchema()

Output:

Dataframe :
+----+---+------+
|Name|Age|Gender|
+----+---+------+
+----+---+------+

Schema :
root
 |-- Name: string (nullable = true)
 |-- Age: string (nullable = true)
 |-- Gender: string (nullable = true)
