How to create an empty dataframe in Scala?

Last Updated : 29 Apr, 2024

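In this article, we will learn how to create an empty DataFrame in Scala with Apache Spark. The SparkSession object offers two ways to do this: the emptyDataFrame member returns a DataFrame with no columns and no rows, while the createDataFrame method, given an empty RDD and a schema, returns a DataFrame that has the columns you define but no rows.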

Syntax to create an empty DataFrame (no columns, no rows):

val df = spark.emptyDataFrame
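
As a minimal sketch of what this syntax produces, the snippet below checks that the result has neither columns nor rows. The application name and the `.master("local[*]")` setting are assumptions added so the sketch can run outside spark-shell; they are not part of the original syntax.

```scala
import org.apache.spark.sql.SparkSession

// Assumption: a local master, so the sketch also runs outside spark-shell.
val spark = SparkSession.builder()
  .appName("EmptyDataFrameNoSchema") // hypothetical application name
  .master("local[*]")
  .getOrCreate()

// An empty DataFrame without any schema
val df = spark.emptyDataFrame

println(df.columns.length) // prints 0: there are no columns
println(df.count())        // prints 0: there are no rows
```

Because this DataFrame has no columns, it is mostly useful as a placeholder. When the columns matter, for example before a union with other DataFrames, a schema has to be supplied, as in the example that follows.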

Example of how to create an empty DataFrame with a schema in Scala:

Scala
import org.apache.spark.sql.{DataFrame, Row, SparkSession}
import org.apache.spark.sql.types.{StructType, StructField, StringType}

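// Create SparkSession (when running outside spark-shell or spark-submit,
// also set a master, for example .master("local[*]"))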
val spark = SparkSession.builder()
  .appName("EmptyDataFrameExample")
  .getOrCreate()

// Define schema for the empty DataFrame
val schema = new StructType(Array(
  StructField("column_name", StringType, true)
))

// Create an empty DataFrame using createDataFrame 
// method with an empty RDD and the schema
val emptyDF: DataFrame = spark.createDataFrame(spark.sparkContext.emptyRDD[Row], schema)

// Show the schema of the empty DataFrame
emptyDF.printSchema()

Output:

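root
 |-- column_name: string (nullable = true)

The output shows the schema of the empty DataFrame: a single nullable string column named column_name. The DataFrame itself contains no rows.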

Explanation of the above example:

  1. Import SparkSession, DataFrame, and Row from the org.apache.spark.sql package, and StructType, StructField, and StringType from the org.apache.spark.sql.types package.
  2. Create a SparkSession object named spark.
  3. Define a schema for the empty DataFrame. In this example, we’re creating a DataFrame with a single column named “column_name” of type StringType. You can define your schema according to your requirements.
  4. Use the createDataFrame method of the SparkSession object (spark) to create an empty DataFrame. Pass an empty RDD of type Row and the schema you defined earlier.
  5. The resulting DataFrame (emptyDF) will have the schema defined earlier and no rows.
  6. Print the schema of the empty DataFrame using the printSchema method.
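
The steps above can be sketched with more than one column as follows. The column names `name` and `age`, the application name, and the local master setting are hypothetical choices made for illustration. The schema is built with the `add` method of StructType, which is an alternative to passing an Array of StructField.

```scala
import org.apache.spark.sql.{DataFrame, Row, SparkSession}
import org.apache.spark.sql.types.{IntegerType, StringType, StructType}

// Assumption: a local master, so the sketch also runs outside spark-shell.
val spark = SparkSession.builder()
  .appName("EmptyDataFrameCheck") // hypothetical application name
  .master("local[*]")
  .getOrCreate()

// Hypothetical two-column schema, built field by field
val schema = new StructType()
  .add("name", StringType, nullable = true)
  .add("age", IntegerType, nullable = true)

// Empty RDD of Row plus the schema gives a DataFrame with columns but no rows
val emptyDF: DataFrame =
  spark.createDataFrame(spark.sparkContext.emptyRDD[Row], schema)

println(emptyDF.columns.mkString(", ")) // prints: name, age
println(emptyDF.count())                // prints: 0
```

Checking `columns` and `count()` in this way is a quick confirmation that the DataFrame carries the intended schema and holds no data.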
