
How to check dataframe size in Scala?

Last Updated : 27 Mar, 2024

In this article, we will learn how to check the size of a DataFrame in Scala. The simplest way is the count() action, which returns the number of rows in the DataFrame as a Long.

Here’s how you can do it:

Syntax:

val size = dataframe.count()

Example #1:

Scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object DataFrameSizeCheck {
  def main(args: Array[String]): Unit = {
    // Create SparkSession
    val spark = SparkSession.builder()
      .appName("DataFrameSizeCheck")
      .master("local[*]")
      .getOrCreate()

    // Sample DataFrame (replace this with your actual DataFrame)
    val dataframe: DataFrame = spark.emptyDataFrame

    // Get the size of DataFrame
    val size = dataframe.count()

    // Print the size
    println(s"DataFrame size: $size")

    // Stop SparkSession
    spark.stop()
  }
}

Output:

DataFrame size: 0

Explanation:

  • We create a SparkSession.
  • We define a sample (empty) DataFrame; replace it with your actual DataFrame.
  • We call count() to get the size of the DataFrame, i.e., the number of rows it contains.
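Note that count() scans the entire DataFrame. If all you need to know is whether the DataFrame has any rows at all, Spark 2.4+ also provides the isEmpty method on Dataset/DataFrame, which can stop as soon as it finds a single row. A short sketch, reusing the SparkSession from the example above:

```scala
// Assumes the SparkSession `spark` from the example above.
val dataframe = spark.emptyDataFrame

// count() runs a job over the whole DataFrame...
println(dataframe.count() == 0)  // true for an empty DataFrame

// ...while isEmpty (Spark 2.4+) can short-circuit after the first row.
println(dataframe.isEmpty)       // true
```

On a large DataFrame, isEmpty is therefore usually cheaper than comparing count() against 0.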

Example #2:

Scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.types.{IntegerType, StringType, StructField, StructType}

object DataFrameSizeCheck {
  def main(args: Array[String]): Unit = {
    // Create SparkSession
    val spark = SparkSession.builder()
      .appName("DataFrameSizeCheck")
      .master("local[*]")
      .getOrCreate()

    // Sample data for DataFrame
    val data = Seq(
      (1, "John"),
      (2, "Alice"),
      (3, "Bob")
    )

    // Define the schema
    val schema = StructType(
      Seq(
        StructField("ID", IntegerType, nullable = false),
        StructField("Name", StringType, nullable = false)
      )
    )

    // Create DataFrame
    val dataframe: DataFrame = spark.createDataFrame(spark.sparkContext.parallelize(data), schema)

    // Get the size of DataFrame
    val size = dataframe.count()

    // Print the size
    println(s"DataFrame size: $size")

    // Stop SparkSession
    spark.stop()
  }
}

Output:

DataFrame size: 3

Explanation:

  • We create a SparkSession.
  • We define some sample data as a sequence of tuples.
  • We define the schema for the DataFrame, specifying the name, data type, and nullability of each column.
  • We create a DataFrame with the createDataFrame method, passing in the sample data and the schema.
  • We call count() to get the size of the DataFrame.
  • We print the size of the DataFrame.
  • Finally, we stop the SparkSession.
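If by "size" you also mean the number of columns, that is available from the schema via dataframe.columns, which does not trigger a Spark job. A sketch combining both, assuming the `dataframe` from Example #2:

```scala
// Assumes the DataFrame `dataframe` from Example #2 (3 rows, 2 columns).

// Row count: runs a Spark job over the data.
val rows: Long = dataframe.count()

// Column count: read from the schema, no job is run.
val cols: Int = dataframe.columns.length

println(s"DataFrame shape: $rows rows x $cols columns")
```

For the DataFrame in Example #2 this prints "DataFrame shape: 3 rows x 2 columns".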

