How to print dataframe in Scala?

Last Updated : 01 Apr, 2024

Scala stands for scalable language. It was developed in 2003 by Martin Odersky. It is an object-oriented language that also supports the functional programming approach. Every value in Scala is an object; for example, values like 1 and 2 can invoke methods like toString(). Scala is a statically typed language, but thanks to type inference most type annotations can be omitted while writing the code. Types are still verified at compile time, and this static typing helps build safe systems by default. Smart built-in checks and actionable error messages, combined with thread-safe data structures and collections, prevent many tricky bugs before the program first runs.

Understanding Dataframe and Spark

Before building a dataframe, here is a brief introduction to it. A DataFrame is a data structure in Apache Spark. Spark is a framework for developing distributed applications, i.e. code that can run on many machines at the same time. The main purpose of such applications is to process large volumes of data, for example for business analysis. A DataFrame is a tabular structure with named columns that can store structured and semi-structured data; unstructured data must first be transformed to fit into it. DataFrames are built on top of RDDs, the core API of Spark, and add a schema and query optimization. In Scala, a DataFrame is a Dataset[Row], so column types are checked at run time rather than at compile time.

Building Sample Dataframe

Let us build a sample dataframe in Scala that we can print.

Scala
import org.apache.spark.sql.SparkSession

val spark: SparkSession = SparkSession.builder().master("local[1]").getOrCreate()

val columns = Seq("Id", "Name")
val data    = Seq(("1", "Dhruv"), ("2", "Akash"), ("3", "Aayush"))

val class_df = spark.createDataFrame(data).toDF(columns:_*)

This creates a simple dataframe with two columns and three rows.
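
The same sample can also be built a little more compactly. The sketch below assumes a local Spark installation and uses `toDF` from `spark.implicits._`; the object name `CompactSample` and its `build` helper are hypothetical names introduced only for this illustration.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

// Hypothetical wrapper object, used only for this illustration.
object CompactSample {
  // Builds the three-row sample dataframe with columns "Id" and "Name".
  def build(spark: SparkSession): DataFrame = {
    import spark.implicits._ // brings toDF into scope for local Seq collections
    Seq(("1", "Dhruv"), ("2", "Akash"), ("3", "Aayush")).toDF("Id", "Name")
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[1]").getOrCreate()
    val classDf = build(spark)
    classDf.printSchema() // prints the column names and their types
    classDf.show()
    spark.stop()
  }
}
```

Passing the column names directly to `toDF` avoids the separate `columns` sequence, which is convenient for small hand-written samples.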

Print Dataframe

We can easily display the dataframe using the show() method. Its available signatures are as follows:

show()
show(numRows : scala.Int)
show(truncate : scala.Boolean)
show(numRows : scala.Int, truncate : scala.Boolean)
show(numRows : scala.Int, truncate : scala.Int)
show(numRows : scala.Int, truncate : scala.Int, vertical : scala.Boolean)

We can call show on our dataframe as follows.

Scala
class_df.show()

Output:

+---+------+
| Id|  Name|
+---+------+
|  1| Dhruv|
|  2| Akash|
|  3|Aayush|
+---+------+

To understand the arguments of show, let us build a dataframe with more rows and longer values. Use the following code to build the required dataframe.

Scala
import org.apache.spark.sql.SparkSession
import scala.util.Random

// Reuse the existing session if there is one, otherwise start a local one
val spark: SparkSession = SparkSession.builder().master("local[1]").getOrCreate()

val columns = Seq("Id", "Code_Name")

// 30 rows: a string id and a random 30-character alphanumeric code name
val data = (1 to 30).map { i =>
  (i.toString, Random.alphanumeric.take(30).mkString)
}

val class_df = spark.createDataFrame(data).toDF(columns: _*)
class_df.show()

With no arguments, show() prints only the first 20 of the 30 rows and truncates every value longer than 20 characters, so each Code_Name value is cut short. Now let us see the various ways of using show to get better-formatted displays of the dataframe.
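
Calling show() without arguments should behave like show(20, truncate = true). The sketch below is one way to check this, assuming that show prints through Scala's Console so its output can be captured with `Console.withOut`. The object `ShowDefaults` and its helpers `capture` and `sample` are hypothetical names, not part of Spark.

```scala
import java.io.ByteArrayOutputStream
import org.apache.spark.sql.{DataFrame, SparkSession}

// Hypothetical wrapper object, used only for this illustration.
object ShowDefaults {
  // Runs `body` and returns whatever it printed to standard output.
  def capture(body: => Unit): String = {
    val buffer = new ByteArrayOutputStream()
    Console.withOut(buffer)(body)
    buffer.toString("UTF-8")
  }

  // 30 rows with a fixed 30-character second column, similar to the sample above.
  def sample(spark: SparkSession): DataFrame = {
    import spark.implicits._
    (1 to 30).map(i => (i.toString, "x" * 30)).toDF("Id", "Code_Name")
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[1]").getOrCreate()
    val df = sample(spark)

    val implicitDefaults = capture(df.show())
    val explicitDefaults = capture(df.show(numRows = 20, truncate = true))

    // The two captured tables are expected to match if the defaults are 20 rows and truncation on.
    println(implicitDefaults == explicitDefaults)
    spark.stop()
  }
}
```

The fixed string `"x" * 30` replaces the random code names so that both calls print the same data.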

Example 1: Using numRows

This will print only the specified number of rows in the output.

Scala
class_df.show(3)

Example 2: Using truncate (as Boolean)

This will print the data without truncating any values.

Scala
class_df.show(numRows = 3, truncate = false)

Here the values of the Code_Name column are displayed in full.

Example 3: Using truncate (as Integer)

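We can pass an integer as truncate to specify the maximum number of characters displayed for each value. When a value is longer than this limit, the last three displayed characters are replaced by "...". With truncate = 9, each Code_Name is therefore shown as its first 6 characters followed by "...", as follows.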
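
The truncation rule can be sketched in plain Scala without Spark. The helper `truncateCell` below is a hypothetical re-implementation written for illustration, assuming it matches the behaviour of show: limits below 4 cut the value without an ellipsis, and a limit of 0 or less turns truncation off.

```scala
// Illustrative re-implementation of the cell truncation applied by show().
// `ShowTruncation` and `truncateCell` are hypothetical names, not part of the Spark API.
object ShowTruncation {
  def truncateCell(value: String, truncate: Int): String =
    if (truncate > 0 && value.length > truncate) {
      // Very small limits leave no room for an ellipsis.
      if (truncate < 4) value.substring(0, truncate)
      else value.substring(0, truncate - 3) + "..."
    } else value // short values, or a limit of 0 or less, are left unchanged

  def main(args: Array[String]): Unit = {
    val codeName = "AbCdEfGhIjKlMnOpQrStUvWxYz0123" // 30 characters
    println(truncateCell(codeName, 9))  // AbCdEf...
    println(truncateCell(codeName, 20)) // AbCdEfGhIjKlMnOpQ...
    println(truncateCell(codeName, 0))  // the full 30-character value
  }
}
```

The second call mirrors the default of 20 characters, which is why show() without arguments displays 17 characters of each code name followed by "...".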

Scala
class_df.show(numRows = 3, truncate = 9)

Example 4: Using vertical

The vertical argument prints each row vertically, with every column on its own line. It can only be specified when both of the other arguments are also specified. Let us print the first three rows vertically in the following example.

Scala
class_df.show(numRows = 3, truncate = 9, vertical = true)

In this layout, each column of each row is printed on its own line. This is simply another format for printing the dataframe.
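
The vertical layout is often combined with untruncated values, since long cells no longer stretch a table. A minimal sketch, assuming a local Spark installation, passes 0 as the integer truncate to turn truncation off. The object `VerticalUntruncated` and its `sample` helper are hypothetical names used only here.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

// Hypothetical wrapper object, used only for this illustration.
object VerticalUntruncated {
  // Five rows whose second column is longer than the default 20-character limit.
  def sample(spark: SparkSession): DataFrame = {
    import spark.implicits._
    (1 to 5).map(i => (i.toString, s"code-$i-" + "x" * 30)).toDF("Id", "Code_Name")
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[1]").getOrCreate()
    // truncate = 0 disables truncation; vertical = true prints one column per line
    sample(spark).show(numRows = 2, truncate = 0, vertical = true)
    spark.stop()
  }
}
```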

Conclusion

We have seen that we can use the show method to print a dataframe. As shown above, it takes three arguments: numRows, truncate, and vertical. Each of them controls a different aspect of the display, and in combination they cover the most common requirements for displaying a dataframe.


