How to Print an RDD in Scala?

Last Updated : 01 Apr, 2024

Scala's name is short for "scalable language". It was created by Martin Odersky and first released in 2003. It is an object-oriented language that also supports the functional programming approach. Every value in Scala is an object; for example, a literal such as 1 or 2 can invoke methods like toString(). Scala is statically typed, but thanks to type inference it usually does not require explicit type annotations while writing the code. Types are still verified at compile time. Static typing helps build safe systems by default: compile-time checks and actionable error messages, combined with immutable, thread-safe collections, catch many tricky bugs before the program first runs.

Understanding RDD and Spark

Before building an RDD, let's take a brief look at what it is. The RDD is the fundamental data abstraction of Apache Spark. Spark is a framework for writing distributed applications, i.e. code that runs on many machines at the same time, typically to process large datasets for business analysis. An RDD is a partitioned collection of elements that can be operated on in parallel. RDD stands for Resilient Distributed Dataset. Resilient means that Spark records how each RDD was derived (its lineage), so partitions lost to a failure, such as a node going down, can be recomputed. Distributed means that the data is split into partitions spread across the nodes of the cluster, so a large dataset is processed in smaller chunks. The RDD is now Spark's older, lower-level API. Its successors, DataFrame and Dataset, benefit from Spark's query optimizer, and Dataset additionally offers compile-time type safety over structured data.
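To make the idea of partitions concrete, here is a minimal sketch that splits a small range into partitions and inspects them. The `local[2]` master, the value names `numbers` and `parts`, and the choice of 3 partitions are assumptions made for illustration only.

```scala
import org.apache.spark.sql.SparkSession

// Local session for illustration; "local[2]" is an assumed setting
val spark = SparkSession.builder().master("local[2]").getOrCreate()

// Ask for 3 partitions explicitly via parallelize's numSlices parameter
val numbers = spark.sparkContext.parallelize(1 to 6, numSlices = 3)

// Number of partitions the RDD was split into
println(numbers.getNumPartitions)

// glom() turns each partition into an array so its contents can be inspected
val parts = numbers.glom().collect()
parts.foreach(part => println(part.mkString("[", ", ", "]")))
```

Each printed line corresponds to one partition, which is the unit of work Spark hands to a task.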

Building Sample RDD

Let us build a sample RDD in Scala that we can print.

Scala
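import org.apache.spark.sql.SparkSession

// Create (or reuse) a local SparkSession that runs on a single thread
val spark: SparkSession = SparkSession.builder().master("local[1]").getOrCreate()

// Distribute a small in-memory collection of string pairs as an RDD
val rdd = spark.sparkContext.parallelize(Seq(("Tutorials", "Print Rdd"),
  ("Language", "Scala"), ("Platform", "Gfg")))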

Here we have created a simple RDD holding three pairs of strings.

How to Print RDD in Scala?

Method 1: Using collect

The collect method gathers the elements from all partitions and returns them to the driver as an array (here, an Array[(String, String)]). We can then print the returned array to display the RDD.

Scala
val collected_rdd = rdd.collect()
collected_rdd.foreach(println)

Output:

(Tutorials,Print Rdd)
(Language,Scala)
(Platform,Gfg)

As seen above, the RDD is printed one element per line.

The collect method should be used only on small datasets. It pulls the data from all partitions into the memory of the driver program, so a large dataset might not fit and can cause out-of-memory errors.
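When the RDD may be large, one common alternative is to print only a bounded sample with take. The sketch below rebuilds the sample RDD so that it runs on its own; the value name `firstTwo` and the sample size of 2 are illustrative assumptions.

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().master("local[1]").getOrCreate()
val rdd = spark.sparkContext.parallelize(Seq(("Tutorials", "Print Rdd"),
  ("Language", "Scala"), ("Platform", "Gfg")))

// take(n) returns at most n elements to the driver, unlike collect(),
// which returns every element
val firstTwo = rdd.take(2)
firstTwo.foreach(println)
```

Only the requested number of elements is brought to the driver, so memory use stays bounded no matter how large the RDD is.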

Method 2: Using foreach

We can loop through the RDD and print each element using the foreach function. Let us print the RDD this way.

Scala
rdd.foreach(println)

Output:

(Tutorials,Print Rdd)
(Language,Scala)
(Platform,Gfg)

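As seen above, the RDD was printed without any issue. This works here because the session runs in local mode. On a cluster, foreach runs on the executors, so the output goes to the executors' stdout rather than to the driver's console, and the order across partitions is not guaranteed.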
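If the output must appear on the driver without collecting everything at once, one option is toLocalIterator, which fetches one partition at a time. This is a minimal sketch that rebuilds the sample RDD so it runs on its own; the value name `fetched` is an illustrative assumption.

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().master("local[1]").getOrCreate()
val rdd = spark.sparkContext.parallelize(Seq(("Tutorials", "Print Rdd"),
  ("Language", "Scala"), ("Platform", "Gfg")))

// toLocalIterator brings the partitions to the driver one at a time,
// so the printing happens on the driver, in partition order
rdd.toLocalIterator.foreach(println)

// The iterator can also be materialized when the data is known to be small
val fetched = rdd.toLocalIterator.toList
```

The driver then needs enough memory for the largest single partition rather than for the whole RDD, at the cost of running one job per partition.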

Method 3: Using toDF

We can also use the show function of Spark's DataFrame API. For that, we need to convert the RDD to a DataFrame with the toDF function, which becomes available after importing spark.implicits._. The show function accepts arguments, such as the number of rows to display and whether to truncate long values, that control how the DataFrame is displayed. Let's use toDF to print our RDD.

Scala
import spark.implicits._
val df = rdd.toDF()
df.show()

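Output:

+---------+---------+
|       _1|       _2|
+---------+---------+
|Tutorials|Print Rdd|
| Language|    Scala|
| Platform|      Gfg|
+---------+---------+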

As seen above, the RDD is converted into a tabular structure, and the data is printed with the default column names _1 and _2.
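The default column names can be replaced by passing names to toDF, and show can be given a row limit and a truncation flag. The sketch below rebuilds the sample RDD so it runs on its own; the column names "key" and "value" and the value name `named` are hypothetical choices for illustration.

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().master("local[1]").getOrCreate()
import spark.implicits._

val rdd = spark.sparkContext.parallelize(Seq(("Tutorials", "Print Rdd"),
  ("Language", "Scala"), ("Platform", "Gfg")))

// "key" and "value" are hypothetical column names chosen for this example
val named = rdd.toDF("key", "value")

// Show at most 2 rows and do not truncate long cell values
named.show(2, false)
```

Naming the columns up front makes the printed table easier to read than the positional defaults.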

Conclusion

We can print an RDD in several ways. The first is the collect method, which gathers all the data into a single array on the driver and returns it, so it suits only small datasets. The second is the foreach method, which loops over the RDD and prints each element; in local mode the output appears in the console. The third is the toDF function, which converts the RDD to a DataFrame that we can then display with the show function. The show function takes arguments, such as the number of rows and whether to truncate values, that control how the DataFrame is displayed.


