How to create Spark session in Scala?

Scala stands for "scalable language". It was created by Martin Odersky and released in 2003. It is an object-oriented language that also supports the functional programming paradigm. Everything in Scala is an object; for example, values like 1 and 2 can invoke methods such as toString(). Scala is statically typed, although unlike other statically typed languages such as C, C++, or Java, it rarely requires explicit type annotations: the compiler infers types, and all type checking happens at compile time. Static typing allows safe systems to be built by default. Smart built-in checks and actionable error messages, combined with thread-safe data structures and collections, prevent many tricky bugs before the program first runs.
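As a minimal sketch of this (the names here are purely illustrative), the compiler infers types and then verifies them before the program ever runs:

object TypeInferenceDemo {
  def main(args: Array[String]): Unit = {
    val count = 42            // no annotation needed: Int is inferred
    val greeting = "hello"    // inferred as String
    println(count.toString)   // even a literal like 42 is an object with methods
    // greeting = "bye"       // would not compile: vals are immutable,
                              // and the error surfaces at compile time
  }
}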

Understanding Spark

The official definition of Spark on its website is "Apache Spark™ is a multi-language engine for executing data engineering, data science, and machine learning on single-node machines or clusters".

Let us dive deeper and understand how we interact with Spark from a Scala program.

Understanding SparkSession

The SparkSession class is the entry point into all functionality in Spark. It was introduced in Spark 2.0. It serves as a bridge to access all of Spark's core features, encompassing RDDs, DataFrames, and Datasets, offering a cohesive interface for handling structured data processing. When developing a Spark SQL application, it is typically one of the initial objects you instantiate.

Let us now see how to create one.

Creating SparkSession

Method 1 - Using the builder API

The SparkSession object can be created using the builder API as follows.

import org.apache.spark.sql.SparkSession

object CreateSparkSession {
  def main(args: Array[String]): Unit = {
    // Build (or reuse) a session running locally with a single worker thread
    val spark: SparkSession = SparkSession.builder()
      .master("local[1]")
      .appName("CreatingSparkSession")
      .getOrCreate()
    println(spark)   // prints the SparkSession object reference
  }
}


Output: The SparkSession object


Above, we used the builder method available on the SparkSession companion object (a Scala companion object, not a regular instance of the class) to create a SparkSession object.
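With the session in hand, it serves as the entry point for structured data processing. As a small illustration (the sample data here is made up), we can turn a local collection into a DataFrame through the session:

// Inside main, after getOrCreate(); `spark` is the session built above
import spark.implicits._                      // enables toDF on local collections
val df = Seq(("a", 1), ("b", 2)).toDF("key", "value")
df.show()                                     // displays the DataFrame as a table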

We can also access the SparkContext and SQLContext from this SparkSession object. Let's see how to extract them:

println(spark.sparkContext)
println(spark.sqlContext)


Output: Accessing the SparkContext and SQLContext


Here we accessed the SparkContext and SQLContext objects held inside the SparkSession.
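The extracted SparkContext also gives access to the lower-level RDD API. A minimal sketch (the numbers are illustrative):

// `spark` is the session created earlier
val sc = spark.sparkContext
val rdd = sc.parallelize(Seq(1, 2, 3))   // distributes a local collection as an RDD
println(rdd.count())                     // prints 3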

Adding configuration to the SparkSession object

We can also add configuration options to the SparkSession object to change its behaviour according to our needs. For this, we call the config method on the builder. Let us see how to provide configuration while creating a SparkSession.

val spark: SparkSession = SparkSession.builder()
      .master("local[1]")
      .appName("CreatingSparkSession")
      // sets the directory where Spark SQL stores managed tables
      .config("spark.sql.warehouse.dir", "<path>/spark-warehouse")
      .getOrCreate()
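Once the session is created, the same setting can be read back at runtime through the session's conf interface, which is a quick way to confirm the option was applied:

// Returns the value supplied to config() at build time
println(spark.conf.get("spark.sql.warehouse.dir"))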

Method 2 - From an existing SparkSession

We can create a new SparkSession from an existing one using the newSession method. Let us see how to do this.

import org.apache.spark.sql.SparkSession

object CreateSparkSession {
  def main(args: Array[String]): Unit = {
    val spark: SparkSession = SparkSession.builder()
      .master("local[1]")
      .appName("CreatingSparkSession")
      .getOrCreate()
    // newSession returns a fresh session that shares the underlying SparkContext
    val spark_2 = spark.newSession()
    println(spark_2)
  }
}


Output: The new SparkSession object


Here we created a new SparkSession object from an existing one. We can create as many sessions as we want this way; however, there must already be an existing SparkSession to call newSession on.
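Note that sessions created with newSession share the same underlying SparkContext while keeping their own SQL configuration and temporary views. A small sketch of this isolation (the view name is illustrative):

// Both sessions run on the same SparkContext
println(spark.sparkContext eq spark_2.sparkContext)        // true

// SQL configuration is per-session
spark.conf.set("spark.sql.shuffle.partitions", "50")
println(spark_2.conf.get("spark.sql.shuffle.partitions"))  // unchanged default (200)

// Temporary views are also per-session
spark.range(5).createOrReplaceTempView("numbers")
// spark_2.sql("SELECT * FROM numbers")   // would fail: the view is not visible here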
