PySpark – Converting JSON to DataFrame
In this article, we convert JSON data to a DataFrame in PySpark, first by loading it through pandas and then by reading it directly with Spark.
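Method 1: Using pandas.read_json()
pandas.read_json() reads a JSON file into a pandas DataFrame, which spark.createDataFrame() can then convert to a PySpark DataFrame. Because pandas loads the whole file into memory on the driver, this approach is best suited to small files.
Syntax: pandas.read_json("file_name.json")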
Here we are going to use this JSON file for demonstration:
Code:
Python3
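import pandas as pd
from pyspark.sql import SparkSession

# Create (or reuse) a Spark session
spark = SparkSession.builder.appName('sparkdf').getOrCreate()

# Read the JSON file with pandas, then convert the result to a Spark DataFrame
dataframe = spark.createDataFrame(pd.read_json('student.json'))

# Display the DataFrame
dataframe.show()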
Output:
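The same approach also works when the JSON arrives as a string rather than a file. The following is a minimal sketch, not a definitive implementation: the sample records and the helper name json_string_to_spark are hypothetical, and it assumes pandas and PySpark are installed. The string is wrapped in io.StringIO because recent pandas versions deprecate passing literal JSON text to read_json.

```python
import io
import json

# Hypothetical sample records; the article's student.json is not reproduced here.
STUDENT_JSON = json.dumps([
    {"id": 1, "name": "A", "marks": 81.5},
    {"id": 2, "name": "B", "marks": 74.0},
])


def json_string_to_spark(json_text):
    """Parse a JSON string with pandas, then hand the result to Spark."""
    # Third-party imports live inside the function so that defining it
    # does not require pandas or PySpark to be installed.
    import pandas as pd
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("sparkdf").getOrCreate()

    # read_json accepts a file-like object, so wrap the string in StringIO.
    pandas_df = pd.read_json(io.StringIO(json_text))

    # Convert the in-memory pandas DataFrame to a Spark DataFrame.
    return spark.createDataFrame(pandas_df)
```

Calling json_string_to_spark(STUDENT_JSON).show() should then display the two sample rows, provided a working Spark installation is available.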
Method 2: Using spark.read.json()
spark.read.json() reads JSON data from a file directly into a PySpark DataFrame, without going through pandas. By default it expects one JSON object per line (the JSON Lines layout).
Syntax: spark.read.json('file_name.json')
JSON file for demonstration:
Code:
Python3
from pyspark.sql import SparkSession

# Create (or reuse) a Spark session
spark = SparkSession.builder.appName('sparkdf').getOrCreate()

# Read the JSON file directly into a Spark DataFrame
data = spark.read.json('college.json')

# Display the DataFrame
data.show()
Output:
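Since spark.read.json() treats each line as a separate JSON object by default, the layout of the input file matters. The sketch below writes a small file in that one-object-per-line form and wraps the read in a helper. The records, the file location, and the helper names write_json_lines and read_json_with_spark are all hypothetical stand-ins, since the article's college.json is not reproduced here. For a file that holds a single pretty-printed object or array spanning several lines, the multiLine option of spark.read.json() can be set to True instead.

```python
import json
import os
import tempfile

# Hypothetical records standing in for the contents of college.json.
RECORDS = [
    {"college": "X", "student": "A", "marks": 81},
    {"college": "Y", "student": "B", "marks": 74},
]


def write_json_lines(records, path):
    """Write one JSON object per line, the layout spark.read.json expects by default."""
    with open(path, "w", encoding="utf-8") as fh:
        for record in records:
            fh.write(json.dumps(record) + "\n")


def read_json_with_spark(path, multi_line=False):
    """Read a JSON file into a Spark DataFrame (assumes PySpark is installed)."""
    # Imported lazily so the file-writing part runs without PySpark.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("sparkdf").getOrCreate()

    # Pass multi_line=True when the file is a single multi-line JSON document.
    return spark.read.json(path, multiLine=multi_line)


# Write the sample file to a temporary directory.
sample_path = os.path.join(tempfile.mkdtemp(), "college.json")
write_json_lines(RECORDS, sample_path)
```

With Spark available, read_json_with_spark(sample_path).show() should display one row per record.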
Last Updated: 29 Jun, 2021