
Pandas | Parsing JSON Dataset

Parsing a JSON dataset is convenient with pandas. Pandas lets you convert a list of lists into a DataFrame and specify the column names separately. A JSON parser transforms a JSON text into another representation and must accept all texts that conform to the JSON grammar. It may also accept non-JSON forms or extensions. An implementation may set limits on the size of texts it accepts, the maximum depth of nesting, the range and precision of numbers, and the length and character contents of strings.

Working with large JSON datasets can be difficult, particularly when they are too large to fit into memory. In such cases, a combination of command-line tools and Python makes for an efficient way to explore and analyze the data.
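
For newline-delimited JSON (one object per line), one way to stay within memory is to have read_json return an iterator of chunks instead of a single DataFrame. The sketch below is only an illustration: the file name events.jsonl, the temporary directory, and the id and value fields are all hypothetical.

```python
import json
import os
import tempfile

import pandas as pd

# Hypothetical sample data: one JSON object per line (JSON Lines format).
records = [{"id": i, "value": i * 10} for i in range(10)]
path = os.path.join(tempfile.mkdtemp(), "events.jsonl")
with open(path, "w") as f:
    for rec in records:
        f.write(json.dumps(rec) + "\n")

# chunksize requires lines=True; read_json then returns an iterator
# that yields DataFrames of at most `chunksize` rows each.
total = 0
n_chunks = 0
for chunk in pd.read_json(path, lines=True, chunksize=4):
    total += int(chunk["value"].sum())
    n_chunks += 1

print(n_chunks, total)
```

Only one chunk is held in memory at a time, so running totals or filters can be computed without loading the whole file.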



Importing JSON Files:

Manipulating the JSON is done using the Python Data Analysis Library, called pandas.

import pandas as pd

Now you can read the JSON and save it as a pandas data structure using the read_json function.



Parsing JSON Dataset Syntax

pandas.read_json(path_or_buf=None, orient=None, typ='frame', dtype=True, convert_axes=True, convert_dates=True, keep_default_dates=True, numpy=False, precise_float=False, date_unit=None, encoding=None, lines=False, chunksize=None, compression='infer')
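
As a minimal sketch of the call, the snippet below parses a small JSON string in 'records' orientation. The names and values are hypothetical. The string is wrapped in StringIO because recent pandas versions deprecate passing a literal JSON string and expect a path or file-like object.

```python
from io import StringIO

import pandas as pd

# Hypothetical JSON text: a list of records (one object per row).
json_text = '[{"name": "Asha", "age": 31}, {"name": "Ravi", "age": 27}]'

# orient='records' tells pandas that each list element is one row.
people = pd.read_json(StringIO(json_text), orient="records")

print(people.shape)
print(list(people.columns))
```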

Parsing a JSON Dataset in Pandas: Example

There are several ways to parse a JSON dataset with pandas. The commonly used methods are explained below.

This example creates a DataFrame df and prints its JSON representation using the 'split' orientation, which lists the columns, index, and data separately. It then prints a second representation using the 'index' orientation, which organizes the output by index label.




import pandas as pd

# Create a small DataFrame with labelled rows and columns
df = pd.DataFrame([['a', 'b'], ['c', 'd']],
                  index=['row 1', 'row 2'],
                  columns=['col 1', 'col 2'])

# 'split' orientation: separate "columns", "index", and "data" keys
print(df.to_json(orient='split'))

# 'index' orientation: {index -> {column -> value}}
print(df.to_json(orient='index'))

Output
{"columns":["col 1", "col 2"],
 "index":["row 1", "row 2"],
 "data":[["a", "b"], ["c", "d"]]}

{"row 1":{"col 1":"a", "col 2":"b"},
 "row 2":{"col 1":"c", "col 2":"d"}}

Convert the object to a JSON string using dataframe.to_json

DataFrame.to_json(path_or_buf=None, orient=None, date_format=None, double_precision=10, force_ascii=True, date_unit='ms', default_handler=None, lines=False, compression='infer', index=True)
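
Two of these parameters are sketched below on the same small DataFrame: orient='records', and lines=True, which writes one JSON object per line. The strings are parsed back with json.loads and read_json so the structure is easy to inspect. This is an illustration of the parameters, not an exhaustive tour.

```python
import json
from io import StringIO

import pandas as pd

df = pd.DataFrame([['a', 'b'], ['c', 'd']],
                  index=['row 1', 'row 2'],
                  columns=['col 1', 'col 2'])

# 'records': a list with one object per row; the index labels are dropped.
records = json.loads(df.to_json(orient='records'))
print(records)

# lines=True (only valid with orient='records') writes one object per line.
jsonl = df.to_json(orient='records', lines=True)
print(jsonl)

# Read the line-delimited text back; StringIO makes the string file-like.
restored = pd.read_json(StringIO(jsonl), lines=True)
print(restored)
```

Because 'records' drops the index labels, the restored DataFrame has a default integer index rather than 'row 1' and 'row 2'.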

Read the JSON File directly from Dataset

The code below uses pandas to read JSON data from the API endpoint 'http://api.population.io/1.0/population/India/today-and-tomorrow/?format=json' and prints it.




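import pandas as pd

# Read the JSON response from the API endpoint straight into a DataFrame
data = pd.read_json('http://api.population.io/1.0/population/India/today-and-tomorrow/?format=json')

print(data)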

Output
                                   total_population
0  {'date': '2019-03-18', 'population': 1369169250}
1  {'date': '2019-03-19', 'population': 1369211502}
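
Each cell of total_population is still a dictionary, so a further step is usually needed to get one column per key. The sketch below rebuilds the DataFrame from the rows shown in the output above rather than calling the API, and assumes pandas 1.0 or later, where json_normalize is available as pd.json_normalize.

```python
import pandas as pd

# Rows copied from the output above; each cell is a dictionary.
data = pd.DataFrame({
    "total_population": [
        {"date": "2019-03-18", "population": 1369169250},
        {"date": "2019-03-19", "population": 1369211502},
    ]
})

# json_normalize turns a list of dictionaries into one column per key.
flat = pd.json_normalize(data["total_population"].tolist())

# Parse the date strings so they can be used for time-based operations.
flat["date"] = pd.to_datetime(flat["date"])

print(flat)
```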

Nested JSON Parsing with Pandas

Nested JSON files can be time-consuming and difficult to flatten and load into pandas.
We use the nested file raw_nyc_phil.json to create a flattened pandas DataFrame from one nested array, then unpack a deeply nested array. We unpack the works column into a standalone DataFrame and also grab the flat columns.

JSON Data Normalization and DataFrame Creation with Pandas: This example uses the json and pandas libraries to read a JSON file hosted on GitHub and load it into a Python dictionary. It then normalizes the nested JSON data under the 'programs' key and creates a pandas DataFrame named nycphil.




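import json
import pandas as pd
# pandas >= 1.0; on older versions use: from pandas.io.json import json_normalize
from pandas import json_normalize

# Assumes raw_nyc_phil.json has been downloaded to the working directory
with open('raw_nyc_phil.json') as f:
    d = json.load(f)

# Let's put the data into a pandas DataFrame.
# Viewing raw_nyc_phil.json under "Input Files"
# shows that the parent node is 'programs'.
nycphil = json_normalize(d['programs'])
nycphil.head(3)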

Output:
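
The same call can be tried on a small stand-in for the real file. The sketch below keeps the key names mentioned in this article (programs, id, orchestra, programID, season, works); every value, and the workTitle key, is hypothetical. It shows that a top-level json_normalize leaves the nested works list as a single column.

```python
import pandas as pd

# Hypothetical miniature of the nested structure; all values are made up.
d = {
    "programs": [
        {"id": "p1", "orchestra": "Orchestra A", "programID": "100",
         "season": "2000-01",
         "works": [{"workTitle": "Work A"}, {"workTitle": "Work B"}]},
        {"id": "p2", "orchestra": "Orchestra A", "programID": "101",
         "season": "2000-01",
         "works": [{"workTitle": "Work C"}]},
    ]
}

# One row per program; the nested 'works' list is kept as-is in one column.
nycphil = pd.json_normalize(d["programs"])

print(nycphil.columns.tolist())
print(type(nycphil.loc[0, "works"]))
```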

JSON Normalization For Works Data

Let's unpack the works column into a standalone DataFrame using the record_path and meta arguments:




works_data = json_normalize(data=d['programs'],
                            record_path='works',
                            meta=['id', 'orchestra', 'programID', 'season'])
works_data.head(3)

Output:
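
To see what record_path and meta do, here is a minimal sketch on hypothetical data that reuses the key names from this article. Each element of works becomes a row, and the meta fields of the parent program are repeated on every row. The workTitle key and all values are assumptions for illustration.

```python
import pandas as pd

# Hypothetical miniature of the nested structure; all values are made up.
d = {
    "programs": [
        {"id": "p1", "orchestra": "Orchestra A", "programID": "100",
         "season": "2000-01",
         "works": [{"workTitle": "Work A"}, {"workTitle": "Work B"}]},
        {"id": "p2", "orchestra": "Orchestra A", "programID": "101",
         "season": "2000-01",
         "works": [{"workTitle": "Work C"}]},
    ]
}

# record_path selects the nested list to expand into rows;
# meta lists parent-level fields to copy onto each of those rows.
works_data = pd.json_normalize(
    data=d["programs"],
    record_path="works",
    meta=["id", "orchestra", "programID", "season"],
)

print(works_data)
```

A meta name that also appears as a key inside the records raises an error in pandas unless a prefix is supplied, which is worth checking before running this on real data.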

JSON Normalization For Soloists Data

Let's flatten the 'soloists' data by passing a list as record_path, since soloists is nested inside works.




soloist_data = json_normalize(data = d['programs'],
                              record_path =['works', 'soloists'],
                              meta =['id'])
  
soloist_data.head(3)

Output:
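
A minimal sketch of the two-level path on hypothetical data: the list ['works', 'soloists'] tells json_normalize to descend into each work and expand its soloists. The soloistName and soloistInstrument keys and all values are assumptions for illustration.

```python
import pandas as pd

# Hypothetical miniature; soloists sit two levels below each program.
d = {
    "programs": [
        {"id": "p1", "works": [
            {"workTitle": "Work A", "soloists": [
                {"soloistName": "Soloist X", "soloistInstrument": "Piano"},
                {"soloistName": "Soloist Y", "soloistInstrument": "Violin"},
            ]},
            {"workTitle": "Work B", "soloists": []},
        ]},
        {"id": "p2", "works": [
            {"workTitle": "Work C", "soloists": [
                {"soloistName": "Soloist Z", "soloistInstrument": "Cello"},
            ]},
        ]},
    ]
}

# A list record_path walks programs -> works -> soloists;
# the program-level 'id' is attached to every soloist row.
soloist_data = pd.json_normalize(
    data=d["programs"],
    record_path=["works", "soloists"],
    meta=["id"],
)

print(soloist_data)
```

Works with an empty soloists list, such as "Work B" here, contribute no rows to the result.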

