Pandas | Parsing JSON Dataset
Parsing a JSON dataset with pandas is convenient. Pandas can convert a list of lists into a DataFrame, with the column names specified separately. A JSON parser transforms JSON text into another representation and must accept all texts that conform to the JSON grammar. It may also accept non-JSON forms or extensions. An implementation may set the following:
- limits on the size of texts that it accepts,
- limits on the maximum depth of nesting,
- limits on the range and precision of numbers,
- limits on the length and character contents of strings.
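As a small illustration of extensions and number precision, the sketch below uses Python's built-in json module rather than pandas. It shows one extension that module accepts (the NaN literal, which is not valid JSON) and how the default float parsing limits precision unless parse_float is overridden. The sample text is made up for the example.

```python
import json
from decimal import Decimal

# Extension: Python's json module accepts NaN, which the JSON grammar does not allow
value = json.loads('NaN')
is_nan = value != value  # NaN is the only float that is not equal to itself

# Precision: numbers with a fraction are parsed as Python floats by default
text = '{"x": 0.12345678901234567890123}'
as_float = json.loads(text)["x"]

# parse_float=Decimal keeps every digit that appears in the text
as_decimal = json.loads(text, parse_float=Decimal)["x"]

print(is_nan)
print(as_float)
print(as_decimal)
```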
Working with large JSON datasets can be difficult, particularly when they are too large to fit into memory. In such cases, a combination of command-line tools and Python makes for an efficient way to explore and analyze the data.
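One way to handle data that does not fit in memory, assuming it is stored as JSON Lines (one object per line), is to read it in chunks. The sketch below uses a small in-memory string as a stand-in for a large file; the field names and values are invented for illustration.

```python
import io
import pandas as pd

# Hypothetical JSON Lines data: one JSON object per line
json_lines = "\n".join(
    '{"id": %d, "value": %d}' % (i, i * 10) for i in range(10)
)

# With lines=True and chunksize set, read_json returns an iterator
# that yields DataFrames of at most `chunksize` rows each
reader = pd.read_json(io.StringIO(json_lines), lines=True, chunksize=4)

total = 0
n_chunks = 0
for chunk in reader:
    # Aggregate per chunk so the full dataset is never held at once
    total += int(chunk["value"].sum())
    n_chunks += 1

print(n_chunks, total)
```

For a real file, the StringIO object would be replaced by the file path.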
Importing JSON Files:
The JSON is manipulated using pandas, the Python Data Analysis Library.
import pandas as pd
Now you can read the JSON and save it as a pandas data structure using the read_json function.
Parsing JSON Dataset Syntax
pandas.read_json(path_or_buf=None, orient=None, typ='frame', dtype=True, convert_axes=True, convert_dates=True, keep_default_dates=True, numpy=False, precise_float=False, date_unit=None, encoding=None, lines=False, chunksize=None, compression='infer')
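A minimal sketch of how a few of these parameters behave, assuming made-up JSON text wrapped in StringIO so that no file is needed: orient describes the layout of the input, and typ selects whether a DataFrame or a Series is returned.

```python
import io
import pandas as pd

# Hypothetical JSON text in 'records' layout: a list of row objects
records_json = '[{"name": "Asha", "score": 91}, {"name": "Ravi", "score": 78}]'

# orient='records' tells read_json that each list element is one row
df = pd.read_json(io.StringIO(records_json), orient="records")
print(df)

# typ='series' parses a flat JSON object into a Series instead of a DataFrame
s = pd.read_json(io.StringIO('{"a": 1, "b": 2}'), typ="series")
print(s)
```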
Parsing in Pandas JSON Dataset Example
There are various methods for parsing a JSON dataset in pandas. Some commonly used ones are explained below.
This example creates a DataFrame 'df' and prints its JSON representation using the 'split' orientation, which lists the columns, index, and data separately. A second representation uses the 'index' orientation, which organizes the output by index label.
Python3
import pandas as pd

df = pd.DataFrame([['a', 'b'], ['c', 'd']],
                  index=['row 1', 'row 2'],
                  columns=['col 1', 'col 2'])

print(df.to_json(orient='split'))
print(df.to_json(orient='index'))
Output
{"columns":["col 1", "col 2"],
"index":["row 1", "row 2"],
"data":[["a", "b"], ["c", "d"]]}
{"row 1":{"col 1":"a", "col 2":"b"},
"row 2":{"col 1":"c", "col 2":"d"}}
Convert the object to a JSON string using DataFrame.to_json
DataFrame.to_json(path_or_buf=None, orient=None, date_format=None, double_precision=10, force_ascii=True, date_unit='ms', default_handler=None, lines=False, compression='infer', index=True)
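The sketch below tries a few of these options on the same small DataFrame used earlier: the 'records' orientation, the lines flag, and a round trip through the 'split' orientation. It is an illustration under these assumptions, not an exhaustive tour of the parameters.

```python
import io
import json
import pandas as pd

df = pd.DataFrame([['a', 'b'], ['c', 'd']],
                  index=['row 1', 'row 2'],
                  columns=['col 1', 'col 2'])

# 'records' drops the index and emits one object per row
records = df.to_json(orient='records')
parsed = json.loads(records)
print(parsed)

# lines=True (only valid with orient='records') writes one object per line
print(df.to_json(orient='records', lines=True))

# Round trip: 'split' keeps index and columns, so read_json can rebuild the frame
restored = pd.read_json(io.StringIO(df.to_json(orient='split')), orient='split')
same = restored.to_dict() == df.to_dict()
print(same)
```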
Read the JSON File directly from Dataset
The code below reads and prints JSON data from the specified API endpoint ('http://api.population.io/1.0/population/India/today-and-tomorrow/?format=json') using the pandas library.
Python3
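import pandas as pd
# Read the JSON response from the API endpoint into a DataFrame
data = pd.read_json('http://api.population.io/1.0/population/India/today-and-tomorrow/?format=json')
print(data)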
Output
total_population
0 {'date': '2019-03-18', 'population': 1369169250}
1 {'date': '2019-03-19', 'population': 1369211502}
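In the output above, each cell of total_population still holds a dictionary. One possible way to split it into separate columns is pd.json_normalize. The sketch below rebuilds the two rows by hand from the printed output, so that it runs without calling the API.

```python
import pandas as pd

# Sample rows copied from the output above; each cell holds a dict
data = pd.DataFrame({
    "total_population": [
        {"date": "2019-03-18", "population": 1369169250},
        {"date": "2019-03-19", "population": 1369211502},
    ]
})

# json_normalize expands each dict into its own columns
flat = pd.json_normalize(data["total_population"].tolist())

# Convert the date strings into proper datetime values
flat["date"] = pd.to_datetime(flat["date"])
print(flat)
```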
Nested JSON Parsing with Pandas
Flattening nested JSON files and loading them into pandas can be a time-consuming and difficult process.
We use the nested 'raw_nyc_phil.json' file to create a flattened pandas DataFrame from one nested array, then unpack a more deeply nested array. Let's unpack the works column into a standalone DataFrame. We'll also grab the flat columns.
JSON Data Normalization and DataFrame Creation with Pandas: This example uses the 'json' and 'pandas' libraries to read a JSON file from a GitHub URL and load it into a Python dictionary. It then normalizes the nested JSON data under the 'programs' key and creates a pandas DataFrame named 'nycphil'.
Python3
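import json
import pandas as pd
from pandas import json_normalize  # top-level function since pandas 1.0

# Assumes raw_nyc_phil.json has been downloaded to the working directory
with open('raw_nyc_phil.json') as f:
    d = json.load(f)

# Flatten the records under the 'programs' key into a DataFrame
nycphil = json_normalize(d['programs'])
nycphil.head(3)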
Output:
JSON Normalization For Works Data
Let's unpack the works column into a standalone DataFrame using the record_path and meta arguments of json_normalize.
Python3
works_data = json_normalize(data=d['programs'],
                            record_path='works',
                            meta=['id', 'orchestra', 'programID', 'season'])
works_data.head(3)
Output:
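To see what record_path and meta do, here is a minimal sketch on a tiny, made-up stand-in for the programs list. The top-level keys mirror the meta list used above, while the keys inside each work ('title', 'composer') and all values are hypothetical and not taken from the real file.

```python
import pandas as pd

# Hypothetical miniature of the 'programs' structure (all values invented)
programs = [
    {"id": "p1", "orchestra": "Orchestra A", "programID": "100", "season": "S1",
     "works": [
         {"title": "Work One", "composer": "Composer X"},
         {"title": "Work Two", "composer": "Composer Y"},
     ]},
    {"id": "p2", "orchestra": "Orchestra A", "programID": "101", "season": "S1",
     "works": [
         {"title": "Work Three", "composer": "Composer Z"},
     ]},
]

# record_path picks the nested list that becomes the rows;
# meta copies parent-level fields onto every row from that parent
works = pd.json_normalize(programs,
                          record_path="works",
                          meta=["id", "orchestra", "programID", "season"])
print(works)
```

Each work becomes one row, and the program-level fields are repeated for every work in that program.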
JSON Normalization For Soloists Data
Let's flatten the 'soloists' data by passing a list to record_path, since soloists is nested inside works.
Python3
soloist_data = json_normalize(data=d['programs'],
                              record_path=['works', 'soloists'],
                              meta=['id'])
soloist_data.head(3)
Output:
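A list passed to record_path is followed level by level. The sketch below illustrates this on a small invented dataset shaped like the programs list. The soloist field names ('name', 'instrument') and all values are hypothetical.

```python
import pandas as pd

# Hypothetical miniature of the 'programs' structure (all values invented)
programs = [
    {"id": "p1",
     "works": [
         {"title": "Work One", "soloists": []},
         {"title": "Work Two",
          "soloists": [{"name": "Soloist S", "instrument": "Piano"}]},
     ]},
    {"id": "p2",
     "works": [
         {"title": "Work Three",
          "soloists": [{"name": "Soloist T", "instrument": "Violin"},
                       {"name": "Soloist U", "instrument": "Cello"}]},
     ]},
]

# Walk programs -> works -> soloists; each soloist becomes one row,
# and the program-level 'id' is attached to every row
soloists = pd.json_normalize(programs,
                             record_path=["works", "soloists"],
                             meta=["id"])
print(soloists)
```

Works with an empty soloists list contribute no rows, so the result can have fewer rows than there are works.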