
Load CSV data in TensorFlow


This article looks at ways to load CSV data in the Python programming language using TensorFlow.

The TensorFlow library provides the tf.data.experimental.make_csv_dataset() function, which reads CSV data into a tf.data.Dataset that we can use in our programs.

Loading a Single CSV File

To download a single CSV data file from a URL, we use the Keras get_file() function. Here we will use the Titanic dataset.

To do this, we add the following lines to our code:

Python3




import tensorflow as tf
from tensorflow.keras import layers
import pandas as pd

# Download the Titanic training CSV and cache it locally
data_path = tf.keras.utils.get_file(
    "data_train.csv",
    "https://storage.googleapis.com/tf-datasets/titanic/train.csv")

# Build a tf.data.Dataset that yields (feature dict, label) batches
data_train_tf = tf.data.experimental.make_csv_dataset(
    data_path,
    batch_size=10,
    label_name='survived',
    num_epochs=1,
    ignore_errors=True)


The data can now be used like a dict, where the keys are the column names and the values are the data records. Each element yielded by the dataset is a pair: the first item is the batch of feature columns, the second is the batch of labels. Within a feature batch, each column/feature name acts as a key, and the tensor of values from that column is its value.

Python3




# Take one batch and print every feature column with its values
for batch, label in data_train_tf.take(1):
    for key, value in batch.items():
        print(f"{key:10s}: {value}")


Output:

[Output screenshot: Loading CSV using TensorFlow]
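The label tensor returned with each batch can be inspected in the same way. Below is a minimal sketch; the 'sex' and 'age' columns of the Titanic CSV are used purely for illustration, and the labels come back separately from the feature dict because label_name='survived' was passed above.

Python3

# Print a couple of feature columns and the matching labels for one batch.
# 'sex' and 'age' are columns of the Titanic CSV, shown here for illustration.
for batch, label in data_train_tf.take(1):
    print("sex      :", batch['sex'].numpy())
    print("age      :", batch['age'].numpy())
    print("survived :", label.numpy())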

 

Loading Multiple CSV Files

The main benefit of the make_csv_dataset() function shows when we have to import multiple CSV files into a single dataset. We will use the fonts dataset, which contains data for many different fonts.

Example: In this example, we use the Keras get_file() function to download the zipped dataset to disk and extract it; cache_dir and cache_subdir define where the files are stored.

Once the files are saved, the file_pattern argument of make_csv_dataset() lets us specify a glob pattern matching all the files to be imported. Create a new file and execute the following code:

Python3




# Download and extract the fonts dataset (one CSV file per font)
fonts = tf.keras.utils.get_file(
    'fonts.zip',
    "https://archive.ics.uci.edu/ml/machine-learning-databases/00417/fonts.zip",
    cache_dir='.', cache_subdir='fonts',
    extract=True)

# Read every CSV matching the glob pattern into one dataset
fonts_data = tf.data.experimental.make_csv_dataset(
    file_pattern="fonts/*.csv",
    batch_size=10, num_epochs=1,
    num_parallel_reads=4,
    shuffle_buffer_size=10000)

# Print the first few feature columns of one batch
for features in fonts_data.take(1):
    for i, (name, value) in enumerate(features.items()):
        if i > 15:
            break
        print(f"{name:20s}: {value}")
    print(f"[total: {len(features)} features]")


We display the first few feature columns of the combined dataset along with their values. The total number of features is printed using the len() function; in this example, there are 412 features in total.

Output:

[Output screenshot: Loading CSVs using TensorFlow]
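If only a few of the 412 columns are needed, make_csv_dataset() also accepts a select_columns argument so the reader skips the remaining columns. Below is a minimal sketch, assuming the fonts CSVs contain columns named 'font', 'fontVariant' and 'm_label'; check these names against the output above before running it.

Python3

# Load only a subset of columns from the font CSV files.
# The column names below are assumptions; replace them with the
# headers that actually appear in your CSV files.
fonts_small = tf.data.experimental.make_csv_dataset(
    file_pattern="fonts/*.csv",
    batch_size=10, num_epochs=1,
    select_columns=['font', 'fontVariant', 'm_label'])

for features in fonts_small.take(1):
    for name, value in features.items():
        print(f"{name:12s}: {value}")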

 



Last Updated : 23 Sep, 2022