
Difference Between Dataset.from_tensors and Dataset.from_tensor_slices

Last Updated : 19 Dec, 2022

In this article, we will look at the difference between Dataset.from_tensors and Dataset.from_tensor_slices. Both methods build a tf.data.Dataset from in-memory data so that it can feed a TensorFlow input pipeline; the difference lies in how they divide that data into dataset elements. Suppose we have a dataset represented as a NumPy matrix of shape (num_features, num_examples) and we wish to convert it to a tf.data.Dataset.

Difference between Dataset.from_tensors and Dataset.from_tensor_slices

Now we have two methods to do this – Dataset.from_tensors and Dataset.from_tensor_slices.

from_tensors – This method creates a dataset with a single element that contains the entire input. Whatever tensor or nested structure of tensors is passed in is kept whole.

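from_tensor_slices – This method creates a dataset with one element per slice of the input along its first dimension. It is generally used when training machine learning models through an input pipeline, because it pairs each row of the features with its target, so every dataset element is a single (features, target) example.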

We will look at these one by one using code examples. First of all, the main condition for using from_tensor_slices is that all input tensors must have the same size along their first dimension (axis 0).
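As a minimal sketch of that distinction, assume a hypothetical feature matrix with 10 examples of 4 features each and a matching label vector (the names features, labels, whole and sliced are illustrative and do not come from the original article). Counting the elements of each dataset shows how differently the two methods treat the same input:

```python
import tensorflow as tf

# Hypothetical data: 10 examples with 4 features each, plus one label per example.
features = tf.random.uniform([10, 4])
labels = tf.random.uniform([10])

# from_tensors keeps the whole (features, labels) tuple as a single element.
whole = tf.data.Dataset.from_tensors((features, labels))

# from_tensor_slices cuts along axis 0: one (feature_row, label) pair per example.
sliced = tf.data.Dataset.from_tensor_slices((features, labels))

# Count elements by iterating (eager execution, the TensorFlow 2 default, is assumed).
n_whole = len(list(whole))
n_sliced = len(list(sliced))
print(n_whole, n_sliced)  # 1 10
```

The sliced dataset is usually the one to shuffle and batch for training, since each of its elements is one example.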

Necessary Condition of Shapes in from_tensors and from_tensor_slices

from_tensor_slices places a condition on its input, while from_tensors does not. When several tensors are passed to from_tensor_slices, they must all have the same size along the first dimension (axis 0), because the i-th element of the dataset is built from the i-th slice of every tensor. The remaining dimensions may differ. from_tensors stores its inputs as they are, so their shapes can be arbitrary.

Python3




import tensorflow as tf
ds1 = (
    tf.data.Dataset
    .from_tensors((tf.random.uniform([10, 4]),
                   tf.random.uniform([9])))
)
ds1


Output:

<TensorDataset element_spec=(TensorSpec(shape=(10, 4), dtype=tf.float32, name=None),
 TensorSpec(shape=(9,), dtype=tf.float32, name=None))>

If we pass the same input to from_tensor_slices, we get an error because the shapes are incompatible.

Python3




ds2 = (
    tf.data.Dataset
    .from_tensor_slices((tf.random.uniform([10, 4]),
                         tf.random.uniform([9])))
)
ds2


Output:

#ERROR

The above code raises an error because the two tensors have different sizes along the first dimension (10 and 9), so the required condition is not met.
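The failure can also be observed programmatically. The following is a sketch only: the exception type and message are not given in the original article and may vary between TensorFlow versions, so the code catches both ValueError and tf.errors.InvalidArgumentError as an assumption. The variable names are illustrative.

```python
import tensorflow as tf

features = tf.random.uniform([10, 4])
labels = tf.random.uniform([9])  # deliberately one entry short

try:
    tf.data.Dataset.from_tensor_slices((features, labels))
    mismatch_detected = False
except (ValueError, tf.errors.InvalidArgumentError) as err:
    # Assumption: the exact exception class depends on the TensorFlow version.
    mismatch_detected = True
    print("from_tensor_slices rejected the input:", type(err).__name__)

# from_tensors has no such requirement and accepts the same tuple.
ds_ok = tf.data.Dataset.from_tensors((features, labels))
print(ds_ok.element_spec)
```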

Way of Combining Input Data in from_tensors and from_tensor_slices

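When given a list of tensors that share the same shape, both methods first stack the list into one tensor. from_tensors keeps that stacked tensor as a single dataset element, whereas from_tensor_slices splits it along the first dimension again, producing one element per input tensor. Let’s look at this using the implementation below.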

Python3




dataset1 = tf.data.Dataset.from_tensors(
    [tf.random.uniform([2, 3]), tf.random.uniform([2, 3])])
print(dataset1)


Output:

<TensorDataset element_spec=TensorSpec(shape=(2, 2, 3), dtype=tf.float32, name=None)>

The element shape (2, 2, 3) shows that the two (2, 3) tensors have been stacked into a single element.

Python3




dataset2 = tf.data.Dataset.from_tensor_slices(
    [tf.random.uniform([2, 3]), tf.random.uniform([2, 3])])
print(dataset2)


Output:

<TensorSliceDataset element_spec=TensorSpec(shape=(2, 3),
 dtype=tf.float32, name=None)>
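Here the element shape is (2, 3), so each input tensor stays a separate element. Listing the element shapes of both datasets makes the contrast concrete. The sketch below rebuilds the same two datasets under illustrative names (a, b, packed and sliced are not from the original) and assumes eager execution:

```python
import tensorflow as tf

a = tf.random.uniform([2, 3])
b = tf.random.uniform([2, 3])

packed = tf.data.Dataset.from_tensors([a, b])        # list stacked into one element
sliced = tf.data.Dataset.from_tensor_slices([a, b])  # stacked, then split along axis 0

# Collect the shape of every element in each dataset.
packed_shapes = [x.shape.as_list() for x in packed]
sliced_shapes = [x.shape.as_list() for x in sliced]
print(packed_shapes)  # [[2, 2, 3]]
print(sliced_shapes)  # [[2, 3], [2, 3]]
```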

Way of Interpreting Input Data in from_tensors and from_tensor_slices

The next difference lies in the way data is being treated by these two functions.

Python3




t1 = tf.constant([[1, 2], [3, 4]])
ds1 = tf.data.Dataset.from_tensors(t1)
[x for x in ds1]


Output:

[<tf.Tensor: shape=(2, 2), dtype=int32, numpy=
 array([[1, 2],
        [3, 4]], dtype=int32)>]

Now let’s look at the from_tensor_slices output for the same data or input matrix.

Python3




t2 = tf.constant([[1, 2], [3, 4]])
ds2 = tf.data.Dataset.from_tensor_slices(t2)
[x for x in ds2]


Output:

[<tf.Tensor: shape=(2,), dtype=int32, numpy=array([1, 2], dtype=int32)>,
 <tf.Tensor: shape=(2,), dtype=int32, numpy=array([3, 4], dtype=int32)>]

From the above output, we can say that from_tensors treats the whole input as a single element, whereas from_tensor_slices slices the input row-wise, along the first dimension. Although the input data was identical, the two datasets hold it differently: ds1 contains one element of shape (2, 2), while ds2 contains two elements, each of shape (2,).
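One way to see how the two results relate is Dataset.unbatch(), which splits every element along its first dimension. As a rough sketch (variable names are illustrative, eager execution is assumed), unbatching the from_tensors dataset should yield the same rows that from_tensor_slices produces directly:

```python
import tensorflow as tf

t = tf.constant([[1, 2], [3, 4]])

whole = tf.data.Dataset.from_tensors(t)        # one element of shape (2, 2)
rows = tf.data.Dataset.from_tensor_slices(t)   # two elements of shape (2,)

# unbatch() splits each element along axis 0, recovering the individual rows.
unbatched = whole.unbatch()

rows_list = [x.numpy().tolist() for x in rows]
unbatched_list = [x.numpy().tolist() for x in unbatched]
print(rows_list)       # [[1, 2], [3, 4]]
print(unbatched_list)  # [[1, 2], [3, 4]]
```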


