A pandas DataFrame is a two-dimensional, size-mutable, potentially heterogeneous tabular data structure with labeled axes (rows and columns). It can be converted to a NumPy ndarray with the DataFrame.to_numpy() method. In this article, we will see how to convert a DataFrame to a NumPy array.
Syntax of Pandas DataFrame.to_numpy()
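Syntax: DataFrame.to_numpy(dtype=None, copy=False)
Parameters:
- dtype: [str or numpy.dtype, optional] The data type of the returned array, such as 'float32' or str.
- copy: [bool, default False] Whether to ensure that the returned value is not a view on another array. Note that copy=False does not guarantee that no copy is made; copy=True guarantees that one is.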
Returns: numpy.ndarray
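As a rough illustration of the two parameters described above, the following sketch uses a small made-up DataFrame (the column names x and y are assumptions for this example only). It requests a float result through dtype and checks that an array created with copy=True is independent of the DataFrame.

```python
import pandas as pd

# Small illustrative DataFrame; the data is made up for this sketch.
df = pd.DataFrame({"x": [1, 2, 3], "y": [4, 5, 6]})

# dtype: request a specific data type for the returned array.
as_float = df.to_numpy(dtype=float)
print(as_float.dtype)  # float64

# copy=True: the result is a separate array, so changing it
# does not change the DataFrame.
independent = df.to_numpy(copy=True)
independent[0, 0] = 99
print(df.loc[0, "x"])  # 1
```

With the default copy=False, pandas may or may not return a view, so code that modifies the array should not rely on either behavior.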
Convert DataFrame to Numpy Array
Here, we convert an entire DataFrame to a NumPy array.
import pandas as pd
# initialize a dataframe
df = pd.DataFrame(
    [[1, 2, 3],
     [4, 5, 6],
     [7, 8, 9],
     [10, 11, 12]],
    columns=['a', 'b', 'c'])
# convert dataframe to numpy array
arr = df.to_numpy()
print('\nNumpy Array\n----------\n', arr)
print(type(arr))
Output:
Numpy Array
----------
 [[ 1  2  3]
 [ 4  5  6]
 [ 7  8  9]
 [10 11 12]]
<class 'numpy.ndarray'>
Here we convert only selected columns ('a' and 'c') of the DataFrame into a NumPy array.
import pandas as pd
# initialize a dataframe
df = pd.DataFrame(
    [[1, 2, 3],
     [4, 5, 6],
     [7, 8, 9],
     [10, 11, 12]],
    columns=['a', 'b', 'c'])
# convert columns 'a' and 'c' to a numpy array
arr = df[['a', 'c']].to_numpy()
print('\nNumpy Array\n----------\n', arr)
print(type(arr))
Output:
Numpy Array
----------
 [[ 1  3]
 [ 4  6]
 [ 7  9]
 [10 12]]
<class 'numpy.ndarray'>
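When only one column is needed, the shape of the result depends on how the column is selected. The sketch below reuses the same DataFrame as the example above; the variable names one_d and two_d are introduced here for illustration. Single brackets return a Series, which converts to a one-dimensional array, while double brackets return a one-column DataFrame, which converts to a two-dimensional array.

```python
import pandas as pd

df = pd.DataFrame(
    [[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]],
    columns=["a", "b", "c"],
)

one_d = df["a"].to_numpy()    # Series -> 1-D array
two_d = df[["a"]].to_numpy()  # one-column DataFrame -> 2-D array

print(one_d.shape)  # (4,)
print(two_d.shape)  # (4, 1)
```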
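Here we convert a DataFrame whose columns hold different data types (integers and floats). A NumPy array has a single dtype, so the values are upcast to a common type, which is float64 in this case.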
import pandas as pd
# initialize a dataframe
df = pd.DataFrame(
    [[1, 2, 3],
     [4, 5, 6.5],
     [7, 8.5, 9],
     [10, 11, 12]],
    columns=['a', 'b', 'c'])
# convert dataframe to numpy array
arr = df.to_numpy()
print('Numpy Array', arr)
print('Numpy Array Datatype :', arr.dtype)
Output:
Numpy Array [[ 1.   2.   3. ]
 [ 4.   5.   6.5]
 [ 7.   8.5  9. ]
 [10.  11.  12. ]]
Numpy Array Datatype : float64
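The same rule applies when the columns have no common numeric type. As a minimal sketch, assuming a made-up DataFrame with one integer column and one string column (the names id and name are hypothetical), the resulting array falls back to the generic object dtype:

```python
import pandas as pd

# Made-up data: one integer column and one string column.
df = pd.DataFrame({"id": [1, 2], "name": ["Ann", "Bob"]})

arr = df.to_numpy()
print(arr.dtype)  # object
print(arr.shape)  # (2, 2)
```

Object arrays hold Python objects, so numeric operations on them are usually slower than on a float64 or int64 array.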
The examples below read a CSV file named nba.csv.
Example 1:
Here, we read a CSV file into a DataFrame and convert it to a NumPy array with the DataFrame.to_numpy() method. Before converting, we select the first five values of the Weight column with the head() method.
# importing pandas
import pandas as pd
# reading the csv
data = pd.read_csv("nba.csv")
data.dropna(inplace=True)
# creating DataFrame from weight column
df = pd.DataFrame(data['Weight'].head())
# using to_numpy() function
print(df.to_numpy())
Output:
[[180.]
 [235.]
 [185.]
 [235.]
 [238.]]
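If nba.csv is not at hand, the same steps can be tried on an in-memory stand-in. The sketch below is only an approximation: the CSV text, the player names, and the reduced set of columns are made up for illustration and are not the contents of the real file.

```python
import io

import pandas as pd

# Hypothetical stand-in for nba.csv: two columns and made-up rows.
csv_text = """Name,Weight
Player A,180
Player B,
Player C,235
"""

data = pd.read_csv(io.StringIO(csv_text))
data = data.dropna()  # drops the row with the missing weight

arr = data[["Weight"]].to_numpy()
print(arr.shape)  # (2, 1)
```

Because the Weight column contained a missing value when it was read, pandas stores it as float64, which is why the weights appear with a trailing dot in the output.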
Example 2:
In this example, we use the same code but pass the dtype parameter to to_numpy() to set the data type of the resulting array.
# importing pandas
import pandas as pd
# read csv file
data = pd.read_csv("nba.csv")
data.dropna(inplace=True)
# creating DataFrame from weight column
df = pd.DataFrame(data['Weight'].head())
# providing dtype
print(df.to_numpy(dtype='float32'))
Output:
[[180.]
 [235.]
 [185.]
 [235.]
 [238.]]
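The printed values look the same as in Example 1, so the effect of dtype is easier to see by inspecting the arrays directly. The following sketch builds the DataFrame from the five weights shown in the output above instead of reading nba.csv; the variable names default_arr and float32_arr are introduced here for illustration.

```python
import pandas as pd

# The five weights from the output above, entered by hand.
df = pd.DataFrame({"Weight": [180.0, 235.0, 185.0, 235.0, 238.0]})

default_arr = df.to_numpy()
float32_arr = df.to_numpy(dtype="float32")

print(default_arr.dtype, float32_arr.dtype)    # float64 float32
print(default_arr.nbytes, float32_arr.nbytes)  # 40 20
```

The float32 array uses half the memory of the default float64 array, at the cost of lower precision.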
Example 3:
Validating the type of the array after conversion.
# importing pandas
import pandas as pd
# reading csv
data = pd.read_csv("nba.csv")
data.dropna(inplace=True)
# creating DataFrame from weight column
df = pd.DataFrame(data['Weight'].head())
# using to_numpy()
print(type(df.to_numpy()))
Output:
<class 'numpy.ndarray'>