How to get the names of DataFrame columns in PySpark?
In this article, we will discuss how to get the names of the columns of a DataFrame in PySpark.
To get the names of the columns present in a DataFrame, we can use the columns attribute (df.columns), which returns a Python list of all the column names in the DataFrame.
We can also get the column names from the DataFrame's schema: df.schema.fields returns a list of StructField objects, and the name of each column can be read from its StructField.
Let’s create a sample DataFrame:
Example 1: Using df.columns
In this example, we create the DataFrame, get the list of column names present in it with df.columns, and then print that list.
Example 2: Using df.schema.fields
In this example, we create the DataFrame and then get the list of StructFields from df.schema.fields. Each StructField holds the name of a column, its data type, and its nullable flag.
We store this list of StructFields in a variable named field and loop over it with enumerate(). Passing 1 as the second argument to enumerate() makes the count start at 1 instead of 0. Inside the loop, we print the count and the name of each column.
Example 3: Using df.printSchema()
Another way to see the names of the columns in a DataFrame is to look at its schema. The printSchema() method prints the schema of the DataFrame as a tree, and every column name appears in that tree together with its data type and nullable flag.