Adding two columns to an existing PySpark DataFrame using withColumn
In this article, we are going to see how to add two columns to an existing PySpark DataFrame using withColumn().
withColumn() is used to change the values of an existing column, convert the datatype of an existing column, create a new column, and more.
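Syntax: df.withColumn(colName, col)
Parameters: colName is a string giving the name of the column to add or replace; col is a Column expression that computes its values.
Returns: A new DataFrame with the column added, or with the existing column of the same name replaced.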
Example 1: Create a DataFrame and then add two columns.
Here we are going to create a DataFrame from a list of rows.
Now add the columns:
Here, we create two new columns based on the existing columns.
Example 2: Create a DataFrame from a CSV file and then add the columns.
Here we will use the cricket_data_set_odi.csv file as the dataset and create a DataFrame from it.
Create the DataFrame for demonstration:
Then, add the columns to the existing DataFrame: