Renaming columns for PySpark DataFrames Aggregates
In this article, we will discuss how to rename the columns produced by PySpark DataFrame aggregations.
Dataframe in use:
In PySpark, groupBy() collects identical values into groups on the DataFrame so that aggregate functions can be applied to each group. Aggregate functions such as sum(), avg(), min(), max(), count(), and mean() are available in the pyspark.sql.functions module.
Method 1: Using alias()
We can use alias() to rename an aggregated column at the point where the aggregation is defined, where:
- dataframe is the input dataframe
- column_name_group is the grouped column
- aggregate_function is the function from the above functions
- column_name is the column where aggregation is performed
- new_column_name is the new name for column_name
Example 1: Grouping by the DEPT column, aggregating FEE with sum() and avg(), and renaming the summed FEE column to Total Fee
Example 2: Grouping by the DEPT column, aggregating FEE with min(), count(), mean(), and max(), and giving each aggregated column its own new name
Method 2: Using withColumnRenamed()
withColumnRenamed() takes the name of the aggregated result column and renames it after the fact. When an aggregation runs without alias(), PySpark names the result column aggregate_operation(old_column), for example sum(FEE), so we can replace that generated name with the new column name we want.
Example: Grouping by the DEPT column, summing FEE, and renaming the resulting sum(FEE) column to Total Fee