Spark Trim String Column on DataFrame
Last Updated :
29 Sep, 2023
In this article, we will see that in PySpark, we can remove white spaces in the DataFrame string column. Here we will perform a similar operation to trim() (removes left and right white spaces) present in SQL in PySpark itself.
PySpark Trim String Column on DataFrame
Below are the ways by which we can trim String Column on DataFrame in PySpark:
- Using withColumn with rtrim()
- Using withColumn with trim()
- Using select()
- Using SQL Expression
- Using PySpark trim(), rtrim(), ltrim()
PySpark Trim using withColumn() with rtrim()
In this example, we are using withColumn() with rtrim() to remove white spaces.
Python3
from pyspark.sql import SparkSession
from pyspark.sql.functions import trim, ltrim, rtrim, col
spark = SparkSession.builder.appName( "WhiteSpaceRemoval" ).getOrCreate()
data = [( 1 , "ABC " ), ( 2 , " DEF" ), ( 3 , " GHI " )]
df = spark.createDataFrame(data, [ "col1" , "col2" ])
df.show()
df = df.withColumn( "col2" , rtrim(col( "col2" )))
df.show()
spark.stop()
|
+----+----------+
|col1| col2|
+----+----------+
| 1| ABC |
| 2| DEF |
| 3| GHI|
+----+----------+
+----+---+
|col1|col2|
+----+---+
| 1|ABC|
| 2|DEF|
| 3|GHI|
+----+---+
PySpark Trim using withColumn with trim()
In this example, we are using withColumn() with trim() to remove white spaces.
Python3
from pyspark.sql import SparkSession
from pyspark.sql.functions import trim, ltrim, rtrim, col
spark = SparkSession.builder.appName( "WhiteSpaceRemoval" ).getOrCreate()
data = [( 1 , "ABC " ), ( 2 , " DEF" ), ( 3 , " GHI " )]
df = spark.createDataFrame(data, [ "col1" , "col2" ])
df.show()
df = df.withColumn( "col2" , trim(col( "col2" )))
df.show()
spark.stop()
|
+----+----------+
|col1| col2|
+----+----------+
| 1| ABC |
| 2| DEF |
| 3| GHI|
+----+----------+
+----+---+
|col1|col2|
+----+---+
| 1|ABC|
| 2|DEF|
| 3|GHI|
+----+---+
PySpark Trim using select() to Remove White Spaces
In this example, we are using select() to trim in PySpark.
Python3
from pyspark.sql import SparkSession
from pyspark.sql.functions import trim, ltrim, rtrim, col
spark = SparkSession.builder.appName( "WhiteSpaceRemoval" ).getOrCreate()
data = [( 1 , "ABC " ), ( 2 , " DEF" ), ( 3 , " GHI " )]
df = spark.createDataFrame(data, [ "col1" , "col2" ])
df.show()
df = df.select(col( "col1" ), trim(col( "col2" )).alias( "col2" ))
df.show()
spark.stop()
|
+----+----------+
|col1| col2|
+----+----------+
| 1| ABC |
| 2| DEF |
| 3| GHI|
+----+----------+
+----+---+
|col1|col2|
+----+---+
| 1|ABC|
| 2|DEF|
| 3|GHI|
+----+---+
Using SQL Expression
In this example, we are using SQL Expression to remove extra whitespaces.
Python3
from pyspark.sql import SparkSession
from pyspark.sql.functions import trim, ltrim, rtrim, col
spark = SparkSession.builder.appName( "WhiteSpaceRemoval" ).getOrCreate()
data = [( 1 , "ABC " ), ( 2 , " DEF" ), ( 3 , " GHI " )]
df = spark.createDataFrame(data, [ "col1" , "col2" ])
df.show()
df.createOrReplaceTempView( "TAB" )
spark.sql( "SELECT col1, TRIM(col2) AS col2 FROM TAB" ).show()
spark.stop()
|
+----+----------+
|col1| col2|
+----+----------+
| 1| ABC |
| 2| DEF |
| 3| GHI|
+----+----------+
+----+---+
|col1|col2|
+----+---+
| 1|ABC|
| 2|DEF|
| 3|GHI|
+----+---+
Using PySpark trim(), rtrim(), ltrim()
In PySpark, we can easily remove whitespaces or trim by using pyspark.sql.functions.trim() function. We use ltrim() to remove only left white spaces and rtrim() function to remove right white spaces.
Python3
from pyspark.sql.functions import trim, ltrim, rtrim
data = [( 1 , "ABC " ), ( 2 , " DEF" ), ( 3 , " GHI " )]
df = spark.createDataFrame(data = data, schema = [ "col1" , "col2" ])
df.show()
df.withColumn( "col2" , trim( "col2" )).show()
df.withColumn( "col2" , ltrim( "col2" )).show()
df.withColumn( "col2" , rtrim( "col2" )).show()
df.select( "col1" , trim( "col2" ).alias( 'col2' )).show()
df.createOrReplaceTempView( "TAB" )
spark.sql( "select col1,trim(col2) as col2 from TAB" ).show()
|
+----+----------+
|col1| col2|
+----+----------+
| 1| ABC |
| 2| DEF |
| 3| GHI|
+----+----------+
+----+---+
|col1|col2|
+----+---+
| 1|ABC|
| 2|DEF|
| 3|GHI|
+----+---+
+----+---+
|col1|col2|
+----+---+
| 1|ABC|
| 2|DEF|
| 3|GHI|
+----+---+
+----+---+
|col1|col2|
+----+---+
| 1|ABC|
| 2|DEF|
| 3|GHI|
+----+---+
+----+---+
|col1|col2|
+----+---+
| 1|ABC|
| 2|DEF|
| 3|GHI|
+----+---+
Share your thoughts in the comments
Please Login to comment...