How to Get a Substring from a Column in a PySpark DataFrame
In this article, we will see how to get a substring from a PySpark DataFrame column and how to store that substring in a newly created column.
We can get the substring of a column using the substring() function from pyspark.sql.functions or the substr() method of a Column. They take the following parameters:
- str – The column, or the name of the column, from which the substring is taken. It is only needed for substring(); substr() is called on the column itself.
- start and pos – The starting position of the substring. Positions are 1-based, so 1 refers to the first character.
- length and len – The number of characters to take, counted from the starting position.
Let’s create a dataframe.
Example 1: Getting the substring with substring() and creating a new column with withColumn().
Example 2: Creating a New_Country column from a substring obtained with the substr() function.
Example 3: Using substring() with the select() function.
Example 4: Using substring() with the selectExpr() function.