Extract punctuation from the specified column of Dataframe using Regex

Last Updated : 29 Dec, 2020

Prerequisite: Regular Expression in Python

In this article, we will see how to extract punctuation used in the specified column of the Dataframe using Regex.

First, we build a regular expression, a character class that lists the punctuation marks to look for: [!"$%&'()*+,\-./:;=#@?\[\\\]^_`{|}~] (this class covers every ASCII punctuation mark except < and >). Then we pass each row of the chosen column to re.findall() to extract the punctuation, and assign the extracted punctuation to a new column in the Dataframe.
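As a side note, a hand-typed character class is easy to get slightly wrong. The sketch below is one possible alternative, not part of the original approach: it builds the class from the standard library's string.punctuation and checks which of those characters the article's class does not match. The names ARTICLE_CLASS, FULL_CLASS and missing are made up for this illustration.

```python
import re
import string

# The character class used in this article (assumed name: ARTICLE_CLASS).
ARTICLE_CLASS = r'[!"$%&\'()*+,\-./:;=#@?\[\\\]^_`{|}~]'

# Alternative: build the class from string.punctuation.
# re.escape() backslash-escapes the characters that are special
# inside a pattern, so they are safe inside the brackets.
FULL_CLASS = '[' + re.escape(string.punctuation) + ']'

# Which ASCII punctuation marks does the article's class not match?
missing = [c for c in string.punctuation
           if not re.fullmatch(ARTICLE_CLASS, c)]
print(missing)  # ['<', '>']
```

If angle brackets matter for your data, FULL_CLASS can be passed to re.findall() in place of the hand-written class.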

The re.findall() function returns all non-overlapping matches of a pattern in a string, as a list of strings. The string is scanned left to right, and matches are returned in the order found.

Syntax: re.findall(regex, string)

Return: All non-overlapping matches of pattern in string, as a list of strings.
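To make the behaviour of re.findall() concrete, here is a small sketch on a single string. The sample text and the variable names are chosen only for illustration. It also shows that adding a quantifier such as * after the class permits zero-length matches, so the result can then contain empty strings.

```python
import re

# Character class listing the punctuation marks from this article.
PUNCTUATION = r'[!"$%&\'()*+,\-./:;=#@?\[\\\]^_`{|}~]'

sample = "Wait, what?! (Really...)"

# Without a quantifier, every match is exactly one character.
matches = re.findall(PUNCTUATION, sample)
print(matches)  # [',', '?', '!', '(', '.', '.', '.', ')']

# With *, the pattern can also match the empty string, so empty
# matches appear and runs such as '?!' come back as one item.
with_star = re.findall(PUNCTUATION + '*', sample)
non_empty = [m for m in with_star if m]
print(non_empty)
```

Matching one character at a time avoids the need to filter out empty strings afterwards.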

Now, let's create a Dataframe:

Python3

# import required libraries 
import pandas as pd 
import re 
  
# creating Dataframe with 
# name and their comments 
df = pd.DataFrame({ 
    'Name' : ['Akash', 'Ashish', 'Ayush', 
              'Diksha' , 'Radhika'], 
    
    'Comments': ['Hey! Akash how r u' ,  
                 'Why are you asking this to me?' , 
                 'Today, what we are going to do.' , 
                 'No plans for today why?' , 
                 'Wedding plans, what are you saying?']}, 
    
    columns = ['Name', 'Comments'] 
    ) 
  
# show the Dataframe 
df

Output: a Dataframe with five rows and two columns, Name and Comments.

Now, extract the punctuation from the Comments column:

Python3

# define a function for extracting
# the punctuation
def check_find_punctuations(text):

    # character class listing the punctuation marks;
    # with no quantifier after the class, every match
    # is a single character and no empty strings appear,
    # so the list from re.findall() can be returned as is
    return re.findall(r'[!"$%&\'()*+,\-./:;=#@?\[\\\]^_`{|}~]',
                      text)

# create a new column named punctuation_used
# by applying the function to each row
# of the Comments column
df['punctuation_used'] = df['Comments'].apply(check_find_punctuations)

# show the Dataframe
df

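Output: the Dataframe now has a third column, punctuation_used, holding the list of punctuation characters found in each comment, for example ['!'] for the first row and [',', '.'] for the third.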
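The same extraction can also be written without a helper function. The sketch below is an alternative to the apply() approach, not the method used above: it calls the pandas string accessor Series.str.findall(), which runs re.findall() on every element of a column. The two-row Series is a shortened sample made up for this illustration.

```python
import pandas as pd

# Character class listing the punctuation marks from this article.
PUNCTUATION = r'[!"$%&\'()*+,\-./:;=#@?\[\\\]^_`{|}~]'

# Shortened sample of the Comments column.
comments = pd.Series(['Hey! Akash how r u',
                      'Today, what we are going to do.'])

# Series.str.findall() applies re.findall() to each element
# and returns a Series of lists.
punctuation_used = comments.str.findall(PUNCTUATION)
print(punctuation_used.tolist())  # [['!'], [',', '.']]
```

On the Dataframe from this article, the equivalent call would be df['Comments'].str.findall(PUNCTUATION).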
