Given some mixed data containing multiple values as a string, let’s see how we can divide the strings using regex and make multiple columns in a Pandas DataFrame.
In this method we will use re.search(pattern, string, flags=0). Here pattern is the regular expression that we want to search for. It is a string that can contain the following special sequences and characters:
- \w matches alphanumeric characters and the underscore
- \d matches digits, which means 0-9
- \s matches whitespace characters
- \S matches non-whitespace characters
- . matches any character except the new line character \n
- * matches 0 or more instances of the preceding pattern
The steps for this method are as follows:
- We use a for loop to iterate through the movie data so we can work with each movie in turn. We create a dictionary, movies, that will hold the details of each movie, such as the name and rating.
- We then find the entire Name field using the re.search() function. The . matches any character except \n, and * repeats it up to the end of the line. We assign the result to the variable name_field.
- Data isn’t always straightforward, though. For instance, what if there’s no Name: field? re.search() would return None, and calling a method on that result would raise an error and stop the script. We pre-empt this by checking that the result is not None before using it.
- We use re.search() again to extract the final required string from name_field. For the name we use \w* to represent the first word, \s to represent the space in between and \w* for the second word.
- We do the same for year and rating to get the final required dictionary.
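The steps above can be sketched roughly as follows. This is a minimal illustration, not the article's original listing: the sample strings in movie_data and the helper find_value are hypothetical, and the name pattern uses \w+ rather than \w* so that the match cannot start on the space after the colon.

```python
import re
import pandas as pd

# Hypothetical sample data: one string per movie, one field per line.
movie_data = [
    "Name: Bird Box\nYear: 2018\nRating: 6.8",
    "Name: Fight Club\nYear: 1999\nRating: 8.8",
    "Year: 2001\nRating: 7.0",  # deliberately missing the Name: field
]

def find_value(label, value_pattern, text):
    """Hypothetical helper: return the value after `label`, or None."""
    # Step 1: grab the whole field; .* runs to the end of the line.
    field = re.search(label + r": .*", text)
    if field is None:
        return None  # the field is absent, so there is nothing to extract
    # Step 2: pull the value itself out of the field text.
    value = re.search(value_pattern, field.group())
    return value.group() if value is not None else None

movies = {"Name": [], "Year": [], "Rating": []}
for item in movie_data:
    movies["Name"].append(find_value("Name", r"\w+\s\w+", item))
    movies["Year"].append(find_value("Year", r"\d+", item))
    movies["Rating"].append(find_value("Rating", r"\d+\.\d+", item))

# Each dictionary key becomes a column; missing values stay as None.
df = pd.DataFrame(movies)
print(df)
```

Every extracted value is still a string at this point, so Year and Rating would need converting before any numeric work.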
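To break up the string we will use the Series.str.extract(pat, flags=0, expand=True) function. Here pat refers to the pattern that we want to search for. It must contain at least one capture group, and each capture group becomes a column in the result.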
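As a rough sketch of how Series.str.extract can split one string column into several, the example below uses named capture groups, which pandas turns into column names. The column name movie_data, the sample rows and the exact pattern are assumptions made for illustration.

```python
import pandas as pd

# Hypothetical sample data: all fields packed into a single string column.
df = pd.DataFrame({
    "movie_data": [
        "Name: Bird Box Year: 2018 Rating: 6.8",
        "Name: Fight Club Year: 1999 Rating: 8.8",
    ]
})

# Each named group (?P<...>) becomes one column of the result.
pattern = (
    r"Name: (?P<Name>\w+\s\w+) "
    r"Year: (?P<Year>\d{4}) "
    r"Rating: (?P<Rating>\d+\.\d+)"
)

# expand=True returns a DataFrame with one column per capture group.
columns = df["movie_data"].str.extract(pattern, expand=True)

# The extracted values are strings, so convert the numeric ones explicitly.
columns["Year"] = columns["Year"].astype(int)
columns["Rating"] = columns["Rating"].astype(float)

print(columns)
```

Rows that do not match the pattern would come back as NaN in every column, in which case the astype(int) conversion would fail and would need handling first.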