Given some mixed data containing multiple values as a string, let's see how we can divide the strings using regex and make multiple columns in a Pandas DataFrame.
In this method, we will use the re.search(pattern, string, flags=0) function. Here pattern refers to the pattern that we want to search for. It is a string that can contain the following special sequences:
- \w matches word characters: letters, digits and the underscore
- \d matches digits, which means 0-9
- \s matches whitespace characters
- \S matches non-whitespace characters
- . matches any character except the new line character \n
- * matches 0 or more repetitions of the preceding pattern
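To see these special sequences in action, here is a small sketch that runs each one through re.search() on a hypothetical two-line string. The sample text is only for illustration. re.search() returns the first (leftmost) match it finds.

```python
import re

# Hypothetical sample text: two fields on separate lines.
line = "Name: Bird Box\nYear: 2018"

patterns = [r"\w+", r"\d+", r"\s", r"\S+", r"Name: .*", r"\d*"]

# Map each pattern to the first substring that re.search() finds for it.
results = {p: re.search(p, line).group() for p in patterns}

for pattern, found in results.items():
    print(f"{pattern:10} -> {found!r}")
```

On this input, \w+ stops at the colon and gives 'Name'. \s gives the first space, and .* stops at the newline, so "Name: .*" gives 'Name: Bird Box'. The last pattern, \d*, returns an empty string. Because * allows zero repetitions, the very first position of the string already counts as a match.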
- We use a for loop to iterate through the movie data so that we can work with each movie in turn. For each movie we create a dictionary, movies, to hold all of its details, such as its name and rating.
- We then find the entire Name field using the re.search() function. The . matches any character except \n, and * repeats it, so .* extends the match to the end of the line. We assign this match to the variable name_field.
- But data isn't always straightforward, and it can contain surprises. For instance, what if there's no Name: field? In that case re.search() returns None, and calling a method on that result would raise an error and break the script. To pre-empt this, we check that the result is not None before using it.
- We use the re.search() function again, this time to extract the final required string from name_field. For the name we use \w* to represent the first word, \s to represent the space in between, and \w* for the second word.
- We do the same for the year and rating fields to build the final dictionary for each movie.
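The steps above can be sketched as follows. This is a minimal example, not the article's original code. The movie_data list, the one-field-per-line layout and the helper find_value() are all hypothetical.

One deliberate deviation is the name pattern. Here it is \w+\s\w+$, anchored to the end of the field. The reason is that on a field such as "Name: The Godfather", the unanchored \w*\s\w* first matches " The", starting at the space after the colon.

```python
import re
import pandas as pd

# Hypothetical sample data: each string holds one movie's fields on separate lines.
movie_data = [
    "Name: The Godfather\nYear: 1972\nRating: 9.2",
    "Name: Bird Box\nYear: 2018\nRating: 6.8",
    "Year: 1999\nRating: 8.8",  # deliberately missing its Name: field
]


def find_value(field_pattern, value_pattern, text):
    """Hypothetical helper: locate the whole field, then pull the value out of it."""
    field = re.search(field_pattern, text)
    # re.search() returns None when the field is absent, so check before .group().
    if field is None:
        return None
    value = re.search(value_pattern, field.group())
    return value.group() if value is not None else None


rows = []
for item in movie_data:
    # '.*' runs to the end of the current line because '.' never matches '\n'.
    name = find_value(r"Name: .*", r"\w+\s\w+$", item)
    year = find_value(r"Year: .*", r"\d+", item)
    rating = find_value(r"Rating: .*", r"\d+\.\d+", item)

    # One dictionary per movie, holding its name, year and rating.
    movies = {
        "Name": name,
        "Year": int(year) if year is not None else None,
        "Rating": float(rating) if rating is not None else None,
    }
    rows.append(movies)

# Each dictionary becomes a row; each key becomes a column.
df = pd.DataFrame(rows)
print(df)
```

The third movie has no Name: field. Its Name cell is therefore missing, rather than the script failing on a None result.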
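Alternatively, to break up the string we can use the Series.str.extract(pat, flags=0, expand=True) function. Here pat refers to the pattern that we want to search for. With expand=True, each capture group in pat becomes one column of the resulting DataFrame.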
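A minimal sketch of the str.extract() approach follows. It assumes a hypothetical column named raw whose strings hold all three fields on one line. Named groups such as (?P<Name>...) are standard Python regex syntax, and str.extract() uses the group names as column names.

```python
import pandas as pd

# Hypothetical input: one column whose strings pack several values together.
df = pd.DataFrame({
    "raw": [
        "Name: The Godfather Year: 1972 Rating: 9.2",
        "Name: Bird Box Year: 2018 Rating: 6.8",
        "Name: Fight Club Year: 1999 Rating: 8.8",
    ]
})

# Each named capture group becomes a column of the result.
pattern = (
    r"Name: (?P<Name>\w+\s\w+)\s+"
    r"Year: (?P<Year>\d+)\s+"
    r"Rating: (?P<Rating>\d+\.\d+)"
)
parts = df["raw"].str.extract(pattern, expand=True)

# The extracted values are text, so convert the numeric columns explicitly.
parts["Year"] = pd.to_numeric(parts["Year"])
parts["Rating"] = pd.to_numeric(parts["Rating"])

# Attach the new columns next to the original one (same index).
df = df.join(parts)
print(df[["Name", "Year", "Rating"]])
```

Rows that do not match the pattern get NaN in every extracted column instead of raising an error. For this data shape, that makes str.extract() a compact alternative to the loop-based approach.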