Python is a great language for doing data analysis, primarily because of the fantastic ecosystem of data-centric Python packages. Pandas is one of those packages and makes importing and analyzing data much easier.
startswith() is another method for searching and filtering text data in a Series, such as a single column of a data frame. It is similar to Python’s built-in startswith() string method, but it takes different parameters and works on Pandas objects only. Hence .str has to be prefixed every time before calling this method, so that Pandas applies it to each element of the Series instead of treating it as the built-in method.
Syntax: Series.str.startswith(pat, na=nan)
pat: String to be searched for at the start of each value. Regular expressions are not accepted.
na: Value to use in the result wherever the value in the Series is null (NaN).
Return type: Boolean Series which is True where the value starts with the passed string.
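The parameters above can be illustrated with a minimal sketch. The sample values below are hypothetical and are not taken from the article's CSV; the point is only that pat is matched literally, so a regex character such as "^" is treated as an ordinary character.

```python
import pandas as pd

# Hypothetical sample values, not taken from the article's CSV
s = pd.Series(["Georgia", "^Georgia", "Duke", None], dtype=object)

# pat is matched literally against the start of each value;
# na=False makes the missing entry come out as False
literal = s.str.startswith("G", na=False)

# "^" is treated as an ordinary character, not as a regex anchor
caret = s.str.startswith("^G", na=False)

print(literal.tolist())  # [True, False, False, False]
print(caret.tolist())    # [False, True, False, False]
```

If pattern matching is needed, a regex-aware method such as str.match() or str.contains() is probably the better fit.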
The following examples use a data frame containing data on some NBA players. An image of the data frame before any operations is attached below.
Example #1: Returning Bool series
In this example, the College column is checked for elements that start with “G” using the str.startswith() function. A Boolean series is returned which is True at each index position where the string starts with “G”.
As shown in the output image, the Boolean series is True at the index positions where the College column value starts with “G”. This can be verified by comparing it with the image of the original data frame.
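A minimal sketch of this example is given here. Since the original CSV is not reproduced, it assumes a small hypothetical data frame with a College column; the player names and colleges are illustrative only.

```python
import pandas as pd

# Hypothetical stand-in for the NBA data; values are illustrative only
data = pd.DataFrame({
    "Name": ["Player A", "Player B", "Player C", "Player D"],
    "College": ["Georgia State", "Boston University", None, "Gonzaga"],
})

# True where College starts with "G"; the missing value is left unhandled
bool_series = data["College"].str.startswith("G")

print(bool_series)
```

The entry for the missing college is typically NaN rather than True or False, although the exact result can depend on the Pandas version and the dtype of the column.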
Example #2: Handling NULL values
Handling null values is an important part of data analysis. As can be seen in the above output image, the Boolean series has NaN wherever the value in the College column was empty or NaN. If this series is passed to the data frame to select rows, it will raise an error, because a mask containing NaN is not a valid Boolean mask. Hence, the NaN values need to be handled using the na parameter. It can be set to a string too, but since the series is used as a Boolean mask to select rows, it should be set to a Bool value only.
In this example, the na parameter is set to False. So wherever the College column has a null value, the Boolean series will store False instead of NaN. After that, the series is passed to the data frame to display only the rows where it is True.
As shown in the output image, the data frame contains only the rows where the College column value starts with “G”. Rows with NaN in the College column are not displayed since the na parameter was set to False.
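The steps of this example can be sketched as follows, again assuming a small hypothetical data frame in place of the original CSV.

```python
import pandas as pd

# Hypothetical stand-in for the NBA data; values are illustrative only
data = pd.DataFrame({
    "Name": ["Player A", "Player B", "Player C", "Player D"],
    "College": ["Georgia State", "Boston University", None, "Gonzaga"],
})

# na=False makes a missing college count as "does not start with G"
bool_series = data["College"].str.startswith("G", na=False)

# Use the all-Boolean series as a mask to keep only the matching rows
filtered = data[bool_series]

print(filtered)
```

Only the rows for Georgia State and Gonzaga remain; the row with the missing college is dropped along with the non-matching one.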