Python is a great language for doing data analysis, primarily because of the fantastic ecosystem of data-centric Python packages. Pandas is one of those packages and makes importing and analyzing data much easier.
str.get_dummies() splits each string in the caller Series at the passed separator and returns a data frame of dummy (indicator) columns, one for every distinct value produced by the split. If the string at a given index of the original Series contains a value (the column name) after splitting, the entry at that position is 1; otherwise, it is 0.
Since this is a string method, it has to be accessed through the .str accessor every time it is called (Series.str.get_dummies()). Calling get_dummies() directly on a Series raises an AttributeError.
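A minimal sketch of the accessor requirement, using a small made-up Series (the values "a|b" and "b|c" are purely illustrative). The method is reached through .str, while the Series object itself has no get_dummies attribute. Note that the top-level pd.get_dummies() function is a separate API that treats each whole value as one category and does not split strings.

```python
import pandas as pd

# Made-up example data: each string holds "|"-separated tags.
s = pd.Series(["a|b", "b|c"])

# The method lives on the .str accessor, not on the Series itself.
dummies = s.str.get_dummies(sep="|")
print(dummies)

# Calling it directly on the Series fails: Series has no get_dummies attribute.
try:
    s.get_dummies(sep="|")
except AttributeError as err:
    print("AttributeError:", err)
```

Each distinct tag becomes a column (a, b, c), and each row marks with 1 the tags its string contains.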
sep: String value, the separator to split strings at (default "|")
Return type: Data frame with binary values only
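As a sketch of the default separator and the binary return value, the snippet below uses a hypothetical Series of colour tags. Because sep defaults to "|", no argument is passed. It also includes one missing value (None); for such a row, every dummy column is 0.

```python
import pandas as pd

# Hypothetical data; one entry is missing on purpose.
s = pd.Series(["red|green", "green", None, "blue|red"])

# sep defaults to "|", so no argument is needed here.
dummies = s.str.get_dummies()
print(dummies)
```

The columns come out in sorted order (blue, green, red), and every cell is either 0 or 1.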
In the following examples, the data frame used contains data about some employees. An image of the data frame before any operations is attached below.
Example #1: Separating different strings on whitespace.
In this example, the strings in the Team column are split at " " (whitespace), and a data frame is returned with a column for every value produced by the split. A value in the returned data frame is 1 if that string (the column name) exists in the text at the same index in the original data frame.
As shown in the output image, the result can be compared with the original data frame: if the string exists at the same index, the value is 1; otherwise, it is 0.
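Because the employee data set is not reproduced here, the following sketch builds a small hypothetical data frame with a Team column (the team names are made up) and splits it at a single space, as in Example #1. Concatenating the original column with the dummies lets you compare each row side by side.

```python
import pandas as pd

# Hypothetical stand-in for the employee data; the Team values are made up.
df = pd.DataFrame(
    {"Team": ["Client Services", "Marketing", "Human Resources", "Business Development"]}
)

# Split each Team string at a single space and one-hot encode the pieces.
dummies = df["Team"].str.get_dummies(sep=" ")

# Place the original column next to the dummies to compare them row by row.
print(pd.concat([df["Team"], dummies], axis=1))
```

"Client Services", for example, gets a 1 in both the Client and Services columns and a 0 everywhere else.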
Example #2: Splitting at multiple points in a static-value column
In this example, a new column is created with a static value ("Hello gfg family"). The get_dummies() method is then applied to it with "g" as the separator. Since "g" occurs twice, each string is split into three parts ("Hello ", "f" and " family"), giving three columns. Because the string is the same in every row, all rows have the same values.
As shown in the output image, the new data frame has 3 columns, and every row has the same values (all 1s).
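A minimal sketch of Example #2, assuming a small hypothetical data frame in place of the employee data (the Name values are placeholders). Note that the resulting column names keep their surrounding spaces, because get_dummies() does not strip the pieces produced by the split.

```python
import pandas as pd

# Hypothetical data frame standing in for the employee data.
df = pd.DataFrame({"Name": ["A", "B", "C"]})

# Static value assigned to every row of a new column.
df["New column"] = "Hello gfg family"

# Split at "g": "Hello gfg family" -> "Hello ", "f", " family"
dummies = df["New column"].str.get_dummies(sep="g")
print(dummies)
```

The columns are sorted, so " family" (leading space) comes first, followed by "Hello " and "f".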
Improved By : Akanksha_Rai