Python is a great language for data analysis, primarily because of its ecosystem of data-centric Python packages. Pandas is one of those packages, and it makes importing and analyzing data much easier.
A CSV file sometimes contains missing values, which are displayed as NaN once the file is loaded into a data frame. The Pandas dropna() method lets the user drop rows or columns that contain null values, with several options for deciding what counts as droppable.
DataFrameName.dropna(axis=0, how='any', thresh=None, subset=None, inplace=False)
axis: takes an integer or a string and decides whether rows or columns are dropped. Use 0 or ‘index’ to drop rows, and 1 or ‘columns’ to drop columns.
how: takes one of two strings, ‘any’ or ‘all’. ‘any’ drops the row/column if ANY of its values is null, and ‘all’ drops it only if ALL of its values are null.
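thresh: takes an integer giving the minimum number of non-null values a row/column must have in order to be kept; rows/columns with fewer non-null values are dropped. In recent pandas versions, thresh cannot be combined with how.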
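subset: a label or list of labels along the other axis that limits where null values are looked for. For example, when dropping rows, it is the list of columns to check.
inplace: a boolean. If True, the changes are made in the data frame itself and the method returns None; if False (the default), a new data frame is returned.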
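The parameters above can be tried out on a small data frame. The following is only a minimal sketch: the data and the column names (name, age, college, empty) are made up for illustration and are not the CSV file used in this article.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data, not the CSV file used in this article.
df = pd.DataFrame(
    {
        "name": ["A", "B", "C", "D"],
        "age": [25.0, np.nan, 31.0, np.nan],
        "college": ["X", np.nan, np.nan, np.nan],
        "empty": [np.nan, np.nan, np.nan, np.nan],
    }
)

# axis='columns' with how='all': drop only columns where every value is null.
no_empty_cols = df.dropna(axis="columns", how="all")

# how='any' (the default): drop rows containing at least one null value.
complete_rows = no_empty_cols.dropna(how="any")

# thresh=2: keep only rows that have at least 2 non-null values.
at_least_two = no_empty_cols.dropna(thresh=2)

# subset: look only at the 'age' column when deciding which rows to drop.
known_age = df.dropna(subset=["age"])

print(list(no_empty_cols.columns))
print(len(complete_rows), len(at_least_two), len(known_age))
```

Because inplace is left at its default of False, every call returns a new data frame and df itself is never modified.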
The examples below use data read from a CSV file.
Example #1: Dropping Rows with at least 1 null value.
The data frame is read from the CSV file and every row that contains any null value is dropped. The lengths of the old and new data frames are then compared to see how many rows had at least 1 null value.
Old data frame length: 458
New data frame length: 364
Number of rows with at least 1 NA value: 94
Since the difference is 94, there were 94 rows which had at least 1 Null value in any column.
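The same comparison can be sketched on a small in-memory data frame. This is an illustrative sketch only: the rows below are invented stand-ins, so the counts differ from the 458 and 364 reported for the CSV file.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the CSV data; the real file is not reproduced here.
data = pd.DataFrame(
    {
        "Name": ["P1", "P2", "P3", "P4", "P5"],
        "College": ["Texas", np.nan, "Duke", np.nan, "UCLA"],
        "Salary": [1.0e6, 2.5e6, np.nan, 4.0e6, 5.0e6],
    }
)

# Drop every row that has at least one null value; `data` itself is unchanged.
new_data = data.dropna(axis=0, how="any")

# Compare the lengths before and after dropping.
old_len = len(data)
new_len = len(new_data)
print("Old data frame length:", old_len)
print("New data frame length:", new_len)
print("Number of rows with at least 1 NA value:", old_len - new_len)
```

With real data, the toy frame would be replaced by a call to pd.read_csv() on the file in question.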
Example #2: Changing axis and using how and inplace Parameters
Two data frames are created from the same data. A column whose values are all None is added to the new data frame, and the column names of both frames are printed to verify that the null column was inserted properly. The number of columns is then compared before and after dropping the all-null column with axis=1, how=‘all’ and inplace=True.
['Name' 'Team' 'Number' 'Position' 'Age' 'Height' 'Weight' 'College' 'Salary']
['Name' 'Team' 'Number' 'Position' 'Age' 'Height' 'Weight' 'College' 'Salary' 'Null Column']
Column number before dropping Null column
9 10
Column number after dropping Null column
9 9
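The steps of this example can be sketched as follows. The three-column frame is a made-up stand-in for the CSV data, so the column counts are 3 and 4 here rather than 9 and 10; only the 'Null Column' name is taken from the output above.

```python
import pandas as pd

# Hypothetical stand-in for the CSV data used in the article.
data = pd.DataFrame(
    {
        "Name": ["P1", "P2", "P3"],
        "Team": ["T1", "T2", "T1"],
        "Age": [25, 27, 22],
    }
)

# Second data frame: a copy with an extra column that holds only None.
new = data.copy()
new["Null Column"] = None

print(data.columns.values)
print(new.columns.values)

cols_before = len(new.columns)

# axis=1 drops columns, how='all' requires every value to be null,
# and inplace=True modifies `new` directly (the call returns None).
new.dropna(axis=1, how="all", inplace=True)

cols_after = len(new.columns)
print("Column number before dropping Null column:", len(data.columns), cols_before)
print("Column number after dropping Null column:", len(data.columns), cols_after)
```

Using how='all' here matters: with how='any', any other column that happened to contain a single null value would be dropped as well.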