How to Count Occurrences of Specific Value in Pandas Column?
Last Updated: 28 Aug, 2023
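In this article, we will discuss how to count the occurrences of a specific value in a column of a pandas dataframe.
Dataset in use: each example below builds a small dataframe with the columns name, subjects, marks and age.
We can count by using the value_counts() method. Called on a single column, it returns the number of times each distinct value occurs in that column; indexing the result with a value gives the count for that value alone.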
Syntax: data['column_name'].value_counts()[value]
where
- data is the input dataframe
- value is the string/integer value present in the column to be counted
- column_name is the column in the dataframe
Example: To count occurrences of a specific value
Python3
import pandas as pd

data = pd.DataFrame({
    'name': ['sravan', 'ojsawi', 'bobby', 'rohith',
             'gnanesh', 'sravan', 'sravan', 'ojaswi'],
    'subjects': ['java', 'php', 'java', 'php', 'java',
                 'html/css', 'python', 'R'],
    'marks': [98, 90, 78, 91, 87, 78, 89, 90],
    'age': [11, 23, 23, 21, 21, 21, 23, 21]
})

print(data['name'].value_counts()['sravan'])
print(data['subjects'].value_counts()['php'])
print(data['marks'].value_counts()[89])
Output:
3
2
1
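Indexing the result of value_counts() raises a KeyError when the value never occurs in the column. A minimal sketch of two alternatives that return 0 instead, using a shortened version of the example data (the variable names and the value 'nobody' are illustrative assumptions, not part of the original example):

```python
import pandas as pd

data = pd.DataFrame({
    'name': ['sravan', 'ojsawi', 'bobby', 'sravan', 'sravan'],
    'marks': [98, 90, 78, 78, 89],
})

# Boolean comparison: True counts as 1, so the sum is the number of matches.
sravan_count = int((data['name'] == 'sravan').sum())

# Series.get returns a default instead of raising KeyError for absent values.
missing_count = int(data['name'].value_counts().get('nobody', 0))

print(sravan_count)   # 3
print(missing_count)  # 0
```

The boolean comparison avoids counting every distinct value first, which may be preferable when only one value is of interest.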
If we want to count all values in a particular column, then we do not need to mention the value.
Syntax:
data['column_name'].value_counts()
Example: To count the occurrences of every value in a particular column
Python3
import pandas as pd

data = pd.DataFrame({
    'name': ['sravan', 'ojsawi', 'bobby', 'rohith',
             'gnanesh', 'sravan', 'sravan', 'ojaswi'],
    'subjects': ['java', 'php', 'java', 'php', 'java',
                 'html/css', 'python', 'R'],
    'marks': [98, 90, 78, 91, 87, 78, 89, 90],
    'age': [11, 23, 23, 21, 21, 21, 23, 21]
})

print(data['name'].value_counts())
print(data['subjects'].value_counts())
print(data['marks'].value_counts())
print(data['age'].value_counts())
Output:
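The object returned by value_counts() is itself a pandas Series whose index holds the distinct values, so the usual Series methods apply to it. A short sketch, assuming only the subjects column from the example (the names counts, most_common and as_dict are illustrative):

```python
import pandas as pd

subjects = pd.Series(['java', 'php', 'java', 'php', 'java',
                      'html/css', 'python', 'R'])

counts = subjects.value_counts()

most_common = counts.idxmax()   # label with the highest count
as_dict = counts.to_dict()      # plain {value: count} mapping

print(most_common)
print(as_dict['php'])
```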
By default, the results are sorted by count in descending order. To choose the order explicitly, we have to specify the ascending parameter.
Syntax:
Ascending order:
data['column_name'].value_counts(ascending=True)
Descending order:
data['column_name'].value_counts(ascending=False)
Example: To get results in an ordered fashion
Python3
import pandas as pd

data = pd.DataFrame({
    'name': ['sravan', 'ojsawi', 'bobby', 'rohith',
             'gnanesh', 'sravan', 'sravan', 'ojaswi'],
    'subjects': ['java', 'php', 'java', 'php', 'java',
                 'html/css', 'python', 'R'],
    'marks': [98, 90, 78, 91, 87, 78, 89, 90],
    'age': [11, 23, 23, 21, 21, 21, 23, 21]
})

print(data['name'].value_counts(ascending=True))
print(data['subjects'].value_counts(ascending=True))
print(data['marks'].value_counts(ascending=False))
print(data['age'].value_counts(ascending=False))
Output:
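Note that the ascending parameter orders the result by count, not by the values themselves. A small sketch contrasting the two orderings on the age column; calling sort_index() is one possible way to order by value and is an addition to the original example:

```python
import pandas as pd

age = pd.Series([11, 23, 23, 21, 21, 21, 23, 21])

# Ordered by count: the rarest value comes first.
by_count_asc = age.value_counts(ascending=True)

# Ordered by the values themselves (the index of the result).
by_value = age.value_counts().sort_index()

print(by_count_asc)
print(by_value)
```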
Dealing with missing values
Here we can count occurrences with or without NA values by using the dropna parameter. With dropna=True (the default), NA values are excluded from the counts; with dropna=False, NA values are counted as well.
Syntax:
Exclude NA values:
data['column_name'].value_counts(dropna=True)
Include NA values:
data['column_name'].value_counts(dropna=False)
Example: Dealing with missing values
Python3
import pandas as pd
import numpy

data = pd.DataFrame({
    'name': ['sravan', 'ojsawi', 'bobby', 'rohith', 'gnanesh',
             'sravan', 'sravan', 'ojaswi', numpy.nan],
    'subjects': ['java', 'php', 'java', 'php', 'java', 'html/css',
                 'python', 'R', numpy.nan],
    'marks': [98, 90, 78, 91, 87, 78, 89, 90, numpy.nan],
    'age': [11, 23, 23, 21, 21, 21, 23, 21, numpy.nan]
})

print(data['name'].value_counts(dropna=False))
print(data['subjects'].value_counts(dropna=False))
print(data['marks'].value_counts(dropna=False))
print(data['age'].value_counts(dropna=True))
Output:
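The effect of dropna can be checked by comparing the totals of the two results. A minimal sketch on a four-element column with one missing entry (the data and variable names here are made up for illustration):

```python
import numpy as np
import pandas as pd

name = pd.Series(['sravan', 'bobby', 'sravan', np.nan])

without_na = name.value_counts()             # dropna=True is the default
with_na = name.value_counts(dropna=False)    # NaN gets its own row

# Counting only the missing entries directly.
n_missing = int(name.isna().sum())

print(without_na.sum())  # 3
print(with_na.sum())     # 4
print(n_missing)         # 1
```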
Count values with relative frequencies
Setting the normalize parameter to True returns relative frequencies instead of raw counts: each count is divided by the total number of counted values, so the results sum to 1.
Syntax:
data['column_name'].value_counts(normalize=True)
Example: Count values with relative frequencies
Python3
import pandas as pd

data = pd.DataFrame({
    'name': ['sravan', 'ojsawi', 'bobby', 'rohith',
             'gnanesh', 'sravan', 'sravan', 'ojaswi'],
    'subjects': ['java', 'php', 'java', 'php', 'java',
                 'html/css', 'python', 'R'],
    'marks': [98, 90, 78, 91, 87, 78, 89, 90],
    'age': [11, 23, 23, 21, 21, 21, 23, 21]
})

print(data['name'].value_counts(normalize=True))
Output:
sravan 0.375
ojaswi 0.125
ojsawi 0.125
bobby 0.125
rohith 0.125
gnanesh 0.125
Name: name, dtype: float64
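Relative frequencies are fractions of the total, so they can be turned into percentages by multiplying by 100. A brief sketch using the same name column (the rounding to one decimal place is an arbitrary choice for display):

```python
import pandas as pd

name = pd.Series(['sravan', 'ojsawi', 'bobby', 'rohith',
                  'gnanesh', 'sravan', 'sravan', 'ojaswi'])

fractions = name.value_counts(normalize=True)

# Convert fractions to percentages for display.
percentages = (fractions * 100).round(1)

print(percentages['sravan'])  # 37.5
```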
Get details
If we want summary statistics for a numeric column, such as count, mean, std, min, 25%, 50%, 75% and max, then we have to use the describe() method.
Syntax:
data['column_name'].describe()
Example: Get details
Python3
import pandas as pd

data = pd.DataFrame({
    'name': ['sravan', 'ojsawi', 'bobby', 'rohith',
             'gnanesh', 'sravan', 'sravan', 'ojaswi'],
    'subjects': ['java', 'php', 'java', 'php', 'java',
                 'html/css', 'python', 'R'],
    'marks': [98, 90, 78, 91, 87, 78, 89, 90],
    'age': [11, 23, 23, 21, 21, 21, 23, 21]
})

print(data['age'].describe())
Output:
count 8.000000
mean 20.500000
std 3.964125
min 11.000000
25% 21.000000
50% 21.000000
75% 23.000000
max 23.000000
Name: age, dtype: float64
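On a column of strings, describe() reports a different set of fields: count, unique, top (the most frequent value) and freq (how often it occurs). A short sketch on the name column, which is not covered by the example above:

```python
import pandas as pd

name = pd.Series(['sravan', 'ojsawi', 'bobby', 'rohith',
                  'gnanesh', 'sravan', 'sravan', 'ojaswi'])

summary = name.describe()

print(summary['top'])   # most frequent value
print(summary['freq'])  # number of times it occurs
```

This gives the most common value and its count in one call, without listing every distinct value.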
Using size() with groupby()
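Grouping by a column and calling size() returns the number of rows in each group, which is the number of occurrences of each distinct value in that column. The result is sorted by the group labels.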
Syntax:
data.groupby('column_name').size()
Example: Count of all occurrences in a particular column
Python3
import pandas as pd

data = pd.DataFrame({
    'name': ['sravan', 'ojsawi', 'bobby', 'rohith',
             'gnanesh', 'sravan', 'sravan', 'ojaswi'],
    'subjects': ['java', 'php', 'java', 'php', 'java',
                 'html/css', 'python', 'R'],
    'marks': [98, 90, 78, 91, 87, 78, 89, 90],
    'age': [11, 23, 23, 21, 21, 21, 23, 21]
})

print(data.groupby('name').size())
Output:
name
bobby 1
gnanesh 1
ojaswi 1
ojsawi 1
rohith 1
sravan 3
dtype: int64
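The same approach extends to combinations of columns by passing a list to groupby(). A minimal sketch on a shortened dataset (the column name 'count' given to reset_index is an illustrative choice):

```python
import pandas as pd

data = pd.DataFrame({
    'name': ['sravan', 'bobby', 'sravan', 'sravan', 'ojaswi'],
    'subjects': ['java', 'php', 'java', 'html/css', 'python'],
})

# Number of rows for each (name, subjects) pair that occurs.
pair_counts = data.groupby(['name', 'subjects']).size()

# Flatten the result into a regular dataframe with a named count column.
pair_table = pair_counts.reset_index(name='count')

print(pair_counts)
print(pair_table)
```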
Using count() with groupby()
Grouping by a column and calling count() returns, for each distinct value in that column, the number of non-null entries in every other column.
Syntax:
data.groupby('column_name').count()
Example: Count of non-null values in every other column for each group
Python3
import pandas as pd

data = pd.DataFrame({
    'name': ['sravan', 'ojsawi', 'bobby', 'rohith',
             'gnanesh', 'sravan', 'sravan', 'ojaswi'],
    'subjects': ['java', 'php', 'java', 'php', 'java',
                 'html/css', 'python', 'R'],
    'marks': [98, 90, 78, 91, 87, 78, 89, 90],
    'age': [11, 23, 23, 21, 21, 21, 23, 21]
})

print(data.groupby('name').count())
Output:
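The difference between size() and count() only shows up when a column contains missing values: size() counts rows, while count() skips nulls column by column. A small sketch with one missing mark (the data is made up for illustration):

```python
import numpy as np
import pandas as pd

data = pd.DataFrame({
    'name': ['sravan', 'sravan', 'bobby'],
    'marks': [98, np.nan, 78],
})

sizes = data.groupby('name').size()     # rows per group, nulls included
counts = data.groupby('name').count()   # non-null entries per column

print(sizes['sravan'])                 # 2
print(counts.loc['sravan', 'marks'])   # 1
```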
Using bins
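If we want to count how many values fall into ranges, rather than counting each distinct value, then the bins parameter is applied. Given an integer, value_counts() divides the range of the column into that many equal-width intervals. This only works with numeric data.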
Syntax:
data['column_name'].value_counts(bins=n)
where,
- data is the input dataframe
- column_name is the column to get bins
- n is the total number of bins to be specified
Example: Get count in particular range of values
Python3
import pandas as pd

data = pd.DataFrame({
    'name': ['sravan', 'ojsawi', 'bobby', 'rohith',
             'gnanesh', 'sravan', 'sravan', 'ojaswi'],
    'subjects': ['java', 'php', 'java', 'php', 'java',
                 'html/css', 'python', 'R'],
    'marks': [98, 90, 78, 91, 87, 78, 89, 90],
    'age': [11, 23, 23, 21, 21, 21, 23, 21]
})

print(data['age'].value_counts(bins=6))
print(data['age'].value_counts(bins=4))
Output:
(19.0, 21.0] 4
(21.0, 23.0] 3
(10.987, 13.0] 1
(17.0, 19.0] 0
(15.0, 17.0] 0
(13.0, 15.0] 0
Name: age, dtype: int64
(20.0, 23.0] 7
(10.987, 14.0] 1
(17.0, 20.0] 0
(14.0, 17.0] 0
Name: age, dtype: int64
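When the ranges should have specific boundaries instead of equal widths chosen automatically, one option is to bin the column with pd.cut() first and count the resulting intervals. This is a sketch of an alternative technique, not the method used above, and the edges 10, 15, 20 and 25 are arbitrary values picked for illustration:

```python
import pandas as pd

age = pd.Series([11, 23, 23, 21, 21, 21, 23, 21])

# Assign each age to one of the intervals (10, 15], (15, 20], (20, 25].
binned = pd.cut(age, bins=[10, 15, 20, 25])

# Count per interval, listed in interval order; empty intervals show 0.
bin_counts = binned.value_counts().sort_index()

print(bin_counts)
```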
Using apply()
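If we want to count the values of every column at once, then we have to use the apply() function and pass it the value_counts() method. The result has one row per distinct value found anywhere in the dataframe and one column per original column; a cell is NaN when the value does not occur in that column.
Syntax:
data.apply(pd.Series.value_counts)
Example: Get value counts for every column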
Python3
import pandas as pd

data = pd.DataFrame({
    'name': ['sravan', 'bobby', 'sravan', 'sravan', 'ojaswi'],
    'subjects': ['java', 'php', 'java', 'html/css', 'python'],
    'marks': [98, 90, 78, 91, 87],
    'age': [11, 23, 23, 21, 21]
})

# pd.Series.value_counts is used because the top-level pd.value_counts
# function is deprecated in recent pandas versions.
print(data.apply(pd.Series.value_counts))
Output:
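Because values that are absent from a column show up as NaN, the combined table is often easier to read after replacing NaN with 0. A minimal sketch, restricted to the two numeric columns as a simplifying assumption so that all index labels are of the same type:

```python
import pandas as pd

data = pd.DataFrame({
    'marks': [98, 90, 78, 91, 87],
    'age': [11, 23, 23, 21, 21],
})

# Count values per column, then treat "value not present" as a count of 0.
table = data.apply(pd.Series.value_counts).fillna(0).astype(int)

print(table)
```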