Let us see how to extract the dominant colors of an image using Python. Clustering is used in many real-world applications; one such example is extracting the dominant colors from an image.
Any image consists of pixels, and each pixel represents a dot in the image. A pixel contains three values, each ranging from 0 to 255, representing the amounts of red, green and blue. Together these values form the actual color of the pixel. To find the dominant colors, k-means clustering is used. One important use of k-means clustering is segmenting satellite images to identify surface features.
The satellite image shown below contains the terrain of a river valley.
Different colors typically belong to different features. K-means clustering can be used to cluster them into groups, which can then be identified as various surfaces such as water, vegetation, etc., as shown below.
Tools to find dominant colors
- matplotlib.image.imread – Converts a JPEG image into a matrix containing the RGB values of each pixel.
- matplotlib.pyplot.imshow – Displays the colors of the cluster centers after k-means clustering has been performed on the RGB values.
Let's now dive into an example, performing k-means clustering on the following image:
As can be seen, there are three dominant colors in this image: a shade of blue, a shade of red and black.
Step 1 : The first step in the process is to convert the image to pixels using the imread method of the matplotlib.image module.
(187, 295, 4)
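The output is an M*N*3 matrix, where M and N are the dimensions of the image. Here the last dimension is 4 rather than 3, which indicates that this image carries a fourth channel (typically alpha, i.e. transparency) in addition to red, green and blue; only the RGB values are used in the following steps.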
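Step 1 can be sketched as follows. Since the article's image is not available here, this sketch first writes a small synthetic three-color image to a temporary PNG file and then reads it back with imread. The file name synthetic.png, the palette and the 20 by 30 size are assumptions made only for illustration.

```python
import os
import tempfile

import numpy as np
import matplotlib.image as img

# Hypothetical stand-in for the article's picture: random pixels drawn
# from a blue, a red and a near-black color (values in 0-255).
rng = np.random.default_rng(0)
palette = np.array([[30, 60, 200], [200, 30, 40], [10, 10, 10]], dtype=np.uint8)
labels = rng.integers(0, 3, size=(20, 30))
synthetic = palette[labels]  # shape (20, 30, 3)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "synthetic.png")  # hypothetical file name
    img.imsave(path, synthetic)
    # imread returns one entry per pixel and channel.
    image = img.imread(path)

print(image.shape)  # rows, columns, channels
print(image.dtype)
```

Note that imread returns PNG images as floats in the range 0 to 1, while other formats such as JPEG are returned as integers, so the 0 to 255 range described above applies to the JPEG case.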
Step 2 : In this analysis, we look at all pixels collectively, regardless of their positions. So in this step, all the RGB values are extracted and stored in their corresponding lists. Once the lists are created, they are stored in a Pandas DataFrame, and the DataFrame is then scaled to get standardized values.
55165 55165 55165
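A minimal sketch of Step 2, under a few assumptions: the image is a synthetic array built in NumPy rather than the article's picture, a NumPy reshape stands in for the explicit per-channel lists, and scaling is done with SciPy's whiten function, which divides each column by its standard deviation. The column names are illustrative.

```python
import numpy as np
import pandas as pd
from scipy.cluster.vq import whiten

# Hypothetical 20 x 30 image with three colors (values in 0-255).
rng = np.random.default_rng(0)
palette = np.array([[30, 60, 200], [200, 30, 40], [10, 10, 10]], dtype=np.uint8)
labels = rng.integers(0, 3, size=(20, 30))
image = palette[labels]

# One row per pixel; positions are discarded, and only the first three
# channels (red, green, blue) are kept.
pixels = image[..., :3].reshape(-1, 3).astype(float)

df = pd.DataFrame(pixels, columns=["red", "green", "blue"])

# whiten divides every column by its standard deviation.
scaled = whiten(pixels)
df["scaled_red"] = scaled[:, 0]
df["scaled_green"] = scaled[:, 1]
df["scaled_blue"] = scaled[:, 2]

print(len(df["red"]), len(df["green"]), len(df["blue"]))
```

Each channel holds one value per pixel, so the three lengths all equal the number of rows times the number of columns of the image.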
Step 3 : Now, find the number of clusters for k-means using the elbow plot approach. This is not an absolute method for finding the number of clusters, but it helps give an indication of it.
Elbow plot: a line plot of distortion (the sum of the squared differences between the observations and the corresponding centroid) against the number of clusters.
Below is the code to generate the elbow plot:
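A minimal sketch of this step, assuming SciPy's kmeans function and a synthetic set of pixels in place of the article's image. The palette, the noise level, the range of 1 to 6 clusters and the output file name are all assumptions. Note also that SciPy's kmeans reports distortion as the mean Euclidean distance between observations and their centroids, which differs from the sum-of-squares definition given above, although the elbow is read the same way.

```python
import os
import tempfile

import numpy as np
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no display is needed
import matplotlib.pyplot as plt
from scipy.cluster.vq import kmeans, whiten

# Hypothetical pixels: three colors plus a little noise, one row per pixel.
rng = np.random.default_rng(0)
palette = np.array([[30, 60, 200], [200, 30, 40], [10, 10, 10]], dtype=float)
labels = rng.integers(0, 3, size=600)
pixels = np.clip(palette[labels] + rng.normal(0, 5, size=(600, 3)), 0, 255)
scaled = whiten(pixels)

num_clusters = list(range(1, 7))
distortions = []
for k in num_clusters:
    # iter=50 runs k-means 50 times and keeps the codebook with the
    # lowest distortion, which reduces the effect of a poor random start.
    _, distortion = kmeans(scaled, k, iter=50)
    distortions.append(distortion)

fig, ax = plt.subplots()
ax.plot(num_clusters, distortions, marker="o")
ax.set_xlabel("number of clusters")
ax.set_ylabel("distortion")
ax.set_xticks(num_clusters)

# Hypothetical output location for the figure.
out_path = os.path.join(tempfile.gettempdir(), "elbow_plot.png")
fig.savefig(out_path)
```

In an interactive session, the matplotlib.use line can be dropped and plt.show() called instead of saving the figure.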
The elbow plot is shown below:
It can be seen that a clear elbow is formed at 3 on the x-axis, which indicates that the number of clusters is 3 (there are three dominant colors in the given image).
Step 4 : The cluster centers obtained are standardized RGB values.
Standardized value = Actual value / Standard Deviation
Dominant colors are displayed using the imshow() method, which takes RGB values scaled to the range 0 to 1. To do so, you need to multiply the standardized values of the cluster centers by their corresponding standard deviations. Since the actual RGB values have a maximum of 255, the multiplied result is divided by 255 to get scaled values in the range 0-1.
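The conversion described in Step 4 can be sketched with a few lines of arithmetic. The standard deviations and standardized cluster centers below are made-up numbers chosen to resemble a blue, a red and a near-black color; they are not results from the article's image, and the output file name is also an assumption.

```python
import os
import tempfile

import numpy as np
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no display is needed
import matplotlib.pyplot as plt

# Hypothetical per-channel standard deviations of the original RGB values.
std = np.array([85.0, 20.0, 84.0])

# Hypothetical standardized cluster centers, one row per cluster.
cluster_centers = np.array([
    [0.35, 3.0, 2.38],
    [2.35, 1.5, 0.48],
    [0.12, 0.5, 0.12],
])

# Undo the standardization (multiply by the standard deviation),
# then divide by 255 to reach the 0-1 range that imshow expects.
colors = cluster_centers * std / 255

fig, ax = plt.subplots()
# A 1 x 3 RGB image: one swatch per dominant color.
ax.imshow(colors[np.newaxis, :, :])
ax.set_axis_off()

# Hypothetical output location for the figure.
out_path = os.path.join(tempfile.gettempdir(), "dominant_colors.png")
fig.savefig(out_path)
```

For example, the first center becomes (0.35 * 85, 3.0 * 20, 2.38 * 84) = (29.75, 60, 199.92) in the 0-255 range, which is a shade of blue.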
Here is the resulting plot showing the three dominant colors of the given image.
Notice that the three colors resemble the three that are apparent from visual inspection of the image.
Below is the full code without the comments :
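The steps above can also be folded into a single function. This is a commented sketch under stated assumptions rather than the original listing: the function name dominant_colors, its parameters k and restarts, and the synthetic test image are all hypothetical, and the input is assumed to hold 8-bit values in the range 0 to 255.

```python
import numpy as np
from scipy.cluster.vq import kmeans


def dominant_colors(image, k, restarts=50):
    """Return k dominant colors as rows of RGB values in the range 0-1.

    Assumes `image` is an M x N x 3 (or M x N x 4) array of 0-255 values.
    """
    # One row per pixel, ignoring position and any fourth channel.
    pixels = np.asarray(image)[..., :3].reshape(-1, 3).astype(float)

    # Standardize: divide each channel by its standard deviation.
    std = pixels.std(axis=0)
    std[std == 0] = 1.0  # avoid dividing by zero for a constant channel
    scaled = pixels / std

    # Cluster; `restarts` is the number of k-means runs SciPy performs.
    centers, _ = kmeans(scaled, k, iter=restarts)

    # Undo the standardization and map 0-255 to 0-1 for imshow.
    return np.clip(centers * std / 255, 0, 1)


# Hypothetical test image: three colors plus a little noise.
rng = np.random.default_rng(0)
palette = np.array([[30, 60, 200], [200, 30, 40], [10, 10, 10]], dtype=float)
labels = rng.integers(0, 3, size=(20, 30))
noisy = palette[labels] + rng.normal(0, 5, size=(20, 30, 3))
image = np.clip(noisy, 0, 255).astype(np.uint8)

colors = dominant_colors(image, k=3)
print(colors.round(2))
```

The returned rows come back in no particular order, so they should be matched to colors by value rather than by position.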