
Difference between Data Cleaning and Data Processing

Last Updated: 10 Jul, 2021

Data Processing: Data processing is the collection, manipulation, and processing of data for a required use. It converts data from a given form into a more usable and desired form, i.e., it makes the data more meaningful and informative. Much of this process can be automated using machine learning algorithms, mathematical modelling, and statistical knowledge. This might seem simple, but for very large organizations such as Twitter and Facebook, administrative bodies such as a parliament, international organizations such as UNESCO, and health-sector organizations, the whole process needs to be performed in a structured, step-by-step manner.
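As a small illustration of "converting data from a given form to a more usable form", the sketch below turns raw comma-separated text into a per-student summary. The input layout (`student,subject,marks`), the `RAW` data, and the `process` function are hypothetical and chosen only for this example; only the standard library (`csv`, `io`, `collections`) is used.

```python
import csv
import io
from collections import defaultdict

# Hypothetical raw input: plain text with one "student,subject,marks" record per line.
RAW = """student,subject,marks
asha,math,78
asha,physics,64
ravi,math,55
ravi,physics,49
"""


def process(raw_text):
    """Convert raw CSV text into a per-student summary of total marks and subject count."""
    summary = defaultdict(lambda: {"total": 0, "subjects": 0})
    for row in csv.DictReader(io.StringIO(raw_text)):
        entry = summary[row["student"]]
        entry["total"] += int(row["marks"])  # text field -> integer
        entry["subjects"] += 1
    return dict(summary)


if __name__ == "__main__":
    for student, info in process(RAW).items():
        print(student, info)
    # asha {'total': 142, 'subjects': 2}
    # ravi {'total': 104, 'subjects': 2}
```

The raw text is hard to query directly; the resulting dictionary can be looked up, filtered, or aggregated further, which is what "more usable" means here.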

Data Cleaning: Data cleaning is the process of fixing or removing incorrect, corrupted, incorrectly formatted, duplicate, or incomplete data within a dataset. It is an important part of machine learning and plays a significant role in building a model. Data cleaning is something everyone does but few people talk about. It is not the fanciest part of machine learning, and there are no hidden tricks or secrets to uncover. However, proper data cleaning can make or break a project.
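The kinds of bad data named in the definition can be shown on a handful of records. This is a minimal sketch, assuming hypothetical `(name, age_as_text)` rows; it fixes formatting first and then removes incomplete, corrupted, and duplicate rows. The `RAW_ROWS` data and the `clean` function are illustrative only.

```python
# Hypothetical messy records: (name, age as text), as they might arrive from a form.
RAW_ROWS = [
    ("  Asha ", "17"),  # incorrectly formatted: stray whitespace
    ("asha", "17"),     # duplicate of the row above once formatting is fixed
    ("Ravi", "x9"),     # corrupted: age is not a number
    ("Meena", ""),      # incomplete: age is missing
    ("John", "18"),
]


def clean(rows):
    """Fix formatting, then remove incomplete, corrupted, and duplicate rows."""
    seen = set()
    cleaned = []
    for name, age_text in rows:
        name = name.strip().title()  # fix formatting: trim spaces, normalise case
        age_text = age_text.strip()
        if not age_text:             # incomplete: drop the row
            continue
        try:
            age = int(age_text)      # corrupted values raise ValueError
        except ValueError:
            continue
        record = (name, age)
        if record in seen:           # duplicate: keep only the first copy
            continue
        seen.add(record)
        cleaned.append(record)
    return cleaned


if __name__ == "__main__":
    print(clean(RAW_ROWS))  # [('Asha', 17), ('John', 18)]
```

Whether to fix or drop a bad row is a judgment call; in practice an incomplete row might be filled in from a trusted source instead of being discarded.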


Data Processing vs. Data Cleaning

| Sr. no. | Data Processing | Data Cleaning |
|---|---|---|
| 1 | Data processing is done after data cleaning. | Data cleaning is done before data processing. |
| 2 | Data processing, especially at scale, requires substantial hardware resources such as RAM, storage, and GPUs. | Data cleaning usually does not require specialized hardware. |
| 3 | Data processing uses frameworks such as Hadoop and Apache Pig. | Data cleaning involves removing noisy data and similar fixes; usually no special frameworks are required. |
| 4 | Data processing is more complex than data cleaning. | Data cleaning is simpler than data processing. |


Examples of Data Processing:
  • Loading student data into a Hadoop cluster (data storage) and retrieving (processing) the records of students who scored less than 60 percent.
  • Calculating each student's percentage.
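The processing examples above can be sketched in plain Python. This sketch does not use Hadoop: it imitates, in a single process, the map, shuffle, and reduce phases that a Hadoop MapReduce job would run across a cluster. The record layout, the `RECORDS` data, and all function names are hypothetical.

```python
from collections import defaultdict

# Hypothetical, already-cleaned input: (student, subject, marks obtained, maximum marks).
RECORDS = [
    ("asha", "math", 78, 100),
    ("asha", "physics", 64, 100),
    ("ravi", "math", 55, 100),
    ("ravi", "physics", 49, 100),
]


def map_phase(records):
    """Emit (key, value) pairs, as a MapReduce mapper would."""
    for student, _subject, obtained, maximum in records:
        yield student, (obtained, maximum)


def shuffle(pairs):
    """Group values by key (the framework does this between map and reduce)."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Compute each student's overall percentage from the grouped values."""
    return {
        student: 100 * sum(o for o, _ in vals) / sum(m for _, m in vals)
        for student, vals in groups.items()
    }


def below(percentages, threshold=60):
    """Keep only students whose percentage is below the threshold."""
    return {s: p for s, p in percentages.items() if p < threshold}


if __name__ == "__main__":
    pct = reduce_phase(shuffle(map_phase(RECORDS)))
    print(pct)         # {'asha': 71.0, 'ravi': 52.0}
    print(below(pct))  # {'ravi': 52.0}
```

Splitting the work into independent map and reduce steps is what lets a framework like Hadoop spread it over many machines; here everything runs locally for clarity.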


Examples of Data Cleaning:
  • Finding invalid (fraudulent) data, such as a student's age outside the valid range or a percentage greater than 100.
  • Checking whether any marks are missing. If so, verifying the correct values and filling them in place of the missing data.
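The cleaning checks above can be written as simple validation rules. This is a minimal sketch, assuming a hypothetical valid age range of 5 to 25, hypothetical student records, and a hypothetical `VERIFIED_MARKS` lookup standing in for the "verify and place the correct data" step (for example, values confirmed from original mark sheets).

```python
# Hypothetical valid age range; adjust to the real dataset.
MIN_AGE, MAX_AGE = 5, 25

STUDENTS = [
    {"id": 1, "age": 17, "marks": 420, "max_marks": 500},
    {"id": 2, "age": 140, "marks": 390, "max_marks": 500},  # age out of range
    {"id": 3, "age": 16, "marks": 530, "max_marks": 500},   # percentage above 100
    {"id": 4, "age": 18, "marks": None, "max_marks": 500},  # marks missing
]

# Hypothetical source of verified values, keyed by student id.
VERIFIED_MARKS = {4: 455}


def find_problems(student):
    """Return a list of rule violations for one student record."""
    problems = []
    if not MIN_AGE <= student["age"] <= MAX_AGE:
        problems.append("age out of range")
    if student["marks"] is None:
        problems.append("marks missing")
    elif 100 * student["marks"] / student["max_marks"] > 100:
        problems.append("percentage above 100")
    return problems


def fill_missing_marks(students, verified):
    """Return copies of the records with missing marks replaced by verified values."""
    fixed = []
    for student in students:
        record = dict(student)  # copy so the input list is not modified
        if record["marks"] is None and record["id"] in verified:
            record["marks"] = verified[record["id"]]
        fixed.append(record)
    return fixed


if __name__ == "__main__":
    for s in STUDENTS:
        print(s["id"], find_problems(s))
    # 1 []
    # 2 ['age out of range']
    # 3 ['percentage above 100']
    # 4 ['marks missing']
    print(fill_missing_marks(STUDENTS, VERIFIED_MARKS)[3]["marks"])  # 455
```

Flagged records such as ids 2 and 3 still need a human or a trusted source to decide the correct value; only the missing marks are filled automatically here.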