The people who work in Data Science and are busy finding the answers for different questions every day comes across the Data Science Methodology. Data Science Methodology indicates the routine for finding solutions to a specific problem. This is a cyclic process that undergoes a critic behaviour guiding business analysts and data scientists to act accordingly.
- Business Understanding:
Before solving any problem in the Business domain it needs to be understood properly. Business understanding forms a concrete base, which further leads to easy resolution of queries. We should have the clarity of what is the exact problem we are going to solve.
- Analytic Understanding:
Based on the above business understanding one should decide the analytical approach to follow. The approaches can be of 4 types: Descriptive approach (current status and information provided), Diagnostic approach(a.k.a statistical analysis, what is happening and why it is happening), Predictive approach(it forecasts on the trends or future events probability) and Prescriptive approach( how the problem should be solved actually).
- Data Requirements:
The above chosen analytical method indicates the necessary data content, formats and sources to be gathered. During the process of data requirements, one should find the answers for questions like ‘what’, ‘where’, ‘when’, ‘why’, ‘how’ & ‘who’.
- Data Collection:
Data collected can be obtained in any random format. So, according to the approach chosen and the output to be obtained, the data collected should be validated. Thus, if required one can gather more data or discard the irrelevant data.
- Data Understanding:
Data understanding answers the question “Is the data collected representative of the problem to be solved?”. Descriptive statistics calculates the measures applied over data to access the content and quality of matter. This step may lead to reverting the back to the previous step for correction.
- Data Preparation:
Let’s understand this by connecting this concept with two analogies. One is to wash freshly picked vegetables and second is only taking the wanted items to eat in the plate during the buffet. Washing of vegetables indicates the removal of dirt i.e. unwanted materials from the data. Here noise removal is done. Taking only eatable items in the plate is, if we don’t need specific data then we should not consider it for further process. This whole process includes transformation, normalization etc.
Modelling decides whether the data prepared for processing is appropriate or requires more finishing and seasoning. This phase focusses on the building of predictive/descriptive models.
Model evaluation is done during model development. It checks for the quality of the model to be assessed and also if it meets the business requirements. It undergoes diagnostic measure phase (the model works as intended and where are modifications required) and statistical significance testing phase (ensures about proper data handling and interpretation).
As the model is effectively evaluated it is made ready for deployment in the business market. Deployment phase checks how much the model can withstand in the external environment and perform superiorly as compared to others.
Feedback is the necessary purpose which helps in refining the model and accessing its performance and impact. Steps involved in feedback define the review process, track the record, measure effectiveness and review with refining.
After successful abatement of these 10 steps, the model should not be left untreated, rather based on the feedbacks and deployment appropriate update should be made. As new technologies emerge, new trends should be reviewed so that the model continually provides value to solutions.
- What is Data Science?
- Overview of Data Science
- Introduction to Data Science
- Python for Data Science
- Data Science Process
- Structure of Data Science Project
- How to Get Masters in Data Science in 2020?
- Top Data Science Trends You Must Know in 2020
- Python IDEs For Data Science
- Machine Learning and Data Science
- Data Science - Solving Linear Equations
- Data Science | Solving Linear Equations
- Top 10 Data Science Skills to Learn in 2020
- Top 10 Python Libraries for Data Science in 2020
- Difference between Data Science and Machine Learning
- Effect of Google Quantum Supremacy on Data Science
- Computer Science Class XII (2016-17)
- CBSE Class 12 | Computer Science - Python Syllabus
- CBSE Class 11 | Computer Science - Python Syllabus
- A Practical approach to Simple Linear Regression using R
If you like GeeksforGeeks and would like to contribute, you can also write an article using contribute.geeksforgeeks.org or mail your article to email@example.com. See your article appearing on the GeeksforGeeks main page and help other Geeks.
Please Improve this article if you find anything incorrect by clicking on the "Improve Article" button below.