Last Updated : 05 Sep, 2020

SEMMA (Sample, Explore, Modify, Model, Assess) is a sequence of steps for building machine learning models. It is built into SAS Enterprise Miner, a product of SAS Institute Inc., one of the largest producers of commercial statistical and business intelligence software. The steps are not tied to that product, however: they can guide the development of any machine learning system. Let's look at the five steps in order to understand them better.

SEMMA model in Machine Learning

Sample: This step selects a subset of suitable size from the large dataset provided for building the model, which makes model building more efficient. In this step, we also identify the dependent variable (the outcome) and the independent variables (the factors). The selected subset should be representative of the entire dataset originally collected, meaning it should contain enough of the significant information in the full data. The sample is also partitioned into training and validation sets.
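The sampling and partitioning described above can be sketched in a few lines of Python. This is a minimal illustration using only the standard library, not the procedure SAS Enterprise Miner uses; the function name `sample_and_split`, the 20% validation fraction, and the synthetic `rows` dataset are all assumptions made for the example.

```python
import random

def sample_and_split(rows, sample_size, validation_fraction=0.2, seed=0):
    """Draw a simple random sample, then split it into training and validation sets."""
    rng = random.Random(seed)               # fixed seed keeps the sample reproducible
    sample = rng.sample(rows, sample_size)  # sampling without replacement
    n_validation = round(len(sample) * validation_fraction)
    # The sample is already in random order, so slicing gives a random split.
    validation = sample[:n_validation]
    training = sample[n_validation:]
    return training, validation

# Hypothetical dataset of (factor, outcome) pairs.
rows = [(x, 2 * x + 1) for x in range(1000)]
training, validation = sample_and_split(rows, sample_size=100)
print(len(training), len(validation))
```

A simple random sample is only representative on average. If the outcome classes are imbalanced, a stratified sample (sampling within each class separately) is usually a safer choice.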

Explore: In this phase, the data is examined to understand its gaps and the relationships among its variables. Two key activities are univariate and multivariate analysis. In univariate analysis, each variable is examined individually to understand its distribution, whereas multivariate analysis explores the relationships between variables. Data visualization is used heavily to help understand the data better. In this step, we analyze all the factors that influence the outcome.
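As a rough illustration of the two kinds of analysis, the sketch below summarizes one variable on its own (univariate) and then measures the linear relationship between two variables with the Pearson correlation coefficient (one simple form of multivariate analysis). The helper names `summarize` and `pearson`, and the `hours` and `score` values, are hypothetical and chosen only for the example.

```python
import statistics

def summarize(values):
    """Univariate analysis: describe the distribution of a single variable."""
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "stdev": statistics.stdev(values),  # sample standard deviation
        "min": min(values),
        "max": max(values),
    }

def pearson(xs, ys):
    """Pearson correlation: strength of the linear relationship between two variables."""
    mean_x, mean_y = statistics.mean(xs), statistics.mean(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

# Hypothetical data: hours studied (factor) and exam score (outcome).
hours = [1, 2, 3, 4, 5]
score = [52, 55, 61, 64, 70]

print(summarize(hours))
print(pearson(hours, score))
```

A correlation close to 1 or -1 suggests a strong linear relationship, while a value near 0 suggests little linear relationship. Plots are still worth checking, since a single number can hide non-linear patterns.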

Modify: In this phase, variables are cleaned where required. New derived features are created by applying business logic to existing features, as the requirements dictate. Variables are transformed if necessary. The outcome of this phase is a clean dataset that can be passed to the machine learning algorithm to build the model. In this step, we also check whether the data has been fully transformed. If categorical data still needs to be transformed, we can use a label encoder or a label binarizer.
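To show what the two encodings mentioned above do, here is a small plain-Python sketch. It presumably matches the idea behind scikit-learn's `LabelEncoder` and `LabelBinarizer` (in `sklearn.preprocessing`), which the text seems to refer to, but the functions `label_encode` and `one_hot_encode` below are hypothetical stand-ins written for illustration, not the library's implementation.

```python
def label_encode(values):
    """Map each distinct category to an integer code (categories sorted alphabetically)."""
    classes = sorted(set(values))
    index = {category: code for code, category in enumerate(classes)}
    return [index[v] for v in values], classes

def one_hot_encode(values):
    """Map each value to a 0/1 vector with a single 1 in its category's position."""
    classes = sorted(set(values))
    return [[1 if v == category else 0 for category in classes] for v in values], classes

# Hypothetical categorical feature.
colors = ["red", "green", "blue", "green"]

codes, classes = label_encode(colors)
print(classes)  # ['blue', 'green', 'red']
print(codes)    # [2, 1, 0, 1]

vectors, _ = one_hot_encode(colors)
print(vectors)  # [[0, 0, 1], [0, 1, 0], [1, 0, 0], [0, 1, 0]]
```

Integer codes imply an ordering between categories, which suits target labels and tree-based models. The 0/1 vectors avoid that implied ordering and are usually the safer choice for input features in linear models.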

Model: In this phase, various modelling or data mining techniques are applied to the pre-processed data, and their performance is benchmarked against the desired outcome. In this step, we carry out the mathematical modelling that makes our predicted outcome as precise and accurate as possible.
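Benchmarking several techniques can be sketched as fitting each candidate on the training data and comparing an error measure on the validation data. The example below compares a mean-value baseline with a least-squares straight line. The two candidates, the mean squared error criterion, and the small dataset are assumptions chosen to keep the sketch short; a real project would typically compare more capable models.

```python
def fit_mean_baseline(xs, ys):
    """Baseline model: always predict the mean of the training outcomes."""
    mean_y = sum(ys) / len(ys)
    return lambda x: mean_y

def fit_linear(xs, ys):
    """Simple linear regression fitted by ordinary least squares."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    return lambda x: intercept + slope * x

def mean_squared_error(model, xs, ys):
    """Average squared difference between predictions and actual outcomes."""
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

# Hypothetical pre-processed data, already split into training and validation sets.
train_x, train_y = [1, 2, 3, 4, 5, 6], [3.1, 4.9, 7.2, 9.0, 10.8, 13.1]
valid_x, valid_y = [7, 8], [15.0, 17.1]

candidates = {"mean baseline": fit_mean_baseline, "linear regression": fit_linear}
scores = {name: mean_squared_error(fit(train_x, train_y), valid_x, valid_y)
          for name, fit in candidates.items()}
best = min(scores, key=scores.get)  # candidate with the lowest validation error
print(scores)
print(best)
```

Including a trivial baseline is a useful habit: a model that cannot beat it on validation data is not worth carrying forward.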

Assess: This is the last phase. Here, model performance is evaluated on test data (data not used in model training) to ensure reliability and business usefulness. We evaluate and interpret the results, compare the model's predictions with the actual outcomes, analyze the model's limitations, and try to overcome them.
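Comparing predictions with actual outcomes on held-out test data might look like the sketch below, which computes accuracy, precision, and recall for a binary classifier. The function name `assess`, the choice of metrics, and the label lists are assumptions for illustration; the right metrics depend on the business problem.

```python
def assess(actual, predicted, positive=1):
    """Compare predicted labels with actual labels on held-out test data."""
    pairs = list(zip(actual, predicted))
    tp = sum(a == positive and p == positive for a, p in pairs)  # true positives
    fp = sum(a != positive and p == positive for a, p in pairs)  # false positives
    fn = sum(a == positive and p != positive for a, p in pairs)  # false negatives
    tn = sum(a != positive and p != positive for a, p in pairs)  # true negatives
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }

# Hypothetical test-set labels and model predictions.
actual = [1, 0, 1, 1, 0, 0, 1, 0]
predicted = [1, 0, 0, 1, 0, 1, 1, 0]

metrics = assess(actual, predicted)
print(metrics)  # {'accuracy': 0.75, 'precision': 0.75, 'recall': 0.75}
```

Looking at precision and recall alongside accuracy helps expose a model's limitations, for example a model that scores well overall but misses most of the positive cases.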
