Machine Learning with Microsoft Azure ML Studio Without Code
Are you a complete beginner to the broad spectrum of Machine Learning? Are you torn between R, Python, GNU Octave, and all the other computer programming languages and frameworks available for Machine Learning? Do you just not ‘get’ coding?
Don’t worry, you are in the right spot! Machine Learning can be a tough nut to crack, especially if you have no prior programming experience. For instance, ML practitioners who use Python need to know basic data types, function definitions and calls, popular libraries like NumPy and Pandas, and the fundamentals of cleaning and visualizing data, to name just a few of the many prerequisites. Machine Learning without programming may sound like a far-fetched dream. However, Microsoft Azure’s ML Studio turns this widely coveted wish into reality. This article introduces ML newcomers to Azure ML Studio and provides a short tutorial on building, training, and testing a basic ML model with it.
Microsoft Azure’s ML Studio is a graphical, drag-and-drop environment for building, training, and deploying machine learning models at scale. It is a no-code interface in which you assemble a machine learning pipeline visually, by connecting smaller modules into a workflow. ML Studio covers the entire process from preprocessing to validation and visualization, and it runs your experiments on infrastructure managed by Azure, so you do not have to set up your own. Because it drastically reduces the complexity of ML workflows, ML Studio is an excellent starting point for ML beginners.
We shall now build, train, and test a simple machine learning model on Azure ML Studio that predicts the approximate price of an automobile from features such as its make, engine, and body style. Since price is a continuous-valued output, we shall use a linear regression model. But first, let us go over the basics of regression.
REGRESSION AND LINEAR REGRESSION: AN EASY APPROACH
Regression is a statistical method for estimating the relationship between a dependent variable (Y) and one or more independent variables (X). Linear regression is a form of regression analysis in which the dependent variable is modeled as a linear function of the independent variables. Linear regression models are used when we need to predict a continuous, non-discrete numeric quantity such as price or age. For our purposes, linear regression can be divided into two broad categories; other variants are beyond the scope of this article.
- Simple Linear Regression – Suppose you have to predict a person’s body weight based on only ONE criterion: their height. In this case, you are predicting the value of the label (dependent variable) from only one feature (independent variable), so the model is a simple linear regression model: one predictor variable (height) is used to predict one outcome (body weight). The equation can be written in slope-intercept form:
  $Y = mX + C$
  where:
  - Y = dependent variable,
  - X = independent variable,
  - m = slope/gradient of the line, and
  - C = the y-intercept.
- Multiple Linear Regression – In this tutorial, we shall predict automobile prices from numerous factors such as make, body style, and number of cylinders. Because multiple predictors are used to determine one output (the price), this is an example of a multiple linear regression model: there is more than one predictor (independent variable) but only one outcome (dependent variable). The equation expands to:
  $Y = m x_1 + n x_2 + \dots + z x_n + C$
  where:
  - Y = dependent variable,
  - m, n, …, z = coefficients,
  - x1, x2, …, xn = independent variables, and
  - C = the intercept.
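ML Studio fits these equations for you, but readers who want to see what happens behind the ‘Linear Regression’ module may find a rough sketch helpful. The pure-Python snippet below implements ordinary least squares: it finds the coefficients and the intercept C that minimize the sum of squared prediction errors by solving the normal equations. It is an illustration only, not Azure’s implementation. The function names and all numbers (including the height and weight values) are hypothetical. Simple linear regression is simply the one-feature case of the same routine.

```python
# Ordinary least squares for Y = m*x1 + n*x2 + ... + C, solved through the
# normal equations (A^T A) w = A^T y with Gaussian elimination (no NumPy needed).

def solve(M, b):
    """Solve the square system M w = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [b[i]] for i, row in enumerate(M)]  # augmented copy
    for col in range(n):
        # Swap in the row with the largest pivot to limit rounding error.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("singular system: features are linearly dependent")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        # Eliminate this column from every row below the pivot.
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    # Back-substitution from the last row upwards.
    w = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(aug[i][j] * w[j] for j in range(i + 1, n))
        w[i] = (aug[i][n] - s) / aug[i][i]
    return w


def fit_linear_regression(X, y):
    """Return (coefficients, intercept) that minimise the sum of squared errors.

    X is a list of rows, each a list of feature values; y is the list of targets.
    """
    # Prepend a constant 1 to each row so the intercept C is learned as w[0].
    A = [[1.0] + [float(v) for v in row] for row in X]
    k = len(A[0])
    AtA = [[sum(r[i] * r[j] for r in A) for j in range(k)] for i in range(k)]
    Aty = [sum(r[i] * t for r, t in zip(A, y)) for i in range(k)]
    w = solve(AtA, Aty)
    return w[1:], w[0]


# Simple linear regression: hypothetical heights (cm) -> body weights (kg).
m, C = fit_linear_regression([[150], [160], [170], [180]], [55, 60, 65, 70])
print("slope m:", m[0], "intercept C:", C)

# Multiple linear regression: two hypothetical features per row.
coefs, C2 = fit_linear_regression([[1, 2], [2, 1], [3, 5], [4, 3], [0, 1]], [9, 8, 22, 18, 4])
print("coefficients:", coefs, "intercept:", C2)
```

With one feature, the call returns the slope m and intercept C of $Y = mX + C$. With several features, it returns one coefficient per feature plus C. Solving the normal equations directly works well for a handful of features but can become numerically fragile when features are nearly collinear.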
Now that the basic concepts of linear regression are clear, let us get on with the tutorial on Azure ML Studio.
PREDICT AUTOMOBILE PRICES ON AZURE ML STUDIO
Creating a New Project:
Step 1: Go to the Azure ML Studio website and sign up for a free account if you do not have a subscription. The free plan is well suited to beginners who are experimenting with the platform.
Step 2: After logging in, open the ‘Projects’ tab in the pane on the left. It lists all the projects you have created so far; if you are a new user, it will be empty. Hit the ‘New’ button in the bottom-left corner to create a new project.
Step 3: Select ‘Empty Project’ from the pane that appears. A dialog box opens, asking for a name and a description for your project. Give your project an appropriate name, such as ‘Automobile Price Prediction’, and optionally add a description. Hit the ‘tick’ button in the bottom-right corner of the dialog box.
Step 4: Click the ‘Add Assets’ link. In the ‘Change Project Configuration’ window that opens, move ‘datasets’ and ‘experiments’ from the ‘All Assets’ pane to the ‘Project Assets’ pane by clicking the right arrowhead. These two assets are all this project needs. Hit the ‘tick’ button in the bottom-right corner of the page.
Your new project is ready! Note that creating a project is optional: you can start a new experiment by following the steps below without creating a project at all. Projects simply make it a little easier to keep related experiments organized.
Creating a New Experiment:
Follow the steps below to create a new experiment:
Step 1: Hit the ‘Experiments’ tab in the left pane. The ‘My experiments’ list shows all the experiments you have created; if you are a new user, it will be empty. To create a new experiment, hit the ‘New’ button in the bottom-left corner of the page.
Step 2: Now, select the ‘Blank Experiment’ option. You can also opt for various pre-defined templates provided by ML Studio.
A new blank experiment is created, and its canvas appears.
You can now look for modules in the left pane, drag them onto the canvas, and drop them wherever they are needed. To build a working pipeline, each module must be connected to the modules before and after it. To make a connection, drag from the output port of one module and release the mouse over the input port of the next.
Next, rename your experiment from its default ‘Experiment created on …’ title to a meaningful name, and add an appropriate summary and description.
Now we can begin the main process of building, training, and testing the model.
- ML Studio provides numerous sample datasets for beginners. In the ‘search experiment items’ pane on the left, hit the ‘Saved Datasets’ button, look for the Automobile price dataset under ‘Samples’, and drag it to the center of the canvas. You can also upload your own dataset.
- Under ‘search experiment items’, look for ‘Select Columns in Dataset’. This module lets you include or exclude specific columns of a dataset. Drag it and drop it below the dataset module. In the ‘Properties’ pane on the right, select ‘Launch column selector’. We want to exclude the ‘Normalized Losses’ column because it has numerous missing values, so select ‘exclude’ -> ‘column name’ -> ‘Normalized Losses’ and hit the ‘tick’ button in the bottom-right corner. The column is now excluded from the model.
Now, link the Automobile price data module to the ‘Select Columns in Dataset’ module by dragging an arrow from the output port of the first module and dropping it onto the second module.
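To make these first two modules less of a black box, here is a rough Python analogue of loading the data and excluding a column. It is only a sketch. The CSV text, the column names (for example ‘normalized-losses’), and the helper functions are hypothetical, and using "?" as the missing-value marker is an assumption made for this illustration.

```python
import csv
import io

# Hypothetical rows in the style of the automobile price data.
# Column names and values are illustrative; "?" marks a missing value here.
RAW_CSV = """make,body-style,num-of-cylinders,normalized-losses,horsepower,price
audi,sedan,four,164,102,13950
bmw,sedan,six,?,121,16430
honda,hatchback,four,?,58,5151
toyota,wagon,four,87,62,?
"""


def load_rows(text):
    """Parse CSV text into a list of dicts, turning "?" into None."""
    reader = csv.DictReader(io.StringIO(text))
    return [{k: (None if v == "?" else v) for k, v in row.items()} for row in reader]


def exclude_columns(rows, excluded):
    """Rough analogue of 'Select Columns in Dataset' in exclude mode."""
    return [{k: v for k, v in row.items() if k not in excluded} for row in rows]


rows = exclude_columns(load_rows(RAW_CSV), {"normalized-losses"})
for row in rows:
    print(row)
```

In this sketch every value is still a string. A hand-written pipeline would need to convert numeric columns such as horsepower and price before training.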
- From the ‘search experiment items’ pane, find the ‘Clean Missing Data’ module, drag it below the previous module, and link it to that module. In the ‘Properties’ pane on the right, change the cleaning mode to ‘Remove entire row’. This deletes every row that has a missing value. Alternatively, depending on your goal, you can replace missing values with a measure of central tendency such as the mean, median, or mode.
- Drag another ‘Select Columns in Dataset’ module from the left-most pane onto the canvas. This time, we shall select the columns we want to include in the model. Hit the ‘Launch column selector’ button in the ‘Properties’ pane and choose the columns you want.
Begin with ‘No columns’, add the columns you want, and hit the ‘tick’ button in the bottom-right corner. Link this module to the first output port of the previous (‘Clean Missing Data’) module.
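The cleaning and column-selection steps can be sketched in the same way. This hypothetical snippet drops every row that contains a missing value (the ‘Remove entire row’ mode), shows mean substitution as an alternative, and then keeps only the listed columns. The rows, column names, and function names are invented for illustration and are not ML Studio’s internals.

```python
from statistics import mean

# Hypothetical rows after 'Normalized Losses' has been excluded; None = missing.
rows = [
    {"make": "audi", "horsepower": 102.0, "price": 13950.0},
    {"make": "bmw", "horsepower": 121.0, "price": 16430.0},
    {"make": "toyota", "horsepower": None, "price": 7898.0},
    {"make": "honda", "horsepower": 58.0, "price": None},
]


def remove_rows_with_missing(rows):
    """'Remove entire row' mode: drop any row that contains a None."""
    return [row for row in rows if all(v is not None for v in row.values())]


def fill_missing_with_mean(rows, column):
    """Alternative: substitute the column mean for missing numeric values."""
    avg = mean(r[column] for r in rows if r[column] is not None)
    return [{**r, column: avg if r[column] is None else r[column]} for r in rows]


def select_columns(rows, included):
    """'Select Columns in Dataset' in include mode: keep only the listed columns."""
    return [{k: row[k] for k in included} for row in rows]


clean = remove_rows_with_missing(rows)
model_input = select_columns(clean, ["horsepower", "price"])
print(model_input)
```

The order of the steps matters. Excluding the mostly empty column first means that rows are not discarded simply because that one column is missing.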
- Now we need to split the dataset into a training part and a testing part. Drag the ‘Split Data’ module from the ‘search experiment items’ pane onto the canvas. In the ‘Properties’ pane, set ‘Fraction of rows in the first output dataset’ to 0.7 and check the ‘Randomized split’ box. This randomly assigns 70% of the rows to training and the remaining 30% to testing. Link it to the previous module.
- We can now select the machine learning algorithm for training. In the ‘search experiment items’ pane, look for ‘Linear Regression’ and drop it on the canvas. Also drop the ‘Train Model’ module onto the canvas. Link the ‘Linear Regression’ module to the first input port of ‘Train Model’, and the first output port of ‘Split Data’ (the training data) to the second input port of ‘Train Model’.
- With ‘Train Model’ selected, hit the ‘Launch column selector’ button in the right-most pane and choose the column you want to predict (here, ‘price’). Hit the ‘tick’ button in the bottom-right corner of the page.
- Drop the ‘Score Model’ module from the ‘search experiment items’ pane onto the canvas. Connect the output of ‘Train Model’ to the first input port of ‘Score Model’, and the second output port of ‘Split Data’ (the test data) to the second input port of ‘Score Model’. ML Studio suggests which port each link should connect to.
- Finally, drop the ‘Evaluate Model’ module from the ‘search experiment items’ pane onto the canvas and link ‘Score Model’ to it.
- Your setup is complete! Just hit ‘Run’ on the bar at the bottom of the page.
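Readers who want to see what the Split Data, Train Model, and Score Model modules do conceptually can look at the hypothetical sketch below. To keep it short, it trains on a single feature (horsepower) using the closed-form least-squares formulas for m and C. ML Studio’s Linear Regression module handles many features and has its own solver options. The data are synthetic and generated from an exact straight line, which makes the fitted values easy to check. The function names and the seed are illustrative assumptions.

```python
import random


def split_data(rows, fraction=0.7, seed=42):
    """Randomized split: about `fraction` of the rows go to the first (training) output."""
    shuffled = rows[:]  # copy so the caller's list keeps its order
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * fraction)
    return shuffled[:cut], shuffled[cut:]


def train_model(rows, feature, label):
    """Fit label = m * feature + C by least squares (one feature only)."""
    xs = [r[feature] for r in rows]
    ys = [r[label] for r in rows]
    mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
    m = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
         / sum((x - mean_x) ** 2 for x in xs))
    return m, mean_y - m * mean_x


def score_model(model, rows, feature):
    """Return copies of the test rows with a 'Scored Labels' prediction added."""
    m, c = model
    return [{**r, "Scored Labels": m * r[feature] + c} for r in rows]


# Synthetic data lying exactly on price = 150 * horsepower + 500.
rows = [{"horsepower": hp, "price": 150 * hp + 500} for hp in range(60, 160, 10)]
train, test = split_data(rows, fraction=0.7)
model = train_model(train, "horsepower", "price")
for row in score_model(model, test, "horsepower"):
    print(row["price"], round(row["Scored Labels"], 2))
```

The fixed seed makes the split reproducible between runs, which is convenient for comparing models on the same test rows.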
Wait for the model to finish training and scoring. Once the run is complete, click the output port of the ‘Score Model’ module and select the ‘Visualize’ option.
The ‘Scored Labels’ column contains the prices predicted by the model from the features we selected. You can compare these predictions with the actual prices to judge how accurate the model is.
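Comparing ‘Scored Labels’ with the actual prices row by row is useful, but summary metrics make the comparison easier. The sketch below computes five standard regression metrics: mean absolute error, root mean squared error, relative absolute error, relative squared error, and the coefficient of determination (R²). As far as we understand, these correspond to what the ‘Evaluate Model’ module reports for regression models. The function name and the example numbers are hypothetical.

```python
from math import sqrt


def evaluate(actual, predicted):
    """Return common regression metrics for paired actual/predicted values."""
    n = len(actual)
    errors = [p - a for a, p in zip(actual, predicted)]
    mean_actual = sum(actual) / n
    abs_error = sum(abs(e) for e in errors)
    sq_error = sum(e * e for e in errors)
    # Baselines: how far the actual values sit from their own mean.
    abs_spread = sum(abs(a - mean_actual) for a in actual)
    sq_spread = sum((a - mean_actual) ** 2 for a in actual)
    return {
        "Mean Absolute Error": abs_error / n,
        "Root Mean Squared Error": sqrt(sq_error / n),
        "Relative Absolute Error": abs_error / abs_spread,
        "Relative Squared Error": sq_error / sq_spread,
        "Coefficient of Determination": 1 - sq_error / sq_spread,
    }


# Hypothetical actual prices and model predictions ("Scored Labels").
actual = [10000, 20000, 30000, 40000]
scored = [12000, 18000, 33000, 37000]
for name, value in evaluate(actual, scored).items():
    print(f"{name}: {value:.3f}")
```

Lower error values and an R² closer to 1 indicate a better fit. In this sketch, R² is computed as 1 minus the relative squared error.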
Adding the Experiment to the Project:
Now that your experiment is complete, you can add it to the project you created earlier.
- Hit the ‘Project’ tab in the right-most pane and select ‘Add to Project’.
- Select the ‘Automobile Price Prediction’ project that we created earlier and hit the ‘tick’ button.
The experiment is now added to the project.