# Placement prediction using Logistic Regression

• Difficulty Level : Medium
• Last Updated : 08 Sep, 2021

In this article, we discuss how to predict the placement status of a student from various student attributes using the logistic regression algorithm.

Placements hold great importance for students and educational institutions. A placement helps a student build a strong foundation for the professional career ahead, and a good placement record gives a college or university a competitive edge in the education market.

This study focuses on a system that predicts whether a student will be placed, based on the student's qualifications, historical data, and experience. The predictor uses a machine-learning algorithm to produce the result.

The algorithm used is logistic regression, a supervised classification algorithm. In a classification problem, the target variable (or output), y, can take only discrete values for a given set of features (or inputs), X.

The dataset contains the secondary school percentage, higher secondary school percentage, degree percentage, degree, and work experience of students. After the model makes its predictions, its accuracy is also measured on the dataset. The dataset used here is in .csv format.
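To make the idea of a discrete output concrete, here is a minimal sketch of what logistic regression computes for one sample: a linear score passed through the logistic (sigmoid) function and thresholded at 0.5. The weights, bias, and feature values are made-up illustrative numbers, not values learned from the placement dataset.

```python
import numpy as np

def sigmoid(z):
    """Map a real-valued score to a probability in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical weights and bias for two features (illustrative values only).
weights = np.array([0.8, -0.5])
bias = 0.1

# One hypothetical feature vector.
x = np.array([1.0, 2.0])

score = np.dot(weights, x) + bias          # linear combination of the features
probability = sigmoid(score)               # estimated probability of class 1
predicted_class = int(probability >= 0.5)  # discrete label: 0 or 1
```

Training amounts to choosing the weights and bias so that these probabilities match the observed labels as closely as possible.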

### Below is the step-by-step Approach:

Step 1: Import the required modules.

```python
# import modules
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
```

Step 2: Read the dataset that we are going to use for the analysis and inspect it.

```python
# reading the file
dataset = pd.read_csv('Placement_Data_Full_Class.csv')
dataset
```


Step 3: Now we will drop the columns that are not needed.

```python
# dropping the serial no and salary col
dataset = dataset.drop('sl_no', axis=1)
dataset = dataset.drop('salary', axis=1)
```

Step 4: Before moving forward, we need to pre-process and transform the data. For that, we use the astype() method on the text columns to change their datatype to category.

```python
# categorising columns for further labelling
dataset["gender"] = dataset["gender"].astype('category')
dataset["ssc_b"] = dataset["ssc_b"].astype('category')
dataset["hsc_b"] = dataset["hsc_b"].astype('category')
dataset["degree_t"] = dataset["degree_t"].astype('category')
dataset["workex"] = dataset["workex"].astype('category')
dataset["specialisation"] = dataset["specialisation"].astype('category')
dataset["status"] = dataset["status"].astype('category')
dataset["hsc_s"] = dataset["hsc_s"].astype('category')
dataset.dtypes
```


Step 5: Now we use the cat.codes accessor on these columns to convert their text values to numerical values.

```python
# labelling the columns
dataset["gender"] = dataset["gender"].cat.codes
dataset["ssc_b"] = dataset["ssc_b"].cat.codes
dataset["hsc_b"] = dataset["hsc_b"].cat.codes
dataset["degree_t"] = dataset["degree_t"].cat.codes
dataset["workex"] = dataset["workex"].cat.codes
dataset["specialisation"] = dataset["specialisation"].cat.codes
dataset["status"] = dataset["status"].cat.codes
dataset["hsc_s"] = dataset["hsc_s"].cat.codes

# display dataset
dataset
```

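Because Step 5 overwrites the text values, it can help to see which integer each category receives. The sketch below is illustrative only: `toy` is a hypothetical one-column frame rather than the placement dataset. In practice the mapping would have to be captured before the column is replaced by its codes.

```python
import pandas as pd

# Hypothetical toy column mimicking a yes/no work-experience field.
toy = pd.DataFrame({"workex": ["Yes", "No", "No", "Yes"]})
toy["workex"] = toy["workex"].astype("category")

# Categories are stored in sorted order, so "No" gets code 0 and "Yes" gets code 1.
mapping = dict(enumerate(toy["workex"].cat.categories))

# The integer codes that would replace the text values.
codes = toy["workex"].cat.codes.tolist()

print(mapping)
print(codes)
```

Keeping such a mapping around makes it possible to translate a predicted code back into its original label later.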

Step 6: Now split the dataset into features and labels using the iloc indexer. The label, status, is the last column, so every column before it is a feature:

```python
# selecting the features and labels
X = dataset.iloc[:, :-1].values
Y = dataset.iloc[:, -1].values

# display dependent variables
Y
```

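The positional slicing in Step 6 relies on the label being the last column. A small sketch on a hypothetical three-column frame (the values are invented for illustration) shows what the two slices return, along with a name-based alternative that does not depend on column order.

```python
import pandas as pd

# Hypothetical toy frame; the last column plays the role of the label.
toy = pd.DataFrame({
    "ssc_p": [67.0, 79.3, 65.0],
    "workex": [0, 1, 0],
    "status": [1, 1, 0],
})

X_toy = toy.iloc[:, :-1].values   # every column except the last
y_toy = toy.iloc[:, -1].values    # only the last column

# Name-based alternative: same features, independent of column position.
X_named = toy.drop(columns="status").values

print(X_toy.shape)
print(y_toy)
```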

Step 7: Now we will split the dataset into training and test data. The test data will be used to check the accuracy of the model later.

```python
# dividing the data into train and test
from sklearn.model_selection import train_test_split

X_train, X_test, Y_train, Y_test = train_test_split(X, Y,
                                                    test_size=0.2)

# display dataset
dataset.head()
```

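The split in Step 7 is random, so the reported accuracy can change from run to run. As a hedged variation, `train_test_split` accepts `random_state` for reproducibility and `stratify` to keep the class ratio the same in both parts. The arrays below are synthetic stand-ins, not the placement data.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in data: 20 samples, 2 features, imbalanced labels (5 vs 15).
X_demo = np.arange(40).reshape(20, 2)
y_demo = np.array([0] * 5 + [1] * 15)

# random_state fixes the shuffle; stratify preserves the 1:3 class ratio.
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42, stratify=y_demo
)

print(len(y_tr), len(y_te))
print(y_te.sum(), len(y_te) - y_te.sum())
```

Whether stratification matters depends on how imbalanced the labels are. It is shown here as an option, not as part of the original workflow.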

Step 8: Now we need to train our model. For this, we import the LogisticRegression class from the sklearn module, create a classifier, and fit it on the training data. Then we check the accuracy of the model on the test data.

```python
# creating a classifier using sklearn
from sklearn.linear_model import LogisticRegression

clf = LogisticRegression(random_state=0, solver='lbfgs',
                         max_iter=1000).fit(X_train, Y_train)

# printing the accuracy
clf.score(X_test, Y_test)
```


Step 9: Once we have trained the model, we can check it by passing in some arbitrary feature values, one per feature column and in the same order as the columns of X:

```python
# predicting for random value
clf.predict([[0, 87, 0, 95, 0, 2, 78, 2, 0, 0, 1, 0]])
```

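`predict` returns only the class label. If the confidence behind that label is of interest, scikit-learn's `predict_proba` returns the estimated probability of each class. The sketch below uses a tiny synthetic one-feature dataset and a separate classifier named `demo_clf`, both assumptions for illustration rather than the model trained above.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Synthetic one-feature data: low scores belong to class 0, high scores to class 1.
X_demo = np.array([[35.0], [40.0], [45.0], [50.0], [70.0], [75.0], [80.0], [85.0]])
y_demo = np.array([0, 0, 0, 0, 1, 1, 1, 1])

demo_clf = LogisticRegression(max_iter=1000).fit(X_demo, y_demo)

sample = np.array([[82.0]])
label = demo_clf.predict(sample)[0]        # predicted class label
proba = demo_clf.predict_proba(sample)[0]  # one probability per class, ordered as demo_clf.classes_

print(label)
print(proba)
```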

Step 10: To gain a more nuanced understanding of our model's performance, we need to make a confusion matrix. For a binary classifier such as this one, a confusion matrix is a table with two rows and two columns that reports the number of false positives, false negatives, true positives, and true negatives.

The confusion_matrix function takes two arguments: the actual labels of the test set, Y_test, and the predicted labels. The predicted labels of the classifier are stored in Y_pred as follows:

```python
# creating a Y_pred for test data
Y_pred = clf.predict(X_test)

# display predicted values
Y_pred
```


Step 11: Finally, now that we have Y_pred, we can generate the confusion matrix and the accuracy score:

```python
# evaluation of the classifier
from sklearn.metrics import confusion_matrix, accuracy_score

# display confusion matrix
print(confusion_matrix(Y_test, Y_pred))

# display accuracy
print(accuracy_score(Y_test, Y_pred))
```

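To read the four cells of a binary confusion matrix, it can be unpacked with `ravel()`, which for labels 0 and 1 yields the counts in the order true negatives, false positives, false negatives, true positives. The sketch below uses short hand-written label lists, not the model's real predictions, and derives precision and recall from those counts.

```python
from sklearn.metrics import confusion_matrix

# Hand-written illustrative labels (not output of the placement model).
y_true = [1, 0, 1, 1, 0, 1, 0, 1]
y_hat = [1, 0, 0, 1, 1, 1, 0, 1]

# Rows are actual classes, columns are predicted classes.
tn, fp, fn, tp = confusion_matrix(y_true, y_hat).ravel()

accuracy = (tp + tn) / (tp + tn + fp + fn)
precision = tp / (tp + fp)   # share of predicted positives that are correct
recall = tp / (tp + fn)      # share of actual positives that are found

print(tn, fp, fn, tp)
print(accuracy, precision, recall)
```

With these labels the counts are 2 true negatives, 1 false positive, 1 false negative, and 4 true positives. That gives an accuracy of 0.75 and a precision and recall of 0.8 each.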
