
# MakeMyTrip Interview for Data Engineer

• Difficulty Level : Easy
• Last Updated : 18 Dec, 2019

Q.1. What is regression ? What is classification ?
Ans. Regression : The target variable is continuous (for example, a price).
Classification : The target variable is discrete (for example, a class label).

Q.2. What error metrics are used for each of them ?
Ans. Regression : SSE (sum of squared errors)
Classification : The confusion matrix and the metrics derived from it, such as precision and recall
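
As a minimal sketch of the classification side, the confusion matrix for a binary classifier can be tallied in plain Python. The function name `confusion_counts` and the sample labels are made up for illustration; they are not from the interview.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (TP, FP, FN, TN) for a binary classifier."""
    tp = fp = fn = tn = 0
    for actual, predicted in zip(y_true, y_pred):
        if predicted == positive and actual == positive:
            tp += 1  # correctly predicted positive
        elif predicted == positive:
            fp += 1  # predicted positive, actually negative
        elif actual == positive:
            fn += 1  # predicted negative, actually positive
        else:
            tn += 1  # correctly predicted negative
    return tp, fp, fn, tn


# Illustrative labels only.
y_true = [1, 0, 1, 1, 0]
y_pred = [1, 1, 0, 1, 0]

tp, fp, fn, tn = confusion_counts(y_true, y_pred)
precision = tp / (tp + fp)  # share of predicted positives that are correct
recall = tp / (tp + fn)     # share of actual positives that are found
```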

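Q.3. Why does accuracy not help when there is a class imbalance ?
Ans. Because accuracy is dominated by the majority class : a model that always predicts the majority class scores a high accuracy while never identifying the minority class.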
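
A small worked example makes this concrete. The data below is hypothetical: 95 negatives, 5 positives, and a "model" that always predicts the majority class.

```python
# Hypothetical imbalanced data: 95 negatives (0) and 5 positives (1).
y_true = [0] * 95 + [1] * 5
# A "model" that always predicts the majority class.
y_pred = [0] * 100

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
true_positives = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
recall = true_positives / sum(t == 1 for t in y_true)

print(accuracy)  # 0.95
print(recall)    # 0.0
```

The accuracy looks good even though no positive case is ever detected.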

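Q.4. How to handle class imbalance ?
Ans. 1. Use a performance metric such as area under the ROC curve (AUC) instead of accuracy
2. Use penalized (cost-sensitive) algorithms that make errors on the minority class more costly
3. Use tree-based algorithms such as random forests (RF) and gradient boosted trees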
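
One common way to penalize an algorithm is to weight each class inversely to its frequency. The sketch below uses the heuristic `n_samples / (n_classes * class_count)`, which is also what scikit-learn's `class_weight="balanced"` option computes. The function name and the 90/10 split are assumptions for illustration.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class inversely to how often it appears."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}


# Hypothetical labels: 90 majority-class and 10 minority-class samples.
labels = [0] * 90 + [1] * 10
weights = balanced_class_weights(labels)
# The minority class gets the larger weight, so its errors cost more.
```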

Q.5. What is kNN (k-nearest neighbours) ?
Q.6. What is k-means ?
Explanation here : https://www.quora.com/How-is-the-k-nearest-neighbor-algorithm-different-from-k-means-clustering
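
The key difference is that kNN is supervised (it needs labelled points) while k-means is unsupervised (it groups unlabelled points). The following is a rough plain-Python sketch of both, not a production implementation. The function names, sample points, and starting centroids are all illustrative assumptions.

```python
from collections import Counter


def squared_distance(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))


def knn_predict(train_points, train_labels, query, k=3):
    """Supervised: label a query by majority vote of its k nearest labelled points."""
    nearest = sorted(
        (squared_distance(p, query), label)
        for p, label in zip(train_points, train_labels)
    )[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def kmeans(points, centroids, iterations=10):
    """Unsupervised: repeatedly assign points to the nearest centroid, then re-centre."""
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            idx = min(range(len(centroids)),
                      key=lambda i: squared_distance(p, centroids[i]))
            clusters[idx].append(p)
        # New centroid = mean of the assigned points (kept as-is if a cluster is empty).
        centroids = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster)) if cluster else old
            for cluster, old in zip(clusters, centroids)
        ]
    return centroids, clusters


points = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0), (8.0, 8.0), (9.0, 8.0), (8.0, 9.0)]
labels = ["a", "a", "a", "b", "b", "b"]

predicted = knn_predict(points, labels, (2.0, 2.0))            # uses the labels
centroids, clusters = kmeans(points, [(0.0, 0.0), (10.0, 10.0)])  # ignores the labels
```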

Q.7. y = ax + b is a linear model. Can you tell me if y = ax^2 + bx + c is also a linear model ?
Ans. Yes. y = ax^2 + bx + c is linear in the parameters a, b and c, because x^2 can be treated as a separate feature X.
So, the actual relationship between y and x is not linear, but the fitted model is linear.
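
To illustrate, here is a sketch that fits the quadratic with ordinary linear least squares by passing x^2 in as its own column. It assumes NumPy is available, and the data is synthetic and noiseless, generated from a = 2, b = 3, c = 1.

```python
import numpy as np

# Synthetic, noiseless data from y = 2x^2 + 3x + 1 (values chosen for illustration).
x = np.array([-2.0, -1.0, 0.0, 1.0, 2.0, 3.0])
y = 2.0 * x**2 + 3.0 * x + 1.0

# Design matrix with columns x^2, x and 1: the model is linear in a, b, c.
X = np.column_stack([x**2, x, np.ones_like(x)])

# Ordinary least squares, exactly as for a straight-line fit.
coeffs, *_ = np.linalg.lstsq(X, y, rcond=None)
a, b, c = coeffs
```

The solver never needs to know that the first column is a square of the second, which is why the model counts as linear.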

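Q.8. What are SSE and RMSE ? Why use RMSE and not SSE ?
Ans. SSE is the total of the squared errors, so it grows with the number of observations. RMSE is the square root of the mean squared error, so it is in the same units as the target and can be compared across datasets of different sizes.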
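
A short sketch with made-up numbers shows the difference: duplicating the dataset doubles the SSE but leaves the RMSE unchanged.

```python
import math


def sse(y_true, y_pred):
    """Sum of squared errors: a total, so it grows with the number of points."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred))


def rmse(y_true, y_pred):
    """Root mean squared error: an average, in the same units as the target."""
    return math.sqrt(sse(y_true, y_pred) / len(y_true))


# Illustrative values; every prediction is off by exactly 2.
y_true = [3.0, 5.0, 2.0, 7.0]
y_pred = [1.0, 7.0, 4.0, 5.0]

print(sse(y_true, y_pred))           # 16.0
print(rmse(y_true, y_pred))          # 2.0

# Same errors, twice as many rows.
print(sse(y_true * 2, y_pred * 2))   # 32.0
print(rmse(y_true * 2, y_pred * 2))  # 2.0
```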

Q.9. Does a low RMSE denote overfitting ?

Q.10. How to resolve overfitting ?
Ans. 1. Cross-validation
2. Regularization
3. Ensembling
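
For the cross-validation point, a rough sketch of how k-fold splitting works is shown below. The helper `k_fold_indices` is a hypothetical name, and it does not shuffle, which a real workflow usually would.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, validation_indices) for each of k folds, without shuffling."""
    indices = list(range(n_samples))
    # Spread any remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        validation = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, validation
        start += size


folds = list(k_fold_indices(10, 3))
# Every sample is used for validation exactly once; a model that only
# memorises its training fold will score poorly on the held-out fold.
```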

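Q.11. Why is kNN not a model ?
Ans. It is a lazy learner : it builds no model during training, only stores the training data, and defers all computation to prediction time.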

Q.12. Write code in Python for the following problems :
(a)
bookings table :
id, date, platform
1, 12/3, android
2, 12/3, ios
3, 13/3, android
4, 13/3, ios
5, 13/3, android
6, 14/3, ios
7, 14/3, android
For each date, how many bookings are from android and how many from ios ?
df1.groupby(['date', 'platform']).count()
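
A fuller, runnable version of this answer might look like the sketch below. It assumes pandas is installed and that the table is loaded into a DataFrame called `df1`. It uses `size()` rather than `count()` and then `unstack()`, which is one reasonable way to get one row per date and one column per platform.

```python
import pandas as pd

# The bookings table from the question.
df1 = pd.DataFrame({
    "id": [1, 2, 3, 4, 5, 6, 7],
    "date": ["12/3", "12/3", "13/3", "13/3", "13/3", "14/3", "14/3"],
    "platform": ["android", "ios", "android", "ios", "android", "ios", "android"],
})

# size() counts rows per (date, platform) group;
# unstack() pivots the platform level into columns.
counts = df1.groupby(["date", "platform"]).size().unstack(fill_value=0)
print(counts)
```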

(b)
data = ['cat', 'bat', 'rat', 'cat', 'rat']
Give the count of each unique element of the list
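
The original write-up gives no answer here. One possible solution, using `collections.Counter` from the standard library, is sketched below.

```python
from collections import Counter

data = ['cat', 'bat', 'rat', 'cat', 'rat']

# Counter tallies how many times each distinct element appears.
counts = Counter(data)

for element, count in counts.items():
    print(element, count)
```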