Q.1. What is a Decision Tree? How does it split? How does a decision tree work?
Q.2. What does each node contain in a Decision Tree?
Q.3. What are Entropy and the Gini Index, and how do they help?
Q.4. What is Random Forest? What is random in Random Forest? How do you calculate the OOB error?
Q.5. How does random forest work?
Q.6. Explain the entire process from the point you get the data until you reach the final stage of prediction.
Q.7. How does kNN work? Which distance measure should be used in kNN when the data is categorical?
Q.8. You have 10 documents. Each document has been tagged with a topic. When a new document comes in, how do you tag it with one of those topics?
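For Q.3, a small worked example can help when explaining the two impurity measures. The sketch below is not from the interview; it uses the standard definitions (entropy = -sum p*log2(p), Gini = 1 - sum p^2) on made-up "yes"/"no" labels. The function names and the sample split are assumptions chosen for illustration.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def weighted_impurity(groups, measure):
    """Impurity of the child nodes, weighted by each child's share of rows."""
    total = sum(len(g) for g in groups)
    return sum(len(g) / total * measure(g) for g in groups)

# Made-up data: a parent node with 5 "yes" and 5 "no" rows,
# and one candidate split into two child nodes.
parent = ["yes"] * 5 + ["no"] * 5
split = [["yes"] * 4 + ["no"], ["yes"] + ["no"] * 4]

# A tree prefers the split with the largest drop in impurity.
info_gain = entropy(parent) - weighted_impurity(split, entropy)
gini_drop = gini(parent) - weighted_impurity(split, gini)
print(f"information gain = {info_gain:.3f}, Gini decrease = {gini_drop:.3f}")
```

Both measures are 0 for a pure node and largest when the classes are evenly mixed, which is why a decision tree can use either one to rank candidate splits.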
Primary focus: The candidate should be good at coding and should also have sound knowledge of ML algorithms.
Face to Face :-
Coding round in R
1. Create a data frame of this form
01/01/2019 12:00 xx
01/31/2019 11:59 .
The values (xx) can be randomly generated.
2. Transpose the data frame into this form
Date Hour1 Hour2 Hour3 . . . Value
01/01/2019 12:00 13:00 14:00 . . . xx
02/01/2019 12:00 13:00 14:00 . . . xx
. . . . . .
. . . . . .
. . . . . .
31/01/2019 12:00 13:00 14:00 . . . xx
Q.1. If I want to find a relationship between Price and Sales, should I use regression or correlation?
Answer: Simple linear regression can be used to model how the dependent variable (Sales) changes with the independent variable (Price).
Assumption: no other variables affect Sales.
The correlation coefficient, i.e. the standardized covariance (-1 ≤ r ≤ 1), tells us:
1. Whether the correlation is positive or negative.
2. The strength of the linear relationship between the two variables.
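The difference between the two tools can be sketched in a few lines. This is a minimal illustration, not the interviewer's expected answer: the Price and Sales numbers are invented, and pearson_r and fit_line are hypothetical helper names. Correlation returns a unitless number in [-1, 1], while regression returns a slope in units of Sales per unit of Price.

```python
from math import sqrt

def _mean(values):
    return sum(values) / len(values)

def pearson_r(x, y):
    """Pearson correlation coefficient: covariance scaled to [-1, 1]."""
    mx, my = _mean(x), _mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def fit_line(x, y):
    """Least-squares slope and intercept for y = intercept + slope * x."""
    mx, my = _mean(x), _mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / sxx
    return slope, my - slope * mx

# Invented data for illustration only.
price = [10, 12, 14, 16, 18]
sales = [100, 92, 85, 70, 63]

r = pearson_r(price, sales)
slope, intercept = fit_line(price, sales)
print(f"r = {r:.3f}")                      # direction and strength
print(f"sales = {intercept:.1f} + ({slope:.1f}) * price")  # predictive equation
```

Correlation is enough to describe direction and strength; regression is needed when the goal is to predict Sales for a given Price.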
Q.2. If I have multiple features in my dataset, how do I know which ones to include when building my model?
Answer: Check the coefficient of determination, i.e. R squared. It is the proportion of the variation in the y variable that is explained by the x variable(s).
If R squared is 0, the model explains none of the variation in y, so x does not help predict y.
If R squared is 1, y can be predicted from x without any error.
In the interview, I answered with a dimensionality reduction technique such as Principal Component Analysis.
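The two boundary cases of R squared can be checked directly from its definition, 1 - SS_res / SS_tot. The sketch below uses made-up numbers. The adjusted R squared helper is an addition that the original answer does not mention; it is included only because plain R squared cannot decrease when a feature is added to an ordinary least-squares model, so the adjusted form is commonly used when comparing feature sets.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

def adjusted_r_squared(r2, n_samples, n_features):
    """R squared penalized for the number of features (assumes n > p + 1)."""
    return 1.0 - (1.0 - r2) * (n_samples - 1) / (n_samples - n_features - 1)

# Made-up target values.
y = [3.0, 5.0, 7.0, 9.0]

perfect = r_squared(y, y)                    # predictions match exactly
baseline = r_squared(y, [6.0] * len(y))      # always predict the mean of y
print(perfect, baseline)
print(adjusted_r_squared(0.9, n_samples=11, n_features=2))
```

A feature that raises R squared but lowers adjusted R squared is a candidate for removal.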
Q.3. Questions on SSE, RMSE, MAPE.
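For reference, the three error measures can be written out as below. This is a minimal sketch using the standard definitions; the sample numbers are invented. Note that MAPE is undefined when any actual value is 0.

```python
from math import sqrt

def sse(actual, predicted):
    """Sum of squared errors."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted))

def rmse(actual, predicted):
    """Root mean squared error, in the same units as the target."""
    return sqrt(sse(actual, predicted) / len(actual))

def mape(actual, predicted):
    """Mean absolute percentage error; every actual value must be non-zero."""
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)

# Invented values for illustration only.
actual = [100, 200, 400]
predicted = [110, 190, 400]

print(sse(actual, predicted))
print(round(rmse(actual, predicted), 3))
print(round(mape(actual, predicted), 3))
```

SSE grows with the number of rows, RMSE is scale-dependent but comparable across dataset sizes, and MAPE is scale-free, which makes it easier to explain to a non-technical audience.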
Q.4. More questions on end to end process of data analysis.
Q.5. I was asked a few questions on practical scenarios:
a) If I want to improve traffic conditions, what data would I ask for?
b) Questions of the "which algorithm to use when" kind.