Robert Bosch Data Scientist interview 2019
Q.1. What is a decision tree? How does it choose a split? How does a decision tree work?
Q.2. What does each node contain in a decision tree?
Q.3. What are entropy and the Gini index, and how do they help?
Q.4. What is a random forest? What is "random" in random forest? How do you calculate the OOB (out-of-bag) error?
Q.5. How does random forest work?
Q.6. Explain the entire process from the point you get the data till you reach the final stage of prediction.
Q.7. How does kNN work? Which distance metric should be used in kNN when the data is categorical?
Q.8. You have 10 documents. Each document has been tagged with a topic. When a new document comes in, how do you tag it with one of those topics?
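For reference on Q.3, here is a minimal sketch of the two impurity measures a decision tree typically uses to score a candidate split. The function names and the toy labels are illustrative assumptions, not something given in the interview.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def gini(labels):
    """Gini impurity: chance that two random draws have different labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def information_gain(parent, left, right):
    """Entropy of the parent minus the size-weighted entropy of the children."""
    n = len(parent)
    weighted = (len(left) / n) * entropy(left) + (len(right) / n) * entropy(right)
    return entropy(parent) - weighted


# Toy example (made-up labels): a split that separates the classes perfectly.
parent = ["yes", "yes", "no", "no"]
gain = information_gain(parent, ["yes", "yes"], ["no", "no"])
```

A tree-building routine would evaluate many candidate splits and keep the one with the highest gain (or the lowest weighted Gini impurity).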
Primary focus: The candidate should be good at coding and should also have sound knowledge of ML algorithms.
Face to Face:
Coding round in R
1. Create a data frame of this form
01/01/2019 12:00 xx
31/01/2019 11:59 .
Value can be randomly generated
2. Transpose the data frame into this form
Date Hour1 Hour2 Hour3 . . . Value
01/01/2019 12:00 13:00 14:00 . . . xx
02/01/2019 12:00 13:00 14:00 . . . xx
. . . . . .
. . . . . .
. . . . . .
31/01/2019 12:00 13:00 14:00 . . . xx
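The round itself was in R. The sketch below uses plain Python only to illustrate the reshaping logic of the two steps. It assumes one reading per hour for every hour of January 2019, starting at 00:00 on 1 January, and uses random numbers as the values. The variable names `long_rows` and `wide` are hypothetical.

```python
import random
from datetime import datetime, timedelta

random.seed(0)  # fixed seed so the sketch is reproducible

# Step 1: long form, one (timestamp, value) row per hour of January 2019.
start = datetime(2019, 1, 1, 0, 0)
long_rows = [
    (start + timedelta(hours=h), round(random.random(), 3))
    for h in range(31 * 24)
]

# Step 2: wide form, one row per date with 24 hourly value columns.
wide = {}
for ts, value in long_rows:
    row = wide.setdefault(ts.strftime("%d/%m/%Y"), [None] * 24)
    row[ts.hour] = value

# Column header for the wide table: Date, 00:00, 01:00, ..., 23:00.
header = ["Date"] + [f"{h:02d}:00" for h in range(24)]

# Print the header and the first two dates, showing only the first 3 hours.
print(header[:4])
for date in list(wide)[:2]:
    print(date, wide[date][:3])
```

In a data-frame library the second step is usually a single long-to-wide pivot call; the explicit loop here just makes the grouping by date and hour visible.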
Q.1. If I want to find a relationship between Price and Sales, should I use regression or correlation?
Answer: Simple linear regression can be used to understand the relationship between
the dependent variable (Sales) and the independent variable (Price).
Assumption: no other variables affect Sales.
The correlation coefficient, i.e. the standardized covariance (-1 <= r <= 1), will tell us:
1. Whether the correlation is positive or negative.
2. The strength of the linear relationship between the two variables.
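To make the difference concrete, here is a small sketch that computes both quantities by hand. The price and sales figures are made up for illustration, and the helper names `pearson_r` and `fit_line` are assumptions rather than anything from the interview.

```python
import math


def pearson_r(x, y):
    """Pearson correlation: covariance scaled by both standard deviations."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def fit_line(x, y):
    """Least-squares slope and intercept for y = intercept + slope * x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum(
        (a - mx) ** 2 for a in x
    )
    return slope, my - slope * mx


# Made-up data: sales fall as price rises.
price = [10, 12, 14, 16, 18]
sales = [100, 92, 85, 80, 70]

r = pearson_r(price, sales)
slope, intercept = fit_line(price, sales)
print(f"r = {r:.3f}, Sales ~ {intercept:.1f} + {slope:.1f} * Price")
```

Correlation gives only the direction and strength of the linear association, while the fitted line also says how much Sales is expected to change per unit change in Price.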
Q.2. If I have multiple features in my dataset, how do I know which ones to include when building my model?
Answer: Check the coefficient of determination, i.e. R squared. It is the proportion of the variation in the y variable that is explained by the x variable.
If R squared is 0, x explains none of the variation in y, so you can't predict y from x with a linear model.
If R squared is 1, you can predict y from x without any error.
I had answered with a dimensionality reduction technique such as Principal Component Analysis.
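The R squared definition in the answer above can be sketched as follows, using the usual form 1 - SS_res / SS_tot. The function name and the toy numbers are illustrative assumptions.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - residual SS / total SS."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


y = [1.0, 2.0, 3.0, 4.0]

# Perfect predictions explain all of the variation.
perfect = r_squared(y, [1.0, 2.0, 3.0, 4.0])

# Always predicting the mean explains none of it.
mean_only = r_squared(y, [2.5, 2.5, 2.5, 2.5])
```

Note that on training data R squared never decreases when more features are added to a linear model, so on its own it tends to favour including everything.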
Q.3. Questions on SSE, RMSE, MAPE.
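For reference, the three error metrics named above can be written out in a few lines. This is a minimal sketch with made-up numbers. The function names are assumptions, and MAPE as written requires every actual value to be non-zero.

```python
import math


def sse(y_true, y_pred):
    """Sum of squared errors."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred))


def rmse(y_true, y_pred):
    """Root mean squared error, in the same units as y."""
    return math.sqrt(sse(y_true, y_pred) / len(y_true))


def mape(y_true, y_pred):
    """Mean absolute percentage error; actual values must be non-zero."""
    return 100 * sum(abs(t - p) / abs(t) for t, p in zip(y_true, y_pred)) / len(y_true)


# Made-up actuals and predictions.
actual = [100, 200, 400]
predicted = [110, 190, 400]

print(sse(actual, predicted), rmse(actual, predicted), mape(actual, predicted))
```

SSE grows with the number of observations, RMSE is scale-dependent but comparable across sample sizes, and MAPE is scale-free, which is why it is often quoted when comparing series of different magnitudes.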
Q.4. More questions on the end-to-end process of data analysis.
Q.5. I was asked a few questions on practical scenarios:
a) If I want to improve traffic conditions, what data would I ask for?
b) "Which algorithm to use when" kind of questions.