Robert Bosch Data Scientist interview 2019
Q.1. What is Decision Tree ? How to split ? How does decision tree work ?
Q.2. What does each node contain in a Decision Tree ?
Q.3. What is Entropy and Gini Index and how do they help ?
Q.4. What is Random Forest ? What is Random in Random Forest ? How to calculate OOB Error ?
Q.5. How does random forest work ?
Q.6. Explain the entire process from the point you get the data till you reach the final stage of prediction.
Q.7. How does kNN work ? Which distance metric to use in kNN when data is categorical ?
Q.8. You have 10 documents. Each document has been tagged with a topic. Once a new document comes, how to tag it to one of those topics ?
Primary focus : The candidate should be good at coding and should also have sound knowledge of ML algorithms.
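For Q.3 above, a minimal Python sketch of the two impurity measures may help when preparing. The function names and the toy labels are illustrative assumptions, not something the interviewer supplied; the idea shown is that a decision tree prefers the split that lowers the weighted impurity of the child nodes the most.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return sum(-(c / n) * math.log2(c / n) for c in Counter(labels).values())

def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def weighted_impurity(left, right, impurity):
    """Size-weighted impurity of the two child nodes produced by a split."""
    n = len(left) + len(right)
    return len(left) / n * impurity(left) + len(right) / n * impurity(right)

# Toy labels (made up for illustration): a balanced parent node and one candidate split.
parent = ["yes"] * 5 + ["no"] * 5
left = ["yes"] * 4 + ["no"]
right = ["yes"] + ["no"] * 4

# Information gain = parent entropy minus weighted child entropy.
info_gain = entropy(parent) - weighted_impurity(left, right, entropy)
# The same comparison using Gini impurity.
gini_drop = gini(parent) - weighted_impurity(left, right, gini)
```

Both quantities are positive for this split, meaning the children are purer than the parent under either measure.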
Face to Face :-
Coding round in R
1. Create a data frame of this form
01/01/2019 12:00 xx
31/01/2019 11:59 .
Value can be randomly generated
2. Transpose the data frame into this form
Date Hour1 Hour2 Hour3 . . . Value
01/01/2019 12:00 13:00 14:00 . . . xx
02/01/2019 12:00 13:00 14:00 . . . xx
. . . . . .
. . . . . .
. . . . . .
31/01/2019 12:00 13:00 14:00 . . . xx
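The round itself was in R, but the same two steps can be sketched in plain Python for practice. This is only a rough analogue under assumptions the question leaves open: one row per hour for January 2019, starting at midnight, with random integer values. The names long_rows, wide and wide_rows are hypothetical.

```python
import random
from datetime import datetime, timedelta

random.seed(0)  # fixed seed so the random values are reproducible

# Step 1: long form, one row per hour of January 2019 (assumed hourly granularity).
start = datetime(2019, 1, 1, 0, 0)
long_rows = []
for h in range(31 * 24):
    ts = start + timedelta(hours=h)
    long_rows.append({
        "date": ts.strftime("%d/%m/%Y"),
        "hour": ts.strftime("%H:%M"),
        "value": random.randint(0, 99),
    })

# Step 2: wide form, one row per date and one column per hour of the day.
hours = [f"{h:02d}:00" for h in range(24)]
wide = {}
for row in long_rows:
    wide.setdefault(row["date"], {})[row["hour"]] = row["value"]

# Each wide row is [date, value at 00:00, value at 01:00, ..., value at 23:00].
wide_rows = [[date] + [by_hour[h] for h in hours] for date, by_hour in wide.items()]
```

The long table has one row per timestamp, while the wide table has one row per day with the hourly values spread across columns, which is the reshaping the second step asks for.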
Q.1. If I want to find a relationship between Price and Sales should I use regression or correlation ?
Answer : Simple linear regression can be used to understand the relationship between the dependent variable (Sales) and the independent variable (Price).
Assumption = No other parameters are present.
The correlation coefficient, or standardized covariance (-1 <= r <= 1), will tell us :
1. Whether the correlation is positive or negative.
2. The strength of the linear relationship between the 2 variables.
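To make the difference concrete, here is a small Python sketch that computes both quantities on the same data. The price and sales numbers are invented for illustration, and the helper names are hypothetical. Correlation gives a single unitless number for direction and strength, while regression gives a line that can be used to predict Sales from Price.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient, i.e. covariance scaled to [-1, 1]."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def fit_line(x, y):
    """Least-squares slope and intercept for y = intercept + slope * x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / sxx
    return slope, my - slope * mx

# Made-up example data: sales fall as price rises.
price = [10, 12, 14, 16, 18]
sales = [200, 185, 160, 150, 120]

r = pearson_r(price, sales)                # sign gives direction, magnitude gives strength
slope, intercept = fit_line(price, sales)  # slope: change in sales per unit of price
predicted_at_15 = intercept + slope * 15   # regression can predict, correlation cannot
```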
Q.2. If I have multiple features in my dataset, how do I know which ones to include for my model building ?
Answer. Check the coefficient of determination, i.e. R squared. It is the proportion of variation in the y variable that is explained by the x variables.
If R squared is 0, the model explains none of the variation in y, so you can't predict y from x.
If R squared is 1, you can predict y from x without any error.
I had answered with a dimensionality reduction technique like Principal Component Analysis.
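The two boundary cases of R squared described above can be checked with a few lines of Python. This is a sketch using the usual definition, 1 minus the ratio of residual to total sum of squares; the function name and the sample values are assumptions for illustration.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot

y = [1, 2, 3, 4]  # illustrative target values

perfect = r_squared(y, [1, 2, 3, 4])             # predictions match exactly
baseline = r_squared(y, [2.5, 2.5, 2.5, 2.5])    # always predicting the mean of y
```

Perfect predictions give 1, and a model that does no better than always predicting the mean of y gives 0.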
Q.3. Questions on SSE, RMSE, MAPE.
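For revision, the three error metrics can be written out in a few lines of Python. This is a sketch with made-up numbers and hypothetical function names; MAPE as written here assumes no actual value is zero.

```python
import math

def sse(y_true, y_pred):
    """Sum of squared errors."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred))

def rmse(y_true, y_pred):
    """Root mean squared error, in the same units as y."""
    return math.sqrt(sse(y_true, y_pred) / len(y_true))

def mape(y_true, y_pred):
    """Mean absolute percentage error; undefined if any actual value is 0."""
    return 100 / len(y_true) * sum(abs((t - p) / t) for t, p in zip(y_true, y_pred))

# Illustrative values only.
actual = [100, 200, 300]
predicted = [110, 190, 330]

total_sse = sse(actual, predicted)
error_rmse = rmse(actual, predicted)
error_mape = mape(actual, predicted)
```

SSE grows with the number of observations, RMSE is scale dependent but comparable across sample sizes, and MAPE is a percentage, which makes it easier to compare across series with different scales.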
Q.4. More questions on the end-to-end process of data analysis.
Q.5. I was asked a few problems on practical scenarios :
a) If I want to improve traffic conditions, what data would I ask for ?
b) Questions of the kind : which algorithm to use when.