ZS Associates Interview Experience for Data Science Associate
ZS Associates is one of the best consulting firms, and it also has a great team of Data Scientists. Its main clients are pharmaceutical companies, but it works in other domains as well. Consultant roles at ZS follow the hierarchy below:
- Associate Consultant
Working at ZS had been a goal of mine since college. I had applied for the DAA (Decision Analytics) role back then, but couldn’t crack it. ZS hires freshers and entry-level experienced candidates for the Data Science Associate (DSA) role. When I started out in ML and Data Science, I made a list of target companies that I wanted to be a part of. ZS was one of them, given its work ethic and how it treats its employees.
Application Process: I first applied for the DSA role in March 2020. I didn’t have anyone to ask for a referral, so I applied directly on their website. The first step in the application is resume shortlisting, so make sure your resume is on point for the job you’re applying for. I’ll soon be writing a blog on resume writing as well. Back to the point: once my resume was shortlisted, I received an invitation to take a Machine Learning test on HackerEarth.
I submitted the test, but soon afterwards the lockdown happened and ZS froze its hiring! That was a bummer for me, but I kept expanding my skills either way.
Fast-forward to October 2020: ZS started hiring again after a gap of 6 months. I applied again and was asked to take the test once more. The process consisted of 3 rounds, and all of them were elimination rounds.
Round 1 (Machine Learning Challenge): The first round was to solve a Machine Learning problem and submit a predictions CSV along with the source code. The problem I got was a text classification one, built on Job Descriptions from a Pharmaceutical company’s job portal. My task was to build a Machine Learning model that takes in the job description text and predicts 2 targets: Job Type and Job Category. The submission was to be a CSV file containing the predictions on the test data.
Problem and Approach: Job Type consisted of 6 classes and Job Category consisted of 11 classes. In essence, this was a Multi-Class Classification problem with 2 targets. I followed the steps below to solve the problem and submit the solution and predictions:
- Data Understanding
- Text preprocessing
- Word Vectorization & creating embeddings using Word2Vec
- Hyperparameter Tuning
- Getting predictions on test data
- Saving and submitting the source code and CSV
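To make the preprocessing and vectorization steps above concrete, here is a minimal sketch, not the code I actually submitted. It assumes a tiny hypothetical lookup table (`TOY_EMBEDDINGS`) standing in for a trained Word2Vec model, and the function names are made up for illustration. It cleans a job description, keeps only alphabetic tokens, and averages the word vectors into one feature vector per document.

```python
import re

# Hypothetical 3-dimensional embeddings standing in for a trained Word2Vec model.
TOY_EMBEDDINGS = {
    "data": [1.0, 0.0, 0.0],
    "scientist": [0.0, 1.0, 0.0],
    "intern": [0.0, 0.0, 1.0],
}


def preprocess(text):
    """Lowercase the text and keep only alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def average_vector(tokens, embeddings, dim=3):
    """Average the vectors of known tokens; return zeros if none are known."""
    vectors = [embeddings[t] for t in tokens if t in embeddings]
    if not vectors:
        return [0.0] * dim
    # zip(*vectors) groups the values of each dimension together.
    return [sum(component) / len(vectors) for component in zip(*vectors)]


tokens = preprocess("Data Scientist (Intern), 2020!")
features = average_vector(tokens, TOY_EMBEDDINGS)
print(tokens)
print(features)
```

Note that the year "2020" is dropped by this kind of cleaning, and that averaging throws away word order. A feature vector like this can then be fed to any multi-class classifier, one model per target.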
I was given almost 2.5 days to complete this challenge on HackerEarth. I submitted the solution and crossed my fingers.
Round 2 (Case Debrief): I received a call from HR about the second round 4-5 days after submitting the round 1 problem. This round was a technical discussion of the ML challenge and my solution. I had to make a PPT describing the steps I took, the results I got, and the source code.
The interview was scheduled on Zoom for 1 hour and required me to share my screen and present my solution to them. The interview started and I described the entire solution to them step by step through my presentation.
Discussion: While I went through my presentation from start to end, I was asked almost no questions, and I thought this was easy! But it was not. Once I finished explaining the solution, they started asking me questions about my approach, right from the first step. Some of them were:
- Why didn’t you do more EDA? What else could you have done?
- You ignored word importances. Is there any way you could have analyzed word importance and added another feature for it?
- The first target had a class imbalance. How did you deal with that?
- What all feature engineering did you do?
- You removed all numeric values from the text. Couldn’t it be the case that job descriptions containing numeric values such as dates indicate an internship or something like that?
- Did you think of a better way than to drop all non-alphabetic values?
- You used stemming. Why didn’t you use lemmatization? When is lemmatization helpful over stemming and vice-versa?
- You created word embeddings using Word2Vec. Was it a pre-trained model, or did you train it on this data? Don’t you think a pre-trained embedding would have done better?
- Did you use other techniques like Bag of Words, TF-IDF, N-grams, etc.? How did they perform?
- You averaged out the word vectors of all words in a sentence to form the complete feature vector. Don’t you think this would have resulted in a loss of information? What better way could you have used here?
- What all models did you try running and what was the result?
- Did you use Deep Learning too? Don’t you think Deep Learning could have fetched better results?
- How do you think the metrics could have been improved? What are the reasons for the low score on the Job Category target?
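For the class imbalance question above, one common answer is to weight each class inversely to its frequency. Here is a minimal sketch of that idea, not what I did in my submission. The label names are hypothetical, and the formula used is `n_samples / (n_classes * class_count)`.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {
        label: n_samples / (n_classes * count)
        for label, count in counts.items()
    }


# Hypothetical, heavily imbalanced Job Type labels.
labels = ["full_time"] * 8 + ["internship"] * 2
weights = balanced_class_weights(labels)
print(weights)
```

The rare class gets a larger weight, so mistakes on it cost more during training. Many classifiers accept such weights directly; resampling the data is the other usual option.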
After answering the majority of the above questions, I was exhausted! I thought I could have done much better with my solution and tried more techniques to get better results. Nevertheless, I crossed my fingers again and hoped to get a call for the 3rd round.
Round 3 (Technical + Fit round): The 3rd and last round was a technical round, which I thought would be about my resume, projects and skills. However, the interviewer had different plans.
Technical Interview: The interviewer asked me about my strong areas in Machine Learning, and I answered that it is NLP. He started the round by giving me a scenario: we have a client with text data from emails containing customer feedback on their products. He then walked through the complete Data Science lifecycle, step by step till the end.
- The data we have is not labelled and we want to classify it into different classes regarding the department it belongs to and the product it is for. How will you proceed?
- Once you have labelled it, how will you clean it and pre-process it?
- Once the data is clean, what next step will you take to analyze it and add features?
- How will you vectorize it? What are all the ways you know of?
- Do you think there could be a bias in the data? How to tackle that?
- What will be the next step? How will you make sure you have the best model?
- How can you reduce the training time?
- Once the model is finalized, how will the deployment be done, and what needs to be taken care of?
- What and how will the retraining be done?
- What are the types of clustering?
- What assumptions does a linear model make? What are its drawbacks?
- How do ensembles work?
- Have you worked on any cloud platform? What components does Azure consist of?
- How can you decide whether a feature is important or not during regression analysis?
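For the vectorization question in the list above, TF-IDF is one of the simplest options to mention. Below is a minimal sketch of the plain textbook version (term frequency times `log(N / document_frequency)`), assuming whitespace tokenization. The sample emails are made up, and libraries often use smoothed variants, so their numbers will differ.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Return one {term: score} dict per document using plain TF-IDF."""
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)
    # Number of documents in which each term appears at least once.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    scores = []
    for tokens in tokenized:
        counts = Counter(tokens)
        scores.append({
            term: (count / len(tokens)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return scores


# Hypothetical customer feedback emails.
emails = [
    "refund for broken phone",
    "phone battery drains fast",
    "refund not received",
]
scores = tf_idf(emails)
print(scores[0])
```

Words that appear in fewer emails, like "broken", score higher than words shared across emails, like "phone" or "refund". That is the same idea as the word importance analysis discussed in round 2.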
He asked a few more questions on statistics and Machine Learning concepts. He also went a little into Deep Learning and CNNs. Overall, I answered most of the questions he put to me.
I asked him a few questions about the work there, the clients, and the different domains involved, which he answered graciously.
I was pretty confident about my application now after the last round went pretty well. Fingers crossed, again.
Confirmation & Offer Letter: The moment I had been waiting for finally arrived when I received the confirmation call from HR, and I was delighted! Months of hard work and persistence had finally borne fruit. Within a few days, I received the offer letter and happily accepted it. The process couldn’t have been smoother. Hard work does pay off!
I received tons of requests from the community about my interview experience, and I thought this blog would do justice to all of them. I will soon be writing blogs on resume preparation and job search strategy as well.