Think globally, and impact millions. That’s the driving force behind Tiger Analytics, a data-driven powerhouse leading the AI and analytics consulting world. Tiger Analytics tackles challenges that resonate across the globe, shaping the lives of millions through innovative data-driven solutions. More than just a company, Tiger Analytics fosters a culture of expertise and respect, where collaboration reigns supreme. With headquarters in Silicon Valley and delivery centers across the globe, including India’s bustling hubs of Chennai and Hyderabad, Tiger Analytics offers a dynamic environment for both in-person and remote teams.
Cracking the Tiger Analytics data analyst interview is not easy; it requires careful planning and the right tools. But don’t worry, aspiring data analysts! Sharpen your data storytelling abilities with strategic communication prompts, and impress with your knowledge of the company’s cutting-edge tools and projects. This article contains a treasure trove of important interview questions that have frequently been asked in data analyst interviews at Tiger Analytics. Work through them to build your confidence, ace the interview, and take your career to the next level!
Easy Level Questions
Q1. How to swap two numbers without using a temporary variable?
The idea is to store the sum of the two numbers in one of them. Each original value can then be recovered by subtracting the other number from that sum.
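The sum-and-subtract idea can be sketched as follows. The function name `swap_without_temp` is chosen here for illustration only. Python integers do not overflow, but in a language with fixed-width integers the intermediate sum could.

```python
def swap_without_temp(a, b):
    """Swap two numbers using only addition and subtraction."""
    a = a + b  # a now holds the sum of both numbers
    b = a - b  # sum minus the old b gives the old a
    a = a - b  # sum minus the new b (the old a) gives the old b
    return a, b


x, y = swap_without_temp(10, 5)
print(x, y)  # 5 10
```

In everyday Python, `a, b = b, a` achieves the same result; the arithmetic version is mainly an interview exercise.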
Q2. Check if a number is a palindrome
Let the given number be num. A simple method for this problem is to first reverse digits of num, then compare the reverse of num with num. If both are the same, then return true, else false.
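A minimal sketch of the reverse-and-compare method is shown below. Treating negative numbers as non-palindromes is an assumption made for this example, not something the description above specifies.

```python
def is_palindrome(num):
    """Return True if the digits of num read the same in both directions."""
    if num < 0:
        return False  # assumption: negative numbers are not palindromes
    original, reversed_num = num, 0
    while num > 0:
        reversed_num = reversed_num * 10 + num % 10  # append the last digit
        num //= 10                                   # drop the last digit
    return original == reversed_num


print(is_palindrome(121))  # True
print(is_palindrome(123))  # False
```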
Q3. Find the second largest element in an array
Using Sorting: The idea is to sort the array in descending order and then return the second element which is not equal to the largest element from the sorted array.
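The sorting approach might look like the sketch below. Returning `None` when no distinct second largest value exists is an assumption of this example.

```python
def second_largest(arr):
    """Return the largest value that is strictly smaller than the maximum."""
    ordered = sorted(arr, reverse=True)  # descending order
    for value in ordered[1:]:
        if value != ordered[0]:          # skip duplicates of the largest
            return value
    return None  # assumption: no distinct second largest element exists


print(second_largest([12, 35, 1, 10, 34, 1]))  # 34
```

Sorting costs O(n log n); a single pass that tracks the two largest values would be O(n), at the price of slightly more bookkeeping.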
Q4. Reverse a Linked List
The idea is to use three pointers curr, prev, and next to keep track of nodes to update reverse links.
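The three-pointer technique can be sketched as below. The `Node` class and the `to_list` helper are hypothetical, introduced only to make the example self-contained.

```python
class Node:
    """Hypothetical singly linked list node used for this illustration."""

    def __init__(self, data, next=None):
        self.data = data
        self.next = next


def reverse_list(head):
    """Reverse the list in place and return the new head."""
    prev, curr = None, head
    while curr is not None:
        next_node = curr.next  # remember the rest of the list
        curr.next = prev       # reverse the link
        prev = curr            # advance prev
        curr = next_node       # advance curr
    return prev


def to_list(head):
    """Collect node values into a Python list for easy inspection."""
    values = []
    while head is not None:
        values.append(head.data)
        head = head.next
    return values


head = Node(1, Node(2, Node(3)))
print(to_list(reverse_list(head)))  # [3, 2, 1]
```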
Q5. What is a Constructor?
Constructor in C++ is a special method that is invoked automatically at the time of object creation. It is used to initialize the data members of new objects generally.
Q6. Difference between graph and tree
Basis | Graph | Tree |
---|---|---|
Definition | The graph is a non-linear data structure. | The tree is a non-linear data structure. |
Structure | It is a collection of vertices/nodes and edges. | It is a collection of nodes and edges. |
Connectivity and cycles | A graph can be connected or disconnected, can have cycles or loops, and does not necessarily have a root node. | A tree is a type of graph that is connected, acyclic (meaning it has no cycles or loops), and has a single root node. |
Edges | Each node can have any number of edges. | If there are n nodes, then there are exactly n-1 edges. |
Types of Edges | They can be directed or undirected. | In a rooted tree, edges are usually treated as directed from parent to child. |
Root node | There is no unique node called root in graph. | There is a unique node called root(parent) node in trees. |
Loop Formation | A cycle can be formed. | There will not be any cycle. |
Traversal | For graph traversal, we use Breadth-First Search (BFS), and Depth-First Search (DFS). | We traverse a tree using in-order, pre-order, or post-order traversal methods. |
Q7. What is Inheritance?
The capability of a class to derive properties and characteristics from another class is called Inheritance. Inheritance is one of the most important features of Object-Oriented Programming.
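As a small illustration, the sketch below shows a derived class reusing and extending a base class. The class names `Employee` and `Analyst` are hypothetical, and Python is used here rather than C++ purely for brevity. The `__init__` methods also play the role of constructors.

```python
class Employee:
    """Base class holding what all employees share."""

    def __init__(self, name):  # constructor: runs when the object is created
        self.name = name

    def describe(self):
        return f"{self.name} is an employee"


class Analyst(Employee):
    """Derived class: inherits name and describe() from Employee."""

    def __init__(self, name, tool):
        super().__init__(name)  # reuse the base-class constructor
        self.tool = tool

    def describe(self):
        # Extend the inherited behaviour instead of rewriting it.
        return f"{super().describe()} who works with {self.tool}"


print(Analyst("Asha", "SQL").describe())
# Asha is an employee who works with SQL
```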
Q8. How do data analysts differ from data scientists?
Feature | Data Analyst | Data Scientist |
---|---|---|
Skills | Excel, SQL, Python, R, Tableau, PowerBI | Machine Learning, Statistical Modeling, Docker, Software Engineering |
Tasks | Data Collection, Web Scraping, Data Cleaning, Data Visualization, Exploratory Data Analysis, Report Development and Presentations | Database Management, Predictive and Prescriptive Analysis, Machine Learning Model Building and Deployment, Task Automation, Business Process Improvement |
Positions | Entry level | Senior level |
Q9. What is Data Wrangling?
Data Wrangling is the process of gathering, collecting, and transforming raw data into another format so that it can be understood, accessed, and analyzed more quickly to support decision-making. Data Wrangling is also known as Data Munging.
Q10. What is a join in SQL? What are the types of joins?
An SQL Join statement is used to combine data or rows from two or more tables based on a common field between them. Different types of Joins are:
- INNER JOIN: The INNER JOIN keyword selects all rows from both tables as long as the condition is satisfied. This keyword will create the result set by combining all rows from both the tables where the condition satisfies i.e. the value of the common field will be the same.
- LEFT JOIN: This join returns all the rows of the table on the left side of the join and matching rows for the table on the right side of the join. For the rows for which there is no matching row on the right side, the result set will contain NULL. LEFT JOIN is also known as LEFT OUTER JOIN.
- RIGHT JOIN: RIGHT JOIN is similar to LEFT JOIN. This join returns all the rows of the table on the right side of the join and matching rows for the table on the left side of the join. For the rows for which there is no matching row on the left side, the result set will contain null. RIGHT JOIN is also known as RIGHT OUTER JOIN.
- FULL JOIN: FULL JOIN creates the result set by combining the results of both LEFT JOIN and RIGHT JOIN. The result set will contain all the rows from both tables. For the rows for which there is no matching, the result set will contain NULL values.
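The difference between INNER JOIN and LEFT JOIN can be seen in a small sketch that uses Python's built-in `sqlite3` module. The `students` and `courses` tables and their contents are made up for this example. RIGHT JOIN and FULL JOIN are omitted because older SQLite versions do not support them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE students (id INTEGER, name TEXT);
    CREATE TABLE courses (student_id INTEGER, course TEXT);
    INSERT INTO students VALUES (1, 'Asha'), (2, 'Ravi'), (3, 'Meena');
    INSERT INTO courses VALUES (1, 'SQL'), (2, 'Python'), (4, 'Tableau');
""")

# INNER JOIN: only students that have a matching course row.
inner_rows = conn.execute("""
    SELECT s.name, c.course
    FROM students s
    INNER JOIN courses c ON s.id = c.student_id
    ORDER BY s.id
""").fetchall()

# LEFT JOIN: every student; course is NULL (None) when there is no match.
left_rows = conn.execute("""
    SELECT s.name, c.course
    FROM students s
    LEFT JOIN courses c ON s.id = c.student_id
    ORDER BY s.id
""").fetchall()

print(inner_rows)  # [('Asha', 'SQL'), ('Ravi', 'Python')]
print(left_rows)   # [('Asha', 'SQL'), ('Ravi', 'Python'), ('Meena', None)]
```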
Medium Level Questions
Q11. What is the difference between SQL DELETE and SQL TRUNCATE commands?
SQL DELETE | SQL TRUNCATE |
---|---|
The DELETE statement removes rows one at a time and records an entry in the transaction log for each deleted row. | TRUNCATE TABLE removes the data by deallocating the data pages used to store the table data and records only the page deallocations in the transaction log. |
The DELETE command is slower than the TRUNCATE command. | The TRUNCATE command is faster than the DELETE command. |
To use Delete you need DELETE permission on the table. | To use Truncate on a table we need at least ALTER permission on the table. |
If the table contains an identity column, the column keeps its current identity value after the DELETE statement is used. | If the table contains an identity column, the identity is reset to its seed value. |
Q12. What is a Pivot table?
Pivot tables are one of the most useful features in Excel. They are used to summarize or aggregate lots of data. The summarization of the data can be in the form of average, count, and other statistical methods.
Q13. Difference between Data Lake and Data Warehouse
Data Lake | Data Warehouse |
---|---|
Data is not in normalized form. | Data is stored in denormalized schemas. |
The technologies used in data lakes, such as Hadoop and machine learning, are relatively modern compared with those of a data warehouse. | The technology used for a data warehouse is older. |
A data lake can hold all kinds of data and can be used with past, present, and future needs in mind. | In a data warehouse, most of the time is spent analyzing the various sources of data. |
Data inside a data lake is highly accessible and can be updated quickly. | Data inside a data warehouse is more complex and more costly to change; access is also restricted to authorized users. |
Q14. What is Hypothesis Testing?
Hypothesis testing is a statistical method used to make statistical decisions from experimental data. A hypothesis is an assumption that we make about a population parameter. Hypothesis testing evaluates two mutually exclusive statements about a population to determine which statement is best supported by the sample data.
Q15. Data Preprocessing in Data Mining
Data preprocessing is an important step in the data mining process. It refers to the cleaning, transforming, and integrating of data in order to make it ready for analysis. The goal of data preprocessing is to improve the quality of the data and to make it more suitable for the specific data mining task.
Q16. What Is Time Series Analysis?
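Time series data is a sequence of data points recorded or collected at regular time intervals. It is a type of data that tracks the evolution of a variable over time, such as sales, stock prices, or temperature. Time series analysis is the study of such data to identify patterns such as trend and seasonality and to forecast future values.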
Q17. Types of Outliers in Data Mining
An outlier is an object that deviates significantly from the rest of the objects. They can be caused by measurement or execution errors. The analysis of outlier data is referred to as outlier analysis or outlier mining.
Q18. Collaborative Filtering in Machine Learning
In Collaborative Filtering, we find similar users and recommend what those similar users like. This type of recommendation system does not use the features of an item to recommend it. Instead, we group the users into clusters of similar types and recommend to each user according to the preferences of their cluster.
Q19. What is a B-Tree data structure?
Traditional binary search trees can become impractical for storing and searching large amounts of data. Each node holds only one key, so the tree grows tall and a lookup touches many nodes, which is costly when the data lives on disk. A B-Tree is a self-balancing search tree that was specifically designed to overcome these limitations. Each node can hold multiple keys and have more than two children, which keeps the tree shallow and makes it well suited to databases and file systems.
Q20. Detect Cycle in a Directed Graph
To find a cycle in a directed graph we can use the Depth First Traversal (DFS) technique. It is based on the idea that there is a cycle in a graph only if there is a back edge [i.e., a node points to one of its ancestors] present in the graph.
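The back-edge idea can be sketched as below. The graph is assumed to be a dictionary mapping each node to a list of its successors, and the function name `has_cycle` is chosen for this example.

```python
def has_cycle(graph):
    """Return True if the directed graph contains a cycle.

    graph: dict mapping each node to a list of successor nodes (assumed format).
    """
    visited = set()  # nodes whose DFS has started at some point
    on_path = set()  # nodes on the current recursion path (the ancestors)

    def dfs(node):
        visited.add(node)
        on_path.add(node)
        for neighbor in graph.get(node, []):
            if neighbor in on_path:  # back edge to an ancestor
                return True
            if neighbor not in visited and dfs(neighbor):
                return True
        on_path.discard(node)        # leaving this node's subtree
        return False

    return any(node not in visited and dfs(node) for node in graph)


print(has_cycle({0: [1], 1: [2], 2: [0]}))     # True
print(has_cycle({0: [1, 2], 1: [2], 2: []}))   # False
```

The recursion depth grows with the longest path, so very deep graphs may need an iterative version.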
Hard Level Questions
Q21. Data Normalization in Machine Learning
Data normalization is a vital pre-processing, mapping, and scaling method that helps forecasting and prediction models become more accurate. The current data range is transformed into a new, standardized range using this method.
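One common way to map values into a new range is min-max scaling, sketched below. The choice of min-max (rather than, say, z-score standardization) and the handling of constant input are assumptions of this example.

```python
def min_max_normalize(values, new_min=0.0, new_max=1.0):
    """Rescale values linearly so they span [new_min, new_max]."""
    old_min, old_max = min(values), max(values)
    if old_min == old_max:
        # Assumption: constant input maps to new_min to avoid dividing by zero.
        return [new_min for _ in values]
    span = old_max - old_min
    return [(v - old_min) / span * (new_max - new_min) + new_min for v in values]


print(min_max_normalize([10, 20, 30]))  # [0.0, 0.5, 1.0]
```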
Q22. How can pandas be used for data analysis?
Pandas is one of the most widely used Python libraries for data analysis. It has powerful tools and data structures that are very helpful in analyzing and processing data. Some of the most useful pandas functions for the tasks involved in data analysis are as follows:
- Data loading functions: Pandas provides functions to read datasets from different formats. For example, read_csv, read_excel, and read_sql read data from CSV, Excel, and SQL sources respectively into a pandas DataFrame.
- Data Exploration: Pandas provides functions like head, tail, and sample to rapidly inspect the data after it has been imported. In order to learn more about the different data types, missing values, and summary statistics, use pandas .info and .describe functions.
- Data Cleaning: Pandas offers functions for dealing with missing values (fillna), duplicate rows (drop_duplicates), and incorrect data types (astype) before analysis.
- Data Transformation: Pandas may be used to modify and transform data. It is simple to do actions like selecting columns, filtering rows (loc, iloc), and adding new ones. Custom transformations are feasible using the apply and map functions.
- Data Aggregation: With the help of pandas, we can group the data using groupby function, and also apply aggregation tasks like sum, mean, count, etc., on specific columns.
- Time Series Analysis: Pandas offers robust support for time series data. We can easily conduct date-based computations using functions like resample, shift, etc.
- Merging and Joining: Data from different sources can be combined using Pandas merge and join functions.
Q23. Difference between Descriptive and Inferential statistics
Descriptive Statistics | Inferential Statistics |
---|---|
It gives information about raw data which describes the data in some manner. | It makes inferences about the population using data drawn from the population. |
It helps in organizing, analyzing, and presenting data in a meaningful manner. | It allows us to compare data, and make hypotheses and predictions. |
It is used to describe a situation. | It is used to explain the chance of occurrence of an event. |
It explains already known data and is limited to a sample or population having a small size. | It attempts to reach a conclusion about the population. |
Q24. What is a correlation?
Correlation is a statistical measure of the degree of linear relationship between two or more variables. It estimates how effectively changes in one variable predict or explain changes in another. Correlation is often used to assess the strength and direction of associations between variables in various fields, including statistics and economics.
The correlation between two variables is represented by a correlation coefficient, denoted as “r”. The value of “r” can range between -1 and +1, reflecting the strength of the relationship:
- Positive correlation (r > 0): As one variable increases, the other tends to increase. The greater the positive correlation, the closer “r” is to +1.
- Negative correlation (r < 0): As one variable rises, the other tends to fall. The closer “r” is to -1, the greater the negative correlation.
- No correlation (r = 0): There is little or no linear relationship between the variables.
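To make the coefficient concrete, the sketch below computes the Pearson correlation coefficient in plain Python. It assumes that "r" refers to Pearson's r, which matches the description of a linear relationship above, and that neither input is constant.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations (proportional to the covariance).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations (proportional to the variances).
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


r_up = pearson_r([1, 2, 3, 4], [2, 4, 6, 8])    # perfectly increasing together
r_down = pearson_r([1, 2, 3, 4], [8, 6, 4, 2])  # one rises as the other falls
print(round(r_up, 6), round(r_down, 6))
```

The first pair should give a value at or very near +1 and the second at or very near -1, up to floating-point rounding.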
Q25. Topological Sorting
Approach:
- Create a stack to store the nodes.
- Initialize a visited array of size N to keep a record of visited nodes.
- Run a loop from 0 to N - 1:
  - If the node is not marked True in the visited array, call the recursive topological sort function on it, which performs the following steps:
    - Mark the current node as True in the visited array.
    - Run a loop over all the nodes that the current node has a directed edge to:
      - If the node is not marked True in the visited array, recursively call the topological sort function on it.
    - Push the current node onto the stack.
- Pop and print all the elements of the stack; this order is a topological ordering.
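The steps above can be sketched as follows. The function signature (a node count plus a list of `(u, v)` edges) is an assumption for this example, and the input is assumed to be a directed acyclic graph.

```python
def topological_sort(n, edges):
    """Return one topological ordering of nodes 0..n-1 (assumes a DAG)."""
    adj = [[] for _ in range(n)]
    for u, v in edges:
        adj[u].append(v)  # directed edge u -> v

    visited = [False] * n
    stack = []

    def visit(node):
        visited[node] = True
        for neighbor in adj[node]:
            if not visited[neighbor]:
                visit(neighbor)
        stack.append(node)  # pushed only after all its successors

    for node in range(n):
        if not visited[node]:
            visit(node)

    return stack[::-1]  # popping the stack yields the ordering


edges = [(5, 2), (5, 0), (4, 0), (4, 1), (2, 3), (3, 1)]
order = topological_sort(6, edges)
print(order)
```

A graph can have several valid orderings; what matters is that every edge `u -> v` has `u` appearing before `v`.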
Q26. Rotate Image by 90 degrees
Approach:
- Transform each row of the original matrix into the required column of the final matrix. For a clockwise rotation:
  - first row of original matrix -> last column of final matrix
  - second row of original matrix -> second last column of final matrix
  - and so on, until the last row of original matrix -> first column of final matrix
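The row-to-column mapping can be sketched as below. The function name `rotate_clockwise` is illustrative, and the matrix is assumed to be a non-empty list of equal-length rows.

```python
def rotate_clockwise(matrix):
    """Return a new matrix rotated 90 degrees clockwise."""
    n_rows, n_cols = len(matrix), len(matrix[0])
    rotated = [[None] * n_rows for _ in range(n_cols)]
    for r in range(n_rows):
        for c in range(n_cols):
            # Row r of the original becomes column (n_rows - 1 - r) of the result.
            rotated[c][n_rows - 1 - r] = matrix[r][c]
    return rotated


image = [[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]]
print(rotate_clockwise(image))  # [[7, 4, 1], [8, 5, 2], [9, 6, 3]]
```

This version uses extra space for the result; an in-place rotation of a square matrix is possible by transposing and then reversing each row.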
Q27. 8 Queens Problem
Explanation:
- The solution uses a backtracking algorithm for the 8 Queens problem, which consists of placing 8 queens on a chessboard in such a way that no two queens threaten each other.
- The algorithm starts by placing a queen in the first column, then it proceeds to the next column and places a queen in the first safe row of that column.
- If the algorithm gets past the 8th column with all queens placed in safe positions, it prints the board and returns true.
- If the algorithm is unable to place a queen in a safe position in a certain column, it backtracks to the previous column and tries a different row.
- The “isSafe” function checks whether it is safe to place a queen at a certain row and column by checking for queens in the same row, diagonal, or anti-diagonal.
- Note that this is only a high-level description, and it may need to be adapted to the specific implementation and language you are using.
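A minimal sketch of the backtracking procedure described above follows. Representing the board as a list where `rows[col]` holds the row of the queen in that column is an assumption of this example, and the function returns the placement instead of printing the board.

```python
def solve_n_queens(n=8):
    """Return rows[col] = row of the queen in each column, or None if impossible."""
    rows = [-1] * n

    def is_safe(row, col):
        # Only earlier columns hold queens, so only those need checking.
        for prev_col in range(col):
            prev_row = rows[prev_col]
            same_row = prev_row == row
            same_diagonal = abs(prev_row - row) == col - prev_col
            if same_row or same_diagonal:
                return False
        return True

    def place(col):
        if col == n:               # every column has a queen
            return True
        for row in range(n):
            if is_safe(row, col):
                rows[col] = row
                if place(col + 1):
                    return True
                rows[col] = -1     # backtrack and try the next row
        return False

    return rows if place(0) else None


solution = solve_n_queens(8)
for r in range(8):
    print(" ".join("Q" if solution[c] == r else "." for c in range(8)))
```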
Approach 1 (Using Hashing): The idea behind this approach is that, because the numbers lie in the range 1 to N, an array of size N can be maintained to keep a record of the elements present in the given array.
Q29. What is the purpose of the HAVING clause in SQL? How is it different from the WHERE clause?
WHERE Clause | HAVING Clause |
---|---|
WHERE Clause is used to filter the records from the table based on the specified condition. | HAVING Clause is used to filter records from the groups based on the specified condition. |
WHERE Clause can be used without GROUP BY Clause | HAVING Clause is normally used together with GROUP BY Clause |
WHERE Clause operates on individual rows | HAVING Clause operates on groups of rows (aggregated values) |
WHERE Clause cannot contain aggregate function | HAVING Clause can contain aggregate function |
WHERE Clause can be used with SELECT, UPDATE, DELETE statement. | HAVING Clause can only be used with SELECT statement. |
WHERE Clause is used before GROUP BY Clause | HAVING Clause is used after GROUP BY Clause |
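The order of filtering can be seen in a small sketch using Python's built-in `sqlite3` module. The `sales` table and its values are invented for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sales (region TEXT, amount INTEGER);
    INSERT INTO sales VALUES
        ('North', 100), ('North', 300),
        ('South', 50),  ('South', 60),
        ('East', 500);
""")

rows = conn.execute("""
    SELECT region, SUM(amount) AS total
    FROM sales
    WHERE amount >= 60          -- filters individual rows before grouping
    GROUP BY region
    HAVING SUM(amount) > 200    -- filters groups after aggregation
    ORDER BY region
""").fetchall()

print(rows)  # [('East', 500), ('North', 400)]
```

The South group survives the WHERE filter with a single row of 60 but is then dropped by HAVING because its total does not exceed 200.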
Q30. What is the difference between joining and blending in Tableau?
In Tableau, joining and blending are both ways to combine data from various tables or data sources. However, they are used in different contexts and have several major differences:
Basis | Joining | Blending |
---|---|---|
Data Source Requirement | Joining is used when you have data from the same data source, such as a relational database, where tables are already related through primary and foreign keys. | Blending is used when you have data from different data sources, such as a combination of Excel spreadsheets, CSV files, and databases. These sources may not have predefined relationships. |
Relationships | Joins rely on common data, such as a customer ID or product code, to establish predetermined links between tables. These relationships are defined within the same data source. | There is no need for pre-established links between tables while blending. Instead, you connect to the different data sources separately and combine them by matching fields with comparable values. |
Data Combining | When tables are joined, a single unified data source with a merged schema is produced. A single table with all relevant fields is created by combining the two tables. | Data blending maintains the separation of the data sources. At query time, Tableau gathers and combines data from several sources to produce a temporary, in-memory blend for visualization needs. |
Data Transformation | It is useful for data transformation, aggregations, and calculations on the combined data. The information from many connected tables can be used to build computed fields. | It is useful only for data transformation and calculations. It cannot create calculated fields that involve data from different blended data sources. |