Collection and Presentation of Data
Every day we come across a lot of information from different sources: newspapers, TV, phones, the Internet, and so on. This information can be about anything, from bowling averages in cricket to a company’s profits over the years. Such facts and figures are often numerical and are called data. Statistics is the study of data. Let’s look into this in detail.
Statistics – Collection and Presentation of Data
Before going into statistics, let’s first define what data is.
“Data are units of information, often numeric, collected through observation.”
“Data” is the plural form of the Latin word “datum”.
Our world has become very information-oriented in the past two decades, so it has become essential to extract meaningful information from data. For that we need statistics. Let’s see what statistics means in formal terms.
The word “statistics” is derived from the Latin word “status”, which means “a state”. Statistics is concerned with the nature, meaning, and distribution of data.
Collection of Data
Collection of data refers to gathering information about something with the objective of analyzing it or extracting some meaningful information from it. Some examples of activities involving the collection of data are:
- Students collecting data from their localities about the number of people who have received Covid vaccines.
- A football fan collecting information about the goals scored by his favorite player.
- A record company collecting information about album sales by their artists.
Types of Recorded Data
When we collect data for an experiment with some objective, the data usually falls into one of these two categories:
- Categorical Data
- Numerical Data
Categorical Data: This data represents the characteristics of an entity. For example, if we are collecting data about people, the categorical data might include each person’s gender, marital status, etc. These take values that are not numerical, such as “Yes/No” or “Male/Female”. Since they are not numerical, they cannot be added together.
Numerical Data: This data comes from measurement or counting and is numerical in nature. Examples are the weight of a person, stock prices, and the marks of students of class XII. This data is also called quantitative data. It can be broken down further into two types:
- Continuous Data
- Discrete Data
Continuous Data: This data can take any value within an interval, so its possible values cannot be counted. For example, a length measured with a 100 cm ruler can be any value between 0 cm and 100 cm: 30 cm, 30.11 cm, and so on. There are infinitely many possible values.
Discrete Data: This data takes only certain values. For example, if a coin is tossed three times and we count the number of heads, only a handful of values are possible: 0, 1, 2, or 3. The count cannot be 2.2 or any other value, so there are only finitely many possible values.
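The coin example can be checked by listing every outcome. The following is a minimal Python sketch, not part of the original article; the variable names `outcomes` and `head_counts` are chosen here only for illustration.

```python
from itertools import product

# Every possible outcome of tossing a coin three times ("H" or "T" per toss).
outcomes = list(product("HT", repeat=3))

# Count the heads in each outcome and keep only the distinct counts.
head_counts = sorted({outcome.count("H") for outcome in outcomes})

print(len(outcomes))  # how many outcomes exist
print(head_counts)    # the only values the number of heads can take
```

However many times the experiment is repeated, the recorded number of heads is always one of the values in `head_counts`, which is what makes this data discrete.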
Presentation of Data
After collecting the data, we need to present it in a meaningful way. Let’s take an example.
Suppose we have the heights of the students in a class:
140, 161, 152, 184, 135, 168 and 144.
We need to answer the following questions related to the data:
- What is the height of the tallest student in the class?
- What is the height of the shortest student in the class?
- What is the average height?
It is a little difficult to analyze the data in this format. Data in this form is called raw data. Analyzing raw data can take a long time when the data set is large. It becomes a little easier if we sort the data in ascending or descending order. In this way, the presentation of data affects both the information we can extract from it and the time it takes to do so.
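As an illustration, the three questions can be answered for the heights listed above with a short Python sketch. The variable names are chosen here for illustration, and the average is taken to be the arithmetic mean.

```python
from statistics import mean

# Raw data: heights of the students, in the order they were recorded.
heights = [140, 161, 152, 184, 135, 168, 144]

# Sorting puts the shortest student first and the tallest last.
sorted_heights = sorted(heights)
shortest = sorted_heights[0]
tallest = sorted_heights[-1]

# The average (arithmetic mean) does not depend on the order of the data.
average = mean(heights)

print(sorted_heights)
print(shortest, tallest, round(average, 2))
```

Once the list is sorted, the first two answers can be read off from its ends without scanning the whole data set.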
If the data were even bigger, sorting it by hand would become very difficult. In such cases, we might use a frequency table.
Un-Grouped Frequency Distribution
In this type of frequency table, we take the values as they are and count the number of occurrences of each value in the data. We do not group the data. Let’s see this through an example.
Question: Let’s say we have marks of students of class XII. The marks are out of 40.
20 21 29 15 7 8 10 31 40 24 5 11 13 20 24 27 13 15 38 33 29
Represent this data using a frequency table.
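Let’s put each distinct mark in one column and the frequency of that mark in another column.
| Marks | Frequency |
| --- | --- |
| 5 | 1 |
| 7 | 1 |
| 8 | 1 |
| 10 | 1 |
| 11 | 1 |
| 13 | 2 |
| 15 | 2 |
| 20 | 2 |
| 21 | 1 |
| 24 | 2 |
| 27 | 1 |
| 29 | 2 |
| 31 | 1 |
| 33 | 1 |
| 38 | 1 |
| 40 | 1 |
| Total | 21 |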
Notice that in this table we have not grouped the data; instead, we have taken the exact values and their frequencies. This type of representation is called an ungrouped frequency distribution.
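Counting by hand is error-prone for longer lists, so it can help to let a program do the tallying. The sketch below uses `collections.Counter` from the Python standard library on the marks listed in the question; the variable names are chosen here for illustration.

```python
from collections import Counter

# Marks (out of 40) exactly as listed in the question.
marks = [20, 21, 29, 15, 7, 8, 10, 31, 40, 24, 5,
         11, 13, 20, 24, 27, 13, 15, 38, 33, 29]

# Counter maps each distinct mark to its number of occurrences.
frequency = Counter(marks)

# Print the ungrouped frequency table in ascending order of marks.
print("Marks  Frequency")
for mark in sorted(frequency):
    print(f"{mark:>5}  {frequency[mark]:>9}")

# The frequencies must add up to the number of students.
print("Total ", sum(frequency.values()))
```

Checking that the frequencies add up to the number of observations is a quick way to catch a value that was missed or counted twice.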
Grouped Frequency Distribution
The previous representation is definitely an improvement over raw data, but as the above example shows, such tables can get pretty big. Tally marks and grouping can also be used to represent the data.
Question: We have the data for the number of Covid cases recorded on a particular day in 16 cities.
10 21 25 33 15 8 16 20 0 5 38 28 5 0 16 23
Represent this data using a frequency table.
In the previous example we saw that an ungrouped frequency distribution can be cumbersome and long to read. So now we will divide the data into groups. This kind of frequency table is called a grouped frequency distribution.
Let’s divide the numbers of cases into groups such as 0-5, 5-10, 10-15, and so on. Each group includes its lower limit but not its upper limit, so a value of 5 is counted in 5-10, not in 0-5.
Then the frequency table becomes:
| Group | Frequency |
| --- | --- |
| 0-5 | 2 |
| 5-10 | 3 |
| 10-15 | 1 |
| 15-20 | 3 |
| 20-25 | 3 |
| 25-30 | 2 |
| 30-35 | 1 |
| 35-40 | 1 |
| Total | 16 |
Intervals such as 0-5, 5-10, and so on in the above example are called class intervals. The larger number of an interval is called its upper limit and the smaller number is called its lower limit.
Let’s see some sample problems on these concepts.
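The grouping can also be sketched in Python. The helper `class_interval` is a hypothetical name introduced here, and the sketch assumes that each class interval contains its lower limit but not its upper limit, so that a value of 5 falls in 5-10.

```python
from collections import Counter

# Number of cases per city, exactly as listed in the question.
cases = [10, 21, 25, 33, 15, 8, 16, 20, 0, 5, 38, 28, 5, 0, 16, 23]
width = 5  # size of each class interval


def class_interval(value, width):
    """Return (lower limit, upper limit) of the interval containing value.

    Assumption: the lower limit is included and the upper limit is excluded.
    """
    lower = (value // width) * width
    return (lower, lower + width)


# Count how many values fall into each class interval.
grouped = Counter(class_interval(value, width) for value in cases)

for lower, upper in sorted(grouped):
    print(f"{lower}-{upper}: {grouped[(lower, upper)]}")
```

Integer division by the width finds the lower limit directly, so no chain of comparisons is needed. Changing `width` gives coarser or finer groups.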
Problem 1: Represent the data given below in the form of a suitable frequency distribution.
3 4 3 3 2 4 4 3 2 2 2 3
We can see from the data given above that there are only three distinct values: 2, 3, and 4. These values occur multiple times throughout the data. Since there are very few distinct values, we can represent this data in the form of an ungrouped frequency table.
| Value | Frequency |
| --- | --- |
| 2 | 4 |
| 3 | 5 |
| 4 | 3 |
| Total | 12 |
Problem 2: The data given below represents the blood groups of the 20 students of class XI.
O A AB A AB AB O B A A O B B O B A B AB O B
Represent the data given above in the form of a frequency table. Which blood group has the highest frequency among the students?
We know there are four blood groups in the data:
O, A, AB and B
So, we will use an ungrouped frequency distribution table to represent the data.
| Blood Group | Frequency |
| --- | --- |
| O | 5 |
| A | 5 |
| AB | 4 |
| B | 6 |
| Total | 20 |
From the frequency distribution table, we can tell that B is the most commonly occurring blood group among the students.
Problem 3: The data given below represents the weights of the students of class X.
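Blood group is categorical data, so the values cannot be added or averaged, but they can still be counted. A minimal Python sketch of that count, with variable names chosen here for illustration:

```python
from collections import Counter

# Blood groups of the 20 students, as listed in the problem.
blood_groups = "O A AB A AB AB O B A A O B B O B A B AB O B".split()

# Frequency of each category.
frequency = Counter(blood_groups)

# most_common(1) returns a list holding the single (category, count) pair
# with the highest frequency.
most_common_group, count = frequency.most_common(1)[0]

print(dict(frequency))
print(most_common_group, count)
```

The same counting approach works for numerical and categorical values alike, because an ungrouped frequency table only needs to know whether two values are equal.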
60 73 62 54 48 88 49 52 55 60 62 63 77 47 65 59
Answer the following questions:
- In which weight range do most students lie?
- Suppose students weighing more than 70 are considered overweight and those weighing less than 50 are considered underweight. How many such students are there in the class?
Let’s make a grouped frequency distribution table for this data.
Assume intervals like 0-10, 10-20, and so on. Let’s divide the data into these intervals and count the frequencies.
| Weight Group | Frequency |
| --- | --- |
| 0-10 | 0 |
| 10-20 | 0 |
| 20-30 | 0 |
| 30-40 | 0 |
| 40-50 | 3 |
| 50-60 | 4 |
| 60-70 | 6 |
| 70-80 | 2 |
| 80-90 | 1 |
| Total | 16 |
The table above is a grouped frequency table. Now let’s answer the questions.
1. Most students lie in the range 60-70.
2. For overweight students, we need to count the number of students with weight greater than 70. It can be observed from the table that there are three such students: two in 70-80 and one in 80-90. The table can be used here because no student weighs exactly 70.
For underweight students, the number of students with weight less than 50 is also three.
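These answers can be cross-checked against the raw data. The sketch below is illustrative and uses names chosen here; it assumes intervals of width 10 that include their lower limit, and it counts the overweight and underweight students directly from the raw weights, which avoids any doubt about values that sit exactly on a class limit.

```python
from collections import Counter

# Weights of the 16 students, as listed in the problem.
weights = [60, 73, 62, 54, 48, 88, 49, 52, 55, 60, 62, 63, 77, 47, 65, 59]

# Group by the lower limit of each width-10 interval (60 stands for 60-70).
groups = Counter((w // 10) * 10 for w in weights)
busiest_lower, busiest_count = groups.most_common(1)[0]

# Strict comparisons: "more than 70" and "less than 50".
overweight = [w for w in weights if w > 70]
underweight = [w for w in weights if w < 50]

print(f"Most students lie in {busiest_lower}-{busiest_lower + 10}")
print(len(overweight), len(underweight))
```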
Problem 4: Three coins are tossed 20 times. The number of heads that occurred each time is recorded in the data below. Prepare a frequency distribution for the given data.
0 2 1 3 2 1 1 1 3 2 0 3 2 3 2 2 2 0 1 2
We know that at most three heads are possible on each turn in this experiment, so the number of heads can only be 0, 1, 2, or 3. We can therefore make an ungrouped frequency distribution for this data.
| Number of Heads | Frequency |
| --- | --- |
| 0 | 3 |
| 1 | 5 |
| 2 | 8 |
| 3 | 4 |
| Total | 20 |
Thus, the table above represents the frequency table for this data.