What is Data Organization?
The data collected by an investigator is in raw form and cannot, by itself, support any meaningful conclusion; it first needs to be organized properly. The process of systematically arranging collected (raw) data so that it is easy to understand is known as organization of data. Organized data makes it convenient for the investigator to perform further statistical treatment, and it also allows masses of similar data to be compared.
Classification of Data
Classification of data is a method of organizing data in which raw data is distributed into different classes according to shared characteristics. In other words, classification of data means converting the raw data collected by an investigator into statistical series from which meaningful conclusions can be drawn.
According to Conner, “Classification is the process of arranging things (either actually or notionally) in groups or classes according to their resemblances and affinities, and gives expression to the unity of attributes that may exist amongst a diversity of individuals.”
Based on the definition of classification of data by Conner, the two basic features of this process are:
- The raw data is divided into different groups. For example, on the basis of marital status, people can be classified as married, unmarried, divorced and engaged.
- The raw data is classified based on class similarities. All similar units of the raw data are put together in one class. For example, every educated person can be put together in one class and uneducated in another.
Each group or division of the raw data formed on the basis of similarity is known as a class.
For example, the population of a city can be classified or grouped based on their age, education, income, sex, marital status, etc., as it can provide the investigator with better conclusions for different purposes.
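The idea of placing similar units together in one class can be sketched in a few lines of Python. This is only an illustration: the records, the field names and the `classify` helper are hypothetical and are not part of the original text.

```python
from collections import defaultdict

# Hypothetical raw records; names and values are made up for illustration.
people = [
    {"name": "A", "marital_status": "married", "education": "educated"},
    {"name": "B", "marital_status": "unmarried", "education": "educated"},
    {"name": "C", "marital_status": "married", "education": "uneducated"},
    {"name": "D", "marital_status": "divorced", "education": "educated"},
]


def classify(records, attribute):
    """Group records into classes that share the same value of `attribute`."""
    classes = defaultdict(list)
    for record in records:
        classes[record[attribute]].append(record["name"])
    return dict(classes)


# The same raw data yields different classes depending on the chosen basis.
by_marital_status = classify(people, "marital_status")
by_education = classify(people, "education")
```

Here each key of the returned dictionary plays the role of a class, and changing the `attribute` argument changes the basis of classification without touching the raw data.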
Objectives of Classification of Data
The major objectives of the classification of data are as follows:
- Brief and Simple: The main objective of the classification of data is to present the raw data in a systematic, brief and simple form. This helps the investigator understand the data easily and efficiently and draw meaningful conclusions from it.
- Distinctiveness: Classification brings out the differences within the collected raw data more distinctly.
- Utility: Classification brings out the similarities within diverse raw data, which enhances the utility of the data for the study.
- Comparability: Classified data can be compared easily and can also be used to make estimates for various purposes.
- Effective and Attractive: Classification makes raw data more attractive and effective.
- Scientific Arrangement: The process of classification of data facilitates proper arrangement of raw data in a scientific manner. In this way, one can increase the reliability of the collected data.
Characteristics of a Good Classification
- Clarity: Classification of the raw data is beneficial for an investigator only when it provides a clear and simple form of information. Clarity here means that there should not be any kind of confusion regarding any element or part of a class.
- Comprehensiveness: There should be comprehensiveness in the classification of the raw data so that each of its items gets a place in some class. In other words, a classification is good if no item is left out of the classes.
- Homogeneity: All the items within a class must be similar to one another. Homogeneity among the items of a class supports reliable results and further investigation.
- Stability: The basis of classification should stay the same for a specific kind of investigation, so that the investigator is not confused and results remain comparable. Therefore, the basis of classification should not change with every investigation.
- Suitability: The classes must suit the purpose of the enquiry. For example, classifying the children of a city by weight, age and sex makes no sense in an investigation of the literacy rate. For such an investigation, the data must be divided into classes such as educated and uneducated.
- Elasticity: A classification gives better results only if it is elastic, that is, if it can be adjusted when the scope or objective of the investigation changes.
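Two of the characteristics above, clarity and comprehensiveness, can be checked mechanically: every item should fall into exactly one class. The following is a minimal sketch under that reading; the `check_classification` helper, the age data and the class boundaries are assumptions made for illustration.

```python
def check_classification(items, classes):
    """Return (unplaced, ambiguous): items matching no class, and items
    matching more than one class."""
    unplaced, ambiguous = [], []
    for item in items:
        matches = [name for name, belongs in classes.items() if belongs(item)]
        if not matches:
            unplaced.append(item)    # violates comprehensiveness
        elif len(matches) > 1:
            ambiguous.append(item)   # violates clarity
    return unplaced, ambiguous


# Hypothetical ages to be classified.
ages = [4, 12, 17, 18, 35, 70]

# Classes that cover every age and do not overlap.
good_classes = {
    "under 18": lambda age: age < 18,
    "18 to 59": lambda age: 18 <= age < 60,
    "60 and over": lambda age: age >= 60,
}

# Classes that overlap at 18 and leave out everyone aged 60 or more.
flawed_classes = {
    "up to 18": lambda age: age <= 18,
    "18 to 59": lambda age: 18 <= age < 60,
}

good_result = check_classification(ages, good_classes)
flawed_result = check_classification(ages, flawed_classes)
```

With these inputs, the good scheme leaves both lists empty, while the flawed scheme reports 70 as unplaced and 18 as ambiguous.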
Basis of Classification
Statistical information can be classified on four different bases, described below:
1. Geographical or Spatial Classification
Under this category, the data is classified on the basis of location, that is, according to geographical region. For example, to study the production of cotton in India, we can take four major regions and classify the data by region as:
Production of Cotton (in kg.)
2. Chronological Classification
Under this category, the data is classified on the basis of time, such as days, weeks, months, quarters or years. In chronological classification, the given data is arranged in either ascending or descending order of time. Another name for chronological classification is temporal classification. An example is the profits of a company in the three years 2010, 2011 and 2012.
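As a small illustration of chronological arrangement, the sketch below sorts yearly figures by time in both directions. The profit values are hypothetical placeholders and do not come from the original text.

```python
# Hypothetical yearly profit figures, recorded out of order.
profits = {2012: 150, 2010: 120, 2011: 135}

# Chronological classification: arrange by time, ascending or descending.
ascending = sorted(profits.items())
descending = sorted(profits.items(), reverse=True)

for year, profit in ascending:
    print(year, profit)
```

Sorting the `(year, profit)` pairs orders them by year first, which is all that a chronological series requires.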
3. Qualitative Classification
Under this category, the given data is classified based on its attributes or qualities, such as hair colour, gender, intelligence, religion or honesty. In qualitative classification, the attributes cannot be measured; one can only determine whether an attribute is present or not. It is further divided into two categories: Simple Classification and Manifold Classification.
- Simple Classification: In simple classification, the given data is divided into exactly two groups, based on the presence or absence of a quality. One class holds the attribute, while the other does not. Simple classification is therefore also known as classification according to dichotomy. For example, students can be classified by gender as male and female.
- Manifold Classification: In manifold classification, after the data has been divided into two groups, each group is further divided on the basis of additional attributes or qualities. The classification can therefore have several levels of attributes and more than two classes. For example, the students of a class can be classified as male or female, and each group can then be further classified as above average and below average, and so on.
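The difference between the two kinds of qualitative classification can be sketched by counting students first by one attribute and then by a pair of attributes. The student records below are hypothetical and serve only as an illustration.

```python
from collections import Counter

# Hypothetical students described by two attributes.
students = [
    {"gender": "male", "performance": "above average"},
    {"gender": "female", "performance": "above average"},
    {"gender": "female", "performance": "below average"},
    {"gender": "male", "performance": "below average"},
    {"gender": "female", "performance": "above average"},
]

# Simple classification: one attribute, two classes.
simple = Counter(s["gender"] for s in students)

# Manifold classification: each gender class is subdivided by a second attribute.
manifold = Counter((s["gender"], s["performance"]) for s in students)
```

In this sketch, `simple` has two classes, while `manifold` has one class for every combination of gender and performance that occurs in the data.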
4. Quantitative or Numerical Classification
As the name suggests, under the quantitative classification of data, the collected data is classified on the basis of numerical values. The characteristics involved are measurable, so their values can be used in calculations or estimated for further analysis. Such measurable characteristics include age, income, weight, height, etc. For example, the 50 students of a class can be classified based on their weight.
| Weight (in kg.) | Number of Students |
|---|---|
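
A frequency table of this kind can be produced by assigning each measurement to a class interval and counting the items per interval. The sketch below is an assumption-laden illustration: the weights, the lower limit of 40 kg, the class width of 10 kg and the `class_interval` helper are all hypothetical, and each interval is taken to exclude its upper limit.

```python
from collections import Counter

# Hypothetical weights in kg (a short list, not the 50 students of the example).
weights = [42, 47, 51, 55, 58, 60, 63, 49, 52, 66]


def class_interval(value, lower=40, width=10):
    """Return the (start, end) interval containing `value`; `end` is excluded."""
    start = lower + ((value - lower) // width) * width
    return (start, start + width)


# Count how many weights fall into each class interval.
frequency = Counter(class_interval(w) for w in weights)

for (low, high), count in sorted(frequency.items()):
    print(f"{low}-{high} kg: {count}")
```

Excluding the upper limit means that a weight of exactly 60 kg is counted in the 60-70 class, so no value can belong to two classes.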