
Data Mining: Data Attributes and Quality

Last Updated : 06 May, 2023

Prerequisite – Data Mining 
Data: A dataset is a collection of data objects and their attributes. 

  • An attribute is a property or characteristic of an object, for example a person’s hair colour or the air humidity.
  • A set of attributes describes an object. An object is also referred to as a record, instance, or entity.

Different types of attributes or data types: 

In data mining, understanding the different types of attributes or data types is essential as it helps to determine the appropriate data analysis techniques to use. The following are the different types of data:

1]Nominal Data: 

Nominal data is categorical data whose values are labels with no inherent order or hierarchy. It is qualitative: two values can be compared for equality, but they cannot be ranked or measured numerically. Examples of nominal data include gender, race, religion, and occupation. Nominal data is used in data mining for classification and clustering tasks.
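
Because nominal values have no order, they are often turned into 0/1 indicator columns before being passed to an algorithm that expects numbers. The following is a minimal sketch of that idea (one-hot encoding); the function name `one_hot` and the sample occupations are made up for illustration.

```python
def one_hot(values):
    """Map each nominal value to a 0/1 indicator vector, one slot per category."""
    # Sorting is only for a reproducible column order; it implies no ranking.
    categories = sorted(set(values))
    encoded = [[1 if v == c else 0 for c in categories] for v in values]
    return categories, encoded


occupations = ["teacher", "engineer", "nurse", "engineer"]  # hypothetical data
categories, encoded = one_hot(occupations)
print(categories)  # ['engineer', 'nurse', 'teacher']
print(encoded)     # [[0, 0, 1], [1, 0, 0], [0, 1, 0], [1, 0, 0]]
```

Each row contains exactly one 1, so no category is treated as larger or smaller than another.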

2]Ordinal Data: 

This type of data is also categorical, but with an inherent order or hierarchy. Ordinal data represents qualitative data that can be ranked in a particular order. For instance, education level can be ranked from primary to tertiary, and social status can be ranked from low to high. In ordinal data, the distance between adjacent values is not defined, so it is not possible to say that the difference between high and medium social status is the same as the difference between medium and low social status. Ordinal data is used in data mining for ranking and classification tasks.
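
As a rough illustration, ordinal values can be mapped to integer ranks so that they can be sorted and compared. The level names and the sample people below are assumptions chosen for the example; the ranks only encode order, so differences between them should not be interpreted.

```python
# Assumed ordering of education levels, lowest first (illustrative only).
EDUCATION_ORDER = ["primary", "secondary", "tertiary"]
RANK = {level: position for position, level in enumerate(EDUCATION_ORDER)}

people = [("Ana", "tertiary"), ("Ben", "primary"), ("Chen", "secondary")]

# Sorting by rank is meaningful for ordinal data; subtracting ranks is not.
ranked = sorted(people, key=lambda person: RANK[person[1]])
names_in_order = [name for name, _ in ranked]
print(names_in_order)  # ['Ben', 'Chen', 'Ana']
```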

3]Binary Data: 

This type of data has only two possible values, often represented as 0 or 1. Binary data is commonly used in classification tasks, where the target variable has only two possible outcomes. Examples of binary data include yes/no, true/false, and pass/fail. Binary data is used in data mining for classification and association rule mining tasks.
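
To make the link to association rule mining concrete, the sketch below computes the support of an itemset over binary (0/1) purchase records, that is, the fraction of records in which every item of the set is present. The items, the records, and the helper name `support` are hypothetical.

```python
# Each record marks whether an item was bought (1) or not (0). Made-up data.
transactions = [
    {"bread": 1, "milk": 1, "eggs": 0},
    {"bread": 1, "milk": 0, "eggs": 1},
    {"bread": 1, "milk": 1, "eggs": 1},
    {"bread": 0, "milk": 1, "eggs": 0},
]


def support(itemset, rows):
    """Fraction of rows in which every item of the itemset has value 1."""
    hits = sum(1 for row in rows if all(row[item] == 1 for item in itemset))
    return hits / len(rows)


bread_support = support(["bread"], transactions)
bread_milk_support = support(["bread", "milk"], transactions)
print(bread_support)       # 0.75
print(bread_milk_support)  # 0.5
```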

4]Interval Data: 

This type of data represents quantitative data with equal intervals between consecutive values. Interval data has no true zero point (zero does not mean "none of the quantity"), so differences are meaningful but ratios are not. Examples of interval data include temperature in Celsius or Fahrenheit, IQ scores, and calendar dates. Interval data is used in data mining for clustering and prediction tasks.
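
A small worked example may help show why ratios are not meaningful on an interval scale. It assumes temperatures measured in Celsius (interval) and converts them to Kelvin (a scale with a true zero); the variable names are illustrative.

```python
def celsius_to_kelvin(celsius):
    """Convert a Celsius reading to Kelvin, which has an absolute zero."""
    return celsius + 273.15


morning_c, noon_c = 10.0, 20.0  # hypothetical readings in Celsius

# The difference is meaningful on an interval scale.
difference = noon_c - morning_c

# The Celsius ratio suggests "twice as hot", but that depends on where
# the scale happens to put its zero.
naive_ratio = noon_c / morning_c

# On the Kelvin scale the same two temperatures differ by only a few percent.
kelvin_ratio = celsius_to_kelvin(noon_c) / celsius_to_kelvin(morning_c)

print(difference, naive_ratio, round(kelvin_ratio, 4))
```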

5]Ratio Data: 

This type of data is similar to interval data, but with an absolute zero point. In ratio data, it is possible to compute ratios of two values, so statements such as "twice as heavy" or "half the income" are meaningful. Examples of ratio data include height, weight, and income. Ratio data is used in data mining for prediction and association rule mining tasks.

6]Text Data: 

This type of data represents unstructured data in the form of text. Text data can be found in social media posts, customer reviews, and news articles. Text data is used in data mining for sentiment analysis, text classification, and topic modeling tasks.
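
Before text can be mined it is usually turned into some structured representation. One simple option is a bag of words, which counts how often each token occurs. The sketch below uses a deliberately naive tokenizer and a made-up review; real pipelines typically add steps such as stop-word removal.

```python
import re
from collections import Counter


def bag_of_words(text):
    """Lowercase the text, split it into word tokens, and count them."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return Counter(tokens)


review = "Great phone, great battery. The camera is not great."  # made-up text
counts = bag_of_words(review)
print(counts.most_common(1))  # [('great', 3)]
```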

Data Quality: Why do we preprocess the data? 
Data preprocessing is an essential step in data mining and machine learning as it helps to ensure the quality of the data used for analysis. Common data quality problems that preprocessing addresses include:

1.Incompleteness: 

This refers to missing data or information in the dataset. Missing data can result from various factors, such as errors during data entry or data loss during transmission. Preprocessing techniques, such as imputation, can be used to fill in missing values to ensure the completeness of the dataset.
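
As one possible illustration of imputation, the sketch below replaces missing numeric values with the mean of the observed values. It assumes missing entries are represented as `None`; mean imputation is only one of several strategies and is not always appropriate.

```python
from statistics import mean


def impute_mean(values):
    """Replace None entries with the mean of the non-missing entries."""
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("cannot impute: no observed values")
    fill = mean(observed)
    return [fill if v is None else v for v in values]


ages = [25, None, 35, 30, None]  # hypothetical column with gaps
filled = impute_mean(ages)
print(filled)  # [25, 30, 35, 30, 30]
```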

2.Inconsistency:

This refers to conflicting or contradictory data in the dataset. Inconsistent data can result from errors in data entry, data integration, or data storage. Preprocessing techniques, such as data cleaning and data integration, can be used to detect and resolve inconsistencies in the dataset.
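
Detecting an inconsistency often comes down to finding records that share a key but disagree on a field. The following sketch does that for a hypothetical customer table; the field names `customer_id` and `country` and the helper `find_conflicts` are assumptions for the example. Resolving the conflict still requires a rule or a human decision.

```python
from collections import defaultdict

# Made-up records, e.g. merged from two sources.
records = [
    {"customer_id": 1, "country": "India"},
    {"customer_id": 2, "country": "Brazil"},
    {"customer_id": 1, "country": "Indonesia"},
]


def find_conflicts(rows, key, field):
    """Return keys that appear with more than one distinct value of field."""
    seen = defaultdict(set)
    for row in rows:
        seen[row[key]].add(row[field])
    return {k: sorted(v) for k, v in seen.items() if len(v) > 1}


conflicts = find_conflicts(records, "customer_id", "country")
print(conflicts)  # {1: ['India', 'Indonesia']}
```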

3.Noise: 

This refers to random error or variance in the measured values of the dataset. Noise can result from errors during data collection or data entry, such as faulty sensors or typing mistakes. Preprocessing techniques, such as data smoothing and outlier detection, can be used to reduce noise in the dataset.
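
One common smoothing technique is a moving average, which replaces each value with the mean of a small window around it. The sketch below is a minimal version with an assumed window size of 3 and made-up sensor readings; it returns one value per full window, so the output is shorter than the input.

```python
def moving_average(values, window=3):
    """Return the mean of each consecutive full window of the given size."""
    if window < 1 or window > len(values):
        raise ValueError("window must be between 1 and len(values)")
    return [
        sum(values[i:i + window]) / window
        for i in range(len(values) - window + 1)
    ]


readings = [10, 12, 11, 30, 12, 11, 10]  # hypothetical readings with a spike
smoothed = moving_average(readings, window=3)
print([round(v, 2) for v in smoothed])
```

The spike at 30 is spread over neighbouring windows, so the smoothed series has a lower peak than the raw one.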

4.Outliers: 

Outliers are data points that are significantly different from the other data points in the dataset. Outliers can result from errors in data collection, data entry, or data transmission, but they can also be genuine extreme values. Preprocessing techniques, such as outlier detection, can be used to identify outliers so that they can be removed, corrected, or analysed separately.
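
A widely used rule of thumb flags values that lie more than 1.5 times the interquartile range (IQR) below the first quartile or above the third quartile. The sketch below applies that rule with the standard library; the multiplier `k=1.5` and the sample data are assumptions, and other detection methods exist.

```python
from statistics import quantiles


def iqr_outliers(values, k=1.5):
    """Return the values lying outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = quantiles(values, n=4)  # quartile cut points
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


response_times = [10, 12, 11, 13, 12, 11, 95]  # hypothetical measurements
outliers = iqr_outliers(response_times)
print(outliers)  # [95]
```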

5.Redundancy: 

Redundancy refers to the presence of duplicate or overlapping data in the dataset. Redundant data can result from data integration or data storage. Preprocessing techniques, such as data deduplication, can be used to remove redundant data from the dataset.
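
Exact duplicates can be removed by remembering which records have already been seen. The sketch below keeps the first occurrence of each record and preserves the original order; it assumes records are flat dictionaries with hashable values, and the sample rows are made up. Near-duplicates (for example, differing only in spelling) need fuzzier matching than this.

```python
def deduplicate(rows):
    """Drop exact duplicate records, keeping the first occurrence of each."""
    seen = set()
    unique = []
    for row in rows:
        key = tuple(sorted(row.items()))  # order-independent fingerprint
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


rows = [
    {"id": 1, "city": "Pune"},
    {"id": 2, "city": "Lima"},
    {"city": "Pune", "id": 1},  # same record, fields in a different order
]
unique_rows = deduplicate(rows)
print(unique_rows)  # [{'id': 1, 'city': 'Pune'}, {'id': 2, 'city': 'Lima'}]
```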

6.Data format: 

This refers to the structure and format of the data in the dataset. Data may be in different formats, such as text, numerical, or categorical. Preprocessing techniques, such as data transformation and normalization, can be used to convert data into a consistent format for analysis.
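
As an example of normalization, min-max scaling maps a numeric attribute onto the range [0, 1] using (x - min) / (max - min). The sketch below is a minimal version with made-up income values; returning 0.0 for a constant column is a design choice made here to avoid division by zero.

```python
def min_max_scale(values):
    """Rescale values linearly so the minimum maps to 0 and the maximum to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Constant column: every value is the same, so map all of them to 0.0.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


incomes = [20000, 50000, 80000]  # hypothetical values
scaled = min_max_scale(incomes)
print(scaled)  # [0.0, 0.5, 1.0]
```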


