Prerequisite – Data Mining
Data: A data set is a collection of data objects and the attributes that describe them.
- An attribute is a property or characteristic of an object, for example a person's hair colour or the humidity of the air.
- A set of attributes describes an object. An object is also referred to as a record, instance, or entity.
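As a minimal sketch, a data object can be modelled as a record whose fields are its attributes. The `Person` class and its field names are hypothetical, chosen only to mirror the examples above.

```python
from dataclasses import dataclass, asdict

# Hypothetical record type: each field is one attribute of the object.
@dataclass
class Person:
    name: str
    hair_colour: str
    age: int

# One data object (record, instance, entity) is one concrete set of attribute values.
alice = Person(name="Alice", hair_colour="brown", age=30)

# The attribute set viewed as attribute-name -> value pairs.
record = asdict(alice)
print(record)
```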
Different types of attributes or data types:
- Nominal Attribute:
Nominal attribute values only provide enough information to distinguish one object from another, such as a student roll number or a person's sex.
- Ordinal Attribute:
The values of an ordinal attribute provide enough information to order the objects, such as rankings, grades, or height recorded in categories (short, medium, tall).
- Binary Attribute:
A binary attribute takes only two values, 0 and 1, where 0 means the feature is absent and 1 means it is present.
- Numeric attribute:
It is quantitative: the quantity can be measured and is represented by integer or real values. Numeric attributes are of two types:
- Interval Scaled attribute:
It is measured on a scale of equal-sized units. The values have an order and the differences between them are meaningful, but there is no true zero point, so ratios are not. An example is temperature in °C or °F.
- Ratio Scaled attribute:
Both differences and ratios are meaningful for ratio-scaled attributes because they have a true zero point. Examples are age, length, and weight.
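The attribute types differ mainly in which operations give meaningful results. The sketch below illustrates this with small made-up samples. All values, the `grade_order` mapping, and the `c_to_f` helper are assumptions for illustration only.

```python
from statistics import mean, mode

# Nominal: labels only; equality tests and the mode are meaningful.
blood_groups = ["A", "O", "B", "O", "AB"]
most_common_group = mode(blood_groups)

# Ordinal: labels with an order; sorting and the median are meaningful.
grade_order = {"C": 0, "B": 1, "A": 2}  # assumed ordering: C < B < A
grades = ["B", "A", "C", "B", "A"]
ranked = sorted(grades, key=grade_order.get)
median_grade = ranked[len(ranked) // 2]

# Binary: 0 = absent, 1 = present; the mean is the share of 1s.
is_smoker = [0, 1, 0, 0, 1]
smoker_share = mean(is_smoker)

# Interval: differences are meaningful, ratios are not (the zero is arbitrary).
def c_to_f(celsius):
    return celsius * 9 / 5 + 32

ratio_c = 20 / 10                  # looks like "twice as hot" in Celsius
ratio_f = c_to_f(20) / c_to_f(10)  # same temperatures in Fahrenheit give a different ratio

# Ratio: true zero, so the ratio is the same in any unit.
ratio_m = 4.0 / 2.0       # lengths in metres
ratio_cm = 400.0 / 200.0  # the same lengths in centimetres

print(most_common_group, median_grade, smoker_share)
print(ratio_c, ratio_f, ratio_m, ratio_cm)
```

The Celsius and Fahrenheit ratios disagree for the same pair of temperatures, which is why ratios are not used with interval-scaled attributes, while the length ratio is unchanged by the unit conversion.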
Data Quality: Why do we preprocess the data?
Several factors determine data quality. Incomplete and inconsistent data are common in large real-world databases. The factors used to assess data quality are:
- Accuracy: Data can be flawed or inaccurate for many reasons, for example property values that are incorrect because of human or computer error.
- Completeness: Data can be incomplete for several reasons. Attributes of interest, such as customer information in sales and transaction data, may not always be available.
- Consistency: Incorrect data can also result from inconsistencies in naming conventions or data codes, or from inconsistent formats in input fields. Duplicate tuples also need to be cleaned.
- Timeliness: Timeliness also affects data quality. For example, several sales representatives may fail to file their sales records on time at the end of the month, and corrections and adjustments continue to flow in after the month ends. For a period after each month, the data stored in the database is incomplete.
- Believability: It reflects how much users trust the data.
- Interpretability: It reflects how easily users can understand the data.
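Some of these problems can be detected with simple checks before any mining is done. The following is a minimal sketch on a made-up list of sales records. It looks for missing values, out-of-range values, mixed naming codes, and duplicate tuples. The field names and the rule that an amount must not be negative are assumptions for illustration.

```python
# Hypothetical sales records; None marks a missing value.
records = [
    {"id": 1, "customer": "Asha", "country": "IN", "amount": 120.0},
    {"id": 2, "customer": None, "country": "India", "amount": 80.0},
    {"id": 3, "customer": "Ravi", "country": "IN", "amount": -5.0},
    {"id": 3, "customer": "Ravi", "country": "IN", "amount": -5.0},
]

# Missing values: rows with at least one attribute that is None.
incomplete = [r["id"] for r in records if any(v is None for v in r.values())]

# Out-of-range values: assumed rule that an amount must not be negative.
inaccurate = sorted(
    {r["id"] for r in records if r["amount"] is not None and r["amount"] < 0}
)

# Mixed naming codes: more than one spelling used for the same country.
country_codes = {r["country"] for r in records}

# Duplicate tuples: identical records seen more than once.
seen, duplicates = set(), []
for r in records:
    key = tuple(sorted(r.items()))
    if key in seen:
        duplicates.append(r["id"])
    seen.add(key)

print(incomplete, inaccurate, sorted(country_codes), duplicates)
```

Trust in the data and ease of understanding depend on the users, so they are not checked here.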
Improved By : shiksharanchi2000