Prerequisite – Data Mining
Data: A data set is a collection of data objects and their attributes.
- An attribute is a property or characteristic of an object, for example a person's hair colour or the air humidity.
- A set of attributes describes an object. An object is also referred to as a record, instance, or entity.
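As a minimal sketch of the idea that a set of attributes describes an object, the snippet below models one record in Python. The `Person` class and its field values are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, asdict


@dataclass
class Person:
    # Each field is one attribute of the object (hypothetical example).
    hair_colour: str
    age: int


# One object, also called a record, instance, or entity.
record = Person(hair_colour="brown", age=30)

# The attribute set of the object as name/value pairs.
attributes = asdict(record)
print(attributes)
```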
Different types of attributes or data types:
- Nominal Attribute:
Nominal attributes provide only enough information to distinguish one object from another, such as a student roll number or a person's sex.
- Ordinal Attribute:
Ordinal attribute values provide enough information to order the objects, such as rankings, grades, or height categories (short, medium, tall).
- Binary Attribute:
Binary attributes take only the values 0 and 1, where 0 means the feature is absent and 1 means it is present.
- Interval Attribute:
For interval attributes, the differences between values are meaningful. Such attributes are measured on a scale of equal-sized units but have no true zero, such as calendar dates or temperature in Celsius.
- Ratio Attribute:
For ratio attributes, both differences and ratios are meaningful, for example age, length, and weight.
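The attribute types above differ mainly in which operations are meaningful on their values. The following sketch illustrates this with plain Python; all values, the roll-number format, and the `GRADE_ORDER` scale are assumptions made up for the example.

```python
from datetime import date

# Nominal: only equality comparison is meaningful.
roll_a, roll_b = "R-101", "R-205"  # hypothetical roll numbers
same_student = roll_a == roll_b

# Ordinal: values can be ordered, but gaps between them are undefined.
GRADE_ORDER = {"C": 0, "B": 1, "A": 2}  # hypothetical grade scale
grades = ["B", "A", "C"]
sorted_grades = sorted(grades, key=GRADE_ORDER.get)

# Binary: 0 means the feature is absent, 1 means it is present.
has_feature = 1

# Interval: differences are meaningful, ratios are not.
days_between = (date(2024, 3, 1) - date(2024, 2, 1)).days

# Ratio: a true zero exists, so ratios are meaningful as well.
weight_ratio = 80.0 / 40.0

print(same_student, sorted_grades, has_feature, days_between, weight_ratio)
```

Dividing one calendar date by another has no sensible meaning, which is why dates are interval rather than ratio attributes.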
Data Quality: Why do we preprocess the data?
Data quality depends on many factors. Incomplete and inconsistent data are common properties of large real-world databases. Factors used for data quality assessment are:
- Accuracy:
Data can be inaccurate, that is, contain incorrect attribute values, for many reasons, including human and computer errors.
- Completeness:
Data can be incomplete for several reasons. Attributes of interest, such as customer information in sales and transaction data, may not always be available.
- Consistency:
Incorrect data can also result from inconsistencies in naming conventions or data codes, or from inconsistent formats for input fields. Duplicate tuples also require data cleaning.
- Timeliness:
Timeliness also affects data quality. For example, several sales representatives may fail to file their sales records on time at the end of the month, and corrections and adjustments continue to flow in after the month ends. For a period after each month, the data stored in the database are therefore incomplete.
- Believability:
Believability reflects how much users trust the data.
- Interpretability:
Interpretability reflects how easily users can understand the data.
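Some of these factors can be checked mechanically before mining. The sketch below shows simple checks for completeness, accuracy, and consistency on a small in-memory data set. The records, field names, the "amount must not be negative" rule, and the set of valid country codes are all hypothetical assumptions, not part of any particular library or data set.

```python
records = [  # hypothetical sales records
    {"id": 1, "customer": "Asha", "amount": 120.0, "country": "IN"},
    {"id": 2, "customer": None, "amount": 75.5, "country": "India"},
    {"id": 3, "customer": "Ben", "amount": -10.0, "country": "UK"},
    {"id": 1, "customer": "Asha", "amount": 120.0, "country": "IN"},
]


def incomplete(rows):
    """Completeness: rows with any missing (None) attribute value."""
    return [r for r in rows if any(v is None for v in r.values())]


def inaccurate(rows):
    """Accuracy: rows whose amount violates the assumed rule amount >= 0."""
    return [r for r in rows if r["amount"] is not None and r["amount"] < 0]


def inconsistent_codes(rows, valid_codes):
    """Consistency: rows whose country code is not in the agreed set."""
    return [r for r in rows if r["country"] not in valid_codes]


def duplicates(rows):
    """Consistency: rows that exactly repeat an earlier row."""
    seen, dupes = set(), []
    for r in rows:
        key = tuple(sorted(r.items(), key=lambda kv: kv[0]))
        if key in seen:
            dupes.append(r)
        else:
            seen.add(key)
    return dupes


print([r["id"] for r in incomplete(records)])
print([r["id"] for r in inaccurate(records)])
print([r["id"] for r in inconsistent_codes(records, {"IN", "UK"})])
print([r["id"] for r in duplicates(records)])
```

Timeliness, believability, and interpretability depend on how and when the data are used, so they are harder to test with a simple rule like the ones above.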