Data Mining: Data Attributes and Quality

  • Last Updated : 27 Sep, 2022

Prerequisite – Data Mining 
Data: A data set is a collection of data objects and the attributes that describe them. 

  • An attribute is a property or characteristic of an object, for example a person’s hair colour or the air humidity.
  • A set of attributes describes an object. An object is also referred to as a record, instance, or entity.

Different types of attributes or data types: 

  1. Nominal Attribute: 
    Nominal attribute values provide only enough information to distinguish one object from another; they have no meaningful order. Examples are a student roll number or a person’s sex. 
  2. Ordinal Attribute: 
    Ordinal attribute values provide enough information to order the objects, but the size of the gap between two values is not defined. Examples are rankings, grades, and height categories (short, medium, tall).
  3. Binary Attribute: 
    A binary attribute takes only two values, 0 and 1, where 0 means the feature is absent and 1 means it is present.
  4. Numeric Attribute: 
    A numeric attribute is quantitative: it is a measurable quantity represented by integer or real values. Numeric attributes are of two types.
    Interval Scaled attribute: 
    It is measured on a scale of equal-size units. The values are ordered and the difference between two values is meaningful, but there is no true zero point, so ratios are not meaningful. An example is temperature in °C or °F.
    Ratio Scaled attribute: 
    It has a true zero point, so both differences and ratios are meaningful. Examples are age, length, and weight.
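
The attribute types above differ mainly in which operations give a meaningful result. The following is a minimal Python sketch of that idea, not a definitive implementation. All sample values, the variable names, the `grade_order` ranking, and the `c_to_f` helper are assumptions made up for illustration.

```python
from statistics import mode

# Nominal: values can only be compared for equality and counted.
hair_colours = ["black", "brown", "black", "blond"]
most_common_colour = mode(hair_colours)

# Ordinal: values can be ranked once an order is declared.
# The ranking below (lowest to highest) is an assumption for this example.
grade_order = {"C": 0, "B": 1, "A": 2}
grades = ["B", "A", "C", "B", "A"]
ranked = sorted(grades, key=grade_order.get)
median_grade = ranked[len(ranked) // 2]

# Binary: 0 marks absence and 1 marks presence, so a sum counts presences.
is_smoker = [0, 1, 0, 0]
smoker_count = sum(is_smoker)


def c_to_f(celsius):
    """Convert a temperature from degrees Celsius to degrees Fahrenheit."""
    return celsius * 9 / 5 + 32


# Interval: the zero point is arbitrary, so a ratio changes with the unit.
t1_c, t2_c = 10.0, 20.0
ratio_c = t2_c / t1_c                  # 2.0 when measured in Celsius
ratio_f = c_to_f(t2_c) / c_to_f(t1_c)  # 68 / 50 when measured in Fahrenheit

# Ratio: the zero point is real, so a ratio survives a change of unit.
len1_m, len2_m = 2.0, 4.0
ratio_m = len2_m / len1_m
ratio_cm = (len2_m * 100) / (len1_m * 100)

print(most_common_colour, median_grade, smoker_count)
print(ratio_c, ratio_f, ratio_m, ratio_cm)
```

The two temperature ratios disagree while the two length ratios agree, which is the practical difference between an interval-scaled and a ratio-scaled attribute.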

Data Quality: Why do we preprocess the data? 
Large real-world databases commonly contain incomplete, inaccurate, and inconsistent data, which is why the data is preprocessed before mining. The factors used to assess data quality are: 

  • Accuracy: 
    Data can have incorrect attribute values for many reasons, including human errors at data entry and computer errors. 
     
  • Completeness: 
    Data can be incomplete for several reasons. Attributes of interest, such as customer information in sales and transaction data, may not always be available. 
     
  • Consistency: 
    Incorrect data can also result from inconsistencies in naming conventions or data codes, or from inconsistent formats for input fields. Duplicate tuples need to be cleaned as well. 
     
  • Timeliness: 
    Data that is not updated on time is of lower quality. For example, several sales representatives may fail to file their sales records on time at the end of the month, and corrections and adjustments may continue to flow in after the month ends. The data stored in the database is then incomplete for a period after each month. 
     
  • Believability: 
    It reflects how much users trust the data. 
     
  • Interpretability: 
    It reflects how easily users can understand the data.
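
Some of the factors above, in particular accuracy, completeness, and consistency, can be checked mechanically before mining. The following is a minimal Python sketch under stated assumptions. The `records` sample, the field names, the valid age range, the allowed country codes, and all four helper functions are hypothetical and only illustrate the kind of check involved.

```python
# Hypothetical customer records, made up for this example.
records = [
    {"id": 1, "name": "Asha", "age": 34, "country": "IN"},
    {"id": 2, "name": "Ben", "age": None, "country": "india"},
    {"id": 3, "name": "Chen", "age": -5, "country": "CN"},
    {"id": 1, "name": "Asha", "age": 34, "country": "IN"},
]


def completeness(rows, field):
    """Completeness: fraction of rows in which the field has a value."""
    return sum(r.get(field) is not None for r in rows) / len(rows)


def out_of_range(rows, field, low, high):
    """Accuracy: rows whose value for the field lies outside [low, high]."""
    return [
        r for r in rows
        if r.get(field) is not None and not (low <= r[field] <= high)
    ]


def inconsistent_codes(rows, field, allowed):
    """Consistency: rows whose code is not in the agreed set of codes."""
    return [r for r in rows if r.get(field) not in allowed]


def duplicates(rows):
    """Consistency: rows that exactly repeat an earlier row."""
    seen, repeated = set(), []
    for r in rows:
        key = tuple(sorted(r.items()))
        if key in seen:
            repeated.append(r)
        seen.add(key)
    return repeated


print(completeness(records, "age"))
print(out_of_range(records, "age", 0, 120))
print(inconsistent_codes(records, "country", {"IN", "CN"}))
print(duplicates(records))
```

Timeliness, believability, and interpretability depend on how and by whom the data is used, so they are harder to reduce to a simple check like these.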