Open In App

Types of Outliers in Data Mining

Last Updated : 04 May, 2023
Improve
Improve
Like Article
Like
Save
Share
Report

Outlier is a data object that deviates significantly from the rest of the data objects and behaves in a different  manner. An outlier is an object that deviates significantly from the rest of the objects. They can be caused by measurement or execution errors. The analysis of outlier data is referred to as outlier analysis or outlier mining.

 An outlier cannot be termed as a noise or error. Instead, they are suspected of not being generated by the same method as the rest of the data objects. 

Outliers are of three types, namely –

  1. Global (or Point) Outliers
  2. Collective Outliers
  3. Contextual (or Conditional) Outliers

1. Global Outliers

1. Definition: Global outliers are data points that deviate significantly from the overall distribution of a dataset.
2. Causes: Errors in data collection, measurement errors, or truly unusual events can result in global outliers.
3. Impact: Global outliers can distort data analysis results and affect machine learning model performance.
4. Detection: Techniques include statistical methods (e.g., z-score, Mahalanobis distance), machine learning algorithms (e.g., isolation forest, one-class SVM), and data visualization techniques.
5. Handling: Options may include removing or correcting outliers, transforming data, or using robust methods.
6. Considerations: Carefully considering the impact of global outliers is crucial for accurate data analysis and machine learning model outcomes.

The red data point is a global outlier.

2. Collective Outliers

1. Definition: Collective outliers are groups of data points that collectively deviate significantly from the overall distribution of a dataset.
2. Characteristics: Collective outliers may not be outliers when considered individually, but as a group, they exhibit unusual behavior.
3. Detection: Techniques for detecting collective outliers include clustering algorithms, density-based methods, and subspace-based approaches.
4 Impact: Collective outliers can represent interesting patterns or anomalies in data that may require special attention or further investigation.
5. Handling: Handling collective outliers depends on the specific use case and may involve further analysis of the group behavior, identification of        contributing factors, or considering contextual information.
6. Considerations: Detecting and interpreting collective outliers can be more complex than individual outliers, as the focus is on group behavior rather than individual data points. Proper understanding of the data context and domain knowledge is crucial for effective handling of collective outliers.

The red data points as a whole are collective outliers.

3. Contextual Outliers

1. Definition: Contextual outliers are data points that deviate significantly from the expected behavior within a specific context or subgroup.
2. Characteristics: Contextual outliers may not be outliers when considered in the entire dataset, but they exhibit unusual behavior within a specific context or subgroup.
3. Detection: Techniques for detecting contextual outliers include contextual clustering, contextual anomaly detection, and context-aware machine learning approaches.
4. Contextual Information: Contextual information such as time, location, or other relevant factors are crucial in identifying contextual outliers.
5. Impact: Contextual outliers can represent unusual or anomalous behavior within a specific context, which may require further investigation or          attention.
6. Handling: Handling contextual outliers may involve considering the contextual information, contextual normalization or transformation of data, or      using context-specific models or algorithms.
7. Considerations: Proper understanding of the context and domain-specific knowledge is crucial for accurate detection and interpretation of contextual outliers, as they may vary based on the specific context or subgroup being considered.

 A low temperature value in June is a contextual outlier because the same value in December is not an outlier.


Like Article
Suggest improvement
Previous
Next
Share your thoughts in the comments

Similar Reads