
Rough Set Theory | An Introduction

The notion of rough sets was introduced by Z. Pawlak in his seminal paper of 1982 (Pawlak 1982). It is a formal theory derived from fundamental research on the logical properties of information systems. Rough set theory has become a methodology for data mining and knowledge discovery in relational databases. In its abstract form, it is an area of uncertainty mathematics closely related to fuzzy theory. The rough set approach can be used to discover structural relationships within imprecise and noisy data. Rough sets and fuzzy sets are complementary generalizations of classical sets: the approximation spaces of rough set theory are sets with multiple memberships, while fuzzy sets are concerned with partial memberships. The rapid development of these two approaches provides a basis for "soft computing," a term introduced by Lotfi A. Zadeh. Soft computing includes, along with rough sets, at least fuzzy logic, neural networks, probabilistic reasoning, belief networks, machine learning, evolutionary computing, and chaos theory.

Basic problems in data analysis solved by rough set theory:

- Characterization of a set of objects in terms of attribute values.
- Finding dependencies (total or partial) between attributes.
- Reduction of superfluous (redundant) attributes.
- Finding the most significant attributes.
- Generation of decision rules.

Goals of Rough Set Theory –

- The main goal of rough set analysis is the induction of (learning of) approximations of concepts.
- Rough sets provide a sound basis for knowledge discovery in databases and offer mathematical tools to discover patterns hidden in data.
- They can be used for feature selection, feature extraction, data reduction, decision rule generation, and pattern extraction (templates, association rules).
- They help identify total or partial dependencies in data, eliminate redundant data, and deal with null values, missing data, and dynamic data.

Information system –

In rough set theory, data is stored in a table. Each row (tuple) represents a fact or an object. Often the facts are not consistent with each other. In rough set terminology such a data table is called an information system. Thus, the information table represents the input data, gathered from some domain.



  

Note: The rows of a table are called examples (objects, entities). An information system is a pair (U, A), where U is a non-empty finite set of objects and A is a non-empty finite set of attributes. The elements of A are called conditional attributes. An information table is sometimes called a decision table when it also contains one or more decision attributes. A decision system is a pair (U, A ∪ {d}), where d is the decision attribute (instead of one, we can consider more decision attributes).
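As a minimal sketch, an information system and a decision system can be represented in Python as follows; the objects, attributes, and values below are assumed purely for illustration and are not taken from any particular table.

Python3

# A minimal sketch of an information system (U, A).
# All objects, attributes and values here are assumed for illustration only.

U = ["O1", "O2", "O3"]          # non-empty finite set of objects
A = ["p1", "p2"]                # non-empty finite set of conditional attributes

# Attribute-value table: each row (object) maps every attribute in A to a value.
table = {
    "O1": {"p1": 1, "p2": 0},
    "O2": {"p1": 1, "p2": 1},
    "O3": {"p1": 2, "p2": 0},
}

# Adding a decision attribute d turns (U, A) into a decision system (U, A ∪ {d}).
d = {"O1": "yes", "O2": "yes", "O3": "no"}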

 

Indiscernibility –

Tables may contain many objects having the same features. A way of reducing table size is to store only one representative object for every set of objects with the same features. Such objects are called indiscernible objects or tuples. With any P ⊆ A there is an associated equivalence relation IND(P):

IND(P) = {(x, y) ∈ U × U : a(x) = a(y) for every a ∈ P}

IND(P) is called the P-indiscernibility relation; if (x, y) ∈ IND(P), then x and y are indiscernible from each other by the attributes in P. As an example, suppose the information table contains ten objects O1, …, O10 and a conditional attribute p1.

IND({p1}) = {{O1, O2}, {O3, O5, O7, O9, O10}, {O4, O6, O8}}

O1 and O2 are characterized by the same value of attribute p1, namely 1. O3, O5, O7, O9 and O10 are characterized by the same value of attribute p1, namely 2. O4, O6 and O8 are characterized by the same value of attribute p1, namely 0.

The indiscernibility relation IND(P) is an equivalence relation. Its equivalence classes, i.e. the sets of objects that are indiscernible from one another, are called elementary sets.
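A minimal sketch of computing the indiscernibility classes in Python is shown below; the p1 values are taken from the example above, and the helper function ind is a hypothetical name introduced here for illustration.

Python3

from collections import defaultdict

# p1 values taken from the example above (other attributes omitted).
table = {
    "O1": {"p1": 1}, "O2": {"p1": 1},
    "O3": {"p1": 2}, "O5": {"p1": 2}, "O7": {"p1": 2}, "O9": {"p1": 2}, "O10": {"p1": 2},
    "O4": {"p1": 0}, "O6": {"p1": 0}, "O8": {"p1": 0},
}

def ind(table, P):
    # Group objects by their value tuple on the attributes in P;
    # objects sharing the same tuple are indiscernible with respect to P.
    classes = defaultdict(set)
    for obj, values in table.items():
        classes[tuple(values[a] for a in P)].add(obj)
    return list(classes.values())

print(ind(table, ["p1"]))
# Three elementary sets: {O1, O2}, {O3, O5, O7, O9, O10}, {O4, O6, O8}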

Approximations –

A rough set is a formal approximation of a crisp set, defined by a pair of sets: its lower approximation and its upper approximation.

Given a subset of attributes P ⊆ A and a target set X ⊆ U, let [x]P denote the equivalence class of an object x under IND(P). The lower approximation P̲X = {x ∈ U : [x]P ⊆ X} represents the positive region, which contains the objects definitely belonging to the target set X. The complement of the upper approximation P̄X = {x ∈ U : [x]P ∩ X ≠ ∅}, i.e. U − P̄X, represents the negative region, which contains the objects that can definitely be ruled out as members of the target set X. The difference P̄X − P̲X represents the boundary region, which contains the objects that may or may not belong to the target set X.

A set is said to be rough if its boundary region is non-empty; otherwise the set is crisp.

Let's discuss an example, taking the previous table as the information table; a sketch of the computation is shown below.
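Below is a minimal sketch in Python; the equivalence classes are the IND({p1}) classes computed above, and the target set X is assumed purely for illustration.

Python3

# Equivalence classes of IND({p1}) from the example above.
classes = [{"O1", "O2"},
           {"O3", "O5", "O7", "O9", "O10"},
           {"O4", "O6", "O8"}]
U = set().union(*classes)

X = {"O1", "O2", "O3", "O4"}    # hypothetical target set

lower = set().union(*(c for c in classes if c <= X))   # positive region: definitely in X
upper = set().union(*(c for c in classes if c & X))    # possibly in X
boundary = upper - lower                               # may or may not be in X
negative = U - upper                                   # definitely not in X

print("lower approximation:", lower)      # {'O1', 'O2'}
print("boundary region:", boundary)
print("X is rough:", len(boundary) > 0)   # True, the boundary region is non-empty

Since the boundary region is non-empty here, this target set X is rough with respect to the attribute p1.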

