
Ripper Algorithm

RIPPER Algorithm:

RIPPER stands for Repeated Incremental Pruning to Produce Error Reduction. It is a widely used rule-based classification (rule induction) algorithm that derives a set of if-then rules from the training set.



Uses of RIPPER Algorithm:

  1. It works well on datasets with imbalanced class distributions. A dataset has an imbalanced class distribution when most of its records belong to one class and only a few belong to the remaining classes.
  2. It works well with noisy datasets because it prunes rules against a validation set, which helps prevent overfitting.

Working of RIPPER

Case I: Training records belong to only two classes



Among the given records, RIPPER identifies the majority class (the one that appears most often) and takes it as the default class. For example, if there are 100 records, of which 80 belong to Class A and 20 to Class B, then Class A is the default class.

For the other (minority) class, it learns rules that detect that class.

Case II: Training records have more than two classes ( Multiple Classes )

Consider all the available classes and arrange them in order of increasing frequency.

Suppose the classes are arranged as:

C1,C2,C3,......,Cn
C1 - least frequent
Cn - most frequent

The class with the maximum frequency (Cn) is taken as the default class.
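
As a small illustration of the class ordering described above, the following sketch sorts the classes by frequency and picks the default class. The function name `order_classes` and the tie-breaking by label are assumptions made for this example, not part of RIPPER's definition.

```python
from collections import Counter


def order_classes(labels):
    """Return (classes sorted from least to most frequent, default class)."""
    counts = Counter(labels)
    # Ascending frequency; ties are broken by label only to keep the
    # result deterministic (an assumption of this sketch).
    ordered = sorted(counts, key=lambda c: (counts[c], str(c)))
    return ordered, ordered[-1]


# The two-class example from the text: 80 records of Class A, 20 of Class B.
labels = ["A"] * 80 + ["B"] * 20
ordered, default = order_classes(labels)
print(ordered, default)
```

Rules are then learned for the classes in `ordered` one at a time, starting with the least frequent, while `default` never gets rules of its own.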

How the Rules are Derived:

RIPPER first tries to derive rules for the records that belong to class C1. Records belonging to C1 are treated as positive examples (+ve), and records of all other classes are treated as negative examples (-ve).

The Sequential Covering Algorithm is used to generate rules that discriminate between the +ve and -ve examples.

Next, RIPPER derives rules for C2, distinguishing it from the remaining classes.

This process is repeated until the stopping criterion is met, which is when only Cn (the default class) is left.
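
The class-by-class procedure above can be sketched roughly as follows. This is a simplified illustration, not a full RIPPER implementation: `learn_one_rule` is a hypothetical stand-in that greedily adds the attribute test with the highest precision, and it skips pruning and the ruleset stopping conditions entirely. Records are assumed to be dictionaries of categorical attributes.

```python
from collections import Counter


def covers(rule, record):
    """A rule is a dict of attribute -> required value (a conjunction)."""
    return all(record.get(a) == v for a, v in rule.items())


def learn_one_rule(pos, neg):
    """Hypothetical stand-in for growing one rule: add the most precise
    attribute test until the rule covers no negative examples."""
    rule = {}
    while neg:
        candidates = {(a, v) for r in pos for a, v in r.items() if a not in rule}
        if not candidates:
            break  # no attribute left to test

        def precision(test):
            a, v = test
            p = sum(r.get(a) == v for r in pos)
            n = sum(r.get(a) == v for r in neg)
            return (p / (p + n), p)

        a, v = max(sorted(candidates), key=precision)
        rule[a] = v
        pos = [r for r in pos if covers(rule, r)]
        neg = [r for r in neg if covers(rule, r)]
    return rule


def sequential_covering(records, labels):
    counts = Counter(labels)
    ordered = sorted(counts, key=lambda c: (counts[c], str(c)))
    default = ordered[-1]  # most frequent class gets no rules
    remaining = list(zip(records, labels))
    ruleset = []
    for cls in ordered[:-1]:  # C1, C2, ... up to but excluding Cn
        while True:
            pos = [r for r, y in remaining if y == cls]
            neg = [r for r, y in remaining if y != cls]
            if not pos:
                break
            rule = learn_one_rule(pos, neg)
            ruleset.append((rule, cls))
            # Remove every record the new rule covers before learning the next.
            remaining = [(r, y) for r, y in remaining if not covers(rule, r)]
    return ruleset, default


def predict(ruleset, default, record):
    """Apply rules in order; fall back to the default class."""
    for rule, cls in ruleset:
        if covers(rule, record):
            return cls
    return default


records = [
    {"outlook": "sunny", "windy": "no"},
    {"outlook": "sunny", "windy": "yes"},
    {"outlook": "rainy", "windy": "yes"},
    {"outlook": "rainy", "windy": "no"},
    {"outlook": "overcast", "windy": "no"},
]
labels = ["play", "play", "stay", "play", "play"]
ruleset, default = sequential_covering(records, labels)
print(ruleset, default)
```

The toy attribute names and labels are invented for the example. Because rules are tried in the order they were learned, the rules for rarer classes take precedence over those for more frequent ones.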

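Rule Growing in RIPPER Algorithm:

RIPPER grows a rule in a general-to-specific manner. It starts from an empty rule and keeps adding the best conjunct to the rule antecedent, where candidate conjuncts are evaluated using FOIL's information gain. Conjuncts are added until the rule no longer covers any negative examples. The grown rule is then pruned based on its performance on the validation set.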
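
As a rough illustration of how a candidate conjunct can be scored while a rule is being grown, the sketch below computes FOIL's information gain, the measure commonly cited for RIPPER's growing phase. The function name `foil_gain` and the example counts are made up for illustration.

```python
import math


def foil_gain(p0, n0, p1, n1):
    """FOIL's information gain for extending a rule with one conjunct.

    p0, n0: positives / negatives covered before adding the conjunct
    p1, n1: positives / negatives covered after adding the conjunct
    """
    if p1 == 0:
        return 0.0  # assumption of this sketch: no positives left, no gain
    before = math.log2(p0 / (p0 + n0))
    after = math.log2(p1 / (p1 + n1))
    return p1 * (after - before)


# Hypothetical counts: the current rule covers 100 positives and 400 negatives.
gain_a = foil_gain(100, 400, 30, 10)   # conjunct A: fewer records, much purer
gain_b = foil_gain(100, 400, 50, 150)  # conjunct B: more records, barely purer
print(gain_a > gain_b)
```

The gain rewards conjuncts that raise the proportion of positives while still covering many of them, so the conjunct with the largest gain is the one added to the rule.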

Rule Pruning Using RIPPER:

To decide whether a rule should be pruned, RIPPER evaluates it on the validation set using the metric:

(P-N)/(P+N)

P = number of positive examples in the validation set covered by the rule.
N = number of negative examples in the validation set covered by the rule.

Consider a rule ABCD ---> Y, where A, B, C, D are conjuncts and Y is the class.

First, RIPPER removes the last conjunct D and recomputes the metric. If the metric value improves, D is removed from the rule.

If it does not improve, pruning is checked for the trailing sequences CD, BCD, and so on.
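
The pruning step can be sketched as below. This is a minimal illustration under a few assumptions: a rule is an ordered list of (attribute, value) conjuncts, every trailing deletion (D, CD, BCD) is scored and the best-scoring version is kept, at least one conjunct is always retained, and a rule that covers no validation records is given the worst score of -1. The names and the toy validation data are hypothetical.

```python
def covers(rule, record):
    """A rule is an ordered list of (attribute, value) conjuncts."""
    return all(record.get(a) == v for a, v in rule)


def prune_metric(rule, val_pos, val_neg):
    """(P - N) / (P + N) measured on the validation set."""
    p = sum(covers(rule, r) for r in val_pos)
    n = sum(covers(rule, r) for r in val_neg)
    # Assumption: a rule covering nothing gets the worst possible score.
    return (p - n) / (p + n) if p + n else -1.0


def prune_rule(rule, val_pos, val_neg):
    """Try deleting the last 1, 2, ... conjuncts; keep the best version."""
    best, best_score = rule, prune_metric(rule, val_pos, val_neg)
    for keep in range(len(rule) - 1, 0, -1):
        candidate = rule[:keep]
        score = prune_metric(candidate, val_pos, val_neg)
        if score > best_score:  # prune only on a strict improvement
            best, best_score = candidate, score
    return best


rule = [("outlook", "rainy"), ("windy", "yes"), ("humidity", "high")]
val_pos = [{"outlook": "rainy", "windy": "yes", "humidity": "normal"}] * 3 + [
    {"outlook": "rainy", "windy": "yes", "humidity": "high"}
]
val_neg = [
    {"outlook": "rainy", "windy": "yes", "humidity": "high"},
    {"outlook": "rainy", "windy": "no", "humidity": "high"},
]
print(prune_rule(rule, val_pos, val_neg))
```

In this toy data the full rule covers one positive and one negative (metric 0), while dropping the last conjunct covers four positives and one negative (metric 0.6), so the last conjunct is pruned.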

Building the Ruleset in RIPPER Algorithm:

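After a rule is generated, the positive and negative examples it covers are removed, and the rule is added to the ruleset as long as it does not violate the following stopping conditions:

A) Minimum description length (MDL) principle: The description length is the number of bits needed to encode the ruleset together with the examples it misclassifies, and a ruleset that can be represented in fewer bits is preferred. If adding the new rule makes the total description length more than d bits larger than the smallest description length obtained so far (by default d = 64), RIPPER stops adding rules to the ruleset.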

B) Error rate: The rule's error rate (the fraction of the records it covers that it misclassifies) is calculated on the validation set. The error rate of a rule must not exceed 50%; otherwise RIPPER stops adding rules.
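
The two stopping conditions can be combined in a small check like the one below. This is only a sketch: it assumes the description lengths (in bits) have already been computed elsewhere, and it compares the new length against the smallest length seen so far, as in Cohen's formulation of RIPPER. All names and numbers are hypothetical.

```python
def rule_error_rate(p, n):
    """Fraction of covered validation records that are negative (misclassified)."""
    # Assumption: a rule covering nothing is treated as fully wrong.
    return n / (p + n) if p + n else 1.0


def should_stop(new_dl_bits, min_dl_bits, error_rate, d=64, max_error=0.5):
    """Return True if RIPPER should stop adding rules to the ruleset."""
    too_long = new_dl_bits > min_dl_bits + d   # condition A: MDL
    too_wrong = error_rate > max_error         # condition B: error rate
    return too_long or too_wrong


# Hypothetical values: the best ruleset so far needed 250 bits.
print(should_stop(300, 250, rule_error_rate(8, 2)))  # 50 extra bits, 20% error
print(should_stop(320, 250, rule_error_rate(8, 2)))  # 70 extra bits
print(should_stop(260, 250, rule_error_rate(4, 6)))  # 60% error
```

Only the first call allows the rule to be added; the second violates the description length limit and the third violates the error rate limit.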

This is how the RIPPER algorithm works.
