RIPPER Algorithm:
RIPPER stands for Repeated Incremental Pruning to Produce Error Reduction. It is a widely used rule-based classification (rule induction) algorithm that derives a set of rules from the training set.
Uses of RIPPER Algorithm:
- It works well on datasets with imbalanced class distributions. A dataset has an imbalanced class distribution when most of its records belong to one particular class and only the remaining few belong to the other classes.
- It works well with noisy datasets because it uses a validation set to prevent model overfitting.
Working of RIPPER:
Case I: Training records belong to only two classes
Among the given records, RIPPER identifies the majority class (the one that appears most often) and takes it as the default class. For example, if there are 100 records of which 80 belong to Class A and 20 belong to Class B, then Class A is the default class.
For the other class, it tries to learn rules that detect that class.
Case II: Training records have more than two classes ( Multiple Classes )
All the available classes are arranged in increasing order of their frequency.
Suppose the classes are arranged as:
C1, C2, C3, ..., Cn
where C1 is the least frequent class and Cn is the most frequent class.
The class with the maximum frequency (Cn) is taken as the default class.
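The class ordering described above can be sketched in a few lines of Python. This is only an illustration: the function name `order_classes` and the alphabetical tie-breaking between equally frequent classes are assumptions made here, not part of RIPPER's definition.

```python
from collections import Counter

def order_classes(labels):
    """Return the classes from least to most frequent, plus the default class."""
    counts = Counter(labels)
    # Sort by frequency; ties are broken by class name (an assumption of this
    # sketch) so that the order is deterministic.
    ordered = sorted(counts, key=lambda c: (counts[c], str(c)))
    default_class = ordered[-1]  # the most frequent class becomes the default
    return ordered, default_class

# The two-class example from the text: 80 records of Class A, 20 of Class B.
labels = ["A"] * 80 + ["B"] * 20
ordered, default_class = order_classes(labels)
print(ordered)        # ['B', 'A']
print(default_class)  # A
```

Rules are then learned for every class except the last one in this list.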
How the rule is Derived:
First, RIPPER tries to derive rules for the records that belong to class C1. Records belonging to C1 are treated as positive examples (+ve) and records of all other classes are treated as negative examples (-ve).
The Sequential Covering Algorithm is used to generate the rules that discriminate between +ve and -ve examples.
Next, RIPPER derives rules for C2, distinguishing it from the other classes.
This process is repeated until the stopping criterion is met, which is when only Cn (the default class) is left.
- RIPPER extracts rules in order, from the minority class to the majority class.
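The class-by-class loop can be sketched as follows. This is a simplified illustration, not a full RIPPER implementation: `learn_ordered_ruleset`, `predict` and `toy_learner` are hypothetical names, and `toy_learner` is a trivial stand-in for the real rule learner (it just picks attribute values seen only among the positives).

```python
from collections import Counter

def learn_ordered_ruleset(records, labels, learn_rules):
    """Learn rules class by class, from least to most frequent.

    learn_rules(pos, neg) is a hypothetical callback that returns a list of
    rules, each rule being a function record -> bool.
    """
    counts = Counter(labels)
    ordered = sorted(counts, key=lambda c: (counts[c], str(c)))
    default_class = ordered[-1]
    remaining = list(zip(records, labels))
    ruleset = []
    for cls in ordered[:-1]:  # no rules are learned for the default class
        pos = [r for r, y in remaining if y == cls]   # +ve examples
        neg = [r for r, y in remaining if y != cls]   # -ve examples
        rules = learn_rules(pos, neg)
        ruleset.extend((rule, cls) for rule in rules)
        # Records covered by the new rules are removed before the next class.
        remaining = [(r, y) for r, y in remaining
                     if not any(rule(r) for rule in rules)]
    return ruleset, default_class

def predict(ruleset, default_class, record):
    """The first matching rule decides; otherwise fall back to the default class."""
    for rule, cls in ruleset:
        if rule(record):
            return cls
    return default_class

def toy_learner(pos, neg):
    """Stand-in learner: one rule per 'color' value seen only among positives."""
    neg_colors = {r["color"] for r in neg}
    pure = sorted({r["color"] for r in pos} - neg_colors)
    return [lambda r, v=v: r["color"] == v for v in pure]

records = [{"color": c} for c in ["red", "blue", "blue", "green", "green", "green"]]
labels = ["C1", "C2", "C2", "C3", "C3", "C3"]
ruleset, default_class = learn_ordered_ruleset(records, labels, toy_learner)
print(default_class)                                         # C3
print(predict(ruleset, default_class, {"color": "red"}))     # C1
print(predict(ruleset, default_class, {"color": "yellow"}))  # C3
```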
Rule Growing in RIPPER Algorithm:
- RIPPER uses a general-to-specific strategy for growing rules. It starts from an empty rule and keeps adding the best conjunct to the rule antecedent.
- Candidate conjuncts are evaluated using FOIL's Information Gain, and the conjunct with the highest gain is chosen.
- Stopping criterion for adding conjuncts: the rule no longer covers any negative (-ve) examples.
- The new rule is pruned based on its performance on the validation set.
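To make the growing step concrete, here is a minimal sketch. It uses the usual definition of FOIL's Information Gain, gain = p1 * (log2(p1 / (p1 + n1)) - log2(p0 / (p0 + n0))), where p0 and n0 are the positives and negatives covered before adding a conjunct and p1 and n1 are those covered after. The record format (dicts of categorical attributes), the helper names and the toy data are assumptions for illustration, and the sketch stops growing once no negative examples remain covered (or no conjunct has positive gain).

```python
import math

def foil_gain(p0, n0, p1, n1):
    """FOIL's Information Gain for extending a rule with one conjunct."""
    if p0 == 0 or p1 == 0:
        return 0.0  # nothing to gain if no positives are covered
    return p1 * (math.log2(p1 / (p1 + n1)) - math.log2(p0 / (p0 + n0)))

def covers(rule, record):
    """A rule is a list of (attribute, value) conjuncts, all of which must hold."""
    return all(record[a] == v for a, v in rule)

def grow_rule(pos, neg):
    """Start from an empty rule and greedily add the conjunct with the best gain."""
    rule = []
    while neg:  # keep specialising while negatives are still covered
        p0, n0 = len(pos), len(neg)
        candidates = {(a, v) for r in pos for a, v in r.items()} - set(rule)
        best, best_gain = None, 0.0
        for conj in sorted(candidates):
            p1 = sum(covers([conj], r) for r in pos)
            n1 = sum(covers([conj], r) for r in neg)
            gain = foil_gain(p0, n0, p1, n1)
            if gain > best_gain:
                best, best_gain = conj, gain
        if best is None:  # no conjunct improves the rule
            break
        rule.append(best)
        # Keep only the examples still covered by the extended rule.
        pos = [r for r in pos if covers([best], r)]
        neg = [r for r in neg if covers([best], r)]
    return rule

pos = [{"outlook": "sunny", "windy": "no"},
       {"outlook": "sunny", "windy": "no"},
       {"outlook": "rain", "windy": "no"}]
neg = [{"outlook": "sunny", "windy": "yes"},
       {"outlook": "rain", "windy": "yes"},
       {"outlook": "rain", "windy": "no"}]
print(grow_rule(pos, neg))  # [('windy', 'no'), ('outlook', 'sunny')]
```

In this toy data the first conjunct chosen is windy = no, and outlook = sunny is then added to exclude the one remaining negative example.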
Rule Pruning Using RIPPER:
We need to decide whether a particular rule should be pruned or not. The metric used for this is:
(P - N) / (P + N)
where P = number of positive examples in the validation set covered by the rule, and N = number of negative examples in the validation set covered by the rule.
- Whenever a conjunct is added or removed, we calculate the value of the above metric for the original rule (before adding/removing) and for the new rule (after adding/removing).
- If the value for the new rule is better than the value for the original rule, the conjunct is added/removed. Otherwise, it is not.
- Pruning starts from the rightmost end, i.e. from the last conjunct that was added. For example, consider the rule:
ABCD ---> Y, where A, B, C, D are conjuncts and Y is the class. RIPPER first removes the conjunct D and measures the metric value. If the metric improves, D is removed. Pruning is then checked in the same way for CD, BCD and so on.
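The pruning procedure can be sketched as follows. This is one possible reading of the steps above: it tries removing D, then CD, then BCD, and keeps the shortest-suffix deletion that gives the best (P - N) / (P + N) value on the validation set. The helper names, the record format, the toy validation data and the choice to score a rule that covers no validation records as 0 are assumptions of this sketch.

```python
def covers(rule, record):
    """A rule is a list of (attribute, value) conjuncts, all of which must hold."""
    return all(record[a] == v for a, v in rule)

def prune_metric(rule, val_pos, val_neg):
    """(P - N) / (P + N) computed on the validation set."""
    p = sum(covers(rule, r) for r in val_pos)
    n = sum(covers(rule, r) for r in val_neg)
    # Assumption: a rule covering no validation records gets a neutral score.
    return (p - n) / (p + n) if p + n else 0.0

def prune_rule(rule, val_pos, val_neg):
    """Try dropping D, then CD, then BCD, ... and keep the best-scoring version."""
    best_rule = list(rule)
    best_score = prune_metric(best_rule, val_pos, val_neg)
    for keep in range(len(rule) - 1, 0, -1):  # always keep at least one conjunct
        candidate = list(rule[:keep])
        score = prune_metric(candidate, val_pos, val_neg)
        if score > best_score:
            best_rule, best_score = candidate, score
    return best_rule

def rec(a, b, c, d):
    return {"A": a, "B": b, "C": c, "D": d}

rule = [("A", 1), ("B", 1), ("C", 1), ("D", 1)]  # ABCD ---> Y
val_pos = [rec(1, 1, 1, 1), rec(1, 1, 1, 0), rec(1, 1, 1, 0), rec(1, 1, 1, 0)]
val_neg = [rec(1, 1, 1, 1), rec(1, 1, 0, 0), rec(1, 1, 0, 0)]

print(prune_metric(rule, val_pos, val_neg))  # 0.0  (P = 1, N = 1)
print(prune_rule(rule, val_pos, val_neg))    # [('A', 1), ('B', 1), ('C', 1)]
```

Here dropping D raises the metric from 0 to 0.6 (P = 4, N = 1), while also dropping C lowers it again, so the pruned rule is ABC ---> Y.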
Building the Ruleset in RIPPER Algorithm:
- After a rule is derived, all the positive and negative examples covered by the rule are eliminated.
- The rule is then added to the ruleset as long as it doesn't violate the stopping condition. The stopping criteria that can be used are:
A) Minimum description length principle: The idea is that the ruleset should be representable using the minimum number of bits, i.e. the number of bits needed to transfer the information from one end to another. If the new rule increases the total description length of the ruleset by at least d bits (by default d is 64 bits), then RIPPER stops adding rules to the ruleset.
B) Error rate: We take the rule and calculate its error rate (misclassification) with respect to the validation set. The error rate of a particular rule should not exceed 50%.
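The two stopping checks can be sketched as below. This is a simplified illustration: the function names are hypothetical, the description lengths are assumed to be computed elsewhere and passed in as plain numbers of bits, and a rule that covers no validation records is treated as having an error rate of 1.0.

```python
def covers(rule, record):
    """A rule is a list of (attribute, value) conjuncts, all of which must hold."""
    return all(record.get(a) == v for a, v in rule)

def error_rate(rule, val_pos, val_neg):
    """Fraction of the validation records covered by the rule that are negative."""
    p = sum(covers(rule, r) for r in val_pos)
    n = sum(covers(rule, r) for r in val_neg)
    if p + n == 0:
        return 1.0  # assumption: a rule that covers nothing is treated as bad
    return n / (p + n)

def should_stop(rule, val_pos, val_neg, dl_before, dl_after, d=64, max_error=0.5):
    """Stop adding rules if either criterion (A) or (B) is violated."""
    too_long = (dl_after - dl_before) >= d                       # (A) MDL check
    too_wrong = error_rate(rule, val_pos, val_neg) > max_error   # (B) error rate
    return too_long or too_wrong

rule = [("windy", "no")]
val_pos = [{"windy": "no"}, {"windy": "no"}, {"windy": "no"}]
val_neg = [{"windy": "no"}, {"windy": "yes"}]

print(error_rate(rule, val_pos, val_neg))                   # 0.25
print(should_stop(rule, val_pos, val_neg, 120.0, 150.0))    # False
print(should_stop(rule, val_pos, val_neg, 120.0, 190.0))    # True
```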
This is how the RIPPER algorithm works.