
Vapnik-Chervonenkis Dimension

Last Updated : 12 Jun, 2023

The Vapnik-Chervonenkis (VC) dimension is a measure of the capacity of a hypothesis set, that is, of how rich a family of classifiers is. It was introduced by Vladimir Vapnik and Alexey Chervonenkis in the 1970s and has become a fundamental concept in statistical learning theory, where it links the complexity of a model class to how well the class can generalize from training data to unseen data.

The VC dimension of a hypothesis set H is the size of the largest set of points that can be shattered by H. A hypothesis set H shatters a set of points S if, for every possible labeling of the points in S, there exists a hypothesis in H that classifies the points exactly according to that labeling. In other words, H shatters a set of n points if it can realize all 2^n possible labelings of those points. If arbitrarily large finite sets can be shattered, the VC dimension is infinite. For example, threshold classifiers on the real line have VC dimension 1, while half-planes in the plane have VC dimension 3.
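
To make the definition concrete, here is a minimal sketch (separate from the main implementation later in this article, with an illustrative grid of candidate thresholds) that checks shattering for one-dimensional threshold classifiers of the form h_t(x) = 1 if x >= t. A single point can always be shattered, but no pair can, because no threshold labels the smaller point 1 and the larger point 0; this class therefore has VC dimension 1.

Python

from itertools import product


def shattered_by_thresholds(points, thresholds):
    """True if, for every labeling of `points`, some threshold realizes it."""
    for labeling in product([0, 1], repeat=len(points)):
        # Is there a threshold classifier producing exactly this labeling?
        if not any(tuple(int(x >= t) for x in points) == labeling
                   for t in thresholds):
            return False
    return True


thresholds = [k * 0.5 for k in range(-10, 11)]  # illustrative threshold grid

print(shattered_by_thresholds([0.0], thresholds))       # one point: shattered
print(shattered_by_thresholds([0.0, 1.0], thresholds))  # two points: not

Output:

True
False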

Bounds of the VC Dimension

The VC dimension controls the number of training examples required to achieve a given level of accuracy. In the standard PAC-learning bounds, the sample complexity grows essentially linearly in the VC dimension d: roughly O((1/ε)(d log(1/ε) + log(1/δ))) examples suffice to reach error ε with confidence 1 − δ, and Ω((d + log(1/δ))/ε) examples are necessary. A finite VC dimension therefore guarantees learnability, while an infinite one rules out distribution-free generalization guarantees.
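
A closely related result is the Sauer-Shelah lemma: a hypothesis set of VC dimension d can realize at most C(m, 0) + C(m, 1) + ... + C(m, d) distinct labelings of m points, which grows only polynomially in m rather than like 2^m. The short sketch below (the values m = 20 and d = 3 are illustrative) computes this bound:

Python

from math import comb


def growth_bound(m, d):
    """Sauer-Shelah bound: the maximum number of labelings of m points
    realizable by a hypothesis set of VC dimension d."""
    return sum(comb(m, i) for i in range(d + 1))


# With VC dimension d = 3, at most 1351 of the 2**20 = 1048576 possible
# labelings of m = 20 points are achievable.
print(growth_bound(20, 3), 2 ** 20)

Output:

1351 1048576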

Applications of the VC Dimension

The VC dimension has a wide range of applications in machine learning and statistics. For example, it is used to analyze the complexity of neural networks, support vector machines, and decision trees. The VC dimension can also be used to design new learning algorithms that are robust to noise and can generalize well to unseen data.

The VC dimension can be extended to more complex learning scenarios, such as multiclass classification (where the analogous quantity is the Natarajan dimension) and regression (the pseudo-dimension). The concept also appears in other areas of computer science, such as computational geometry and graph theory.

Code Implementation of the VC Dimension

The VC dimension is a theoretical quantity, and for most hypothesis sets it must be derived analytically rather than computed from data. We can, however, estimate it empirically: sample a finite collection of hypotheses and a finite pool of candidate points, and search for the largest subset of points that the sampled hypotheses can shatter. Because both samples are finite, this yields a lower-bound estimate of the true VC dimension. In Python, we can implement such an estimator as follows.

The estimator takes a finite collection of hypotheses and a pool of candidate points. For increasing n, it uses the itertools module to enumerate the n-point subsets of the pool and, for each subset, checks whether the hypotheses realize all 2^n possible labelings of it. The largest n for which some subset is shattered is returned as the estimated VC dimension.

Let’s illustrate the usage of this function with some examples:

Example 1:

Suppose our hypothesis set consists of linear classifiers that label a point (x, y) as 1 when y >= ax + b and 0 otherwise, where a and b are real numbers. Since we cannot enumerate every real parameter value, we sample a and b on a small grid. We can define this hypothesis set in Python as follows:

Python
import itertools


def shatters(hypotheses, points):
    """Return True if `hypotheses` realizes all 2^n labelings of `points`."""
    target = 2 ** len(points)
    seen = set()
    for h in hypotheses:
        seen.add(tuple(h(p) for p in points))
        if len(seen) == target:
            return True
    return False


def vc_dimension(hypotheses, point_pool, max_n=4):
    """
    Estimate the VC dimension of a finite hypothesis set by brute force:
    the largest n for which SOME n-point subset of `point_pool` is
    shattered. Because the hypotheses and the candidate points are
    finite samples, this is a lower-bound estimate of the true value.
    """
    best = 0
    for n in range(1, max_n + 1):
        if any(shatters(hypotheses, subset)
               for subset in itertools.combinations(point_pool, n)):
            best = n
        else:
            break
    return best


# Example 1: linear classifiers h(x, y) = 1 if y >= a*x + b, else 0,
# with the parameters a and b sampled on a small grid.
def make_linear(a, b):
    return lambda p: int(p[1] >= a * p[0] + b)


params = [k * 0.5 for k in range(-8, 9)]   # a, b in {-4.0, -3.5, ..., 4.0}
linear_set = [make_linear(a, b) for a in params for b in params]
point_pool = [(x, y) for x in range(-1, 2) for y in range(-1, 2)]

print(vc_dimension(linear_set, point_pool))


Output:

2

In Example 1, make_linear builds classifiers that return 1 if the y-coordinate of the input point is at least ax + b, and 0 otherwise, and linear_set samples them over a grid of slopes and intercepts. vc_dimension finds that some pair of points can be shattered but no triple can, so the estimated VC dimension is 2. This agrees with the theory: for three points with distinct x-coordinates, the line's value at the middle x is an affine interpolation of its values at the outer two, which rules out at least one of the 8 labelings.
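
We can also probe shattering directly with the shatters helper from the listing above: a pair of points is shattered by linear_set, while a collinear triple is not.

Python

# Reuses shatters and linear_set from the code above.
print(shatters(linear_set, [(0, 0), (1, 0)]))            # True: pair shattered
print(shatters(linear_set, [(-1, 0), (0, 0), (1, 0)]))   # False: triple is not

Output:

True
False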

Example 2:

Suppose our hypothesis set consists of quadratic classifiers that label a point (x, y) as 1 when y >= ax^2 + bx + c and 0 otherwise, where a, b, and c are real numbers. Reusing the shatters and vc_dimension helpers, the params grid, and the point_pool from Example 1, we can define and evaluate this hypothesis set as follows:

Python
# Example 2: quadratic classifiers h(x, y) = 1 if y >= a*x**2 + b*x + c,
# else 0, reusing shatters, vc_dimension, params and point_pool from Example 1.
def make_quadratic(a, b, c):
    return lambda p: int(p[1] >= a * p[0] ** 2 + b * p[0] + c)


quadratic_set = [make_quadratic(a, b, c)
                 for a in params for b in params for c in params]

print(vc_dimension(quadratic_set, point_pool))


Output:

3

In Example 2, make_quadratic builds classifiers that return 1 if the y-coordinate of the input point is at least ax^2 + bx + c, and 0 otherwise. vc_dimension now finds a shattered triple of points but no shattered quadruple, so the estimated VC dimension is 3. Intuitively, the quadratic family has one more free parameter than the linear family, which lets it shatter one more point.
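
As a quick cross-check, the quadratic set shatters the collinear triple that the linear set could not:

Python

# Reuses shatters and quadratic_set from the code above.
print(shatters(quadratic_set, [(-1, 0), (0, 0), (1, 0)]))

Output:

True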

Conclusion

The VC dimension is a fundamental concept in statistical learning theory that measures the capacity of a hypothesis set. Up to logarithmic factors, it determines the number of training examples required to achieve a given level of accuracy. Although the VC dimension usually has to be derived analytically, in Python we can compute a lower-bound estimate for a sampled hypothesis set by brute force, searching for the largest set of points whose labelings the hypotheses can all realize. The VC dimension has a wide range of applications in machine learning and statistics and extends to more complex learning scenarios.


