
Lamarckian Evolution and Baldwin Effect in Evolutionary Algorithms


Lamarckian Evolution Theory:

The Lamarckian theory states that the characteristics an individual acquires during its lifetime are passed on to its offspring. The theory is named after the French biologist Jean-Baptiste Lamarck. According to Lamarck's theory, learning is an important part of the evolution of species (or, for our purposes, of an evolutionary algorithm). The theory is discredited in a biological context, but the idea can still be used in genetic algorithms in machine learning.

Baldwin Effect:

Baldwin proposed that individual learning can explain evolutionary phenomena that appear to require Lamarckian inheritance of acquired characteristics. The ability of individuals to learn can guide the evolutionary process. In effect, learning smooths the fitness landscape, thus facilitating evolution.

The Baldwin effect was first demonstrated in the context of machine learning by Hinton and Nowlan in 1987, using simple neural networks (NNs). In one experiment the networks' weights were all fixed, while in another some weights were left trainable. They concluded that:

  • When there was no individual learning, the population (a collection of NNs) failed to improve over time.
  • When learning was allowed, the population in early generations contained many individuals with many trainable weights; in later generations it achieved high fitness while the number of trainable weights per individual decreased.
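The flavor of the Hinton-and-Nowlan experiment can be sketched with a toy fitness function. This is a minimal illustration, not their actual code: alleles are 0, 1, or '?' (learnable), fixed alleles must match a single target configuration, and '?' alleles are guessed during a lifetime of learning trials; the constants and names here are assumptions for the sketch.

```python
import random

TARGET = [1] * 20          # the single "good" network: all 20 connections correct
TRIALS = 1000              # learning attempts per lifetime

def fitness(genome):
    """Hinton-Nowlan-style fitness. A fixed allele that mismatches the
    target can never be repaired by learning; otherwise the individual
    guesses its '?' alleles each trial, and earlier success scores higher."""
    if any(a != '?' and a != t for a, t in zip(genome, TARGET)):
        return 1.0                      # baseline fitness, learning cannot help
    unknowns = sum(1 for a in genome if a == '?')
    for trial in range(TRIALS):
        # each learnable allele is guessed correctly with probability 1/2
        if all(random.random() < 0.5 for _ in range(unknowns)):
            return 1.0 + 19.0 * (TRIALS - trial) / TRIALS
    return 1.0
```

Under this scheme a genome with few '?' alleles tends to learn the target quickly and score high, which is how learning "smooths" the fitness landscape: partially correct genomes are rewarded instead of scoring the same as completely wrong ones.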

G-prop algorithm:

G-Prop is a hybrid evolutionary algorithm: it combines a genetic algorithm with backpropagation (BP) to evolve multilayer perceptrons (MLPs). Below is the G-Prop algorithm.

  • Generate the initial model with random weight values and hidden layer sizes uniformly distributed from 2 to a maximum of a given value.
  • For G generations:
    • Evaluate the new individuals: train them using the training set and obtain their fitness according to the number of correct classifications on the validation set and the hidden layer size.
    • Select the n best individuals in the population based on the fitness function value, and combine them using mutation, crossover, and the addition, elimination, and substitution of hidden neurons.
    • Replace the n worst individuals by the new individuals.
  • Use the best individuals on the test set to obtain a testing error.
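The steps above can be sketched as a skeleton loop. This is a hedged outline, not the authors' implementation: `train`, `evaluate`, and `variate` are hypothetical user-supplied callables standing in for BP training, validation-set scoring, and the genetic operators.

```python
import random

def g_prop(train, evaluate, variate,
           pop_size=20, generations=10, n_select=5, max_hidden=16):
    """Skeleton of the G-Prop loop. Assumed signatures:
    train(ind) fits an MLP in place, evaluate(ind) -> (val_accuracy,
    hidden_size), variate(parents) -> list of offspring."""
    # Initial population: random weights, hidden size uniform in [2, max_hidden]
    population = [{'hidden': random.randint(2, max_hidden), 'weights': None}
                  for _ in range(pop_size)]
    for _ in range(generations):
        for ind in population:
            train(ind)
        # Higher validation accuracy first; smaller hidden layer breaks ties
        population.sort(key=lambda ind: (evaluate(ind)[0], -evaluate(ind)[1]),
                        reverse=True)
        offspring = variate(population[:n_select])
        # Replace the n worst individuals with the new ones
        population[-len(offspring):] = offspring
    return population[0]   # best individual, to be scored on the test set
```

The final test-set evaluation is deliberately left outside the loop, matching the algorithm's last step: the test set is touched only once, by the best individual.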

Fitness Function: The fitness function is defined as the ability to classify/approximate the validation set, and it is used to select the best individuals in each generation. If two individuals have the same fitness, the one with the smaller hidden layer is considered better, because fewer parameters mean faster training.
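This two-level comparison can be expressed as a small key function, a sketch using Python's lexicographic tuple comparison (the function name is ours, not from the paper):

```python
def gprop_fitness_key(val_accuracy, hidden_size):
    """Lexicographic fitness: compare validation accuracy first; on a tie,
    the MLP with the smaller hidden layer wins (fewer parameters train
    faster). Tuples compare element-wise, so max() over these keys
    selects the intended individual."""
    return (val_accuracy, -hidden_size)

# Example: equal accuracy, different sizes -> the smaller network wins
a = gprop_fitness_key(0.91, 12)
b = gprop_fitness_key(0.91, 8)
assert max(a, b) == b
```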

  • The Lamarckian approach uses no special fitness function; instead, it uses a local-search genetic operator (similar to Quick Propagation) that is designed to improve individuals, writing the trained weights back into the population.
  • For the Baldwin effect, the following fitness procedure is used:
    • First, the classification/approximation ability of the individual on the validation set is calculated before training.
    • The individual is then trained, and its ability is compared using the following criteria:
      • The best individual (MLP) is the one with the higher classification/approximation ability after training.
      • If both MLPs show the same accuracy after training, the better one is that with the higher accuracy before training; the intuition is that such an MLP is more likely to reach high accuracy when trained again.
      • If both MLPs have equal accuracy before and after training, the better model is the smaller one (the one with fewer trainable parameters).
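The three criteria above form a lexicographic ranking, which can be sketched as a key function (the helper name and signature are ours, for illustration):

```python
def baldwin_key(acc_after, acc_before, n_params):
    """Baldwin-effect ranking sketch: compare post-training accuracy first,
    then pre-training accuracy (a head start before learning suggests
    better innate weights), and finally prefer the model with fewer
    trainable parameters. Tuples compare lexicographically."""
    return (acc_after, acc_before, -n_params)

# mlp_a and mlp_b tie after training; mlp_b started out stronger, so it wins
mlp_a = baldwin_key(acc_after=0.90, acc_before=0.55, n_params=120)
mlp_b = baldwin_key(acc_after=0.90, acc_before=0.70, n_params=150)
assert max(mlp_a, mlp_b) == mlp_b
```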

Results and Conclusion:

[Figure: Results on the Glass1a dataset]

[Figure: Error and size comparison between the Lamarckian approach and the Baldwin effect over 300 generations]

  • The author concluded that the Lamarckian strategy finds a suitable individual (MLP) in the early generations which remains the best throughout the simulation, so evolution effectively stops. The Baldwin effect can outperform the Lamarckian approach, but it requires more generations.
  • It was also observed that the neural networks produced by the Lamarckian strategy are small, and therefore take less time to train, to predict with, and to design.
  • Another important result is that the Lamarckian operator improves fitness in the early generations. This is due to the elitist algorithm, in which part of the fittest individual is copied into the next generation; the individuals receiving this copied material gain an advantage over the rest of the population and tend to remain the best individuals until the end of the simulation.


Last Updated : 26 Nov, 2020