The Lottery Ticket Hypothesis was presented as a research paper at ICLR 2019 by the MIT-IBM Watson AI Lab, where it received a Best Paper Award.
Background: Network Pruning
Pruning means reducing the size of a neural network by removing superfluous or unwanted parts. Network pruning is a common practice for reducing the size, storage footprint, and computational cost of a neural network, for example, to fit an entire network on your phone. The idea of network pruning originated in the 1990s and was popularized again in 2015.
How do you “prune” a neural network?
We can summarize the process of pruning in four major steps:
1. Train the network.
2. Remove superfluous structures.
3. Fine-tune the network.
4. Optionally, repeat steps 2 and 3 iteratively.
But before we move further ahead, you should know:
- Usually, pruning is done after a neural network has been trained on data.
- The superfluous structures can be weights, neurons, filters, or channels. Here we consider "sparse pruning", which means pruning individual weights.
- A heuristic is needed to decide whether a structure is superfluous. Common heuristics are based on magnitudes, gradients, or activations. Here we choose magnitudes: we prune the weights with the lowest magnitudes (see the sketch after this list).
- By removing parts of the neural network, we have somewhat damaged the function it computes. Hence, we train the model a bit more; this is known as fine-tuning.
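To make the magnitude heuristic concrete, here is a minimal NumPy sketch of one pruning round. The array `w` stands in for a trained layer's weights, and `magnitude_prune` is an illustrative helper, not a library function:

```python
import numpy as np

def magnitude_prune(weights, fraction):
    """Zero out the given fraction of weights with the smallest magnitudes."""
    flat = np.abs(weights).ravel()
    k = int(fraction * flat.size)
    threshold = np.sort(flat)[k]              # magnitude of the k-th smallest weight
    mask = (np.abs(weights) >= threshold)     # True where a weight survives
    return weights * mask, mask

# Example: prune 20% of the weights of a 300x100 layer
rng = np.random.default_rng(0)
w = rng.standard_normal((300, 100))           # stand-in for trained weights
pruned_w, mask = magnitude_prune(w, 0.2)
print(f"Sparsity: {1 - mask.mean():.0%}")     # ~20% of the weights are now zero
```

After pruning, `mask` records which weights survived; fine-tuning would continue training `pruned_w` while keeping the masked-out weights at zero.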

If the steps are correctly followed, we can compress the parameters of neural networks like LeNet-300-100 and AlexNet by a factor of 9x to 12x without losing any accuracy.
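To put such compression rates in perspective, here is a quick, illustrative calculation of the parameter count of LeNet-300-100 (a fully-connected MNIST network with layer sizes 784, 300, 100, and 10) and what a 12x compression would leave:

```python
# LeNet-300-100: fully-connected MNIST network, layers 784 -> 300 -> 100 -> 10
layers = [784, 300, 100, 10]
params = sum(n_in * n_out + n_out                 # weights plus biases per layer
             for n_in, n_out in zip(layers, layers[1:]))
print(params)        # 266610 parameters in the dense network
print(params // 12)  # ~22217 parameters remain after 12x compression
```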
Can’t we randomly initialize a pruned network and train to convergence?
Many researchers have pondered this question, and all of them arrived at the same answer: no.
It turns out that training a pruned model from scratch performs worse than retraining a pruned model, which may indicate the difficulty of training a network with a small capacity.
However, this is no longer the case. The research conducted by the MIT-IBM Watson AI Lab shows that we can indeed train pruned networks from scratch, and that networks do not need to be overparameterized to learn. The weights pruned after training could have been pruned before training; however, you need to use the same initializations.
This basically suggests the Lottery Ticket Hypothesis: "There exists a subnetwork inside a randomly-initialized dense neural network which, when trained in isolation, can match or even outperform the accuracy of the original network."
How to train pruned networks?
1. Randomly initialize the full network.
2. Train it and prune its superfluous structures.
3. Reset each remaining weight to its value after step 1, i.e., its original initialization.
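The sketch below illustrates these steps in PyTorch, under a few assumptions: `train_fn` is a training loop you supply, `find_winning_ticket` is an illustrative name rather than the paper's code, and for simplicity every parameter tensor (including biases) is masked:

```python
import copy
import torch

def find_winning_ticket(model, train_fn, prune_fraction, rounds=1):
    """Sketch of train -> prune -> rewind, the lottery ticket procedure."""
    init_state = copy.deepcopy(model.state_dict())        # Step 1: save the random init
    masks = {n: torch.ones_like(p) for n, p in model.named_parameters()}

    for _ in range(rounds):
        train_fn(model)                                   # Step 2: train the network
        for n, p in model.named_parameters():
            alive = p.detach().abs()[masks[n].bool()]     # magnitudes of surviving weights
            k = int(prune_fraction * alive.numel())
            if k > 0:
                threshold = alive.sort().values[k]
                masks[n] *= (p.detach().abs() >= threshold).float()
        model.load_state_dict(init_state)                 # Step 3: rewind to the init
        with torch.no_grad():
            for n, p in model.named_parameters():
                p.mul_(masks[n])                          # keep only the winning ticket
    return model, masks
```

A full implementation would also enforce the mask during `train_fn` (for example, by re-multiplying the weights by the mask after every optimizer step) so that pruned weights stay at zero; `rounds=1` gives one-shot pruning, while larger values give the iterative variant.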
Advantages of Trained Pruned Networks
- A fully-connected network for MNIST with more than 600K parameters can supposedly be reduced to a subnetwork of about 21K parameters that reaches the same accuracy as the original network.
- Retention of the original training features: dropout, weight decay, batchnorm, ResNet architectures, your favourite optimizer, etc.
Further Scope of Research
- Subnetworks are found retroactively, only after the full network has been trained.
- Finding subnetworks is very expensive.
- Results so far cover only small vision networks and tasks.
Link to the research paper: The Lottery Ticket Hypothesis: Finding Sparse, Trainable Neural Networks