Machine Learning has improved our lives significantly, from intelligent chatbots to autonomous cars. The main ingredient that enables these models to perform beyond expectation is data. With digitization and the growing popularity of IoT, more and more people own devices that generate immense amounts of quality data, which can be used to improve models and deliver better results and user experiences.
The main hindrance to this idea is the sensitive nature of the data these devices produce. The data on a client's device is personal and sensitive; if it is compromised, it can be used to influence public decisions and opinions. So it is important to use this data without storing it on any central server. This is exactly the shortcoming that Federated Learning addresses.
What is Federated Learning?
It is a distributed Machine Learning technique that enables machine learning engineers and data scientists to work productively with decentralized data, with privacy by default.
The data stays on the end nodes only. The models are trained there, and only the updated parameters are sent to the central server, where they are aggregated to produce the required Machine Learning model.
The Federated Averaging algorithm can be used to train the main model. The steps are as follows:
- Select k clients from the pool
- Send the current parameters θt to these clients
- Each client receives θt from the server
- Each client runs some iterations of SGD (Stochastic Gradient Descent) to produce updated parameters θ'
- Each client returns θ' − θt to the server
- The server sets θt+1 = θt + the data-weighted average of the client updates
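The steps above can be sketched in a few lines of NumPy. This is a toy simulation, not a real deployment: the client losses, optima, and dataset sizes below are made-up placeholders, used only to show the select / send / local-SGD / aggregate loop.

```python
import numpy as np

# Hypothetical setup: each client i has a quadratic local loss
# L_i(theta) = ||theta - c_i||^2, so SGD pulls theta toward its optimum c_i.
rng = np.random.default_rng(0)
client_optima = [rng.normal(size=2) for _ in range(5)]
client_sizes = [20, 50, 30, 10, 40]  # assumed number of local examples per client

def local_sgd(theta, c, steps=10, lr=0.1):
    """Run a few SGD steps on the client's local loss."""
    theta = theta.copy()
    for _ in range(steps):
        grad = 2 * (theta - c)
        theta -= lr * grad
    return theta

theta = np.zeros(2)  # initial global parameters θt
for round_ in range(50):
    # 1. Select k clients from the pool
    k = 3
    idx = rng.choice(len(client_optima), size=k, replace=False)
    updates, weights = [], []
    for i in idx:
        # 2-4. Client receives θt, runs local SGD, producing θ'
        theta_prime = local_sgd(theta, client_optima[i])
        updates.append(theta_prime - theta)  # 5. client sends θ' − θt
        weights.append(client_sizes[i])
    # 6. θt+1 = θt + data-weighted average of the client updates
    theta = theta + np.average(updates, axis=0, weights=np.array(weights, dtype=float))
```

After enough rounds, `theta` settles near a data-weighted compromise between the client optima, which is the intended behaviour of the averaging step.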
Cons of Federated Learning
As we already know, thousands of training iterations are required to obtain a properly trained model. Each of these iterations requires sending the current parameters to thousands, or maybe millions, of edge nodes, letting them train locally, and then collecting the updates from those millions of devices back at the central server.
To get a good model, this process has to be repeated many times, and that causes a huge communication overhead. Even though each message may be small, the frequency is high. This communication overhead prevents Federated Learning from being more viable, so there is a strong need for an approach that reduces the number of communications between the server and the nodes.
What is Fusion Learning?
It is a state-of-the-art distributed ML technique. In Fusion Learning, we try to bring the number of communications between an edge node and the central server down to one. Instead of running multiple training rounds, each node sends its generative information to the central server.
Once the central server has this generative information, it can generate sample data. In most cases, this data behaves as if it were the original data, and training on it produces a model very similar to one trained on the original client data.
Different generative approaches can be used. So what exactly is this generative information that is being sent? Let's look at some of the ways to produce it.
Ways to send Generative Information
1. Distribution Parameters
There are tons of distributions out there, and the client's data can often be approximated as coming from one of them. Once we have identified the distribution, we send its parameters to the central server. Using these parameters, the server can draw random points from the distribution and train on them.
To find the distribution the data can be approximated as coming from, we can keep a list of the most common distributions (some of them are uniform, normal, lognormal, gamma, beta, Pareto, and Weibull), which together can approximate almost any kind of data. The Kolmogorov-Smirnov test, or KS test, can then be used to score each candidate distribution, and the best-scoring distribution is taken as the final one.
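A minimal sketch of this selection step with `scipy.stats`, assuming synthetic gamma-distributed data stands in for the client's data. Each candidate is fitted by maximum likelihood and scored with the KS statistic (lower means a better fit); the winner's few parameters are the "generative information" to send.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
data = rng.gamma(shape=2.0, scale=3.0, size=1000)  # stand-in for client data

# Candidate distributions from the list above (scipy names)
candidates = ["uniform", "norm", "lognorm", "gamma", "beta", "pareto", "weibull_min"]

best_name, best_params, best_stat = None, None, np.inf
for name in candidates:
    dist = getattr(stats, name)
    try:
        params = dist.fit(data)  # maximum-likelihood fit
    except Exception:
        continue  # some families may fail to fit arbitrary data
    stat, _ = stats.kstest(data, name, args=params)  # KS statistic: lower is better
    if stat < best_stat:
        best_name, best_params, best_stat = name, params, stat

# best_params are the few numbers sent to the server; the server can then
# draw synthetic points from the fitted distribution and train on them.
synthetic = getattr(stats, best_name).rvs(*best_params, size=1000, random_state=1)
```

Note that the KS statistic measures the largest gap between the empirical and fitted CDFs, so "max score" in practice means the candidate with the smallest KS statistic (or the largest p-value).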
2. GAN Generators
GANs, or Generative Adversarial Networks, are a more recent development in the machine learning domain, introduced by Ian Goodfellow and his colleagues. In this algorithmic architecture, two neural networks compete against each other, hence 'adversarial', in order to generate new, synthetic instances of data that can pass for the real thing. One rather popular use was generating realistic human faces that belong to no one in the world!
GANs consist of a generator and a discriminator. The generator tries to learn the underlying patterns in the data and produce accurate samples of it, while the discriminator tries to figure out whether a given sample could have come from the original data set or not. As the GAN trains, the generator produces more and more accurate data points, and the discriminator becomes more and more efficient at spotting fakes. Convergence happens when the generator is able to fool the discriminator consistently.
Once the GAN has finished training on the edge node, the generator can be sent to the main server. The server can then generate accurate sample data from this generator and go ahead with the training iterations. Since this data resembles the client's data to a very high degree, the trained models can be expected to perform exceptionally well on the client's data.
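The transfer step itself can be sketched without a full GAN. Below, a toy linear "generator" (a single layer mapping noise to samples, with made-up weights standing in for a trained generator) is serialized on the edge node and rebuilt on the server, which then samples synthetic data from it. This illustrates only the one-shot communication pattern, not GAN training.

```python
import json
import numpy as np

rng = np.random.default_rng(0)

# Toy "generator": x = W z + b maps 4-d noise z to 2-d samples.
# In practice these would be the weights of a trained GAN generator;
# here they are hypothetical placeholders.
W = rng.normal(size=(2, 4))
b = np.array([1.0, -0.5])

# Edge node: serialize the generator's parameters (the single message sent)
payload = json.dumps({"W": W.tolist(), "b": b.tolist()})

# Central server: rebuild the generator from the message
params = json.loads(payload)
W_srv, b_srv = np.array(params["W"]), np.array(params["b"])

def generate(n):
    z = rng.normal(size=(n, 4))  # latent noise
    return z @ W_srv.T + b_srv   # synthetic samples

synthetic = generate(256)        # data the server can now train on
```

Because the generator is stored on the server, it can be re-sampled at any time, which is what makes the "train even when the node is offline" property in the next paragraph possible.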
This drastic reduction in communication overhead goes a long way toward decreasing the overall training time. Apart from saving time, Fusion Learning has one more plus point: even if an end node is unreachable at the moment we want to train a new model, we can pull out the generative information it sent last time and use it to train further models.