
Collaborative Learning – Federated Learning

Last Updated : 20 Mar, 2024

The field of data science has seen significant developments in the past five years. Traditional machine learning training relied on large datasets stored in centralized locations such as data centers, with the goal of producing accurate predictions and profitable insights. However, this approach came with challenges around data storage, privacy, and processing.

Recently, the concept of federated learning has emerged as a groundbreaking solution to these problems. The term was coined by Google AI in its blog post titled “Federated Learning: Collaborative Machine Learning without Centralized Training Data”.

In this article, we will break this term down and understand what exactly Federated Learning is in simple terms, look at its types, and walk through a real-life application where it already runs in the backend. We will also skim through some of its benefits.

What is Federated Learning?

Federated Learning is a technique for training machine learning models on decentralized data that is distributed across multiple devices or nodes, such as smartphones, IoT devices, and edge devices. Instead of centralizing the data and training the model in a single location, the model is trained locally on each device, and only the resulting updates are sent to a central server, which aggregates them.
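To make the aggregation step concrete, here is a minimal NumPy sketch of the weighted averaging used by FedAvg-style algorithms. The function and variable names are purely illustrative (not from any particular library), and the numbers are made up for demonstration.

```python
import numpy as np

def federated_average(client_updates, client_sizes):
    """Aggregate locally trained parameter vectors into one global model.

    client_updates: list of np.ndarray, each client's locally trained weights
    client_sizes:   list of int, number of local training samples per client
    """
    total = sum(client_sizes)
    # Weight each client's parameters by its share of the total data,
    # so clients with more local data influence the global model more.
    return sum(w * (n / total) for w, n in zip(client_updates, client_sizes))

# Example: three devices trained the same 4-parameter model on their own data.
updates = [np.array([0.9, 1.1, 0.2, 0.0]),
           np.array([1.0, 1.0, 0.3, 0.1]),
           np.array([1.1, 0.9, 0.1, 0.2])]
sizes = [50, 200, 120]
global_weights = federated_average(updates, sizes)
print(global_weights)
```

The key point is that only the trained parameters reach the server; the raw data never does.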

Federated Learning in simple terms

To give a more technical definition: federated learning learns a shared prediction model while keeping all the training data on the device (here, specifically a mobile phone). The concept is rooted in machine learning and is especially suited to mobile devices. To build a model with a machine learning algorithm, we need a sufficient amount of data to reach the right prediction accuracy before putting it into production. Federated learning eliminates the problem of storing that large training dataset in one place: the data stays distributed across many locations, and those locations are simply our devices.

[Figure: Federated Learning]

Types of Federated Learning

There are various strategies that are used for Federated Learning. Let’s take a brief look at them.

  1. Centralized Federated Learning: Here, a central server coordinates the different steps of the algorithm. It selects the participating nodes at the beginning of the training process and is then responsible for aggregating the model updates received from those nodes/devices. Because all selected nodes send their updates to this single server, it becomes the bottleneck of the system.
  2. Decentralized Federated Learning: Here, the nodes coordinate among themselves to obtain the updated model. This avoids the single-server bottleneck of centralized federated learning, since model updates are exchanged directly between interconnected nodes without a central system. The model’s performance then depends heavily on the chosen network topology (a small sketch contrasting these two aggregation patterns follows this list).
  3. Heterogeneous Federated Learning: This setting involves a large number of heterogeneous clients, e.g., mobile devices and IoT devices, which can differ in software or hardware configuration. Recently, a federated learning framework called HeteroFL has emerged, specifically designed to tackle the challenges posed by heterogeneous clients with varying computation and communication capabilities.
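The sketch below contrasts the first two strategies with plain NumPy. The helper names and the toy topology are assumptions made for illustration: in the centralized case a single server averages every node’s weights, while in the decentralized case each node averages only with its neighbours in the chosen topology (gossip-style averaging).

```python
import numpy as np

def centralized_round(node_weights):
    # Centralized: a single server collects every node's weights and averages them.
    return np.mean(node_weights, axis=0)

def decentralized_round(node_weights, neighbours):
    # Decentralized: each node averages only with its neighbours; there is no
    # central server, and the result depends on the network topology.
    return [np.mean([node_weights[i]] + [node_weights[j] for j in neighbours[i]], axis=0)
            for i in range(len(node_weights))]

nodes = [np.array([1.0, 2.0]), np.array([3.0, 4.0]), np.array([5.0, 6.0])]
topology = {0: [1], 1: [0, 2], 2: [1]}       # a simple chain of three nodes
print(centralized_round(nodes))              # one global model
print(decentralized_round(nodes, topology))  # one model per node
```

In the decentralized case, information spreads gradually through the topology, which is why the choice of topology matters so much for performance.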

How Does Federated Learning Work?

Let us understand federated learning in more detail, i.e. its steps. The base model is stored on the central server, and a copy of this model is kept on every device. Whenever the user enters some information, the following steps take place:

  • Step 1: The device downloads the current model.
  • Step 2: The model improves itself using the new data generated on the device.
  • Step 3: The model changes are summarized as an update and communicated to the cloud. This communication is encrypted.
  • Step 4: On the cloud, updates arrive from many users. All these updates are aggregated to build the final model.

So, no huge amount of data is uploaded to the cloud, yet the model is still trained on diverse data. Throughout this process, the training data stays within your own smartphone/mobile device. A toy simulation of these steps is sketched below.
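The following is a small end-to-end simulation of one training run following these four steps, using plain NumPy. The linear-regression model, the randomly generated device data, and all function names are assumptions made for illustration; real deployments additionally use encryption and secure aggregation when the updates are communicated.

```python
import numpy as np

rng = np.random.default_rng(0)

def local_update(global_w, X, y, lr=0.1, epochs=5):
    """Steps 1-2: the device downloads the current model and improves it
    on its own data (simple linear regression via gradient descent)."""
    w = global_w.copy()
    for _ in range(epochs):
        grad = X.T @ (X @ w - y) / len(y)
        w -= lr * grad
    # Step 3: only the summarized change (the weight delta) leaves the device.
    # In a real deployment this update would be encrypted / securely aggregated.
    return w - global_w

# Three simulated devices, each with private data that never leaves it.
true_w = np.array([2.0, -1.0])
devices = []
for _ in range(3):
    X = rng.normal(size=(40, 2))
    y = X @ true_w + rng.normal(scale=0.1, size=40)
    devices.append((X, y))

global_w = np.zeros(2)
for round_id in range(10):
    # Step 4: the server aggregates all device updates into the new global model.
    deltas = [local_update(global_w, X, y) for X, y in devices]
    global_w += np.mean(deltas, axis=0)

print(global_w)   # approaches true_w without any raw data being uploaded
```

Each device only ever transmits a weight delta, yet after a few rounds the shared model fits the data held across all of them.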

Real-Life Application of Federated Learning

Everyone is familiar with the Google Keyboard (Gboard). How does it give us accurate suggestions while we type? Let us take an example:
You want to eat a pizza and you are searching for a good restaurant. You know 2-3 restaurants, so when you search for them, their names are captured and saved in your on-device history. This is where Federated Learning comes into play: it uses the data present in that history, and the steps mentioned above are then followed in order.

Advantages of Federated Learning

  1. Less Power Consumption – Since each device trains on a relatively small amount of local data, the computation time required for training is short, and therefore the power consumption is also low.
  2. Ensures Privacy – The data used to update the model, i.e. the data from the user’s history, stays on the device only. This resolves the privacy concerns that come with collecting large amounts of user data centrally, without interfering with training.
  3. Doesn’t Interfere with Device Performance – The on-device model is only trained when the device is plugged in or not in use, so device performance is unaffected. Idle time is the best time to train, since the user is barely interacting with the device.
  4. Scalability – Federated Learning can handle large-scale datasets that are distributed across many devices.
  5. Improved Model Performance – Federated Learning can improve model performance by leveraging the diversity of data across different devices.
  6. Real-Time Updates – Federated Learning allows near real-time updates of the model, since the updates are computed locally on each device.

Disadvantages of Federated Learning

  • Network Latency: The communication between the devices and the central server can be a bottleneck and may add latency to the training process.
  • Heterogeneous devices: The devices can be heterogeneous in terms of hardware and software, which can make it difficult to ensure the compatibility and consistency of the models.
  • Data Quality: The quality of data can vary across the devices, which can lead to poor model performance.

Conclusion

Federated Learning is not a one-stop solution to every existing machine learning problem, but its introduction to the machine learning community has brought a distinct kind of revolution to data science. Ongoing research continues to refine the method and its applications in the field.


