
Google File System

Last Updated : 30 Mar, 2023

Google Inc. developed the Google File System (GFS), a scalable distributed file system (DFS), to meet the company's growing data processing needs. GFS provides fault tolerance, reliability, scalability, availability, and high performance across large networks of connected nodes. It is built from many storage machines assembled from inexpensive commodity hardware. GFS is tailored to Google's various data use and storage requirements; the search engine, which generates enormous volumes of data that must be stored, is only one example.

GFS was designed to tolerate the frequent hardware failures of inexpensive commodity servers while still benefiting from their low cost.

GoogleFS is another name for GFS. It manages two types of data: file metadata and file data.

A GFS cluster consists of a single master and multiple chunk servers, which are accessed by many client systems. Chunk servers store data on their local discs as ordinary Linux files. Stored files are split into large, fixed-size chunks of 64 MB, and each chunk is replicated across the network, three times by default. The large chunk size reduces network overhead, because a client needs fewer interactions with the master to locate the data it wants.
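To make the chunking concrete, here is a minimal Python sketch of how a client could translate a byte offset into a chunk index, and how many chunks a file occupies. The 64 MB size comes from the text; the function names are hypothetical, and the comparison with a 4 MB chunk size is only an illustration of why larger chunks mean fewer lookups.

```python
# Minimal sketch: mapping byte offsets to fixed-size chunks.
# CHUNK_SIZE follows the 64 MB figure in the text; function names are hypothetical.

CHUNK_SIZE = 64 * 1024 * 1024  # 64 MB


def chunk_index(byte_offset: int) -> int:
    """Return the index of the chunk that holds byte_offset."""
    if byte_offset < 0:
        raise ValueError("offset must be non-negative")
    return byte_offset // CHUNK_SIZE


def chunk_count(file_size: int, chunk_size: int = CHUNK_SIZE) -> int:
    """Number of fixed-size chunks needed to store a file (ceiling division)."""
    return -(-file_size // chunk_size)


if __name__ == "__main__":
    one_gb = 1024 ** 3
    print(chunk_index(200 * 1024 * 1024))         # byte at 200 MB lands in chunk 3
    # Fewer, larger chunks mean fewer metadata entries and fewer master lookups.
    print(chunk_count(one_gb))                    # 16 chunks of 64 MB
    print(chunk_count(one_gb, 4 * 1024 * 1024))   # 256 chunks of 4 MB
```

A client that reads a 1 GB file sequentially would need at most 16 chunk lookups with 64 MB chunks, versus 256 with the hypothetical 4 MB size.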

GFS is designed to meet Google's large-scale cluster requirements without hindering the applications that use it. Files are organized in hierarchical directories and identified by path names. The master manages all metadata, including the namespace, access control information, and the mapping from files to chunks and from chunks to their locations. The master communicates with each chunk server through periodic heartbeat messages, which it uses to give instructions and collect the server's state.
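The heartbeat mechanism can be sketched as a master-side table of "last seen" times. This is a hypothetical illustration: the timeout value, the class and method names, and the assumption that each heartbeat carries the server's full chunk list are simplifications, not GFS's actual protocol. Time is passed in explicitly so the logic is easy to test.

```python
# Hypothetical sketch of heartbeat-based liveness tracking on the master.
# Timeout, names, and the "full chunk list per heartbeat" rule are assumptions.
from dataclasses import dataclass, field


@dataclass
class HeartbeatMonitor:
    timeout: float  # seconds without a heartbeat before a server is presumed dead
    last_seen: dict[str, float] = field(default_factory=dict)
    chunks: dict[str, set[str]] = field(default_factory=dict)

    def on_heartbeat(self, server: str, now: float, chunk_handles: set[str]) -> None:
        """Record a heartbeat and the chunk handles the server reports holding."""
        self.last_seen[server] = now
        self.chunks[server] = set(chunk_handles)

    def dead_servers(self, now: float) -> list[str]:
        """Servers whose most recent heartbeat is older than the timeout."""
        return sorted(s for s, t in self.last_seen.items() if now - t > self.timeout)

    def chunks_needing_rereplication(self, now: float) -> set[str]:
        """Chunks that lost a replica because their server stopped reporting."""
        return set().union(*(self.chunks[s] for s in self.dead_servers(now)))


if __name__ == "__main__":
    m = HeartbeatMonitor(timeout=60.0)
    m.on_heartbeat("cs1", now=0.0, chunk_handles={"c1", "c2"})
    m.on_heartbeat("cs2", now=50.0, chunk_handles={"c2", "c3"})
    print(m.dead_servers(now=90.0))                       # ['cs1']
    print(sorted(m.chunks_needing_rereplication(90.0)))   # ['c1', 'c2']
```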

At the time GFS was first published, the largest GFS clusters had more than 1,000 nodes with 300 TB of disc storage, continuously accessed by hundreds of clients.

[Figure: GFS master]


Components of GFS

GFS runs on clusters of computers. A cluster is simply a group of connected computers, and each cluster may contain hundreds or even thousands of machines. Every GFS cluster includes three basic entities:

  • GFS Clients: They are computer programs or applications that request files. Requests may access and modify existing files or add new files to the system.
  • GFS Master Server: It acts as the cluster's coordinator. It records the cluster's actions in an operation log and keeps track of the metadata that describes chunks. This metadata tells the master which file each chunk belongs to and where the chunk fits within that file.
  • GFS Chunk Servers: They are the workhorses of GFS and store file chunks of 64 MB each. Chunk servers do not send chunks through the master server; instead, they deliver the requested chunks directly to the client. To ensure reliability, GFS makes several copies of each chunk and stores them on different chunk servers; the default is three copies. Each copy is called a replica.
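The division of labour between the master and the chunk servers can be shown with a small in-memory model: the client asks the master only for metadata (chunk handle and replica locations), then reads the bytes directly from a chunk server. Everything here is a simplified, hypothetical sketch; the class names and dictionaries are illustrative, and a real client would also cache locations and handle reads that span chunk boundaries.

```python
# Simplified sketch of the GFS read path; all names are hypothetical.
from dataclasses import dataclass, field

CHUNK_SIZE = 64 * 1024 * 1024  # 64 MB


@dataclass
class Master:
    # file path -> ordered list of chunk handles
    files: dict[str, list[str]] = field(default_factory=dict)
    # chunk handle -> chunk servers holding a replica
    locations: dict[str, list[str]] = field(default_factory=dict)

    def lookup(self, path: str, offset: int) -> tuple[str, list[str]]:
        """Translate (path, offset) into a chunk handle and its replica locations."""
        handle = self.files[path][offset // CHUNK_SIZE]
        return handle, self.locations[handle]


@dataclass
class ChunkServer:
    data: dict[str, bytes] = field(default_factory=dict)  # chunk handle -> contents

    def read(self, handle: str, start: int, length: int) -> bytes:
        return self.data[handle][start:start + length]


def client_read(master: Master, servers: dict[str, ChunkServer],
                path: str, offset: int, length: int) -> bytes:
    """Get metadata from the master, then fetch data directly from a chunk server.

    This sketch assumes the read does not cross a chunk boundary.
    """
    handle, replicas = master.lookup(path, offset)
    server = servers[replicas[0]]  # a real client might pick the closest replica
    return server.read(handle, offset % CHUNK_SIZE, length)
```

Note that file bytes never pass through the `Master` object; it only answers metadata queries, which is what keeps a single master from becoming a data-path bottleneck.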

Features of GFS

  • Namespace management and locking.
  • Fault tolerance.
  • Reduced client and master interaction because of the large chunk size.
  • High availability.
  • Critical data replication.
  • Automatic and efficient data recovery.
  • High aggregate throughput.
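Namespace locking can be illustrated by computing the set of locks an operation needs. The GFS paper describes taking read locks on every ancestor directory of a path and a read or write lock on the full path name; the sketch below follows that description as we understand it, but the function name is hypothetical and the actual lock objects and acquisition are omitted.

```python
# Sketch: locks a GFS-style master would take for an operation on `path`.
# Ancestors get read locks; the full path gets a read or write lock.
# The function name is hypothetical; real lock acquisition is not shown.

def lock_set(path: str, write: bool) -> list[tuple[str, str]]:
    parts = [p for p in path.split("/") if p]
    locks = []
    # Read locks on each ancestor directory, from the root downward.
    for i in range(1, len(parts)):
        locks.append(("/" + "/".join(parts[:i]), "read"))
    # Read or write lock on the full path itself.
    locks.append(("/" + "/".join(parts), "write" if write else "read"))
    return locks


if __name__ == "__main__":
    print(lock_set("/home/user/foo", write=True))
    # [('/home', 'read'), ('/home/user', 'read'), ('/home/user/foo', 'write')]
```

Because ancestors are only read-locked, two clients can create different files in the same directory concurrently, while an operation that write-locks `/home/user` would conflict with both.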

Advantages of GFS

  1. High availability. Data remains accessible even if a few nodes fail, thanks to replication. At this scale, component failures are the norm rather than the exception.
  2. High throughput. Many nodes operate concurrently.
  3. Reliable storage. Corrupted data can be detected and restored from an intact replica.
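Corruption detection and recovery can be sketched with per-block checksums. The GFS paper describes checksumming each chunk in 64 KB blocks with a 32-bit checksum; here `zlib.crc32` is used as a stand-in, and the function names and the "copy any fully intact replica" repair policy are simplifying assumptions.

```python
# Sketch of checksum-based corruption detection and repair across replicas.
# zlib.crc32 stands in for a 32-bit block checksum; names are hypothetical.
import zlib

BLOCK_SIZE = 64 * 1024  # checksum granularity: 64 KB blocks


def block_checksums(chunk: bytes) -> list[int]:
    """32-bit checksum for each 64 KB block of a chunk."""
    return [zlib.crc32(chunk[i:i + BLOCK_SIZE]) for i in range(0, len(chunk), BLOCK_SIZE)]


def find_corrupt_blocks(chunk: bytes, expected: list[int]) -> list[int]:
    """Indices of blocks whose checksum no longer matches the recorded one."""
    actual = block_checksums(chunk)
    return [i for i, (a, e) in enumerate(zip(actual, expected)) if a != e]


def is_intact(chunk: bytes, expected: list[int]) -> bool:
    return block_checksums(chunk) == expected


def repair(replicas: dict[str, bytes], expected: list[int]) -> dict[str, bytes]:
    """Replace every corrupt replica with a copy of an intact one."""
    good = next((d for d in replicas.values() if is_intact(d, expected)), None)
    if good is None:
        raise RuntimeError("no intact replica available")
    return {srv: (d if is_intact(d, expected) else good) for srv, d in replicas.items()}
```

Checksumming in small blocks means a single flipped byte only invalidates one 64 KB block, so a reader can tell exactly which part of a 64 MB chunk is damaged.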

Disadvantages of GFS

  1. Not the best fit for small files.
  2. The single master may become a bottleneck.
  3. Not optimized for random writes.
  4. Best suited to data that is written once and afterwards only read or appended to.
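The small-file limitation can be illustrated with simple arithmetic, assuming each non-empty file occupies at least one chunk that the master must track. The sketch below (hypothetical function name) compares roughly 1 GB stored as a million 1 KB files with about the same amount stored as a single file.

```python
# Sketch: chunk records the master must track for a set of files.
# Assumes every non-empty file needs at least one 64 MB chunk; name is hypothetical.

CHUNK_SIZE = 64 * 1024 * 1024  # 64 MB


def master_chunk_entries(file_sizes: list[int]) -> int:
    """Total chunk records: ceiling(size / CHUNK_SIZE) per file."""
    return sum(-(-size // CHUNK_SIZE) for size in file_sizes)


if __name__ == "__main__":
    print(master_chunk_entries([1024] * 1_000_000))  # 1000000 entries for tiny files
    print(master_chunk_entries([1024 ** 3]))         # 16 entries for one 1 GB file
```

With similar total data, the small-file layout gives the master tens of thousands of times more chunk records to hold and serve, which is why many small files can turn the single master into a bottleneck.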
