
Google File System

Google Inc. developed the Google File System (GFS), a scalable distributed file system (DFS), to meet the company’s growing data processing needs. GFS provides fault tolerance, reliability, scalability, availability, and performance across large networks of connected nodes. It is built from many storage systems assembled from inexpensive commodity hardware. It is tailored to Google’s varied data usage and storage needs, such as its search engine, which generates enormous volumes of data that must be stored.

GFS takes advantage of the strengths of commercially available servers while minimizing the impact of their hardware weaknesses.



GFS is also known as GoogleFS. It manages two types of data: file metadata and file data.

A GFS cluster consists of a single master and several chunk servers that are regularly accessed by many client systems. Chunk servers store data as Linux files on their local disks. Stored data is split into large (64 MB) chunks, and each chunk is replicated at least three times across the network. The large chunk size reduces network overhead.
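
To make the chunking concrete, here is a minimal Python sketch of how a byte offset maps to a chunk and how many chunks and replicas a file occupies. Only the 64 MB chunk size and the minimum of three replicas come from the description above; the function names are hypothetical and are not part of any real GFS client library.

```python
CHUNK_SIZE = 64 * 1024 * 1024  # 64 MB chunk size, as described above
REPLICATION_FACTOR = 3         # minimum number of replicas per chunk


def chunk_index(byte_offset: int) -> int:
    """Return the index of the chunk that holds the given byte offset."""
    if byte_offset < 0:
        raise ValueError("byte_offset must be non-negative")
    return byte_offset // CHUNK_SIZE


def chunk_count(file_size: int) -> int:
    """Return how many chunks are needed to hold file_size bytes."""
    if file_size < 0:
        raise ValueError("file_size must be non-negative")
    return (file_size + CHUNK_SIZE - 1) // CHUNK_SIZE  # ceiling division


def replica_count(file_size: int) -> int:
    """Return the total number of chunk replicas stored for the file."""
    return chunk_count(file_size) * REPLICATION_FACTOR
```

Under these assumptions, a 1 GiB file (`1024**3` bytes) occupies 16 chunks and therefore at least 48 chunk replicas across the cluster, while a file of a single byte still occupies one chunk and three replicas.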



GFS is designed to meet Google’s large cluster requirements without burdening applications. Files are stored in hierarchical directories and identified by path names. The master manages metadata, including the namespace, access control information, and mapping information. The master communicates with each chunk server through periodic heartbeat messages and keeps track of its status.
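
The heartbeat mechanism can be illustrated with a small Python sketch. This is a toy model under stated assumptions, not the actual GFS master: the class name, the method names, and the 30-second timeout are all hypothetical and chosen only for illustration.

```python
import time
from typing import Dict, List, Optional

HEARTBEAT_TIMEOUT_SECONDS = 30.0  # hypothetical timeout, not a GFS constant


class MasterSketch:
    """Toy master that records when each chunk server last sent a heartbeat."""

    def __init__(self, timeout: float = HEARTBEAT_TIMEOUT_SECONDS) -> None:
        self.timeout = timeout
        self.last_seen: Dict[str, float] = {}

    def record_heartbeat(self, server_id: str, now: Optional[float] = None) -> None:
        """Store the arrival time of a heartbeat from the given chunk server."""
        self.last_seen[server_id] = time.monotonic() if now is None else now

    def dead_servers(self, now: Optional[float] = None) -> List[str]:
        """Return servers whose last heartbeat is older than the timeout."""
        current = time.monotonic() if now is None else now
        return sorted(
            server_id
            for server_id, seen in self.last_seen.items()
            if current - seen > self.timeout
        )
```

In this sketch, a chunk server that stops sending heartbeats shows up in `dead_servers()` once the timeout has elapsed, which is the point at which a real master would presumably start re-replicating that server's chunks elsewhere.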

The largest GFS clusters contain more than 1,000 nodes with 300 TB of disk storage capacity, and they are continuously accessed by hundreds of clients.

 

Components of GFS

GFS runs on a cluster, that is, a group of connected computers. Each cluster may contain hundreds or even thousands of machines. Every GFS cluster includes three basic entities: clients, a master server, and chunk servers.

Features of GFS

Advantages of GFS

  1. High availability. Data remains accessible even if a few nodes fail, thanks to replication. GFS treats component failures as the norm rather than the exception.
  2. High throughput. Many nodes operate concurrently.
  3. Reliable storage. Corrupted data can be detected and re-replicated.
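
The idea that corrupted data can be detected is commonly implemented with checksums. The following Python sketch shows one way to do this, assuming per-block CRC-32 checksums from the standard library; the block size and function names are hypothetical, and the text above does not specify how GFS itself performs the check.

```python
import zlib
from typing import List

BLOCK_SIZE = 64 * 1024  # hypothetical checksum block size for this sketch


def block_checksums(data: bytes, block_size: int = BLOCK_SIZE) -> List[int]:
    """Compute a CRC-32 checksum for each fixed-size block of data."""
    return [
        zlib.crc32(data[start:start + block_size])
        for start in range(0, len(data), block_size)
    ]


def corrupted_blocks(
    data: bytes, expected: List[int], block_size: int = BLOCK_SIZE
) -> List[int]:
    """Return the indices of blocks whose checksum differs from the stored one."""
    actual = block_checksums(data, block_size)
    if len(actual) != len(expected):
        raise ValueError("data length does not match the stored checksums")
    return [i for i, (a, e) in enumerate(zip(actual, expected)) if a != e]
```

A server holding a replica could store the checksums when data is written and call something like `corrupted_blocks` on read; any block index it returns would then be restored from another replica.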

Disadvantages of GFS

  1. Not well suited to small files.
  2. The single master may become a bottleneck.
  3. Random writes are not supported.
  4. Suitable only for data that is written once and then read or appended to.