Hadoop is an open-source software framework, written mostly in Java with some shell scripts and C code, for performing computation over very large datasets. It is used for batch (offline) processing across a network of many machines that together form a physical cluster. The framework provides both distributed storage and distributed processing on the same cluster. It is designed to run on inexpensive machines, commonly known as commodity hardware, where each machine contributes its local storage and computing power.
Hadoop can work with many file systems, and HDFS is just one implementation among them. The Java abstract class org.apache.hadoop.fs.FileSystem represents a file system in Hadoop, and it has several concrete implementations, which are listed below.
|Filesystem||URI scheme||Java implementation (all under org.apache.hadoop)||Description|
|Local||file||fs.LocalFileSystem||A filesystem for a locally connected disk with client-side checksumming. For a local filesystem without checksums, use RawLocalFileSystem.|
|HDFS||hdfs||hdfs.DistributedFileSystem||HDFS stands for Hadoop Distributed File System. It is designed to work efficiently with MapReduce.|
|HFTP||hftp||hdfs.HftpFileSystem||The HFTP filesystem provides read-only access to HDFS over HTTP. Despite its name, it has no connection with FTP. It is commonly used with distcp to copy data between HDFS clusters running different versions.|
|HSFTP||hsftp||hdfs.HsftpFileSystem||The HSFTP filesystem provides read-only access to HDFS over HTTPS. Like HFTP, it has no connection with FTP.|
|HAR||har||fs.HarFileSystem||A filesystem layered on another filesystem for archiving files. Hadoop Archives are mainly used to archive files in HDFS, which reduces the NameNode's memory usage.|
|KFS (Cloud-Store)||kfs||fs.kfs.KosmosFileSystem||CloudStore, or KFS (KosmosFileSystem), is a distributed filesystem written in C++. It is very similar to HDFS and GFS (Google File System).|
|FTP||ftp||fs.ftp.FTPFileSystem||A filesystem backed by an FTP server.|
|S3 (native)||s3n||fs.s3native.NativeS3FileSystem||A filesystem backed by Amazon S3.|
|S3 (block-based)||s3||fs.s3.S3FileSystem||A filesystem backed by Amazon S3 that stores files in blocks (much like HDFS) to overcome S3's 5 GB file size limit.|
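The "client-side checksumming" mentioned for the Local filesystem can be illustrated with a small sketch. This is not Hadoop's actual implementation: the class name ChunkChecksums, the 512-byte chunk size, and the choice of CRC-32 are assumptions made for illustration. The idea is that the client computes a checksum per fixed-size chunk when writing and recomputes it when reading, so a damaged chunk can be pinpointed.

```java
import java.util.zip.CRC32;

// Hypothetical sketch of per-chunk checksumming; not Hadoop's own code.
final class ChunkChecksums {
    // Assumed chunk size for this example.
    static final int BYTES_PER_CHECKSUM = 512;

    // Compute one CRC-32 value per chunk of the data.
    static long[] checksums(byte[] data) {
        int chunks = (data.length + BYTES_PER_CHECKSUM - 1) / BYTES_PER_CHECKSUM;
        long[] sums = new long[chunks];
        CRC32 crc = new CRC32();
        for (int i = 0; i < chunks; i++) {
            int offset = i * BYTES_PER_CHECKSUM;
            int length = Math.min(BYTES_PER_CHECKSUM, data.length - offset);
            crc.reset();
            crc.update(data, offset, length);
            sums[i] = crc.getValue();
        }
        return sums;
    }

    // Return the index of the first chunk whose checksum differs, or -1 if all match.
    static int firstCorruptChunk(byte[] data, long[] expected) {
        long[] actual = checksums(data);
        int common = Math.min(actual.length, expected.length);
        for (int i = 0; i < common; i++) {
            if (actual[i] != expected[i]) {
                return i;
            }
        }
        // A differing chunk count is also treated as corruption.
        return actual.length == expected.length ? -1 : common;
    }

    public static void main(String[] args) {
        byte[] data = new byte[1300];            // spans 3 chunks of up to 512 bytes
        for (int i = 0; i < data.length; i++) {
            data[i] = (byte) (i % 251);
        }
        long[] stored = checksums(data);         // "written" alongside the data
        data[700] ^= 1;                          // flip one bit inside chunk 1
        System.out.println(firstCorruptChunk(data, stored)); // prints 1
    }
}
```

A filesystem without checksums, such as RawLocalFileSystem, would skip both the computation and the verification step.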
Hadoop provides many interfaces to its filesystems, and it generally uses the URI scheme to pick the correct filesystem instance to communicate with. You can use any of these filesystems with MapReduce, but when processing very large datasets, a distributed filesystem with data locality, such as HDFS or KFS (KosmosFileSystem), is preferable.
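To make the scheme-based selection concrete, here is a minimal sketch that maps a URI's scheme to the implementation class named in the table above. The class SchemeLookup and its hard-coded map are hypothetical and only mirror a subset of the table. Real Hadoop resolves the implementation through its configuration rather than a fixed map.

```java
import java.net.URI;
import java.util.Locale;
import java.util.Map;

// Hypothetical lookup built from the table above; not part of Hadoop.
final class SchemeLookup {
    private static final String PREFIX = "org.apache.hadoop.";

    // URI scheme -> implementation class (relative to org.apache.hadoop).
    private static final Map<String, String> IMPLEMENTATIONS = Map.of(
        "file", "fs.LocalFileSystem",
        "hdfs", "hdfs.DistributedFileSystem",
        "hsftp", "hdfs.HsftpFileSystem",
        "har", "fs.HarFileSystem",
        "kfs", "fs.kfs.KosmosFileSystem",
        "ftp", "fs.ftp.FTPFileSystem",
        "s3n", "fs.s3native.NativeS3FileSystem",
        "s3", "fs.s3.S3FileSystem");

    // Return the fully qualified class name for the scheme of the given URI.
    static String implementationFor(String uri) {
        String scheme = URI.create(uri).getScheme();
        if (scheme == null) {
            throw new IllegalArgumentException("URI has no scheme: " + uri);
        }
        String impl = IMPLEMENTATIONS.get(scheme.toLowerCase(Locale.ROOT));
        if (impl == null) {
            throw new IllegalArgumentException("No filesystem known for scheme: " + scheme);
        }
        return PREFIX + impl;
    }

    public static void main(String[] args) {
        // prints org.apache.hadoop.hdfs.DistributedFileSystem
        System.out.println(implementationFor("hdfs://namenode:8020/user/data"));
        // prints org.apache.hadoop.fs.LocalFileSystem
        System.out.println(implementationFor("file:///tmp/input.txt"));
    }
}
```

In a real Hadoop program, you would not write this lookup yourself. Calling FileSystem.get(URI, Configuration) returns a FileSystem instance for the given URI. The host name namenode and the paths used here are placeholders.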