Apache Hadoop: It is a collection of open-source software utilities that facilitate using a network of many computers to solve problems involving massive amounts of data and computation. It provides a software framework for distributed storage and processing of big data using the MapReduce programming model.
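To make the MapReduce programming model concrete, below is a minimal word-count sketch against Hadoop's Java MapReduce API (`org.apache.hadoop.mapreduce`). The class name `WordCount` and the input/output paths are illustrative; a real job is packaged as a JAR and submitted to the cluster.

```java
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

  // Mapper: emits (word, 1) for every token in its input split
  public static class TokenizerMapper
      extends Mapper<Object, Text, Text, IntWritable> {
    private final static IntWritable one = new IntWritable(1);
    private final Text word = new Text();

    public void map(Object key, Text value, Context context)
        throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, one);
      }
    }
  }

  // Reducer: sums the counts collected for each word
  public static class IntSumReducer
      extends Reducer<Text, IntWritable, Text, IntWritable> {
    private final IntWritable result = new IntWritable();

    public void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable val : values) {
        sum += val.get();
      }
      result.set(sum);
      context.write(key, result);
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Job job = Job.getInstance(conf, "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setCombinerClass(IntSumReducer.class);
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));   // HDFS input directory
    FileOutputFormat.setOutputPath(job, new Path(args[1])); // HDFS output directory
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
```

Note that the job reads a complete, already-stored dataset from HDFS and only produces output once the batch finishes, which is where Hadoop's higher latency comes from.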
Apache Storm: It is a distributed stream processing computation framework written predominantly in the Clojure programming language. Originally created by Nathan Marz and the team at BackType, the project was open-sourced after being acquired by Twitter.
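For comparison, here is a minimal Storm topology sketch in Java, assuming the `org.apache.storm` 2.x API: a spout (`SentenceSpout`) emits an unbounded stream of sentences and a bolt (`SplitBolt`) splits each tuple into words as it arrives. The class names and the local-cluster run are illustrative; a production topology is submitted to a Storm cluster instead.

```java
import java.util.Map;

import org.apache.storm.Config;
import org.apache.storm.LocalCluster;
import org.apache.storm.spout.SpoutOutputCollector;
import org.apache.storm.task.TopologyContext;
import org.apache.storm.topology.BasicOutputCollector;
import org.apache.storm.topology.OutputFieldsDeclarer;
import org.apache.storm.topology.TopologyBuilder;
import org.apache.storm.topology.base.BaseBasicBolt;
import org.apache.storm.topology.base.BaseRichSpout;
import org.apache.storm.tuple.Fields;
import org.apache.storm.tuple.Tuple;
import org.apache.storm.tuple.Values;
import org.apache.storm.utils.Utils;

public class SentenceTopology {

  // Spout: the source of the stream; here it emits a hard-coded sentence once per second
  public static class SentenceSpout extends BaseRichSpout {
    private SpoutOutputCollector collector;

    @Override
    public void open(Map<String, Object> conf, TopologyContext context,
                     SpoutOutputCollector collector) {
      this.collector = collector;
    }

    @Override
    public void nextTuple() {
      Utils.sleep(1000);
      collector.emit(new Values("the quick brown fox"));
    }

    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
      declarer.declare(new Fields("sentence"));
    }
  }

  // Bolt: processes each tuple as it arrives and emits one tuple per word
  public static class SplitBolt extends BaseBasicBolt {
    @Override
    public void execute(Tuple tuple, BasicOutputCollector collector) {
      for (String word : tuple.getStringByField("sentence").split(" ")) {
        collector.emit(new Values(word));
      }
    }

    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
      declarer.declare(new Fields("word"));
    }
  }

  public static void main(String[] args) throws Exception {
    // Wire the spout and bolt into a DAG (the topology)
    TopologyBuilder builder = new TopologyBuilder();
    builder.setSpout("sentences", new SentenceSpout());
    builder.setBolt("split", new SplitBolt()).shuffleGrouping("sentences");

    // Run locally for demonstration; results are produced continuously, not at job end
    try (LocalCluster cluster = new LocalCluster()) {
      cluster.submitTopology("demo", new Config(), builder.createTopology());
      Utils.sleep(10000);
    }
  }
}
```

Unlike a MapReduce job, the topology never "finishes": tuples flow from spouts through bolts continuously, which is what gives Storm its low per-record latency.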
Below is a table of differences between Apache Hadoop and Apache Storm:
| Features | Apache Hadoop | Apache Storm |
|---|---|---|
| Processing | Distributed batch processing using MapReduce | Distributed real-time stream processing using DAG topologies |
| Latency | High latency, i.e., slow computation | Low latency, i.e., fast computation |
| Implementation language | The framework is written almost entirely in Java | The framework is written in Clojure and Java |
| State handling | Processing is stateful: jobs operate on complete, persisted datasets | Core stream processing is stateless: tuples are processed one at a time, and state must be managed by the application |
| Setup | Easy to set up, but operating the cluster is hard | Easy to set up and operate |
| Data | Data is static and persistent, stored in HDFS | Data is dynamic and continuously streamed |
| Speed | Slow | Fast |
| Use cases | Black box data, search engine data, etc. | Used at Twitter, Navisite, Wego, etc. |
| Architecture | Hadoop comprises HDFS (used for data storage) and MapReduce (used for computation) as its architectural units. | Storm comprises streams, spouts, and bolts as its architectural units. |