Category Archives: Hadoop

Eclipse is an IDE (Integrated Development Environment) for creating and building applications. Hadoop is used for storing and… Read More
Hadoop can be installed in two ways: on a single-node cluster or on a multi-node cluster.… Read More
Hive comes with various “one-shot” commands that a user can run through the Hive CLI (Command Line Interface), without entering the Hive shell, to execute one… Read More
To set up and install Hadoop in pseudo-distributed mode on Windows 10, follow the steps given below. Let’s discuss one… Read More
We are going to create a database and a table in it, and cover database operations in Hive using Cloudera – VMware… Read More
Apache Pig is a data manipulation tool built on top of Hadoop’s MapReduce. Pig provides a scripting language for easier and faster data… Read More
Hadoop is an open-source framework developed under the ASF (Apache Software Foundation). Hadoop is a widely used framework for coping with Big… Read More
MapReduce is a programming model that runs on Hadoop to efficiently process big data stored in HDFS (Hadoop Distributed File System). It is the core component… Read More
Hive can be used to manage structured data on top of Hadoop. The data is stored in the form of a table inside a… Read More
Hive is a data warehouse solution built on top of Hadoop. In Hive, data is managed in the Hadoop Distributed File System (HDFS). In this schema,… Read More
It is a utility that comes with the Hadoop distribution and allows developers to write MapReduce programs using different programming… Read More
HDFS is a distributed file system that stores data over a network of commodity machines. HDFS works on the streaming data access pattern, which means it… Read More
In the modern world, we are dealing with huge datasets every day. Data is growing even faster than processing speeds. To perform computations on such… Read More
mrjob is a well-known Python library for MapReduce developed by Yelp. The library helps developers write MapReduce code in the Python programming language. Developers… Read More