Category Archives: Hadoop

Partitioning in Apache Hive is essential for improving performance when scanning Hive tables. It allows a user working on Hive to… Read More
Big Data deals with data sets that are too large or too complex to be dealt with by traditional data-processing application software. It has three key… Read More
Hive is a data warehousing tool that was built on top of Hadoop. Hive acts as an interface for the Hadoop ecosystem. It is a… Read More
Apache Spark is a lightning-fast unified analytics engine used for cluster computing over large data sets, such as Big Data workloads on Hadoop, with the aim to run… Read More
Big Data is a collection of data that is growing exponentially, and it is huge in volume with a lot of complexity as it comes… Read More
Apache Spark is a unified analytics engine used to process large-scale data. Apache Spark provides the functionality to connect with other… Read More
Big Data is a huge dataset that can have a high volume of data, velocity, and variety of data. For example, billions of users searching… Read More
Eclipse is an IDE (Integrated Development Environment) that helps to create and build an application as per our requirements, while Hadoop is used for storing and… Read More
Hadoop can be installed in two ways: the first is on a single-node cluster and the second is on a multi-node cluster.… Read More
Hive comes with various "One Shot" commands that a user can use through the Hive CLI (Command Line Interface) without entering the Hive shell, to execute one… Read More
To set up and install Hadoop in pseudo-distributed mode on Windows 10, use the steps given below. Let's discuss one… Read More
We are going to create a database and a table in that database, and will cover database operations in Hive using Cloudera on VMware… Read More
Apache Pig is a data manipulation tool that is built over Hadoop’s MapReduce. Pig provides us with a scripting language for easier and faster data… Read More
Hadoop is an open-source framework that was introduced by the Apache Software Foundation (ASF). Hadoop is the most crucial framework for coping with Big… Read More
MapReduce is a model that works over Hadoop to efficiently process big data stored in HDFS (Hadoop Distributed File System). It is the core component… Read More