Overview of Dynamic Partition in Hive
In this article, we will discuss the overview of Hive and will focus on Dynamic Partitioning, and finally, we will conclude with the operations on Dynamic Partition in Hive. Let’s discuss it one by one.
Hive is a storage system which is developed by Facebook, it is basically a data warehouse system that performs operations on structured data. It is mainly used to perform operations like analytics and MapReduce jobs.it also provides the functionality of reading, writing, and managing large datasets. Hive is mainly popular because it can serve as an alternative to the traditional approach of database operations. Hive uses indexing which helps internally the queries to be efficient, it can also operate on compressed data which is stored in the ecosystem of Hadoop.
Attention reader! Don’t stop learning now. Get hold of all the important CS Theory concepts for SDE interviews with the CS Theory Course at a student-friendly price and become industry ready.
Types of partitioning :
There are two types of partitioning are as follows.
- Static Partitioning
- Dynamic Partitioning
Dynamic Partitioning :
Dynamic partitioning is the strategic approach to load the data from the non-partitioned table where the single insert to the partition table is called a dynamic partition. In dynamic partitioning, the values of the partitioned tabled are existed by default so there is no need to pass the value for those columns manually.
Features of Dynamic Partitioning :
Some of the features of Dynamic Partitioning are as follows.
- To handle large amounts of data that are stored in distributed storage that cases dynamic partitioning is the best choice.
- Dynamic partitioning can be performed on the hive external table and managed table.
- In Dynamic partitioning, there is no requirement of the where clause.
- If you want to perform partition on the tables without knowing the number of columns in that case you can use Dynamic partitioning.
Operations of Dynamic Partitioning :
Now let’s perform some of the operations of Dynamic Partitioning You can prefer ubuntu which would be fast for the operations are as follows.
First of all, we need to create a database in which you want to perform the operation of the creation of a table.
hive>Create database dynamic_Demo; hive>use dynamic_demo //here we have selected the above created database.
After selection of database from the available list. Now we will enable the dynamic partition using the following commands are as follows.
hive> set hive.exec.dynamic.partition=true; hive> set hive.exec.dynamic.partition.mode=nonstrict;
Create any table with a suitable table name to store the data.
hive>CREATE TABLE food_prices ( series_reference string, Period string, data_value int, status string, units string, subject string, product string, series_title string, year string )ROW FORMAT DELIMITED fields terminated by ',' TBLPROPERTIES ("skip.header.line.count"="1");
Now at the next step let’s load the data which we have created by the following command.
hive>load data local inpath '/home/bigdata/Downloads/food_prices.csv' into table food_prices;
To create a partitioned table we have to follow the below command.
create table partitioned_food_prices ( series_reference string, Period string, data_value int, status string, units string, subject string, product string, series_title string )partitioned by (year string);
Now we will load the data into the partitioned table using the following command.
hive> insert overwrite table partitioned_food_prices partition(year) select * from default.food_prices;
let’s perform some query operations.
hive>select * partitioned_food_prices limit 5
The final step is to delete the dynamic partition column.
hive>alter table partioned_food_prices drop partition (year = '2019')