MapReduce Program – Weather Data Analysis For Analyzing Hot And Cold Days
Here, we will write a MapReduce program that analyzes a weather dataset, as a way to understand the MapReduce data processing model. Weather sensors around the globe collect weather information as a large volume of log data. This weather data is semi-structured and record-oriented.
The data is stored in a line-oriented ASCII format, where each row represents a single record. Each row has many fields, such as longitude, latitude, daily maximum and minimum temperature, and daily average temperature. For simplicity, we will focus on the main element, temperature. We will use data from the National Centers for Environmental Information (NCEI), which holds a massive amount of historical weather data that we can use for our analysis.
Analyzing weather data of Fairbanks, Alaska to find cold and hot days using Hadoop MapReduce.
We can download the dataset from this link for various cities and years. Choose a year and select any one of the data text files for analysis. In my case, I selected the CRND0103-2020-AK_Fairbanks_11_NE.txt dataset to analyze hot and cold days in Fairbanks, Alaska.
We can get information about the data from the README.txt file available on the NCEI website.
Below is an example of our dataset, where columns 6 and 7 show the daily maximum and minimum temperature, respectively.
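To make the column layout concrete, here is a minimal Java sketch that reads columns 6 and 7 from one record. It assumes the fields are separated by whitespace. The class name RecordFields and the sample line are made up for illustration, and the sample is cut off after the seventh field, whereas real records carry more fields.

```java
public class RecordFields {
    // The text counts columns from 1, so column 6 is array index 5.
    static final int MAX_TEMP_INDEX = 5;
    static final int MIN_TEMP_INDEX = 6;

    /** Returns {max, min} temperature for one whitespace-separated record. */
    static double[] maxMin(String record) {
        String[] fields = record.trim().split("\\s+");
        return new double[] {
            Double.parseDouble(fields[MAX_TEMP_INDEX]),
            Double.parseDouble(fields[MIN_TEMP_INDEX])
        };
    }

    public static void main(String[] args) {
        // Hypothetical record (values invented), truncated to seven fields.
        String sample = "26494 20200101  2.424 -147.51   64.97   -18.8   -21.8";
        double[] t = maxMin(sample);
        System.out.println("max=" + t[0] + " min=" + t[1]); // prints "max=-18.8 min=-21.8"
    }
}
```

Splitting on whitespace keeps the sketch short. A parser based on fixed character offsets would also work if the column widths in README.txt are followed exactly.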
Create a project in Eclipse with the steps below:
- Open Eclipse, then select File -> New -> Java Project. Name it MyProject, select "Use an execution environment", choose JavaSE-1.8, then click Next -> Finish.
- In this project, create a Java class named MyMaxMin, then click Finish.
- Copy the source code below into the MyMaxMin Java class.
- Now we need to add external JARs for the packages that we have imported. Download the Hadoop Common and Hadoop MapReduce Core JAR packages that match your Hadoop version.
You can check your Hadoop version with:
hadoop version
- Now add these external JARs to MyProject. Right-click on MyProject, select Build Path -> Configure Build Path, click Add External JARs…, add the JARs from their download location, then click Apply and Close.
- Now export the project as a JAR file. Right-click on MyProject, choose Export…, go to Java -> JAR file, click Next, choose your export destination, then click Next.
Choose MyMaxMin as the Main Class by clicking Browse, then click Finish -> OK.
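The steps above treat MyMaxMin as a class that labels each day as hot or cold. The sketch below is not that class and does not use the Hadoop API. It only mimics the per-record "map" logic in plain Java, so it runs without the Hadoop JARs. Everything specific in it is an assumption: the thresholds HOT_ABOVE and COLD_BELOW are placeholders, -9999.0 is assumed to mark a missing reading, the date is assumed to be the second field, and the sample records are invented.

```java
import java.util.ArrayList;
import java.util.List;

public class HotColdSketch {
    static final double HOT_ABOVE = 30.0;   // assumed threshold, adjust as needed
    static final double COLD_BELOW = 15.0;  // assumed threshold, adjust as needed
    static final double MISSING = -9999.0;  // assumed marker for a missing reading

    /** "Map" step: emits zero or more "label date<TAB>temperature" pairs for one record. */
    static List<String> map(String record) {
        List<String> out = new ArrayList<>();
        String[] f = record.trim().split("\\s+");
        if (f.length < 7) {
            return out; // too short to contain columns 6 and 7
        }
        String date = f[1];                      // assumed: date is the second field
        double max = Double.parseDouble(f[5]);   // column 6: daily maximum
        double min = Double.parseDouble(f[6]);   // column 7: daily minimum
        if (max != MISSING && max > HOT_ABOVE) {
            out.add("Hot Day " + date + "\t" + max);
        }
        if (min != MISSING && min < COLD_BELOW) {
            out.add("Cold Day " + date + "\t" + min);
        }
        return out;
    }

    public static void main(String[] args) {
        // Hypothetical records, truncated to the first seven fields.
        String[] records = {
            "26494 20200101 2.424 -147.51 64.97 -18.8 -21.8",
            "26494 20200102 2.424 -147.51 64.97 -9999.0 -9999.0",
            "26494 20200701 2.424 -147.51 64.97 31.2 16.0"
        };
        for (String r : records) {
            for (String pair : map(r)) {
                System.out.println(pair);
            }
        }
    }
}
```

In a real Hadoop job, the same checks would sit inside a Mapper's map method and write key/value pairs to the job context instead of returning a list. A reducer could then pass the pairs through unchanged.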
Start the Hadoop daemons:
start-dfs.sh
start-yarn.sh
Copy your dataset to HDFS.
hdfs dfs -put /file_path /destination
In the command below, / denotes the root directory of our HDFS.
hdfs dfs -put /home/dikshant/Downloads/CRND0103-2020-AK_Fairbanks_11_NE.txt /
Check that the file was copied to HDFS.
hdfs dfs -ls /
Now run your JAR file with the command below, which writes the output to the MyOutput directory.
hadoop jar /jar_file_location /dataset_location_in_HDFS /output-file_name
hadoop jar /home/dikshant/Documents/Project.jar /CRND0103-2020-AK_Fairbanks_11_NE.txt /MyOutput
Now open localhost:50070/ in a browser (this is the NameNode web UI; on Hadoop 3.x the default port is 9870). Under Utilities, select Browse the file system and download part-r-00000 from the /MyOutput directory to see the result.
Open the downloaded file to see the result.
In the downloaded file, the top 10 results show the cold days. The second column is the day in yyyymmdd format. For example, 20200101 means
year = 2020, month = 01, day = 01
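As a small illustration of reading that date value, the following Java sketch parses an eight-digit date with the standard java.time API. The class name DateField is hypothetical. DateTimeFormatter.BASIC_ISO_DATE is the built-in formatter for dates written as year, month, and day with no separators.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;

public class DateField {
    /** Parses a date written as eight digits, e.g. 20200101. */
    static LocalDate parse(String yyyymmdd) {
        return LocalDate.parse(yyyymmdd, DateTimeFormatter.BASIC_ISO_DATE);
    }

    public static void main(String[] args) {
        LocalDate d = parse("20200101");
        // prints "year=2020 month=1 day=1"
        System.out.println("year=" + d.getYear()
                + " month=" + d.getMonthValue()
                + " day=" + d.getDayOfMonth());
    }
}
```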