What is Big Data?

Data science is the study of data through analysis with advanced technologies (Machine Learning, Artificial Intelligence, Big Data). It processes huge amounts of structured, semi-structured, and unstructured data to extract meaningful insights. From these insights, patterns can be identified that help in making decisions: seizing new business opportunities, improving products or services, and ultimately growing the business.
Data science is the process of making sense of Big Data, the huge amounts of data used in business. The data science workflow is as follows:

  • Determining the business objective and issues – What is the organization's objective, what level does it want to achieve, and what issues is the company facing? These are the factors under consideration, and based on them the relevant types of data are identified.
  • Collection of relevant data – Relevant data are collected from various sources.
  • Cleaning and filtering collected data – Irrelevant data are removed.
  • Exploring the filtered, cleaned data – Finding hidden patterns and relationships in the data, and plotting them as graphs, charts, etc. in a form that non-technical people can understand.
  • Creating a model by analyzing data – Building a model and validating it.
  • Visualizing the findings – Interpreting the data or the created model for business people.
  • Helping business people make decisions and take steps for the sake of business growth.
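The collection, cleaning, exploration, and modeling steps above can be sketched in a few lines of Python. This is only a toy illustration under stated assumptions: the records (monthly ad spend vs. sales) are hypothetical, cleaning is reduced to dropping records with missing values, exploration is a single correlation, and the model is a simple least-squares line validated on two held-out records.

```python
from statistics import mean

# Hypothetical raw records: (monthly_ad_spend, monthly_sales); None marks a missing value.
RAW_RECORDS = [
    (10.0, 24.0), (12.0, 30.0), (None, 30.0), (15.0, 34.0),
    (18.0, 42.0), (20.0, 44.0), (22.0, None), (25.0, 56.0),
]


def collect():
    """Collection of relevant data: here just a copy of the hard-coded sample."""
    return list(RAW_RECORDS)


def clean(records):
    """Cleaning and filtering: drop records that have a missing field."""
    return [(x, y) for x, y in records if x is not None and y is not None]


def correlation(xs, ys):
    """Exploration: Pearson correlation between the two columns."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)


def fit_line(xs, ys):
    """Modeling: least-squares fit of y = a + b * x; returns (a, b)."""
    mx, my = mean(xs), mean(ys)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return my - b * mx, b


def mean_abs_error(model, pairs):
    """Validation: average absolute prediction error on held-out records."""
    a, b = model
    return mean(abs((a + b * x) - y) for x, y in pairs)


def run_pipeline():
    data = clean(collect())
    train, test = data[:-2], data[-2:]  # hold out the last two records for validation
    xs, ys = zip(*train)
    r = correlation(xs, ys)
    model = fit_line(xs, ys)
    err = mean_abs_error(model, test)
    return r, model, err


if __name__ == "__main__":
    r, (a, b), err = run_pipeline()
    print(f"correlation={r:.3f}  sales = {a:.2f} + {b:.2f} * spend  validation MAE={err:.2f}")
```

In practice each step would use far more data and richer tools, but the order of operations is the same: the model is only fitted after the data have been cleaned and explored.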

Data Mining: It is the process of extracting meaningful insights and hidden patterns from collected data that help in making business decisions aimed at decreasing expenditure and increasing revenue.
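One common data mining technique is market basket analysis: finding items that customers frequently buy together. The minimal sketch below counts item pairs that co-occur in at least `min_support` transactions. The baskets and the helper name `frequent_pairs` are hypothetical, and full algorithms such as Apriori extend the same idea to larger itemsets.

```python
from collections import Counter
from itertools import combinations

# Hypothetical shopping baskets; each set is one customer transaction.
TRANSACTIONS = [
    {"bread", "milk", "butter"},
    {"bread", "milk"},
    {"milk", "coffee"},
    {"bread", "butter"},
    {"bread", "milk", "coffee"},
]


def frequent_pairs(transactions, min_support):
    """Return item pairs that appear together in at least min_support transactions."""
    counts = Counter()
    for basket in transactions:
        # sorted() gives each pair a canonical order, so ("milk", "bread")
        # and ("bread", "milk") are counted as the same pair.
        for pair in combinations(sorted(basket), 2):
            counts[pair] += 1
    return {pair: n for pair, n in counts.items() if n >= min_support}


print(frequent_pairs(TRANSACTIONS, min_support=2))
# → {('bread', 'butter'): 2, ('bread', 'milk'): 3, ('coffee', 'milk'): 2}
```

A pattern like "bread and milk are bought together in 3 of 5 baskets" is the kind of finding a business could act on, for example by placing the two products near each other.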

Big Data: This term refers to huge amounts of complex, variously formatted data generated at high speed that cannot be handled or processed by traditional systems, and to analyzing such data to extract meaningful information.
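Because such data may arrive too fast or be too large to load into memory at once, one basic approach is to process it incrementally, one record at a time, keeping only running summaries. The sketch below assumes a hypothetical temperature sensor feed (simulated with a seeded random generator) and keeps count, mean, minimum, and maximum in constant memory. Real big data systems also distribute this work across many machines, which is not shown here.

```python
import random
from typing import Iterator


def sensor_stream(n: int, seed: int = 42) -> Iterator[float]:
    """Hypothetical stand-in for a high-speed source: yields n readings one at a time."""
    rng = random.Random(seed)
    for _ in range(n):
        yield rng.uniform(15.0, 35.0)


class RunningStats:
    """Keeps count, mean, min and max without storing the readings themselves."""

    def __init__(self) -> None:
        self.count = 0
        self.mean = 0.0
        self.minimum = float("inf")
        self.maximum = float("-inf")

    def update(self, x: float) -> None:
        self.count += 1
        self.mean += (x - self.mean) / self.count  # incremental mean update
        self.minimum = min(self.minimum, x)
        self.maximum = max(self.maximum, x)


stats = RunningStats()
for reading in sensor_stream(100_000):
    stats.update(reading)

print(stats.count, round(stats.mean, 2), round(stats.minimum, 2), round(stats.maximum, 2))
```

Memory use stays the same whether the stream holds a hundred readings or a billion, since only four numbers are kept.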



Data Expansion Day by Day: The amount of data is increasing exponentially day by day because of today's many data sources, such as smart electronic devices. According to an IDC (International Data Corporation) report, by 2020 about 1.7 MB of new data will be created every second for each person in the world. The total amount of data in the world will reach around 44 zettabytes (44 trillion gigabytes) by 2020 and 175 zettabytes by 2025. The total volume of data has been observed to double roughly every two years. The year-by-year growth of the total size of data worldwide, as per the IDC report, is shown below:

Figure: Total size of data worldwide, year by year (IDC). Image Source: Google
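The figures quoted above can be checked with a few lines of arithmetic. This sketch assumes decimal (SI) units, where 1 ZB = 10^21 bytes, 1 GB = 10^9 bytes, and 1 GB = 1000 MB; the variable names are illustrative only.

```python
import math

# Figures quoted above (IDC): worldwide data volume in zettabytes.
ZB_2020 = 44.0
ZB_2025 = 175.0
YEARS = 2025 - 2020

# Compound annual growth rate implied by the two figures.
cagr = (ZB_2025 / ZB_2020) ** (1 / YEARS) - 1

# Years needed for the volume to double at that rate.
doubling_time = math.log(2) / math.log(1 + cagr)

# 1 ZB = 10**21 bytes and 1 GB = 10**9 bytes, so 1 ZB = 10**12 GB (one trillion GB).
gb_2020 = ZB_2020 * 10**12

# 1.7 MB per person per second, accumulated over one day (86,400 seconds).
mb_per_day = 1.7 * 60 * 60 * 24

print(f"CAGR = {cagr:.1%}, doubling time = {doubling_time:.1f} years")
print(f"44 ZB = {gb_2020:.2e} GB; 1.7 MB/s = {mb_per_day / 1000:.1f} GB per person per day")
```

With these inputs, growth from 44 ZB to 175 ZB in five years works out to roughly 32% per year, or a doubling about every 2.5 years, which is close to the "every two years" rule of thumb. At 1.7 MB per second, each person would generate about 147 GB per day.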

Sources of Big Data:

  • Social Media: In today's world, a large share of the world's population uses social media such as Facebook, WhatsApp, Twitter, YouTube, Instagram, etc. Every activity on such media, like uploading a photo or video, sending a message, commenting, or liking, creates data.
  • Sensors placed in various locations: Sensors placed around a city gather data on temperature, humidity, etc. Cameras placed beside roads gather information about traffic conditions and create data. Security cameras placed in sensitive areas like airports, railway stations, and shopping malls create a lot of data.
  • Customer feedback: Customer feedback on a company's products or services on its website creates data. For example, retail commercial sites like Amazon, Walmart, Flipkart, and Myntra gather customer feedback on product quality and delivery time. Telecom companies and other service providers collect feedback on customers' experience with their services. All of this creates a lot of data.
  • IoT Appliances: Electronic devices connected to the internet create data for their smart functionality; examples are smart TVs, smart washing machines, smart coffee machines, smart ACs, etc. This is machine-generated data created by sensors embedded in various devices.
    For example, a smart printer is connected to the internet. Several such printers connected to a network can transfer data to each other. So, if someone loads a file into one printer, the system stores the file's content, and another printer in another building or on another floor can print a hard copy of that file. Such data transfer between printers generates data.
  • Transactions: E-commerce, business transactions, banking, and the stock market store large numbers of records, which are considered one of the sources of big data. Payments by credit card, debit card, or other electronic means are all recorded as data.
  • GPS: GPS in vehicles helps monitor vehicle movement and shorten the route to a destination, cutting fuel and time consumption. This system creates huge amounts of data on vehicle position and movement.

