Web mining is the application of data mining techniques to automatically discover and extract information from Web documents and services. Its main purpose is to discover useful information from the World Wide Web and its usage patterns.
Applications of Web Mining:
- Web mining helps improve the effectiveness of web search engines by classifying web documents and identifying relevant web pages.
- It is used for web search (e.g., Google, Yahoo) and vertical search (e.g., FatLens, Become).
- Web mining is used to predict user behavior.
- Web mining is useful for improving a particular website or e-service, e.g., through landing page optimization.
Web mining can be broadly divided into three types: Web Content Mining, Web Structure Mining, and Web Usage Mining. Each is explained below.
- Web Content Mining: Web content mining is the extraction of useful information from the content of web documents. Web content consists of several types of data: text, images, audio, video, etc. Content data is the collection of facts that a web page is designed to convey, and it can reveal useful patterns about user needs. Mining text documents draws on text mining, machine learning, and natural language processing, so when the content is text this type of mining is also called text mining. It scans and mines the text and images of web pages and groups pages according to their content.
- Web Structure Mining: Web structure mining is the discovery of structure information from the web. The web is modeled as a graph with web pages as nodes and hyperlinks as edges connecting related pages. Structure mining produces a structured summary of a particular website and identifies relationships between web pages that are connected by shared information or by direct links. It can be very useful, for example, for determining the connection between two commercial websites.
- Web Usage Mining: Web usage mining is the discovery of interesting usage patterns from large data sets. These patterns help you understand user behavior, such as which pages users visit and in what order. As users access pages on the web, servers record those accesses in logs, and these logs are the data that is mined. For this reason, web usage mining is also called log mining.
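The three types described above can be illustrated on toy data. The following is a minimal sketch rather than a production pipeline: the pages in `PAGES`, the log lines in `LOG_LINES`, and all class and function names are hypothetical, and the log lines are assumed to follow the Common Log Format that Apache can produce. Only the Python standard library is used.

```python
import re
from collections import Counter
from html.parser import HTMLParser


class PageParser(HTMLParser):
    """Collects word tokens and outgoing links from one HTML page."""

    def __init__(self):
        super().__init__()
        self.words = []
        self.links = []

    def handle_starttag(self, tag, attrs):
        # Structure: every <a href="..."> is an edge to another page.
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

    def handle_data(self, data):
        # Content: keep lowercase alphabetic tokens from the page text.
        self.words.extend(re.findall(r"[a-z]+", data.lower()))


# Hypothetical toy site: page path -> HTML source.
PAGES = {
    "/home": ('<h1>Shop</h1><p>Cheap books and music.</p>'
              '<a href="/books">Books</a><a href="/music">Music</a>'),
    "/books": '<p>Books about data mining.</p><a href="/home">Home</a>',
    "/music": ('<p>Music and more music.</p>'
               '<a href="/home">Home</a><a href="/books">Books</a>'),
}

# Hypothetical access log entries (assumed Common Log Format).
LOG_LINES = [
    '10.0.0.1 - - [01/Jan/2024:10:00:00 +0000] "GET /home HTTP/1.1" 200 512',
    '10.0.0.1 - - [01/Jan/2024:10:00:05 +0000] "GET /books HTTP/1.1" 200 1024',
    '10.0.0.2 - - [01/Jan/2024:10:01:00 +0000] "GET /home HTTP/1.1" 200 512',
    '10.0.0.2 - - [01/Jan/2024:10:01:09 +0000] "GET /music HTTP/1.1" 200 2048',
    '10.0.0.3 - - [01/Jan/2024:10:02:00 +0000] "GET /books HTTP/1.1" 200 1024',
]

# host ident user [timestamp] "METHOD path protocol" status size
LOG_PATTERN = re.compile(
    r'^(\S+) \S+ \S+ \[[^\]]+\] "(\S+) (\S+) [^"]*" (\d{3}) \S+$'
)


def parse_page(html):
    """Run the parser over one page and return it."""
    parser = PageParser()
    parser.feed(html)
    parser.close()
    return parser


def mine_content(pages):
    """Web content mining: word frequencies for each page."""
    return {url: Counter(parse_page(html).words) for url, html in pages.items()}


def mine_structure(pages):
    """Web structure mining: adjacency list of the link graph."""
    return {url: parse_page(html).links for url, html in pages.items()}


def mine_usage(lines):
    """Web usage mining: number of logged requests per path."""
    hits = Counter()
    for line in lines:
        match = LOG_PATTERN.match(line)
        if match:  # silently skip lines that do not fit the assumed format
            hits[match.group(3)] += 1
    return hits


if __name__ == "__main__":
    print(mine_content(PAGES)["/music"].most_common(3))
    print(mine_structure(PAGES))
    print(mine_usage(LOG_LINES))
```

Real pages and logs are far messier than this toy input, so a practical system would add crawling, tokenization suited to the language, and more robust log parsing.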
Comparison Between Data Mining and Web Mining:
| Points | Data Mining | Web Mining |
| --- | --- | --- |
| Definition | Data mining is the process of discovering patterns and hidden knowledge in large data sets from any system. | Web mining is the application of data mining techniques to automatically discover and extract information from web documents and services. |
| Application | Data mining is useful for analysis in many domains, including web page analysis. | Web mining is useful for analyzing and improving a particular website or e-service. |
| Target Users | Data scientists and data engineers. | Data scientists and data analysts. |
| Access | Data mining typically accesses private data. | Web mining typically accesses publicly available data. |
| Structure | Data mining gets information from data with an explicit structure. | Web mining gets information from structured, semi-structured, and unstructured web pages. |
| Problem Type | Clustering, classification, regression, prediction, optimization, and control. | Web content mining, web structure mining, and web usage mining. |
| Tools | Tools include machine learning algorithms. | Typical tools and resources include Scrapy (crawling), PageRank (link analysis), and Apache logs (usage data). |
| Skills | Data cleansing, machine learning algorithms, statistics, and probability. | Application-level knowledge and data engineering, along with mathematical foundations such as statistics and probability. |
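Since the comparison lists PageRank among the web mining tools, a small example may help show what it computes. This is a minimal sketch of the basic power-iteration form of PageRank, not the algorithm as deployed by any search engine. The link graph `GRAPH`, the function name `pagerank`, the damping factor of 0.85, and the fixed iteration count are all assumptions made for illustration, and every link target is assumed to be a key of the graph.

```python
def pagerank(graph, damping=0.85, iterations=50):
    """Basic PageRank by power iteration.

    graph maps each page to the list of pages it links to.
    Returns a dict of page -> rank; the ranks sum to 1.
    """
    nodes = list(graph)
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}

    for _ in range(iterations):
        # Every page receives a baseline share from the random jump.
        new_rank = {node: (1.0 - damping) / n for node in nodes}
        for node, targets in graph.items():
            if targets:
                # A page splits its damped rank evenly among its out-links.
                share = damping * rank[node] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no out-links spreads its rank over all pages.
                share = damping * rank[node] / n
                for other in nodes:
                    new_rank[other] += share
        rank = new_rank
    return rank


# Hypothetical three-page site: page -> pages it links to.
GRAPH = {
    "/home": ["/books", "/music"],
    "/books": ["/home"],
    "/music": ["/home", "/books"],
}

if __name__ == "__main__":
    for page, score in sorted(pagerank(GRAPH).items(), key=lambda kv: -kv[1]):
        print(f"{page}: {score:.3f}")
```

In this toy graph `/home` receives links from both other pages and therefore ends up with the highest rank, while `/music`, which has a single incoming link, ranks lowest.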