Difference between Web Content, Web Structure, and Web Usage Mining

Web mining is the application of data mining techniques to discover patterns in web data. It helps improve web search engines by identifying relevant web pages and classifying web documents.

Types of Web Mining :

1. Web Content Mining –
Web Content Mining extracts useful data, information, and knowledge from the content of web pages. It scans and mines the text, images, and groups of web pages that match the content of an input query, as when a search engine displays its list of results.
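
As a rough illustration of mining page text, the sketch below strips the markup from an HTML page and counts its words (a bag-of-words representation). It uses only the Python standard library. The names TextExtractor and bag_of_words and the sample page are assumptions made for this example, not part of any particular tool.

```python
from collections import Counter
from html.parser import HTMLParser
import re


class TextExtractor(HTMLParser):
    """Collects visible text, skipping the contents of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self.chunks.append(data)


def bag_of_words(html):
    """Return a Counter mapping each lowercase word to its frequency."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    text = " ".join(parser.chunks).lower()
    return Counter(re.findall(r"[a-z0-9]+", text))


# Hypothetical page used only for illustration.
page = (
    "<html><head><style>p {color: red}</style></head>"
    "<body><h1>Web Mining</h1>"
    "<p>Web mining finds patterns in web data.</p></body></html>"
)
bow = bag_of_words(page)
print(bow.most_common(2))
```

Word counts like these are a common starting point for categorizing or clustering pages; a real system would typically add stop-word removal and weighting.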

There are two approaches that are used for Web Content Mining :

  • (i) Agent-based approach :
    This approach relies on intelligent systems, usually autonomous agents that can identify relevant websites.

  • (ii) Data-based approach :
    The data-based approach organizes semi-structured data found on the internet into structured data.
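
To make the data-based approach concrete, here is a minimal sketch that turns a semi-structured HTML table into structured records (a list of dictionaries). It assumes the first table row holds the column names. The class TableToRecords, the function html_table_to_records, and the sample snippet are hypothetical names and data chosen for this example.

```python
from html.parser import HTMLParser


class TableToRecords(HTMLParser):
    """Collects the cell text of every table row into a list of rows."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the row being read, or None outside <tr>
        self._cell = None  # text pieces of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def html_table_to_records(html):
    """Treat the first row as the header and map each later row to a dict."""
    parser = TableToRecords()
    parser.feed(html)
    parser.close()
    if not parser.rows:
        return []
    header, *body = parser.rows
    return [dict(zip(header, row)) for row in body]


# Hypothetical snippet used only for illustration.
snippet = """
<table>
  <tr><th>title</th><th>price</th></tr>
  <tr><td>Data Mining Basics</td><td>30</td></tr>
  <tr><td>Web Search Engines</td><td>45</td></tr>
</table>
"""
records = html_table_to_records(snippet)
print(records)
```

Once the data is in record form, it can be loaded into a database and queried like any other structured data.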

2. Web Structure Mining –
Web Structure Mining discovers the link structure of hyperlinks within the web. Its purpose is to produce a structural summary of websites and similar web pages. This type of mining is applied at the document level and at the hyperlink level. Because hyperlinks show how pages relate to one another, structure mining plays an important role in the overall mining process.
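
One simple way to summarize link structure is to build a directed graph of pages and count how many pages link to each one (its in-degree). The sketch below does this for a small collection of pages. The functions build_link_graph and in_degrees and the sample pages are assumptions for illustration, and in-degree is only one of many possible structural measures.

```python
from html.parser import HTMLParser


class LinkExtractor(HTMLParser):
    """Collects the href value of every <a> tag."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def build_link_graph(pages):
    """Map each page URL to the sorted list of in-collection pages it links to."""
    graph = {}
    for url, html in pages.items():
        parser = LinkExtractor()
        parser.feed(html)
        parser.close()
        # Keep only links to pages in the collection; drop self-links and duplicates.
        graph[url] = sorted({t for t in parser.links if t in pages and t != url})
    return graph


def in_degrees(graph):
    """Count, for each page, how many other pages link to it."""
    counts = {url: 0 for url in graph}
    for targets in graph.values():
        for target in targets:
            counts[target] += 1
    return counts


# Hypothetical three-page site used only for illustration.
pages = {
    "/home": '<a href="/docs">Docs</a> <a href="/blog">Blog</a>',
    "/docs": '<a href="/home">Home</a>',
    "/blog": '<a href="/docs">Docs</a> <a href="https://example.com">External</a>',
}
graph = build_link_graph(pages)
degrees = in_degrees(graph)
print(graph)
print(degrees)
```

Here "/docs" receives links from both other pages, so it has the highest in-degree in this small graph.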

3. Web Usage Mining –
Web Usage Mining mines weblog records (access information for web pages) to discover how users access web pages. Many research projects and tools analyze these access patterns for different purposes. Four main mining techniques are applied: Association Rule Mining, Sequential Pattern Mining, Clustering, and Classification.
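
As an illustration of association rule mining on usage data, the sketch below computes support and confidence for rules of the form "sessions that visit page A also visit page B". It assumes the server logs have already been grouped into per-user sessions. The function pair_rules, its thresholds, and the sample sessions are hypothetical, and the brute-force pair enumeration is only suitable for small examples.

```python
from itertools import permutations


def pair_rules(sessions, min_support=0.5, min_confidence=0.6):
    """Return (a, b, support, confidence) for every rule a -> b above both thresholds.

    support    = fraction of sessions containing both a and b
    confidence = fraction of sessions containing a that also contain b
    """
    page_sets = [set(s) for s in sessions]
    total = len(page_sets)
    pages = sorted(set().union(*page_sets))
    rules = []
    for a, b in permutations(pages, 2):
        count_a = sum(1 for s in page_sets if a in s)
        count_ab = sum(1 for s in page_sets if a in s and b in s)
        support = count_ab / total
        confidence = count_ab / count_a
        if support >= min_support and confidence >= min_confidence:
            rules.append((a, b, support, confidence))
    return rules


# Hypothetical sessions, each listing the pages one visitor requested.
sessions = [
    ["/home", "/docs", "/pricing"],
    ["/home", "/docs"],
    ["/home", "/blog"],
    ["/docs", "/pricing"],
]
rules = pair_rules(sessions)
for a, b, support, confidence in rules:
    print(f"{a} -> {b}: support={support:.2f}, confidence={confidence:.2f}")
```

In this sample every session that visits "/pricing" also visits "/docs", so that rule has confidence 1.0, which could suggest linking the two pages more prominently.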



Difference Between Web Content, Web Structure, and Web Usage Mining :

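The comparison covers four columns: Web Content Mining (IR view), Web Content Mining (DB view), Web Structure Mining, and Web Usage Mining.

View of data
  • Web Content (IR view): Unstructured; Structured
  • Web Content (DB view): Semi-structured; Website as DB
  • Web Structure: Link structure
  • Web Usage: Interactivity

Main data
  • Web Content (IR view): Text documents; Hypertext documents
  • Web Content (DB view): Hypertext documents
  • Web Structure: Link structure
  • Web Usage: Server logs; Browser logs

Method
  • Web Content (IR view): Machine learning; Statistical (including NLP)
  • Web Content (DB view): Proprietary algorithm; Association rules
  • Web Structure: Proprietary algorithm
  • Web Usage: Machine learning; Statistical; Association rules

Representation
  • Web Content (IR view): Bag of words, n-gram terms; Phrases, concepts or ontology; Relational
  • Web Content (DB view): Edge-labeled graph; Relational
  • Web Structure: Graph
  • Web Usage: Relational table; Graph

Application Categories
  • Web Content (IR view): Categorization; Clustering; Finding extraction rules; Finding patterns in text
  • Web Content (DB view): Finding frequent substructures; Web site schema discovery
  • Web Structure: Categorization; Clustering
  • Web Usage: Site construction; Adaptation and management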
