
Web Structure Mining

Last Updated : 30 Nov, 2022

Pre-requisites:  Web Mining

Web Structure Mining is one of the three types of techniques in Web Mining. This article focuses solely on Web Structure Mining, which is the technique of discovering structural information from the Web. It uses graph theory to analyze the nodes (pages) and connections (hyperlinks) in the structure of a website.

Process of Web Structure Mining

 

Depending on the type of Web structural data, Web Structure Mining can be categorised into two types:

1. Extracting patterns from hyperlinks in the Web: The Web works through a system of hyperlinks, with pages transferred over the Hypertext Transfer Protocol (HTTP). A hyperlink is a structural component that connects a web page to a different location. Any page can link to any other page, and that page can in turn link to other pages. The intertwined, self-referential nature of the Web lends itself to some unique network-analysis algorithms. The structure of web pages can also be analyzed to examine the pattern of hyperlinks among pages.

Web Graph

 

2. Mining the document structure: This is the analysis of the tree-like structure of a web page to describe its HTML or XML tag usage.

Several terms are associated with Web Structure Mining:

  • Web Graph: A directed graph representing the Web.
  • Node: A web page in the graph.
  • Edge(s): A hyperlink from one web page to another in the Web graph.
  • In-degree: The number of hyperlinks pointing to a particular node in the graph.
  • Degree(s): The number of links originating from a particular node, more precisely called its out-degree.
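The terms above can be made concrete with a small sketch. The graph below is a hypothetical four-page Web graph invented for illustration (the page names `A` to `D` and the `degrees` helper are assumptions, not part of any library); it stores each page with the list of pages it links to and counts in-degrees and out-degrees.

```python
# Hypothetical toy Web graph: each key is a page (node),
# each value lists the pages it links to (outgoing edges).
web_graph = {
    "A": ["B", "C"],
    "B": ["C"],
    "C": ["A"],
    "D": ["C"],
}

def degrees(graph):
    """Return (in_degree, out_degree) dictionaries for a directed graph."""
    out_degree = {page: len(targets) for page, targets in graph.items()}
    in_degree = {page: 0 for page in graph}
    for targets in graph.values():
        for target in targets:
            # Every hyperlink adds one to the in-degree of the page it points to.
            in_degree[target] = in_degree.get(target, 0) + 1
            # A page that is only linked to, never listed as a key, has out-degree 0.
            out_degree.setdefault(target, 0)
    return in_degree, out_degree

in_degree, out_degree = degrees(web_graph)
print(in_degree)   # {'A': 1, 'B': 1, 'C': 3, 'D': 0}
print(out_degree)  # {'A': 2, 'B': 1, 'C': 1, 'D': 1}
```

Here page `C` has the highest in-degree because three other pages link to it, while `D` has an in-degree of zero because nothing links to it.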

All these terms become clearer when viewed on a diagram of a Web graph.

Example of Web Structure Mining:

One such technique is the PageRank algorithm, which Google uses to rank web pages. The rank of a page depends on the number of pages linking to it and on the quality of those links.
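A minimal sketch of this idea is the textbook power-iteration form of PageRank, shown below. It is an illustration only, not Google's production system: the `pagerank` function name, the toy graph, the damping factor of 0.85 (a conventional default) and the fixed iteration count are all assumptions, and the sketch expects every linked page to appear as a key in the graph.

```python
def pagerank(graph, damping=0.85, iterations=50):
    """Power-iteration PageRank on a dict mapping page -> list of linked pages.

    Assumes every link target also appears as a key in `graph`.
    """
    pages = list(graph)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Rank held by pages with no outgoing links is spread evenly over all pages.
        dangling = sum(rank[p] for p in pages if not graph[p])
        new_rank = {p: (1.0 - damping) / n + damping * dangling / n for p in pages}
        for page in pages:
            for target in graph[page]:
                # Each page shares its rank equally among the pages it links to.
                new_rank[target] += damping * rank[page] / len(graph[page])
        rank = new_rank
    return rank

# Hypothetical toy graph, used only for illustration.
toy_graph = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
ranks = pagerank(toy_graph)
for page, score in sorted(ranks.items(), key=lambda kv: -kv[1]):
    print(page, round(score, 3))
```

In this toy graph, page `C` ends up with the highest rank because it receives links from three pages, and page `A` comes next because its only inbound link is from the highly ranked `C`. This reflects the point that both the number and the quality of inbound links matter.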

In summary, Web Structure Mining can be performed either at the document level (intra-page) or at the hyperlink level (inter-page). Research at the hyperlink level is called Hyperlink Analysis. The hyperlink structure can be used to retrieve useful information from the Web.
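Both levels can be illustrated with Python's standard-library `html.parser` module. The sketch below is one possible approach under stated assumptions: the `StructureParser` class, the sample HTML and the `example.com` URL are all hypothetical. It counts tag usage (document level) and collects the targets of `<a>` tags (hyperlink level).

```python
from collections import Counter
from html.parser import HTMLParser
from urllib.parse import urljoin

class StructureParser(HTMLParser):
    """Collects tag counts (document level) and link targets (hyperlink level)."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.tag_counts = Counter()
        self.links = []

    def handle_starttag(self, tag, attrs):
        self.tag_counts[tag] += 1
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                # Resolve relative links against the page's own URL.
                self.links.append(urljoin(self.base_url, href))

# Hypothetical page content and URL, used only for illustration.
sample_html = """
<html><body>
  <h1>Title</h1>
  <p>See <a href="/about">about</a> and <a href="https://example.org/">example</a>.</p>
  <p>No links here.</p>
</body></html>
"""

parser = StructureParser("https://example.com/index.html")
parser.feed(sample_html)
print(parser.tag_counts["p"], parser.tag_counts["a"])  # 2 2
print(parser.links)  # ['https://example.com/about', 'https://example.org/']
```

The collected links are the outgoing edges of this page in the Web graph, so running the same parser over many pages would yield the adjacency lists needed for hyperlink analysis.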

Web Structure Mining has two main approaches, which also serve as two basic strategic models for successful websites:

  • PageRank: refer to Page Rank
  • Hubs and Authorities 

Hubs and Authorities

  • Hubs: These are pages with a large number of interesting links. They serve as a hub, or gathering point, that people visit to access a variety of information. More focused sites can aspire to become a hub for newly emerging areas. The pages of the website themselves can be analyzed for the quality of content that attracts the most users.
  • Authorities: People usually gravitate towards pages that provide the most complete and authentic information on a particular subject. This could be factual information, news, advice, etc. These websites tend to have the largest number of inbound links from other websites.
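The hub and authority ideas are commonly computed with the iterative HITS scheme, in which a page's authority score is the sum of the hub scores of pages linking to it, and its hub score is the sum of the authority scores of the pages it links to. The sketch below is a simplified illustration under stated assumptions: the `hits` function, the page names and the fixed iteration count are hypothetical, and every linked page is expected to appear as a key in the graph.

```python
import math

def hits(graph, iterations=50):
    """Iterative hub/authority scoring on a dict mapping page -> list of linked pages."""
    pages = list(graph)
    hub = {p: 1.0 for p in pages}
    auth = {p: 1.0 for p in pages}
    for _ in range(iterations):
        # Authority: sum of the hub scores of the pages linking to it.
        auth = {p: 0.0 for p in pages}
        for page in pages:
            for target in graph[page]:
                auth[target] += hub[page]
        # Hub: sum of the authority scores of the pages it links to.
        hub = {p: sum(auth[t] for t in graph[p]) for p in pages}
        # Normalise both score vectors so the values stay bounded.
        a_norm = math.sqrt(sum(v * v for v in auth.values())) or 1.0
        h_norm = math.sqrt(sum(v * v for v in hub.values())) or 1.0
        auth = {p: v / a_norm for p, v in auth.items()}
        hub = {p: v / h_norm for p, v in hub.items()}
    return hub, auth

# Hypothetical graph: "portal" links out widely, "reference" is linked to by many pages.
toy_graph = {
    "portal": ["reference", "news", "blog"],
    "blog": ["reference"],
    "news": ["reference"],
    "reference": [],
}
hub_scores, auth_scores = hits(toy_graph)
print(max(hub_scores, key=hub_scores.get))    # portal
print(max(auth_scores, key=auth_scores.get))  # reference
```

In this toy graph, `portal` emerges as the strongest hub because it links to every other page, and `reference` emerges as the strongest authority because every other page links to it.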

Applications of Web Structure Mining:

  • Information retrieval in social networks.
  • Determining the relevance of each web page.
  • Measuring the completeness of websites.
  • Finding relevant information in search engines.
