Web Scraping in PHP Using Simple HTML DOM Parser

Web scraping is a technique for extracting large amounts of data from websites and saving it to a local file or a database, or exposing it through an API. Most websites display their data only through a web browser and offer no way to save a copy of it. The only manual option is to copy and paste the required data, which is tedious and may take hours to complete. Web scraping automates this process: scraping software performs the same task within seconds.

Scraping works by targeting selected DOM elements of a web page and then processing or storing the text they contain. In PHP, this can be done with the Simple HTML DOM Parser, a library that parses the whole page and looks for the required elements within the DOM. To know more about web scraping, visit this article.
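To make the idea of "targeting DOM elements and keeping their text" concrete, here is a minimal sketch. It deliberately uses PHP's built-in DOMDocument and DOMXPath classes instead of Simple HTML DOM so that it runs without any download. The HTML fragment and its class names are made up for illustration.

```php
<?php
// Hypothetical page fragment standing in for a downloaded web page.
$page = '<html><body>'
      . '<h1 class="title">Price list</h1>'
      . '<ul><li class="item">Tea</li><li class="item">Coffee</li></ul>'
      . '</body></html>';

// Parse the markup into a DOM tree. libxml warnings are collected
// internally instead of being printed, since real pages are rarely valid.
libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadHTML($page);
libxml_clear_errors();

// Target the selected elements: every <li> whose class is "item".
$xpath = new DOMXPath($dom);
$items = [];
foreach ($xpath->query('//li[@class="item"]') as $node) {
    // Keep only the text between the opening and closing tags.
    $items[] = trim($node->textContent);
}

// The extracted text could now be stored in a file or database,
// or returned from an API. Here it is simply printed.
echo implode(', ', $items), PHP_EOL; // prints "Tea, Coffee"
```

Simple HTML DOM follows the same parse, select, extract pattern, but selects elements with CSS-style selectors rather than XPath.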

The library is distributed as a single file, simple_html_dom.php, and can be downloaded from the project's page on SourceForge.

Example 1: The following example uses this library to fetch a Google search results page and display it on localhost.



  • HTML Code:
    <!DOCTYPE html>
    <html lang="en">
      
    <head>
        <meta charset="UTF-8">
          
        <meta name="viewport" content=
            "width=device-width, initial-scale=1.0">
          
        <meta http-equiv="X-UA-Compatible" content="ie=edge">
          
        <title>Document</title>
    </head>
      
    <body>
        <form action="GoogleSearch.php" method="POST">
            <input type="text" name="search">
              
            <br><br>
              
            <button>
                Search
            </button>
        </form>
    </body>
      
    </html>


  • PHP code:
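    <?php

    // Load the parser; this assumes simple_html_dom.php
    // is in the same directory as this script
    include('simple_html_dom.php');

    // Read the search term and URL-encode it so that spaces
    // and special characters do not break the query string
    $query = urlencode($_POST["search"] ?? '');

    // Fetch the results page and parse it into a DOM object
    $html = file_get_html(
        'https://www.google.com/search?q=' . $query);

    // file_get_html() returns false if the page could not be loaded
    if ($html === false) {
        die('Could not load the search results.');
    }

    // Display the fetched DOM
    echo $html;

    ?>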

    

  • Output: The local server displays the Google search results page for the submitted query.
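Because the search term in Example 1 comes straight from the form, it needs to be encoded before it becomes part of the request URL. The sketch below shows one way to do that with PHP's built-in http_build_query(). The helper name buildSearchUrl is hypothetical and is not part of Simple HTML DOM.

```php
<?php
// Hypothetical helper: builds a search URL from raw user input.
function buildSearchUrl(string $term): string
{
    // http_build_query() percent-encodes the value, so spaces and
    // characters such as "&" or "#" cannot alter the query string.
    return 'https://www.google.com/search?' . http_build_query(['q' => $term]);
}

// prints "https://www.google.com/search?q=php+%26+web+scraping"
echo buildSearchUrl('php & web scraping'), PHP_EOL;
```

The returned string can be passed directly to file_get_html() in place of a hand-concatenated URL.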

    Example 2: Here we access the text of the first Google search result. To do this, we first fetch the DOM elements that hold the results for the query. The code fetches the div tags with class ‘kCrYT’, which hold the details of all the results. Since only the first one is needed, the loop iterates only once. Class names such as ‘kCrYT’ are generated by Google and may change, so the selector may need updating.

    • PHP code: This code works only if a search term has already been submitted through the search form.
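      <?php

      include('simple_html_dom.php');

      // URL-encode the search term before adding it to the query string
      $query = urlencode($_POST["search"] ?? '');

      $html = file_get_html(
          'https://www.google.com/search?q=' . $query);

      // Stop if the page could not be loaded
      if ($html === false) {
          die('Could not load the search results.');
      }

      // Print the text of the first div with class "kCrYT" only
      foreach ($html->find('div.kCrYT') as $element) {
          echo $element->plaintext;
          break;
      }
      ?>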

      

    • Output:
      GeeksforGeeks is a very fast-growing community among programmers
      and have a reach of around 10 million+ readers globally. Writing will
      surely enhance your knowledge of the subject as before writing any
       topic, you need to be very crisp and clear about it.
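The loop that stops after one pass in Example 2 can also be written without a loop, because find() in Simple HTML DOM accepts an index as its second argument and then returns a single element (or null) rather than an array. The following is a sketch under that assumption. The helper name firstPlainText is hypothetical, and $html is assumed to be the object returned by file_get_html() or str_get_html().

```php
<?php
// Hypothetical helper: returns the text of the first element matching
// $selector, or null when nothing matches. $html is assumed to be the
// object returned by file_get_html() or str_get_html().
function firstPlainText($html, string $selector): ?string
{
    // With an index as the second argument, find() returns a single
    // element (or null) instead of an array of matches.
    $element = $html->find($selector, 0);

    return $element === null ? null : trim($element->plaintext);
}

// Usage, assuming simple_html_dom.php has been included:
// $html = file_get_html('https://www.google.com/search?q=geeksforgeeks');
// if ($html !== false) {
//     echo firstPlainText($html, 'div.kCrYT') ?? 'No matching element found.';
// }
```

Returning null for "no match" lets the caller decide what to show when the selector no longer matches the page.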



    I'm a final-year MCA student at Panjab University, Chandigarh, one of the most prestigious universities of India. I am skilled in various aspects of web development and AI. I have worked as a freelancer at Upwork and thus have knowledge of various aspects related to NLP, image processing and web
