Web scraping is a technique for extracting large amounts of data from websites and saving it to a local file on your computer or to a database, or exposing it through an API. Data displayed by most websites can only be viewed in a web browser; the sites do not offer a way to save a copy of this data for your own use. The only option is then to copy and paste the required data by hand, which is very tedious and may take hours to complete. Web scraping automates this process: instead of manual work, web scraping software performs the same task within seconds.
Web scraping works by targeting selected DOM elements of a web page and then processing or storing the text inside them. To do this in PHP, there is an API that parses the whole page and looks for the required elements within the DOM: the Simple HTML DOM Parser. To know more about web scraping, visit this article.
The Simple HTML DOM Parser can be downloaded by clicking this link.
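To make the idea of targeting DOM elements concrete, here is a minimal sketch. It uses PHP's built-in DOMDocument and DOMXPath classes instead of the Simple HTML DOM Parser, so it runs without any download or network access. The sample markup, the `price` class name, and the `extractTexts` helper are made up for illustration.

```php
<?php
// Hypothetical helper (not part of any library): returns the trimmed text
// of every element matched by an XPath expression in an HTML string.
function extractTexts(string $html, string $xpathQuery): array
{
    $doc = new DOMDocument();
    // Collect libxml warnings internally instead of printing them,
    // since real-world HTML is rarely perfectly valid.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $texts = [];
    foreach ($xpath->query($xpathQuery) as $node) {
        $texts[] = trim($node->textContent);
    }
    return $texts;
}

// Made-up sample page standing in for a downloaded document.
$page = '<html><body>'
      . '<h1>Products</h1>'
      . '<span class="price">10</span>'
      . '<span class="price">25</span>'
      . '</body></html>';

// Target every <span class="price"> and collect its text.
$prices = extractTexts($page, '//span[@class="price"]');
print_r($prices);
```

The same pattern applies to a real page: download the markup, select the elements of interest, and then store or process their text.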
Example 1: The following example uses this API to display a Google search on localhost.
- HTML Code:
<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="UTF-8">
    <meta name="viewport" content="width=device-width, initial-scale=1.0">
    <meta http-equiv="X-UA-Compatible" content="ie=edge">
    <title>Document</title>
</head>
<body>
    <form action="GoogleSearch.php" method="POST">
        <input type="text" name="search">
        <br><br>
        <button>Search</button>
    </form>
</body>
</html>
- PHP code:
<?php
// In case the file is in the API directory
include('simple_html_dom.php');

// Extracting DOM
$html = file_get_html(/* URL argument missing in the source */);

// Displaying DOM
echo $html;
?>
- PHP code: This code works only if you have already searched for something on the Google search engine.
<?php
include('simple_html_dom.php');

$html = file_get_html(/* URL argument missing in the source */);

// Print the text of the first matching element only
foreach ($html->find('div.kCrYT') as $elements) {
    echo $elements->plaintext;
    break;
}
?>
- Output:
GeeksforGeeks is a very fast-growing community among programmers and have a reach of around 10 million+ readers globally. Writing will surely enhance your knowledge of the subject as before writing any topic, you need to be very crisp and clear about it.
Output: The local server displays the fetched Google search page.
Example 2: Here we try to access the text of the first Google search result. To do so, we first fetch the DOM component that holds the first result for the query sent to Google. We fetch the div elements with class ‘kCrYT’ from the DOM, which hold the details of all results; since only the first one is needed, the loop runs only once.
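The "first match only" idea can also be sketched without any network access. The snippet below is an illustration under stated assumptions, not the article's own code: it uses PHP's built-in DOMDocument and DOMXPath classes rather than the Simple HTML DOM Parser, the `firstTextByClass` helper is hypothetical, and the markup is a made-up stand-in for a results page.

```php
<?php
// Hypothetical helper: text of the first <$tag> element carrying $class,
// or null when nothing matches. $tag and $class are assumed to be plain
// names without quotes, since they are placed into the XPath as-is.
function firstTextByClass(string $html, string $tag, string $class): ?string
{
    $doc = new DOMDocument();
    // Collect libxml warnings internally instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    // Match the class as a whole word inside the class attribute.
    $query = sprintf(
        '//%s[contains(concat(" ", normalize-space(@class), " "), " %s ")]',
        $tag,
        $class
    );
    foreach ($xpath->query($query) as $node) {
        // Return inside the loop: only the first match is needed.
        return trim($node->textContent);
    }
    return null;
}

// Made-up markup imitating a page with several matching blocks.
$results = '<html><body>'
         . '<div class="kCrYT">First result text</div>'
         . '<div class="kCrYT extra">Second result text</div>'
         . '</body></html>';

echo firstTextByClass($results, 'div', 'kCrYT'), PHP_EOL;
```

Returning from inside the loop plays the same role as the `break` statement in a loop that should run only once. With the Simple HTML DOM Parser itself, passing an index as the second argument, as in `find('div.kCrYT', 0)`, is another way to ask for just the first match.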