How to Extract Script and CSS Files from Web Pages in Python?
In this article, we will discuss how to extract script and CSS files from web pages using Python.
To do this, we collect the CSS and JavaScript files that the page's source code links to. First, we decide which URL to scrape and send a request to it. After retrieving the page's content, two folders are created, one for each file type, and the files are placed into them. We can then perform whatever operations we need on them. The examples below focus on the first part: finding the links to those files.
Modules Needed
- bs4: Beautiful Soup (bs4) is a Python library for pulling data out of HTML and XML files. It is not part of the Python standard library and must be installed separately.
- requests: Requests lets you send HTTP/1.1 requests with very little code. It is also not built into Python and must be installed separately.
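Since neither module ships with Python, it can help to confirm that both are importable before running the examples. The snippet below is a minimal sketch using the standard library's importlib; the helper name missing_modules is an assumption made for illustration. If a module is reported missing, it can usually be installed with pip install beautifulsoup4 requests.

```python
from importlib.util import find_spec


def missing_modules(names):
    """Return the module names from names that cannot be found for import."""
    return [name for name in names if find_spec(name) is None]


# "bs4" is the import name of the beautifulsoup4 package
missing = missing_modules(["bs4", "requests"])
if missing:
    print("Missing modules:", ", ".join(missing))
else:
    print("Both modules are available")
```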
Example 1:
Here we count the number of links found for each file type.
Python3
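import requests
from bs4 import BeautifulSoup
from urllib.parse import urljoin

# Placeholder URL; replace it with the page you want to scrape
web_url = "https://example.com/"

# Fetch the page and parse its HTML
html = requests.get(web_url).content
soup = BeautifulSoup(html, "html.parser")

js_files = []
cs_files = []

# Collect the src of every external <script> tag
for script in soup.find_all("script"):
    if script.attrs.get("src"):
        url = script.attrs.get("src")
        # urljoin handles both relative and absolute src values
        js_files.append(urljoin(web_url, url))

# Collect the href of every <link> tag (this also picks up non-CSS links such as icons)
for css in soup.find_all("link"):
    if css.attrs.get("href"):
        _url = css.attrs.get("href")
        cs_files.append(urljoin(web_url, _url))

print(f"Total {len(js_files)} javascript files found")
print(f"Total {len(cs_files)} CSS files found")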
Output (the counts depend on the page being scraped):
Total 7 javascript files found
Total 14 CSS files found
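The example above treats every <link> tag as a CSS file, although such tags can also point to icons, feeds, and other resources. The sketch below is one possible variant, not the article's original method: it keeps only <link> tags whose rel attribute contains stylesheet, and it resolves each address against the page URL with urljoin so that relative and absolute addresses are both handled. The function name collect_assets, the sample HTML, and the example.com addresses are assumptions made for illustration.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup


def collect_assets(html, base_url):
    """Return (script URLs, stylesheet URLs) found in html, resolved against base_url."""
    soup = BeautifulSoup(html, "html.parser")
    scripts = [
        urljoin(base_url, tag["src"])
        for tag in soup.find_all("script")
        if tag.get("src")
    ]
    styles = [
        urljoin(base_url, tag["href"])
        for tag in soup.find_all("link")
        # bs4 returns rel as a list of values, e.g. ["stylesheet"]
        if tag.get("href") and "stylesheet" in [r.lower() for r in tag.get("rel") or []]
    ]
    return scripts, styles


# Hypothetical page content used only to demonstrate the function
sample_html = """
<html><head>
  <link rel="stylesheet" href="/css/site.css">
  <link rel="icon" href="/favicon.ico">
  <script src="js/app.js"></script>
  <script src="https://cdn.example.org/lib.js"></script>
  <script>console.log("inline")</script>
</head></html>
"""

scripts, styles = collect_assets(sample_html, "https://example.com/blog/")
print(scripts)
print(styles)
```

With this sample, the icon link and the inline script are skipped, and the relative addresses are expanded into full URLs.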
We can also use file handling to write the fetched links to text files.
Example 2:
Python3
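import requests
from bs4 import BeautifulSoup
from urllib.parse import urljoin

# Placeholder URL; replace it with the page you want to scrape
web_url = "https://example.com/"

# Fetch the page and parse its HTML
html = requests.get(web_url).content
soup = BeautifulSoup(html, "html.parser")

js_files = []
cs_files = []

# Collect the src of every external <script> tag
for script in soup.find_all("script"):
    if script.attrs.get("src"):
        url = script.attrs.get("src")
        js_files.append(urljoin(web_url, url))

# Collect the href of every <link> tag
for css in soup.find_all("link"):
    if css.attrs.get("href"):
        _url = css.attrs.get("href")
        cs_files.append(urljoin(web_url, _url))

# Write the JavaScript links to a text file, one per line
with open("javascript_files.txt", "w") as f:
    for js_file in js_files:
        print(js_file, file=f)

# Write the CSS links to a text file, one per line
with open("css_files.txt", "w") as f:
    for css_file in cs_files:
        print(css_file, file=f)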
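Output: the script prints nothing to the console. It creates two text files in the current directory, one listing the JavaScript links and one listing the CSS links, with one URL per line.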
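The introduction mentions placing the files into two folders, while the examples stop at collecting links. A minimal sketch of that remaining step might look like the following. The helper names local_name and download_all, the folder names, and the 10-second timeout are assumptions made for illustration. Files that share the same name would overwrite each other in this simple version.

```python
import os
from urllib.parse import urlparse

import requests


def local_name(file_url, fallback):
    """Derive a file name from the URL path, or use fallback if the path has none."""
    name = os.path.basename(urlparse(file_url).path)
    return name or fallback


def download_all(file_urls, folder):
    """Save each URL's content into folder and return the list of written paths."""
    os.makedirs(folder, exist_ok=True)
    saved = []
    for index, file_url in enumerate(file_urls):
        response = requests.get(file_url, timeout=10)
        if response.status_code != 200:
            continue  # skip files that could not be fetched
        path = os.path.join(folder, local_name(file_url, f"file_{index}"))
        with open(path, "wb") as f:
            f.write(response.content)
        saved.append(path)
    return saved
```

Assuming js_files and cs_files hold the links collected earlier, calling download_all(js_files, "js") and download_all(cs_files, "css") would fill one folder per file type.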
Last Updated: 08 Sep, 2021