Spoofing IP address when web scraping using Python

In this article, we are going to scrape a website using Requests while rotating proxies in Python.

Modules Required

Syntax:

requests.get(url, parameter)

Approach

Syntax:

requests.get(url, proxies=proxies)
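As a minimal sketch of the proxies argument, the helper below builds the mapping that sends both HTTP and HTTPS traffic through a single proxy. The names make_proxies and get_via_proxy are hypothetical and chosen only for illustration; the timeout value is likewise an assumption.

```python
def make_proxies(proxy_url):
    """Build the mapping that requests accepts for its `proxies` argument."""
    # The keys name the scheme of the target URL; here both schemes
    # are routed through the same proxy server.
    return {"http": proxy_url, "https": proxy_url}


def get_via_proxy(url, proxy_url, timeout=5):
    """Send a GET request to `url` through `proxy_url` (hypothetical helper)."""
    import requests  # third-party package: pip install requests

    # The timeout (an assumed value) keeps a dead proxy from hanging the script.
    return requests.get(url, proxies=make_proxies(proxy_url), timeout=timeout)
```

Calling get_via_proxy('https://ipecho.net/plain', 'http://78.47.16.54:80') would then issue the request through that proxy, provided the proxy is still reachable.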

Apart from the code itself, a few more things need to be set up. The details of these setups are given below.

Using RapidAPI to get a set of proxies:

Syntax:

headers = {
    'x-rapidapi-key': "paste_api_key_here",
    'x-rapidapi-host': "proxy-orbit1.p.rapidapi.com"
}

Syntax:

response = requests.request("GET", url, headers=headers)

Syntax:

response = json.loads(response.text)
proxy = response['curl']
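The parsing step can be tried without an API key. The sketch below assumes only what the article states, namely that the proxy address sits under the 'curl' key; the sample body is made up for illustration and is not a real API response, and extract_proxy is a hypothetical helper name.

```python
import json


def extract_proxy(body_text):
    """Parse a JSON response body and return the proxy address under 'curl'."""
    data = json.loads(body_text)
    proxy = data.get("curl")
    if not proxy:
        # Fail loudly instead of passing None on to requests.get().
        raise ValueError("response has no 'curl' field")
    return proxy


# Made-up body for illustration; a real response may carry more fields.
sample_body = '{"curl": "http://78.47.16.54:80"}'
print(extract_proxy(sample_body))  # prints http://78.47.16.54:80
```

When working with a live requests response object, response.json() performs the same parsing as json.loads(response.text).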

Sending the proxy to requests.get() as a parameter:

Send a GET request with requests.get(), passing the proxy, to https://ipecho.net/plain. This URL returns the IP address that the server sees for the current session, which is the proxy server's address when the proxy is working.

Syntax:

# Note: opening https://ipecho.net/plain in a browser shows the current IP address of the session.
proxy = 'http://78.47.16.54:80'
page = requests.get('https://ipecho.net/plain', proxies={"http": proxy, "https": proxy})
print(page.text)
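Free proxies are often slow or dead, so a timeout is worth adding to this check. The following is a rough sketch rather than a definitive implementation: fetch_ip is a hypothetical helper, the 5-second timeout is an assumed value, and the optional get parameter exists only so the function can be exercised with a stub instead of a live proxy.

```python
def fetch_ip(proxy, get=None, timeout=5):
    """Return the IP address that ipecho.net reports when going through `proxy`."""
    if get is None:
        # Imported lazily so a stub can be passed in without the package.
        import requests  # third-party package: pip install requests
        get = requests.get

    page = get("https://ipecho.net/plain",
               proxies={"http": proxy, "https": proxy},
               timeout=timeout)
    # Strip stray whitespace so the value can be compared directly.
    return page.text.strip()
```

If the returned address differs from the one shown when the same URL is opened without a proxy, the request went out through the proxy.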

Program:




import requests
import json
  
  
# Gets proxies from rapidapi to create
# a set of proxies.
# Use this function only if you have rapidapi key.
def create_proxy():
  
    # Initialise the headers and paste the API key
    # of proxy-orbit1 from rapidapi.
    headers = {
        'x-rapidapi-key': "paste_api_key_here",
        'x-rapidapi-host': "proxy-orbit1.p.rapidapi.com"
    }
  
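    # The endpoint URL is not given in this article. Paste the
    # proxy-orbit1 endpoint shown on your rapidapi dashboard here.
    url = "paste_api_url_here"

    # Sends a GET request to the above url along with the api
    # key. The response body is JSON text, which is parsed
    # into a dictionary using json.loads.
    response = requests.request("GET", url, headers=headers)
    response = json.loads(response.text)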
  
    # The proxy server ip address is present in 'curl' key.
    proxy = response['curl']
    return proxy
  
  
# Main Function
if __name__ == "__main__":
  
    # Create an empty set and call the create_proxy()
    # function to generate a set of proxies from rapidapi.
    # Orbit proxy Rapid api key is required.
    proxies = set()
    print("Creating Proxy List")
    for __ in range(10):
        proxies.add(create_proxy())
  
    # If you do not have a rapidapi key, create the set of
    # proxies manually instead, for example:
    # proxies = {'http://78.47.16.54:80'}
  
    # Iterate the proxies and check if it is working.
    for proxy in proxies:
        print("\nChecking proxy:", proxy)
        try:
  
            # https://ipecho.net/plain returns the ip address
            # of the current session if a GET request is sent.
            page = requests.get('https://ipecho.net/plain',
                                proxies={"http": proxy, "https": proxy})
            print("Status OK, Output:", page.text)
        except OSError as e:
  
            # Proxy returns Connection error
            print(e)
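The program above checks each proxy once. To actually rotate through the set while scraping, one possible approach is a round-robin loop that moves on to the next proxy whenever a request fails. This is a sketch under stated assumptions, not the article's own method: fetch_with_rotation, requests_fetch, and max_attempts are hypothetical names, and the fetch callable is passed in so the rotation logic can be tried without a network connection.

```python
from itertools import cycle


def fetch_with_rotation(url, proxies, fetch, max_attempts=5):
    """Try proxies in round-robin order until one of them succeeds.

    `fetch(url, proxy)` must return the page body or raise OSError.
    Returns a (proxy, body) tuple for the first proxy that works.
    """
    if not proxies:
        raise ValueError("proxy set is empty")

    # Sets are unordered, so sort them to get a repeatable rotation.
    pool = cycle(sorted(proxies))
    last_error = None
    for _ in range(max_attempts):
        proxy = next(pool)
        try:
            return proxy, fetch(url, proxy)
        except OSError as e:
            # Connection errors raised by requests derive from OSError.
            last_error = e
    raise ConnectionError(
        f"no working proxy after {max_attempts} attempts") from last_error


def requests_fetch(url, proxy, timeout=5):
    """Fetch `url` through `proxy` using requests (timeout is an assumed value)."""
    import requests  # third-party package: pip install requests

    page = requests.get(url, proxies={"http": proxy, "https": proxy},
                        timeout=timeout)
    return page.text
```

With the set built in the main function, fetch_with_rotation('https://ipecho.net/plain', proxies, requests_fetch) would return the first proxy that answered together with the page text.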
