Pagination using Scrapy – Web Scraping with Python

Web scraping is a technique for fetching information from websites, and Scrapy is a Python framework for web scraping. Getting data from a normal website is straightforward: pull the HTML of the page and extract the data by filtering tags. But what about when the data you are trying to fetch is paginated? For example, Amazon's product listings span multiple pages, and to scrape all products successfully you need the concept of pagination.

Pagination: Pagination, also known as paging, is the process of dividing a document into discrete pages, i.e. splitting the data into bundles served on separate pages. Each of these pages has its own URL, so we need to take these URLs one by one and scrape the pages. The thing to keep in mind is when to stop. Pages generally have a next button: it stays enabled while more pages exist and becomes disabled once the pages are finished. So the method is to keep following the URL of the next page while the next button is enabled, and stop scraping once it is disabled.
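The loop described above can be sketched independently of Scrapy or any particular site. Here fetch_page and the site dictionary are illustrative stand-ins for a real HTTP request, not part of any library:

```python
# Minimal sketch of the pagination idea: keep requesting pages while a
# "next" link exists, and stop when it is gone (the disabled next button).

def fetch_page(url, site):
    """Stand-in for an HTTP request; returns the page stored under url."""
    return site[url]

def scrape_all(start_url, site):
    items, url = [], start_url
    while url is not None:        # the "is the next button enabled?" test
        page = fetch_page(url, site)
        items.extend(page["items"])
        url = page["next"]        # becomes None on the last page
    return items

# Toy three-page site
site = {
    "/page1": {"items": ["a", "b"], "next": "/page2"},
    "/page2": {"items": ["c"], "next": "/page3"},
    "/page3": {"items": ["d"], "next": None},
}
```

Running scrape_all("/page1", site) walks all three pages and collects ["a", "b", "c", "d"]; the real spider below does the same thing with HTTP requests and XPath.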

Project to apply pagination using Scrapy

The project below scrapes mobile phone details from the Amazon site and applies pagination.
The scraped details are the name and price of each mobile, and pagination is used to collect every result for the searched URL.

Logic behind pagination:
Here the next_page variable gets the URL of the next page only if a next page is available; if no page is left, the if condition evaluates to false.

next_page = response.xpath("//div/div/ul/li[@class='a-last']/a/@href").get()
if next_page:
    abs_url = f"https://www.amazon.in{next_page}"
    yield scrapy.Request(
        url=abs_url,
        callback=self.parse
    )
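The same "is there a next page?" check can be illustrated on a self-contained HTML fragment using only the standard library. The fragment and the a-last class mirror the Amazon example; xml.etree here is just a stand-in for Scrapy's selectors:

```python
# Extract the next-page link from a tiny HTML fragment, or None if absent.
import xml.etree.ElementTree as ET

html = """
<div><div><ul>
  <li class="a-last"><a href="/page2">Next</a></li>
</ul></div></div>
"""

root = ET.fromstring(html)
li = root.find(".//li[@class='a-last']")                 # the "next" button
next_page = li.find("a").get("href") if li is not None else None
# next_page is "/page2"; it would be None on the last page
```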



Note:

abs_url = f"https://www.amazon.in{next_page}"

We need to prepend https://www.amazon.in because next_page is a relative URL such as /page2; that is incomplete on its own, and the complete URL is https://www.amazon.in/page2.
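An alternative to building the URL with an f-string is urljoin from the standard library, which is what Scrapy's response.urljoin uses under the hood; it resolves relative links like /page2 against any base URL:

```python
from urllib.parse import urljoin

# Resolve a relative next-page link against the current page's URL.
base = "https://www.amazon.in/s?k=mobile"
abs_url = urljoin(base, "/page2")
print(abs_url)  # -> https://www.amazon.in/page2
```

Inside a spider, the even shorter form is yield response.follow(next_page, callback=self.parse), which joins the URL and builds the Request in one step.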

  • Fetch the XPath of each detail to be scraped. The XPaths used in this project are:
    XPath of items: //div[@class='s-include-content-margin s-border-bottom s-latency-cf-section']
    XPath of name: .//span[@class='a-size-medium a-color-base a-text-normal']/text()
    XPath of price: .//span[@class='a-price-whole']/text()
    XPath of next page: //div/div/ul/li[@class='a-last']/a/@href

  • Spider Code: the spider below scrapes the name and price from the Amazon site and applies pagination.


    import scrapy

    class MobilesSpider(scrapy.Spider):
        name = 'mobiles'

        # create the initial request
        def start_requests(self):
            yield scrapy.Request(
                url='https://www.amazon.in/s?k=xiome+mobile+phone&crid'
                    '=2AT2IRC7IKO1K&sprefix=xiome%2Caps%2C302&ref=nb_sb_ss_i_1_5',
                callback=self.parse
            )

        # parse the products on the current page
        def parse(self, response):
            products = response.xpath("//div[@class='s-include-content-margin s-border-bottom s-latency-cf-section']")
            for product in products:
                yield {
                    'name': product.xpath(".//span[@class='a-size-medium a-color-base a-text-normal']/text()").get(),
                    'price': product.xpath(".//span[@class='a-price-whole']/text()").get()
                }

            # follow the next page while the next button is still enabled
            next_page = response.xpath("//div/div/ul/li[@class='a-last']/a/@href").get()
            if next_page:
                abs_url = f"https://www.amazon.in{next_page}"
                yield scrapy.Request(
                    url=abs_url,
                    callback=self.parse
                )
            else:
                print('No Page Left')
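Assuming the spider above is saved as mobiles.py, it can be run with Scrapy's command-line tool; the output file name here is arbitrary, and the -o flag exports the yielded items:

```shell
# Run the standalone spider and export the scraped items to JSON
scrapy runspider mobiles.py -o mobiles.json
```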


    
    

Scraped Results:
