Pagination using Scrapy – Web Scraping with Python

  • Last Updated : 30 Sep, 2021

Web scraping is a technique for fetching information from websites, and Scrapy is a Python framework for web scraping. Getting data from a single page is straightforward: pull the HTML of the page and extract the data by filtering tags. But what if the data you want to fetch is paginated? For example, Amazon's product listings span multiple pages, and to scrape all the products successfully you need the concept of pagination.

Pagination: Pagination, also known as paging, is the process of dividing a document into discrete pages, i.e. splitting the data across pages that each have their own URL. So we need to take these URLs one by one and scrape each page. The thing to keep in mind is when to stop. Pages generally have a next button; it stays enabled while more pages exist and becomes disabled on the last page. So we keep fetching the URL of the next page as long as the next button is enabled, and stop when it is disabled, because no page is left to scrape.
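The stop condition can be sketched independently of Scrapy. In this minimal sketch, a hypothetical site map records each page's next-page link (None on the last page, where the next button is disabled), and the loop follows links until none is left:

```python
# Hypothetical site map: each URL points to the URL of its next page,
# or None when the next button is disabled (last page).
site = {
    "/page1": "/page2",
    "/page2": "/page3",
    "/page3": None,
}

def crawl(start):
    visited = []
    url = start
    while url:                 # stop once there is no next page
        visited.append(url)
        url = site[url]        # follow the next-page link
    return visited

print(crawl("/page1"))  # → ['/page1', '/page2', '/page3']
```

The spider code later in the article does the same thing, except each "follow the link" step is a new scrapy.Request.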


Project to apply pagination using Scrapy

In the project below, we scrape mobile details from the Amazon site and apply pagination.
The scraped details are the name and price of each mobile, and pagination is used to collect every result for the searched URL.



Logic behind pagination:
Here the next_page variable gets the URL of the next page only if a next page is available; when no page is left, the XPath returns None and the if condition is false.




next_page = response.xpath("//div/div/ul/li[@class='a-last']/a/@href").get()
if next_page:
    abs_url = f"https://www.amazon.in{next_page}"
    yield scrapy.Request(
        url=abs_url,
        callback=self.parse
    )

Note:

abs_url = f"https://www.amazon.in{next_page}"

We need to prepend https://www.amazon.in because next_page is a relative URL such as /page2, which is incomplete on its own; the complete URL is https://www.amazon.in/page2.
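An alternative to string concatenation, sketched below with the standard library's urljoin (Scrapy responses also expose a response.urljoin helper that resolves relative links the same way), is to join the relative link against the base URL:

```python
from urllib.parse import urljoin

base = "https://www.amazon.in"
next_page = "/page2"          # relative URL extracted from the next button

abs_url = urljoin(base, next_page)
print(abs_url)  # → https://www.amazon.in/page2
```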

  • Fetch the XPath of each detail to be scraped –
    Use the browser's inspect tool to copy the XPath of the items container, the name, the price, and the next-page button (the exact expressions are used in the spider code below).

  • Spider Code: Scraping the name and price from the Amazon site and applying pagination in the code below.




    import scrapy


    class MobilesSpider(scrapy.Spider):
        name = 'mobiles'

        # create the initial request
        def start_requests(self):
            yield scrapy.Request(
                url='https://www.amazon.in/s?k=xiome+mobile+phone&crid'
                    '=2AT2IRC7IKO1K&sprefix=xiome%2Caps%2C302&ref=nb_sb_ss_i_1_5',
                callback=self.parse
            )

        # parse the products on one result page
        def parse(self, response):
            products = response.xpath("//div[@class='s-include-content-margin s-border-bottom s-latency-cf-section']")
            for product in products:
                yield {
                    'name': product.xpath(".//span[@class='a-size-medium a-color-base a-text-normal']/text()").get(),
                    'price': product.xpath(".//span[@class='a-price-whole']/text()").get()
                }

            # follow the next page while the next button is enabled
            next_page = response.xpath("//div/div/ul/li[@class='a-last']/a/@href").get()
            if next_page:
                abs_url = f"https://www.amazon.in{next_page}"
                yield scrapy.Request(
                    url=abs_url,
                    callback=self.parse
                )
            else:
                print('No Page Left')
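The price field comes back as raw text; on pages like this it is typically a comma-grouped string such as "12,999" (an assumption about the page's formatting, not something the spider above guarantees), so a small post-processing step can convert it to a number:

```python
def parse_price(text):
    """Convert a scraped price string like '12,999' to an int; None stays None."""
    if text is None:          # price element was missing on the page
        return None
    return int(text.replace(",", "").strip())

print(parse_price("12,999"))  # → 12999
print(parse_price(None))      # → None
```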

Scraped Results:



