
Waybackurls – Fetch all the URLs that the Wayback Machine knows about for a domain

Last Updated: 23 Aug, 2021

Web crawling is an important aspect of security testing, as it is the process of indexing the data on web pages using automated scripts or crawling programs. These scripts or programs are known as web crawlers, spiders, or spider bots. Waybackurls is a Golang-based tool that reads domains on stdin, fetches the URLs that the Wayback Machine (the Internet Archive) knows about for *.targetdomain, and writes them to stdout.
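Because the tool reads domains on stdin and writes plain URLs to stdout, it slots naturally into shell pipelines. As a minimal sketch (the file names domains.txt and urls.txt are only illustrative), you can feed it a list of domains and save everything it returns:

cat domains.txt | waybackurls > urls.txt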

Note: As Waybackurls is a Golang-based tool, you need a Golang environment on your system. Check this link to install Golang on your system – How to Install Go Programming Language in Linux

Installation of Waybackurls Tool on Kali Linux Machine

Step 1: If you have already installed Golang on your system, verify the installation by checking the Golang version with the following command.

go version

Step 2: Fetch the Waybackurls tool through the Go utility with the following command.

sudo go get github.com/tomnomnom/waybackurls
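
Note: On recent Go releases (1.17 and later), go get no longer installs binaries, so the command above may fail. If it does, the usual alternative is go install with an explicit version, and you should make sure the Go bin directory (by default $HOME/go/bin) is on your PATH:

go install github.com/tomnomnom/waybackurls@latest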

Step 3: Check the help menu to get a better understanding of the tool, using the following command.

waybackurls -h

Working with Waybackurls Tool

Example 1: Simple Scan

waybackurls geeksforgeeks.org
  • As shown in the picture below, we enter the command to collect all the known Wayback URLs for our target, geeksforgeeks.org; the tool collects the URLs and prints them directly in the terminal.

  • In the picture below, you can see we were successfully able to collect the Wayback URLs for our target domain, geeksforgeeks.org; almost every known URL is returned by the Waybackurls tool. A small sketch for saving these results to a file follows this list.
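If you want to keep the results for later filtering, a simple sketch (the file name gfg-urls.txt is just an example) is to redirect the output to a file and count how many URLs were collected:

waybackurls geeksforgeeks.org > gfg-urls.txt
wc -l gfg-urls.txt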

Example 2: Using -no-subs Tag

echo "geeksforgeeks.org" | waybackurls -no-subs
  • In this example, our target is geeksforgeeks.org and we have provided the -no-subs flag. With this flag, URLs are fetched only for the main domain; no subdomains are considered while crawling the URLs.

  • In the below screenshot, you can see the waybackurls tool has fetched some URLs, and the interesting thing to note is that it fetched only URLs belonging to the main domain; no subdomains were considered while crawling. A short filtering sketch follows this list.
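Since the output is plain text on stdout, -no-subs combines naturally with ordinary shell filters. For instance, a rough sketch for keeping only the main-domain URLs that carry query parameters (the grep pattern is purely illustrative):

echo "geeksforgeeks.org" | waybackurls -no-subs | grep "="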

Example 3: Using -dates Tag

echo "geeksforgeeks.org" | waybackurls -dates
  • In this example, our target is geeksforgeeks.org and we are using the -dates flag to get a date in the first column. It shows when the Wayback Machine fetched that particular URL.

  • In the below screenshot, you can see that we have got dates in the first column stating the exact fetch date of each URL in the Wayback Machine. For example, the link https://www.geeksforgeeks.org/find-subarray-with-given-sum/ref=leftbar-rightbar was fetched on 2020-09-30, and the time, 22:51:11, is mentioned along with it. A small sorting sketch follows this list.
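Because the timestamp appears as the first column, the output can be ordered chronologically with standard shell tools; a minimal sketch:

echo "geeksforgeeks.org" | waybackurls -dates | sort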

Example 4: Using -get-versions Tag

echo "geeksforgeeks.org" | waybackurls -get-versions
  • In this example, our target is geeksforgeeks.org and the -get-versions flag is used to list, for each crawled URL, the archived snapshot URL from which it can be retrieved.

  • In the below screenshot, you can see that along with the crawled URLs of geeksforgeeks.org we have got some extra URLs that specify the archived sources from which the geeksforgeeks.org URLs were fetched. For example, the URL https://www.geeksforgeeks.org/ is served by the snapshot https://web.archive.org/web/20210715090226if_. This lets you go directly to the source URL and explore more about the crawled URL of geeksforgeeks.org. A small download sketch follows this list.
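Any of these web.archive.org snapshot URLs can be fetched directly if you want to inspect an older copy of a page. As a rough sketch (the <TIMESTAMP> placeholder must be replaced with a real snapshot timestamp taken from the -get-versions output, and the output file name is only an example):

curl -s "https://web.archive.org/web/<TIMESTAMP>/https://www.geeksforgeeks.org/" -o archived-copy.html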

