Waybulk – Search A List Of Domains On The Wayback Machine

Last Updated : 17 Apr, 2023

Reconnaissance is an essential phase of approaching a target. The more information you gather about the target domain, the easier it is to choose an attack strategy, and archived data about the domain is part of that information. Web crawling is the process of indexing the pages of a web application with automated scripts or crawler programs. Waybulk is an automated Python script that takes a list of target domains and gathers information about them from the Wayback Machine archive. It fetches the URLs the Wayback Machine knows for *.victimdomain and prints them to the terminal. This removes the manual work of searching for each domain on the Wayback Machine website: you supply the target domains, and a single run of the script handles all of them.
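As a rough illustration of what "fetching known URLs for *.victimdomain" can look like, the sketch below builds a query URL for the Wayback Machine's public CDX search endpoint. It is not taken from Waybulk's source, which may query the archive differently; the helper name build_cdx_query and the chosen parameters are assumptions made for this example.

```python
from urllib.parse import urlencode

# Public CDX search endpoint of the Wayback Machine.
CDX_ENDPOINT = "https://web.archive.org/cdx/search/cdx"


def build_cdx_query(domain: str, limit: int = 50) -> str:
    """Return a CDX query URL listing archived URLs under *.domain.

    Hypothetical helper for illustration only; not Waybulk's own code.
    """
    params = {
        "url": f"*.{domain}/*",  # wildcard: the domain and its subdomains
        "output": "json",        # ask for JSON rows instead of plain text
        "fl": "original",        # return only the originally captured URL
        "collapse": "urlkey",    # drop repeated captures of the same URL
        "limit": str(limit),     # keep the response small
    }
    return f"{CDX_ENDPOINT}?{urlencode(params)}"


if __name__ == "__main__":
    # Prints the query URL; fetching it is left to an HTTP client such as requests.
    print(build_cdx_query("example.com"))
```

The function only constructs the URL, so it can be inspected or pasted into a browser before any request is sent.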

Note: Waybulk is written in Python, so make sure Python 3 is set up on your Linux system.

Installation of Waybulk tool in Kali Linux

Step 1: First, open the Linux terminal and move to the Desktop directory. We will install the Waybulk tool from here and use it to look up a list of domains on the Wayback Machine.

cd Desktop

Step 2: You are now on the Desktop. Create a new directory called waybulk using the following command. We will then navigate into this directory and install the tool there.

mkdir waybulk

Step 3: Now switch to the waybulk directory using the following command.

cd waybulk/

Step 4: Install the basic requirement of this tool, the requests package, using the command below.

pip3 install requests

Step 5: After installing the Python package, the next task is to get the Waybulk tool itself. We will download it from GitHub.

git clone https://github.com/sham00n/waybulk

Step 6: After the above command finishes, the tool is downloaded into the waybulk directory that we created in Step 2. To view the contents of the directory, run the ls command.

ls

Step 7: The clone has created a new directory named waybulk, which contains the executable script. Navigate into that directory with the command below.

cd waybulk/

Step 8: After navigating to the waybulk directory, list all the files present in it. The script that we will run in the next section is among them.

ls
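Before moving on, it can help to confirm that the prerequisites used in the steps above are in place. The following is a minimal sketch, not part of Waybulk; the function name check_prerequisites is hypothetical, and it only checks for Python 3, the git executable, and the requests package.

```python
import importlib.util
import shutil
import sys


def check_prerequisites() -> dict:
    """Report whether the tools used in the installation steps are available."""
    return {
        # The tool is run with python3, so a Python 3 interpreter is required.
        "python3": sys.version_info >= (3, 0),
        # git is needed for the `git clone` step.
        "git": shutil.which("git") is not None,
        # requests is the package installed with `pip3 install requests`.
        "requests": importlib.util.find_spec("requests") is not None,
    }


if __name__ == "__main__":
    for name, ok in check_prerequisites().items():
        print(f"{name}: {'found' if ok else 'MISSING'}")
```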

Working with Waybulk Tool

Note: There is a domains.txt file in the waybulk directory. To get the Wayback URLs of your targets, add your list of target domains to that domains.txt file.

After saving the list in the domains.txt file, run the script using the following command.

python3 waybulk.py
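To make the domains.txt workflow concrete, here is a small sketch of how a list of domains could be read and turned into Wayback Machine overview links. It is an illustration under stated assumptions, not Waybulk's actual code: the helper names are hypothetical, skipping "#" comment lines is a choice made for this example, and the link format is the Wayback Machine's calendar page for a site, which may differ from what the tool prints.

```python
from typing import List
from urllib.parse import urlparse


def parse_domains(text: str) -> List[str]:
    """Extract bare host names from the contents of a domains.txt-style file."""
    domains: List[str] = []
    for line in text.splitlines():
        entry = line.strip()
        if not entry or entry.startswith("#"):
            continue  # skip blank lines and comment lines
        # Accept both "example.com" and "https://example.com/path".
        host = urlparse(entry if "://" in entry else f"//{entry}").hostname
        if host and host not in domains:
            domains.append(host)  # keep first occurrence, preserve order
    return domains


def wayback_calendar_link(domain: str) -> str:
    """Return the Wayback Machine overview (calendar) page for a domain."""
    return f"https://web.archive.org/web/*/{domain}"


if __name__ == "__main__":
    sample = "geeksforgeeks.org\nhttps://uber.com/\n\nfacebook.com\n"
    for d in parse_domains(sample):
        print(wayback_calendar_link(d))
```

In practice the text would come from reading domains.txt, for example with open("domains.txt").read().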

Example 1: Single Target Domain

Step 1: In the screenshot below, we look up the Wayback Machine link of a single target domain. We have added the target domain to the domains.txt file.

Step 2: In the screenshot below, we have got the Wayback Machine link for our target domain, which is geeksforgeeks.org. Through this link, we can browse the archived history of the geeksforgeeks.org domain as working pages.

Step 3: In the screenshot below, we copy the link and open it in a web browser to explore the geeksforgeeks.org domain.

Step 4: In the screenshot below, you can see that we are exploring the geeksforgeeks.org domain as it was on 24 Sep 2008. The archive shows a browsable copy of how geeksforgeeks.org looked on that date. The Wayback Machine has captured 2894 states of the geeksforgeeks.org domain from 24 Sep 2008 to 28 Feb 2023. You can go to any captured date and see how the site looked at that time.
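Jumping to a particular date can also be done directly through the URL. The sketch below builds a Wayback Machine address of the form https://web.archive.org/web/<timestamp>/<site>, for which the archive serves the capture nearest to the given timestamp. The helper name snapshot_url is an assumption made for this example.

```python
from datetime import date


def snapshot_url(domain: str, day: date) -> str:
    """Build a Wayback Machine URL requesting the capture closest to `day`."""
    # Wayback timestamps use the form YYYYMMDD, optionally followed by hhmmss.
    timestamp = day.strftime("%Y%m%d")
    return f"https://web.archive.org/web/{timestamp}/{domain}"


if __name__ == "__main__":
    # The date used in the example above.
    print(snapshot_url("geeksforgeeks.org", date(2008, 9, 24)))
```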

Example 2: Multiple Target Domains

Step 1: In the screenshot below, we look up the Wayback Machine links of multiple target domains. We have added a list of target domains to the domains.txt file.

Step 2: In the screenshot below, we have got the Wayback Machine links for each target domain we provided in the domains.txt file. We specified targets such as uber.com, facebook.com, and many more. We can select any link and visit it.

Step 3: In the screenshot below, we copy one of the target domains' Wayback Machine links and open it in a web browser. Let's open the link for the uber.com domain.

Step 4: In the screenshot below, you can see that we are exploring the uber.com domain as it was on 12 Dec 1998, which shows how powerful the Wayback Machine is. Today, in 2023, it is possible not only to see the uber.com site from 1998 but also to click through the archived pages. The Wayback Machine has made 18906 captures of uber.com from 1997 to 2023.

Conclusion

The Wayback Machine can be useful for getting more details about a target. For example, suppose geeksforgeeks.org has a secret.html page in 2023 that the geeksforgeeks.org team has secured with limited access. If the same page was not secured in 2017 and was archived at that time, you can open the 2017 archive, view that secret.html page, and extract information from it.
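Whether an older copy of a page exists can be checked programmatically with the Wayback Machine's Availability API. The sketch below is an illustration only: it builds the query URL and reads the "closest" snapshot out of an already decoded JSON response. The helper names are hypothetical, and secret.html is the made-up page from the example above, not a real resource.

```python
from typing import Optional
from urllib.parse import urlencode

# Wayback Machine Availability API endpoint.
AVAILABILITY_ENDPOINT = "https://archive.org/wayback/available"


def availability_query(page_url: str, year: int) -> str:
    """Build a query for the capture closest to 1 January of `year`."""
    params = {"url": page_url, "timestamp": f"{year}0101"}
    return f"{AVAILABILITY_ENDPOINT}?{urlencode(params)}"


def closest_snapshot(response: dict) -> Optional[str]:
    """Return the archived URL from a decoded API response, or None."""
    closest = response.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest.get("url")
    return None  # no capture is known for this page


if __name__ == "__main__":
    # Hypothetical page name taken from the example in the text.
    print(availability_query("geeksforgeeks.org/secret.html", 2017))
```

The dictionary passed to closest_snapshot would be the JSON body returned for the query, for instance the result of requests.get(...).json().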
