This repository contains Python scripts that crawl a website and compare two crawls to detect changes.
- capture.py - A web crawler that goes through all the pages of a given domain and exports the URLs, status codes, page sizes, and page heights in both .txt and .html formats.
- compare.py - A script that takes two .txt files (generated by crawl_website.py), representing the old and new versions of a website, and compares them side by side. It exports the differences to an HTML file, highlighting the discrepancies.
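To illustrate the crawling step described above, here is a minimal sketch of how a crawler might collect the same-domain links on a single page. It is not taken from the repository's scripts: the names `LinkCollector` and `same_domain_links` are hypothetical, and it uses only the Python standard library, which may differ from what the actual crawler depends on.

```python
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collects the href value of every <a> tag in a page (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def same_domain_links(page_url, html):
    """Return sorted, de-duplicated absolute links that stay on page_url's host."""
    collector = LinkCollector()
    collector.feed(html)
    host = urlparse(page_url).netloc
    links = set()
    for href in collector.hrefs:
        # Resolve relative links and drop the #fragment so each page appears once.
        absolute, _fragment = urldefrag(urljoin(page_url, href))
        parsed = urlparse(absolute)
        if parsed.scheme in ("http", "https") and parsed.netloc == host:
            links.add(absolute)
    return links and sorted(links) or []
```

A full crawl would repeat this for every newly discovered link while keeping a set of already visited URLs, so that no page is fetched twice.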
- Windows: Download the installer from Python's official site and follow the installation steps. Make sure to check the "Add Python to PATH" checkbox during installation.
- macOS: Some macOS versions include a system Python, but it may be outdated or absent on recent releases, so downloading the latest version from Python's official site is the safer option.
- Linux: Use your distribution's package manager to install Python. For example, on Ubuntu:
sudo apt-get update
sudo apt-get install python3
After installing Python, you need to install the required packages. Navigate to the project folder in your terminal and run:
pip install -r requirements.txt
- Crawling a Website:
python crawl_website.py
Follow the prompts to enter the website domain and select the type of crawl.
- Comparing Websites:
python compare.py
Follow the prompts to select the .txt files to be compared.
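The comparison step can be sketched as follows. The layout of the exported .txt files is not documented here, so this example assumes a hypothetical format of one tab-separated line per page (URL, status code, page size); `parse_export` and `diff_exports` are illustrative names, not functions from compare.py.

```python
def parse_export(text):
    """Parse assumed tab-separated lines: URL, status code, page size."""
    records = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        url, status, size = line.split("\t")
        records[url] = (int(status), int(size))
    return records


def diff_exports(old_text, new_text):
    """Group URLs into removed, added, and changed between two exports."""
    old, new = parse_export(old_text), parse_export(new_text)
    shared = old.keys() & new.keys()
    return {
        "removed": sorted(old.keys() - new.keys()),
        "added": sorted(new.keys() - old.keys()),
        "changed": sorted(url for url in shared if old[url] != new[url]),
    }
```

The three lists map naturally onto a side-by-side report: removed and added URLs appear in only one column, while changed URLs appear in both with the differing values highlighted.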
- Migrating to a New Platform/Host: Before switching to a new platform or hosting service, you may want to ensure that all URLs from the old platform exist in the new platform and function as expected.
- Switching WordPress Themes: A change in theme may result in differences in content display, load times, or even broken links. Comparing the website before and after the switch can highlight these issues.
- SEO Analysis: Ensuring that URLs, especially high-traffic ones, remain consistent during any changes can help preserve SEO rankings.
- Quality Assurance: Before rolling out a redesigned website, comparing the old and new sites can help identify bugs, missing content, or other issues that need to be addressed.
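For the migration use case, where the old and new sites usually live on different hosts, a check for missing URLs has to compare paths rather than full URLs. The sketch below shows one way to do that; `normalized_path` and `missing_after_migration` are hypothetical names, and ignoring the query string and trailing slash is a simplifying assumption that may not suit every site.

```python
from urllib.parse import urlsplit


def normalized_path(url):
    """Reduce a URL to its path, ignoring host, query, fragment, and trailing slash."""
    path = urlsplit(url).path.rstrip("/")
    return path or "/"


def missing_after_migration(old_urls, new_urls):
    """Return the old URLs whose path has no counterpart among the new URLs."""
    new_paths = {normalized_path(url) for url in new_urls}
    return sorted(url for url in old_urls if normalized_path(url) not in new_paths)
```

For example, with an old crawl of `https://old.example/blog/` and `https://old.example/pricing` and a new crawl containing only `https://new.example/blog`, the function reports the pricing page as missing.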