I was recently doing some research and found a few sites I really liked. Whether it was the styling, the layout, or something else, I was drawn in. I wanted to dive deeper into how they implemented their solutions, but using the developer tools and poking around at minified files was too manual and time-consuming. I wanted a way to download everything so I could use my code editor and the other tools I'm used to.
The Solution
I found this handy article on downloading an entire website using wget. The final command I used looked something like this:
wget -r --reject mp3,mp4 -e robots=off https://brianchildress.co/
Breaking this down:
wget
: The GNU command-line tool for retrieving files over HTTP, HTTPS, and FTP
-r
: Recursive, downloads all assets and resources recursively
--reject
: Skips any files whose extensions match the comma-separated list provided (here, mp3 and mp4)
-e robots=off
: Tells wget to ignore the robots.txt file and download files the site has marked off limits to robots
<url>
: The URL of the site we want to download from
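Putting those pieces together, here's a variation of the same command. The domain, the reject list, and the output directory are placeholders, so swap in the site you're targeting and whatever file types you want to skip:

wget -r --reject mp3,mp4,pdf -e robots=off -P ./site-copy https://example.com/

The -P flag saves everything under ./site-copy instead of the current directory; wget will still create a folder named after the host inside it.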
Additional options worth considering
-m
: Mirror, turns on recursion and time-stamping, sets infinite recursion depth, and keeps FTP directory listings
-p
: Page-requisites, retrieves all images, stylesheets, and other assets needed to properly display each HTML page
-k
: Convert-links, rewrites links in the downloaded HTML so they point to local files, making the copy browsable offline; all three options are combined in the sketch below
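As a rough sketch (example.com is a placeholder, not a site I've tested), combining those options gives a command like this for a fully mirrored, offline-browsable copy:

wget -m -p -k -e robots=off https://example.com/

Since -m already implies -r, the explicit recursion flag isn't needed here, and -k rewrites the links only after the download finishes, once wget knows which files it actually retrieved.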