How to Create an Archive of a Website

Recently, I was asked to archive a website so that the static HTML files could be browsed locally, with links to scripts, stylesheets, and images still working. Options such as the Wayback Machine or Webrecorder required me to visit every page I wanted archived by hand, and they weren't as reliable at capturing every resource as I needed. Eventually, I found HTTrack, which was perfect for my needs.

HTTrack offers a command-line interface and does a fantastic job of capturing everything on a website. I tried several combinations of flags and found that something like this worked best for my needs:

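$ httrack <url> -O ./ --mirrorlinks -%v "-*" "+*<domain>/*"

Replace <url> with the address of the site to archive and <domain> with its domain name. The filters are quoted so the shell does not try to expand the asterisks.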

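Starting from the site's address, HTTrack crawls the site and saves each page along with the resources it references. The flags work as follows: -O ./ writes the mirror, plus HTTrack's logs and cache, to the current directory; --mirrorlinks (the long form of -Y) also mirrors every link found on the first-level pages; -%v prints the names of files as they are downloaded; and the filter patterns at the end exclude (-) or include (+) any URL that matches them.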
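If you archive more than one site, it can help to wrap the command in a small script so the include filter is derived from the URL instead of typed by hand. The sketch below is a hypothetical wrapper, not part of HTTrack: the function name httrack_args and the include pattern of the form "+*host/*" are my own assumptions about a reasonable filter, and by default the script only prints the command instead of running it.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: assembles HTTrack arguments for one site.
# Assumption: "+*<host>/*" is an illustrative include filter, not
# necessarily the exact pattern used in the command above.
set -euo pipefail

# Print one HTTrack argument per line for a URL and an output directory.
httrack_args() {
  local url="$1" out="${2:-./}" host
  host="${url#*://}"   # drop the scheme: "https://example.com/a" -> "example.com/a"
  host="${host%%/*}"   # drop the path:   "example.com/a" -> "example.com"
  host="${host%%:*}"   # drop a port number, if present
  printf '%s\n' \
    "$url" \
    -O "$out" \
    --mirrorlinks \
    -%v \
    '-*' \
    "+*${host}/*"
}

# Collect the arguments into an array so the filters reach HTTrack unglobbed.
mapfile -t args < <(httrack_args "https://example.com/blog/")

# Dry run: print the shell-quoted command. To actually start the mirror
# (HTTrack must be installed), replace this line with:
#   httrack "${args[@]}"
printf '%q ' httrack "${args[@]}"; echo
```

Keeping the arguments in an array rather than a single string means the "-*" and "+*...*" patterns are passed to HTTrack literally, without any extra quoting gymnastics.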

The exclude/include part at the end was necessary to stop HTTrack from also archiving sites linked from the main one; for example, it might follow a link to a Twitter profile and then try to archive the entirety of Twitter. I suspect there might be a flag that does the same thing, but I haven't checked.
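As I understand HTTrack's scan rules, when several filters match a URL, the one given later takes priority, which is why "-*" (exclude everything) comes first and the include rule follows it. The sketch below is a simplified model of that ordering, not HTTrack's actual matcher: it uses bash glob matching, compares URLs without their "http(s)://" prefix, and treats URLs that match no rule as excluded, whereas real HTTrack falls back to its own defaults. The function name is_allowed is hypothetical.

```shell
#!/usr/bin/env bash
# Simplified model of HTTrack-style scan rules: each rule is "+pattern" or
# "-pattern", and the LAST rule whose pattern matches decides the outcome.
# Assumptions: bash globbing stands in for HTTrack's matcher, and a URL that
# matches no rule is treated as excluded.
set -euo pipefail

# Usage: is_allowed URL RULE...  -> status 0 if the URL is kept, 1 if skipped.
is_allowed() {
  local url="${1#*://}" rule verdict=1   # strip the scheme before matching
  shift
  for rule in "$@"; do
    # ${rule:1} is the pattern after the leading +/-; it is left unquoted so
    # its * characters act as wildcards inside [[ == ]].
    if [[ "$url" == ${rule:1} ]]; then
      if [[ "${rule:0:1}" == "+" ]]; then
        verdict=0   # a later include overrides earlier rules
      else
        verdict=1   # a later exclude overrides earlier rules
      fi
    fi
  done
  return "$verdict"
}

rules=('-*' '+*example.com/*')
for url in https://example.com/about https://twitter.com/someone; do
  if is_allowed "$url" "${rules[@]}"; then
    echo "mirror: $url"
  else
    echo "skip:   $url"
  fi
done
```

Swapping the two rules (include first, then "-*") makes the exclude rule win for every URL, so nothing would be mirrored; order matters as much as the patterns themselves.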

Tada! You’ll now have a completely working local copy of the website you chose to archive.