How to download an entire website from the Internet Archive Wayback Machine
The following are abbreviated steps for downloading a website from the Internet Archive Wayback Machine on CentOS 7.0.

If you check GitHub there are a number of downloaders available, and I don't think they're all equal in quality. I was lucky to find this one by hartator, which worked for me:

https://github.com/hartator/wayback-machine-downloader

If you want, you can read through the GitHub page above, as it gives a long list of options for controlling the download of larger websites. If you check through the issues you'll note that pages are sometimes missed; a number of users have reported this. So it's not perfect by a long shot, but it's fun for less serious projects.

Before you start the download, it is important to check how large the website is and whether your VPS has enough resources, particularly memory, bandwidth and disk space, to handle the download efficiently. One flaw in the downloader is that it doesn't tell you how much it is going to download, and it doesn't ask for confirmation before it starts. You need to do that research before you run the download command.
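
As a rough pre-flight check for the disk space mentioned above, here is a minimal shell sketch. The helper name has_free_space_kb and the 5 GB figure are my own assumptions for illustration and are not part of the downloader; pick a budget that fits the site you are after.

```shell
# Hypothetical helper (not part of the downloader): succeeds only if the
# filesystem holding DIR has at least REQUIRED_KB kilobytes available.
has_free_space_kb() {
    dir="$1"
    required_kb="$2"
    # df -P prints POSIX-format output and -k reports sizes in KB.
    # Column 4 of the second line is the available space.
    available_kb=$(df -Pk "$dir" | awk 'NR==2 {print $4}')
    [ "$available_kb" -ge "$required_kb" ]
}

# Example: require roughly 5 GB free in the current directory (assumed figure).
if has_free_space_kb . 5242880; then
    echo "Enough disk space for the assumed 5 GB budget"
else
    echo "Less than 5 GB free, consider a bigger disk or a smaller target"
fi
```

If you only want to eyeball things, df -h and free -m show disk and memory usage in readable form.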

Here are the commands in brief, using CentOS 7.0:

Step 1:  Install Ruby

yum install ruby


Step 2:  Install the Wayback Machine Downloader

gem install wayback_machine_downloader
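
After the two install steps it can be worth confirming that everything actually landed on your PATH before starting a long download. This is only a sketch; require_cmd is a hypothetical helper name of mine.

```shell
# Hypothetical helper: report whether a command can be found on PATH.
require_cmd() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "found: $1"
    else
        echo "missing: $1"
        return 1
    fi
}

# Check the three commands this guide relies on; keep going even if one is missing.
for cmd in ruby gem wayback_machine_downloader; do
    require_cmd "$cmd" || true
done
```

If wayback_machine_downloader shows up as missing even though the gem installed, the gem's executable directory is probably not on your PATH.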


Step 3:  Use the downloader command to start the download:

wayback_machine_downloader http://domainname.com



If you want to interrupt the download you can press Ctrl+C. If you want to resume the download from the point where it was interrupted, just repeat the downloader command:

wayback_machine_downloader http://domainname.com


I'm happy with the outcome so far, but I haven't taken it to its conclusion yet. My download project is tricky in that it is a forum rather than a static website. I think this will work fine for uncomplicated static websites. I'm not sure about forums and blogs, though: there's an issue with timestamps and the way forums and blogs have been archived, and of course there is no database. The layers of .html pages don't go that deep. Hopefully I'll be able to report back about this at a later stage. I'm hoping to get a snapshot of the forum as of X date. It will be interesting to see what appears.
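
For the "snapshot as of X date" idea, the project's README lists a --to option that takes a timestamp in the same 14-digit form that appears in Wayback Machine URLs. I haven't verified how well it behaves on forums, so treat the following as a sketch. The helper name to_wayback_timestamp and the example date are assumptions of mine; check wayback_machine_downloader --help for the options your installed version supports.

```shell
# Hypothetical helper: turn YYYY-MM-DD into a 14-digit timestamp
# (YYYYMMDDhhmmss) pointing at the last second of that day.
to_wayback_timestamp() {
    printf '%s235959\n' "$(printf '%s' "$1" | tr -d '-')"
}

# Print the timestamp for an example date (assumed, pick your own).
to_wayback_timestamp 2017-01-31

# The download itself would then look something like this (left commented
# out so that nothing starts downloading by accident):
# wayback_machine_downloader http://domainname.com --to "$(to_wayback_timestamp 2017-01-31)"
```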

In retrospect I have decided to give up on megatools. The backup was so slow, and about a tenth of the way through it came to a complete stop. I'll probably go about this in a different way.