Wayback Machine - Internet Archive
#1
I'd like to share the Wayback Machine - Internet Archive, a tool that lets you look up and browse the past content of a website.

For those who don't know it yet, the Wayback Machine - Internet Archive is a 501(c)(3) non-profit building a digital library of Internet sites and other cultural artifacts in digital form. Like a paper library, they provide free access to researchers, historians, scholars, the print disabled, and the general public. Their mission is to provide Universal Access to All Knowledge.

I've been using this to check old websites that are already down, to gather information, ideas, and images. Sometimes it can even download files that were available at the time. Take note that the website must have a snapshot from that period, otherwise you can't view it. Also, don't expect too much from it: sometimes it can only show the front page of a website, and once you browse to its other pages it may not work. Still, it's cool to know that an archive like this exists, isn't it?

Let's take Post4VPS.com as an example; it has a snapshot going back to the 15th of November 2015.

For those who are interested in what Post4VPS.com looked like in 2015, here it is:

Post4VPS.com November 15, 2015

It's nice seeing some familiar names like @Dudi, @RickB, and @karatekidmonkey; @TrK was not yet an administrator at that time.

[Image: Screenshot_2018_09_19_Post4_VPS_Forum.png]
#2
This is a useful website for caching static web pages and keeping a record of website history; it can also serve as trusted proof alongside screenshots. I think the Wayback Machine is one of the oldest website-recording services, crawling a huge amount of data.
Launched in 2001, it has been serving internet users who want to see old pages and websites.
A great digital archive that deserves appreciation.


Thank you!



#3
Great sharing, @Vuluts. The Wayback Machine is indeed a very helpful tool for browsing the history of websites.

This reminds me of a nice command-line Wayback Machine tool that I used before. It allows users to download snapshots of websites easily.

https://github.com/sangaline/wayback-machine-scraper

The script is written in Python and can be installed via pip.
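
Roughly, installation and a basic scrape look like this (flags taken from the project's README as I recall it, so verify with --help; the date below just mirrors the Post4VPS snapshot mentioned earlier):

Code:
pip install wayback-machine-scraper
# fetch snapshots of the site as it appeared around a given date
wayback-machine-scraper -f 20151115 -t 20151115 post4vps.com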


#4
@tryp4vps Woop! I wasn't aware of this Wayback Machine Scraper; this is exactly what I needed. Can this scraper download data files larger than 500 MB? I need to download some old installers from an old website.
#5
(09-24-2018, 11:55 AM)Vuluts Wrote: @tryp4vps Woop! I wasn't aware of this Wayback Machine Scraper; this is exactly what I needed. Can this scraper download data files larger than 500 MB? I need to download some old installers from an old website.

I think the paragraph below addresses your case:

Quote:The command-line utility is highly configurable in terms of what it scrapes but it only saves the unparsed content of the pages on the site. If you're interested in parsing data from the pages that are crawled then you might want to check out scrapy-wayback-machine instead. It's a downloader middleware that handles all of the tricky parts and passes normal response objects to your Scrapy spiders with archive timestamp information attached. The middleware is very unobtrusive and should work seamlessly with existing Scrapy middlewares, extensions, and spiders. It's what wayback-machine-scraper uses behind the scenes and it offers more flexibility for advanced use cases.

It may only be able to download the parts of that old website that were actually scraped and archived.
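
If you do need to parse the crawled pages, enabling the middleware in a Scrapy project looks roughly like this (class path and setting name are from the scrapy-wayback-machine README as I remember it, so treat this as a sketch):

Code:
# settings.py of a Scrapy project
DOWNLOADER_MIDDLEWARES = {
    'scrapy_wayback_machine.WaybackMachineMiddleware': 5,
}
# crawl only snapshots captured between these timestamps (YYYYMMDD)
WAYBACK_MACHINE_TIME_RANGE = (20150101, 20160101)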
No one knows what the future holds, that's why its potential is infinite
#6
I've been a user of the Wayback Machine as well and have found it very useful when doing research on forums. The only thing is that I don't find it as good as it used to be a few years back. For forums, I used to get a much deeper search into a forum than I do today. For example, if you click on a link today you may get a list of topics, but when you click on the topics you can't get any further, whereas a few years ago you could click a few layers deep. Searches are much more superficial today.
Thank you to Post4VPS and VirMach for my awesome VPS 9!  
#7
(09-24-2018, 11:55 AM)Vuluts Wrote: ....... Can this scraper download data files larger than 500 MB? I need to download some old installers from an old website.


My download sizes were relatively small, because I only needed to download snapshots of websites within a limited window of time, for example, a certain week.

I suppose the scraper tool should not have a hard limit on the download size, though.
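
For instance, limiting a crawl to a single week is just a matter of narrowing the from/to timestamps (again assuming the -f/-t flags from the README; example.com is a placeholder):

Code:
# only fetch snapshots captured during one week in November 2015
wayback-machine-scraper -f 20151109 -t 20151115 example.com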

But I doubt you can still download the old installers if that old website no longer exists, because as far as I know the Wayback Machine does not keep these kinds of files on its servers. It just redirects you to the original download destinations.

That said, you won't be able to download the old installers through the Wayback Machine if the original download links are no longer accessible.


#8
It's a very useful tool for webmasters. I recovered the content of several long-dead sites and was able to set those sites up just as they used to be. Fortunately, they were static HTML sites, so it was easy.


~ Be yourself everybody else is taken ~




#9
The Wayback Machine is heavily on my mind at present. I've been working on a download of frihost.com and completely forgot how big this forum has to be, as it's been around from 2005 to 2018. The reason I'm working on the download is that Frihost went down (and possibly out) during mid-May and the owner hasn't been in touch with any of us. I'm almost certain that there were no recent enough backups in place and he has given up on the forum. I'd like to create something we can remember Frihost by, if possible, and run it by the other Frihost staff once I've downloaded everything, if it looks as though a reasonable website can be made from it. The challenge, of course, is that it's a forum: one won't get a download of the database, nor a list of the members.

All of the downloads are HTML pages only. There are 146,694 pages in the download. It's taken me almost 24 hours to get close to halfway, and I still have a bit to go. It's been interesting, as it's been testing my paid HostUS VPS to its limits: RAM usage has been hanging close to 90-99%, and at one stage I wondered whether the VPS would be big enough, since I only have 20 GB of disk and 768 MB of RAM on my Junior VPS. At least I also figured out how to pause the download if there were a need for it, and how to resume it. I'll cover those in the tutorial as well.
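The post doesn't say which downloader is being used, but with a wget-style mirror (a common choice, purely hypothetical here) pausing and resuming could look like this:

Code:
# run the mirror inside screen so an SSH disconnect doesn't kill it
screen -S frihost
wget --mirror --convert-links --adjust-extension --page-requisites \
     --wait=1 "https://web.archive.org/web/2018/http://www.frihost.com/"
# pause: Ctrl+Z suspends the job; 'fg' resumes it
# after a crash or reboot, rerun the same command:
# --mirror enables timestamping, so files already fetched are skipped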

If successful, I'll post a tutorial about it later. It may take another two days to get to that point. It's nice to see SSH working overtime; I hope it can carry this to its conclusion, otherwise it may have to be an aborted or first attempt only.
Thank you to Post4VPS and VirMach for my awesome VPS 9!  
#10
@deanhills

That's nearly 10 years of data! Since you're downloading those as static pages, it will take lots of space, I guess. As for getting its database, I guess it's already gone if the site went down in May. This is one reason I like social media: we can find a way to stay in touch with people who are far away from us. I'm really sad about what happened to Frihost, especially the forum.

You might have to download all that data from the VPS to your PC from time to time; better not to wait until you've downloaded the whole thing. I can't think of any way to restore it as a forum, though.


~ Be yourself everybody else is taken ~



