Best Web Archive

Nov 12, 2009

A web archive is a great tool: you can check how a website looked in the past and what has changed since. For webmasters and domainers, it is especially useful.

Screenshot: Asif2BD.info as it looked in Nov 2007

The Web changes constantly, and sometimes the page that had just the information you needed yesterday (or last month, or two years ago) is no longer available today. At other times you may want to see how a page’s content or design has changed. There are several sources for finding Web pages as they used to exist.

While Google’s cache is probably the best known, the others are important alternatives: they may have pages not available from Google or the Wayback Machine, or an archived copy from a different date. The table below lists each service, how to find its archived page, and notes that give some idea of how old a page the archive may contain.

Multiple Copies of Pages
Service	How to find the archived page	Notes
Wayback Machine	Enter URL in search box to view	From late 1996 to 8-14 months ago; from the Internet Archive. Often includes cached images, CSS, and JavaScript.
Archive-It Collections	Full-text search or enter URL	Highly selective collections, primarily of state agencies and organizations. Some general pages included as well. Full-text searching. Use “Search All Collections” for broadest coverage. Multiple dates. Images, PDFs, text.
WebCite	Enter URL in search box to view	Launched in late 2005, this site archives only Web pages cited in certain journal articles (mostly health-related). Multiple dates may be available via a drop-down box in the upper right. No text search access. Text, images, and PDFs cached.
Single “Cached” Copy of a Page
Google	`cache:URL` or `Cached` link	Estimate from yesterday to 3 months old. Crawl date given. Text-only cache as well.
Live	`Cached page` link	Estimate from yesterday to 3 months old. Crawl date given.
Yahoo!	`Cached` link	Estimate from yesterday to 3 months old. No cache date given.
Ask	`Cached` link	Estimate from yesterday to 3 months old. Crawl date given. Incomplete coverage.
Gigablast	`[cached]` link, or `[stripped]` for text only	From recent to a year old. Gives date of cache. Also offers a text-only cache and an “older copies” link to the Wayback Machine.
Exalead	`Preview` or link	From recent to 6 months old. Gives date of cache. Incomplete coverage.
Alexa	`Cached` link	Estimate from yesterday to 3 months old. No cache date given.
ScrubTheWeb	`Cached` link	Small database, from 1-7 months old. No cache date given.
Family Source	`Cached` link	Small database, 1 million+ “family friendly” pages. Most pages cached in 2005. No cache date given.
Healia	`Cached` link	Estimate 2-4 months old. Small database of consumer health documents. No date given.
DiplomacyMonitor	`Cached` link	Small database of “more than 16,000 diplomatic and trade documents issued in the past 90 days.” Date indexed given.
Baidu	`百度快照` (“Baidu snapshot”) link	Chinese search engine. Crawl date on results page, not in cache.
Baidu Japan	`キャッシュ` (“cache”) link	Japanese search engine. Crawl date on results page, not in cache.
Yandex (Яндекс)	`сохранённая копия` (“saved copy”) link	Russian search engine, with primarily Russian pages. Estimate yesterday to several months old. No date given.
ZoomInfo	`[Cached]` link	Only in people search: pages associated with a person have cached links. Cache date given, but there is no searchable or URL access, so you have to know which person might be on a specific page.

Note that none of these services includes every Web page. A robots.txt file can prohibit crawling of a page, and a `<meta name="ROBOTS" content="NOINDEX">` tag in the head of a page can keep it out of the index. Google and other search engines should also look for `<meta name="ROBOTS" content="NOARCHIVE">` in the head and not cache such pages, but these exclusions do not always work. Other possible ways to resurrect a dead link include checking your local browser’s cache, if you visited the page recently, or hoping that someone else copied the file and posted it on the Web.
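To make the exclusion rules above a bit more concrete, here is a minimal Python sketch, using only the standard library, of how a well-behaved archiver might check them. It is my own simplification, not how Google, the Wayback Machine, or any service in the table actually works: it reads a robots.txt supplied as text rather than fetching it, and the rules, URLs, and the user-agent name `ExampleArchiver` are hypothetical.

```python
from html.parser import HTMLParser
from urllib.robotparser import RobotFileParser


class RobotsMetaParser(HTMLParser):
    """Collect directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names but not values,
        # so values such as "ROBOTS" and "NOARCHIVE" are lowercased here.
        if tag != "meta":
            return
        attributes = {name: (value or "") for name, value in attrs}
        if attributes.get("name", "").lower() != "robots":
            return
        for token in attributes.get("content", "").split(","):
            token = token.strip().lower()
            if token:
                self.directives.add(token)


def robots_meta_directives(html):
    """Return the set of lowercase robots directives found in an HTML page."""
    parser = RobotsMetaParser()
    parser.feed(html)
    parser.close()
    return parser.directives


def may_crawl(robots_txt, user_agent, url):
    """Check a robots.txt body (given as text) for permission to fetch url."""
    rules = RobotFileParser()
    rules.parse(robots_txt.splitlines())
    return rules.can_fetch(user_agent, url)


def may_cache(html):
    """Treat NOINDEX or NOARCHIVE as a reason not to keep a cached copy."""
    directives = robots_meta_directives(html)
    return "noindex" not in directives and "noarchive" not in directives


if __name__ == "__main__":
    # Hypothetical rules and page, for illustration only.
    robots_txt = "User-agent: *\nDisallow: /private/\n"
    page = '<html><head><meta name="ROBOTS" content="NOARCHIVE"></head></html>'
    print(may_crawl(robots_txt, "ExampleArchiver", "http://example.com/private/a.html"))
    print(may_cache(page))
```

Run as a script, it prints False twice: the hypothetical robots.txt blocks everything under /private/, and the NOARCHIVE tag asks not to be cached. Real crawlers layer their own policies on top of these signals, which is one reason the exclusions do not always work.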

For more details on searching the Wayback Machine, see my article “The Wayback Machine: The Web’s Archive.” ONLINE 26(2): 59-61, Mar.-Apr. 2002.
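Besides the search box, you can type a Wayback Machine address directly. The sketch below assumes the commonly used `https://web.archive.org/web/<timestamp>/<url>` pattern, with a 14-digit YYYYMMDDhhmmss timestamp for the capture nearest a given moment and `*` in place of the timestamp for the list of all captures; treat that as an assumption to check against the live site. The function name `wayback_url` is hypothetical.

```python
from datetime import datetime
from typing import Optional

# Assumed address pattern for the Wayback Machine.
WAYBACK_PREFIX = "https://web.archive.org/web/"


def wayback_url(page_url: str, when: Optional[datetime] = None) -> str:
    """Build a Wayback Machine address for page_url (hypothetical helper).

    With no date, the "*" form asks for the list of all captures.
    With a date, the YYYYMMDDhhmmss timestamp asks for the capture
    nearest that moment.
    """
    if when is None:
        return f"{WAYBACK_PREFIX}*/{page_url}"
    return f"{WAYBACK_PREFIX}{when.strftime('%Y%m%d%H%M%S')}/{page_url}"


if __name__ == "__main__":
    print(wayback_url("http://example.com/"))
    print(wayback_url("http://example.com/", datetime(2007, 11, 1)))
```

Run as a script, it prints the capture-list address and the address for the capture nearest to 1 Nov 2007 for the placeholder site example.com.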

Services that Used to Have a Cached Copy
IncyWincy	`cached` link	Small database based on ODP, about 6 months old, gave date of cache. Defunct as of Fall 2007.
BoardReader	`Cached` link to view	Web forum postings only, date unreliable; cache link gone as of Fall 2006.
Daypop	`Cached date` link to view	Last two weeks, blog postings and news articles, gives date of cache. Daypop no longer available as of Fall 2006.
Feedster	`Cached` link to view	Typically caches only the first few lines from blog & news RSS feeds. Cached copy no longer available as of Fall 2006.
Blogging Ecosystem	`c` link to view	Very small: top linked and linking blogs only; no longer updated as of Fall 2006.
SearchEdu SearchGov SearchMil	All from MaxBot	These used to have their own database and cached copies. As of 2003, SearchGov and SearchEdu just give Google results. SearchMil no longer has cached copies.
Google News	Formerly `cache:URL` to view	Cached capability removed in March 2003.

