Syntax for keeping web pages out of the cache

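Some dynamic pages get cached by the browser, so changes to them never show up. The following directives keep a page from being cached, so that every request fetches it fresh from the web server.
Plain HTML syntax (goes in the page's <head>; only the browser reads these tags, proxy caches do not):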

<META HTTP-EQUIV="PRAGMA" CONTENT="NO-CACHE"><!-- may not work in IE -->
<META HTTP-EQUIV="EXPIRES" CONTENT="0"><!-- expires immediately -->
<META HTTP-EQUIV="CACHE-CONTROL" CONTENT="NO-CACHE"><!-- HTTP/1.1 form, same intent as the first line -->
<!--
CACHE-CONTROL values (HTTP 1.1): PUBLIC | PRIVATE | NO-CACHE | NO-STORE
PUBLIC   - may be cached in public shared caches
PRIVATE  - may only be cached in a private (browser) cache
NO-CACHE - may be stored, but must be revalidated with the server before each reuse
NO-STORE - must not be stored in any cache at all
-->
<META HTTP-EQUIV="EXPIRES" CONTENT="Mon, 22 Jul 2002 11:12:01 GMT"><!-- common form: any date in the past -->
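The EXPIRES value has to be an HTTP date in GMT, and any moment already in the past makes the page stale on arrival. Instead of hard-coding a 2002 date, a page generator could compute one. Below is a minimal sketch in Perl (the language of the CGI example in this article); http_date is a hypothetical helper name, and the day and month names are hard-coded on the assumption that the output must not depend on the server's locale:

```perl
use strict;
use warnings;

# Hypothetical helper: format an epoch time as an HTTP date (RFC 1123 form, always GMT).
# Day/month names are hard-coded so the server locale cannot change them.
my @DAYS   = qw(Sun Mon Tue Wed Thu Fri Sat);
my @MONTHS = qw(Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec);

sub http_date {
    my ($epoch) = @_;
    my ($sec, $min, $hour, $mday, $mon, $year, $wday) = gmtime($epoch);
    return sprintf '%s, %02d %s %04d %02d:%02d:%02d GMT',
        $DAYS[$wday], $mday, $MONTHS[$mon], $year + 1900, $hour, $min, $sec;
}

# Any moment in the past works; one day ago is enough to mark the page as stale.
my $expires = http_date(time - 24 * 60 * 60);
print qq{<META HTTP-EQUIV="EXPIRES" CONTENT="$expires">\n};
```

For example, http_date(1027336321) gives the same "Mon, 22 Jul 2002 11:12:01 GMT" string used in the tag above.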

In a Perl CGI script, send the corresponding HTTP headers directly:

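#!/usr/bin/perl
# Headers come before any page output; the extra "\n" on the last line ends the header block
print "Content-type: text/html; charset=big5\n";
print "Pragma: no-cache\n";
print "Expires: Mon, 22 Jul 2002 11:12:01 GMT\n\n";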
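The Perl example only sends Pragma and Expires. A sketch that also sends the HTTP/1.1 Cache-Control header might look like this; no_cache_headers is a hypothetical helper, and "Expires: 0" relies on the HTTP/1.1 caching specification (RFC 7234) requiring caches to treat an invalid date such as 0 as already expired:

```perl
use strict;
use warnings;

# Hypothetical helper: build a CGI header block asking browsers and proxies not to cache.
# Cache-Control is the HTTP/1.1 header; Pragma is kept for HTTP/1.0 caches.
sub no_cache_headers {
    my ($type) = @_;
    return join '',
        "Content-type: $type\n",
        "Cache-Control: no-cache, no-store, must-revalidate\n",
        "Pragma: no-cache\n",
        "Expires: 0\n",    # an invalid date such as 0 must be treated as already expired
        "\n";              # blank line ends the header block
}

print no_cache_headers('text/html; charset=big5');
print "<html><body>generated at ", scalar localtime, "</body></html>\n";
```

Sending real headers has an advantage over the META tags: intermediate proxies see them too, not just the browser.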

Syntax for letting (or not letting) Google and other spiders crawl the page:

<!-- Use only one of the following four, depending on what spiders may do -->
<META NAME="ROBOTS" CONTENT="ALL">
<META NAME="ROBOTS" CONTENT="INDEX,NOFOLLOW">
<META NAME="ROBOTS" CONTENT="NOINDEX,FOLLOW">
<META NAME="ROBOTS" CONTENT="NONE">
<!--
Available values:
CONTENT="ALL | NONE | NOINDEX | INDEX | NOFOLLOW | FOLLOW | NOARCHIVE"
default = empty = "ALL"
"NONE" = "NOINDEX, NOFOLLOW"

The CONTENT field is a comma-separated list:
INDEX: search engine robots should include this page.
FOLLOW: robots should follow links from this page to other pages.
NOINDEX: links can be explored, although the page is not indexed.
NOFOLLOW: the page can be indexed, but no links are explored.
NONE: robots can ignore the page.
NOARCHIVE: Google uses this to prevent archiving of the page. See http://www.google.com/bot.html
-->
<META NAME="GOOGLEBOT" CONTENT="NOARCHIVE">
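To make the rules listed above concrete (empty means ALL, NONE means NOINDEX plus NOFOLLOW, values are comma-separated), here is a sketch of how a crawler might interpret a CONTENT value. parse_robots_meta is a hypothetical name, and conflicting pairs such as "INDEX,NOINDEX" are not really resolved; the "NO" form simply wins:

```perl
use strict;
use warnings;

# Hypothetical helper: interpret a ROBOTS meta CONTENT value as a hash of flags.
# Empty means ALL; NONE means NOINDEX,NOFOLLOW; matching is case-insensitive.
sub parse_robots_meta {
    my ($content) = @_;
    my %flags = (index => 1, follow => 1, archive => 1);    # default = ALL
    (my $list = lc($content // '')) =~ s/^\s+|\s+$//g;
    for my $token (split /\s*,\s*/, $list) {
        if    ($token eq 'none')      { @flags{qw(index follow)} = (0, 0) }
        elsif ($token eq 'noindex')   { $flags{index}   = 0 }
        elsif ($token eq 'nofollow')  { $flags{follow}  = 0 }
        elsif ($token eq 'noarchive') { $flags{archive} = 0 }
        # ALL, INDEX and FOLLOW only restate the defaults, so they change nothing
    }
    return \%flags;
}

my $f = parse_robots_meta('NOINDEX,FOLLOW');
printf "index=%d follow=%d archive=%d\n", @{$f}{qw(index follow archive)};
# prints "index=0 follow=1 archive=1"
```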

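The simplest approach is to put a robots.txt file in the site's root directory that tells every spider to stay away (well-behaved crawlers honor it, but it is a request, not an access control):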

User-agent: *
Disallow: /
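How a crawler reads those two directives can be sketched as below. This is a deliberately simplified, hypothetical checker (robots_allows): it only understands User-agent and Disallow, matches paths by plain prefix, and ignores the Allow lines and wildcards that some crawlers also support. With "Disallow: /" every path starts with "/", so nothing may be fetched:

```perl
use strict;
use warnings;

# Hypothetical, simplified robots.txt checker: returns 1 if $agent may fetch $path.
# Only "User-agent" and "Disallow" are understood; matching is a plain prefix test.
sub robots_allows {
    my ($robots_txt, $agent, $path) = @_;
    my (@groups, $current);
    for my $raw (split /\r?\n/, $robots_txt) {
        (my $line = $raw) =~ s/#.*//;          # strip comments
        $line =~ s/^\s+|\s+$//g;               # trim whitespace
        my ($field, $value) = $line =~ /^([^:]+?)\s*:\s*(.*)$/ or next;
        $field = lc $field;
        if ($field eq 'user-agent') {
            # Consecutive User-agent lines share a group; one after rules starts a new group
            if (!$current || @{ $current->{disallow} }) {
                $current = { agents => [], disallow => [] };
                push @groups, $current;
            }
            push @{ $current->{agents} }, lc $value;
        } elsif ($field eq 'disallow' && $current) {
            push @{ $current->{disallow} }, $value;
        }
    }

    # Prefer a group naming this agent; otherwise fall back to the "*" group
    my ($specific, $wildcard);
    my $ua = lc $agent;
    for my $g (@groups) {
        for my $name (@{ $g->{agents} }) {
            $specific //= $g if $name eq $ua;
            $wildcard //= $g if $name eq '*';
        }
    }
    my $group = $specific // $wildcard or return 1;   # no matching group: allowed

    for my $prefix (@{ $group->{disallow} }) {
        next if $prefix eq '';                       # empty Disallow blocks nothing
        return 0 if index($path, $prefix) == 0;      # path starts with the rule
    }
    return 1;
}

my $robots = "User-agent: *\nDisallow: /\n";
print robots_allows($robots, 'Googlebot', '/index.html') ? "allowed\n" : "blocked\n";
# prints "blocked"
```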

About the author: site editor

