Syntax for keeping web pages from being cached

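Browsers may cache some dynamic pages, so changes to them do not show up. The following syntax prevents caching, so that every request fetches the page from the site again.
In plain HTML: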

<META HTTP-EQUIV="PRAGMA" CONTENT="NO-CACHE"> <!-- may not be honored by IE -->
<META HTTP-EQUIV="EXPIRES" CONTENT="0"> <!-- expire immediately -->
<META HTTP-EQUIV="CACHE-CONTROL" CONTENT="NO-CACHE"> <!-- same purpose as the first line -->
# CACHE-CONTROL accepts these values:
# HTTP 1.1. Allowed values = PUBLIC | PRIVATE | NO-CACHE | NO-STORE.
# Public - may be cached in public shared caches
# Private - may only be cached in a private cache
# no-Cache - may be stored, but must be revalidated with the server before each reuse
# no-Store - must not be stored in any cache
<META HTTP-EQUIV="EXPIRES" CONTENT="Mon, 22 Jul 2002 11:12:01 GMT"> <!-- commonly seen form: a date in the past -->
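To make the difference between the four Cache-Control values concrete, here is a minimal Python sketch that classifies a header value. The function names `parse_cache_control` and `describe` are made up for this illustration, and the parser is deliberately simple: it splits on commas and does not handle quoted arguments that themselves contain commas.

```python
def parse_cache_control(value):
    """Split a Cache-Control value into a dict of lowercase directives.

    Simplified: splits on commas, so quoted arguments containing
    commas are not handled.
    """
    directives = {}
    for part in value.split(","):
        part = part.strip()
        if not part:
            continue
        name, _, arg = part.partition("=")
        # Directives without an argument (e.g. no-cache) map to None.
        directives[name.strip().lower()] = arg.strip().strip('"') or None
    return directives


def describe(value):
    """Return a short description of what a cache may do with the response."""
    d = parse_cache_control(value)
    if "no-store" in d:
        return "must not be stored"
    if "no-cache" in d:
        return "may be stored, must be revalidated before reuse"
    if "private" in d:
        return "may be stored by a private cache only"
    if "public" in d:
        return "may be stored by shared caches"
    return "no explicit storage directive"
```

The checks run from most to least restrictive, so a value that combines several directives is described by the strictest one.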

In a Perl CGI script:

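# Content type and character set of the page
print "Content-type: text/html; charset=big5\n";
# Ask caches not to reuse a stored copy
print "Pragma: no-cache\n";
# A date in the past marks the page as already expired; the blank line ends the headers
print "Expires: Mon, 22 Jul 2002 11:12:01 GMT\n\n";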
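The same header block can be sketched in Python. This is an illustration rather than a translation of the Perl above: the function name `no_cache_header_block` is hypothetical, and the added `Cache-Control` line is an assumption on my part (it is the HTTP 1.1 counterpart of `Pragma`, which the Perl version does not send).

```python
from datetime import datetime, timezone
from email.utils import format_datetime


def no_cache_header_block(charset="utf-8", now=None):
    """Build CGI-style response headers that discourage caching.

    `now` must be a timezone-aware UTC datetime; it defaults to the
    current time, so the page is treated as expired right away.
    """
    now = now or datetime.now(timezone.utc)
    lines = [
        f"Content-Type: text/html; charset={charset}",
        # Assumption: added for HTTP 1.1 caches; not in the Perl original.
        "Cache-Control: no-cache",
        "Pragma: no-cache",
        # format_datetime(..., usegmt=True) emits the HTTP date format.
        f"Expires: {format_datetime(now, usegmt=True)}",
    ]
    # A blank line separates the headers from the page body.
    return "\n".join(lines) + "\n\n"


if __name__ == "__main__":
    print(no_cache_header_block("big5"), end="")
    print("<html><body>fresh content</body></html>")
```

Sending these as real HTTP headers is generally more dependable than the META tags, since proxies and other intermediate caches read only the response headers, not the HTML.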

To control whether Google or other spiders may crawl the page:

<META NAME="ROBOTS" CONTENT="ALL">
<META NAME="ROBOTS" CONTENT="INDEX,NOFOLLOW">
<META NAME="ROBOTS" CONTENT="NOINDEX,FOLLOW">
<META NAME="ROBOTS" CONTENT="NONE">
# Available values:
# CONTENT="ALL | NONE | NOINDEX | INDEX | NOFOLLOW | FOLLOW | NOARCHIVE"
# default = empty = "ALL"
# "NONE" = "NOINDEX, NOFOLLOW"
#
# The CONTENT field is a comma-separated list:
# INDEX: search engine robots should include this page.
# FOLLOW: robots should follow links from this page to other pages.
# NOINDEX: links can be explored, although the page is not indexed.
# NOFOLLOW: the page can be indexed, but no links are explored.
# NONE: robots can ignore the page.
# NOARCHIVE: Google uses this to prevent archiving of the page. See http://www.google.com/bot.html
<META NAME="GOOGLEBOT" CONTENT="NOARCHIVE">
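When pages come from a script, the tag can be assembled from a few flags. A minimal Python sketch follows; `robots_meta` and its parameters are hypothetical names chosen for this example, and it always writes the explicit INDEX/FOLLOW tokens instead of the ALL and NONE shorthands.

```python
def robots_meta(index=True, follow=True, archive=True, name="ROBOTS"):
    """Build a robots META tag from boolean flags.

    `name` is "ROBOTS" for all crawlers, or a specific crawler name
    such as "GOOGLEBOT".
    """
    tokens = [
        "INDEX" if index else "NOINDEX",
        "FOLLOW" if follow else "NOFOLLOW",
    ]
    if not archive:
        # NOARCHIVE is listed only when archiving should be blocked.
        tokens.append("NOARCHIVE")
    content = ",".join(tokens)
    return f'<META NAME="{name}" CONTENT="{content}">'


if __name__ == "__main__":
    print(robots_meta(index=False))
    print(robots_meta(archive=False, name="GOOGLEBOT"))
```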

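The simplest approach is to put a robots.txt file in the site's root directory that tells every spider not to crawl anything. Note that only well-behaved spiders honor it.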

User-agent: *
Disallow: /
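One way to check how a compliant crawler would read these two lines is Python's standard `urllib.robotparser` module. In the sketch below, the `allowed` helper and the `example.com` URLs are placeholders for illustration only.

```python
from urllib.robotparser import RobotFileParser

# The two rules shown above, one per line.
rules = ["User-agent: *", "Disallow: /"]

parser = RobotFileParser()
parser.parse(rules)


def allowed(agent, url):
    """Return True if `agent` may fetch `url` under the parsed rules."""
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    for agent in ("Googlebot", "SomeOtherSpider"):
        print(agent, allowed(agent, "http://example.com/index.html"))
```

Because the rule applies to `*` and disallows `/`, every user agent is refused for every path on the site.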

About the author: site editor