Common Search Engine Spiders in Website Logs


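The search engines we deal with day to day include Baidu, 360 Search, Sogou, and Google (谷歌). China also has Shenma Search, and abroad there is Yahoo Search. In website logs, these engines' spiders identify themselves as follows: Baiduspider = Baidu spider, Googlebot = Google spider, Sogou = Sogou spider, 360Spider = 360 spider, YisouSpider = Shenma spider, Yahoo = Yahoo spider.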
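The token-to-engine mapping above can be expressed as a small lookup. This is a minimal sketch that assumes plain, case-sensitive substring matching on the user-agent field; `SPIDER_TOKENS` and `identify_spider` are hypothetical names. A user-agent string can be forged, so a match only tells you what the client claims to be.

```python
from typing import Optional

# Identifier tokens from the list above (dicts keep insertion order in Python 3.7+).
SPIDER_TOKENS = {
    "Baiduspider": "Baidu",
    "Googlebot": "Google",
    "Sogou": "Sogou",
    "360Spider": "360 Search",
    "YisouSpider": "Shenma",
    "Yahoo": "Yahoo",
}


def identify_spider(user_agent: str) -> Optional[str]:
    """Return the engine whose token appears in the user agent, or None."""
    for token, engine in SPIDER_TOKENS.items():
        if token in user_agent:  # case-sensitive substring match (an assumption)
            return engine
    return None


print(identify_spider("Mozilla/5.0 (compatible; Baiduspider-render/2.0; +http://www.baidu.com/search/spider.html)"))
# Baidu
```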

Based on an analysis of one website's logs, here are the user-agent strings these spiders actually send:

1. Google has two spiders. Googlebot-Image/1.0 is the one dedicated to images.

66.249.68.58 - - [17/Oct/2022:19:55:54 +0800] "GET / HTTP/1.1" 200 4899 "-" "Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/106.0.5249.103 Mobile Safari/537.36 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"

66.249.68.58 - - [17/Oct/2022:22:29:00 +0800] "GET /views/index/img/co2.jpg HTTP/1.1" 200 82891 "-" "Googlebot-Image/1.0"
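To tell these two Google spiders apart in a raw log, it helps to first split each line into fields. The sketch below assumes the Apache/Nginx "combined" log format, which the sample lines appear to follow; `LOG_RE`, `parse_line`, and `google_spider_kind` are illustrative names, not part of any library.

```python
import re
from typing import Dict, Optional

# Assumes the "combined" log format: ip ident user [time] "request" status size "referer" "agent"
LOG_RE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def parse_line(line: str) -> Optional[Dict[str, str]]:
    """Split one log line into named fields, or return None if it doesn't match."""
    match = LOG_RE.match(line)
    return match.groupdict() if match else None


def google_spider_kind(agent: str) -> Optional[str]:
    """Check the more specific token first, since 'Googlebot-Image' contains 'Googlebot'."""
    if "Googlebot-Image" in agent:
        return "Googlebot-Image"
    if "Googlebot" in agent:
        return "Googlebot"
    return None


line = ('66.249.68.58 - - [17/Oct/2022:22:29:00 +0800] '
        '"GET /views/index/img/co2.jpg HTTP/1.1" 200 82891 "-" "Googlebot-Image/1.0"')
entry = parse_line(line)
print(entry["path"], entry["status"], google_spider_kind(entry["agent"]))
# /views/index/img/co2.jpg 200 Googlebot-Image
```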

2. Baidu also has two spiders: Baiduspider/2.0 fetches HTML, while Baiduspider-render/2.0 fetches JS, CSS, and images.

116.179.32.227 - - [14/Oct/2022:18:08:59 +0800] "GET /news/detail/114.html HTTP/2.0" 200 5226 "-" "Mozilla/5.0 (Linux;u;Android 4.2.2;zh-cn;) AppleWebKit/534.46 (KHTML,like Gecko) Version/5.1 Mobile Safari/10600.6.3 (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html)"

116.179.37.82 - - [14/Oct/2022:19:03:36 +0800] "GET /uploadfile/images/20210610090751977.jpg HTTP/1.1" 200 280539 "https://www.domain.com/" "Mozilla/5.0 (compatible; Baiduspider-render/2.0; +http://www.baidu.com/search/spider.html)"
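To check this HTML vs. JS/CSS/image split against your own logs, a sketch like the following tallies Baidu requests by agent and by resource type. It takes (path, user_agent) pairs, for example from a log parser. The extension list in `STATIC_EXTENSIONS` is an assumption to adjust per site, and `baidu_agent`, `is_static_asset`, and `tally_baidu` are hypothetical helpers.

```python
from collections import Counter
from pathlib import PurePosixPath
from typing import Iterable, Optional, Tuple

# Assumed set of static-asset extensions; adjust for your own site.
STATIC_EXTENSIONS = {".js", ".css", ".jpg", ".jpeg", ".png", ".gif", ".webp", ".svg"}


def baidu_agent(agent: str) -> Optional[str]:
    # 'Baiduspider-render' contains 'Baiduspider', so test the longer token first.
    if "Baiduspider-render" in agent:
        return "Baiduspider-render"
    if "Baiduspider" in agent:
        return "Baiduspider"
    return None


def is_static_asset(path: str) -> bool:
    # Drop any query string, then compare the file extension case-insensitively.
    return PurePosixPath(path.split("?", 1)[0]).suffix.lower() in STATIC_EXTENSIONS


def tally_baidu(requests: Iterable[Tuple[str, str]]) -> Counter:
    """Count (agent, 'asset' | 'page') pairs over (path, user_agent) tuples."""
    counts: Counter = Counter()
    for path, agent in requests:
        name = baidu_agent(agent)
        if name is not None:
            counts[(name, "asset" if is_static_asset(path) else "page")] += 1
    return counts
```

If Baiduspider-render really handles the assets, its counts should land almost entirely in the `"asset"` bucket.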

3. Sogou has only one spider: Sogou web spider/4.0.

118.184.177.9 - - [17/Oct/2022:23:12:02 +0800] "GET /about/culture HTTP/1.1" 200 6237 "-" "Sogou web spider/4.0(+http://www.sogou.com/docs/help/webmasters.htm#07)"

4. 360 Search has one spider: 360Spider.

42.236.10.76 - - [20/Oct/2022:18:50:07 +0800] "GET /news/detail/97.html HTTP/1.1" 200 5351 "https://www.waterenping.com/news/detail/97.html" "Mozilla/5.0 (iPhone; CPU iPhone OS 11_0 like Mac OS X) AppleWebKit/604.1.34 (KHTML, like Gecko) Version/11.0 Mobile/15A5341f Safari/604.1;360Spider"
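Once spiders can be recognized line by line, counting hits per spider over a whole log is a single pass. This sketch assumes the user agent is the last double-quoted field on each line, as in the samples above; the `TOKENS` tuple is an illustrative subset of the identifiers listed earlier, and `count_spider_hits` is a hypothetical name.

```python
import re
from collections import Counter
from typing import Iterable

# Illustrative subset of the identifier tokens; extend as needed.
TOKENS = ("Baiduspider", "Googlebot", "Sogou web spider", "360Spider", "YisouSpider")

QUOTED = re.compile(r'"([^"]*)"')


def count_spider_hits(lines: Iterable[str]) -> Counter:
    """Count log lines per spider token, using the last quoted field as the user agent."""
    counts: Counter = Counter()
    for line in lines:
        fields = QUOTED.findall(line)
        if not fields:
            continue  # not a log line we recognize
        agent = fields[-1]
        for token in TOKENS:
            if token in agent:
                counts[token] += 1
                break  # count each line at most once
    return counts
```

For a real log, pass an open file object, e.g. `count_spider_hits(open("access.log", encoding="utf-8"))`, where `access.log` is a placeholder path.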

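5. Shenma has one spider, which appears in two forms: YisouSpider/5.0 with a version number and YisouSpider without one. Shenma's spider crawls more frequently than Baidu's, but it often fetches files such as CSS and JS.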

60.188.10.174 - - [20/Oct/2022:19:11:53 +0800] "GET /views/index/css/load.css HTTP/1.1" 200 495 "https://www.waterenping.com/Article/ShowInfo.asp?InfoID=81" "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/69.0.3497.81 YisouSpider/5.0 Safari/537.36"

101.67.50.4 - - [20/Oct/2022:21:11:16 +0800] "GET /robots.txt HTTP/1.1" 404 208 "-" "YisouSpider"
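Since Shenma's spider shows up both with and without a version number, a regular expression with an optional version group matches both forms. A minimal sketch; `YISOU_RE` and `yisou_version` are hypothetical names.

```python
import re
from typing import Optional, Tuple

# "YisouSpider" optionally followed by "/<version>", e.g. "/5.0".
YISOU_RE = re.compile(r"YisouSpider(?:/(\d+(?:\.\d+)*))?")


def yisou_version(user_agent: str) -> Tuple[bool, Optional[str]]:
    """Return (is_yisou, version); version is None for the unversioned form."""
    match = YISOU_RE.search(user_agent)
    if match is None:
        return False, None
    return True, match.group(1)  # group(1) is None when no version was present


print(yisou_version("YisouSpider"))
# (True, None)
```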

Some newer spiders that people rarely talk about also turned up:

6. A new kind of spider: PetalBot, the Petal spider, owned by Huawei.

114.119.152.235 - - [21/Oct/2022:01:12:34 +0800] "GET /notice/detail/95.html HTTP/1.1" 200 4344 "https://www.domain.com/notice/2/4" "Mozilla/5.0 (Linux; Android 7.0;) AppleWebKit/537.36 (KHTML, like Gecko) Mobile Safari/537.36 (compatible; PetalBot;+https://webmaster.petalsearch.com/site/petalbot)"

7. Another new kind of spider: Bytespider, the ByteDance spider, associated with Toutiao.

110.249.202.57 - - [21/Oct/2022:06:49:59 +0800] "GET /robots.txt HTTP/1.1" 404 208 "-" "Mozilla/5.0 (Linux; Android 5.0) AppleWebKit/537.36 (KHTML, like Gecko) Mobile Safari/537.36 (compatible; Bytespider; https://zhanzhang.toutiao.com/)"
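Both newer spiders, like most of the others, embed a URL in their user-agent string that points to the operator's documentation (PetalBot prefixes it with `+`, Bytespider does not). A sketch for pulling that URL out, assuming it is the first `http://` or `https://` URL in the string; `INFO_URL_RE` and `spider_info_url` are hypothetical names.

```python
import re
from typing import Optional

# Optional "+" prefix, then a URL that ends at whitespace, ";" or ")".
INFO_URL_RE = re.compile(r"\+?(https?://[^\s;)]+)")


def spider_info_url(user_agent: str) -> Optional[str]:
    """Return the first info URL embedded in a user agent, without the leading '+'."""
    match = INFO_URL_RE.search(user_agent)
    return match.group(1) if match else None


print(spider_info_url("Mozilla/5.0 (Linux; Android 5.0) AppleWebKit/537.36 (KHTML, like Gecko) Mobile Safari/537.36 (compatible; Bytespider; https://zhanzhang.toutiao.com/)"))
# https://zhanzhang.toutiao.com/
```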

8. This one is not new: Bing's spider, bingbot.

40.77.167.103 - - [21/Oct/2022:17:58:21 +0800] "GET /robots.txt HTTP/2.0" 404 208 "-" "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)"
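Several of the sample requests for /robots.txt (from YisouSpider, Bytespider, and bingbot) received a 404, which suggests this site has no robots.txt file at all. A sketch for listing who asked for robots.txt and what status they got, assuming the combined log format with the user agent as the last quoted field; `ROBOTS_RE` and `robots_requests` are hypothetical names.

```python
import re
from typing import Iterable, List, Tuple

# A GET for /robots.txt: capture the status code and the last quoted field (the user agent).
ROBOTS_RE = re.compile(r'"GET /robots\.txt [^"]*" (\d{3}) .*"([^"]*)"$')


def robots_requests(lines: Iterable[str]) -> List[Tuple[str, int]]:
    """Return (user_agent, status) for every request to /robots.txt."""
    results = []
    for line in lines:
        match = ROBOTS_RE.search(line.rstrip())
        if match:
            results.append((match.group(2), int(match.group(1))))
    return results
```

A run of 404s here means crawlers are looking for rules that do not exist yet.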

No Yahoo spider visits were recorded; presumably it has little interest in Chinese-language sites.

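Many spiders request the robots.txt file. Does that mean they all respect the robots protocol? If you decide some spiders are entirely unnecessary, you can block them; see the robots.txt protocol for how.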
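As a minimal example of blocking one spider, the hypothetical robots.txt below disallows Bytespider everywhere and leaves every other crawler unrestricted. Python's standard `urllib.robotparser` can check how a compliant crawler would read it. robots.txt is advisory: it only keeps out spiders that choose to honor it. The URL uses the article's placeholder domain.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: block Bytespider, allow everyone else (empty Disallow = allow all).
ROBOTS_TXT = """\
User-agent: Bytespider
Disallow: /

User-agent: *
Disallow:
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

url = "https://www.domain.com/news/detail/97.html"
# Pass the crawler's product token rather than the full UA string:
# robotparser only matches against the text before the first "/".
for agent in ("Bytespider", "Baiduspider", "bingbot"):
    print(agent, parser.can_fetch(agent, url))
# Bytespider False
# Baiduspider True
# bingbot True
```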

 

When reposting, please credit the original source with a link and include this notice.