mirror of
https://github.com/alibaba/higress.git
synced 2026-04-23 13:07:25 +08:00
update plugins doc (#1305)
This commit is contained in:
@@ -1,11 +1,21 @@
|
||||
<p>
|
||||
<a href="README_EN.md"> English </a> | 中文
|
||||
</p>
|
||||
---
|
||||
title: Bot 拦截
|
||||
keywords: [higress,bot detect]
|
||||
description: Bot 拦截插件配置参考
|
||||
---
|
||||
|
||||
|
||||
## 功能说明
|
||||
|
||||
# 功能说明
|
||||
`bot-detect`插件可以用于识别并阻止互联网爬虫对站点资源的爬取
|
||||
|
||||
# 配置字段
|
||||
## 运行属性
|
||||
|
||||
插件执行阶段:`授权阶段`
|
||||
插件执行优先级:`310`
|
||||
|
||||
|
||||
## 配置字段
|
||||
|
||||
| 名称 | 数据类型 | 填写要求 | 默认值 | 描述 |
|
||||
| -------- | -------- | -------- | -------- | -------- |
|
||||
@@ -33,9 +43,9 @@
|
||||
(CSimpleSpider|Cityreview Robot|CrawlDaddy|CrawlFire|Finderbots|Index crawler|Job Roboter|KiwiStatus Spider|Lijit Crawler|QuerySeekerSpider|ScollSpider|Trends Crawler|USyd-NLP-Spider|SiteCat Webbot|BotName\/\$BotVersion|123metaspider-Bot|1470\.net crawler|50\.nu|8bo Crawler Bot|Aboundex|Accoona-[A-z]{1,30}-Agent|AdsBot-Google(?:-[a-z]{1,30}|)|altavista|AppEngine-Google|archive.{0,30}\.org_bot|archiver|Ask Jeeves|[Bb]ai[Dd]u[Ss]pider(?:-[A-Za-z]{1,30})(?:-[A-Za-z]{1,30}|)|bingbot|BingPreview|blitzbot|BlogBridge|Bloglovin|BoardReader Blog Indexer|BoardReader Favicon Fetcher|boitho.com-dc|BotSeer|BUbiNG|\b\w{0,30}favicon\w{0,30}\b|\bYeti(?:-[a-z]{1,30}|)|Catchpoint(?: bot|)|[Cc]harlotte|Checklinks|clumboot|Comodo HTTP\(S\) Crawler|Comodo-Webinspector-Crawler|ConveraCrawler|CRAWL-E|CrawlConvera|Daumoa(?:-feedfetcher|)|Feed Seeker Bot|Feedbin|findlinks|Flamingo_SearchEngine|FollowSite Bot|furlbot|Genieo|gigabot|GomezAgent|gonzo1|(?:[a-zA-Z]{1,30}-|)Googlebot(?:-[a-zA-Z]{1,30}|)|Google SketchUp|grub-client|gsa-crawler|heritrix|HiddenMarket|holmes|HooWWWer|htdig|ia_archiver|ICC-Crawler|Icarus6j|ichiro(?:/mobile|)|IconSurf|IlTrovatore(?:-Setaccio|)|InfuzApp|Innovazion Crawler|InternetArchive|IP2[a-z]{1,30}Bot|jbot\b|KaloogaBot|Kraken|Kurzor|larbin|LEIA|LesnikBot|Linguee Bot|LinkAider|LinkedInBot|Lite Bot|Llaut|lycos|Mail\.RU_Bot|masscan|masidani_bot|Mediapartners-Google|Microsoft .{0,30} Bot|mogimogi|mozDex|MJ12bot|msnbot(?:-media {0,2}|)|msrbot|Mtps Feed Aggregation System|netresearch|Netvibes|NewsGator[^/]{0,30}|^NING|Nutch[^/]{0,30}|Nymesis|ObjectsSearch|OgScrper|Orbiter|OOZBOT|PagePeeker|PagesInventory|PaxleFramework|Peeplo Screenshot Bot|PlantyNet_WebRobot|Pompos|Qwantify|Read%20Later|Reaper|RedCarpet|Retreiver|Riddler|Rival IQ|scooter|Scrapy|Scrubby|searchsight|seekbot|semanticdiscovery|SemrushBot|Simpy|SimplePie|SEOstats|SimpleRSS|SiteCon|Slackbot-LinkExpanding|Slack-ImgProxy|Slurp|snappy|Speedy Spider|Squrl Java|Stringer|TheUsefulbot|ThumbShotsBot|Thumbshots\.ru|Tiny Tiny RSS|Twitterbot|WhatsApp|URL2PNG|Vagabondo|VoilaBot|^vortex|Votay bot|^voyager|WASALive.Bot|Web-sniffer|WebThumb|WeSEE:[A-z]{1,30}|WhatWeb|WIRE|WordPress|Wotbox|www\.almaden\.ibm\.com|Xenu(?:.s|) Link Sleuth|Xerka [A-z]{1,30}Bot|yacy(?:bot|)|YahooSeeker|Yahoo! Slurp|Yandex\w{1,30}|YodaoBot(?:-[A-z]{1,30}|)|YottaaMonitor|Yowedo|^Zao|^Zao-Crawler|ZeBot_www\.ze\.bz|ZooShot|ZyBorg)(?:[ /]v?(\d+)(?:\.(\d+)(?:\.(\d+)|)|)|)
|
||||
```
|
||||
|
||||
# 配置示例
|
||||
## 配置示例
|
||||
|
||||
## 放行原本命中爬虫规则的请求
|
||||
### 放行原本命中爬虫规则的请求
|
||||
```yaml
|
||||
allow:
|
||||
- ".*Go-http-client.*"
|
||||
@@ -44,7 +54,7 @@ allow:
|
||||
若不作该配置,默认的 Golang 网络库请求会被视做爬虫,被禁止访问
|
||||
|
||||
|
||||
## 增加爬虫判断
|
||||
### 增加爬虫判断
|
||||
```yaml
|
||||
deny:
|
||||
- "spd-tools.*"
|
||||
@@ -56,24 +66,3 @@ deny:
|
||||
curl http://example.com -H 'User-Agent: spd-tools/1.1'
|
||||
curl http://exmaple.com -H 'User-Agent: spd-tools'
|
||||
```
|
||||
|
||||
## 对特定路由或域名开启
|
||||
```yaml
|
||||
# 使用 _rules_ 字段进行细粒度规则配置
|
||||
_rules_:
|
||||
# 规则一:按路由名称匹配生效
|
||||
- _match_route_:
|
||||
- route-a
|
||||
- route-b
|
||||
# 规则二:按域名匹配生效
|
||||
- _match_domain_:
|
||||
- "*.example.com"
|
||||
- test.com
|
||||
allow:
|
||||
- ".*Go-http-client.*"
|
||||
```
|
||||
此例 `_match_route_` 中指定的 `route-a` 和 `route-b` 即在创建网关路由时填写的路由名称,当匹配到这两个路由时,将使用此段配置;
|
||||
此例 `_match_domain_` 中指定的 `*.example.com` 和 `test.com` 用于匹配请求的域名,当发现域名匹配时,将使用此段配置;
|
||||
配置的匹配生效顺序,将按照 `_rules_` 下规则的排列顺序,匹配第一个规则后生效对应配置,后续规则将被忽略。
|
||||
|
||||
|
||||
|
||||
Reference in New Issue
Block a user