higress

jiazhizhong/higress

Fork 0

mirror of https://github.com/alibaba/higress.git synced 2026-02-21 22:40:59 +08:00

Files

History

纪卓志 6a1bf90d42 feat: supports custom prepare build script (#1490 )

2024-11-12 13:45:28 +08:00

res

feat: ai敏感词拦截插件 (#1190 )

2024-08-16 17:24:32 +08:00

src

Ai data mask deny word match optimize (#1453 )

2024-11-05 15:26:55 +08:00

.buildrc

feat: supports custom prepare build script (#1490 )

2024-11-12 13:45:28 +08:00

Cargo.toml

Ai data mask deny word match optimize (#1453 )

2024-11-05 15:26:55 +08:00

README_EN.md

Ai data mask deny word match optimize (#1453 )

2024-11-05 15:26:55 +08:00

README.md

Ai data mask deny word match optimize (#1453 )

2024-11-05 15:26:55 +08:00

README_EN.md

title, keywords, description

title

keywords

description

AI Data Masking

higress

ai data masking

AI Data Masking Plugin Configuration Reference

Function Description

Interception and replacement of sensitive words in requests/responses

Data Handling Scope

openai protocol: Request/response conversation content
jsonpath: Only process specified fields
raw: Entire request/response body

Sensitive Word Interception

Directly intercept sensitive words in the data handling scope and return preset error messages
Supports system's built-in sensitive word library and custom sensitive words

Sensitive Word Replacement

Replace sensitive words in request data with masked strings before passing to back-end services. Ensures that sensitive data does not leave the domain
Some masked data can be restored after being returned by the back-end service
Custom rules support standard regular expressions and grok rules, and replacement strings support variable substitution

Execution Properties

Plugin Execution Phase: Authentication Phase
Plugin Execution Priority: 991

Configuration Fields

Name	Data Type	Default Value	Description
deny_openai	bool	true	Intercept openai protocol
deny_jsonpath	string	[]	Intercept specified jsonpath
deny_raw	bool	false	Intercept raw body
system_deny	bool	false	Enable built-in interception rules
deny_code	int	200	HTTP status code when intercepted
deny_message	string	Sensitive words found in the question or answer have been blocked	AI returned message when intercepted
deny_raw_message	string	{"errmsg":"Sensitive words found in the question or answer have been blocked"}	Content returned when not openai intercepted
deny_content_type	string	application/json	Content type header returned when not openai intercepted
deny_words	array of string	[]	Custom sensitive word list
replace_roles	array	-	Custom sensitive word regex replacement
replace_roles.regex	string	-	Rule regex (built-in GROK rule)
replace_roles.type	[replace, hash]	-	Replacement type
replace_roles.restore	bool	false	Whether to restore
replace_roles.value	string	-	Replacement value (supports regex variables)

Configuration Example

system_deny: true
deny_openai: true
deny_jsonpath:
  - "$.messages[*].content"
deny_raw: true
deny_code: 200
deny_message: "Sensitive words found in the question or answer have been blocked"
deny_raw_message: "{\"errmsg\":\"Sensitive words found in the question or answer have been blocked\"}"
deny_content_type: "application/json"
deny_words:
  - "Custom sensitive word 1"
  - "Custom sensitive word 2"
replace_roles:
  - regex: "%{MOBILE}"
    type: "replace"
    value: "****"
    # Mobile number  13800138000 -> ****
  - regex: "%{EMAILLOCALPART}@%{HOSTNAME:domain}"
    type: "replace"
    restore: true
    value: "****@$domain"
    # Email  admin@gmail.com -> ****@gmail.com
  - regex: "%{IP}"
    type: "replace"
    restore: true
    value: "***.***.***.***"
    # IP 192.168.0.1 -> ***.***.***.***
  - regex: "%{IDCARD}"
    type: "replace"
    value: "****"
    # ID card number 110000000000000000 -> ****
  - regex: "sk-[0-9a-zA-Z]*"
    restore: true
    type: "hash"
    # hash sk-12345 -> 9cb495455da32f41567dab1d07f1973d
    # The hashed value is provided to the large model, and the hash value will be restored to the original value from the data returned by the large model

Sensitive Word Replacement Example

User Request Content

Please change curl http://172.20.5.14/api/openai/v1/chat/completions -H "Authorization: sk-12345" -H "Auth: test@gmail.com" to POST method

Processed Request Large Model Content

curl http://***.***.***.***/api/openai/v1/chat/completions -H "Authorization: 48a7e98a91d93896d8dac522c5853948" -H "Auth: ****@gmail.com" change to POST method

Large Model Returned Content

You want to convert a curl GET request to a POST request, and this request is sending data to a specific API. Below is the modified curl command to send as POST:

curl -X POST \
     -H "Authorization: 48a7e98a91d93896d8dac522c5853948" \
     -H "Auth: ****@gmail.com" \
     -H "Content-Type: application/json" \
     -d '{"key":"value"}' \
     http://***.***.***.***/api/openai/v1/chat/completions

Here are the following modifications made:

-X POST sets the request method to POST.
-H "Content-Type: application/json" sets the Content-Type in the request header to application/json, which is typically used to inform the server that the data you are sending is in JSON format.
-d '{"key":"value"}' sets the data to be sent, where '{"key":"value"}' is a simple example of a JSON object. You need to replace it with the actual data you want to send.

Please note that you need to replace "key":"value" with the actual data content you want to send. If your API accepts a different data structure or requires specific fields, please adjust this part according to your actual situation.

Processed Return to User Content

You want to convert a curl GET request to a POST request, and this request is sending data to a specific API. Below is the modified curl command to send as POST:

curl -X POST \
     -H "Authorization: sk-12345" \
     -H "Auth: test@gmail.com" \
     -H "Content-Type: application/json" \
     -d '{"key":"value"}' \
     http://172.20.5.14/api/openai/v1/chat/completions

Here are the following modifications made:

-X POST sets the request method to POST.
-H "Content-Type: application/json" sets the Content-Type in the request header to application/json, which is typically used to inform the server that the data you are sending is in JSON format.
-d '{"key":"value"}' sets the data to be sent, where '{"key":"value"}' is a simple example of a JSON object. You need to replace it with the actual data you want to send.

In streaming mode, if the masked words are split across multiple chunks, restoration may not be possible
In streaming mode, if sensitive words are split across multiple chunks, there may be cases where part of the sensitive word is returned to the user
Grok built-in rule list: https://help.aliyun.com/zh/sls/user-guide/grok-patterns
Built-in sensitive word library data source: https://github.com/houbb/sensitive-word/tree/master/src/main/resources
Since the sensitive word list is matched after tokenizing the text, please set deny_words to single words. In the case of multiple words in English, such as hello world, the match may not be successful.

README_EN.md

Function Description

Data Handling Scope

Sensitive Word Interception

Sensitive Word Replacement

Execution Properties

Configuration Fields

Configuration Example

Sensitive Word Replacement Example

User Request Content

Processed Request Large Model Content

Large Model Returned Content

Processed Return to User Content

Related Notes