higress/plugins/wasm-rust/extensions/ai-data-masking/README_EN.md

---
title: AI Data Masking
keywords: [higress, ai data masking]
description: AI Data Masking Plugin Configuration Reference
---
## Function Description
  Interception and replacement of sensitive words in requests/responses
![image](https://img.alicdn.com/imgextra/i4/O1CN0156Wtko1T9JO0RiWow_!!6000000002339-0-tps-1314-638.jpg)

### Data Handling Scope
  - openai protocol: Request/response conversation content
  - jsonpath: Only process specified fields
  - raw: Entire request/response body

### Sensitive Word Interception
  - Directly intercept sensitive words in the data handling scope and return preset error messages
  - Supports system's built-in sensitive word library and custom sensitive words

### Sensitive Word Replacement
  - Replace sensitive words in request data with masked strings before passing to back-end services. Ensures that sensitive data does not leave the domain
  - Some masked data can be restored after being returned by the back-end service
  - Custom rules support standard regular expressions and grok rules, and replacement strings support variable substitution

## Execution Properties
Plugin Execution Phase: `Authentication Phase`
Plugin Execution Priority: `991`

## Configuration Fields
| Name                   | Data Type       | Default Value | Description                          |
| ---------------------- | ---------------- | -------------- | ------------------------------------ |
|  deny_openai           | bool             | true           |  Intercept openai protocol          |
|  deny_jsonpath         | string           |   []           |  Intercept specified jsonpath       |
|  deny_raw              | bool             | false          |  Intercept raw body                 |
|  system_deny           | bool             | true           |  Enable built-in interception rules  |
|  deny_code             | int              | 200            |  HTTP status code when intercepted   |
|  deny_message          | string           | Sensitive words found in the question or answer have been blocked | AI returned message when intercepted |
|  deny_raw_message      | string           | {"errmsg":"Sensitive words found in the question or answer have been blocked"} | Content returned when not openai intercepted |
|  deny_content_type     | string           | application/json | Content type header returned when not openai intercepted |
|  deny_words            | array of string  | []             | Custom sensitive word list           |
|  replace_roles         | array            |   -            | Custom sensitive word regex replacement |
|  replace_roles.regex   | string           |   -            | Rule regex (built-in GROK rule)    |
|  replace_roles.type    | [replace, hash]  |   -            | Replacement type                     |
|  replace_roles.restore  | bool             | false          | Whether to restore                   |
|  replace_roles.value    | string          |   -            | Replacement value (supports regex variables) |

## Configuration Example
```yaml
system_deny: true
deny_openai: true
deny_jsonpath:
  - "$.messages[*].content"
deny_raw: true
deny_code: 200
deny_message: "Sensitive words found in the question or answer have been blocked"
deny_raw_message: "{\"errmsg\":\"Sensitive words found in the question or answer have been blocked\"}"
deny_content_type: "application/json"
deny_words:
  - "Custom sensitive word 1"
  - "Custom sensitive word 2"
replace_roles:
  - regex: "%{MOBILE}"
    type: "replace"
    value: "****"
    # Mobile number  13800138000 -> ****
  - regex: "%{EMAILLOCALPART}@%{HOSTNAME:domain}"
    type: "replace"
    restore: true
    value: "****@$domain"
    # Email  admin@gmail.com -> ****@gmail.com
  - regex: "%{IP}"
    type: "replace"
    restore: true
    value: "***.***.***.***"
    # IP 192.168.0.1 -> ***.***.***.***
  - regex: "%{IDCARD}"
    type: "replace"
    value: "****"
    # ID card number 110000000000000000 -> ****
  - regex: "sk-[0-9a-zA-Z]*"
    restore: true
    type: "hash"
    # hash sk-12345 -> 9cb495455da32f41567dab1d07f1973d
    # The hashed value is provided to the large model, and the hash value will be restored to the original value from the data returned by the large model
```

## Sensitive Word Replacement Example
### User Request Content
  Please change `curl http://172.20.5.14/api/openai/v1/chat/completions -H "Authorization: sk-12345" -H "Auth: test@gmail.com"` to POST method

### Processed Request Large Model Content
  `curl http://***.***.***.***/api/openai/v1/chat/completions -H "Authorization: 48a7e98a91d93896d8dac522c5853948" -H "Auth: ****@gmail.com"` change to POST method

### Large Model Returned Content
  You want to convert a `curl` GET request to a POST request, and this request is sending data to a specific API. Below is the modified `curl` command to send as POST:
```sh
curl -X POST \
     -H "Authorization: 48a7e98a91d93896d8dac522c5853948" \
     -H "Auth: ****@gmail.com" \
     -H "Content-Type: application/json" \
     -d '{"key":"value"}' \
     http://***.***.***.***/api/openai/v1/chat/completions
```
Here are the following modifications made:
- `-X POST` sets the request method to POST.
- `-H "Content-Type: application/json"` sets the `Content-Type` in the request header to `application/json`, which is typically used to inform the server that the data you are sending is in JSON format.
- `-d '{"key":"value"}'` sets the data to be sent, where `'{"key":"value"}'` is a simple example of a JSON object. You need to replace it with the actual data you want to send.

Please note that you need to replace `"key":"value"` with the actual data content you want to send. If your API accepts a different data structure or requires specific fields, please adjust this part according to your actual situation.

### Processed Return to User Content
  You want to convert a `curl` GET request to a POST request, and this request is sending data to a specific API. Below is the modified `curl` command to send as POST:
```sh
curl -X POST \
     -H "Authorization: sk-12345" \
     -H "Auth: test@gmail.com" \
     -H "Content-Type: application/json" \
     -d '{"key":"value"}' \
     http://172.20.5.14/api/openai/v1/chat/completions
```
Here are the following modifications made:
- `-X POST` sets the request method to POST.
- `-H "Content-Type: application/json"` sets the `Content-Type` in the request header to `application/json`, which is typically used to inform the server that the data you are sending is in JSON format.
- `-d '{"key":"value"}'` sets the data to be sent, where `'{"key":"value"}'` is a simple example of a JSON object. You need to replace it with the actual data you want to send.

Please note that you need to replace `"key":"value"` with the actual data content you want to send. If your API accepts a different data structure or requires specific fields, please adjust this part according to your actual situation.

## Related Notes
 - In streaming mode, if the masked words are split across multiple chunks, restoration may not be possible
 - In streaming mode, if sensitive words are split across multiple chunks, there may be cases where part of the sensitive word is returned to the user
 - Grok built-in rule list: https://help.aliyun.com/zh/sls/user-guide/grok-patterns
 - Built-in sensitive word library data source: https://github.com/houbb/sensitive-word/tree/master/src/main/resources