| title | keywords | description |
|---|---|---|
| AI Content Security | | Alibaba Cloud content security |
Introduction
Integrates with the Aliyun content security service to check the input and output of LLMs, ensuring that application content is legal and compliant.
Runtime Properties
Plugin Phase: CUSTOM
Plugin Priority: 300
Configuration
| Name | Type | Requirement | Default | Description |
|---|---|---|---|---|
| `serviceName` | string | required | - | service name |
| `servicePort` | string | required | - | service port |
| `serviceHost` | string | required | - | Host of Aliyun content security service endpoint |
| `accessKey` | string | required | - | Aliyun AccessKey |
| `secretKey` | string | required | - | Aliyun SecretKey |
| `action` | string | required | - | Aliyun AI guardrails business interface |
| `checkRequest` | bool | optional | false | check if the input is legal |
| `checkResponse` | bool | optional | false | check if the output is legal |
| `requestCheckService` | string | optional | llm_query_moderation | Aliyun yundun service name for input check |
| `responseCheckService` | string | optional | llm_response_moderation | Aliyun yundun service name for output check |
| `requestContentJsonPath` | string | optional | `messages.@reverse.0.content` | Specify the jsonpath of the content to be detected in the request body |
| `responseContentJsonPath` | string | optional | `choices.0.message.content` | Specify the jsonpath of the content to be detected in the response body |
| `responseStreamContentJsonPath` | string | optional | `choices.0.delta.content` | Specify the jsonpath of the content to be detected in the streaming response body |
| `responseContentFallbackJsonPaths` | array | optional | `[choices.0.message.content, content.#(type=="text")#.text]` | Fallback paths tried in order when `responseContentJsonPath` extracts empty content; entries equal to the primary path are skipped automatically; set to `[]` to disable fallback explicitly |
| `responseStreamContentFallbackJsonPaths` | array | optional | `[choices.0.delta.content, delta.text]` | Streaming fallback paths tried in order when `responseStreamContentJsonPath` extracts empty content; entries equal to the primary path are skipped automatically; set to `[]` to disable fallback explicitly |
| `denyCode` | int | optional | 200 | Response status code when the specified content is illegal |
| `denyMessage` | string | optional | Streaming/non-streaming response in OpenAI format; the answer content is the suggested answer from Alibaba Cloud content security | Response content when the specified content is illegal |
| `protocol` | string | optional | openai | protocol format, `openai` or `original` |
| `openAIDenyResponseFormat` | string | optional | legacy | OpenAI-wrapped deny response format, `legacy` or `structured`. The default `legacy` preserves historical compatibility; `structured` embeds blocking details at `choices[0].x_higress_guardrail` |
| `contentModerationLevelBar` | string | optional | max | contentModeration risk level threshold, `max`, `high`, `medium` or `low` |
| `promptAttackLevelBar` | string | optional | max | promptAttack risk level threshold, `max`, `high`, `medium` or `low` |
| `sensitiveDataLevelBar` | string | optional | S4 | sensitiveData risk level threshold, `S4`, `S3`, `S2` or `S1` |
| `customLabelLevelBar` | string | optional | max | Custom label detection risk level threshold, value can be `max`, `high`, `medium`, or `low` |
| `riskAction` | string | optional | block | Risk action, value can be `block` or `mask`. `block` means blocking requests based on risk level thresholds, `mask` means replacing sensitive fields with desensitized content when the API returns a mask suggestion. Note: masking only works with MultiModalGuard mode |
| `timeout` | int | optional | 2000 | timeout for lvwang service |
| `bufferLimit` | int | optional | 1000 | Limit the length of each text when calling the lvwang service |
| `consumerRequestCheckService` | map | optional | - | Specify specific request detection services for different consumers |
| `consumerResponseCheckService` | map | optional | - | Specify specific response detection services for different consumers |
| `consumerRiskLevel` | map | optional | - | Specify interception risk levels for different consumers in different dimensions |
Risk level explanations for each detection dimension:

- For content moderation and prompt attack detection (`contentModeration`, `promptAttack`):
  - `max`: Detect request/response content but do not block
  - `high`: Block when risk level is `high`
  - `medium`: Block when risk level >= `medium`
  - `low`: Block when risk level >= `low`
- For sensitive data detection (`sensitiveData`):
  - `S4`: Detect request/response content but do not block
  - `S3`: Block when risk level is `S3`
  - `S2`: Block when risk level >= `S2`
  - `S1`: Block when risk level >= `S1`
- For custom label detection (`customLabel`):
  - `max`: Detect request/response content but do not block
  - `high`: Block when custom label detection result risk level is `high`
  - Note: The Alibaba Cloud API only returns `high` and `none` for the customLabel dimension, unlike other dimensions which have four levels. Set to `high` to block on detection hit, set to `max` to not block. `medium` and `low` are kept for configuration compatibility but will not be returned by the API.
- For risk action (`riskAction`):
  - `block`: Block requests based on risk level thresholds for each dimension
  - `mask`: Replace sensitive fields with desensitized content when the API returns `Suggestion=mask`, still block when `Suggestion=block`
  - Note: Masking only works with MultiModalGuard mode (`action` configured as `MultiModalGuard`), other modes do not support masking
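
The threshold rules above can be read as a rank comparison. The following is a minimal Python sketch of that reading, not the plugin's actual code: the helper name `should_block`, the two ordering lists, and the use of `none` as the "no hit" level for sensitive data are all assumptions made for illustration.

```python
# Assumed orderings, lowest to highest. The last entry is the "detect only" bar.
GENERAL_ORDER = ["none", "low", "medium", "high", "max"]
SENSITIVE_ORDER = ["none", "S1", "S2", "S3", "S4"]  # "none" here is an assumption


def should_block(level, bar, order=GENERAL_ORDER):
    """Return True when a detected level meets or exceeds the configured bar.

    A bar equal to the top entry (max / S4) means "detect but do not block".
    """
    if bar == order[-1]:
        return False
    return order.index(level) >= order.index(bar)


print(should_block("medium", "low"))            # medium >= low
print(should_block("S3", "S4", SENSITIVE_ORDER))  # S4 bar never blocks
```

Under this reading, leaving every `*LevelBar` at its default (`max` or `S4`) only records detections, and lowering a bar widens the set of levels that are blocked.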
Deny Response Body
When content is blocked, the plugin (with the `MultiModalGuard` action) builds the following structured JSON object. The `protocol: original`, MCP, and image-generation paths return it directly or indirectly. OpenAI text-generation wrapping keeps the historical response shape by default and embeds this object only when `openAIDenyResponseFormat: structured` is configured.
```json
{
  "code": 200,
  "denyMessage": "Sorry, I cannot answer your question.",
  "blockedDetails": [
    {
      "type": "contentModeration",
      "level": "high"
    }
  ]
}
```
Field descriptions:
| Field | Type | Description |
|---|---|---|
| `code` | int | For text_generation (OpenAI wrapping) and image_generation paths, this is the HTTP status the gateway returns, sourced from `denyCode` (default 200). For `protocol=original` and mcp paths, this is the business code returned by the security service (`Response.Code`; 200 indicates a successful check that detected a risk). |
| `denyMessage` | string | Human-readable deny text. Always present on OpenAI-wrapping paths, taken from `denyMessage` (defaults to `Sorry, I cannot answer your question.`). On `protocol=original` / image_generation / mcp paths the value is taken from `denyMessage` and omitted (`omitempty`) when unconfigured. |
| `blockedDetails` | array | Details of the triggered blocking dimensions. Synthesised from top-level `RiskLevel`/`AttackLevel` when the security service returns no `Detail` entries. Returns `[]` when no dimension is hit. |
| `blockedDetails[].type` | string | Risk type: `contentModeration` / `promptAttack` / `sensitiveData` / `maliciousUrl` / `modelHallucination` / `customLabel` |
| `blockedDetails[].level` | string | Risk level: `high` / `medium` / `low`; for sensitive data: `S1`–`S4` |
Note: the current implementation emits only the fields above. The security service's `RequestId`, per-detail `Suggestion`, and raw business code (`guardCode`) are not embedded in the deny body. The security service's `RequestId` is exposed via the AI access log field `safecheck_request_ids` (see the AI Log section below).
How the body is embedded per protocol:

- `text_generation` (OpenAI, default `legacy`): emits neither `x_higress_guardrail` nor the historical `x_higress` field; `choices[0].message.content` / the first `delta.content` frame keeps the historical content shape (a JSON string for RiskBlock, deny text for mask fallback), `finish_reason` is `"stop"`, and streaming responses still end with `data: [DONE]`
- `text_generation` (OpenAI, `structured` non-streaming): `choices[0].message.content` carries the human-readable deny text (`denyMessage`, defaults to `Sorry, I cannot answer your question.` when unconfigured); the structure above is placed at `choices[0].x_higress_guardrail` as an embedded object (not a JSON string)
- `text_generation` (OpenAI, `structured` streaming SSE): the first frame's `delta.content` carries the human-readable deny text; the structure above is attached only to the last chunk at `choices[0].x_higress_guardrail` as an embedded object, followed by `data: [DONE]`
- `text_generation` (`protocol=original`): returned directly as the JSON response body (no OpenAI wrapper, no `x_higress_guardrail`)
- `image_generation`: returned directly as the JSON response body (HTTP 403)
- `mcp` (JSON-RPC): serialised as a JSON string and placed in `error.message`
- `mcp` (SSE): same, returned via SSE event
openAIDenyResponseFormat only changes the OpenAI-wrapped deny body shape; blocking decisions, fail-open behavior, metrics, and AI Log fields do not vary by format. Configure this field only at plugin global scope, not under consumerRiskLevel.
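
As an illustration of the `structured` non-streaming shape, a client could look for the embedded object at `choices[0].x_higress_guardrail`. The sketch below is a hedged example: the helper names `extract_guardrail` and `blocked_types` are hypothetical, and `SAMPLE` is a hand-written payload assembled from the fields described in this section, not captured gateway output.

```python
import json

# Hand-written sample limited to the fields documented above (illustrative only).
SAMPLE = """
{
  "choices": [
    {
      "message": {"content": "Sorry, I cannot answer your question."},
      "x_higress_guardrail": {
        "code": 200,
        "denyMessage": "Sorry, I cannot answer your question.",
        "blockedDetails": [{"type": "contentModeration", "level": "high"}]
      }
    }
  ]
}
"""


def extract_guardrail(body):
    """Return the embedded deny object, or None when the response was not blocked."""
    choices = body.get("choices") or []
    if not choices:
        return None
    guardrail = choices[0].get("x_higress_guardrail")
    return guardrail if isinstance(guardrail, dict) else None


def blocked_types(body):
    """List the risk types that triggered blocking (empty when nothing is embedded)."""
    guardrail = extract_guardrail(body)
    if guardrail is None:
        return []
    return [detail["type"] for detail in guardrail.get("blockedDetails", [])]


print(blocked_types(json.loads(SAMPLE)))
```

With the default `legacy` format this lookup would find nothing, since that format does not emit `x_higress_guardrail`.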
Examples of configuration
Check if the input is legal
```yaml
serviceName: safecheck.dns
servicePort: 443
serviceHost: "green-cip.cn-shanghai.aliyuncs.com"
accessKey: "XXXXXXXXX"
secretKey: "XXXXXXXXXXXXXXX"
checkRequest: true
```
Check if both the input and output are legal
```yaml
serviceName: safecheck.dns
servicePort: 443
serviceHost: green-cip.cn-shanghai.aliyuncs.com
accessKey: "XXXXXXXXX"
secretKey: "XXXXXXXXXXXXXXX"
checkRequest: true
checkResponse: true
```
Configure OpenAI Structured Deny Responses
The default openAIDenyResponseFormat: legacy keeps the historical response shape. To emit structured blocking details in OpenAI responses, configure:
```yaml
openAIDenyResponseFormat: structured
```
Configure response fallback extraction paths
When primary extraction paths are empty, you can configure ordered fallback paths to support multiple response formats:
```yaml
serviceName: safecheck.dns
servicePort: 443
serviceHost: green-cip.cn-shanghai.aliyuncs.com
accessKey: "XXXXXXXXX"
secretKey: "XXXXXXXXXXXXXXX"
checkResponse: true
responseContentJsonPath: "choices.0.message.content"
responseStreamContentJsonPath: "choices.0.delta.content"
responseContentFallbackJsonPaths:
  - "output.text"
  - 'content.#(type=="text")#.text'
responseStreamContentFallbackJsonPaths:
  - "payload.delta"
  - "delta.text"
```
To enforce strict mode (no fallback), configure both fields as empty arrays:
```yaml
responseContentFallbackJsonPaths: []
responseStreamContentFallbackJsonPaths: []
```
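
The ordering rule for fallback extraction (primary path first, then each fallback in order, skipping entries equal to the primary path) can be sketched as follows. This is an illustrative Python approximation, not the plugin's implementation: `get_path` is a hypothetical helper that handles only plain dotted keys and numeric indexes, so modifiers and queries such as `@reverse` or `#(type=="text")#` are out of scope here.

```python
def get_path(doc, path):
    """Resolve a simple dotted path; return "" when missing or not a string."""
    current = doc
    for part in path.split("."):
        if isinstance(current, list) and part.isdigit():
            index = int(part)
            if index >= len(current):
                return ""
            current = current[index]
        elif isinstance(current, dict) and part in current:
            current = current[part]
        else:
            return ""
    return current if isinstance(current, str) else ""


def extract_content(doc, primary, fallbacks):
    """Try the primary path, then each fallback in order until one is non-empty."""
    content = get_path(doc, primary)
    if content:
        return content
    for path in fallbacks:
        if path == primary:  # entries equal to the primary path are skipped
            continue
        content = get_path(doc, path)
        if content:
            return content
    return ""


body = {"output": {"text": "hello"}}
print(extract_content(body, "choices.0.message.content", ["output.text"]))
print(repr(extract_content(body, "choices.0.message.content", [])))  # strict mode
```

Passing an empty fallback list mirrors the strict mode above: when the primary path yields nothing, no other path is consulted.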
Observability
Metric
The ai-security-guard plugin provides the following metrics:

- `ai_sec_request_deny`: count of requests denied at request phase
- `ai_sec_response_deny`: count of requests denied at response phase
Image response-phase metric / ai_log rename (transition window)
The image generation handlers (lvwang/multi_modal_guard/image/openai.go and lvwang/multi_modal_guard/image/qwen.go) historically emitted request-phase field names for response-phase events. This release corrects the semantics and keeps a double-write transition for 1–2 release cycles:
| Signal | Legacy value (wrong; removed in a future release) | New value (recommended) |
|---|---|---|
| Counter (deny) | `ai_sec_request_deny` | `ai_sec_response_deny` |
| ai_log latency (pass + deny) | `safecheck_request_rt` | `safecheck_response_rt` |
| ai_log status (deny) | `safecheck_status="reqeust deny"` (typo; dropped immediately, no longer emitted) | `safecheck_status="response deny"` |
During the transition window, the image response phase emits both the new and the legacy *_deny counters and safecheck_*_rt attributes; safecheck_status only emits the new value. Migrate dashboards / alerts to the response_* names; any image-response alert that still keys off the typo'd reqeust deny status string must move to response deny immediately.
Trace
The ai-security-guard plugin provides the following span attributes:

- `ai_sec_risklabel`: risk type of this request
- `ai_sec_deny_phase`: denied phase of this request, value can be request/response
AI Log
ai-security-guard writes each submission to the content security service into the AI access log, so gateway logs can be correlated with Alibaba Cloud content security requests:
| Field | Type | Description |
|---|---|---|
| `safecheck_requests` | array | Submission event array. Each item is `{"requestId"?: string, "phase": string, "modality": string, "result": string}` |
| `safecheck_request_ids` | array | All valid content security `RequestId` values for the current gateway request, preserved in submission completion order without deduplication or truncation |
| `safecheck_request_id` | string | The latest valid content security `RequestId`, kept for consumers that only read a single value |
| `safecheck_status` | string | Legacy compatibility field reflecting the last status transition for this gateway request (see enum below) |
| `safecheck_request_rt` / `safecheck_response_rt` | int | Latency (ms) of the security check during the request / response phase |
| `safecheck_riskLabel` / `safecheck_riskWords` | string | Risk label and risk words when a risk is hit (taken from the first result returned by the security service) |
safecheck_requests[].phase is request or response; modality is text, image, or mcp; result describes the processing outcome of that submission event itself (not the gateway's final outbound action). Values:
| `result` value | Meaning |
|---|---|
| `pass` | The submission passed the check |
| `deny` | The submission hit a risk; the gateway returned a deny response |
| `mask` | The submission hit a risk with `Action=Mask`; the security service returned desensitized text and the request body was rewritten |
| `error` | The submission itself failed (HTTP non-200, business Code non-200, unmarshal failure, deny-response build failure, dispatch failure, etc.). When the failure occurs in the streaming response callback because building the deny response failed, the gateway fails open (injects buffered upstream content as-is); in that case `safecheck_status=build_fallback_pass` and the corresponding event has `result=error` to indicate the security submission did not complete |
The plugin writes requestId, safecheck_request_ids, and safecheck_request_id only when the security service response contains a JSON string RequestId and strings.TrimSpace(RequestId) != ""; missing, empty, whitespace-only, or non-string values do not produce empty placeholders.
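
The validity rule for `RequestId` can be mirrored in a few lines. This is a hedged Python sketch of the stated condition only; the function name `valid_request_id` is hypothetical, and whether the plugin records the trimmed or the raw value is not stated here, so the sketch returns the value as received.

```python
def valid_request_id(response):
    """Return RequestId when it is a non-blank string, otherwise None.

    Mirrors the documented condition: a JSON string whose trimmed form is
    non-empty. Missing, empty, whitespace-only, and non-string values yield None.
    """
    request_id = response.get("RequestId")
    if isinstance(request_id, str) and request_id.strip():
        return request_id
    return None


print(valid_request_id({"RequestId": "abc"}))
print(valid_request_id({"RequestId": "   "}))
```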
Every submission attempt emits one safecheck_requests event, including HTTP non-200 responses, business failures, and failures to dispatch the security service call. These error paths are recorded as result=error. Use safecheck_requests for precise auditing across multiple submissions, streaming chunks, or multiple image checks.
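
For auditing, one option is to tally `safecheck_requests` events by `result` and collect the request IDs that are present. The sketch below assumes the array has already been parsed into a list of dictionaries; the `summarize` helper and the sample events (including the ID `req-1`) are invented for illustration.

```python
from collections import Counter


def summarize(events):
    """Count events per result and collect the requestId values that exist."""
    counts = Counter(event["result"] for event in events)
    request_ids = [event["requestId"] for event in events if "requestId" in event]
    return counts, request_ids


# Illustrative events: the second one failed before a RequestId was available.
events = [
    {"requestId": "req-1", "phase": "request", "modality": "text", "result": "pass"},
    {"phase": "response", "modality": "text", "result": "error"},
]
counts, request_ids = summarize(events)
print(dict(counts), request_ids)
```

Because `requestId` is optional on each event, the collected ID list can be shorter than the event list.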
safecheck_status enum (legacy field; overwritten on each status transition, so only the last transition's value is preserved when there are multiple submissions):
| `safecheck_status` value | Meaning |
|---|---|
| `request pass` | All request-phase submissions passed |
| `request mask` | A request-phase submission hit mask; the request body was rewritten with desensitized text |
| `reqeust deny` | A request-phase submission hit a risk; the gateway returned a deny response (note: typo `reqeust` is preserved for backward compatibility) |
| `request error` | A request-phase security submission itself failed (HTTP / unmarshal / dispatch / etc.); the gateway fails open |
| `response pass` | All response-phase submissions passed |
| `response deny` | A response-phase submission hit a risk; the gateway returned a deny response |
| `response error` | A response-phase security submission itself failed; the gateway fails open |
| `build_fallback_pass` | In the streaming response callback, building the deny response failed; the gateway fails open and injects the buffered upstream content as-is |