mirror of
https://github.com/alibaba/higress.git
synced 2026-06-09 20:57:32 +08:00
feat: improve ai statistic plugin (#2671)
This commit is contained in:
@@ -5,6 +5,7 @@ description: AI可观测配置参考
|
|||||||
---
|
---
|
||||||
|
|
||||||
## 介绍
|
## 介绍
|
||||||
|
|
||||||
提供 AI 可观测基础能力,包括 metric, log, trace,其后需接 ai-proxy 插件,如果不接 ai-proxy 插件的话,则需要用户进行相应配置才可生效。
|
提供 AI 可观测基础能力,包括 metric, log, trace,其后需接 ai-proxy 插件,如果不接 ai-proxy 插件的话,则需要用户进行相应配置才可生效。
|
||||||
|
|
||||||
## 运行属性
|
## 运行属性
|
||||||
@@ -13,6 +14,7 @@ description: AI可观测配置参考
|
|||||||
插件执行优先级:`200`
|
插件执行优先级:`200`
|
||||||
|
|
||||||
## 配置说明
|
## 配置说明
|
||||||
|
|
||||||
插件默认请求符合 openai 协议格式,并提供了以下基础可观测值,用户无需特殊配置:
|
插件默认请求符合 openai 协议格式,并提供了以下基础可观测值,用户无需特殊配置:
|
||||||
|
|
||||||
- metric:提供了输入 token、输出 token、首个 token 的 rt(流式请求)、请求总 rt 等指标,支持在网关、路由、服务、模型四个维度上进行观测
|
- metric:提供了输入 token、输出 token、首个 token 的 rt(流式请求)、请求总 rt 等指标,支持在网关、路由、服务、模型四个维度上进行观测
|
||||||
@@ -25,11 +27,13 @@ description: AI可观测配置参考
|
|||||||
| `attributes` | []Attribute | 非必填 | - | 用户希望记录在log/span中的信息 |
|
| `attributes` | []Attribute | 非必填 | - | 用户希望记录在log/span中的信息 |
|
||||||
| `disable_openai_usage` | bool | 非必填 | false | 非openai兼容协议时,model、token的支持非标,配置为true时可以避免报错 |
|
| `disable_openai_usage` | bool | 非必填 | false | 非openai兼容协议时,model、token的支持非标,配置为true时可以避免报错 |
|
||||||
| `value_length_limit` | int | 非必填 | 4000 | 记录的单个value的长度限制 |
|
| `value_length_limit` | int | 非必填 | 4000 | 记录的单个value的长度限制 |
|
||||||
|
| `enable_path_suffixes` | []string | 非必填 | [] | 只对这些特定路径后缀的请求生效,可以配置为 "\*" 以匹配所有路径(通配符检查会优先进行以提高性能)。如果为空数组,则对所有路径生效 |
|
||||||
|
| `enable_content_types` | []string | 非必填 | [] | 只对这些内容类型的响应进行缓冲处理。如果为空数组,则对所有内容类型生效 |
|
||||||
|
|
||||||
Attribute 配置说明:
|
Attribute 配置说明:
|
||||||
|
|
||||||
| 名称 | 数据类型 | 填写要求 | 默认值 | 描述 |
|
| 名称 | 数据类型 | 填写要求 | 默认值 | 描述 |
|
||||||
|----------------|-------|-----|-----|------------------------|
|
| ----------------------- | -------- | -------- | ------ | ------------------------------------------------------------------------------------------------------------------------------------------- |
|
||||||
| `key` | string | 必填 | - | attribute 名称 |
|
| `key` | string | 必填 | - | attribute 名称 |
|
||||||
| `value_source` | string | 必填 | - | attribute 取值来源,可选值为 `fixed_value`, `request_header`, `request_body`, `response_header`, `response_body`, `response_streaming_body` |
|
| `value_source` | string | 必填 | - | attribute 取值来源,可选值为 `fixed_value`, `request_header`, `request_body`, `response_header`, `response_body`, `response_streaming_body` |
|
||||||
| `value` | string | 必填 | - | attribute 取值 key value/path |
|
| `value` | string | 必填 | - | attribute 取值 key value/path |
|
||||||
@@ -49,7 +53,6 @@ Attribute 配置说明:
|
|||||||
- `response_body` :attribute 值通过响应 body 获取,value 配置格式为 gjson 的 jsonpath
|
- `response_body` :attribute 值通过响应 body 获取,value 配置格式为 gjson 的 jsonpath
|
||||||
- `response_streaming_body` :attribute 值通过流式响应 body 获取,value 配置格式为 gjson 的 jsonpath
|
- `response_streaming_body` :attribute 值通过流式响应 body 获取,value 配置格式为 gjson 的 jsonpath
|
||||||
|
|
||||||
|
|
||||||
当 `value_source` 为 `response_streaming_body` 时,应当配置 `rule`,用于指定如何从流式 body 中获取指定值,取值含义如下:
|
当 `value_source` 为 `response_streaming_body` 时,应当配置 `rule`,用于指定如何从流式 body 中获取指定值,取值含义如下:
|
||||||
|
|
||||||
- `first`:多个 chunk 中取第一个有效 chunk 的值
|
- `first`:多个 chunk 中取第一个有效 chunk 的值
|
||||||
@@ -57,6 +60,7 @@ Attribute 配置说明:
|
|||||||
- `append`:拼接多个有效 chunk 中的值,可用于获取回答内容
|
- `append`:拼接多个有效 chunk 中的值,可用于获取回答内容
|
||||||
|
|
||||||
## 配置示例
|
## 配置示例
|
||||||
|
|
||||||
如果希望在网关访问日志中记录 ai-statistic 相关的统计值,需要修改 log_format,在原 log_format 基础上添加一个新字段,示例如下:
|
如果希望在网关访问日志中记录 ai-statistic 相关的统计值,需要修改 log_format,在原 log_format 基础上添加一个新字段,示例如下:
|
||||||
|
|
||||||
```yaml
|
```yaml
|
||||||
@@ -64,6 +68,7 @@ Attribute 配置说明:
|
|||||||
```
|
```
|
||||||
|
|
||||||
如果字段设置了 `as_separate_log_field`,例如:
|
如果字段设置了 `as_separate_log_field`,例如:
|
||||||
|
|
||||||
```yaml
|
```yaml
|
||||||
attributes:
|
attributes:
|
||||||
- key: consumer
|
- key: consumer
|
||||||
@@ -74,11 +79,13 @@ attributes:
|
|||||||
```
|
```
|
||||||
|
|
||||||
那么要在日志中打印,需要额外设置 log_format:
|
那么要在日志中打印,需要额外设置 log_format:
|
||||||
|
|
||||||
```
|
```
|
||||||
'{"consumer":"%FILTER_STATE(wasm.consumer:PLAIN)%"}'
|
'{"consumer":"%FILTER_STATE(wasm.consumer:PLAIN)%"}'
|
||||||
```
|
```
|
||||||
|
|
||||||
### 空配置
|
### 空配置
|
||||||
|
|
||||||
#### 监控
|
#### 监控
|
||||||
|
|
||||||
```
|
```
|
||||||
@@ -120,6 +127,7 @@ irate(route_upstream_model_consumer_metric_llm_duration_count[2m])
|
|||||||
```
|
```
|
||||||
|
|
||||||
#### 日志
|
#### 日志
|
||||||
|
|
||||||
```json
|
```json
|
||||||
{
|
{
|
||||||
"ai_log": "{\"model\":\"qwen-turbo\",\"input_token\":\"10\",\"output_token\":\"69\",\"llm_first_token_duration\":\"309\",\"llm_service_duration\":\"1955\"}"
|
"ai_log": "{\"model\":\"qwen-turbo\",\"input_token\":\"10\",\"output_token\":\"69\",\"llm_first_token_duration\":\"309\",\"llm_service_duration\":\"1955\"}"
|
||||||
@@ -127,9 +135,11 @@ irate(route_upstream_model_consumer_metric_llm_duration_count[2m])
|
|||||||
```
|
```
|
||||||
|
|
||||||
#### 链路追踪
|
#### 链路追踪
|
||||||
|
|
||||||
配置为空时,不会在 span 中添加额外的 attribute
|
配置为空时,不会在 span 中添加额外的 attribute
|
||||||
|
|
||||||
### 从非 openai 协议提取 token 使用信息
|
### 从非 openai 协议提取 token 使用信息
|
||||||
|
|
||||||
在 ai-proxy 中设置协议为 original 时,以百炼为例,可作如下配置指定如何提取 model, input_token, output_token
|
在 ai-proxy 中设置协议为 original 时,以百炼为例,可作如下配置指定如何提取 model, input_token, output_token
|
||||||
|
|
||||||
```yaml
|
```yaml
|
||||||
@@ -150,6 +160,7 @@ attributes:
|
|||||||
apply_to_log: true
|
apply_to_log: true
|
||||||
apply_to_span: false
|
apply_to_span: false
|
||||||
```
|
```
|
||||||
|
|
||||||
#### 监控
|
#### 监控
|
||||||
|
|
||||||
```
|
```
|
||||||
@@ -160,7 +171,9 @@ route_upstream_model_consumer_metric_llm_duration_count{ai_route="bailian",ai_cl
|
|||||||
```
|
```
|
||||||
|
|
||||||
#### 日志
|
#### 日志
|
||||||
|
|
||||||
此配置下日志效果如下:
|
此配置下日志效果如下:
|
||||||
|
|
||||||
```json
|
```json
|
||||||
{
|
{
|
||||||
"ai_log": "{\"model\":\"qwen-max\",\"input_token\":\"343\",\"output_token\":\"153\",\"llm_service_duration\":\"19110\"}"
|
"ai_log": "{\"model\":\"qwen-max\",\"input_token\":\"343\",\"output_token\":\"153\",\"llm_service_duration\":\"19110\"}"
|
||||||
@@ -168,10 +181,13 @@ route_upstream_model_consumer_metric_llm_duration_count{ai_route="bailian",ai_cl
|
|||||||
```
|
```
|
||||||
|
|
||||||
#### 链路追踪
|
#### 链路追踪
|
||||||
|
|
||||||
链路追踪的 span 中可以看到 model, input_token, output_token 三个额外的 attribute
|
链路追踪的 span 中可以看到 model, input_token, output_token 三个额外的 attribute
|
||||||
|
|
||||||
### 配合认证鉴权记录 consumer
|
### 配合认证鉴权记录 consumer
|
||||||
|
|
||||||
举例如下:
|
举例如下:
|
||||||
|
|
||||||
```yaml
|
```yaml
|
||||||
attributes:
|
attributes:
|
||||||
- key: consumer # 配合认证鉴权记录consumer
|
- key: consumer # 配合认证鉴权记录consumer
|
||||||
@@ -181,6 +197,7 @@ attributes:
|
|||||||
```
|
```
|
||||||
|
|
||||||
### 记录问题与回答
|
### 记录问题与回答
|
||||||
|
|
||||||
```yaml
|
```yaml
|
||||||
attributes:
|
attributes:
|
||||||
- key: question # 记录问题
|
- key: question # 记录问题
|
||||||
@@ -199,11 +216,12 @@ attributes:
|
|||||||
```
|
```
|
||||||
|
|
||||||
## 进阶
|
## 进阶
|
||||||
|
|
||||||
配合阿里云 SLS 数据加工,可以将 ai 相关的字段进行提取加工,例如原始日志为:
|
配合阿里云 SLS 数据加工,可以将 ai 相关的字段进行提取加工,例如原始日志为:
|
||||||
|
|
||||||
```
|
````
|
||||||
ai_log:{"question":"用python计算2的3次方","answer":"你可以使用 Python 的乘方运算符 `**` 来计算一个数的次方。计算2的3次方,即2乘以自己2次,可以用以下代码表示:\n\n```python\nresult = 2 ** 3\nprint(result)\n```\n\n运行这段代码,你会得到输出结果为8,因为2乘以自己两次等于8。","model":"qwen-max","input_token":"16","output_token":"76","llm_service_duration":"5913"}
|
ai_log:{"question":"用python计算2的3次方","answer":"你可以使用 Python 的乘方运算符 `**` 来计算一个数的次方。计算2的3次方,即2乘以自己2次,可以用以下代码表示:\n\n```python\nresult = 2 ** 3\nprint(result)\n```\n\n运行这段代码,你会得到输出结果为8,因为2乘以自己两次等于8。","model":"qwen-max","input_token":"16","output_token":"76","llm_service_duration":"5913"}
|
||||||
```
|
````
|
||||||
|
|
||||||
使用如下数据加工脚本,可以提取出 question 和 answer:
|
使用如下数据加工脚本,可以提取出 question 和 answer:
|
||||||
|
|
||||||
@@ -215,7 +233,7 @@ e_set("answer", json_select(v("json"), "answer", default="-"))
|
|||||||
|
|
||||||
提取后,SLS 中会添加 question 和 answer 两个字段,示例如下:
|
提取后,SLS 中会添加 question 和 answer 两个字段,示例如下:
|
||||||
|
|
||||||
```
|
````
|
||||||
ai_log:{"question":"用python计算2的3次方","answer":"你可以使用 Python 的乘方运算符 `**` 来计算一个数的次方。计算2的3次方,即2乘以自己2次,可以用以下代码表示:\n\n```python\nresult = 2 ** 3\nprint(result)\n```\n\n运行这段代码,你会得到输出结果为8,因为2乘以自己两次等于8。","model":"qwen-max","input_token":"16","output_token":"76","llm_service_duration":"5913"}
|
ai_log:{"question":"用python计算2的3次方","answer":"你可以使用 Python 的乘方运算符 `**` 来计算一个数的次方。计算2的3次方,即2乘以自己2次,可以用以下代码表示:\n\n```python\nresult = 2 ** 3\nprint(result)\n```\n\n运行这段代码,你会得到输出结果为8,因为2乘以自己两次等于8。","model":"qwen-max","input_token":"16","output_token":"76","llm_service_duration":"5913"}
|
||||||
|
|
||||||
question:用python计算2的3次方
|
question:用python计算2的3次方
|
||||||
@@ -227,4 +245,57 @@ print(result)
|
|||||||
|
|
||||||
运行这段代码,你会得到输出结果为8,因为2乘以自己两次等于8。
|
运行这段代码,你会得到输出结果为8,因为2乘以自己两次等于8。
|
||||||
|
|
||||||
|
````
|
||||||
|
|
||||||
|
### 路径和内容类型过滤配置示例
|
||||||
|
|
||||||
|
#### 只处理特定 AI 路径
|
||||||
|
|
||||||
|
```yaml
|
||||||
|
enable_path_suffixes:
|
||||||
|
- "/v1/chat/completions"
|
||||||
|
- "/v1/embeddings"
|
||||||
|
- "/generateContent"
|
||||||
|
```
|
||||||
|
|
||||||
|
#### 只处理特定内容类型
|
||||||
|
|
||||||
|
```yaml
|
||||||
|
enable_content_types:
|
||||||
|
- "text/event-stream"
|
||||||
|
- "application/json"
|
||||||
|
```
|
||||||
|
|
||||||
|
#### 处理所有路径(通配符)
|
||||||
|
|
||||||
|
```yaml
|
||||||
|
enable_path_suffixes:
|
||||||
|
- "*"
|
||||||
|
```
|
||||||
|
|
||||||
|
#### 处理所有内容类型(空数组)
|
||||||
|
|
||||||
|
```yaml
|
||||||
|
enable_content_types: []
|
||||||
|
```
|
||||||
|
|
||||||
|
#### 完整配置示例
|
||||||
|
|
||||||
|
```yaml
|
||||||
|
enable_path_suffixes:
|
||||||
|
- "/v1/chat/completions"
|
||||||
|
- "/v1/embeddings"
|
||||||
|
- "/generateContent"
|
||||||
|
enable_content_types:
|
||||||
|
- "text/event-stream"
|
||||||
|
- "application/json"
|
||||||
|
attributes:
|
||||||
|
- key: model
|
||||||
|
value_source: request_body
|
||||||
|
value: model
|
||||||
|
apply_to_log: true
|
||||||
|
- key: consumer
|
||||||
|
value_source: request_header
|
||||||
|
value: x-mse-consumer
|
||||||
|
apply_to_log: true
|
||||||
```
|
```
|
||||||
@@ -5,6 +5,7 @@ description: AI Statistics plugin configuration reference
|
|||||||
---
|
---
|
||||||
|
|
||||||
## Introduction
|
## Introduction
|
||||||
|
|
||||||
Provides basic AI observability capabilities, including metric, log, and trace. The ai-proxy plug-in needs to be connected afterwards. If the ai-proxy plug-in is not connected, the user needs to configure it accordingly to take effect.
|
Provides basic AI observability capabilities, including metric, log, and trace. The ai-proxy plug-in needs to be connected afterwards. If the ai-proxy plug-in is not connected, the user needs to configure it accordingly to take effect.
|
||||||
|
|
||||||
## Runtime Properties
|
## Runtime Properties
|
||||||
@@ -13,6 +14,7 @@ Plugin Phase: `CUSTOM`
|
|||||||
Plugin Priority: `200`
|
Plugin Priority: `200`
|
||||||
|
|
||||||
## Configuration instructions
|
## Configuration instructions
|
||||||
|
|
||||||
The default request of the plug-in conforms to the openai protocol format and provides the following basic observable values. Users do not need special configuration:
|
The default request of the plug-in conforms to the openai protocol format and provides the following basic observable values. Users do not need special configuration:
|
||||||
|
|
||||||
- metric: It provides indicators such as input token, output token, rt of the first token (streaming request), total request rt, etc., and supports observation in the four dimensions of gateway, routing, service, and model.
|
- metric: It provides indicators such as input token, output token, rt of the first token (streaming request), total request rt, etc., and supports observation in the four dimensions of gateway, routing, service, and model.
|
||||||
@@ -25,12 +27,13 @@ Users can also expand observable values through configuration:
|
|||||||
| `attributes` | []Attribute | optional | - | Information that the user wants to record in log/span |
|
| `attributes` | []Attribute | optional | - | Information that the user wants to record in log/span |
|
||||||
| `disable_openai_usage` | bool | optional | false | When using a non-OpenAI-compatible protocol, the support for model and token is non-standard. Setting the configuration to true can prevent errors. |
|
| `disable_openai_usage` | bool | optional | false | When using a non-OpenAI-compatible protocol, the support for model and token is non-standard. Setting the configuration to true can prevent errors. |
|
||||||
| `value_length_limit` | int | optional | 4000 | length limit for each value |
|
| `value_length_limit` | int | optional | 4000 | length limit for each value |
|
||||||
|
| `enable_path_suffixes` | []string | optional | ["/v1/chat/completions","/v1/completions","/v1/embeddings","/v1/models","/generateContent","/streamGenerateContent"] | Only effective for requests with these specific path suffixes, can be configured as "\*" to match all paths |
|
||||||
|
| `enable_content_types` | []string | optional | ["text/event-stream","application/json"] | Only buffer response body for these content types |
|
||||||
|
|
||||||
Attribute Configuration instructions:
|
Attribute Configuration instructions:
|
||||||
|
|
||||||
| Name | Type | Required | Default | Description |
|
| Name | Type | Required | Default | Description |
|
||||||
|----------------|-------|-----|-----|------------------------|
|
| ----------------------- | ------ | -------- | ------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------ |
|
||||||
| `key` | string | required | - | attribute key |
|
| `key` | string | required | - | attribute key |
|
||||||
| `value_source` | string | required | - | attribute value source, optional values are `fixed_value`, `request_header`, `request_body`, `response_header`, `response_body`, `response_streaming_body` |
|
| `value_source` | string | required | - | attribute value source, optional values are `fixed_value`, `request_header`, `request_body`, `response_header`, `response_body`, `response_streaming_body` |
|
||||||
| `value` | string | required | - | how to get attribute value |
|
| `value` | string | required | - | how to get attribute value |
|
||||||
@@ -50,7 +53,6 @@ The meanings of various values for `value_source` are as follows:
|
|||||||
- `response_body`: The attribute is obtained through the http response body
|
- `response_body`: The attribute is obtained through the http response body
|
||||||
- `response_streaming_body`: The attribute is obtained through the http streaming response body
|
- `response_streaming_body`: The attribute is obtained through the http streaming response body
|
||||||
|
|
||||||
|
|
||||||
When `value_source` is `response_streaming_body`, `rule` should be configured to specify how to obtain the specified value from the streaming body. The meaning of the value is as follows:
|
When `value_source` is `response_streaming_body`, `rule` should be configured to specify how to obtain the specified value from the streaming body. The meaning of the value is as follows:
|
||||||
|
|
||||||
- `first`: extract value from the first valid chunk
|
- `first`: extract value from the first valid chunk
|
||||||
@@ -58,6 +60,7 @@ When `value_source` is `response_streaming_body`, `rule` should be configured to
|
|||||||
- `append`: join value pieces from all valid chunks
|
- `append`: join value pieces from all valid chunks
|
||||||
|
|
||||||
## Configuration example
|
## Configuration example
|
||||||
|
|
||||||
If you want to record ai-statistic related statistical values in the gateway access log, you need to modify log_format and add a new field based on the original log_format. The example is as follows:
|
If you want to record ai-statistic related statistical values in the gateway access log, you need to modify log_format and add a new field based on the original log_format. The example is as follows:
|
||||||
|
|
||||||
```yaml
|
```yaml
|
||||||
@@ -65,6 +68,7 @@ If you want to record ai-statistic related statistical values in the gateway acc
|
|||||||
```
|
```
|
||||||
|
|
||||||
If the field is set with `as_separate_log_field`, for example:
|
If the field is set with `as_separate_log_field`, for example:
|
||||||
|
|
||||||
```yaml
|
```yaml
|
||||||
attributes:
|
attributes:
|
||||||
- key: consumer
|
- key: consumer
|
||||||
@@ -75,11 +79,13 @@ attributes:
|
|||||||
```
|
```
|
||||||
|
|
||||||
Then to print in the log, you need to set log_format additionally:
|
Then to print in the log, you need to set log_format additionally:
|
||||||
|
|
||||||
```
|
```
|
||||||
'{"consumer":"%FILTER_STATE(wasm.consumer:PLAIN)%"}'
|
'{"consumer":"%FILTER_STATE(wasm.consumer:PLAIN)%"}'
|
||||||
```
|
```
|
||||||
|
|
||||||
### Empty
|
### Empty
|
||||||
|
|
||||||
#### Metric
|
#### Metric
|
||||||
|
|
||||||
```
|
```
|
||||||
@@ -121,6 +127,7 @@ irate(route_upstream_model_consumer_metric_llm_duration_count[2m])
|
|||||||
```
|
```
|
||||||
|
|
||||||
#### Log
|
#### Log
|
||||||
|
|
||||||
```json
|
```json
|
||||||
{
|
{
|
||||||
"ai_log": "{\"model\":\"qwen-turbo\",\"input_token\":\"10\",\"output_token\":\"69\",\"llm_first_token_duration\":\"309\",\"llm_service_duration\":\"1955\"}"
|
"ai_log": "{\"model\":\"qwen-turbo\",\"input_token\":\"10\",\"output_token\":\"69\",\"llm_first_token_duration\":\"309\",\"llm_service_duration\":\"1955\"}"
|
||||||
@@ -128,9 +135,11 @@ irate(route_upstream_model_consumer_metric_llm_duration_count[2m])
|
|||||||
```
|
```
|
||||||
|
|
||||||
#### Trace
|
#### Trace
|
||||||
|
|
||||||
When the configuration is empty, no additional attributes will be added to the span.
|
When the configuration is empty, no additional attributes will be added to the span.
|
||||||
|
|
||||||
### Extract token usage information from non-openai protocols
|
### Extract token usage information from non-openai protocols
|
||||||
|
|
||||||
When setting the protocol to original in ai-proxy, taking Alibaba Cloud Bailian as an example, you can make the following configuration to specify how to extract `model`, `input_token`, `output_token`
|
When setting the protocol to original in ai-proxy, taking Alibaba Cloud Bailian as an example, you can make the following configuration to specify how to extract `model`, `input_token`, `output_token`
|
||||||
|
|
||||||
```yaml
|
```yaml
|
||||||
@@ -151,6 +160,7 @@ attributes:
|
|||||||
apply_to_log: true
|
apply_to_log: true
|
||||||
apply_to_span: false
|
apply_to_span: false
|
||||||
```
|
```
|
||||||
|
|
||||||
#### Metric
|
#### Metric
|
||||||
|
|
||||||
```
|
```
|
||||||
@@ -161,6 +171,7 @@ route_upstream_model_consumer_metric_llm_duration_count{ai_route="bailian",ai_cl
|
|||||||
```
|
```
|
||||||
|
|
||||||
#### Log
|
#### Log
|
||||||
|
|
||||||
```json
|
```json
|
||||||
{
|
{
|
||||||
"ai_log": "{\"model\":\"qwen-max\",\"input_token\":\"343\",\"output_token\":\"153\",\"llm_service_duration\":\"19110\"}"
|
"ai_log": "{\"model\":\"qwen-max\",\"input_token\":\"343\",\"output_token\":\"153\",\"llm_service_duration\":\"19110\"}"
|
||||||
@@ -168,9 +179,11 @@ route_upstream_model_consumer_metric_llm_duration_count{ai_route="bailian",ai_cl
|
|||||||
```
|
```
|
||||||
|
|
||||||
#### Trace
|
#### Trace
|
||||||
|
|
||||||
Three additional attributes `model`, `input_token`, and `output_token` can be seen in the trace spans.
|
Three additional attributes `model`, `input_token`, and `output_token` can be seen in the trace spans.
|
||||||
|
|
||||||
### Cooperate with authentication and authentication record consumer
|
### Cooperate with authentication and authentication record consumer
|
||||||
|
|
||||||
```yaml
|
```yaml
|
||||||
attributes:
|
attributes:
|
||||||
- key: consumer
|
- key: consumer
|
||||||
@@ -180,6 +193,7 @@ attributes:
|
|||||||
```
|
```
|
||||||
|
|
||||||
### Record questions and answers
|
### Record questions and answers
|
||||||
|
|
||||||
```yaml
|
```yaml
|
||||||
attributes:
|
attributes:
|
||||||
- key: question
|
- key: question
|
||||||
@@ -196,3 +210,50 @@ attributes:
|
|||||||
value: choices.0.message.content
|
value: choices.0.message.content
|
||||||
apply_to_log: true
|
apply_to_log: true
|
||||||
```
|
```
|
||||||
|
|
||||||
|
### Path and Content Type Filtering Configuration Examples
|
||||||
|
|
||||||
|
#### Process Only Specific AI Paths
|
||||||
|
|
||||||
|
```yaml
|
||||||
|
enable_path_suffixes:
|
||||||
|
- "/v1/chat/completions"
|
||||||
|
- "/v1/embeddings"
|
||||||
|
- "/generateContent"
|
||||||
|
```
|
||||||
|
|
||||||
|
#### Process Only Specific Content Types
|
||||||
|
|
||||||
|
```yaml
|
||||||
|
enable_content_types:
|
||||||
|
- "text/event-stream"
|
||||||
|
- "application/json"
|
||||||
|
```
|
||||||
|
|
||||||
|
#### Process All Paths (Wildcard)
|
||||||
|
|
||||||
|
```yaml
|
||||||
|
enable_path_suffixes:
|
||||||
|
- "*"
|
||||||
|
```
|
||||||
|
|
||||||
|
#### Complete Configuration Example
|
||||||
|
|
||||||
|
```yaml
|
||||||
|
enable_path_suffixes:
|
||||||
|
- "/v1/chat/completions"
|
||||||
|
- "/v1/embeddings"
|
||||||
|
- "/generateContent"
|
||||||
|
enable_content_types:
|
||||||
|
- "text/event-stream"
|
||||||
|
- "application/json"
|
||||||
|
attributes:
|
||||||
|
- key: model
|
||||||
|
value_source: request_body
|
||||||
|
value: model
|
||||||
|
apply_to_log: true
|
||||||
|
- key: consumer
|
||||||
|
value_source: request_header
|
||||||
|
value: x-mse-consumer
|
||||||
|
apply_to_log: true
|
||||||
|
```
|
||||||
|
|||||||
@@ -44,6 +44,15 @@ const (
|
|||||||
APIName = "api"
|
APIName = "api"
|
||||||
ConsumerKey = "x-mse-consumer"
|
ConsumerKey = "x-mse-consumer"
|
||||||
RequestPath = "request_path"
|
RequestPath = "request_path"
|
||||||
|
SkipProcessing = "skip_processing"
|
||||||
|
|
||||||
|
// AI API Paths
|
||||||
|
PathOpenAIChatCompletions = "/v1/chat/completions"
|
||||||
|
PathOpenAICompletions = "/v1/completions"
|
||||||
|
PathOpenAIEmbeddings = "/v1/embeddings"
|
||||||
|
PathOpenAIModels = "/v1/models"
|
||||||
|
PathGeminiGenerateContent = "/generateContent"
|
||||||
|
PathGeminiStreamGenerateContent = "/streamGenerateContent"
|
||||||
|
|
||||||
// Source Type
|
// Source Type
|
||||||
FixedValue = "fixed_value"
|
FixedValue = "fixed_value"
|
||||||
@@ -100,6 +109,10 @@ type AIStatisticsConfig struct {
|
|||||||
// If disableOpenaiUsage is true, model/input_token/output_token logs will be skipped
|
// If disableOpenaiUsage is true, model/input_token/output_token logs will be skipped
|
||||||
disableOpenaiUsage bool
|
disableOpenaiUsage bool
|
||||||
valueLengthLimit int
|
valueLengthLimit int
|
||||||
|
// Path suffixes to enable the plugin on
|
||||||
|
enablePathSuffixes []string
|
||||||
|
// Content types to enable response body buffering
|
||||||
|
enableContentTypes []string
|
||||||
}
|
}
|
||||||
|
|
||||||
func generateMetricName(route, cluster, model, consumer, metricName string) string {
|
func generateMetricName(route, cluster, model, consumer, metricName string) string {
|
||||||
@@ -147,6 +160,41 @@ func (config *AIStatisticsConfig) incrementCounter(metricName string, inc uint64
|
|||||||
counter.Increment(inc)
|
counter.Increment(inc)
|
||||||
}
|
}
|
||||||
|
|
||||||
|
// isPathEnabled checks if the request path matches any of the enabled path suffixes
|
||||||
|
func isPathEnabled(requestPath string, enabledSuffixes []string) bool {
|
||||||
|
if len(enabledSuffixes) == 0 {
|
||||||
|
return true // If no path suffixes configured, enable for all
|
||||||
|
}
|
||||||
|
|
||||||
|
// Remove query parameters from path
|
||||||
|
pathWithoutQuery := requestPath
|
||||||
|
if queryPos := strings.Index(requestPath, "?"); queryPos != -1 {
|
||||||
|
pathWithoutQuery = requestPath[:queryPos]
|
||||||
|
}
|
||||||
|
|
||||||
|
// Check if path ends with any enabled suffix
|
||||||
|
for _, suffix := range enabledSuffixes {
|
||||||
|
if strings.HasSuffix(pathWithoutQuery, suffix) {
|
||||||
|
return true
|
||||||
|
}
|
||||||
|
}
|
||||||
|
return false
|
||||||
|
}
|
||||||
|
|
||||||
|
// isContentTypeEnabled checks if the content type matches any of the enabled content types
|
||||||
|
func isContentTypeEnabled(contentType string, enabledContentTypes []string) bool {
|
||||||
|
if len(enabledContentTypes) == 0 {
|
||||||
|
return true // If no content types configured, enable for all
|
||||||
|
}
|
||||||
|
|
||||||
|
for _, enabledType := range enabledContentTypes {
|
||||||
|
if strings.Contains(contentType, enabledType) {
|
||||||
|
return true
|
||||||
|
}
|
||||||
|
}
|
||||||
|
return false
|
||||||
|
}
|
||||||
|
|
||||||
func parseConfig(configJson gjson.Result, config *AIStatisticsConfig) error {
|
func parseConfig(configJson gjson.Result, config *AIStatisticsConfig) error {
|
||||||
// Parse tracing span attributes setting.
|
// Parse tracing span attributes setting.
|
||||||
attributeConfigs := configJson.Get("attributes").Array()
|
attributeConfigs := configJson.Get("attributes").Array()
|
||||||
@@ -177,10 +225,49 @@ func parseConfig(configJson gjson.Result, config *AIStatisticsConfig) error {
|
|||||||
// Parse openai usage config setting.
|
// Parse openai usage config setting.
|
||||||
config.disableOpenaiUsage = configJson.Get("disable_openai_usage").Bool()
|
config.disableOpenaiUsage = configJson.Get("disable_openai_usage").Bool()
|
||||||
|
|
||||||
|
// Parse path suffix configuration
|
||||||
|
pathSuffixes := configJson.Get("enable_path_suffixes").Array()
|
||||||
|
config.enablePathSuffixes = make([]string, 0, len(pathSuffixes))
|
||||||
|
|
||||||
|
for _, suffix := range pathSuffixes {
|
||||||
|
suffixStr := suffix.String()
|
||||||
|
if suffixStr == "*" {
|
||||||
|
// Clear the suffixes list since * means all paths are enabled
|
||||||
|
config.enablePathSuffixes = make([]string, 0)
|
||||||
|
break
|
||||||
|
}
|
||||||
|
config.enablePathSuffixes = append(config.enablePathSuffixes, suffixStr)
|
||||||
|
}
|
||||||
|
|
||||||
|
// Parse content type configuration
|
||||||
|
contentTypes := configJson.Get("enable_content_types").Array()
|
||||||
|
config.enableContentTypes = make([]string, 0, len(contentTypes))
|
||||||
|
|
||||||
|
for _, contentType := range contentTypes {
|
||||||
|
contentTypeStr := contentType.String()
|
||||||
|
if contentTypeStr == "*" {
|
||||||
|
// Clear the content types list since * means all content types are enabled
|
||||||
|
config.enableContentTypes = make([]string, 0)
|
||||||
|
break
|
||||||
|
}
|
||||||
|
config.enableContentTypes = append(config.enableContentTypes, contentTypeStr)
|
||||||
|
}
|
||||||
|
|
||||||
return nil
|
return nil
|
||||||
}
|
}
|
||||||
|
|
||||||
func onHttpRequestHeaders(ctx wrapper.HttpContext, config AIStatisticsConfig) types.Action {
|
func onHttpRequestHeaders(ctx wrapper.HttpContext, config AIStatisticsConfig) types.Action {
|
||||||
|
// Check if request path matches enabled suffixes
|
||||||
|
requestPath, _ := proxywasm.GetHttpRequestHeader(":path")
|
||||||
|
if !isPathEnabled(requestPath, config.enablePathSuffixes) {
|
||||||
|
log.Debugf("ai-statistics: skipping request for path %s (not in enabled suffixes)", requestPath)
|
||||||
|
// Set skip processing flag and avoid reading request/response body
|
||||||
|
ctx.SetContext(SkipProcessing, true)
|
||||||
|
ctx.DontReadRequestBody()
|
||||||
|
ctx.DontReadResponseBody()
|
||||||
|
return types.ActionContinue
|
||||||
|
}
|
||||||
|
|
||||||
ctx.DisableReroute()
|
ctx.DisableReroute()
|
||||||
route, _ := getRouteName()
|
route, _ := getRouteName()
|
||||||
cluster, _ := getClusterName()
|
cluster, _ := getClusterName()
|
||||||
@@ -212,6 +299,11 @@ func onHttpRequestHeaders(ctx wrapper.HttpContext, config AIStatisticsConfig) ty
|
|||||||
}
|
}
|
||||||
|
|
||||||
func onHttpRequestBody(ctx wrapper.HttpContext, config AIStatisticsConfig, body []byte) types.Action {
|
func onHttpRequestBody(ctx wrapper.HttpContext, config AIStatisticsConfig, body []byte) types.Action {
|
||||||
|
// Check if processing should be skipped
|
||||||
|
if ctx.GetBoolContext(SkipProcessing, false) {
|
||||||
|
return types.ActionContinue
|
||||||
|
}
|
||||||
|
|
||||||
// Set user defined log & span attributes.
|
// Set user defined log & span attributes.
|
||||||
setAttributeBySource(ctx, config, RequestBody, body)
|
setAttributeBySource(ctx, config, RequestBody, body)
|
||||||
// Set span attributes for ARMS.
|
// Set span attributes for ARMS.
|
||||||
@@ -254,6 +346,15 @@ func onHttpRequestBody(ctx wrapper.HttpContext, config AIStatisticsConfig, body
|
|||||||
|
|
||||||
func onHttpResponseHeaders(ctx wrapper.HttpContext, config AIStatisticsConfig) types.Action {
|
func onHttpResponseHeaders(ctx wrapper.HttpContext, config AIStatisticsConfig) types.Action {
|
||||||
contentType, _ := proxywasm.GetHttpResponseHeader("content-type")
|
contentType, _ := proxywasm.GetHttpResponseHeader("content-type")
|
||||||
|
|
||||||
|
if !isContentTypeEnabled(contentType, config.enableContentTypes) {
|
||||||
|
log.Debugf("ai-statistics: skipping response for content type %s (not in enabled content types)", contentType)
|
||||||
|
// Set skip processing flag and avoid reading response body
|
||||||
|
ctx.SetContext(SkipProcessing, true)
|
||||||
|
ctx.DontReadResponseBody()
|
||||||
|
return types.ActionContinue
|
||||||
|
}
|
||||||
|
|
||||||
if !strings.Contains(contentType, "text/event-stream") {
|
if !strings.Contains(contentType, "text/event-stream") {
|
||||||
ctx.BufferResponseBody()
|
ctx.BufferResponseBody()
|
||||||
}
|
}
|
||||||
@@ -265,6 +366,11 @@ func onHttpResponseHeaders(ctx wrapper.HttpContext, config AIStatisticsConfig) t
|
|||||||
}
|
}
|
||||||
|
|
||||||
func onHttpStreamingBody(ctx wrapper.HttpContext, config AIStatisticsConfig, data []byte, endOfStream bool) []byte {
|
func onHttpStreamingBody(ctx wrapper.HttpContext, config AIStatisticsConfig, data []byte, endOfStream bool) []byte {
|
||||||
|
// Check if processing should be skipped
|
||||||
|
if ctx.GetBoolContext(SkipProcessing, false) {
|
||||||
|
return data
|
||||||
|
}
|
||||||
|
|
||||||
// Buffer stream body for record log & span attributes
|
// Buffer stream body for record log & span attributes
|
||||||
if config.shouldBufferStreamingBody {
|
if config.shouldBufferStreamingBody {
|
||||||
streamingBodyBuffer, ok := ctx.GetContext(CtxStreamingBodyBuffer).([]byte)
|
streamingBodyBuffer, ok := ctx.GetContext(CtxStreamingBodyBuffer).([]byte)
|
||||||
@@ -334,6 +440,11 @@ func onHttpStreamingBody(ctx wrapper.HttpContext, config AIStatisticsConfig, dat
|
|||||||
}
|
}
|
||||||
|
|
||||||
func onHttpResponseBody(ctx wrapper.HttpContext, config AIStatisticsConfig, body []byte) types.Action {
|
func onHttpResponseBody(ctx wrapper.HttpContext, config AIStatisticsConfig, body []byte) types.Action {
|
||||||
|
// Check if processing should be skipped
|
||||||
|
if ctx.GetBoolContext(SkipProcessing, false) {
|
||||||
|
return types.ActionContinue
|
||||||
|
}
|
||||||
|
|
||||||
// Get requestStartTime from http context
|
// Get requestStartTime from http context
|
||||||
requestStartTime, _ := ctx.GetContext(StatisticsRequestStartTime).(int64)
|
requestStartTime, _ := ctx.GetContext(StatisticsRequestStartTime).(int64)
|
||||||
|
|
||||||
|
|||||||
Reference in New Issue
Block a user