mirror of
https://github.com/alibaba/higress.git
synced 2026-04-21 20:17:29 +08:00
Update ai statistics (#1303)
This commit is contained in:
@@ -1,69 +1,178 @@
|
||||
# 介绍
|
||||
提供AI可观测基础能力,其后需接ai-proxy插件,如果不接ai-proxy插件的话,则只支持openai协议。
|
||||
---
|
||||
title: AI可观测
|
||||
keywords: [higress, AI, observability]
|
||||
description: AI可观测配置参考
|
||||
---
|
||||
|
||||
# 配置说明
|
||||
## 介绍
|
||||
提供AI可观测基础能力,包括 metric, log, trace,其后需接ai-proxy插件,如果不接ai-proxy插件的话,则需要用户进行相应配置才可生效。
|
||||
|
||||
## 运行属性
|
||||
|
||||
插件执行阶段:`默认阶段`
|
||||
插件执行优先级:`200`
|
||||
|
||||
## 配置说明
|
||||
插件默认请求符合openai协议格式,并提供了以下基础可观测值,用户无需特殊配置:
|
||||
|
||||
- metric:提供了输入token、输出token、首个token的rt(流式请求)、请求总rt等指标,支持在网关、路由、服务、模型四个维度上进行观测
|
||||
- log:提供了 input_token, output_token, model, llm_service_duration, llm_first_token_duration 等字段
|
||||
|
||||
用户还可以通过配置的方式对可观测的值进行扩展:
|
||||
|
||||
| 名称 | 数据类型 | 填写要求 | 默认值 | 描述 |
|
||||
|----------------|-------|------|-----|------------------------|
|
||||
| `enable` | bool | 必填 | - | 是否开启ai统计功能 |
|
||||
| `tracing_span` | array | 非必填 | - | 自定义tracing span tag 配置 |
|
||||
| `attributes` | []Attribute | 非必填 | - | 用户希望记录在log/span中的信息 |
|
||||
|
||||
Attribute 配置说明:
|
||||
|
||||
## tracing_span 配置说明
|
||||
| 名称 | 数据类型 | 填写要求 | 默认值 | 描述 |
|
||||
|----------------|-------|-----|-----|------------------------|
|
||||
| `key` | string | 必填 | - | tracing tag 名称 |
|
||||
| `value_source` | string | 必填 | - | tag 取值来源 |
|
||||
| `value` | string | 必填 | - | tag 取值 key value/path |
|
||||
| `key` | string | 必填 | - | attrribute 名称 |
|
||||
| `value_source` | string | 必填 | - | attrribute 取值来源,可选值为 `fixed_value`, `request_header`, `request_body`, `response_header`, `response_body`, `response_streaming_body` |
|
||||
| `value` | string | 必填 | - | attrribute 取值 key value/path |
|
||||
| `rule` | string | 非必填 | - | 从流式响应中提取 attrribute 的规则,可选值为 `first`, `replace`, `append`|
|
||||
| `apply_to_log` | bool | 非必填 | false | 是否将提取的信息记录在日志中 |
|
||||
| `apply_to_span` | bool | 非必填 | false | 是否将提取的信息记录在链路追踪span中 |
|
||||
|
||||
value_source为 tag 值的取值来源,可选配置值有 4 个:
|
||||
- property : tag 值通过proxywasm.GetProperty()方法获取,value配置GetProperty()方法要提取的key名
|
||||
- requeset_header : tag 值通过http请求头获取,value配置为header key
|
||||
- request_body :tag 值通过请求body获取,value配置格式为 gjson的 GJSON PATH 语法
|
||||
- response_header : tag 值通过http响应头获取,value配置为header key
|
||||
`value_source` 的各种取值含义如下:
|
||||
|
||||
- `fixed_value`:固定值
|
||||
- `requeset_header` : attrribute 值通过 http 请求头获取,value 配置为 header key
|
||||
- `request_body` :attrribute 值通过请求 body 获取,value 配置格式为 gjson 的 jsonpath
|
||||
- `response_header` :attrribute 值通过 http 响应头获取,value 配置为header key
|
||||
- `response_body` :attrribute 值通过响应 body 获取,value 配置格式为 gjson 的 jsonpath
|
||||
- `response_streaming_body` :attrribute 值通过流式响应 body 获取,value 配置格式为 gjson 的 jsonpath
|
||||
|
||||
|
||||
当 `value_source` 为 `response_streaming_body` 时,应当配置 `rule`,用于指定如何从流式body中获取指定值,取值含义如下:
|
||||
|
||||
- `first`:多个chunk中取第一个有效chunk的值
|
||||
- `replace`:多个chunk中取最后一个有效chunk的值
|
||||
- `append`:拼接多个有效chunk中的值,可用于获取回答内容
|
||||
|
||||
## 配置示例
|
||||
如果希望在网关访问日志中记录ai-statistic相关的统计值,需要修改log_format,在原log_format基础上添加一个新字段,示例如下:
|
||||
|
||||
举例如下:
|
||||
```yaml
|
||||
tracing_label:
|
||||
- key: "session_id"
|
||||
value_source: "requeset_header"
|
||||
value: "session_id"
|
||||
- key: "user_content"
|
||||
value_source: "request_body"
|
||||
value: "input.messages.1.content"
|
||||
'{"ai_log":"%FILTER_STATE(wasm.ai_log:PLAIN)%"}'
|
||||
```
|
||||
|
||||
开启后 metrics 示例:
|
||||
### 空配置
|
||||
#### 监控
|
||||
```
|
||||
route_upstream_model_input_token{ai_route="openai",ai_cluster="qwen",ai_model="qwen-max"} 21
|
||||
route_upstream_model_output_token{ai_route="openai",ai_cluster="qwen",ai_model="qwen-max"} 17
|
||||
route_upstream_model_metric_input_token{ai_route="llm",ai_cluster="outbound|443||qwen.dns",ai_model="qwen-turbo"} 10
|
||||
route_upstream_model_metric_llm_duration_count{ai_route="llm",ai_cluster="outbound|443||qwen.dns",ai_model="qwen-turbo"} 1
|
||||
route_upstream_model_metric_llm_first_token_duration{ai_route="llm",ai_cluster="outbound|443||qwen.dns",ai_model="qwen-turbo"} 309
|
||||
route_upstream_model_metric_llm_service_duration{ai_route="llm",ai_cluster="outbound|443||qwen.dns",ai_model="qwen-turbo"} 1955
|
||||
route_upstream_model_metric_output_token{ai_route="llm",ai_cluster="outbound|443||qwen.dns",ai_model="qwen-turbo"} 69
|
||||
```
|
||||
|
||||
日志示例:
|
||||
|
||||
#### 日志
|
||||
```json
|
||||
{
|
||||
"model": "qwen-max",
|
||||
"input_token": "21",
|
||||
"output_token": "17",
|
||||
"authority": "dashscope.aliyuncs.com",
|
||||
"bytes_received": "336",
|
||||
"bytes_sent": "1675",
|
||||
"duration": "1590",
|
||||
"istio_policy_status": "-",
|
||||
"method": "POST",
|
||||
"path": "/v1/chat/completions",
|
||||
"protocol": "HTTP/1.1",
|
||||
"request_id": "5895f5a9-e4e3-425b-98db-6c6a926195b7",
|
||||
"requested_server_name": "-",
|
||||
"response_code": "200",
|
||||
"response_flags": "-",
|
||||
"route_name": "openai",
|
||||
"start_time": "2024-06-18T09:37:14.078Z",
|
||||
"trace_id": "-",
|
||||
"upstream_cluster": "qwen",
|
||||
"upstream_service_time": "496",
|
||||
"upstream_transport_failure_reason": "-",
|
||||
"user_agent": "PostmanRuntime/7.37.3",
|
||||
"x_forwarded_for": "-"
|
||||
"ai_log":"{\"model\":\"qwen-turbo\",\"input_token\":\"10\",\"output_token\":\"69\",\"llm_first_token_duration\":\"309\",\"llm_service_duration\":\"1955\"}"
|
||||
}
|
||||
```
|
||||
|
||||
#### 链路追踪
|
||||
配置为空时,不会在span中添加额外的attribute
|
||||
|
||||
### 从非openai协议提取token使用信息
|
||||
在ai-proxy中设置协议为original时,以百炼为例,可作如下配置指定如何提取model, input_token, output_token
|
||||
|
||||
```yaml
|
||||
attributes:
|
||||
- key: model
|
||||
value_source: response_body
|
||||
value: usage.models.0.model_id
|
||||
apply_to_log: true
|
||||
apply_to_span: false
|
||||
- key: input_token
|
||||
value_source: response_body
|
||||
value: usage.models.0.input_tokens
|
||||
apply_to_log: true
|
||||
apply_to_span: false
|
||||
- key: output_token
|
||||
value_source: response_body
|
||||
value: usage.models.0.output_tokens
|
||||
apply_to_log: true
|
||||
apply_to_span: false
|
||||
```
|
||||
#### 监控
|
||||
```
|
||||
route_upstream_model_metric_input_token{ai_route="bailian",ai_cluster="qwen",ai_model="qwen-max"} 343
|
||||
route_upstream_model_metric_output_token{ai_route="bailian",ai_cluster="qwen",ai_model="qwen-max"} 153
|
||||
route_upstream_model_metric_llm_service_duration{ai_route="bailian",ai_cluster="qwen",ai_model="qwen-max"} 3725
|
||||
route_upstream_model_metric_llm_duration_count{ai_route="bailian",ai_cluster="qwen",ai_model="qwen-max"} 1
|
||||
```
|
||||
|
||||
#### 日志
|
||||
此配置下日志效果如下:
|
||||
```json
|
||||
{
|
||||
"ai_log": "{\"model\":\"qwen-max\",\"input_token\":\"343\",\"output_token\":\"153\",\"llm_service_duration\":\"19110\"}"
|
||||
}
|
||||
```
|
||||
|
||||
#### 链路追踪
|
||||
链路追踪的 span 中可以看到 model, input_token, output_token 三个额外的 attribute
|
||||
|
||||
### 配合认证鉴权记录consumer
|
||||
举例如下:
|
||||
```yaml
|
||||
attributes:
|
||||
- key: consumer # 配合认证鉴权记录consumer
|
||||
value_source: request_header
|
||||
value: x-mse-consumer
|
||||
apply_to_log: true
|
||||
```
|
||||
|
||||
### 记录问题与回答
|
||||
```yaml
|
||||
attributes:
|
||||
- key: question # 记录问题
|
||||
value_source: request_body
|
||||
value: messages.@reverse.0.content
|
||||
apply_to_log: true
|
||||
- key: answer # 在流式响应中提取大模型的回答
|
||||
value_source: response_streaming_body
|
||||
value: choices.0.delta.content
|
||||
rule: append
|
||||
apply_to_log: true
|
||||
- key: answer # 在非流式响应中提取大模型的回答
|
||||
value_source: response_body
|
||||
value: choices.0.message.content
|
||||
apply_to_log: true
|
||||
```
|
||||
|
||||
## 进阶
|
||||
配合阿里云SLS数据加工,可以将ai相关的字段进行提取加工,例如原始日志为:
|
||||
|
||||
```
|
||||
ai_log:{"question":"用python计算2的3次方","answer":"你可以使用 Python 的乘方运算符 `**` 来计算一个数的次方。计算2的3次方,即2乘以自己2次,可以用以下代码表示:\n\n```python\nresult = 2 ** 3\nprint(result)\n```\n\n运行这段代码,你会得到输出结果为8,因为2乘以自己两次等于8。","model":"qwen-max","input_token":"16","output_token":"76","llm_service_duration":"5913"}
|
||||
```
|
||||
|
||||
使用如下数据加工脚本,可以提取出question和answer:
|
||||
|
||||
```
|
||||
e_regex("ai_log", grok("%{EXTRACTJSON}"))
|
||||
e_set("question", json_select(v("json"), "question", default="-"))
|
||||
e_set("answer", json_select(v("json"), "answer", default="-"))
|
||||
```
|
||||
|
||||
提取后,SLS中会添加question和answer两个字段,示例如下:
|
||||
|
||||
```
|
||||
ai_log:{"question":"用python计算2的3次方","answer":"你可以使用 Python 的乘方运算符 `**` 来计算一个数的次方。计算2的3次方,即2乘以自己2次,可以用以下代码表示:\n\n```python\nresult = 2 ** 3\nprint(result)\n```\n\n运行这段代码,你会得到输出结果为8,因为2乘以自己两次等于8。","model":"qwen-max","input_token":"16","output_token":"76","llm_service_duration":"5913"}
|
||||
|
||||
question:用python计算2的3次方
|
||||
|
||||
answer:你可以使用 Python 的乘方运算符 `**` 来计算一个数的次方。计算2的3次方,即2乘以自己2次,可以用以下代码表示:
|
||||
|
||||
result = 2 ** 3
|
||||
print(result)
|
||||
|
||||
运行这段代码,你会得到输出结果为8,因为2乘以自己两次等于8。
|
||||
|
||||
```
|
||||
Reference in New Issue
Block a user