Mirror of https://github.com/alibaba/higress.git, synced 2026-03-06 17:40:51 +08:00
feat: improve ai statistic plugin (#2671)
@@ -5,7 +5,8 @@ description: AI可观测配置参考
---

## Introduction

Provides basic AI observability capabilities, including metrics, logs, and traces. The ai-proxy plugin should be attached after this plugin; if ai-proxy is not used, the user must supply the corresponding configuration for these capabilities to take effect.

## Runtime Properties
@@ -13,9 +14,10 @@ description: AI可观测配置参考
Plugin execution priority: `200`

## Configuration

By default the plugin assumes requests follow the openai protocol format and provides the following basic observability values without any special configuration:

- metric: provides metrics such as input tokens, output tokens, rt of the first token (for streaming requests), and total request rt, observable across four dimensions: gateway, route, service, and model
- log: provides fields such as input_token, output_token, model, llm_service_duration, llm_first_token_duration

Users can also extend the observed values through configuration:
@@ -25,45 +27,48 @@ description: AI可观测配置参考
| `attributes` | []Attribute | optional | - | Information the user wants to record in the log/span |
| `disable_openai_usage` | bool | optional | false | With non-openai-compatible protocols, model/token support is non-standard; set this to true to avoid errors |
| `value_length_limit` | int | optional | 4000 | Length limit for each recorded value |
| `enable_path_suffixes` | []string | optional | [] | Only take effect for requests whose paths end with one of these suffixes; can be set to "\*" to match all paths (the wildcard check runs first for performance). An empty array means all paths |
| `enable_content_types` | []string | optional | [] | Only buffer responses with these content types. An empty array means all content types |

Attribute configuration:

| Name | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| `key` | string | required | - | attribute name |
| `value_source` | string | required | - | source of the attribute value; one of `fixed_value`, `request_header`, `request_body`, `response_header`, `response_body`, `response_streaming_body` |
| `value` | string | required | - | key/path used to obtain the attribute value |
| `default_value` | string | optional | - | default value of the attribute |
| `rule` | string | optional | - | rule for extracting the attribute from a streaming response; one of `first`, `replace`, `append` |
| `apply_to_log` | bool | optional | false | whether to record the extracted information in the log |
| `apply_to_span` | bool | optional | false | whether to record the extracted information in the trace span |
| `trace_span_key` | string | optional | - | trace attribute key; defaults to the value of `key` |
| `as_separate_log_field` | bool | optional | false | whether to log it as a separate field; the log field name uses the value of `key` |

The meanings of the `value_source` values are as follows:

- `fixed_value`: a fixed value
- `request_header`: the attribute value is taken from an http request header; value is the header key
- `request_body`: the attribute value is taken from the request body; value is a gjson jsonpath
- `response_header`: the attribute value is taken from an http response header; value is the header key
- `response_body`: the attribute value is taken from the response body; value is a gjson jsonpath
- `response_streaming_body`: the attribute value is taken from the streaming response body; value is a gjson jsonpath

When `value_source` is `response_streaming_body`, `rule` should be configured to specify how to obtain the value from the streaming body. The possible values are:

- `first`: take the value from the first valid chunk
- `replace`: take the value from the last valid chunk
- `append`: concatenate the values from all valid chunks, which can be used to obtain the full answer content

## Configuration example

To record ai-statistic related values in the gateway access log, log_format needs to be modified by adding a new field on top of the original log_format, for example:

```yaml
'{"ai_log":"%FILTER_STATE(wasm.ai_log:PLAIN)%"}'
```

If a field sets `as_separate_log_field`, for example:

```yaml
attributes:
- key: consumer
@@ -73,12 +78,14 @@ attributes:
  as_separate_log_field: true
```

Then, to print it in the log, log_format must additionally be set:

```
'{"consumer":"%FILTER_STATE(wasm.consumer:PLAIN)%"}'
```
### Empty configuration

#### Metrics

```
@@ -120,17 +127,20 @@ irate(route_upstream_model_consumer_metric_llm_duration_count[2m])
```

#### Log

```json
{
  "ai_log": "{\"model\":\"qwen-turbo\",\"input_token\":\"10\",\"output_token\":\"69\",\"llm_first_token_duration\":\"309\",\"llm_service_duration\":\"1955\"}"
}
```
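Note that `ai_log` arrives as an escaped JSON string nested inside the access-log JSON, so consumers must decode it twice. A stdlib-only Go sketch (the helper `extractAILog` is hypothetical, for illustration):

```go
package main

import (
    "encoding/json"
    "fmt"
)

// extractAILog decodes the access-log line, then decodes the JSON string
// stored in its ai_log field to recover the per-request statistics.
func extractAILog(accessLog string) (map[string]string, error) {
    var outer map[string]string
    if err := json.Unmarshal([]byte(accessLog), &outer); err != nil {
        return nil, err
    }
    var stats map[string]string
    if err := json.Unmarshal([]byte(outer["ai_log"]), &stats); err != nil {
        return nil, err
    }
    return stats, nil
}

func main() {
    line := `{"ai_log":"{\"model\":\"qwen-turbo\",\"input_token\":\"10\",\"output_token\":\"69\"}"}`
    stats, err := extractAILog(line)
    if err != nil {
        panic(err)
    }
    fmt.Println(stats["model"], stats["input_token"], stats["output_token"]) // qwen-turbo 10 69
}
```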

#### Trace

With an empty configuration, no extra attributes are added to the span.

### Extracting token usage information from non-openai protocols

When the protocol is set to original in ai-proxy, taking Bailian as an example, the following configuration specifies how to extract model, input_token, and output_token:

```yaml
attributes:
@@ -150,6 +160,7 @@ attributes:
  apply_to_log: true
  apply_to_span: false
```

#### Metrics

```
@@ -160,7 +171,9 @@ route_upstream_model_consumer_metric_llm_duration_count{ai_route="bailian",ai_cl
```

#### Log

With this configuration, the log looks like:

```json
{
  "ai_log": "{\"model\":\"qwen-max\",\"input_token\":\"343\",\"output_token\":\"153\",\"llm_service_duration\":\"19110\"}"
@@ -168,10 +181,13 @@ route_upstream_model_consumer_metric_llm_duration_count{ai_route="bailian",ai_cl
```

#### Trace

Three extra attributes, model, input_token, and output_token, can be seen in the trace span.

### Recording the consumer together with authentication

For example:

```yaml
attributes:
- key: consumer # record the consumer together with authentication
@@ -181,31 +197,33 @@ attributes:
```

### Recording questions and answers

```yaml
attributes:
- key: question # record the question
  value_source: request_body
  value: messages.@reverse.0.content
  apply_to_log: true
- key: answer # extract the model's answer from a streaming response
  value_source: response_streaming_body
  value: choices.0.delta.content
  rule: append
  apply_to_log: true
- key: answer # extract the model's answer from a non-streaming response
  value_source: response_body
  value: choices.0.message.content
  apply_to_log: true
```

## Advanced

Together with Alibaba Cloud SLS data processing, the ai-related fields can be extracted and processed. For example, given the raw log:

````
ai_log:{"question":"用python计算2的3次方","answer":"你可以使用 Python 的乘方运算符 `**` 来计算一个数的次方。计算2的3次方,即2乘以自己2次,可以用以下代码表示:\n\n```python\nresult = 2 ** 3\nprint(result)\n```\n\n运行这段代码,你会得到输出结果为8,因为2乘以自己两次等于8。","model":"qwen-max","input_token":"16","output_token":"76","llm_service_duration":"5913"}
````

The following data processing script extracts question and answer:

```
e_regex("ai_log", grok("%{EXTRACTJSON}"))
@@ -213,9 +231,9 @@ e_set("question", json_select(v("json"), "question", default="-"))
e_set("answer", json_select(v("json"), "answer", default="-"))
```

After extraction, SLS will contain two additional fields, question and answer, for example:

````
ai_log:{"question":"用python计算2的3次方","answer":"你可以使用 Python 的乘方运算符 `**` 来计算一个数的次方。计算2的3次方,即2乘以自己2次,可以用以下代码表示:\n\n```python\nresult = 2 ** 3\nprint(result)\n```\n\n运行这段代码,你会得到输出结果为8,因为2乘以自己两次等于8。","model":"qwen-max","input_token":"16","output_token":"76","llm_service_duration":"5913"}

question:用python计算2的3次方
@@ -227,4 +245,57 @@ print(result)

运行这段代码,你会得到输出结果为8,因为2乘以自己两次等于8。

```
````

### Path and content-type filtering examples

#### Process only specific AI paths

```yaml
enable_path_suffixes:
- "/v1/chat/completions"
- "/v1/embeddings"
- "/generateContent"
```

#### Process only specific content types

```yaml
enable_content_types:
- "text/event-stream"
- "application/json"
```

#### Process all paths (wildcard)

```yaml
enable_path_suffixes:
- "*"
```

#### Process all content types (empty array)

```yaml
enable_content_types: []
```

#### Complete configuration example

```yaml
enable_path_suffixes:
- "/v1/chat/completions"
- "/v1/embeddings"
- "/generateContent"
enable_content_types:
- "text/event-stream"
- "application/json"
attributes:
- key: model
  value_source: request_body
  value: model
  apply_to_log: true
- key: consumer
  value_source: request_header
  value: x-mse-consumer
  apply_to_log: true
```

@@ -5,6 +5,7 @@ description: AI Statistics plugin configuration reference
---

## Introduction

Provides basic AI observability capabilities, including metrics, logs, and traces. The ai-proxy plugin should be attached after this plugin; if ai-proxy is not used, the user needs to supply the corresponding configuration for it to take effect.

## Runtime Properties
@@ -13,6 +14,7 @@ Plugin Phase: `CUSTOM`
Plugin Priority: `200`

## Configuration

By default the plugin assumes requests conform to the openai protocol format and provides the following basic observability values; no special configuration is needed:

- metric: provides metrics such as input tokens, output tokens, rt of the first token (streaming requests), and total request rt, observable across the four dimensions of gateway, route, service, and model

@@ -25,21 +27,22 @@ Users can also expand observable values through configuration:
| `attributes` | []Attribute | optional | - | Information that the user wants to record in the log/span |
| `disable_openai_usage` | bool | optional | false | With a non-openai-compatible protocol, model/token support is non-standard; setting this to true prevents errors |
| `value_length_limit` | int | optional | 4000 | Length limit for each recorded value |
| `enable_path_suffixes` | []string | optional | ["/v1/chat/completions","/v1/completions","/v1/embeddings","/v1/models","/generateContent","/streamGenerateContent"] | Only effective for requests with these specific path suffixes; can be configured as "\*" to match all paths |
| `enable_content_types` | []string | optional | ["text/event-stream","application/json"] | Only buffer the response body for these content types |

Attribute configuration:

| Name | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| `key` | string | required | - | attribute key |
| `value_source` | string | required | - | source of the attribute value; one of `fixed_value`, `request_header`, `request_body`, `response_header`, `response_body`, `response_streaming_body` |
| `value` | string | required | - | how to obtain the attribute value |
| `default_value` | string | optional | - | default value for the attribute |
| `rule` | string | optional | - | rule to extract the attribute from a streaming response; one of `first`, `replace`, `append` |
| `apply_to_log` | bool | optional | false | whether to record the extracted information in the log |
| `apply_to_span` | bool | optional | false | whether to record the extracted information in the trace span |
| `trace_span_key` | string | optional | - | span attribute key; defaults to the value of `key` |
| `as_separate_log_field` | bool | optional | false | whether to use a separate log field; the field name equals the value of `key` |

The meanings of the `value_source` values are as follows:

@@ -50,7 +53,6 @@ The meanings of various values for `value_source` are as follows:
- `response_body`: the attribute is obtained from the http response body
- `response_streaming_body`: the attribute is obtained from the http streaming response body

When `value_source` is `response_streaming_body`, `rule` should be configured to specify how to obtain the value from the streaming body. The possible values are:

- `first`: extract the value from the first valid chunk
@@ -58,6 +60,7 @@ When `value_source` is `response_streaming_body`, `rule` should be configured to
- `append`: join the value pieces from all valid chunks

## Configuration example

If you want to record ai-statistic related values in the gateway access log, you need to modify log_format, adding a new field on top of the original log_format. For example:

```yaml
@@ -65,6 +68,7 @@ If you want to record ai-statistic related statistical values in the gateway acc
```

If a field sets `as_separate_log_field`, for example:

```yaml
attributes:
- key: consumer
@@ -75,11 +79,13 @@ attributes:
```

Then, to print it in the log, log_format must additionally be set:

```
'{"consumer":"%FILTER_STATE(wasm.consumer:PLAIN)%"}'
```

### Empty configuration

#### Metrics

```
@@ -121,16 +127,19 @@ irate(route_upstream_model_consumer_metric_llm_duration_count[2m])
```

#### Log

```json
{
  "ai_log": "{\"model\":\"qwen-turbo\",\"input_token\":\"10\",\"output_token\":\"69\",\"llm_first_token_duration\":\"309\",\"llm_service_duration\":\"1955\"}"
}
```

#### Trace

When the configuration is empty, no additional attributes are added to the span.

### Extracting token usage information from non-openai protocols

When the protocol is set to original in ai-proxy, taking Alibaba Cloud Bailian as an example, the following configuration specifies how to extract `model`, `input_token`, and `output_token`:

```yaml
@@ -151,6 +160,7 @@ attributes:
  apply_to_log: true
  apply_to_span: false
```

#### Metrics

```
@@ -161,6 +171,7 @@ route_upstream_model_consumer_metric_llm_duration_count{ai_route="bailian",ai_cl
```

#### Log

```json
{
  "ai_log": "{\"model\":\"qwen-max\",\"input_token\":\"343\",\"output_token\":\"153\",\"llm_service_duration\":\"19110\"}"
@@ -168,9 +179,11 @@ route_upstream_model_consumer_metric_llm_duration_count{ai_route="bailian",ai_cl
```

#### Trace

Three additional attributes, `model`, `input_token`, and `output_token`, can be seen in the trace spans.

### Recording the consumer together with authentication

```yaml
attributes:
- key: consumer
@@ -180,6 +193,7 @@ attributes:
```

### Recording questions and answers

```yaml
attributes:
- key: question
@@ -195,4 +209,51 @@ attributes:
  value_source: response_body
  value: choices.0.message.content
  apply_to_log: true
```

### Path and Content Type Filtering Configuration Examples

#### Process Only Specific AI Paths

```yaml
enable_path_suffixes:
- "/v1/chat/completions"
- "/v1/embeddings"
- "/generateContent"
```

#### Process Only Specific Content Types

```yaml
enable_content_types:
- "text/event-stream"
- "application/json"
```

#### Process All Paths (Wildcard)

```yaml
enable_path_suffixes:
- "*"
```

#### Complete Configuration Example

```yaml
enable_path_suffixes:
- "/v1/chat/completions"
- "/v1/embeddings"
- "/generateContent"
enable_content_types:
- "text/event-stream"
- "application/json"
attributes:
- key: model
  value_source: request_body
  value: model
  apply_to_log: true
- key: consumer
  value_source: request_header
  value: x-mse-consumer
  apply_to_log: true
```
@@ -44,6 +44,15 @@ const (
    APIName        = "api"
    ConsumerKey    = "x-mse-consumer"
    RequestPath    = "request_path"
    SkipProcessing = "skip_processing"

    // AI API Paths
    PathOpenAIChatCompletions       = "/v1/chat/completions"
    PathOpenAICompletions           = "/v1/completions"
    PathOpenAIEmbeddings            = "/v1/embeddings"
    PathOpenAIModels                = "/v1/models"
    PathGeminiGenerateContent       = "/generateContent"
    PathGeminiStreamGenerateContent = "/streamGenerateContent"

    // Source Type
    FixedValue = "fixed_value"
@@ -100,6 +109,10 @@ type AIStatisticsConfig struct {
    // If disableOpenaiUsage is true, model/input_token/output_token logs will be skipped
    disableOpenaiUsage bool
    valueLengthLimit   int
    // Path suffixes to enable the plugin on
    enablePathSuffixes []string
    // Content types to enable response body buffering
    enableContentTypes []string
}

func generateMetricName(route, cluster, model, consumer, metricName string) string {
@@ -147,6 +160,41 @@ func (config *AIStatisticsConfig) incrementCounter(metricName string, inc uint64
    counter.Increment(inc)
}

// isPathEnabled checks if the request path matches any of the enabled path suffixes
func isPathEnabled(requestPath string, enabledSuffixes []string) bool {
    if len(enabledSuffixes) == 0 {
        return true // If no path suffixes are configured, enable for all
    }

    // Remove query parameters from the path
    pathWithoutQuery := requestPath
    if queryPos := strings.Index(requestPath, "?"); queryPos != -1 {
        pathWithoutQuery = requestPath[:queryPos]
    }

    // Check whether the path ends with any enabled suffix
    for _, suffix := range enabledSuffixes {
        if strings.HasSuffix(pathWithoutQuery, suffix) {
            return true
        }
    }
    return false
}
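The function above can be exercised standalone; copying its body into a runnable program shows the query-stripping and empty-list behavior:

```go
package main

import (
    "fmt"
    "strings"
)

// isPathEnabled reports whether the request path (query string ignored)
// ends with one of the configured suffixes; an empty list matches everything.
// Copied from the plugin function above for demonstration.
func isPathEnabled(requestPath string, enabledSuffixes []string) bool {
    if len(enabledSuffixes) == 0 {
        return true
    }
    pathWithoutQuery := requestPath
    if queryPos := strings.Index(requestPath, "?"); queryPos != -1 {
        pathWithoutQuery = requestPath[:queryPos]
    }
    for _, suffix := range enabledSuffixes {
        if strings.HasSuffix(pathWithoutQuery, suffix) {
            return true
        }
    }
    return false
}

func main() {
    suffixes := []string{"/v1/chat/completions", "/generateContent"}
    fmt.Println(isPathEnabled("/api/v1/chat/completions?user=1", suffixes)) // true: query stripped, suffix matches
    fmt.Println(isPathEnabled("/v1/models", suffixes))                      // false: no configured suffix matches
    fmt.Println(isPathEnabled("/v1/models", nil))                           // true: empty list enables all paths
}
```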

// isContentTypeEnabled checks if the content type matches any of the enabled content types
func isContentTypeEnabled(contentType string, enabledContentTypes []string) bool {
    if len(enabledContentTypes) == 0 {
        return true // If no content types are configured, enable for all
    }

    for _, enabledType := range enabledContentTypes {
        if strings.Contains(contentType, enabledType) {
            return true
        }
    }
    return false
}

func parseConfig(configJson gjson.Result, config *AIStatisticsConfig) error {
    // Parse tracing span attributes setting.
    attributeConfigs := configJson.Get("attributes").Array()
@@ -177,10 +225,49 @@ func parseConfig(configJson gjson.Result, config *AIStatisticsConfig) error {
    // Parse openai usage config setting.
    config.disableOpenaiUsage = configJson.Get("disable_openai_usage").Bool()

    // Parse path suffix configuration
    pathSuffixes := configJson.Get("enable_path_suffixes").Array()
    config.enablePathSuffixes = make([]string, 0, len(pathSuffixes))

    for _, suffix := range pathSuffixes {
        suffixStr := suffix.String()
        if suffixStr == "*" {
            // Clear the suffixes list since * means all paths are enabled
            config.enablePathSuffixes = make([]string, 0)
            break
        }
        config.enablePathSuffixes = append(config.enablePathSuffixes, suffixStr)
    }

    // Parse content type configuration
    contentTypes := configJson.Get("enable_content_types").Array()
    config.enableContentTypes = make([]string, 0, len(contentTypes))

    for _, contentType := range contentTypes {
        contentTypeStr := contentType.String()
        if contentTypeStr == "*" {
            // Clear the content types list since * means all content types are enabled
            config.enableContentTypes = make([]string, 0)
            break
        }
        config.enableContentTypes = append(config.enableContentTypes, contentTypeStr)
    }

    return nil
}

func onHttpRequestHeaders(ctx wrapper.HttpContext, config AIStatisticsConfig) types.Action {
    // Check if the request path matches the enabled suffixes
    requestPath, _ := proxywasm.GetHttpRequestHeader(":path")
    if !isPathEnabled(requestPath, config.enablePathSuffixes) {
        log.Debugf("ai-statistics: skipping request for path %s (not in enabled suffixes)", requestPath)
        // Set the skip-processing flag and avoid reading the request/response body
        ctx.SetContext(SkipProcessing, true)
        ctx.DontReadRequestBody()
        ctx.DontReadResponseBody()
        return types.ActionContinue
    }

    ctx.DisableReroute()
    route, _ := getRouteName()
    cluster, _ := getClusterName()
@@ -212,6 +299,11 @@ func onHttpRequestHeaders(ctx wrapper.HttpContext, config AIStatisticsConfig) ty
}

func onHttpRequestBody(ctx wrapper.HttpContext, config AIStatisticsConfig, body []byte) types.Action {
    // Check if processing should be skipped
    if ctx.GetBoolContext(SkipProcessing, false) {
        return types.ActionContinue
    }

    // Set user defined log & span attributes.
    setAttributeBySource(ctx, config, RequestBody, body)
    // Set span attributes for ARMS.
@@ -254,6 +346,15 @@ func onHttpRequestBody(ctx wrapper.HttpContext, config AIStatisticsConfig, body

func onHttpResponseHeaders(ctx wrapper.HttpContext, config AIStatisticsConfig) types.Action {
    contentType, _ := proxywasm.GetHttpResponseHeader("content-type")

    if !isContentTypeEnabled(contentType, config.enableContentTypes) {
        log.Debugf("ai-statistics: skipping response for content type %s (not in enabled content types)", contentType)
        // Set the skip-processing flag and avoid reading the response body
        ctx.SetContext(SkipProcessing, true)
        ctx.DontReadResponseBody()
        return types.ActionContinue
    }

    if !strings.Contains(contentType, "text/event-stream") {
        ctx.BufferResponseBody()
    }
@@ -265,6 +366,11 @@ func onHttpResponseHeaders(ctx wrapper.HttpContext, config AIStatisticsConfig) t
}

func onHttpStreamingBody(ctx wrapper.HttpContext, config AIStatisticsConfig, data []byte, endOfStream bool) []byte {
    // Check if processing should be skipped
    if ctx.GetBoolContext(SkipProcessing, false) {
        return data
    }

    // Buffer the streaming body to record log & span attributes
    if config.shouldBufferStreamingBody {
        streamingBodyBuffer, ok := ctx.GetContext(CtxStreamingBodyBuffer).([]byte)
@@ -334,6 +440,11 @@ func onHttpStreamingBody(ctx wrapper.HttpContext, config AIStatisticsConfig, dat
}

func onHttpResponseBody(ctx wrapper.HttpContext, config AIStatisticsConfig, body []byte) types.Action {
    // Check if processing should be skipped
    if ctx.GetBoolContext(SkipProcessing, false) {
        return types.ActionContinue
    }

    // Get requestStartTime from http context
    requestStartTime, _ := ctx.GetContext(StatisticsRequestStartTime).(int64)