Mirror of https://github.com/alibaba/higress.git, synced 2026-02-26 13:40:49 +08:00
fix(ai-statistics): use RuleAppend as default for streaming answer extraction (#3505)
@@ -332,6 +332,146 @@ attributes:
2. **Performance analysis**: measure the share of reasoning tokens to evaluate the real overhead of reasoning models
3. **Usage statistics**: fine-grained accounting of each category of token usage
## Streaming Response Observability

Streaming responses are a common pattern in AI chat. The plugin provides full observability for them: it correctly concatenates the chunks of a streaming response and extracts the key information from them.
### The Challenge of Streaming Responses

A streaming response splits the full content into multiple SSE chunks returned incrementally, for example:
```
data: {"choices":[{"delta":{"content":"Hello"}}]}

data: {"choices":[{"delta":{"content":" 👋"}}]}

data: {"choices":[{"delta":{"content":"!"}}]}

data: [DONE]
```
To recover the full answer, the `delta.content` fields of all chunks must be concatenated in order.
### Automatic Concatenation

The plugin concatenates streamed content automatically, with a strategy per content type:
| Content type | Assembly strategy | Description |
|---------|---------|------|
| `answer` | Text append | Concatenates `delta.content` from each chunk, in order, into the full answer |
| `reasoning` | Text append | Concatenates `delta.reasoning_content` from each chunk, in order |
| `tool_calls` | Assembly by index | Identifies each tool call's `index` and concatenates the `arguments` of each call separately |
#### Append Example: answer and reasoning

Streaming response:

```
data: {"choices":[{"delta":{"content":"你好"}}]}

data: {"choices":[{"delta":{"content":",我是"}}]}

data: {"choices":[{"delta":{"content":"AI助手"}}]}
```
Extracted `answer`: `"你好,我是AI助手"`
#### Assembly Example: tool_calls

Streaming response (two parallel tool calls):

```
data: {"choices":[{"delta":{"tool_calls":[{"index":0,"id":"call_001","function":{"name":"get_weather"}}]}}]}

data: {"choices":[{"delta":{"tool_calls":[{"index":1,"id":"call_002","function":{"name":"get_time"}}]}}]}

data: {"choices":[{"delta":{"tool_calls":[{"index":0,"function":{"arguments":"{\"city\":"}}]}}]}

data: {"choices":[{"delta":{"tool_calls":[{"index":0,"function":{"arguments":"\"Beijing\"}"}}]}}]}

data: {"choices":[{"delta":{"tool_calls":[{"index":1,"function":{"arguments":"{\"city\":\"Shanghai\"}"}}]}}]}
```
Extracted `tool_calls`:

```json
[
  {"index":0,"id":"call_001","function":{"name":"get_weather","arguments":"{\"city\":\"Beijing\"}"}},
  {"index":1,"id":"call_002","function":{"name":"get_time","arguments":"{\"city\":\"Shanghai\"}"}}
]
```
### Quick Start with the Default Configuration

Setting `use_default_attributes: true` enables the full set of streaming observability features in one step:

```yaml
use_default_attributes: true
```
This configuration automatically records the following fields:
| Field | Description |
|------|------|
| `messages` | Full conversation history |
| `question` | The last user message |
| `answer` | The AI answer (streaming chunks appended automatically) |
| `reasoning` | The reasoning content (streaming chunks appended automatically) |
| `tool_calls` | Tool calls (assembled by `index` automatically) |
| `reasoning_tokens` | Number of reasoning tokens |
| `cached_tokens` | Number of cache-hit tokens |
| `input_token_details` | Input token breakdown |
| `output_token_details` | Output token breakdown |
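If you configure attributes manually instead, the streaming-related defaults could be approximated as below. This is only a sketch: the YAML field names (`key`, `apply_to_log`, `rule`) are assumptions inferred from the `Key`, `ApplyToLog`, and `Rule` fields of the `Attribute` struct in this commit's diff and may not match the plugin's actual schema.

```yaml
# Hypothetical manual equivalent of part of the default attribute set.
# Field names are inferred from the Go Attribute struct and are not
# guaranteed to match the plugin's real configuration keys.
attributes:
  - key: answer
    apply_to_log: true
    rule: append      # concatenate streaming chunks
  - key: reasoning
    apply_to_log: true
    rule: append
```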
### Streaming Log Examples

With the default configuration enabled, a streaming request produces a log entry like:
```json
{
  "answer": "2 plus 2 equals 4.",
  "question": "What is 2+2?",
  "response_type": "stream",
  "tool_calls": null,
  "reasoning": null,
  "model": "glm-4-flash",
  "input_token": 10,
  "output_token": 8,
  "llm_first_token_duration": 425,
  "llm_service_duration": 985,
  "chat_id": "chat_abc123"
}
```
A streaming log entry that includes a tool call:
```json
{
  "answer": null,
  "question": "What's the weather in Beijing?",
  "response_type": "stream",
  "tool_calls": [
    {
      "id": "call_abc123",
      "type": "function",
      "function": {
        "name": "get_weather",
        "arguments": "{\"location\": \"Beijing\"}"
      }
    }
  ],
  "model": "glm-4-flash",
  "input_token": 50,
  "output_token": 15,
  "llm_first_token_duration": 300,
  "llm_service_duration": 500
}
```
### Streaming-Specific Metrics

Streaming responses additionally record the following metrics:
- `llm_first_token_duration`: time from sending the request to receiving the first token (time to first token)
- `llm_stream_duration_count`: number of streaming requests
These can be used to monitor the user experience of streaming responses:

```promql
# Average time to first token
irate(route_upstream_model_consumer_metric_llm_first_token_duration[5m])
/
irate(route_upstream_model_consumer_metric_llm_stream_duration_count[5m])
```
## Debugging

### Verifying the ai_log Content
@@ -153,10 +153,12 @@ func getDefaultAttributes() []Attribute {
		{
			Key:        BuiltinAnswerKey,
			ApplyToLog: true,
			Rule:       RuleAppend, // Streaming responses need to append content from all chunks
		},
		{
			Key:        BuiltinReasoningKey,
			ApplyToLog: true,
			Rule:       RuleAppend, // Streaming responses need to append content from all chunks
		},
		{
			Key: BuiltinToolCallsKey,