mirror of
https://github.com/alibaba/higress.git
synced 2026-02-26 05:30:50 +08:00
Compare commits: update-hel ... add-releas (128 commits)
138 .claude/skills/agent-session-monitor/QUICKSTART.md Normal file
@@ -0,0 +1,138 @@

# Agent Session Monitor - Quick Start

A real-time agent conversation monitor that watches Higress access logs and tracks token usage and model usage across multi-turn conversations.

## Quick Start

### 1. Run the Demo

```bash
cd example
bash demo.sh
```

This will:
- Parse the sample log file
- List all sessions
- Show session details (including full messages, question, answer, reasoning, tool_calls)
- Aggregate token usage by model and by date
- Export a FinOps report

### 2. Start the Web UI (Recommended)

```bash
# Parse logs to generate session data first
python3 main.py --log-path /var/log/higress/access.log --output-dir ./sessions

# Start the web server
python3 scripts/webserver.py --data-dir ./sessions --port 8888

# Open in a browser
open http://localhost:8888
```

Web UI features:
- 📊 Overview of all sessions, grouped by model
- 🔍 Click a session ID to drill down into the full conversation
- 💬 View messages, question, answer, reasoning, tool_calls for each turn
- 💰 Real-time token usage and cost calculation
- 🔄 Auto-refresh every 30 seconds

### 3. Use in Clawdbot Conversations

When a user asks about the current session's token consumption, generate an observation link:

```
Your current session ID: agent:main:discord:channel:1465367993012981988

View details: http://localhost:8888/session?id=agent:main:discord:channel:1465367993012981988

Click to see:
✅ Complete conversation history (messages per turn)
✅ Token usage breakdown
✅ Tool call records
✅ Cost statistics
```

### 4. CLI Queries (Optional)

```bash
# Show session details
python3 scripts/cli.py show <session-id>

# List all sessions
python3 scripts/cli.py list

# Statistics by model
python3 scripts/cli.py stats-model

# Export reports
python3 scripts/cli.py export finops-report.json
```
## Core Features

✅ **Complete Conversation Tracking**: records full messages, question, answer, reasoning, tool_calls for each turn
✅ **Token Usage Statistics**: distinguishes input/output/reasoning/cached tokens, calculates cost in real time
✅ **Session Aggregation**: groups multi-turn conversations by session_id
✅ **Web Visualization**: browser-based overview with drill-down into session details
✅ **Real-time URL Generation**: Clawdbot can generate observation links from the current session ID
✅ **FinOps Reporting**: exports cost analysis reports in JSON/CSV formats

## Log Format Requirements

Higress access logs must include an ai_log field (JSON format). Example:

```json
{
  "__file_offset__": "1000",
  "timestamp": "2026-02-01T09:30:15Z",
  "ai_log": "{\"session_id\":\"sess_abc\",\"messages\":[...],\"question\":\"...\",\"answer\":\"...\",\"input_token\":250,\"output_token\":160,\"model\":\"Qwen3-rerank\"}"
}
```

Supported ai_log attributes:
- `session_id`: session identifier (required)
- `messages`: complete conversation history
- `question`: question for the current turn
- `answer`: AI answer
- `reasoning`: thinking process (DeepSeek and similar models)
- `tool_calls`: tool call list
- `input_token`: input token count
- `output_token`: output token count
- `model`: model name
- `response_type`: response type
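The extraction step described above can be sketched in a few lines of Python: each access-log line is a JSON object whose `ai_log` field is itself a JSON string that must be decoded a second time. A minimal sketch, assuming the field names shown in the example above (the helper name is hypothetical, not part of main.py):

```python
import json
from typing import Optional

def parse_ai_log_line(line: str) -> Optional[dict]:
    """Decode one access-log line and its nested ai_log JSON string."""
    try:
        entry = json.loads(line)
        ai_log = entry.get("ai_log")
        # ai_log arrives as a JSON-encoded string, so decode it again
        return json.loads(ai_log) if ai_log else None
    except json.JSONDecodeError:
        return None  # skip malformed or non-AI log lines

record = parse_ai_log_line(
    '{"timestamp":"2026-02-01T09:30:15Z",'
    '"ai_log":"{\\"session_id\\":\\"sess_abc\\",\\"input_token\\":250,\\"output_token\\":160}"}'
)
print(record["session_id"], record["input_token"])  # sess_abc 250
```

Lines without an `ai_log` field (ordinary access-log entries) come back as `None` and can simply be skipped.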
## Output Directory Structure

```
sessions/
├── agent:main:discord:1465367993012981988.json
└── agent:test:discord:9999999999999999999.json
```

Each session file contains:
- Basic information (creation time, update time, model)
- Token statistics (total input, total output, total reasoning, total cached)
- A list of conversation turns (full messages, question, answer, reasoning, tool_calls per turn)

## FAQ

**Q: How do I configure the session_id header in Higress?**
A: Configure `session_id_header` in the ai-statistics plugin, or use the default headers (x-openclaw-session-key, x-clawdbot-session-key, etc.). See PR #3420 for details.

**Q: Which models have pricing support?**
A: Mainstream models such as Qwen, DeepSeek, GPT-4, and Claude are currently supported. New models can be added to the TOKEN_PRICING dictionary in main.py.

**Q: How do I monitor log file changes in real time?**
A: Just run main.py; it polls on a timer (checking once per second) and needs no extra dependencies.

**Q: CLI queries are slow?**
A: With many sessions, use `--limit` to cap the number of results, or filter (e.g. `--sort-by cost` to see only the most expensive sessions).

## Next Steps

- Integrate with the Higress FinOps Dashboard
- Support pricing for more models
- Add trend forecasting and anomaly detection
- Support aggregation across multiple data sources
71 .claude/skills/agent-session-monitor/README.md Normal file
@@ -0,0 +1,71 @@

# Agent Session Monitor

Real-time agent conversation monitoring for Clawdbot, designed to monitor Higress access logs and track token usage across multi-turn conversations.

## Features

- 🔍 **Complete Conversation Tracking**: Records messages, question, answer, reasoning, tool_calls for each turn
- 💰 **Token Usage Statistics**: Distinguishes input/output/reasoning/cached tokens, calculates costs in real-time
- 🌐 **Web Visualization**: Browser-based UI with overview and drill-down into session details
- 🔗 **Real-time URL Generation**: Clawdbot can generate observation links based on current session ID
- 🔄 **Log Rotation Support**: Automatically handles rotated log files (access.log, access.log.1, etc.)
- 📊 **FinOps Reporting**: Export usage data in JSON/CSV formats

## Quick Start

### 1. Run Demo

```bash
cd example
bash demo.sh
```

### 2. Start Web UI

```bash
# Parse logs
python3 main.py --log-path /var/log/higress/access.log --output-dir ./sessions

# Start web server
python3 scripts/webserver.py --data-dir ./sessions --port 8888

# Access in browser
open http://localhost:8888
```

### 3. Use in Clawdbot

When users ask "How many tokens did this conversation use?", you can respond with:

```
Your current session statistics:
- Session ID: agent:main:discord:channel:1465367993012981988
- View details: http://localhost:8888/session?id=agent:main:discord:channel:1465367993012981988

Click to see:
✅ Complete conversation history
✅ Token usage breakdown per turn
✅ Tool call records
✅ Cost statistics
```

## Files

- `main.py`: Background monitor, parses Higress access logs
- `scripts/webserver.py`: Web server, provides browser-based UI
- `scripts/cli.py`: Command-line tools for queries and exports
- `example/`: Demo examples and test data

## Dependencies

- Python 3.8+
- No external dependencies (uses only standard library)

## Documentation

- `SKILL.md`: Main skill documentation
- `QUICKSTART.md`: Quick start guide

## License

MIT
376 .claude/skills/agent-session-monitor/SKILL.md Normal file
@@ -0,0 +1,376 @@

---
name: agent-session-monitor
description: Real-time agent conversation monitoring - monitors Higress access logs, aggregates conversations by session, tracks token usage. Supports web interface for viewing complete conversation history and costs. Use when users ask about current session token consumption, conversation history, or cost statistics.
---

## Overview

Real-time monitoring of Higress access logs, extracting ai_log JSON, grouping multi-turn conversations by session_id, and calculating token costs with visualization.

### Core Features

- **Real-time Log Monitoring**: Monitors Higress access log files, parses new ai_log entries in real-time
- **Log Rotation Support**: Full logrotate support, automatically tracks access.log.1~5 etc.
- **Incremental Parsing**: Inode-based tracking, processes only new content, no duplicates
- **Session Grouping**: Associates multi-turn conversations by session_id (each turn is a separate request)
- **Complete Conversation Tracking**: Records messages, question, answer, reasoning, tool_calls for each turn
- **Token Usage Tracking**: Distinguishes input/output/reasoning/cached tokens
- **Web Visualization**: Browser-based UI with overview and session drill-down
- **Real-time URL Generation**: Clawdbot can generate observation links based on current session ID
- **Background Processing**: Independent process, continuously parses access logs
- **State Persistence**: Maintains parsing progress and session data across runs
## Usage

### 1. Background Monitoring (Continuous)

```bash
# Parse Higress access logs (with log rotation support)
python3 main.py --log-path /var/log/proxy/access.log --output-dir ./sessions

# Filter by session key
python3 main.py --log-path /var/log/proxy/access.log --session-key <session-id>

# Scheduled task (incremental parsing every minute)
* * * * * python3 /path/to/main.py --log-path /var/log/proxy/access.log --output-dir /var/lib/sessions
```

### 2. Start Web UI (Recommended)

```bash
# Start web server
python3 scripts/webserver.py --data-dir ./sessions --port 8888

# Access in browser
open http://localhost:8888
```

Web UI features:
- 📊 Overview: View all session statistics and group by model
- 🔍 Session Details: Click session ID to drill down into complete conversation history
- 💬 Conversation Log: Display messages, question, answer, reasoning, tool_calls for each turn
- 💰 Cost Statistics: Real-time token usage and cost calculation
- 🔄 Auto Refresh: Updates every 30 seconds

### 3. Use in Clawdbot Conversations

When users ask about current session token consumption or conversation history:

1. Get current session_id (from runtime or context)
2. Generate web UI URL and return to user

Example response:

```
Your current session statistics:
- Session ID: agent:main:discord:channel:1465367993012981988
- View details: http://localhost:8888/session?id=agent:main:discord:channel:1465367993012981988

Click the link to see:
✅ Complete conversation history
✅ Token usage breakdown per turn
✅ Tool call records
✅ Cost statistics
```

### 4. CLI Queries (Optional)

```bash
# View specific session details
python3 scripts/cli.py show <session-id>

# List all sessions
python3 scripts/cli.py list --sort-by cost --limit 10

# Statistics by model
python3 scripts/cli.py stats-model

# Statistics by date (last 7 days)
python3 scripts/cli.py stats-date --days 7

# Export reports
python3 scripts/cli.py export finops-report.json
```
## Configuration

### main.py (Background Monitor)

| Parameter | Description | Required | Default |
|-----------|-------------|----------|---------|
| `--log-path` | Higress access log file path | Yes | /var/log/higress/access.log |
| `--output-dir` | Session data storage directory | No | ./sessions |
| `--session-key` | Monitor only specified session key | No | Monitor all sessions |
| `--state-file` | State file path (records read offsets) | No | <output-dir>/.state.json |
| `--refresh-interval` | Log refresh interval (seconds) | No | 1 |

### webserver.py (Web UI)

| Parameter | Description | Required | Default |
|-----------|-------------|----------|---------|
| `--data-dir` | Session data directory | No | ./sessions |
| `--port` | HTTP server port | No | 8888 |
| `--host` | HTTP server address | No | 0.0.0.0 |
## Output Examples

### 1. Real-time Monitor

```
🔍 Session Monitor - Active
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

📊 Active Sessions: 3

┌──────────────────────────┬─────────┬──────────┬───────────┐
│ Session ID               │ Msgs    │ Input    │ Output    │
├──────────────────────────┼─────────┼──────────┼───────────┤
│ sess_abc123              │ 5       │ 1,250    │ 800       │
│ sess_xyz789              │ 3       │ 890      │ 650       │
│ sess_def456              │ 8       │ 2,100    │ 1,200     │
└──────────────────────────┴─────────┴──────────┴───────────┘

📈 Token Statistics
  Total Input: 4240 tokens
  Total Output: 2650 tokens
  Total Cached: 0 tokens
  Total Cost: $0.00127
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
```

### 2. CLI Session Details

```bash
$ python3 scripts/cli.py show agent:main:discord:channel:1465367993012981988

======================================================================
📊 Session Detail: agent:main:discord:channel:1465367993012981988
======================================================================

🕐 Created: 2026-02-01T09:30:00+08:00
🕑 Updated: 2026-02-01T10:35:12+08:00
🤖 Model: Qwen3-rerank
💬 Messages: 5

📈 Token Statistics:
  Input: 1,250 tokens
  Output: 800 tokens
  Reasoning: 150 tokens
  Total: 2,200 tokens

💰 Estimated Cost: $0.00126000 USD

📝 Conversation Rounds (5):
──────────────────────────────────────────────────────────────────────

Round 1 @ 2026-02-01T09:30:15+08:00
  Tokens: 250 in → 160 out
  🔧 Tool calls: Yes
  Messages (2):
    [user] Check Beijing weather
  ❓ Question: Check Beijing weather
  ✅ Answer: Checking Beijing weather for you...
  🧠 Reasoning: User wants to know Beijing weather, I need to call weather API.
  🛠️ Tool Calls:
    - get_weather({"location":"Beijing"})
```

### 3. Statistics by Model

```bash
$ python3 scripts/cli.py stats-model

================================================================================
📊 Statistics by Model
================================================================================

Model                Sessions      Input     Output   Cost (USD)
────────────────────────────────────────────────────────────────────────────
Qwen3-rerank               12     15,230      9,840   $ 0.016800
DeepSeek-R1                 5      8,450      6,200   $ 0.010600
Qwen-Max                    3      4,200      3,100   $ 0.008300
GPT-4                       2      2,100      1,800   $ 0.017100
────────────────────────────────────────────────────────────────────────────
TOTAL                      22     29,980     20,940   $ 0.052800

================================================================================
```

### 4. Statistics by Date

```bash
$ python3 scripts/cli.py stats-date --days 7

================================================================================
📊 Statistics by Date (Last 7 days)
================================================================================

Date          Sessions      Input     Output   Cost (USD)   Models
────────────────────────────────────────────────────────────────────────────
2026-01-26           3      2,100      1,450   $ 0.0042     Qwen3-rerank
2026-01-27           5      4,850      3,200   $ 0.0096     Qwen3-rerank, GPT-4
2026-01-28           4      3,600      2,800   $ 0.0078     DeepSeek-R1, Qwen
────────────────────────────────────────────────────────────────────────────
TOTAL               22     29,980     20,940   $ 0.0528

================================================================================
```

### 5. Web UI (Recommended)

Access `http://localhost:8888` to see:

**Home Page:**
- 📊 Total sessions, token consumption, cost cards
- 📋 Recent sessions list (clickable for details)
- 📈 Statistics by model table

**Session Detail Page:**
- 💬 Complete conversation log (messages, question, answer, reasoning, tool_calls per turn)
- 🔧 Tool call history
- 💰 Token usage breakdown and costs

**Features:**
- 🔄 Auto-refresh every 30 seconds
- 📱 Responsive design, mobile-friendly
- 🎨 Clean UI, easy to read
## Session Data Structure

Each session is stored as an independent JSON file with complete conversation history and token statistics:

```json
{
  "session_id": "agent:main:discord:channel:1465367993012981988",
  "created_at": "2026-02-01T10:30:00Z",
  "updated_at": "2026-02-01T10:35:12Z",
  "messages_count": 5,
  "total_input_tokens": 1250,
  "total_output_tokens": 800,
  "total_reasoning_tokens": 150,
  "total_cached_tokens": 0,
  "model": "Qwen3-rerank",
  "rounds": [
    {
      "round": 1,
      "timestamp": "2026-02-01T10:30:15Z",
      "input_tokens": 250,
      "output_tokens": 160,
      "reasoning_tokens": 0,
      "cached_tokens": 0,
      "model": "Qwen3-rerank",
      "has_tool_calls": true,
      "response_type": "normal",
      "messages": [
        {
          "role": "system",
          "content": "You are a helpful assistant..."
        },
        {
          "role": "user",
          "content": "Check Beijing weather"
        }
      ],
      "question": "Check Beijing weather",
      "answer": "Checking Beijing weather for you...",
      "reasoning": "User wants to know Beijing weather, need to call weather API.",
      "tool_calls": [
        {
          "index": 0,
          "id": "call_abc123",
          "type": "function",
          "function": {
            "name": "get_weather",
            "arguments": "{\"location\":\"Beijing\"}"
          }
        }
      ],
      "input_token_details": {"cached_tokens": 0},
      "output_token_details": {}
    }
  ]
}
```

### Field Descriptions

**Session Level:**
- `session_id`: Unique session identifier (from ai_log's session_id field)
- `created_at`: Session creation time
- `updated_at`: Last update time
- `messages_count`: Number of conversation turns
- `total_input_tokens`: Cumulative input tokens
- `total_output_tokens`: Cumulative output tokens
- `total_reasoning_tokens`: Cumulative reasoning tokens (DeepSeek, o1, etc.)
- `total_cached_tokens`: Cumulative cached tokens (prompt caching)
- `model`: Current model in use

**Round Level (rounds):**
- `round`: Turn number
- `timestamp`: Current turn timestamp
- `input_tokens`: Input tokens for this turn
- `output_tokens`: Output tokens for this turn
- `reasoning_tokens`: Reasoning tokens (o1, etc.)
- `cached_tokens`: Cached tokens (prompt caching)
- `model`: Model used for this turn
- `has_tool_calls`: Whether the turn includes tool calls
- `response_type`: Response type (normal/error, etc.)
- `messages`: Complete conversation history (OpenAI messages format)
- `question`: User's question for this turn (last user message)
- `answer`: AI's answer for this turn
- `reasoning`: AI's thinking process (if the model supports it)
- `tool_calls`: Tool call list (if any)
- `input_token_details`: Complete input token details (JSON)
- `output_token_details`: Complete output token details (JSON)
## Log Format Requirements

Higress access logs must include an ai_log field (JSON format). Example:

```json
{
  "__file_offset__": "1000",
  "timestamp": "2026-02-01T09:30:15Z",
  "ai_log": "{\"session_id\":\"sess_abc\",\"messages\":[...],\"question\":\"...\",\"answer\":\"...\",\"input_token\":250,\"output_token\":160,\"model\":\"Qwen3-rerank\"}"
}
```

Supported ai_log attributes:
- `session_id`: Session identifier (required)
- `messages`: Complete conversation history
- `question`: Question for current turn
- `answer`: AI answer
- `reasoning`: Thinking process (DeepSeek, o1, etc.)
- `reasoning_tokens`: Reasoning token count (from PR #3424)
- `cached_tokens`: Cached token count (from PR #3424)
- `tool_calls`: Tool call list
- `input_token`: Input token count
- `output_token`: Output token count
- `input_token_details`: Complete input token details (JSON)
- `output_token_details`: Complete output token details (JSON)
- `model`: Model name
- `response_type`: Response type

## Implementation

### Technology Stack

- **Log Parsing**: Direct JSON parsing, no regex needed
- **File Monitoring**: Polling-based (no watchdog dependency)
- **Session Management**: In-memory + disk hybrid storage
- **Token Calculation**: Model-specific pricing for GPT-4, Qwen, Claude, o1, etc.
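The model-specific pricing can be sketched as a lookup table plus a per-turn cost function (QUICKSTART refers to a `TOKEN_PRICING` dictionary in main.py). The rates and the 50% cache discount below are illustrative placeholders, not the project's real prices:

```python
# Placeholder per-1K-token USD rates -- illustrative only, not real prices.
TOKEN_PRICING = {
    "Qwen3-rerank": {"input": 0.0005, "output": 0.001},
    "DeepSeek-R1":  {"input": 0.0006, "output": 0.0022, "reasoning": 0.0022},
    "default":      {"input": 0.001,  "output": 0.002},
}

def round_cost(model: str, input_tokens: int, output_tokens: int,
               reasoning_tokens: int = 0, cached_tokens: int = 0) -> float:
    """Estimate USD cost for one turn; unknown models fall back to 'default'."""
    p = TOKEN_PRICING.get(model, TOKEN_PRICING["default"])
    uncached = max(input_tokens - cached_tokens, 0)
    cost = uncached * p["input"] / 1000
    cost += cached_tokens * p["input"] * 0.5 / 1000   # assumed 50% cache discount
    cost += output_tokens * p["output"] / 1000
    # Reasoning tokens billed at their own rate when defined (o1-style models).
    cost += reasoning_tokens * p.get("reasoning", p["output"]) / 1000
    return cost
```

Summing `round_cost` over a session's rounds gives the session-level estimated cost shown in the CLI output.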
### Privacy and Security

- ✅ Does not record conversation content in logs, only token statistics
- ✅ Session data stored locally, not uploaded to external services
- ✅ Supports log file path allowlist
- ✅ Session key access control

### Performance Optimization

- Incremental log parsing, avoids full scans
- In-memory session data with periodic persistence
- Optimized log file reading (offset tracking)
- Inode-based file identification (handles rotation efficiently)
101 .claude/skills/agent-session-monitor/example/clawdbot_demo.py Executable file
@@ -0,0 +1,101 @@

#!/usr/bin/env python3
"""
Demonstrates how to generate session observation URLs in Clawdbot.
"""

from urllib.parse import quote


def generate_session_url(session_id: str, base_url: str = "http://localhost:8888") -> dict:
    """
    Generate session observation URLs.

    Args:
        session_id: Session ID of the current conversation
        base_url: Base URL of the web server

    Returns:
        Dict containing the various URLs
    """
    # URL-encode the session_id (handles special characters)
    encoded_id = quote(session_id, safe='')

    return {
        "session_detail": f"{base_url}/session?id={encoded_id}",
        "api_session": f"{base_url}/api/session?id={encoded_id}",
        "index": f"{base_url}/",
        "api_sessions": f"{base_url}/api/sessions",
        "api_stats": f"{base_url}/api/stats",
    }


def format_response_message(session_id: str, base_url: str = "http://localhost:8888") -> str:
    """
    Build the reply message shown to the user.

    Args:
        session_id: Session ID of the current conversation
        base_url: Base URL of the web server

    Returns:
        Formatted reply message
    """
    urls = generate_session_url(session_id, base_url)

    return f"""Your current session info:

📊 **Session ID**: `{session_id}`

🔗 **View details**: {urls['session_detail']}

Click the link to see:
✅ Complete conversation history (messages per turn)
✅ Token usage breakdown (input/output/reasoning)
✅ Tool call records
✅ Real-time cost statistics

**More links:**
- 📋 All sessions: {urls['index']}
- 📥 API data: {urls['api_session']}
- 📊 Overall statistics: {urls['api_stats']}
"""


# Example usage
if __name__ == '__main__':
    # Simulated Clawdbot session ID
    demo_session_id = "agent:main:discord:channel:1465367993012981988"

    print("=" * 70)
    print("🤖 Clawdbot Session Monitor Demo")
    print("=" * 70)
    print()

    # Generate URLs
    urls = generate_session_url(demo_session_id)

    print("Generated URLs:")
    print(f"  Session detail: {urls['session_detail']}")
    print(f"  API data: {urls['api_session']}")
    print(f"  Overview page: {urls['index']}")
    print()

    # Build the reply message
    message = format_response_message(demo_session_id)

    print("Reply message template:")
    print("-" * 70)
    print(message)
    print("-" * 70)
    print()

    print("✅ In Clawdbot, you can return the message above to the user directly")
    print()

    # Test a session ID containing special characters
    special_session_id = "agent:test:session/with?special&chars"
    special_urls = generate_session_url(special_session_id)

    print("Special-character handling example:")
    print(f"  Original ID: {special_session_id}")
    print(f"  URL: {special_urls['session_detail']}")
    print()
101 .claude/skills/agent-session-monitor/example/demo.sh Executable file
@@ -0,0 +1,101 @@

#!/bin/bash
# Agent Session Monitor - demo script

set -e

SKILL_DIR="$(dirname "$(dirname "$(realpath "$0")")")"
EXAMPLE_DIR="$SKILL_DIR/example"
LOG_FILE="$EXAMPLE_DIR/test_access.log"
OUTPUT_DIR="$EXAMPLE_DIR/sessions"

echo "========================================"
echo "Agent Session Monitor - Demo"
echo "========================================"
echo ""

# Clean up old data
if [ -d "$OUTPUT_DIR" ]; then
    echo "🧹 Cleaning up old session data..."
    rm -rf "$OUTPUT_DIR"
fi

echo "📂 Log file: $LOG_FILE"
echo "📁 Output dir: $OUTPUT_DIR"
echo ""

# Step 1: parse the log file (one-shot mode)
echo "========================================"
echo "Step 1: Parse the log file"
echo "========================================"
python3 "$SKILL_DIR/main.py" \
    --log-path "$LOG_FILE" \
    --output-dir "$OUTPUT_DIR"

echo ""
echo "✅ Log parsing complete! Session data saved to: $OUTPUT_DIR"
echo ""

# Step 2: list all sessions
echo "========================================"
echo "Step 2: List all sessions"
echo "========================================"
python3 "$SKILL_DIR/scripts/cli.py" list \
    --data-dir "$OUTPUT_DIR" \
    --limit 10

# Step 3: show details of the first session
echo "========================================"
echo "Step 3: Show session details"
echo "========================================"
FIRST_SESSION=$(ls -1 "$OUTPUT_DIR"/*.json | head -1 | xargs -I {} basename {} .json)
python3 "$SKILL_DIR/scripts/cli.py" show "$FIRST_SESSION" \
    --data-dir "$OUTPUT_DIR"

# Step 4: token usage by model
echo "========================================"
echo "Step 4: Token usage by model"
echo "========================================"
python3 "$SKILL_DIR/scripts/cli.py" stats-model \
    --data-dir "$OUTPUT_DIR"

# Step 5: token usage by date
echo "========================================"
echo "Step 5: Token usage by date"
echo "========================================"
python3 "$SKILL_DIR/scripts/cli.py" stats-date \
    --data-dir "$OUTPUT_DIR" \
    --days 7

# Step 6: export the FinOps report
echo "========================================"
echo "Step 6: Export FinOps report"
echo "========================================"
python3 "$SKILL_DIR/scripts/cli.py" export "$EXAMPLE_DIR/finops-report.json" \
    --data-dir "$OUTPUT_DIR" \
    --format json

echo ""
echo "✅ Report exported to: $EXAMPLE_DIR/finops-report.json"
echo ""

# Show report contents
if [ -f "$EXAMPLE_DIR/finops-report.json" ]; then
    echo "📊 FinOps report contents:"
    echo "========================================"
    cat "$EXAMPLE_DIR/finops-report.json" | python3 -m json.tool | head -50
    echo "..."
fi

echo ""
echo "========================================"
echo "✅ Demo complete!"
echo "========================================"
echo ""
echo "💡 Tips:"
echo "  - Session data is stored in: $OUTPUT_DIR/"
echo "  - FinOps report: $EXAMPLE_DIR/finops-report.json"
echo "  - Run 'python3 scripts/cli.py --help' for more commands"
echo ""
echo "🌐 Start the web UI to explore:"
echo "  python3 $SKILL_DIR/scripts/webserver.py --data-dir $OUTPUT_DIR --port 8888"
echo "  Then open: http://localhost:8888"
76 .claude/skills/agent-session-monitor/example/demo_v2.sh Executable file
@@ -0,0 +1,76 @@

#!/bin/bash
# Agent Session Monitor - Demo for PR #3424 token details

set -e

SKILL_DIR="$(dirname "$(dirname "$(realpath "$0")")")"
EXAMPLE_DIR="$SKILL_DIR/example"
LOG_FILE="$EXAMPLE_DIR/test_access_v2.log"
OUTPUT_DIR="$EXAMPLE_DIR/sessions_v2"

echo "========================================"
echo "Agent Session Monitor - Token Details Demo"
echo "========================================"
echo ""

# Clean up old data
if [ -d "$OUTPUT_DIR" ]; then
    echo "🧹 Cleaning up old session data..."
    rm -rf "$OUTPUT_DIR"
fi

echo "📂 Log file: $LOG_FILE"
echo "📁 Output dir: $OUTPUT_DIR"
echo ""

# Step 1: parse the log file
echo "========================================"
echo "Step 1: Parse the log file (with token details)"
echo "========================================"
python3 "$SKILL_DIR/main.py" \
    --log-path "$LOG_FILE" \
    --output-dir "$OUTPUT_DIR"

echo ""
echo "✅ Log parsing complete! Session data saved to: $OUTPUT_DIR"
echo ""

# Step 2: inspect a session that uses prompt caching (gpt-4o)
echo "========================================"
echo "Step 2: Show the GPT-4o session (with cached tokens)"
echo "========================================"
python3 "$SKILL_DIR/scripts/cli.py" show "agent:main:discord:1465367993012981988" \
    --data-dir "$OUTPUT_DIR"

# Step 3: inspect a session that uses reasoning (o1)
echo "========================================"
echo "Step 3: Show the o1 session (with reasoning tokens)"
echo "========================================"
python3 "$SKILL_DIR/scripts/cli.py" show "agent:main:discord:9999999999999999999" \
    --data-dir "$OUTPUT_DIR"

# Step 4: statistics by model
echo "========================================"
echo "Step 4: Statistics by model (with new token types)"
echo "========================================"
python3 "$SKILL_DIR/scripts/cli.py" stats-model \
    --data-dir "$OUTPUT_DIR"

echo ""
echo "========================================"
echo "✅ Demo complete!"
echo "========================================"
echo ""
echo "💡 New features:"
echo "  ✅ cached_tokens - tokens served from cache (prompt caching)"
echo "  ✅ reasoning_tokens - reasoning token count (o1 and similar models)"
echo "  ✅ input_token_details - full input token details (JSON)"
echo "  ✅ output_token_details - full output token details (JSON)"
echo ""
echo "💰 Cost calculation has been refined:"
echo "  - cached tokens are usually cheaper than regular input (50-90% discount)"
echo "  - reasoning tokens are billed separately (o1 series)"
echo ""
echo "🌐 Start the web UI to explore:"
echo "  python3 $SKILL_DIR/scripts/webserver.py --data-dir $OUTPUT_DIR --port 8889"
echo "  Then open: http://localhost:8889"
@@ -0,0 +1,4 @@
{"__file_offset__":"1000","timestamp":"2026-02-01T09:30:15Z","ai_log":"{\"session_id\":\"agent:main:discord:1465367993012981988\",\"api\":\"Qwen3-rerank@higress\",\"api_type\":\"LLM\",\"chat_round\":1,\"consumer\":\"clawdbot\",\"input_token\":250,\"output_token\":160,\"model\":\"Qwen3-rerank\",\"response_type\":\"normal\",\"total_token\":410,\"messages\":[{\"role\":\"system\",\"content\":\"You are a helpful assistant.\"},{\"role\":\"user\",\"content\":\"查询北京天气\"}],\"question\":\"查询北京天气\",\"answer\":\"正在为您查询北京天气...\",\"reasoning\":\"用户想知道北京的天气,我需要调用天气查询工具。\",\"tool_calls\":[{\"index\":0,\"id\":\"call_abc123\",\"type\":\"function\",\"function\":{\"name\":\"get_weather\",\"arguments\":\"{\\\"location\\\":\\\"Beijing\\\"}\"}}]}"}
{"__file_offset__":"2000","timestamp":"2026-02-01T09:32:00Z","ai_log":"{\"session_id\":\"agent:main:discord:1465367993012981988\",\"api\":\"Qwen3-rerank@higress\",\"api_type\":\"LLM\",\"chat_round\":2,\"consumer\":\"clawdbot\",\"input_token\":320,\"output_token\":180,\"model\":\"Qwen3-rerank\",\"response_type\":\"normal\",\"total_token\":500,\"messages\":[{\"role\":\"tool\",\"content\":\"{\\\"temperature\\\": 15, \\\"weather\\\": \\\"晴\\\"}\"}],\"question\":\"\",\"answer\":\"北京今天天气晴朗,温度15°C。\",\"reasoning\":\"\",\"tool_calls\":[]}"}
{"__file_offset__":"3000","timestamp":"2026-02-01T09:35:12Z","ai_log":"{\"session_id\":\"agent:main:discord:1465367993012981988\",\"api\":\"Qwen3-rerank@higress\",\"api_type\":\"LLM\",\"chat_round\":3,\"consumer\":\"clawdbot\",\"input_token\":380,\"output_token\":220,\"model\":\"Qwen3-rerank\",\"response_type\":\"normal\",\"total_token\":600,\"messages\":[{\"role\":\"user\",\"content\":\"谢谢!\"},{\"role\":\"assistant\",\"content\":\"不客气!如果还有其他问题,随时问我。\"}],\"question\":\"谢谢!\",\"answer\":\"不客气!如果还有其他问题,随时问我。\",\"reasoning\":\"\",\"tool_calls\":[]}"}
{"__file_offset__":"4000","timestamp":"2026-02-01T10:00:00Z","ai_log":"{\"session_id\":\"agent:test:discord:9999999999999999999\",\"api\":\"DeepSeek-R1@higress\",\"api_type\":\"LLM\",\"chat_round\":1,\"consumer\":\"clawdbot\",\"input_token\":50,\"output_token\":30,\"model\":\"DeepSeek-R1\",\"response_type\":\"normal\",\"total_token\":80,\"messages\":[{\"role\":\"user\",\"content\":\"计算2+2\"}],\"question\":\"计算2+2\",\"answer\":\"4\",\"reasoning\":\"这是一个简单的加法运算,2加2等于4。\",\"tool_calls\":[]}"}
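These fixture lines nest one JSON document inside another: each access-log line is JSON, and its `ai_log` field is itself a JSON-encoded string that must be decoded a second time. A minimal sketch of the two-level decode (the `s1` sample values here are invented for illustration; the field names match the fixtures above):

```python
import json

# A shortened access-log line in the same shape as the fixtures above.
line = '{"__file_offset__":"1000","timestamp":"2026-02-01T10:00:00Z","ai_log":"{\\"session_id\\":\\"s1\\",\\"model\\":\\"gpt-4o\\",\\"input_token\\":150,\\"cached_tokens\\":120}"}'

outer = json.loads(line)              # first decode: the access-log record
ai_log = json.loads(outer["ai_log"])  # second decode: the embedded ai_log string

print(ai_log["session_id"], ai_log["cached_tokens"])  # → s1 120
```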
@@ -0,0 +1,4 @@
{"__file_offset__":"1000","timestamp":"2026-02-01T10:00:00Z","ai_log":"{\"session_id\":\"agent:main:discord:1465367993012981988\",\"api\":\"gpt-4o\",\"api_type\":\"LLM\",\"chat_round\":1,\"consumer\":\"clawdbot\",\"input_token\":150,\"output_token\":100,\"reasoning_tokens\":0,\"cached_tokens\":120,\"input_token_details\":\"{\\\"cached_tokens\\\":120}\",\"output_token_details\":\"{}\",\"model\":\"gpt-4o\",\"response_type\":\"normal\",\"total_token\":250,\"messages\":[{\"role\":\"system\",\"content\":\"You are a helpful assistant.\"},{\"role\":\"user\",\"content\":\"你好\"}],\"question\":\"你好\",\"answer\":\"你好!有什么我可以帮助你的吗?\",\"reasoning\":\"\",\"tool_calls\":[]}"}
{"__file_offset__":"2000","timestamp":"2026-02-01T10:01:00Z","ai_log":"{\"session_id\":\"agent:main:discord:1465367993012981988\",\"api\":\"gpt-4o\",\"api_type\":\"LLM\",\"chat_round\":2,\"consumer\":\"clawdbot\",\"input_token\":200,\"output_token\":150,\"reasoning_tokens\":0,\"cached_tokens\":80,\"input_token_details\":\"{\\\"cached_tokens\\\":80}\",\"output_token_details\":\"{}\",\"model\":\"gpt-4o\",\"response_type\":\"normal\",\"total_token\":350,\"messages\":[{\"role\":\"user\",\"content\":\"介绍一下你的能力\"}],\"question\":\"介绍一下你的能力\",\"answer\":\"我可以帮助你回答问题、写作、编程等...\",\"reasoning\":\"\",\"tool_calls\":[]}"}
{"__file_offset__":"3000","timestamp":"2026-02-01T10:02:00Z","ai_log":"{\"session_id\":\"agent:main:discord:9999999999999999999\",\"api\":\"o1\",\"api_type\":\"LLM\",\"chat_round\":1,\"consumer\":\"clawdbot\",\"input_token\":100,\"output_token\":80,\"reasoning_tokens\":500,\"cached_tokens\":0,\"input_token_details\":\"{}\",\"output_token_details\":\"{\\\"reasoning_tokens\\\":500}\",\"model\":\"o1\",\"response_type\":\"normal\",\"total_token\":580,\"messages\":[{\"role\":\"user\",\"content\":\"解释量子纠缠\"}],\"question\":\"解释量子纠缠\",\"answer\":\"量子纠缠是量子力学中的一种现象...\",\"reasoning\":\"这是一个复杂的物理概念,我需要仔细思考如何用简单的方式解释...\",\"tool_calls\":[]}"}
{"__file_offset__":"4000","timestamp":"2026-02-01T10:03:00Z","ai_log":"{\"session_id\":\"agent:main:discord:1465367993012981988\",\"api\":\"gpt-4o\",\"api_type\":\"LLM\",\"chat_round\":3,\"consumer\":\"clawdbot\",\"input_token\":300,\"output_token\":200,\"reasoning_tokens\":0,\"cached_tokens\":200,\"input_token_details\":\"{\\\"cached_tokens\\\":200}\",\"output_token_details\":\"{}\",\"model\":\"gpt-4o\",\"response_type\":\"normal\",\"total_token\":500,\"messages\":[{\"role\":\"user\",\"content\":\"写一个Python函数计算斐波那契数列\"}],\"question\":\"写一个Python函数计算斐波那契数列\",\"answer\":\"```python\\ndef fibonacci(n):\\n    if n <= 1:\\n        return n\\n    return fibonacci(n-1) + fibonacci(n-2)\\n```\",\"reasoning\":\"\",\"tool_calls\":[]}"}
137
.claude/skills/agent-session-monitor/example/test_rotation.sh
Executable file
@@ -0,0 +1,137 @@
#!/bin/bash
# Test the log rotation handling

set -e

SKILL_DIR="$(dirname "$(dirname "$(realpath "$0")")")"
EXAMPLE_DIR="$SKILL_DIR/example"
TEST_DIR="$EXAMPLE_DIR/rotation_test"
LOG_FILE="$TEST_DIR/access.log"
OUTPUT_DIR="$TEST_DIR/sessions"

echo "========================================"
echo "Log Rotation Test"
echo "========================================"
echo ""

# Clean up old test data
rm -rf "$TEST_DIR"
mkdir -p "$TEST_DIR"

echo "📁 Test directory: $TEST_DIR"
echo ""

# Simulate a log rotation scenario
echo "========================================"
echo "Step 1: Create the initial log file"
echo "========================================"

# First batch of logs (10 lines); printf keeps the minute zero-padded
for i in {1..10}; do
    ts=$(printf "2026-02-01T10:%02d:00Z" "$i")
    echo "{\"timestamp\":\"${ts}\",\"ai_log\":\"{\\\"session_id\\\":\\\"session_001\\\",\\\"model\\\":\\\"gpt-4o\\\",\\\"input_token\\\":$((100+i)),\\\"output_token\\\":$((50+i)),\\\"cached_tokens\\\":$((30+i))}\"}" >> "$LOG_FILE"
done

echo "✅ Created $LOG_FILE with 10 lines"
echo ""

# First parse
echo "========================================"
echo "Step 2: First parse (should process 10 records)"
echo "========================================"
python3 "$SKILL_DIR/main.py" \
    --log-path "$LOG_FILE" \
    --output-dir "$OUTPUT_DIR"

echo ""

# Check the session data
echo "Session data:"
python3 -c "import sys, json; d=json.load(sys.stdin); print(f\" Messages: {d['messages_count']}, Total Input: {d['total_input_tokens']}\")" < "$OUTPUT_DIR/session_001.json"
echo ""

# Simulate a log rotation
echo "========================================"
echo "Step 3: Simulate a log rotation"
echo "========================================"
mv "$LOG_FILE" "$LOG_FILE.1"
echo "✅ Rotated: access.log -> access.log.1"
echo ""

# Create a new log file (5 new records)
for i in {11..15}; do
    echo "{\"timestamp\":\"2026-02-01T10:${i}:00Z\",\"ai_log\":\"{\\\"session_id\\\":\\\"session_001\\\",\\\"model\\\":\\\"gpt-4o\\\",\\\"input_token\\\":$((100+i)),\\\"output_token\\\":$((50+i)),\\\"cached_tokens\\\":$((30+i))}\"}" >> "$LOG_FILE"
done

echo "✅ Created new $LOG_FILE with 5 lines"
echo ""

# Parse again (should only process the 5 new records)
echo "========================================"
echo "Step 4: Parse again (should only process the 5 new records)"
echo "========================================"
python3 "$SKILL_DIR/main.py" \
    --log-path "$LOG_FILE" \
    --output-dir "$OUTPUT_DIR"

echo ""

# Check the session data
echo "Session data:"
python3 -c "import sys, json; d=json.load(sys.stdin); print(f\" Messages: {d['messages_count']}, Total Input: {d['total_input_tokens']} (should cover 15 records)\")" < "$OUTPUT_DIR/session_001.json"
echo ""

# Rotate again
echo "========================================"
echo "Step 5: Rotate again"
echo "========================================"
mv "$LOG_FILE.1" "$LOG_FILE.2"
mv "$LOG_FILE" "$LOG_FILE.1"
echo "✅ Rotated: access.log.1 -> access.log.2"
echo "✅ Rotated: access.log -> access.log.1"
echo ""

# Create a new log file (3 new records)
for i in {16..18}; do
    echo "{\"timestamp\":\"2026-02-01T10:${i}:00Z\",\"ai_log\":\"{\\\"session_id\\\":\\\"session_001\\\",\\\"model\\\":\\\"gpt-4o\\\",\\\"input_token\\\":$((100+i)),\\\"output_token\\\":$((50+i)),\\\"cached_tokens\\\":$((30+i))}\"}" >> "$LOG_FILE"
done

echo "✅ Created new $LOG_FILE with 3 lines"
echo ""

# Parse again (should only process the 3 new records)
echo "========================================"
echo "Step 6: Parse again (should only process the 3 new records)"
echo "========================================"
python3 "$SKILL_DIR/main.py" \
    --log-path "$LOG_FILE" \
    --output-dir "$OUTPUT_DIR"

echo ""

# Check the session data
echo "Session data:"
python3 -c "import sys, json; d=json.load(sys.stdin); print(f\" Messages: {d['messages_count']}, Total Input: {d['total_input_tokens']} (should cover 18 records)\")" < "$OUTPUT_DIR/session_001.json"
echo ""

# Check the state file
echo "========================================"
echo "Step 7: Inspect the state file"
echo "========================================"
echo "State file contents:"
python3 -m json.tool < "$OUTPUT_DIR/.state.json" | head -20
echo ""

echo "========================================"
echo "✅ Test complete!"
echo "========================================"
echo ""
echo "💡 What to verify:"
echo " 1. The first parse processed 10 records"
echo " 2. After rotation, only the 5 new records were processed (15 total)"
echo " 3. After the second rotation, only the 3 new records were processed (18 total)"
echo " 4. The state file records an inode and offset per file"
echo ""
echo "📂 Test data is saved in: $TEST_DIR/"
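The state file inspected in step 7 is a flat JSON object mapping each log file's inode (as a string key) to the byte offset already consumed. A hypothetical sketch of what it might contain after the two rotations above (the inode numbers are invented; on a real run they come from the filesystem):

```python
import json

# Hypothetical .state.json after two rotations: three inodes, each read to EOF.
state = {
    "1048601": 1830,  # access.log.2 (oldest, 10 records)
    "1048602": 915,   # access.log.1 (5 records)
    "1048603": 549,   # access.log   (current, 3 records)
}

# A rotation adds a new inode key rather than resetting an existing one,
# so offsets for already-read files stay valid after a rename.
assert all(isinstance(k, str) and v >= 0 for k, v in state.items())
print(json.dumps(state, indent=2))
```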
639
.claude/skills/agent-session-monitor/main.py
Executable file
@@ -0,0 +1,639 @@
#!/usr/bin/env python3
"""
Agent Session Monitor - real-time agent conversation observer

Watches the Higress access log, aggregates conversations by session,
and tracks token spend.
"""

import argparse
import json
import os
import sys
import time
from datetime import datetime
from pathlib import Path
from typing import Dict, List, Optional

# Uses periodic polling; no dependency on watchdog.

# ============================================================================
# Configuration
# ============================================================================

# Token pricing (USD per 1M tokens)
TOKEN_PRICING = {
    "Qwen": {
        "input": 0.0002,   # $0.2/1M
        "output": 0.0006,
        "cached": 0.0001,  # cached tokens typically cost ~50% of input
    },
    "Qwen3-rerank": {
        "input": 0.0003,
        "output": 0.0012,
        "cached": 0.00015,
    },
    "Qwen-Max": {
        "input": 0.0005,
        "output": 0.002,
        "cached": 0.00025,
    },
    "GPT-4": {
        "input": 0.003,
        "output": 0.006,
        "cached": 0.0015,
    },
    "GPT-4o": {
        "input": 0.0025,
        "output": 0.01,
        "cached": 0.00125,  # GPT-4o prompt caching: 50% discount
    },
    "GPT-4-32k": {
        "input": 0.01,
        "output": 0.03,
        "cached": 0.005,
    },
    "o1": {
        "input": 0.015,
        "output": 0.06,
        "cached": 0.0075,
        "reasoning": 0.06,  # o1 reasoning tokens billed like output
    },
    "o1-mini": {
        "input": 0.003,
        "output": 0.012,
        "cached": 0.0015,
        "reasoning": 0.012,
    },
    "Claude": {
        "input": 0.015,
        "output": 0.075,
        "cached": 0.0015,  # Claude prompt caching: 90% discount
    },
    "DeepSeek-R1": {
        "input": 0.004,
        "output": 0.012,
        "reasoning": 0.002,
        "cached": 0.002,
    }
}

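Given the table above, a session's cost splits into regular input (input minus cached), cached, output, and reasoning tokens, each billed at its own rate and divided by 1M. A small worked example of that arithmetic, using the GPT-4o rates from the table (the token counts are illustrative):

```python
# Worked example: 10,000 input tokens of which 4,000 were cache hits,
# plus 2,000 output tokens, at the GPT-4o rates above (USD per 1M tokens).
pricing = {"input": 0.0025, "output": 0.01, "cached": 0.00125}

input_tokens, cached_tokens, output_tokens = 10_000, 4_000, 2_000

# cached_tokens are a subset of input_tokens, so bill them separately
regular_input = input_tokens - cached_tokens  # 6,000 tokens at full price
cost = (regular_input * pricing["input"]
        + cached_tokens * pricing["cached"]
        + output_tokens * pricing["output"]) / 1_000_000

print(f"${cost:.8f}")  # → $0.00004000
```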
DEFAULT_LOG_PATH = "/var/log/higress/access.log"
DEFAULT_OUTPUT_DIR = "./sessions"

# ============================================================================
# Session manager
# ============================================================================

class SessionManager:
    """Tracks token statistics across multiple sessions."""

    def __init__(self, output_dir: str, load_existing: bool = True):
        self.output_dir = Path(output_dir)
        self.output_dir.mkdir(parents=True, exist_ok=True)
        self.sessions: Dict[str, dict] = {}

        # Load any previously saved session data
        if load_existing:
            self._load_existing_sessions()

    def _load_existing_sessions(self):
        """Load previously saved session data."""
        loaded_count = 0
        for session_file in self.output_dir.glob("*.json"):
            try:
                with open(session_file, 'r', encoding='utf-8') as f:
                    session = json.load(f)
                session_id = session.get('session_id')
                if session_id:
                    self.sessions[session_id] = session
                    loaded_count += 1
            except Exception as e:
                print(f"Warning: Failed to load session {session_file}: {e}", file=sys.stderr)

        if loaded_count > 0:
            print(f"📦 Loaded {loaded_count} existing session(s)")

    def update_session(self, session_id: str, ai_log: dict) -> dict:
        """Update an existing session or create a new one."""
        if session_id not in self.sessions:
            self.sessions[session_id] = {
                "session_id": session_id,
                "created_at": datetime.now().isoformat(),
                "updated_at": datetime.now().isoformat(),
                "messages_count": 0,
                "total_input_tokens": 0,
                "total_output_tokens": 0,
                "total_reasoning_tokens": 0,
                "total_cached_tokens": 0,
                "rounds": [],
                "model": ai_log.get("model", "unknown")
            }

        session = self.sessions[session_id]

        # Update metadata
        model = ai_log.get("model", "unknown")
        session["model"] = model
        session["updated_at"] = datetime.now().isoformat()

        # Token statistics
        session["total_input_tokens"] += ai_log.get("input_token", 0)
        session["total_output_tokens"] += ai_log.get("output_token", 0)

        # Reasoning tokens (prefer the explicit reasoning_tokens field)
        reasoning_tokens = ai_log.get("reasoning_tokens", 0)
        if reasoning_tokens == 0 and ai_log.get("reasoning"):
            # No reasoning_tokens field: estimate from the reasoning text,
            # roughly one token per 4 characters
            reasoning_tokens = len(ai_log["reasoning"]) // 4
        session["total_reasoning_tokens"] += reasoning_tokens

        # Cached tokens (prompt caching)
        cached_tokens = ai_log.get("cached_tokens", 0)
        session["total_cached_tokens"] += cached_tokens

        # Any tool calls in this round?
        has_tool_calls = bool(ai_log.get("tool_calls"))

        # Message count
        session["messages_count"] += 1

        # Parse token details, if present (each may be a JSON string or a dict)
        input_token_details = {}
        output_token_details = {}

        details = ai_log.get("input_token_details")
        if details is not None:
            try:
                input_token_details = json.loads(details) if isinstance(details, str) else details
            except (json.JSONDecodeError, TypeError):
                pass

        details = ai_log.get("output_token_details")
        if details is not None:
            try:
                output_token_details = json.loads(details) if isinstance(details, str) else details
            except (json.JSONDecodeError, TypeError):
                pass

        # Append the round record (full LLM request/response info)
        round_data = {
            "round": session["messages_count"],
            "timestamp": datetime.now().isoformat(),
            "input_tokens": ai_log.get("input_token", 0),
            "output_tokens": ai_log.get("output_token", 0),
            "reasoning_tokens": reasoning_tokens,
            "cached_tokens": cached_tokens,
            "model": model,
            "has_tool_calls": has_tool_calls,
            "response_type": ai_log.get("response_type", "normal"),
            # Full conversation payload
            "messages": ai_log.get("messages", []),
            "question": ai_log.get("question", ""),
            "answer": ai_log.get("answer", ""),
            "reasoning": ai_log.get("reasoning", ""),
            "tool_calls": ai_log.get("tool_calls", []),
            # Token details
            "input_token_details": input_token_details,
            "output_token_details": output_token_details,
        }
        session["rounds"].append(round_data)

        # Persist to disk
        self._save_session(session)

        return session

    def _save_session(self, session: dict):
        """Write session data to its JSON file."""
        session_file = self.output_dir / f"{session['session_id']}.json"
        with open(session_file, 'w', encoding='utf-8') as f:
            json.dump(session, f, ensure_ascii=False, indent=2)

    def get_all_sessions(self) -> List[dict]:
        """Return all sessions."""
        return list(self.sessions.values())

    def get_session(self, session_id: str) -> Optional[dict]:
        """Return one session by id, if present."""
        return self.sessions.get(session_id)

    def get_summary(self) -> dict:
        """Aggregate statistics across all sessions."""
        total_input = sum(s["total_input_tokens"] for s in self.sessions.values())
        total_output = sum(s["total_output_tokens"] for s in self.sessions.values())
        total_reasoning = sum(s.get("total_reasoning_tokens", 0) for s in self.sessions.values())
        total_cached = sum(s.get("total_cached_tokens", 0) for s in self.sessions.values())

        # Cost
        total_cost = 0
        for session in self.sessions.values():
            model = session.get("model", "unknown")
            input_tokens = session["total_input_tokens"]
            output_tokens = session["total_output_tokens"]
            reasoning_tokens = session.get("total_reasoning_tokens", 0)
            cached_tokens = session.get("total_cached_tokens", 0)

            pricing = TOKEN_PRICING.get(model, TOKEN_PRICING.get("GPT-4", {}))

            # Base cost. Note: cached_tokens are included in input_tokens,
            # so they are split out and billed at the cached rate below.
            regular_input_tokens = input_tokens - cached_tokens
            input_cost = regular_input_tokens * pricing.get("input", 0) / 1000000
            output_cost = output_tokens * pricing.get("output", 0) / 1000000

            # Reasoning cost
            reasoning_cost = 0
            if "reasoning" in pricing and reasoning_tokens > 0:
                reasoning_cost = reasoning_tokens * pricing["reasoning"] / 1000000

            # Cached cost (usually cheaper than regular input)
            cached_cost = 0
            if "cached" in pricing and cached_tokens > 0:
                cached_cost = cached_tokens * pricing["cached"] / 1000000

            total_cost += input_cost + output_cost + reasoning_cost + cached_cost

        return {
            "total_sessions": len(self.sessions),
            "total_input_tokens": total_input,
            "total_output_tokens": total_output,
            "total_reasoning_tokens": total_reasoning,
            "total_cached_tokens": total_cached,
            "total_tokens": total_input + total_output + total_reasoning + total_cached,
            "total_cost_usd": round(total_cost, 4),
            "active_session_ids": list(self.sessions.keys())
        }


# ============================================================================
# Log parser
# ============================================================================

class LogParser:
    """Parses the Higress access log, extracts ai_log, and handles log rotation."""

    def __init__(self, state_file: str = None):
        self.state_file = Path(state_file) if state_file else None
        self.file_offsets = {}  # {file inode: byte offset already read}
        self._load_state()

    def _load_state(self):
        """Load the read offsets from the previous run."""
        if self.state_file and self.state_file.exists():
            try:
                with open(self.state_file, 'r') as f:
                    self.file_offsets = json.load(f)
            except Exception as e:
                print(f"Warning: Failed to load state file: {e}", file=sys.stderr)

    def _save_state(self):
        """Persist the current read offsets."""
        if self.state_file:
            try:
                self.state_file.parent.mkdir(parents=True, exist_ok=True)
                with open(self.state_file, 'w') as f:
                    json.dump(self.file_offsets, f, indent=2)
            except Exception as e:
                print(f"Warning: Failed to save state file: {e}", file=sys.stderr)

    def parse_log_line(self, line: str) -> Optional[dict]:
        """Parse one log line and extract the ai_log JSON."""
        try:
            # The whole line is JSON
            log_obj = json.loads(line.strip())

            # The ai_log field is itself a JSON-encoded string
            if 'ai_log' in log_obj:
                return json.loads(log_obj['ai_log'])
        except (json.JSONDecodeError, ValueError, KeyError):
            # Silently skip non-JSON lines or lines without an ai_log field
            pass

        return None

    def parse_rotated_logs(self, log_pattern: str, session_manager) -> None:
        """Parse a log file together with its rotated siblings.

        Args:
            log_pattern: log file path, e.g. /var/log/proxy/access.log
            session_manager: the SessionManager to feed
        """
        base_path = Path(log_pattern)

        # Collect rotated log files, oldest first
        log_files = []

        # Scan rotations up to .100; more than that should be rare
        for i in range(100, 0, -1):
            rotated_path = Path(f"{log_pattern}.{i}")
            if rotated_path.exists():
                log_files.append(str(rotated_path))

        # The current log file comes last
        if base_path.exists():
            log_files.append(str(base_path))

        if not log_files:
            print(f"❌ No log files found for pattern: {log_pattern}")
            return

        print(f"📂 Found {len(log_files)} log file(s):")
        for f in log_files:
            print(f"  - {f}")
        print()

        # Parse each file in order (oldest to newest)
        for log_file in log_files:
            self._parse_file_incremental(log_file, session_manager)

        # Persist offsets
        self._save_state()

    def _parse_file_incremental(self, file_path: str, session_manager) -> None:
        """Incrementally parse a single log file."""
        try:
            file_stat = os.stat(file_path)
            file_size = file_stat.st_size
            file_inode = file_stat.st_ino

            # The inode is the state key, so a renamed file keeps its offset
            inode_key = str(file_inode)
            last_offset = self.file_offsets.get(inode_key, 0)

            # A shrunken file was truncated or recreated: restart from 0
            if file_size < last_offset:
                print(f"  📝 File truncated or recreated, reading from start: {file_path}")
                last_offset = 0

            # Same offset means no new content
            if file_size == last_offset:
                print(f"  ⏭️ No new content in: {file_path} (inode:{inode_key})")
                return

            print(f"  📖 Reading {file_path} from offset {last_offset} to {file_size} (inode:{inode_key})")

            with open(file_path, 'r', encoding='utf-8', errors='ignore') as f:
                f.seek(last_offset)
                lines_processed = 0

                for line in f:
                    ai_log = self.parse_log_line(line)
                    if ai_log:
                        session_id = ai_log.get("session_id", "default")
                        session_manager.update_session(session_id, ai_log)
                        lines_processed += 1

                        # Print progress every 1000 lines
                        if lines_processed % 1000 == 0:
                            print(f"    Processed {lines_processed} lines, {len(session_manager.sessions)} sessions")

                # Record the new offset under the inode key
                self.file_offsets[inode_key] = f.tell()

            print(f"  ✅ Processed {lines_processed} new lines from {file_path}")

        except FileNotFoundError:
            print(f"  ❌ File not found: {file_path}")
        except Exception as e:
            print(f"  ❌ Error parsing {file_path}: {e}")

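The rotation handling above hinges on keying offsets by inode rather than by path: after `mv access.log access.log.1`, the inode (and therefore the stored offset) travels with the renamed file, so only the freshly created `access.log` starts from offset 0. A self-contained sketch of that invariant, using plain `os.stat` with no dependency on the class above:

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as d:
    log = os.path.join(d, "access.log")
    with open(log, "w") as f:
        f.write("line1\nline2\n")

    inode_before = os.stat(log).st_ino
    offsets = {str(inode_before): os.path.getsize(log)}  # fully consumed

    # Simulate logrotate: rename the old file, create a fresh one
    os.rename(log, log + ".1")
    with open(log, "w") as f:
        f.write("line3\n")

    # The renamed file keeps its inode, so its stored offset is still valid...
    assert os.stat(log + ".1").st_ino == inode_before
    # ...while the new file has a different inode and no recorded offset yet
    new_inode = str(os.stat(log).st_ino)
    assert offsets.get(new_inode, 0) == 0
    print("rotation preserved offsets:", offsets)
```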
# ============================================================================
|
||||
# 实时显示器
|
||||
# ============================================================================
|
||||
|
||||
class RealtimeMonitor:
|
||||
"""实时监控显示和交互(定时轮询模式)"""
|
||||
|
||||
def __init__(self, session_manager: SessionManager, log_parser=None, log_path: str = None, refresh_interval: int = 1):
|
||||
self.session_manager = session_manager
|
||||
self.log_parser = log_parser
|
||||
self.log_path = log_path
|
||||
self.refresh_interval = refresh_interval
|
||||
self.running = True
|
||||
self.last_poll_time = 0
|
||||
|
||||
def start(self):
|
||||
"""启动实时监控(定时轮询日志文件)"""
|
||||
print(f"\n{'=' * 50}")
|
||||
print(f"🔍 Agent Session Monitor - Real-time View")
|
||||
print(f"{'=' * 50}")
|
||||
print()
|
||||
print("Press Ctrl+C to stop...")
|
||||
print()
|
||||
|
||||
try:
|
||||
while self.running:
|
||||
# 定时轮询日志文件(检查新增内容和轮转)
|
||||
current_time = time.time()
|
||||
if self.log_parser and self.log_path and (current_time - self.last_poll_time >= self.refresh_interval):
|
||||
self.log_parser.parse_rotated_logs(self.log_path, self.session_manager)
|
||||
self.last_poll_time = current_time
|
||||
|
||||
# 显示状态
|
||||
self._display_status()
|
||||
time.sleep(self.refresh_interval)
|
||||
except KeyboardInterrupt:
|
||||
print("\n\n👋 Stopping monitor...")
|
||||
self.running = False
|
||||
self._display_summary()
|
||||
|
||||
def _display_status(self):
|
||||
"""显示当前状态"""
|
||||
summary = self.session_manager.get_summary()
|
||||
|
||||
# 清屏
|
||||
os.system('clear' if os.name == 'posix' else 'cls')
|
||||
|
||||
print(f"{'=' * 50}")
|
||||
print(f"🔍 Session Monitor - Active")
|
||||
print(f"{'=' * 50}")
|
||||
print()
|
||||
print(f"📊 Active Sessions: {summary['total_sessions']}")
|
||||
print()
|
||||
|
||||
# 显示活跃session的token统计
|
||||
if summary['active_session_ids']:
|
||||
print("┌──────────────────────────┬─────────┬──────────┬───────────┐")
|
||||
print("│ Session ID │ Msgs │ Input │ Output │")
|
||||
print("├──────────────────────────┼─────────┼──────────┼───────────┤")
|
||||
|
||||
for session_id in summary['active_session_ids'][:10]: # 最多显示10个
|
||||
session = self.session_manager.get_session(session_id)
|
||||
if session:
|
||||
sid = session_id[:24] if len(session_id) > 24 else session_id
|
||||
print(f"│ {sid:<24} │ {session['messages_count']:>7} │ {session['total_input_tokens']:>8,} │ {session['total_output_tokens']:>9,} │")
|
||||
|
||||
print("└──────────────────────────┴─────────┴──────────┴───────────┘")
|
||||
|
||||
print()
|
||||
print(f"📈 Token Statistics")
|
||||
print(f" Total Input: {summary['total_input_tokens']:,} tokens")
|
||||
print(f" Total Output: {summary['total_output_tokens']:,} tokens")
|
||||
if summary['total_reasoning_tokens'] > 0:
|
||||
print(f" Total Reasoning: {summary['total_reasoning_tokens']:,} tokens")
|
||||
print(f" Total Cached: {summary['total_cached_tokens']:,} tokens")
|
||||
print(f" Total Cost: ${summary['total_cost_usd']:.4f}")
|
||||
|
||||
def _display_summary(self):
|
||||
"""显示最终汇总"""
|
||||
summary = self.session_manager.get_summary()
|
||||
|
||||
print()
|
||||
print(f"{'=' * 50}")
|
||||
print(f"📊 Session Monitor - Summary")
|
||||
print(f"{'=' * 50}")
|
||||
print()
|
||||
print(f"📈 Final Statistics")
|
||||
print(f" Total Sessions: {summary['total_sessions']}")
|
||||
print(f" Total Input: {summary['total_input_tokens']:,} tokens")
|
||||
print(f" Total Output: {summary['total_output_tokens']:,} tokens")
|
||||
if summary['total_reasoning_tokens'] > 0:
|
||||
print(f" Total Reasoning: {summary['total_reasoning_tokens']:,} tokens")
|
||||
print(f" Total Cached: {summary['total_cached_tokens']:,} tokens")
|
||||
print(f" Total Tokens: {summary['total_tokens']:,} tokens")
|
||||
print(f" Total Cost: ${summary['total_cost_usd']:.4f}")
|
||||
print(f"{'=' * 50}")
|
||||
print()
|
||||
|
||||
|
||||
# ============================================================================
|
||||
# 主程序
|
||||
# ============================================================================
|
||||
|
||||
def main():
|
||||
parser = argparse.ArgumentParser(
|
||||
description="Agent Session Monitor - 实时监控多轮Agent对话的token开销",
|
||||
formatter_class=argparse.RawDescriptionHelpFormatter,
|
||||
epilog="""
|
||||
示例:
|
||||
# 监控默认日志
|
||||
%(prog)s
|
||||
|
||||
# 监控指定日志文件
|
||||
%(prog)s --log-path /var/log/higress/access.log
|
||||
|
||||
# 设置预算为500K tokens
|
||||
%(prog)s --budget 500000
|
||||
|
||||
# 监控特定session
|
||||
%(prog)s --session-key agent:main:discord:channel:1465367993012981988
|
||||
""",
|
||||
allow_abbrev=False
|
||||
)
|
||||
|
||||
parser.add_argument(
|
||||
'--log-path',
|
||||
default=DEFAULT_LOG_PATH,
|
||||
help=f'Higress访问日志文件路径(默认: {DEFAULT_LOG_PATH})'
|
||||
)
|
||||
|
||||
parser.add_argument(
|
||||
'--output-dir',
|
||||
default=DEFAULT_OUTPUT_DIR,
|
||||
help=f'Session数据存储目录(默认: {DEFAULT_OUTPUT_DIR})'
|
||||
)
|
||||
|
||||
parser.add_argument(
|
||||
'--session-key',
|
||||
help='只监控包含指定session key的日志'
|
||||
)
|
||||
|
||||
parser.add_argument(
|
||||
'--refresh-interval',
|
||||
type=int,
|
||||
default=1,
|
||||
help=f'实时监控刷新间隔(秒,默认: 1)'
|
||||
)
|
||||
|
||||
parser.add_argument(
|
||||
'--state-file',
|
||||
help='状态文件路径,用于记录已读取的offset(默认: <output-dir>/.state.json)'
|
||||
)
|
||||
|
||||
args = parser.parse_args()
|
||||
|
||||
# 初始化组件
|
||||
session_manager = SessionManager(output_dir=args.output_dir)
|
||||
|
||||
# 状态文件路径
|
||||
state_file = args.state_file or str(Path(args.output_dir) / '.state.json')
|
||||
|
||||
log_parser = LogParser(state_file=state_file)
|
||||
|
||||
print(f"{'=' * 60}")
|
||||
print(f"🔍 Agent Session Monitor")
|
||||
print(f"{'=' * 60}")
|
||||
print()
|
||||
print(f"📂 Log path: {args.log_path}")
|
||||
print(f"📁 Output dir: {args.output_dir}")
|
||||
if args.session_key:
|
||||
print(f"🔑 Session key filter: {args.session_key}")
|
||||
print(f"{'=' * 60}")
|
||||
print()
|
||||
|
||||
# 模式选择:实时监控或单次解析
|
||||
if len(sys.argv) == 1:
|
||||
# 默认模式:实时监控(定时轮询)
|
||||
print("📺 Mode: Real-time monitoring (polling mode with log rotation support)")
|
||||
print(f" Refresh interval: {args.refresh_interval} second(s)")
|
||||
print()
|
||||
|
||||
# 首次解析现有日志文件(包括轮转的文件)
|
||||
log_parser.parse_rotated_logs(args.log_path, session_manager)
|
||||
|
||||
# 启动实时监控(定时轮询模式)
|
||||
monitor = RealtimeMonitor(
|
||||
session_manager,
|
||||
log_parser=log_parser,
|
||||
log_path=args.log_path,
|
||||
refresh_interval=args.refresh_interval
|
||||
)
|
||||
monitor.start()
|
||||
|
||||
else:
|
||||
# 单次解析模式
|
||||
print("📊 Mode: One-time log parsing (with log rotation support)")
|
||||
print()
|
||||
log_parser.parse_rotated_logs(args.log_path, session_manager)
|
||||
|
||||
# 显示汇总
|
||||
summary = session_manager.get_summary()
|
||||
print(f"\n{'=' * 50}")
|
||||
print(f"📊 Session Summary")
|
||||
print(f"{'=' * 50}")
|
||||
print()
|
||||
print(f"📈 Final Statistics")
|
||||
print(f" Total Sessions: {summary['total_sessions']}")
|
||||
print(f" Total Input: {summary['total_input_tokens']:,} tokens")
|
||||
print(f" Total Output: {summary['total_output_tokens']:,} tokens")
|
||||
if summary['total_reasoning_tokens'] > 0:
|
||||
print(f" Total Reasoning: {summary['total_reasoning_tokens']:,} tokens")
|
||||
print(f" Total Cached: {summary['total_cached_tokens']:,} tokens")
|
||||
print(f" Total Tokens: {summary['total_tokens']:,} tokens")
|
||||
print(f" Total Cost: ${summary['total_cost_usd']:.4f}")
|
||||
print(f"{'=' * 50}")
|
||||
print()
|
||||
print(f"💾 Session data saved to: {args.output_dir}/")
|
||||
print(f" Run with --output-dir to specify custom directory")
|
||||
|
||||
|
||||
if __name__ == '__main__':
|
||||
main()
|
||||
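The `--state-file` above records the last read offset so that restarts do not re-parse already-seen log lines. A minimal sketch of that offset-tracking pattern, assuming a plain text log; `read_new_lines` is a hypothetical helper, since `LogParser`'s actual implementation is not shown in this diff:

```python
import json
from pathlib import Path


def read_new_lines(log_path: str, state_file: str) -> list:
    """Return only lines appended since the last recorded offset."""
    sf = Path(state_file)
    state = json.loads(sf.read_text()) if sf.exists() else {}
    offset = state.get('offset', 0)

    with open(log_path, 'r', encoding='utf-8') as f:
        f.seek(0, 2)          # jump to end to measure the file
        size = f.tell()
        if size < offset:     # file shrank: it was rotated/truncated
            offset = 0
        f.seek(offset)
        lines = f.readlines()
        state['offset'] = f.tell()

    sf.write_text(json.dumps(state))
    return lines
```

On each poll the helper resumes from the saved offset and resets it when the file shrinks, which is the rotation case the monitor advertises.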
600
.claude/skills/agent-session-monitor/scripts/cli.py
Executable file
@@ -0,0 +1,600 @@
#!/usr/bin/env python3
"""
Agent Session Monitor CLI - query and analyze agent conversation data

Supports:
1. Real-time inspection of a session's full LLM requests and responses
2. Token cost statistics by model
3. Token cost statistics by date
4. FinOps report generation
"""

import argparse
import json
import sys
from collections import defaultdict
from datetime import datetime, timedelta
from pathlib import Path
from typing import Dict, List, Optional
import re

# Token pricing (unit: USD per 1K tokens)
TOKEN_PRICING = {
    "Qwen": {
        "input": 0.0002,   # $0.2/1M
        "output": 0.0006,
        "cached": 0.0001,  # cached tokens are typically 50% of the input price
    },
    "Qwen3-rerank": {
        "input": 0.0003,
        "output": 0.0012,
        "cached": 0.00015,
    },
    "Qwen-Max": {
        "input": 0.0005,
        "output": 0.002,
        "cached": 0.00025,
    },
    "GPT-4": {
        "input": 0.003,
        "output": 0.006,
        "cached": 0.0015,
    },
    "GPT-4o": {
        "input": 0.0025,
        "output": 0.01,
        "cached": 0.00125,  # GPT-4o prompt caching: 50% discount
    },
    "GPT-4-32k": {
        "input": 0.01,
        "output": 0.03,
        "cached": 0.005,
    },
    "o1": {
        "input": 0.015,
        "output": 0.06,
        "cached": 0.0075,
        "reasoning": 0.06,  # o1 reasoning tokens priced the same as output
    },
    "o1-mini": {
        "input": 0.003,
        "output": 0.012,
        "cached": 0.0015,
        "reasoning": 0.012,
    },
    "Claude": {
        "input": 0.015,
        "output": 0.075,
        "cached": 0.0015,  # Claude prompt caching: 90% discount
    },
    "DeepSeek-R1": {
        "input": 0.004,
        "output": 0.012,
        "reasoning": 0.002,
        "cached": 0.002,
    }
}
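A minimal sketch of the cost arithmetic this table implies, assuming the values are USD per 1K tokens (the Qwen entry's "$0.2/1M" note for an `input` value of 0.0002 is consistent with that reading); the pricing row and token counts below are made-up illustration values:

```python
# GPT-4o row from the table above, read as USD per 1K tokens.
pricing = {"input": 0.0025, "output": 0.01, "cached": 0.00125}

input_tokens, output_tokens, cached_tokens = 10_000, 2_000, 4_000
# Cached tokens are part of the input count but billed at the discounted rate.
regular_input = input_tokens - cached_tokens

cost = (
    regular_input * pricing["input"] / 1000
    + output_tokens * pricing["output"] / 1000
    + cached_tokens * pricing["cached"] / 1000
)
print(f"${cost:.4f}")
```

With these numbers the session costs $0.0400; the discount only applies to the cached portion, never twice.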
class SessionAnalyzer:
    """Session data analyzer"""

    def __init__(self, data_dir: str):
        self.data_dir = Path(data_dir)
        if not self.data_dir.exists():
            raise FileNotFoundError(f"Session data directory not found: {data_dir}")

    def load_session(self, session_id: str) -> Optional[dict]:
        """Load the full data of the given session"""
        session_file = self.data_dir / f"{session_id}.json"
        if not session_file.exists():
            return None

        with open(session_file, 'r', encoding='utf-8') as f:
            return json.load(f)

    def load_all_sessions(self) -> List[dict]:
        """Load all session data"""
        sessions = []
        for session_file in self.data_dir.glob("*.json"):
            try:
                with open(session_file, 'r', encoding='utf-8') as f:
                    session = json.load(f)
                    sessions.append(session)
            except Exception as e:
                print(f"Warning: Failed to load {session_file}: {e}", file=sys.stderr)
        return sessions

    def display_session_detail(self, session_id: str, show_messages: bool = True):
        """Display detailed information for a session"""
        session = self.load_session(session_id)
        if not session:
            print(f"❌ Session not found: {session_id}")
            return

        print(f"\n{'='*70}")
        print(f"📊 Session Detail: {session_id}")
        print(f"{'='*70}\n")

        # Basic information
        print(f"🕐 Created: {session['created_at']}")
        print(f"🕑 Updated: {session['updated_at']}")
        print(f"🤖 Model: {session['model']}")
        print(f"💬 Messages: {session['messages_count']}")
        print()

        # Token statistics
        print(f"📈 Token Statistics:")

        total_input = session['total_input_tokens']
        total_output = session['total_output_tokens']
        total_reasoning = session.get('total_reasoning_tokens', 0)
        total_cached = session.get('total_cached_tokens', 0)

        # Distinguish regular input from cached input
        regular_input = total_input - total_cached

        if total_cached > 0:
            print(f"   Input: {regular_input:>10,} tokens (regular)")
            print(f"   Cached: {total_cached:>10,} tokens (from cache)")
            print(f"   Total Input:{total_input:>10,} tokens")
        else:
            print(f"   Input: {total_input:>10,} tokens")

        print(f"   Output: {total_output:>10,} tokens")

        if total_reasoning > 0:
            print(f"   Reasoning: {total_reasoning:>10,} tokens")

        # Grand total (cached tokens are already part of the input count)
        total_tokens = total_input + total_output + total_reasoning
        print(f"   ────────────────────────")
        print(f"   Total: {total_tokens:>10,} tokens")
        print()

        # Cost calculation
        cost = self._calculate_cost(session)
        print(f"💰 Estimated Cost: ${cost:.8f} USD")
        print()

        # Conversation rounds
        if show_messages and 'rounds' in session:
            print(f"📝 Conversation Rounds ({len(session['rounds'])}):")
            print(f"{'─'*70}")

            for i, round_data in enumerate(session['rounds'], 1):
                timestamp = round_data.get('timestamp', 'N/A')
                input_tokens = round_data.get('input_tokens', 0)
                output_tokens = round_data.get('output_tokens', 0)
                has_tool_calls = round_data.get('has_tool_calls', False)
                response_type = round_data.get('response_type', 'normal')

                print(f"\n  Round {i} @ {timestamp}")
                print(f"    Tokens: {input_tokens:,} in → {output_tokens:,} out")

                if has_tool_calls:
                    print(f"    🔧 Tool calls: Yes")

                if response_type != 'normal':
                    print(f"    Type: {response_type}")

                # Show the full messages, if present
                if 'messages' in round_data:
                    messages = round_data['messages']
                    print(f"    Messages ({len(messages)}):")
                    for msg in messages[-3:]:  # show only the last 3
                        role = msg.get('role', 'unknown')
                        content = msg.get('content', '')
                        content_preview = content[:100] + '...' if len(content) > 100 else content
                        print(f"      [{role}] {content_preview}")

                # Show question/answer/reasoning, if present
                if 'question' in round_data:
                    q = round_data['question']
                    q_preview = q[:150] + '...' if len(q) > 150 else q
                    print(f"    ❓ Question: {q_preview}")

                if 'answer' in round_data:
                    a = round_data['answer']
                    a_preview = a[:150] + '...' if len(a) > 150 else a
                    print(f"    ✅ Answer: {a_preview}")

                if 'reasoning' in round_data and round_data['reasoning']:
                    r = round_data['reasoning']
                    r_preview = r[:150] + '...' if len(r) > 150 else r
                    print(f"    🧠 Reasoning: {r_preview}")

                if 'tool_calls' in round_data and round_data['tool_calls']:
                    print(f"    🛠️ Tool Calls:")
                    for tool_call in round_data['tool_calls']:
                        func_name = tool_call.get('function', {}).get('name', 'unknown')
                        args = tool_call.get('function', {}).get('arguments', '')
                        print(f"      - {func_name}({args[:80]}...)")

                # Show token details, if present
                if round_data.get('input_token_details'):
                    print(f"    📊 Input Token Details: {round_data['input_token_details']}")

                if round_data.get('output_token_details'):
                    print(f"    📊 Output Token Details: {round_data['output_token_details']}")

            print(f"\n{'─'*70}")

        print(f"\n{'='*70}\n")

    def _calculate_cost(self, session: dict) -> float:
        """Calculate the cost of a session (pricing values are USD per 1K tokens)"""
        model = session.get('model', 'unknown')
        pricing = TOKEN_PRICING.get(model, TOKEN_PRICING.get("GPT-4", {}))

        input_tokens = session['total_input_tokens']
        output_tokens = session['total_output_tokens']
        reasoning_tokens = session.get('total_reasoning_tokens', 0)
        cached_tokens = session.get('total_cached_tokens', 0)

        # Distinguish regular input from cached input
        regular_input_tokens = input_tokens - cached_tokens

        input_cost = regular_input_tokens * pricing.get('input', 0) / 1000
        output_cost = output_tokens * pricing.get('output', 0) / 1000

        reasoning_cost = 0
        if 'reasoning' in pricing and reasoning_tokens > 0:
            reasoning_cost = reasoning_tokens * pricing['reasoning'] / 1000

        cached_cost = 0
        if 'cached' in pricing and cached_tokens > 0:
            cached_cost = cached_tokens * pricing['cached'] / 1000

        return input_cost + output_cost + reasoning_cost + cached_cost
    def stats_by_model(self) -> Dict[str, dict]:
        """Token cost statistics grouped by model"""
        sessions = self.load_all_sessions()

        stats = defaultdict(lambda: {
            'session_count': 0,
            'total_input': 0,
            'total_output': 0,
            'total_reasoning': 0,
            'total_cost': 0.0
        })

        for session in sessions:
            model = session.get('model', 'unknown')
            stats[model]['session_count'] += 1
            stats[model]['total_input'] += session['total_input_tokens']
            stats[model]['total_output'] += session['total_output_tokens']
            stats[model]['total_reasoning'] += session.get('total_reasoning_tokens', 0)
            stats[model]['total_cost'] += self._calculate_cost(session)

        return dict(stats)

    def stats_by_date(self, days: int = 30) -> Dict[str, dict]:
        """Token cost statistics grouped by date (last N days)"""
        sessions = self.load_all_sessions()

        stats = defaultdict(lambda: {
            'session_count': 0,
            'total_input': 0,
            'total_output': 0,
            'total_reasoning': 0,
            'total_cost': 0.0,
            'models': set()
        })

        cutoff_date = datetime.now() - timedelta(days=days)

        for session in sessions:
            created_at = datetime.fromisoformat(session['created_at'])
            if created_at < cutoff_date:
                continue

            date_key = created_at.strftime('%Y-%m-%d')
            stats[date_key]['session_count'] += 1
            stats[date_key]['total_input'] += session['total_input_tokens']
            stats[date_key]['total_output'] += session['total_output_tokens']
            stats[date_key]['total_reasoning'] += session.get('total_reasoning_tokens', 0)
            stats[date_key]['total_cost'] += self._calculate_cost(session)
            stats[date_key]['models'].add(session.get('model', 'unknown'))

        # Convert sets to lists for JSON serialization
        for date_key in stats:
            stats[date_key]['models'] = list(stats[date_key]['models'])

        return dict(stats)

    def display_model_stats(self):
        """Display per-model statistics"""
        stats = self.stats_by_model()

        print(f"\n{'='*80}")
        print(f"📊 Statistics by Model")
        print(f"{'='*80}\n")

        print(f"{'Model':<20} {'Sessions':<10} {'Input':<15} {'Output':<15} {'Cost (USD)':<12}")
        print(f"{'─'*80}")

        # Sort by cost, descending
        sorted_models = sorted(stats.items(), key=lambda x: x[1]['total_cost'], reverse=True)

        for model, data in sorted_models:
            print(f"{model:<20} "
                  f"{data['session_count']:<10} "
                  f"{data['total_input']:>12,} "
                  f"{data['total_output']:>12,} "
                  f"${data['total_cost']:>10.6f}")

        # Totals
        total_sessions = sum(d['session_count'] for d in stats.values())
        total_input = sum(d['total_input'] for d in stats.values())
        total_output = sum(d['total_output'] for d in stats.values())
        total_cost = sum(d['total_cost'] for d in stats.values())

        print(f"{'─'*80}")
        print(f"{'TOTAL':<20} "
              f"{total_sessions:<10} "
              f"{total_input:>12,} "
              f"{total_output:>12,} "
              f"${total_cost:>10.6f}")

        print(f"\n{'='*80}\n")

    def display_date_stats(self, days: int = 30):
        """Display per-date statistics"""
        stats = self.stats_by_date(days)

        print(f"\n{'='*80}")
        print(f"📊 Statistics by Date (Last {days} days)")
        print(f"{'='*80}\n")

        print(f"{'Date':<12} {'Sessions':<10} {'Input':<15} {'Output':<15} {'Cost (USD)':<12} {'Models':<20}")
        print(f"{'─'*80}")

        # Sort by date, ascending
        sorted_dates = sorted(stats.items())

        for date, data in sorted_dates:
            models_str = ', '.join(data['models'][:3])  # show at most 3 models
            if len(data['models']) > 3:
                models_str += f" +{len(data['models'])-3}"

            print(f"{date:<12} "
                  f"{data['session_count']:<10} "
                  f"{data['total_input']:>12,} "
                  f"{data['total_output']:>12,} "
                  f"${data['total_cost']:>10.4f} "
                  f"{models_str}")

        # Totals
        total_sessions = sum(d['session_count'] for d in stats.values())
        total_input = sum(d['total_input'] for d in stats.values())
        total_output = sum(d['total_output'] for d in stats.values())
        total_cost = sum(d['total_cost'] for d in stats.values())

        print(f"{'─'*80}")
        print(f"{'TOTAL':<12} "
              f"{total_sessions:<10} "
              f"{total_input:>12,} "
              f"{total_output:>12,} "
              f"${total_cost:>10.4f}")

        print(f"\n{'='*80}\n")

    def list_sessions(self, limit: int = 20, sort_by: str = 'updated'):
        """List all sessions"""
        sessions = self.load_all_sessions()

        # Sorting
        if sort_by == 'updated':
            sessions.sort(key=lambda s: s.get('updated_at', ''), reverse=True)
        elif sort_by == 'cost':
            sessions.sort(key=lambda s: self._calculate_cost(s), reverse=True)
        elif sort_by == 'tokens':
            sessions.sort(key=lambda s: s['total_input_tokens'] + s['total_output_tokens'], reverse=True)

        print(f"\n{'='*100}")
        print(f"📋 Sessions (sorted by {sort_by}, showing {min(limit, len(sessions))} of {len(sessions)})")
        print(f"{'='*100}\n")

        print(f"{'Session ID':<30} {'Updated':<20} {'Model':<15} {'Msgs':<6} {'Tokens':<12} {'Cost':<10}")
        print(f"{'─'*100}")

        for session in sessions[:limit]:
            session_id = session['session_id'][:28] + '..' if len(session['session_id']) > 30 else session['session_id']
            updated = session.get('updated_at', 'N/A')[:19]
            model = session.get('model', 'unknown')[:13]
            msg_count = session.get('messages_count', 0)
            total_tokens = session['total_input_tokens'] + session['total_output_tokens']
            cost = self._calculate_cost(session)

            print(f"{session_id:<30} {updated:<20} {model:<15} {msg_count:<6} {total_tokens:>10,} ${cost:>8.4f}")

        print(f"\n{'='*100}\n")

    def export_finops_report(self, output_file: str, format: str = 'json'):
        """Export a FinOps report"""
        model_stats = self.stats_by_model()
        date_stats = self.stats_by_date(30)

        report = {
            'generated_at': datetime.now().isoformat(),
            'summary': {
                'total_sessions': sum(d['session_count'] for d in model_stats.values()),
                'total_input_tokens': sum(d['total_input'] for d in model_stats.values()),
                'total_output_tokens': sum(d['total_output'] for d in model_stats.values()),
                'total_cost_usd': sum(d['total_cost'] for d in model_stats.values()),
            },
            'by_model': model_stats,
            'by_date': date_stats,
        }

        output_path = Path(output_file)

        if format == 'json':
            with open(output_path, 'w', encoding='utf-8') as f:
                json.dump(report, f, ensure_ascii=False, indent=2)
            print(f"✅ FinOps report exported to: {output_path}")

        elif format == 'csv':
            import csv

            # Export per-model CSV
            model_csv = output_path.with_suffix('.model.csv')
            with open(model_csv, 'w', newline='', encoding='utf-8') as f:
                writer = csv.writer(f)
                writer.writerow(['Model', 'Sessions', 'Input Tokens', 'Output Tokens', 'Cost (USD)'])
                for model, data in model_stats.items():
                    writer.writerow([
                        model,
                        data['session_count'],
                        data['total_input'],
                        data['total_output'],
                        f"{data['total_cost']:.6f}"
                    ])

            # Export per-date CSV
            date_csv = output_path.with_suffix('.date.csv')
            with open(date_csv, 'w', newline='', encoding='utf-8') as f:
                writer = csv.writer(f)
                writer.writerow(['Date', 'Sessions', 'Input Tokens', 'Output Tokens', 'Cost (USD)', 'Models'])
                for date, data in sorted(date_stats.items()):
                    writer.writerow([
                        date,
                        data['session_count'],
                        data['total_input'],
                        data['total_output'],
                        f"{data['total_cost']:.6f}",
                        ', '.join(data['models'])
                    ])

            print(f"✅ FinOps report exported to:")
            print(f"   Model stats: {model_csv}")
            print(f"   Date stats: {date_csv}")
def main():
    parser = argparse.ArgumentParser(
        description="Agent Session Monitor CLI - query and analyze agent conversation data",
        formatter_class=argparse.RawDescriptionHelpFormatter,
        epilog="""
Commands:
  show <session-id>   Show detailed information for a session
  list                List all sessions
  stats-model         Token cost statistics by model
  stats-date          Token cost statistics by date (default: 30 days)
  export              Export a FinOps report

Examples:
  # Show the full conversation of a specific session
  %(prog)s show agent:main:discord:channel:1465367993012981988

  # List the 20 most recent sessions (by update time)
  %(prog)s list

  # List the 10 sessions with the highest token cost
  %(prog)s list --sort-by cost --limit 10

  # Token cost statistics by model
  %(prog)s stats-model

  # Token cost statistics by date (last 7 days)
  %(prog)s stats-date --days 7

  # Export a FinOps report (JSON format)
  %(prog)s export finops-report.json

  # Export a FinOps report (CSV format)
  %(prog)s export finops-report --format csv
"""
    )

    parser.add_argument(
        'command',
        choices=['show', 'list', 'stats-model', 'stats-date', 'export'],
        help='Command to run'
    )

    parser.add_argument(
        'args',
        nargs='*',
        help='Command arguments (e.g. a session-id or an output file name)'
    )

    parser.add_argument(
        '--data-dir',
        default='./sessions',
        help='Session data directory (default: ./sessions)'
    )

    parser.add_argument(
        '--limit',
        type=int,
        default=20,
        help='Result limit for the list command (default: 20)'
    )

    parser.add_argument(
        '--sort-by',
        choices=['updated', 'cost', 'tokens'],
        default='updated',
        help='Sort order for the list command (default: updated)'
    )

    parser.add_argument(
        '--days',
        type=int,
        default=30,
        help='Number of days for the stats-date command (default: 30)'
    )

    parser.add_argument(
        '--format',
        choices=['json', 'csv'],
        default='json',
        help='Output format for the export command (default: json)'
    )

    parser.add_argument(
        '--no-messages',
        action='store_true',
        help='show command: do not display conversation content'
    )

    args = parser.parse_args()

    try:
        analyzer = SessionAnalyzer(args.data_dir)

        if args.command == 'show':
            if not args.args:
                parser.error("The show command requires a session-id argument")
            session_id = args.args[0]
            analyzer.display_session_detail(session_id, show_messages=not args.no_messages)

        elif args.command == 'list':
            analyzer.list_sessions(limit=args.limit, sort_by=args.sort_by)

        elif args.command == 'stats-model':
            analyzer.display_model_stats()

        elif args.command == 'stats-date':
            analyzer.display_date_stats(days=args.days)

        elif args.command == 'export':
            if not args.args:
                parser.error("The export command requires an output file name argument")
            output_file = args.args[0]
            analyzer.export_finops_report(output_file, format=args.format)

    except FileNotFoundError as e:
        print(f"❌ Error: {e}", file=sys.stderr)
        sys.exit(1)
    except Exception as e:
        print(f"❌ Unexpected error: {e}", file=sys.stderr)
        import traceback
        traceback.print_exc()
        sys.exit(1)


if __name__ == '__main__':
    main()
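The CLI reads one JSON file per session from `--data-dir`. A minimal sketch of that on-disk layout and the per-model aggregation it enables (field names match the code above; the session values are made up, and the aggregation loop is a simplified mirror of `stats_by_model`, not the analyzer itself):

```python
import json
from collections import defaultdict
from pathlib import Path
from tempfile import mkdtemp

# Write one session file in the shape SessionAnalyzer reads.
data_dir = Path(mkdtemp())
session = {
    "session_id": "agent:main:demo",
    "model": "Qwen",
    "messages_count": 4,
    "created_at": "2025-01-01T12:00:00",
    "updated_at": "2025-01-01T12:05:00",
    "total_input_tokens": 1200,
    "total_output_tokens": 300,
}
(data_dir / f"{session['session_id']}.json").write_text(json.dumps(session))

# Per-model aggregation over all session files, as stats_by_model does.
stats = defaultdict(lambda: {"session_count": 0, "total_input": 0, "total_output": 0})
for path in data_dir.glob("*.json"):
    s = json.loads(path.read_text())
    m = s.get("model", "unknown")
    stats[m]["session_count"] += 1
    stats[m]["total_input"] += s["total_input_tokens"]
    stats[m]["total_output"] += s["total_output_tokens"]

print(dict(stats))
```

One file per session keeps writes atomic per conversation and lets both the CLI and the web server below consume the same directory without coordination.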
755
.claude/skills/agent-session-monitor/scripts/webserver.py
Executable file
@@ -0,0 +1,755 @@
#!/usr/bin/env python3
"""
Agent Session Monitor - Web Server
Serves a browser-based observability UI
"""

import argparse
import json
import sys
from pathlib import Path
from http.server import HTTPServer, BaseHTTPRequestHandler
from urllib.parse import urlparse, parse_qs
from collections import defaultdict
from datetime import datetime, timedelta
import re

# Add the parent directory to the path so the cli module can be imported
sys.path.insert(0, str(Path(__file__).parent.parent))

try:
    from scripts.cli import SessionAnalyzer, TOKEN_PRICING
except ImportError:
    # Fall back to a minimal pricing table if the import fails
    TOKEN_PRICING = {
        "Qwen3-rerank": {"input": 0.0003, "output": 0.0012},
        "DeepSeek-R1": {"input": 0.004, "output": 0.012, "reasoning": 0.002},
    }


class SessionMonitorHandler(BaseHTTPRequestHandler):
    """HTTP request handler"""

    def __init__(self, *args, data_dir=None, **kwargs):
        self.data_dir = Path(data_dir) if data_dir else Path("./sessions")
        super().__init__(*args, **kwargs)

    def do_GET(self):
        """Handle GET requests"""
        parsed_path = urlparse(self.path)
        path = parsed_path.path
        query = parse_qs(parsed_path.query)

        if path == '/' or path == '/index.html':
            self.serve_index()
        elif path == '/session':
            session_id = query.get('id', [None])[0]
            if session_id:
                self.serve_session_detail(session_id)
            else:
                self.send_error(400, "Missing session id")
        elif path == '/api/sessions':
            self.serve_api_sessions()
        elif path == '/api/session':
            session_id = query.get('id', [None])[0]
            if session_id:
                self.serve_api_session(session_id)
            else:
                self.send_error(400, "Missing session id")
        elif path == '/api/stats':
            self.serve_api_stats()
        else:
            self.send_error(404, "Not Found")

    def serve_index(self):
        """Home page - overview"""
        html = self.generate_index_html()
        self.send_html(html)

    def serve_session_detail(self, session_id: str):
        """Session detail page"""
        html = self.generate_session_html(session_id)
        self.send_html(html)

    def serve_api_sessions(self):
        """API: list all sessions"""
        sessions = self.load_all_sessions()

        # Simplify the data
        data = []
        for session in sessions:
            data.append({
                'session_id': session['session_id'],
                'model': session.get('model', 'unknown'),
                'messages_count': session.get('messages_count', 0),
                'total_tokens': session['total_input_tokens'] + session['total_output_tokens'],
                'updated_at': session.get('updated_at', ''),
                'cost': self.calculate_cost(session)
            })

        # Sort by update time, descending
        data.sort(key=lambda x: x['updated_at'], reverse=True)

        self.send_json(data)

    def serve_api_session(self, session_id: str):
        """API: get detailed data for a given session"""
        session = self.load_session(session_id)
        if session:
            session['cost'] = self.calculate_cost(session)
            self.send_json(session)
        else:
            self.send_error(404, "Session not found")

    def serve_api_stats(self):
        """API: get aggregated statistics"""
        sessions = self.load_all_sessions()

        # Grouped by model
        by_model = defaultdict(lambda: {
            'count': 0,
            'input_tokens': 0,
            'output_tokens': 0,
            'cost': 0.0
        })

        # Grouped by date
        by_date = defaultdict(lambda: {
            'count': 0,
            'input_tokens': 0,
            'output_tokens': 0,
            'cost': 0.0,
            'models': set()
        })

        total_cost = 0.0

        for session in sessions:
            model = session.get('model', 'unknown')
            cost = self.calculate_cost(session)
            total_cost += cost

            # By model
            by_model[model]['count'] += 1
            by_model[model]['input_tokens'] += session['total_input_tokens']
            by_model[model]['output_tokens'] += session['total_output_tokens']
            by_model[model]['cost'] += cost

            # By date
            created_at = session.get('created_at', '')
            date_key = created_at[:10] if len(created_at) >= 10 else 'unknown'
            by_date[date_key]['count'] += 1
            by_date[date_key]['input_tokens'] += session['total_input_tokens']
            by_date[date_key]['output_tokens'] += session['total_output_tokens']
            by_date[date_key]['cost'] += cost
            by_date[date_key]['models'].add(model)

        # Convert sets to lists
        for date in by_date:
            by_date[date]['models'] = list(by_date[date]['models'])

        stats = {
            'total_sessions': len(sessions),
            'total_cost': total_cost,
            'by_model': dict(by_model),
            'by_date': dict(sorted(by_date.items(), reverse=True))
        }

        self.send_json(stats)

    def load_session(self, session_id: str):
        """Load a single session"""
        session_file = self.data_dir / f"{session_id}.json"
        if session_file.exists():
            with open(session_file, 'r', encoding='utf-8') as f:
                return json.load(f)
        return None

    def load_all_sessions(self):
        """Load all sessions"""
        sessions = []
        for session_file in self.data_dir.glob("*.json"):
            try:
                with open(session_file, 'r', encoding='utf-8') as f:
                    sessions.append(json.load(f))
            except Exception as e:
                print(f"Warning: Failed to load {session_file}: {e}", file=sys.stderr)
        return sessions

    def calculate_cost(self, session: dict) -> float:
        """Calculate session cost (pricing values are USD per 1K tokens)"""
        model = session.get('model', 'unknown')
        pricing = TOKEN_PRICING.get(model, TOKEN_PRICING.get("GPT-4", {"input": 0.003, "output": 0.006}))

        input_tokens = session['total_input_tokens']
        output_tokens = session['total_output_tokens']
        reasoning_tokens = session.get('total_reasoning_tokens', 0)
        cached_tokens = session.get('total_cached_tokens', 0)

        # Distinguish regular input from cached input
        regular_input_tokens = input_tokens - cached_tokens

        input_cost = regular_input_tokens * pricing.get('input', 0) / 1000
        output_cost = output_tokens * pricing.get('output', 0) / 1000

        reasoning_cost = 0
        if 'reasoning' in pricing and reasoning_tokens > 0:
            reasoning_cost = reasoning_tokens * pricing['reasoning'] / 1000

        cached_cost = 0
        if 'cached' in pricing and cached_tokens > 0:
            cached_cost = cached_tokens * pricing['cached'] / 1000

        return input_cost + output_cost + reasoning_cost + cached_cost
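`SessionMonitorHandler.__init__` accepts `data_dir` as a keyword, but `HTTPServer` instantiates the handler class itself on every request, so the value has to be pre-bound. The server wiring is outside this diff; a minimal sketch of the usual pattern with `functools.partial`, using a stand-in handler so the snippet is self-contained:

```python
from functools import partial
from http.server import BaseHTTPRequestHandler, HTTPServer
from pathlib import Path


class Handler(BaseHTTPRequestHandler):
    # Stand-in for SessionMonitorHandler: same data_dir-injection pattern.
    def __init__(self, *args, data_dir=None, **kwargs):
        self.data_dir = Path(data_dir) if data_dir else Path("./sessions")
        super().__init__(*args, **kwargs)


def make_server(data_dir: str, port: int = 0) -> HTTPServer:
    # partial() pre-binds data_dir; HTTPServer calls the result per request
    # exactly as it would call a bare handler class.
    return HTTPServer(('127.0.0.1', port), partial(Handler, data_dir=data_dir))


# make_server('./sessions', 8080).serve_forever()
```

Port 0 asks the OS for any free port, which is convenient for tests; a real deployment would pass a fixed port.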
    def send_html(self, html: str):
        """Send an HTML response"""
        self.send_response(200)
        self.send_header('Content-type', 'text/html; charset=utf-8')
        self.end_headers()
        self.wfile.write(html.encode('utf-8'))

    def send_json(self, data):
        """Send a JSON response"""
        self.send_response(200)
        self.send_header('Content-type', 'application/json; charset=utf-8')
        self.send_header('Access-Control-Allow-Origin', '*')
        self.end_headers()
        self.wfile.write(json.dumps(data, ensure_ascii=False, indent=2).encode('utf-8'))

    def generate_index_html(self) -> str:
        """Generate the home page HTML"""
        return '''<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="UTF-8">
    <meta name="viewport" content="width=device-width, initial-scale=1.0">
    <title>Agent Session Monitor</title>
    <style>
        * { margin: 0; padding: 0; box-sizing: border-box; }
        body {
            font-family: -apple-system, BlinkMacSystemFont, "Segoe UI", Roboto, sans-serif;
            background: #f5f5f5;
            padding: 20px;
        }
        .container { max-width: 1400px; margin: 0 auto; }
        header {
            background: white;
            padding: 30px;
            border-radius: 8px;
            box-shadow: 0 2px 8px rgba(0,0,0,0.1);
            margin-bottom: 20px;
        }
        h1 { color: #333; margin-bottom: 10px; }
        .subtitle { color: #666; font-size: 14px; }

        .stats-grid {
            display: grid;
            grid-template-columns: repeat(auto-fit, minmax(250px, 1fr));
            gap: 20px;
            margin-bottom: 20px;
        }
        .stat-card {
            background: white;
            padding: 20px;
            border-radius: 8px;
            box-shadow: 0 2px 8px rgba(0,0,0,0.1);
        }
        .stat-label { color: #666; font-size: 14px; margin-bottom: 8px; }
        .stat-value { color: #333; font-size: 32px; font-weight: bold; }
        .stat-unit { color: #999; font-size: 16px; margin-left: 4px; }

        .section {
            background: white;
            padding: 30px;
            border-radius: 8px;
            box-shadow: 0 2px 8px rgba(0,0,0,0.1);
            margin-bottom: 20px;
        }
        h2 { color: #333; margin-bottom: 20px; font-size: 20px; }

        table { width: 100%; border-collapse: collapse; }
        thead { background: #f8f9fa; }
        th, td { padding: 12px; text-align: left; border-bottom: 1px solid #e9ecef; }
        th { font-weight: 600; color: #666; font-size: 14px; }
        td { color: #333; }
        tbody tr:hover { background: #f8f9fa; }

        .session-link {
            color: #007bff;
            text-decoration: none;
            font-family: monospace;
            font-size: 13px;
        }
        .session-link:hover { text-decoration: underline; }

        .badge {
            display: inline-block;
            padding: 4px 8px;
            border-radius: 4px;
            font-size: 12px;
            font-weight: 500;
        }
        .badge-qwen { background: #e3f2fd; color: #1976d2; }
        .badge-deepseek { background: #f3e5f5; color: #7b1fa2; }
        .badge-gpt { background: #e8f5e9; color: #388e3c; }
        .badge-claude { background: #fff3e0; color: #f57c00; }

        .loading { text-align: center; padding: 40px; color: #666; }
        .error { color: #d32f2f; padding: 20px; }

        .refresh-btn {
            background: #007bff;
            color: white;
            border: none;
            padding: 10px 20px;
            border-radius: 4px;
            cursor: pointer;
            font-size: 14px;
        }
        .refresh-btn:hover { background: #0056b3; }
    </style>
</head>
<body>
    <div class="container">
        <header>
            <h1>🔍 Agent Session Monitor</h1>
            <p class="subtitle">Live view of Clawdbot conversations and token spend</p>
        </header>

        <div class="stats-grid" id="stats-grid">
            <div class="stat-card">
                <div class="stat-label">Total Sessions</div>
                <div class="stat-value">-</div>
            </div>
            <div class="stat-card">
                <div class="stat-label">Total Tokens</div>
                <div class="stat-value">-</div>
            </div>
            <div class="stat-card">
                <div class="stat-label">Total Cost</div>
                <div class="stat-value">-</div>
            </div>
        </div>

        <div class="section">
            <h2>📊 Recent Sessions</h2>
            <button class="refresh-btn" onclick="loadSessions()">🔄 Refresh</button>
            <div id="sessions-table">
                <div class="loading">Loading...</div>
            </div>
        </div>

        <div class="section">
            <h2>📈 Statistics by Model</h2>
            <div id="model-stats">
                <div class="loading">Loading...</div>
            </div>
        </div>
    </div>

    <script>
        function loadSessions() {
            fetch('/api/sessions')
                .then(r => r.json())
                .then(sessions => {
                    const html = `
                        <table>
                            <thead>
                                <tr>
                                    <th>Session ID</th>
                                    <th>Model</th>
                                    <th>Messages</th>
                                    <th>Total Tokens</th>
                                    <th>Cost</th>
                                    <th>Updated</th>
                                </tr>
                            </thead>
                            <tbody>
                                ${sessions.slice(0, 50).map(s => `
                                    <tr>
                                        <td><a href="/session?id=${encodeURIComponent(s.session_id)}" class="session-link">${s.session_id}</a></td>
                                        <td>${getModelBadge(s.model)}</td>
                                        <td>${s.messages_count}</td>
                                        <td>${s.total_tokens.toLocaleString()}</td>
                                        <td>$${s.cost.toFixed(6)}</td>
                                        <td>${new Date(s.updated_at).toLocaleString()}</td>
                                    </tr>
                                `).join('')}
                            </tbody>
                        </table>
                    `;
                    document.getElementById('sessions-table').innerHTML = html;
                })
                .catch(err => {
                    document.getElementById('sessions-table').innerHTML = `<div class="error">Failed to load: ${err.message}</div>`;
                });
        }

        function loadStats() {
            fetch('/api/stats')
                .then(r => r.json())
                .then(stats => {
                    // Update the top stat cards
                    const cards = document.querySelectorAll('.stat-card');
                    cards[0].querySelector('.stat-value').textContent = stats.total_sessions;
|
||||
const totalTokens = Object.values(stats.by_model).reduce((sum, m) => sum + m.input_tokens + m.output_tokens, 0);
|
||||
cards[1].querySelector('.stat-value').innerHTML = totalTokens.toLocaleString() + '<span class="stat-unit">tokens</span>';
|
||||
|
||||
cards[2].querySelector('.stat-value').innerHTML = '$' + stats.total_cost.toFixed(4);
|
||||
|
||||
// 模型统计表格
|
||||
const modelHtml = `
|
||||
<table>
|
||||
<thead>
|
||||
<tr>
|
||||
<th>模型</th>
|
||||
<th>会话数</th>
|
||||
<th>输入Token</th>
|
||||
<th>输出Token</th>
|
||||
<th>成本</th>
|
||||
</tr>
|
||||
</thead>
|
||||
<tbody>
|
||||
${Object.entries(stats.by_model).map(([model, data]) => `
|
||||
<tr>
|
||||
<td>${getModelBadge(model)}</td>
|
||||
<td>${data.count}</td>
|
||||
<td>${data.input_tokens.toLocaleString()}</td>
|
||||
<td>${data.output_tokens.toLocaleString()}</td>
|
||||
<td>$${data.cost.toFixed(6)}</td>
|
||||
</tr>
|
||||
`).join('')}
|
||||
</tbody>
|
||||
</table>
|
||||
`;
|
||||
document.getElementById('model-stats').innerHTML = modelHtml;
|
||||
})
|
||||
.catch(err => {
|
||||
console.error('Failed to load stats:', err);
|
||||
});
|
||||
}
|
||||
|
||||
function getModelBadge(model) {
|
||||
let cls = 'badge';
|
||||
if (model.includes('Qwen')) cls += ' badge-qwen';
|
||||
else if (model.includes('DeepSeek')) cls += ' badge-deepseek';
|
||||
else if (model.includes('GPT')) cls += ' badge-gpt';
|
||||
else if (model.includes('Claude')) cls += ' badge-claude';
|
||||
return `<span class="${cls}">${model}</span>`;
|
||||
}
|
||||
|
||||
// 初始加载
|
||||
loadSessions();
|
||||
loadStats();
|
||||
|
||||
// 每30秒自动刷新
|
||||
setInterval(() => {
|
||||
loadSessions();
|
||||
loadStats();
|
||||
}, 30000);
|
||||
</script>
|
||||
</body>
|
||||
</html>'''
    def generate_session_html(self, session_id: str) -> str:
        """Generate the session detail page HTML."""
        session = self.load_session(session_id)
        if not session:
            return f'<html><body><h1>Session not found: {session_id}</h1></body></html>'

        cost = self.calculate_cost(session)

        # Build HTML for each conversation round
        rounds_html = []
        for r in session.get('rounds', []):
            messages_html = ''
            if r.get('messages'):
                messages_html = '<div class="messages">'
                for msg in r['messages'][-5:]:  # show at most 5 messages
                    role = msg.get('role', 'unknown')
                    content = msg.get('content', '')
                    messages_html += f'<div class="message message-{role}"><strong>[{role}]</strong> {self.escape_html(content)}</div>'
                messages_html += '</div>'

            tool_calls_html = ''
            if r.get('tool_calls'):
                tool_calls_html = '<div class="tool-calls"><strong>🛠️ Tool Calls:</strong><ul>'
                for tc in r['tool_calls']:
                    func_name = tc.get('function', {}).get('name', 'unknown')
                    tool_calls_html += f'<li>{func_name}()</li>'
                tool_calls_html += '</ul></div>'

            # Token detail breakdown
            token_details_html = ''
            if r.get('input_token_details') or r.get('output_token_details'):
                token_details_html = '<div class="token-details"><strong>📊 Token Details:</strong><ul>'
                if r.get('input_token_details'):
                    token_details_html += f'<li>Input: {r["input_token_details"]}</li>'
                if r.get('output_token_details'):
                    token_details_html += f'<li>Output: {r["output_token_details"]}</li>'
                token_details_html += '</ul></div>'

            # Token type badges
            token_badges = ''
            if r.get('cached_tokens', 0) > 0:
                token_badges += f' <span class="token-badge token-badge-cached">📦 {r["cached_tokens"]:,} cached</span>'
            if r.get('reasoning_tokens', 0) > 0:
                token_badges += f' <span class="token-badge token-badge-reasoning">🧠 {r["reasoning_tokens"]:,} reasoning</span>'

            rounds_html.append(f'''
            <div class="round">
                <div class="round-header">
                    <span class="round-number">Round {r['round']}</span>
                    <span class="round-time">{r['timestamp']}</span>
                    <span class="round-tokens">{r['input_tokens']:,} in → {r['output_tokens']:,} out{token_badges}</span>
                </div>
                {messages_html}
                {f'<div class="question"><strong>❓ Question:</strong> {self.escape_html(r.get("question", ""))}</div>' if r.get('question') else ''}
                {f'<div class="answer"><strong>✅ Answer:</strong> {self.escape_html(r.get("answer", ""))}</div>' if r.get('answer') else ''}
                {f'<div class="reasoning"><strong>🧠 Reasoning:</strong> {self.escape_html(r.get("reasoning", ""))}</div>' if r.get('reasoning') else ''}
                {tool_calls_html}
                {token_details_html}
            </div>
            ''')

        return f'''<!DOCTYPE html>
<html lang="zh-CN">
<head>
    <meta charset="UTF-8">
    <meta name="viewport" content="width=device-width, initial-scale=1.0">
    <title>{session_id} - Session Monitor</title>
    <style>
        * {{ margin: 0; padding: 0; box-sizing: border-box; }}
        body {{
            font-family: -apple-system, BlinkMacSystemFont, "Segoe UI", Roboto, sans-serif;
            background: #f5f5f5;
            padding: 20px;
        }}
        .container {{ max-width: 1200px; margin: 0 auto; }}

        header {{
            background: white;
            padding: 30px;
            border-radius: 8px;
            box-shadow: 0 2px 8px rgba(0,0,0,0.1);
            margin-bottom: 20px;
        }}
        h1 {{ color: #333; margin-bottom: 10px; font-size: 24px; }}
        .back-link {{ color: #007bff; text-decoration: none; margin-bottom: 10px; display: inline-block; }}
        .back-link:hover {{ text-decoration: underline; }}

        .info-grid {{
            display: grid;
            grid-template-columns: repeat(auto-fit, minmax(200px, 1fr));
            gap: 15px;
            margin-top: 20px;
        }}
        .info-item {{ padding: 10px 0; }}
        .info-label {{ color: #666; font-size: 14px; }}
        .info-value {{ color: #333; font-size: 18px; font-weight: 600; margin-top: 4px; }}

        .section {{
            background: white;
            padding: 30px;
            border-radius: 8px;
            box-shadow: 0 2px 8px rgba(0,0,0,0.1);
            margin-bottom: 20px;
        }}
        h2 {{ color: #333; margin-bottom: 20px; font-size: 20px; }}

        .round {{
            border-left: 3px solid #007bff;
            padding: 20px;
            margin-bottom: 20px;
            background: #f8f9fa;
            border-radius: 4px;
        }}
        .round-header {{
            display: flex;
            justify-content: space-between;
            margin-bottom: 15px;
            font-size: 14px;
        }}
        .round-number {{ font-weight: 600; color: #007bff; }}
        .round-time {{ color: #666; }}
        .round-tokens {{ color: #333; }}

        .messages {{ margin: 15px 0; }}
        .message {{
            padding: 10px;
            margin: 5px 0;
            border-radius: 4px;
            font-size: 14px;
            line-height: 1.6;
        }}
        .message-system {{ background: #fff3cd; }}
        .message-user {{ background: #d1ecf1; }}
        .message-assistant {{ background: #d4edda; }}
        .message-tool {{ background: #e2e3e5; }}

        .question, .answer, .reasoning, .tool-calls {{
            margin: 10px 0;
            padding: 10px;
            background: white;
            border-radius: 4px;
            font-size: 14px;
            line-height: 1.6;
        }}
        .question {{ border-left: 3px solid #ffc107; }}
        .answer {{ border-left: 3px solid #28a745; }}
        .reasoning {{ border-left: 3px solid #17a2b8; }}
        .tool-calls {{ border-left: 3px solid #6c757d; }}
        .tool-calls ul {{ margin-left: 20px; margin-top: 5px; }}

        .token-details {{
            margin: 10px 0;
            padding: 10px;
            background: white;
            border-radius: 4px;
            font-size: 13px;
            border-left: 3px solid #17a2b8;
        }}
        .token-details ul {{ margin-left: 20px; margin-top: 5px; color: #666; }}

        .token-badge {{
            display: inline-block;
            padding: 2px 6px;
            border-radius: 3px;
            font-size: 11px;
            margin-left: 5px;
        }}
        .token-badge-cached {{
            background: #d4edda;
            color: #155724;
        }}
        .token-badge-reasoning {{
            background: #cce5ff;
            color: #004085;
        }}

        .badge {{
            display: inline-block;
            padding: 4px 8px;
            border-radius: 4px;
            font-size: 12px;
            font-weight: 500;
            background: #e3f2fd;
            color: #1976d2;
        }}
    </style>
</head>
<body>
    <div class="container">
        <header>
            <a href="/" class="back-link">← 返回列表</a>
            <h1>📊 Session Detail</h1>
            <p style="color: #666; font-family: monospace; font-size: 14px; margin-top: 10px;">{session_id}</p>

            <div class="info-grid">
                <div class="info-item">
                    <div class="info-label">模型</div>
                    <div class="info-value"><span class="badge">{session.get('model', 'unknown')}</span></div>
                </div>
                <div class="info-item">
                    <div class="info-label">消息数</div>
                    <div class="info-value">{session.get('messages_count', 0)}</div>
                </div>
                <div class="info-item">
                    <div class="info-label">总Token</div>
                    <div class="info-value">{session['total_input_tokens'] + session['total_output_tokens']:,}</div>
                </div>
                <div class="info-item">
                    <div class="info-label">成本</div>
                    <div class="info-value">${cost:.6f}</div>
                </div>
            </div>
        </header>

        <div class="section">
            <h2>💬 对话记录 ({len(session.get('rounds', []))} 轮)</h2>
            {"".join(rounds_html) if rounds_html else '<p style="color: #666;">暂无对话记录</p>'}
        </div>
    </div>
</body>
</html>'''
    def escape_html(self, text: str) -> str:
        """Escape HTML special characters."""
        return (text.replace('&', '&amp;')
                .replace('<', '&lt;')
                .replace('>', '&gt;')
                .replace('"', '&quot;')
                .replace("'", '&#39;'))
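For reference, Python's standard-library `html.escape` performs the same job (it escapes `&` first, and with `quote=True` renders `'` as `&#x27;` rather than `&#39;`, so output differs slightly from the hand-rolled version above):

```python
# Standard-library alternative to a hand-rolled HTML escaper:
# html.escape replaces & first, then <, >, and (with quote=True) both quote styles.
from html import escape

print(escape('<b>"AT&T"</b>', quote=True))  # &lt;b&gt;&quot;AT&amp;T&quot;&lt;/b&gt;
```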
    def log_message(self, format, *args):
        """Override the default logger to keep output quiet."""
        pass  # do not print every request


def create_handler(data_dir):
    """Create a request handler class bound to a data directory."""
    def handler(*args, **kwargs):
        return SessionMonitorHandler(*args, data_dir=data_dir, **kwargs)
    return handler
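The factory above exists because `HTTPServer` instantiates the handler class itself, so any extra constructor argument has to be pre-bound. `functools.partial` is an equivalent, shorter way to pre-bind it (a toy `Handler` class is used here for illustration; it is not the real `SessionMonitorHandler`):

```python
# Pre-binding a constructor argument with functools.partial instead of a
# hand-written closure. Handler is a stand-in for illustration only.
from functools import partial

class Handler:
    def __init__(self, request, data_dir=None):
        self.request = request
        self.data_dir = data_dir

handler_class = partial(Handler, data_dir='./sessions')
h = handler_class('req-1')
print(h.data_dir)  # ./sessions
```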
def main():
    parser = argparse.ArgumentParser(
        description="Agent Session Monitor - Web Server",
        formatter_class=argparse.RawDescriptionHelpFormatter
    )

    parser.add_argument(
        '--data-dir',
        default='./sessions',
        help='Session数据目录(默认: ./sessions)'
    )

    parser.add_argument(
        '--port',
        type=int,
        default=8888,
        help='HTTP服务器端口(默认: 8888)'
    )

    parser.add_argument(
        '--host',
        default='0.0.0.0',
        help='HTTP服务器地址(默认: 0.0.0.0)'
    )

    args = parser.parse_args()

    # Make sure the data directory exists
    data_dir = Path(args.data_dir)
    if not data_dir.exists():
        print(f"❌ Error: Data directory not found: {data_dir}")
        print("   Please run main.py first to generate session data.")
        sys.exit(1)

    # Create the HTTP server
    handler_class = create_handler(args.data_dir)
    server = HTTPServer((args.host, args.port), handler_class)

    print(f"{'=' * 60}")
    print("🌐 Agent Session Monitor - Web Server")
    print(f"{'=' * 60}")
    print()
    print(f"📂 Data directory: {args.data_dir}")
    print(f"🌍 Server address: http://{args.host}:{args.port}")
    print()
    print("✅ Server started. Press Ctrl+C to stop.")
    print(f"{'=' * 60}")
    print()

    try:
        server.serve_forever()
    except KeyboardInterrupt:
        print("\n\n👋 Shutting down server...")
        server.shutdown()


if __name__ == '__main__':
    main()
.claude/skills/higress-auto-router/SKILL.md (new file, +139 lines)
@@ -0,0 +1,139 @@

---
name: higress-auto-router
description: "Configure automatic model routing using the get-ai-gateway.sh CLI tool for Higress AI Gateway. Use when: (1) User wants to configure automatic model routing, (2) User mentions 'route to', 'switch model', 'use model when', 'auto routing', (3) User describes scenarios that should trigger specific models, (4) User wants to add, list, or remove routing rules."
---
# Higress Auto Router

Configure automatic model routing using the get-ai-gateway.sh CLI tool for intelligent model selection based on message content triggers.

## Prerequisites

- Higress AI Gateway running (container name: `higress-ai-gateway`)
- get-ai-gateway.sh script downloaded

## CLI Commands

### Add a Routing Rule

```bash
./get-ai-gateway.sh route add --model <model-name> --trigger "<trigger-phrases>"
```

**Options:**
- `--model MODEL` (required): Target model to route to
- `--trigger PHRASE`: Trigger phrase(s), separated by `|` (e.g., `"深入思考|deep thinking"`)
- `--pattern REGEX`: Custom regex pattern (alternative to `--trigger`)

**Examples:**

```bash
# Route complex reasoning to Claude
./get-ai-gateway.sh route add \
  --model claude-opus-4.5 \
  --trigger "深入思考|deep thinking"

# Route coding tasks to Qwen Coder
./get-ai-gateway.sh route add \
  --model qwen-coder \
  --trigger "写代码|code:|coding:"

# Route creative writing
./get-ai-gateway.sh route add \
  --model gpt-4o \
  --trigger "创意写作|creative:"

# Use a custom regex pattern
./get-ai-gateway.sh route add \
  --model deepseek-chat \
  --pattern "(?i)^(数学题|math:)"
```

### List Routing Rules

```bash
./get-ai-gateway.sh route list
```

Output:
```
Default model: qwen-turbo

ID   Pattern                          Model
----------------------------------------------------------------------
0    (?i)^(深入思考|deep thinking)     claude-opus-4.5
1    (?i)^(写代码|code:|coding:)       qwen-coder
```

### Remove a Routing Rule

```bash
./get-ai-gateway.sh route remove --rule-id <id>
```

**Example:**
```bash
# Remove rule with ID 0
./get-ai-gateway.sh route remove --rule-id 0
```

## Common Trigger Mappings

| Scenario | Suggested Triggers | Recommended Model |
|----------|-------------------|-------------------|
| Complex reasoning | `深入思考\|deep thinking` | claude-opus-4.5, o1 |
| Coding tasks | `写代码\|code:\|coding:` | qwen-coder, deepseek-coder |
| Creative writing | `创意写作\|creative:` | gpt-4o, claude-sonnet |
| Translation | `翻译:\|translate:` | gpt-4o, qwen-max |
| Math problems | `数学题\|math:` | deepseek-r1, o1-mini |
| Quick answers | `快速回答\|quick:` | qwen-turbo, gpt-4o-mini |

## Usage Flow

1. **User Request:** "我希望在解决困难问题时路由到claude-opus-4.5" (route to claude-opus-4.5 when solving hard problems)

2. **Execute CLI:**
   ```bash
   ./get-ai-gateway.sh route add \
     --model claude-opus-4.5 \
     --trigger "深入思考|deep thinking"
   ```

3. **Response to User:**
   ```
   ✅ 自动路由配置完成!

   触发方式:以 "深入思考" 或 "deep thinking" 开头
   目标模型:claude-opus-4.5

   使用示例:
   - 深入思考 这道算法题应该怎么解?
   - deep thinking What's the best architecture?

   提示:确保请求中 model 参数为 'higress/auto'
   ```

## How Auto-Routing Works

1. User sends a request with `model: "higress/auto"`
2. Higress checks the message content against the routing rules
3. If a trigger pattern matches, the request is routed to the specified model
4. If no rule matches, the default model (e.g., `qwen-turbo`) is used
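The four steps above can be sketched as a small first-match matcher. This is an illustrative model only, not the model-router plugin's actual code; the rule order and `(?i)^(…)` pattern shape mirror the `route list` output:

```python
# Illustrative sketch of trigger-based routing: rules are tried in ID order
# against the start of the message; the first match wins, else the default.
import re

rules = [
    (re.compile(r'(?i)^(深入思考|deep thinking)'), 'claude-opus-4.5'),
    (re.compile(r'(?i)^(写代码|code:|coding:)'), 'qwen-coder'),
]
DEFAULT_MODEL = 'qwen-turbo'

def pick_model(message: str) -> str:
    for pattern, model in rules:
        if pattern.match(message):
            return model
    return DEFAULT_MODEL

print(pick_model('deep thinking What is the best architecture?'))  # claude-opus-4.5
print(pick_model('今天天气如何?'))                                   # qwen-turbo
```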
## Configuration File

Rules are stored in the container at:
```
/data/wasmplugins/model-router.internal.yaml
```

The CLI tool automatically:
- Edits the configuration file
- Triggers hot-reload (no container restart needed)
- Validates YAML syntax

## Error Handling

- **Container not running:** Start with `./get-ai-gateway.sh start`
- **Rule ID not found:** Use `route list` to see valid IDs
- **Invalid model:** Check configured providers in Higress Console
.claude/skills/higress-daily-report/README.md (new file, +198 lines)
@@ -0,0 +1,198 @@

# Higress Community Governance Daily Report - Clawdbot Skill

This skill lets an AI assistant, running through Clawdbot, automatically track GitHub activity in the Higress project and generate a structured daily community-governance report.
## Architecture Overview

```
┌─────────────────┐     ┌─────────────────┐     ┌─────────────────┐
│    Clawdbot     │────▶│   AI + Skill    │────▶│   GitHub API    │
│   (Gateway)     │     │                 │     │    (gh CLI)     │
└─────────────────┘     └─────────────────┘     └─────────────────┘
        │                       │
        │                       ▼
        │               ┌─────────────────┐
        │               │   Data files    │
        │               │ - tracking.json │
        │               │ - knowledge.md  │
        │               └─────────────────┘
        │                       │
        ▼                       ▼
┌─────────────────┐     ┌─────────────────┐
│  Discord/Slack  │◀────│  Daily report   │
│    Channel      │     │     output      │
└─────────────────┘     └─────────────────┘
```

## What is Clawdbot?

[Clawdbot](https://github.com/clawdbot/clawdbot) is an AI agent gateway that connects AI models such as Claude, GPT, and GLM to messaging platforms (Discord, Slack, Telegram, etc.) and tools (GitHub CLI, browser, filesystem, etc.).

Through Clawdbot, an AI assistant can:
- Receive messages from platforms such as Discord
- Execute shell commands (e.g., the `gh` CLI)
- Read and write files
- Run scheduled tasks (cron)
- Send generated content back to the messaging platform

## Workflow

### 1. Scheduled Trigger

Clawdbot's cron feature triggers report generation at a fixed time every day:

```
# Clawdbot configuration example
cron:
  - schedule: "0 9 * * *"  # every day at 9:00
    task: "生成 Higress 昨日日报并发送到 #issue-pr-notify 频道"
```

### 2. Skill Loading

When the AI assistant receives an instruction to generate the report, it automatically loads this skill (SKILL.md) to obtain:
- Data-fetching methods (gh CLI commands)
- Data structure definitions
- The report format template
- Knowledge-base maintenance rules

### 3. Data Fetching

The AI assistant uses the GitHub CLI to fetch data:

```bash
# Fetch issues created yesterday
gh search issues --repo alibaba/higress --created yesterday --json number,title,author,url,body,state,labels

# Fetch PRs created yesterday
gh search prs --repo alibaba/higress --created yesterday --json number,title,author,url,body,state

# Fetch comments on a specific issue
gh api repos/alibaba/higress/issues/{number}/comments
```

### 4. State Tracking

The AI assistant maintains a JSON file that tracks the state of each issue:

```json
{
  "issues": [
    {
      "number": 3398,
      "title": "浏览器发起的options请求报401",
      "lastCommentCount": 13,
      "status": "waiting_for_user",
      "waitingFor": "用户验证解决方案"
    }
  ]
}
```
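The `lastCommentCount` field above drives change detection: an issue needs attention when its live comment count exceeds the recorded one. A minimal sketch of that comparison (the rule itself is an assumption about the skill's behavior, and the numbers are made up):

```python
# Change detection implied by lastCommentCount: flag issues whose live
# comment count exceeds the count stored in the tracking file.
tracked = {3398: 13, 3396: 4}           # issue number -> lastCommentCount
live = {3398: 15, 3396: 4, 3401: 1}     # freshly fetched comment counts

needs_attention = [
    number for number, count in live.items()
    if count > tracked.get(number, 0)   # unseen issues count as 0
]
print(sorted(needs_attention))  # [3398, 3401]
```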
### 5. Knowledge Accumulation

When an issue is resolved, the AI assistant records the problem pattern and solution in the knowledge base:

```markdown
## KB-001: OPTIONS 预检请求被认证拦截

**问题**: 浏览器 OPTIONS 请求返回 401
**根因**: key-auth 在 AUTHN 阶段执行,先于 CORS
**解决方案**: 为 OPTIONS 请求创建单独路由,不启用认证插件
**关联 Issue**: #3398
```

### 6. Report Generation

Finally, a structured daily report is generated, containing:
- 📋 Overview statistics
- 📌 New issues
- 🔀 New PRs
- 🔔 Issue activity (new comments, resolved issues)
- ⏰ Follow-up reminders
- 📚 Knowledge accumulation

### 7. Message Delivery

The AI assistant sends the report to the configured Discord channel via Clawdbot.

## Quick Start

### Prerequisites

1. Install and configure [Clawdbot](https://github.com/clawdbot/clawdbot)
2. Configure and log in to the GitHub CLI (`gh`)
3. Configure a messaging platform (e.g., Discord)

### Configuring the Skill

Copy this skill directory into Clawdbot's skills directory:

```bash
cp -r .claude/skills/higress-daily-report ~/.clawdbot/skills/
```

### Usage

**Manual trigger:**
```
生成 Higress 昨日日报
```

**Scheduled trigger (recommended):**
Add a cron task to the Clawdbot configuration to generate and push the report automatically every day.

## Files

```
higress-daily-report/
├── README.md                  # this file
├── SKILL.md                   # skill definition (read by the AI assistant)
└── scripts/
    └── generate-report.sh     # helper script (optional)
```

## Customization

### Changing the report format

Edit the report format section in `SKILL.md`.

### Adding new tracking dimensions

Add new fields to the data structures in `SKILL.md`.

### Adjusting knowledge-base rules

Edit the knowledge accumulation section in `SKILL.md`.

## Sample Report

```markdown
📊 Higress 项目每日报告 - 2026-01-29

📋 概览
• 新增 Issues: 2 个
• 新增 PRs: 3 个
• 待跟进: 1 个

📌 新增 Issues
• #3399: 网关启动失败问题
  - 作者: user123
  - 标签: bug

🔔 Issue 动态
✅ 已解决
• #3398: OPTIONS 请求 401 问题
  - 知识库: KB-001

⏰ 跟进提醒
🟡 等待反馈
• #3396: 等待用户提供配置信息(2天)
```

## Links

- [Clawdbot documentation](https://docs.clawd.bot)
- [Higress project](https://github.com/alibaba/higress)
- [GitHub CLI manual](https://cli.github.com/manual/)
.claude/skills/higress-daily-report/SKILL.md (new file, +257 lines)
@@ -0,0 +1,257 @@

---
name: higress-daily-report
description: Generate the Higress project daily report, track issue/PR activity, accumulate problem-handling experience, and drive community issues to closure. Use for generating daily reports, following up on issues, and recording solutions.
---
# Higress Daily Report

An intelligent workflow that drives issue handling in the Higress community.

## Core Goals

1. **Daily awareness** - track new issues/PRs and comment activity
2. **Progress tracking** - make sure every issue is followed up until it is closed
3. **Knowledge accumulation** - collect problem analyses and solutions to improve handling ability
4. **Closed-loop driving** - use the daily report to push issues toward resolution so none are forgotten

## Data Files

| File | Purpose |
|------|---------|
| `/root/clawd/memory/higress-issue-tracking.json` | Issue tracking state (comment counts, follow-up status) |
| `/root/clawd/memory/higress-knowledge-base.md` | Knowledge base: problem patterns, solutions, lessons learned |
| `/root/clawd/reports/report_YYYY-MM-DD.md` | Daily report archive |

## Workflow

### 1. Fetch Daily Data

```bash
# Fetch yesterday's issues
gh search issues --repo alibaba/higress --created yesterday --json number,title,author,url,body,state,labels --limit 50

# Fetch yesterday's PRs
gh search prs --repo alibaba/higress --created yesterday --json number,title,author,url,body,state,additions,deletions,reviewDecision --limit 50
```

### 2. Issue Tracking State Management

**Tracking data structure** (`higress-issue-tracking.json`):

```json
{
  "date": "2026-01-28",
  "issues": [
    {
      "number": 3398,
      "title": "Issue 标题",
      "state": "open",
      "author": "username",
      "url": "https://github.com/...",
      "created_at": "2026-01-27",
      "comment_count": 11,
      "last_comment_by": "johnlanni",
      "last_comment_at": "2026-01-28",
      "follow_up_status": "waiting_user",
      "follow_up_note": "等待用户提供请求日志",
      "priority": "high",
      "category": "cors",
      "solution_ref": "KB-001"
    }
  ]
}
```

**Follow-up status values**:
- `new` - new issue, to be analyzed
- `analyzing` - analysis in progress
- `waiting_user` - waiting for user feedback
- `waiting_review` - waiting for PR review
- `in_progress` - fix in progress
- `resolved` - resolved (to be closed)
- `closed` - closed
- `wontfix` - will not be fixed
- `stale` - no activity for more than 7 days
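The `stale` rule above (more than 7 days without activity) can be sketched as a small helper. This is illustrative only; the field names follow the tracking JSON, but the helper itself is not part of the skill:

```python
# Mark an issue stale when its last comment is more than 7 days old,
# otherwise keep its current follow-up status.
from datetime import date

STALE_AFTER_DAYS = 7

def follow_up_status(last_comment_at: str, today: str, current: str) -> str:
    idle = (date.fromisoformat(today) - date.fromisoformat(last_comment_at)).days
    return 'stale' if idle > STALE_AFTER_DAYS else current

print(follow_up_status('2026-01-22', '2026-01-28', 'waiting_user'))  # waiting_user
print(follow_up_status('2026-01-10', '2026-01-28', 'waiting_user'))  # stale
```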
### 3. Knowledge Base Structure

**The knowledge base** (`higress-knowledge-base.md`) accumulates experience:

```markdown
# Higress 问题知识库

## 问题模式索引

### 认证与跨域类
- KB-001: OPTIONS 预检请求被认证拦截
- KB-002: CORS 配置不生效

### 路由配置类
- KB-010: 路由状态 address 为空
- KB-011: 服务发现失败

### 部署运维类
- KB-020: Helm 安装问题
- KB-021: 升级兼容性问题

---

## KB-001: OPTIONS 预检请求被认证拦截

**问题特征**:
- 浏览器 OPTIONS 请求返回 401
- 已配置 CORS 和认证插件

**根因分析**:
Higress 插件执行阶段优先级:AUTHN (310) > AUTHZ (340) > STATS
- key-auth 在 AUTHN 阶段执行
- CORS 在 AUTHZ 阶段执行
- OPTIONS 请求先被 key-auth 拦截,CORS 无机会处理

**解决方案**:
1. **推荐**:修改 CORS 插件 stage 从 AUTHZ 改为 AUTHN
2. **Workaround**:创建 OPTIONS 专用路由,不启用认证
3. **Workaround**:使用实例级 CORS 配置

**关联 Issue**:#3398

**学到的经验**:
- 排查跨域问题时,首先确认插件执行顺序
- Higress 阶段优先级由 phase 决定,不是 priority 数值
```

### 4. Report Generation Rules

**Report structure**:

```markdown
# 📊 Higress 项目每日报告 - YYYY-MM-DD

## 📋 概览
- 统计时间: YYYY-MM-DD
- 新增 Issues: X 个
- 新增 PRs: X 个
- 待跟进 Issues: X 个
- 本周关闭: X 个

## 📌 新增 Issues
(按优先级排序,包含分类标签)

## 🔀 新增 PRs
(包含代码变更量和 review 状态)

## 🔔 Issue 动态
(有新评论的 issues,标注最新进展)

## ⏰ 跟进提醒

### 🔴 需要立即处理
(等待我方回复超过 24h 的 issues)

### 🟡 等待用户反馈
(等待用户回复的 issues,标注等待天数)

### 🟢 进行中
(正在处理的 issues)

### ⚪ 已过期
(超过 7 天无活动的 issues,需决定是否关闭)

## 📚 本周知识沉淀
(新增的知识库条目摘要)
```

### 5. Analysis Capabilities

When generating the report, perform an initial analysis of each new issue:

1. **Problem classification** - categorize from title and content
2. **Knowledge-base matching** - search for solutions to similar past problems
3. **Priority assessment** - based on impact scope and urgency
4. **Suggested reply** - draft an initial reply based on the knowledge base

### 6. Issue Follow-up Triggers

When the user mentions the following phrases in Discord, record the follow-up:

**Follow-up done**:
- "已跟进 #xxx"
- "已回复 #xxx"
- "issue #xxx 已处理"

**Record a solution**:
- "issue #xxx 的问题是..."
- "#xxx 根因是..."
- "#xxx 解决方案..."

On trigger, update the tracking state and the knowledge base.

## Execution Checklist

For each daily report:

- [ ] Fetch yesterday's new issues and PRs
- [ ] Load the tracking data and check for comment-count changes
- [ ] Compare `last_comment_by` to decide whether we are waiting on the user or the user is waiting on us
- [ ] Mark issues with no activity for more than 7 days as stale
- [ ] Search the knowledge base for similar problems for each new issue
- [ ] Generate the report and save it to `/root/clawd/reports/`
- [ ] Update the tracking data
- [ ] Send to Discord channel:1465549185632702591
- [ ] Format: use lists instead of tables (Discord does not render Markdown tables)

## Knowledge Base Maintenance

### When to add an entry

1. After an issue is successfully resolved
2. When a new problem pattern is discovered
3. After a lesson learned from a mistake

### Entry template

```markdown
## KB-XXX: 问题简述

**问题特征**:
- 症状1
- 症状2

**根因分析**:
(技术原因说明)

**解决方案**:
1. 推荐方案
2. 备选方案

**关联 Issue**:#xxx

**学到的经验**:
- 经验1
- 经验2
```

## Command Reference

```bash
# View issue details and comments
gh issue view <number> --repo alibaba/higress --json number,title,state,comments,author,createdAt,labels,url

# View issue comments
gh issue view <number> --repo alibaba/higress --comments

# Post an issue comment
gh issue comment <number> --repo alibaba/higress --body "评论内容"

# Close an issue
gh issue close <number> --repo alibaba/higress --reason completed

# Add a label
gh issue edit <number> --repo alibaba/higress --add-label "bug"
```

## Discord Output

- Channel: `channel:1465549185632702591`
- Format: plain text + emoji + links (wrap URLs in `<url>` to suppress previews)
- Length: keep each message under 2000 characters; split into multiple messages if longer
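The 2000-character limit above implies chunking long reports before sending. A minimal sketch that splits at line boundaries so each chunk stays under the limit (illustrative; the skill leaves the actual splitting to the agent):

```python
# Split a long report into chunks at line boundaries so no chunk
# exceeds the Discord message limit. keepends=True preserves newlines,
# so joining the chunks reconstructs the original text exactly.
def split_message(text: str, limit: int = 2000) -> list[str]:
    chunks, current = [], ''
    for line in text.splitlines(keepends=True):
        if current and len(current) + len(line) > limit:
            chunks.append(current)
            current = ''
        current += line
    if current:
        chunks.append(current)
    return chunks

report = '\n'.join(f'line {i}' for i in range(300))
chunks = split_message(report, limit=100)
print(all(len(c) <= 100 for c in chunks))  # True
```

Note: a single line longer than the limit would still produce an oversized chunk; a production version would also hard-split such lines.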
.claude/skills/higress-daily-report/scripts/generate-report.sh (new executable file, +273 lines)
@@ -0,0 +1,273 @@
#!/bin/bash
# Higress Daily Report Generator
# Generates the daily report for the alibaba/higress repository

# set -e  # temporarily disabled for debugging

REPO="alibaba/higress"
CHANNEL="1465549185632702591"
DATE=$(date +"%Y-%m-%d")
REPORT_DIR="/root/clawd/reports"
TRACKING_DIR="/root/clawd/memory"
RECORD_FILE="${TRACKING_DIR}/higress-issue-process-record.md"

mkdir -p "$REPORT_DIR" "$TRACKING_DIR"

echo "=== Higress Daily Report - $DATE ==="

# Get yesterday's date (GNU date first, BSD date as fallback)
YESTERDAY=$(date -d "yesterday" +"%Y-%m-%d" 2>/dev/null || date -v-1d +"%Y-%m-%d")

echo "Fetching issues created on $YESTERDAY..."

# Fetch issues created yesterday
ISSUES=$(gh search issues --repo "${REPO}" --state open --created "${YESTERDAY}..${YESTERDAY}" --json number,title,labels,author,url,body,state --limit 50 2>/dev/null)

if [ -z "$ISSUES" ]; then
    ISSUES_COUNT=0
else
    ISSUES_COUNT=$(echo "$ISSUES" | jq 'length' 2>/dev/null || echo "0")
fi

# Fetch PRs created yesterday
PRS=$(gh search prs --repo "${REPO}" --state open --created "${YESTERDAY}..${YESTERDAY}" --json number,title,labels,author,url,reviewDecision,additions,deletions,body,state --limit 50 2>/dev/null)

if [ -z "$PRS" ]; then
    PRS_COUNT=0
else
    PRS_COUNT=$(echo "$PRS" | jq 'length' 2>/dev/null || echo "0")
fi

echo "Found: $ISSUES_COUNT issues, $PRS_COUNT PRs"

# Build report
REPORT="📊 **Higress 项目每日报告 - ${DATE}**

**📋 概览**
- 统计时间: ${YESTERDAY} 全天
- 新增 Issues: **${ISSUES_COUNT}** 个
- 新增 PRs: **${PRS_COUNT}** 个

---

"

# Process issues
if [ "$ISSUES_COUNT" -gt 0 ]; then
    REPORT="${REPORT}**📌 Issues 详情**

"

    # Use a temporary file to avoid subshell variable scoping issues
    ISSUE_DETAILS=$(mktemp)

    echo "$ISSUES" | jq -r '.[] | @json' | while IFS= read -r ISSUE; do
        NUM=$(echo "$ISSUE" | jq -r '.number')
        TITLE=$(echo "$ISSUE" | jq -r '.title')
        URL=$(echo "$ISSUE" | jq -r '.url')
        AUTHOR=$(echo "$ISSUE" | jq -r '.author.login')
        BODY=$(echo "$ISSUE" | jq -r '.body // ""')
        LABELS=$(echo "$ISSUE" | jq -r '.labels[]?.name // ""' | head -1)

        # Determine emoji
        EMOJI="📝"
        echo "$LABELS" | grep -q "priority/high" && EMOJI="🔴"
        echo "$LABELS" | grep -q "type/bug" && EMOJI="🐛"
        echo "$LABELS" | grep -q "type/enhancement" && EMOJI="✨"

        # Extract content
        CONTENT=$(echo "$BODY" | head -n 8 | sed 's/```.*```//g' | sed 's/`//g' | tr '\n' ' ' | head -c 300)

        if [ -z "$CONTENT" ]; then
            CONTENT="无详细描述"
        fi

        if [ ${#CONTENT} -eq 300 ]; then
            CONTENT="${CONTENT}..."
        fi

        # Append to temporary file
        echo "${EMOJI} **[#${NUM}](${URL})**: ${TITLE}
👤 @${AUTHOR}
📝 ${CONTENT}
" >> "$ISSUE_DETAILS"
    done

    # Read from temp file and append to REPORT
    REPORT="${REPORT}$(cat "$ISSUE_DETAILS")"

    rm -f "$ISSUE_DETAILS"
fi

REPORT="${REPORT}
---

"

# Process PRs
if [ "$PRS_COUNT" -gt 0 ]; then
    REPORT="${REPORT}**🔀 PRs 详情**

"

    # Use a temporary file to avoid subshell variable scoping issues
    PR_DETAILS=$(mktemp)

    echo "$PRS" | jq -r '.[] | @json' | while IFS= read -r PR; do
        NUM=$(echo "$PR" | jq -r '.number')
        TITLE=$(echo "$PR" | jq -r '.title')
        URL=$(echo "$PR" | jq -r '.url')
        AUTHOR=$(echo "$PR" | jq -r '.author.login')
        ADDITIONS=$(echo "$PR" | jq -r '.additions')
        DELETIONS=$(echo "$PR" | jq -r '.deletions')
        REVIEW=$(echo "$PR" | jq -r '.reviewDecision // "pending"')
        BODY=$(echo "$PR" | jq -r '.body // ""')

        # Determine status
        STATUS="👀"
        [ "$REVIEW" = "APPROVED" ] && STATUS="✅"
        [ "$REVIEW" = "CHANGES_REQUESTED" ] && STATUS="🔄"

        # Calculate size: check buckets from largest to smallest so the
        # last matching assignment is the tightest one (checking smallest
        # first would let the later, looser tests overwrite it)
        TOTAL=$((ADDITIONS + DELETIONS))
        SIZE="XL"
        [ $TOTAL -lt 5000 ] && SIZE="L"
        [ $TOTAL -lt 1000 ] && SIZE="M"
        [ $TOTAL -lt 500 ] && SIZE="S"
        [ $TOTAL -lt 100 ] && SIZE="XS"

        # Extract content
        CONTENT=$(echo "$BODY" | head -n 8 | sed 's/```.*```//g' | sed 's/`//g' | tr '\n' ' ' | head -c 300)

        if [ -z "$CONTENT" ]; then
|
||||
CONTENT="无详细描述"
|
||||
fi
|
||||
|
||||
if [ ${#CONTENT} -eq 300 ]; then
|
||||
CONTENT="${CONTENT}..."
|
||||
fi
|
||||
|
||||
# Append to temporary file
|
||||
echo "${STATUS} **[#${NUM}](${URL})**: ${TITLE} ${SIZE}
|
||||
👤 @${AUTHOR} | ${STATUS} | 变更: +${ADDITIONS}/-${DELETIONS}
|
||||
📝 ${CONTENT}
|
||||
" >> "$PR_DETAILS"
|
||||
done
|
||||
|
||||
# Read from temp file and append to REPORT
|
||||
REPORT="${REPORT}$(cat $PR_DETAILS)"
|
||||
|
||||
rm -f "$PR_DETAILS"
|
||||
fi
|
||||
|
||||
# Check for new comments on tracked issues
|
||||
TRACKING_FILE="${TRACKING_DIR}/higress-issue-tracking.json"
|
||||
|
||||
echo ""
|
||||
echo "Checking for new comments on tracked issues..."
|
||||
|
||||
# Load previous tracking data
|
||||
if [ -f "$TRACKING_FILE" ]; then
|
||||
PREV_TRACKING=$(cat "$TRACKING_FILE")
|
||||
PREV_ISSUES=$(echo "$PREV_TRACKING" | jq -r '.issues[]?.number // empty' 2>/dev/null)
|
||||
|
||||
if [ -n "$PREV_ISSUES" ]; then
|
||||
REPORT="${REPORT}**🔔 Issue跟进(新评论)**"
|
||||
|
||||
HAS_NEW_COMMENTS=false
|
||||
|
||||
for issue_num in $PREV_ISSUES; do
|
||||
# Get current comment count
|
||||
CURRENT_INFO=$(gh issue view "$issue_num" --repo "$REPO" --json number,title,state,comments,url 2>/dev/null)
|
||||
if [ -n "$CURRENT_INFO" ]; then
|
||||
CURRENT_COUNT=$(echo "$CURRENT_INFO" | jq '.comments | length')
|
||||
CURRENT_TITLE=$(echo "$CURRENT_INFO" | jq -r '.title')
|
||||
CURRENT_STATE=$(echo "$CURRENT_INFO" | jq -r '.state')
|
||||
ISSUE_URL=$(echo "$CURRENT_INFO" | jq -r '.url')
|
||||
PREV_COUNT=$(echo "$PREV_TRACKING" | jq -r ".issues[] | select(.number == $issue_num) | .comment_count // 0")
|
||||
|
||||
if [ -z "$PREV_COUNT" ]; then
|
||||
PREV_COUNT=0
|
||||
fi
|
||||
|
||||
NEW_COMMENTS=$((CURRENT_COUNT - PREV_COUNT))
|
||||
|
||||
if [ "$NEW_COMMENTS" -gt 0 ]; then
|
||||
HAS_NEW_COMMENTS=true
|
||||
REPORT="${REPORT}
|
||||
|
||||
• [#${issue_num}](${ISSUE_URL}) ${CURRENT_TITLE}
|
||||
📬 +${NEW_COMMENTS}条新评论(总计: ${CURRENT_COUNT}) | 状态: ${CURRENT_STATE}"
|
||||
fi
|
||||
fi
|
||||
done
|
||||
|
||||
if [ "$HAS_NEW_COMMENTS" = false ]; then
|
||||
REPORT="${REPORT}
|
||||
|
||||
• 暂无新评论"
|
||||
fi
|
||||
|
||||
REPORT="${REPORT}
|
||||
|
||||
---
|
||||
"
|
||||
fi
|
||||
fi
|
||||
|
||||
# Save current tracking data for tomorrow
|
||||
echo "Saving issue tracking data for follow-up..."
|
||||
|
||||
if [ -z "$ISSUES" ]; then
|
||||
TRACKING_DATA='{"date":"'"$DATE"'","issues":[]}'
|
||||
else
|
||||
TRACKING_DATA=$(echo "$ISSUES" | jq '{
|
||||
date: "'"$DATE"'",
|
||||
issues: [.[] | {
|
||||
number: .number,
|
||||
title: .title,
|
||||
state: .state,
|
||||
comment_count: 0,
|
||||
url: .url
|
||||
}]
|
||||
}')
|
||||
fi
|
||||
|
||||
echo "$TRACKING_DATA" > "$TRACKING_FILE"
|
||||
echo "Tracking data saved to $TRACKING_FILE"
|
||||
|
||||
# Save report to file
|
||||
REPORT_FILE="${REPORT_DIR}/report_${DATE}.md"
|
||||
echo "$REPORT" > "$REPORT_FILE"
|
||||
echo "Report saved to $REPORT_FILE"
|
||||
|
||||
# Follow-up reminder
|
||||
FOLLOWUP_ISSUES=$(echo "$PREV_TRACKING" | jq -r '[.issues[] | select(.comment_count > 0 or .state == "open")] | "#\(.number) [\(.title)]"' 2>/dev/null || echo "")
|
||||
|
||||
if [ -n "$FOLLOWUP_ISSUES" ]; then
|
||||
REPORT="${REPORT}
|
||||
|
||||
**📌 需要跟进的Issues**
|
||||
|
||||
以下Issues需要跟进处理:
|
||||
${FOLLOWUP_ISSUES}
|
||||
|
||||
---
|
||||
|
||||
"
|
||||
fi
|
||||
|
||||
# Footer
|
||||
REPORT="${REPORT}
|
||||
---
|
||||
📅 生成时间: $(date +"%Y-%m-%d %H:%M:%S %Z")
|
||||
🔗 项目: https://github.com/${REPO}
|
||||
🤖 本报告由 AI 辅助生成,所有链接均可点击跳转
|
||||
"
|
||||
|
||||
# Send report
|
||||
echo "Sending report to Discord..."
|
||||
echo "$REPORT" | /root/.nvm/versions/node/v24.13.0/bin/clawdbot message send --channel discord -t "$CHANNEL" -m "$(cat -)"
|
||||
|
||||
echo "Done!"
|
||||
.claude/skills/higress-openclaw-integration/SKILL.md
@@ -0,0 +1,259 @@
---
name: higress-openclaw-integration
description: "Deploy and configure Higress AI Gateway for OpenClaw integration. Use when: (1) User wants to deploy Higress AI Gateway, (2) User wants to configure OpenClaw to use more model providers, (3) User mentions 'higress', 'ai gateway', 'model gateway', 'AI网关', (4) User wants to set up model routing or auto-routing, (5) User needs to manage LLM provider API keys."
---

# Higress AI Gateway Integration

Deploy Higress AI Gateway and configure OpenClaw to use it as a unified model provider.

## Quick Start

### Step 1: Collect Information from User

**Ask the user for the following information upfront:**

1. **Which LLM provider(s) to use?** (at least one required)

   **Commonly Used Providers:**

   | Provider | Parameter | Notes |
   |----------|-----------|-------|
   | 智谱 / z.ai | `--zhipuai-key` | Models: glm-*; Code Plan mode enabled by default |
   | Claude Code | `--claude-code-key` | **Requires OAuth token from `claude setup-token`** |
   | Moonshot (Kimi) | `--moonshot-key` | Models: moonshot-*, kimi-* |
   | Minimax | `--minimax-key` | Models: abab-* |
   | 阿里云通义千问 (Dashscope) | `--dashscope-key` | Models: qwen* |
   | OpenAI | `--openai-key` | Models: gpt-*, o1-*, o3-* |
   | DeepSeek | `--deepseek-key` | Models: deepseek-* |
   | Grok | `--grok-key` | Models: grok-* |

   **Other Providers:**

   | Provider | Parameter | Notes |
   |----------|-----------|-------|
   | Claude | `--claude-key` | Models: claude-* |
   | Google Gemini | `--gemini-key` | Models: gemini-* |
   | OpenRouter | `--openrouter-key` | Supports all models (catch-all) |
   | Groq | `--groq-key` | Fast inference |
   | Doubao (豆包) | `--doubao-key` | Models: doubao-* |
   | Mistral | `--mistral-key` | Models: mistral-* |
   | Baichuan (百川) | `--baichuan-key` | Models: Baichuan* |
   | 01.AI (Yi) | `--yi-key` | Models: yi-* |
   | Stepfun (阶跃星辰) | `--stepfun-key` | Models: step-* |
   | Cohere | `--cohere-key` | Models: command* |
   | Fireworks AI | `--fireworks-key` | - |
   | Together AI | `--togetherai-key` | - |
   | GitHub Models | `--github-key` | - |

   **Cloud Providers (require additional config):**
   - Azure OpenAI: `--azure-key` (requires service URL)
   - AWS Bedrock: `--bedrock-key` (requires region and access key)
   - Google Vertex AI: `--vertex-key` (requires project ID and region)

   **Brand Name Display (z.ai / 智谱):**
   - If the user communicates in Chinese: display as "智谱"
   - If the user communicates in English: display as "z.ai"

2. **Enable auto-routing?** (recommended)
   - If yes: `--auto-routing --auto-routing-default-model <model-name>`
   - Auto-routing allows using `model="higress/auto"` to automatically route requests based on message content

3. **Custom ports?** (optional; defaults: HTTP=8080, HTTPS=8443, Console=8001)

### Step 2: Deploy Gateway

**Auto-detect region for z.ai / 智谱 domain configuration:**

When the user selects the z.ai / 智谱 provider, detect their region:

```bash
# Run the region detection script (scripts/detect-region.sh relative to the skill directory)
REGION=$(bash scripts/detect-region.sh)
# Output: "china" or "international"
```

**Based on the detection result:**

- If `REGION="china"`: use the default domain `open.bigmodel.cn`; no extra parameter needed
- If `REGION="international"`: automatically add `--zhipuai-domain api.z.ai` to the deployment command

**After deployment (for international users):**
Notify the user in English: "The z.ai endpoint domain has been set to api.z.ai. If you want to change it, let me know and I can update the configuration."

```bash
# Create the installation directory
mkdir -p higress-install
cd higress-install

# Download the script (if it does not exist yet)
curl -fsSL https://higress.ai/ai-gateway/install.sh -o get-ai-gateway.sh
chmod +x get-ai-gateway.sh

# Deploy with the user's configuration
# For z.ai / 智谱: always include --zhipuai-code-plan-mode
# For non-China users: include --zhipuai-domain api.z.ai
./get-ai-gateway.sh start --non-interactive \
  --<provider>-key <api-key> \
  [--auto-routing --auto-routing-default-model <model>]
```

**z.ai / 智谱 Options:**

| Option | Description |
|--------|-------------|
| `--zhipuai-code-plan-mode` | Enable Code Plan mode (enabled by default) |
| `--zhipuai-domain <domain>` | Custom domain; default: `open.bigmodel.cn` (China), `api.z.ai` (international) |

**Example (China user):**
```bash
./get-ai-gateway.sh start --non-interactive \
  --zhipuai-key sk-xxx \
  --zhipuai-code-plan-mode \
  --auto-routing \
  --auto-routing-default-model glm-5
```

**Example (international user):**
```bash
./get-ai-gateway.sh start --non-interactive \
  --zhipuai-key sk-xxx \
  --zhipuai-domain api.z.ai \
  --zhipuai-code-plan-mode \
  --auto-routing \
  --auto-routing-default-model glm-5
```

### Step 3: Install OpenClaw Plugin

Install the Higress provider plugin for OpenClaw:

```bash
# Copy the plugin files (PLUGIN_SRC is relative to the skill directory: scripts/plugin)
PLUGIN_SRC="scripts/plugin"
PLUGIN_DEST="$HOME/.openclaw/extensions/higress"

mkdir -p "$PLUGIN_DEST"
cp -r "$PLUGIN_SRC"/* "$PLUGIN_DEST/"
```

**Tell the user to run the following commands manually in their terminal (they are interactive and cannot be executed by the AI agent):**

```bash
# Step 1: Enable the plugin
openclaw plugins enable higress

# Step 2: Configure the provider (interactive - prompts for Gateway URL, API Key, models, etc.)
openclaw models auth login --provider higress --set-default

# Step 3: Restart the OpenClaw gateway to apply the changes
openclaw gateway restart
```

The `openclaw models auth login` command will interactively prompt for:
1. Gateway URL (default: `http://localhost:8080`)
2. Console URL (default: `http://localhost:8001`)
3. API Key (optional for local deployments)
4. Model list (auto-detected or manually specified)
5. Auto-routing default model (if using `higress/auto`)

After configuration and restart, Higress models are available in OpenClaw with the `higress/` prefix (e.g., `higress/glm-5`, `higress/auto`).

**Future Configuration Updates (No Restart Needed)**

After the initial setup, you can manage your configuration through conversation with OpenClaw:

- **Add New Providers**: Add new LLM providers (e.g., DeepSeek, OpenAI, Claude) and their models dynamically.
- **Update API Keys**: Update existing provider API keys without a service restart.
- **Configure Auto-routing**: If you have set up multiple models, ask OpenClaw to configure auto-routing rules. Requests will be routed based on your message content, automatically using the most suitable model.

All configuration changes are hot-loaded through Higress: no `openclaw gateway restart` is required. Iterate on your model provider setup dynamically, without service interruption.

## Post-Deployment Management

### Add/Update API Keys (Hot-reload)

```bash
./get-ai-gateway.sh config add --provider <provider> --key <api-key>
./get-ai-gateway.sh config list
./get-ai-gateway.sh config remove --provider <provider>
```

Provider aliases: `dashscope`/`qwen`, `moonshot`/`kimi`, `zhipuai`/`zhipu`
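The alias behavior can be sketched as a small shell helper. This is hypothetical illustration only (the `canonical_provider` name is invented here; the real install script simply accepts either spelling), built from the alias pairs listed above:

```shell
# Hypothetical helper mirroring the documented alias pairs;
# the install script itself accepts either name directly.
canonical_provider() {
  case "$1" in
    qwen)  echo "dashscope" ;;
    kimi)  echo "moonshot" ;;
    zhipu) echo "zhipuai" ;;
    *)     echo "$1" ;;   # unknown names pass through unchanged
  esac
}

canonical_provider kimi    # prints "moonshot"
canonical_provider openai  # prints "openai"
```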

### Update z.ai Domain (Hot-reload)

If the user wants to change the z.ai domain after deployment:

```bash
# Update the domain configuration
./get-ai-gateway.sh config add --provider zhipuai --extra-config "zhipuDomain=api.z.ai"
# Or revert to the China endpoint
./get-ai-gateway.sh config add --provider zhipuai --extra-config "zhipuDomain=open.bigmodel.cn"
```

### Add Routing Rules (for auto-routing)

```bash
# Add a rule: route to a specific model when the message starts with a trigger
./get-ai-gateway.sh route add --model <model> --trigger "keyword1|keyword2"

# Examples
./get-ai-gateway.sh route add --model glm-4-flash --trigger "quick|fast"
./get-ai-gateway.sh route add --model claude-opus-4 --trigger "think|complex"
./get-ai-gateway.sh route add --model deepseek-coder --trigger "code|debug"

# List/remove rules
./get-ai-gateway.sh route list
./get-ai-gateway.sh route remove --rule-id 0
```

### Stop/Delete Gateway

```bash
./get-ai-gateway.sh stop
./get-ai-gateway.sh delete
```

## Endpoints

| Endpoint | URL |
|----------|-----|
| Chat Completions | http://localhost:8080/v1/chat/completions |
| Console | http://localhost:8001 |
| Logs | `./higress-install/logs/access.log` |

## Testing

```bash
# Test with a specific model
curl 'http://localhost:8080/v1/chat/completions' \
  -H 'Content-Type: application/json' \
  -d '{"model": "<model-name>", "messages": [{"role": "user", "content": "Hello"}]}'

# Test auto-routing (if enabled)
curl 'http://localhost:8080/v1/chat/completions' \
  -H 'Content-Type: application/json' \
  -d '{"model": "higress/auto", "messages": [{"role": "user", "content": "What is AI?"}]}'
```

## Troubleshooting

| Issue | Solution |
|-------|----------|
| Container fails to start | Check `docker logs higress-ai-gateway` |
| Port already in use | Use `--http-port`, `--console-port` to change ports |
| API key error | Run `./get-ai-gateway.sh config list` to verify keys |
| Auto-routing not working | Ensure `--auto-routing` was set during deployment |
| Slow image download | The script auto-selects the nearest registry based on timezone |

## Important Notes

1. **Claude Code Mode**: Requires an OAuth token from the `claude setup-token` command, not a regular API key
2. **z.ai Code Plan Mode**: Enabled by default; uses the `/api/coding/paas/v4/chat/completions` endpoint, optimized for coding tasks
3. **z.ai Domain Selection**:
   - China users: `open.bigmodel.cn` (default)
   - International users: `api.z.ai` (auto-detected based on timezone)
   - Users can update the domain anytime after deployment
4. **Auto-routing**: Must be enabled during the initial deployment (`--auto-routing`); routing rules can be added later
5. **OpenClaw Integration**: The `openclaw models auth login` and `openclaw gateway restart` commands are **interactive** and must be run manually by the user in their terminal
6. **Hot-reload**: API key changes take effect immediately; no container restart is needed
@@ -0,0 +1,325 @@
# Higress AI Gateway - Troubleshooting

Common issues and solutions for Higress AI Gateway deployment and operation.

## Container Issues

### Container fails to start

**Check that Docker is running:**
```bash
docker info
```

**Check port availability:**
```bash
netstat -tlnp | grep 8080
```

**View container logs:**
```bash
docker logs higress-ai-gateway
```

### Gateway not responding

**Check container status:**
```bash
docker ps -a
```

**Verify port mapping:**
```bash
docker port higress-ai-gateway
```

**Test locally:**
```bash
curl http://localhost:8080/v1/models
```

## File System Issues

### "too many open files" error from the API server

**Symptom:**
```
panic: unable to create REST storage for a resource due to too many open files, will die
```
or
```
command failed err="failed to create shared file watcher: too many open files"
```

**Root Cause:**

The system's `fs.inotify.max_user_instances` limit is too low. This commonly occurs on systems running many Docker containers, as each container can consume inotify instances.

**Check the current limit:**
```bash
cat /proc/sys/fs/inotify/max_user_instances
```

The default is often 128, which is insufficient when running multiple containers.
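Before raising the limit, it can help to see which processes are actually holding inotify instances. A minimal diagnostic, assuming a Linux `/proc` filesystem (this one-liner is a general technique, not part of the gateway tooling):

```shell
# Count inotify instances held by each PID, busiest first.
# Each symlink under /proc/<pid>/fd that points at anon_inode:inotify
# is one instance counted against fs.inotify.max_user_instances.
find /proc/[0-9]*/fd -lname 'anon_inode:inotify' 2>/dev/null \
  | cut -d/ -f3 \
  | sort | uniq -c | sort -rn | head
```

If a single container's processes dominate the count, restarting that container may clear the error temporarily, but raising the sysctl limit is the durable fix.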

**Solution:**

Increase the inotify instance limit to 8192:

```bash
# Temporarily (until the next reboot)
sudo sysctl -w fs.inotify.max_user_instances=8192

# Permanently (survives reboots)
echo "fs.inotify.max_user_instances = 8192" | sudo tee -a /etc/sysctl.conf
sudo sysctl -p
```

**Verify:**
```bash
cat /proc/sys/fs/inotify/max_user_instances
# Should output: 8192
```

**Restart the container:**
```bash
docker restart higress-ai-gateway
```

**Additional inotify tunables** (if you are still experiencing issues):
```bash
# Increase the max watches per user
sudo sysctl -w fs.inotify.max_user_watches=524288

# Increase the max queued events
sudo sysctl -w fs.inotify.max_queued_events=32768
```

To make these permanent as well:
```bash
echo "fs.inotify.max_user_watches = 524288" | sudo tee -a /etc/sysctl.conf
echo "fs.inotify.max_queued_events = 32768" | sudo tee -a /etc/sysctl.conf
sudo sysctl -p
```

## Plugin Issues

### Plugin not recognized

**Verify the plugin installation:**

For Clawdbot:
```bash
ls -la ~/.clawdbot/extensions/higress-ai-gateway
```

For OpenClaw:
```bash
ls -la ~/.openclaw/extensions/higress-ai-gateway
```

**Check package.json:**

Ensure `package.json` contains the correct extension field:
- Clawdbot: `"clawdbot.extensions"`
- OpenClaw: `"openclaw.extensions"`

**Restart the runtime:**
```bash
# Restart the Clawdbot gateway
clawdbot gateway restart

# Or the OpenClaw gateway
openclaw gateway restart
```

## Routing Issues

### Auto-routing not working

**Confirm the model is in the list:**
```bash
# Check whether higress/auto is available
clawdbot models list | grep "higress/auto"
```

**Check that routing rules exist:**
```bash
./get-ai-gateway.sh route list
```

**Verify the default model is configured:**
```bash
./get-ai-gateway.sh config list
```

**Check gateway logs:**
```bash
docker logs higress-ai-gateway | grep -i routing
```

**View access logs:**
```bash
tail -f ./higress/logs/access.log
```

## Configuration Issues

### Timezone detection fails

**Manually check the timezone:**
```bash
timedatectl show --property=Timezone --value
```

**Or check the timezone file:**
```bash
cat /etc/timezone
```

**Fallback behavior:**
- If detection fails, the script defaults to the Hangzhou mirror
- Manual override: set the `IMAGE_REPO` environment variable

**Manual repository selection:**
```bash
# For China/Asia
IMAGE_REPO="higress-registry.cn-hangzhou.cr.aliyuncs.com/higress/all-in-one"

# For Southeast Asia
IMAGE_REPO="higress-registry.ap-southeast-7.cr.aliyuncs.com/higress/all-in-one"

# For North America
IMAGE_REPO="higress-registry.us-west-1.cr.aliyuncs.com/higress/all-in-one"

# Use in deployment
IMAGE_REPO="$IMAGE_REPO" ./get-ai-gateway.sh start --non-interactive ...
```

## Performance Issues

### Slow image downloads

**Check the selected repository:**
```bash
echo $IMAGE_REPO
```

**Manually select the closest mirror:**

See [Configuration Issues → Timezone detection fails](#timezone-detection-fails) for manual repository selection.

### High memory usage

**Check container stats:**
```bash
docker stats higress-ai-gateway
```

**View resource limits:**
```bash
docker inspect higress-ai-gateway | grep -A 10 "HostConfig"
```

**Set memory limits:**
```bash
# Stop the container
./get-ai-gateway.sh stop

# Manually restart with limits
docker run -d \
  --name higress-ai-gateway \
  --memory="4g" \
  --memory-swap="4g" \
  ...
```

## Log Analysis

### Access logs location

```bash
# Default location
./higress/logs/access.log

# View real-time logs
tail -f ./higress/logs/access.log
```

### Container logs

```bash
# View all logs
docker logs higress-ai-gateway

# Follow logs
docker logs -f higress-ai-gateway

# Last 100 lines
docker logs --tail 100 higress-ai-gateway

# With timestamps
docker logs -t higress-ai-gateway
```

## Network Issues

### Cannot connect to the gateway

**Verify the container is running:**
```bash
docker ps | grep higress-ai-gateway
```

**Check port bindings:**
```bash
docker port higress-ai-gateway
```

**Test from inside the container:**
```bash
docker exec higress-ai-gateway curl localhost:8080/v1/models
```

**Check firewall rules:**
```bash
# Check whether the port is accessible
sudo ufw status | grep 8080

# Allow the port (if needed)
sudo ufw allow 8080/tcp
```

### DNS resolution issues

**Test from the container:**
```bash
docker exec higress-ai-gateway ping -c 3 api.openai.com
```

**Check DNS settings:**
```bash
docker exec higress-ai-gateway cat /etc/resolv.conf
```

## Getting Help

If you are still experiencing issues:

1. **Collect logs:**
   ```bash
   docker logs higress-ai-gateway > gateway.log 2>&1
   cat ./higress/logs/access.log > access.log
   ```

2. **Check system info:**
   ```bash
   docker version
   docker info
   uname -a
   cat /proc/sys/fs/inotify/max_user_instances
   ```

3. **Report the issue:**
   - Repository: https://github.com/higress-group/higress-standalone
   - Include: logs, system info, and the deployment command used
.claude/skills/higress-openclaw-integration/scripts/detect-region.sh
@@ -0,0 +1,15 @@
#!/bin/bash
# Detect if the user is in the China region based on timezone
# Returns: "china" or "international"

TIMEZONE=$(cat /etc/timezone 2>/dev/null || timedatectl show --property=Timezone --value 2>/dev/null || echo "Unknown")

# Check if the timezone indicates the China region (including Hong Kong)
if [[ "$TIMEZONE" == "Asia/Shanghai" ]] || \
   [[ "$TIMEZONE" == "Asia/Hong_Kong" ]] || \
   [[ "$TIMEZONE" == *"China"* ]] || \
   [[ "$TIMEZONE" == *"Beijing"* ]]; then
  echo "china"
else
  echo "international"
fi
@@ -0,0 +1,61 @@
# Higress AI Gateway Plugin

OpenClaw model provider plugin for Higress AI Gateway with auto-routing support.

## What is this?

This is a TypeScript-based provider plugin that enables OpenClaw to use Higress AI Gateway as a model provider. It provides:

- **Auto-routing support**: Use `higress/auto` to intelligently route requests based on message content
- **Dynamic model discovery**: Auto-detect available models from the Higress Console
- **Smart URL handling**: Automatic URL normalization and validation
- **Flexible authentication**: Support for both local and remote gateway deployments

## Files

- **index.ts**: Main plugin implementation
- **package.json**: NPM package metadata and OpenClaw extension declaration
- **openclaw.plugin.json**: Plugin manifest for OpenClaw

## Installation

This plugin is installed automatically when you use the `higress-openclaw-integration` skill. See the parent SKILL.md for complete installation instructions.

### Manual Installation

If you need to install manually:

```bash
# Copy the plugin files
mkdir -p "$HOME/.openclaw/extensions/higress"
cp -r ./* "$HOME/.openclaw/extensions/higress/"

# Configure the provider
openclaw plugins enable higress
openclaw models auth login --provider higress
```

## Usage

After installation, configure Higress as a model provider:

```bash
openclaw models auth login --provider higress
```

The plugin will prompt for:
1. Gateway URL (default: http://localhost:8080)
2. Console URL (default: http://localhost:8001)
3. API Key (optional for local deployments)
4. Model list (auto-detected or manually specified)
5. Auto-routing default model (if using higress/auto)

## Related Resources

- **Parent Skill**: [higress-openclaw-integration](../SKILL.md)
- **Auto-routing Configuration**: [higress-auto-router](../../higress-auto-router/SKILL.md)

## License

Apache-2.0
@@ -0,0 +1,302 @@
|
||||
import { emptyPluginConfigSchema } from "openclaw/plugin-sdk";
|
||||
|
||||
const DEFAULT_GATEWAY_URL = "http://localhost:8080";
|
||||
const DEFAULT_CONSOLE_URL = "http://localhost:8001";
|
||||
|
||||
// Model-specific context window and max tokens configurations
|
||||
const MODEL_CONFIG: Record<string, { contextWindow: number; maxTokens: number }> = {
|
||||
"gpt-5.3-codex": { contextWindow: 400_000, maxTokens: 128_000 },
|
||||
"gpt-5-mini": { contextWindow: 400_000, maxTokens: 128_000 },
|
||||
"gpt-5-nano": { contextWindow: 400_000, maxTokens: 128_000 },
|
||||
"claude-opus-4-6": { contextWindow: 1_000_000, maxTokens: 128_000 },
|
||||
"claude-sonnet-4-6": { contextWindow: 1_000_000, maxTokens: 64_000 },
|
||||
"claude-haiku-4-5": { contextWindow: 200_000, maxTokens: 64_000 },
|
||||
"qwen3.5-plus": { contextWindow: 960_000, maxTokens: 64_000 },
|
||||
"deepseek-chat": { contextWindow: 256_000, maxTokens: 128_000 },
|
||||
"deepseek-reasoner": { contextWindow: 256_000, maxTokens: 128_000 },
|
||||
"kimi-k2.5": { contextWindow: 256_000, maxTokens: 128_000 },
|
||||
"glm-5": { contextWindow: 200_000, maxTokens: 128_000 },
|
||||
"MiniMax-M2.5": { contextWindow: 200_000, maxTokens: 128_000 },
|
||||
};
|
||||
|
||||
// Default values for unknown models
|
||||
const DEFAULT_CONTEXT_WINDOW = 200_000;
|
||||
const DEFAULT_MAX_TOKENS = 128_000;
|
||||
|
||||
// Common models that Higress AI Gateway typically supports
|
||||
const DEFAULT_MODEL_IDS = [
|
||||
// Auto-routing special model
|
||||
"higress/auto",
|
||||
// Commonly models
|
||||
"kimi-k2.5",
|
||||
"glm-5",
|
||||
"MiniMax-M2.5",
|
||||
"qwen3.5-plus",
|
||||
// Anthropic models
|
||||
"claude-opus-4-6",
|
||||
"claude-sonnet-4-6",
|
||||
"claude-haiku-4-5",
|
||||
// OpenAI models
|
||||
"gpt-5.3-codex",
|
||||
"gpt-5-mini",
|
||||
"gpt-5-nano",
|
||||
// DeepSeek models
|
||||
"deepseek-chat",
|
||||
"deepseek-reasoner",
|
||||
] as const;
|
||||
|
||||
function normalizeBaseUrl(value: string): string {
|
||||
const trimmed = value.trim();
|
||||
if (!trimmed) return DEFAULT_GATEWAY_URL;
|
||||
let normalized = trimmed;
|
||||
while (normalized.endsWith("/")) normalized = normalized.slice(0, -1);
|
||||
if (!normalized.endsWith("/v1")) normalized = `${normalized}/v1`;
|
||||
return normalized;
|
||||
}
|
||||
|
||||
function validateUrl(value: string): string | undefined {
|
||||
const normalized = normalizeBaseUrl(value);
|
||||
try {
|
||||
new URL(normalized);
|
||||
} catch {
|
||||
return "Enter a valid URL";
|
||||
}
|
||||
return undefined;
|
||||
}
|
||||
|
||||
function parseModelIds(input: string): string[] {
|
||||
const parsed = input
|
||||
.split(/[\n,]/)
|
||||
.map((model) => model.trim())
|
||||
.filter(Boolean);
|
||||
return Array.from(new Set(parsed));
|
||||
}
|
||||
|
||||
function buildModelDefinition(modelId: string) {
  const isAutoModel = modelId === "higress/auto";
  const config = MODEL_CONFIG[modelId] || { contextWindow: DEFAULT_CONTEXT_WINDOW, maxTokens: DEFAULT_MAX_TOKENS };

  return {
    id: modelId,
    name: isAutoModel ? "Higress Auto Router" : modelId,
    api: "openai-completions",
    reasoning: true,
    input: ["text", "image"],
    cost: { input: 0, output: 0, cacheRead: 0, cacheWrite: 0 },
    contextWindow: config.contextWindow,
    maxTokens: config.maxTokens,
  };
}
async function testGatewayConnection(gatewayUrl: string): Promise<boolean> {
  try {
    // gatewayUrl already ends with /v1 from normalizeBaseUrl().
    // Higress doesn't expose a /models endpoint, so probe the
    // chat/completions endpoint with an empty body instead.
    await fetch(`${gatewayUrl}/chat/completions`, {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({}),
      signal: AbortSignal.timeout(5000),
    });
    // Any response (including 400/401/422) means the gateway is reachable
    return true;
  } catch {
    return false;
  }
}
async function fetchAvailableModels(consoleUrl: string): Promise<string[]> {
  try {
    // Try to get models from the Higress Console API
    const response = await fetch(`${consoleUrl}/v1/ai/routes`, {
      method: "GET",
      headers: { "Content-Type": "application/json" },
      signal: AbortSignal.timeout(5000),
    });
    if (response.ok) {
      const data = (await response.json()) as { data?: { model?: string }[] };
      if (data.data && Array.isArray(data.data)) {
        return data.data
          .map((route: { model?: string }) => route.model)
          .filter((m): m is string => typeof m === "string");
      }
    }
  } catch {
    // Ignore errors, use defaults
  }
  return [];
}
const higressPlugin = {
  id: "higress",
  name: "Higress AI Gateway",
  description: "Model provider plugin for Higress AI Gateway with auto-routing support",
  configSchema: emptyPluginConfigSchema(),
  register(api) {
    api.registerProvider({
      id: "higress",
      label: "Higress AI Gateway",
      docsPath: "/providers/models",
      aliases: ["higress-gateway", "higress-ai"],
      auth: [
        {
          id: "api-key",
          label: "API Key",
          hint: "Configure Higress AI Gateway endpoint with optional API key",
          kind: "custom",
          run: async (ctx) => {
            // Step 1: Get the Gateway URL
            const gatewayUrlInput = await ctx.prompter.text({
              message: "Higress AI Gateway URL",
              initialValue: DEFAULT_GATEWAY_URL,
              validate: validateUrl,
            });
            const gatewayUrl = normalizeBaseUrl(gatewayUrlInput);

            // Step 2: Get the Console URL (for auto-router configuration)
            const consoleUrlInput = await ctx.prompter.text({
              message: "Higress Console URL (for auto-router config)",
              initialValue: DEFAULT_CONSOLE_URL,
              validate: validateUrl,
            });
            const consoleUrl = normalizeBaseUrl(consoleUrlInput);
            // Step 3: Test the connection (create a new spinner)
            const spin = ctx.prompter.progress("Testing gateway connection…");
            const isConnected = await testGatewayConnection(gatewayUrl);
            if (!isConnected) {
              spin.stop("Gateway connection failed");
              await ctx.prompter.note(
                [
                  "Could not connect to Higress AI Gateway.",
                  "Make sure the gateway is running and the URL is correct.",
                ].join("\n"),
                "Connection Warning",
              );
            } else {
              spin.stop("Gateway connected");
            }
            // Step 4: Get the API Key (optional for a local gateway)
            const apiKeyInput = (await ctx.prompter.text({
              message: "API Key (leave empty if not required)",
              initialValue: "",
            })) || "";
            const apiKey = apiKeyInput.trim() || "higress-local";
            // Step 5: Fetch available models (create a new spinner)
            const spin2 = ctx.prompter.progress("Fetching available models…");
            const fetchedModels = await fetchAvailableModels(consoleUrl);
            const defaultModels = fetchedModels.length > 0
              ? ["higress/auto", ...fetchedModels]
              : DEFAULT_MODEL_IDS;
            spin2.stop();

            // Step 6: Let the user customize the model list
            const modelInput = await ctx.prompter.text({
              message: "Model IDs (comma-separated, higress/auto enables auto-routing)",
              initialValue: defaultModels.slice(0, 10).join(", "),
              validate: (value) =>
                parseModelIds(value).length > 0 ? undefined : "Enter at least one model id",
            });

            const modelIds = parseModelIds(modelInput);
            const hasAutoModel = modelIds.includes("higress/auto");

            // Always add the higress/ provider prefix to create the model reference
            const defaultModelId = hasAutoModel
              ? "higress/auto"
              : (modelIds[0] ?? "glm-5");
            const defaultModelRef = `higress/${defaultModelId}`;
            // Step 7: Configure the default model for auto-routing
            let autoRoutingDefaultModel = "glm-5";
            if (hasAutoModel) {
              const autoRoutingModelInput = await ctx.prompter.text({
                message: "Default model for auto-routing (when no rule matches)",
                initialValue: "glm-5",
              });
              // Fall back to "glm-5" if the user clears the input
              autoRoutingDefaultModel = autoRoutingModelInput.trim() || "glm-5";
            }
            return {
              profiles: [
                {
                  profileId: `higress:${apiKey === "higress-local" ? "local" : "default"}`,
                  credential: {
                    type: "token",
                    provider: "higress",
                    token: apiKey,
                  },
                },
              ],
              configPatch: {
                models: {
                  providers: {
                    higress: {
                      // gatewayUrl already ends with /v1 from normalizeBaseUrl()
                      baseUrl: gatewayUrl,
                      apiKey,
                      api: "openai-completions",
                      authHeader: apiKey !== "higress-local",
                      models: modelIds.map((modelId) => buildModelDefinition(modelId)),
                    },
                  },
                },
                agents: {
                  defaults: {
                    models: Object.fromEntries(
                      modelIds.map((modelId) => {
                        // Always add the higress/ provider prefix to create the model reference
                        const modelRef = `higress/${modelId}`;
                        return [modelRef, {}];
                      }),
                    ),
                  },
                },
                plugins: {
                  entries: {
                    higress: {
                      enabled: true,
                      config: {
                        gatewayUrl,
                        consoleUrl,
                        autoRoutingDefaultModel,
                      },
                    },
                  },
                },
              },
              defaultModel: defaultModelRef,
              notes: [
                "Higress AI Gateway is now configured as a model provider.",
                hasAutoModel
                  ? `Auto-routing enabled: use model "higress/auto" to route based on message content.`
                  : "Add 'higress/auto' to models to enable auto-routing.",
                // gatewayUrl already ends with /v1 from normalizeBaseUrl()
                `Gateway endpoint: ${gatewayUrl}/chat/completions`,
                `Console: ${consoleUrl}`,
                "",
                "💡 Future Configuration Updates (No Restart Needed):",
                "  • Add New Providers: Add LLM providers (DeepSeek, OpenAI, Claude, etc.) dynamically.",
                "  • Update API Keys: Update existing provider keys without restart.",
                "  • Configure Auto-Routing: Ask OpenClaw to set up intelligent routing rules.",
                "  All changes hot-load via Higress — no gateway restart required!",
                "",
                "🎯 Recommended Skills (install via OpenClaw conversation):",
                "",
                "1. Auto-Routing Skill:",
                "   Configure automatic model routing based on message content",
                "   https://github.com/alibaba/higress/tree/main/.claude/skills/higress-auto-router",
                '   Say: "Install higress-auto-router skill"',
              ],
            };
          },
        },
      ],
    });
  },
};

export default higressPlugin;
@@ -0,0 +1,10 @@
{
  "id": "higress",
  "name": "Higress AI Gateway",
  "description": "Model provider plugin for Higress AI Gateway with auto-routing support",
  "providers": ["higress"],
  "configSchema": {
    "type": "object",
    "additionalProperties": true
  }
}
@@ -0,0 +1,22 @@
{
  "name": "@higress/higress",
  "version": "1.0.0",
  "description": "Higress AI Gateway model provider plugin for OpenClaw with auto-routing support",
  "main": "index.ts",
  "openclaw": {
    "extensions": ["./index.ts"]
  },
  "keywords": [
    "openclaw",
    "higress",
    "ai-gateway",
    "model-router",
    "auto-routing"
  ],
  "author": "Higress Team",
  "license": "Apache-2.0",
  "repository": {
    "type": "git",
    "url": "https://github.com/alibaba/higress"
  }
}
251  .claude/skills/higress-wasm-go-plugin/SKILL.md  Normal file
@@ -0,0 +1,251 @@
---
name: higress-wasm-go-plugin
description: Develop Higress WASM plugins using Go 1.24+. Use when creating, modifying, or debugging Higress gateway plugins for HTTP request/response processing, external service calls, Redis integration, or custom gateway logic.
---

# Higress WASM Go Plugin Development

Develop Higress gateway WASM plugins in Go with the `wasm-go` SDK.

## Quick Start

### Project Setup

```bash
# Create the project directory
mkdir my-plugin && cd my-plugin

# Initialize the Go module
go mod init my-plugin

# Set a module proxy (mainland China)
go env -w GOPROXY=https://proxy.golang.com.cn,direct

# Download dependencies
go get github.com/higress-group/proxy-wasm-go-sdk@go-1.24
go get github.com/higress-group/wasm-go@main
go get github.com/tidwall/gjson
```

### Minimal Plugin Template

```go
package main

import (
	"github.com/higress-group/proxy-wasm-go-sdk/proxywasm"
	"github.com/higress-group/proxy-wasm-go-sdk/proxywasm/types"
	"github.com/higress-group/wasm-go/pkg/wrapper"
	"github.com/tidwall/gjson"
)

func main() {}

func init() {
	wrapper.SetCtx(
		"my-plugin",
		wrapper.ParseConfig(parseConfig),
		wrapper.ProcessRequestHeaders(onHttpRequestHeaders),
	)
}

type MyConfig struct {
	Enabled bool
}

func parseConfig(json gjson.Result, config *MyConfig) error {
	config.Enabled = json.Get("enabled").Bool()
	return nil
}

func onHttpRequestHeaders(ctx wrapper.HttpContext, config MyConfig) types.Action {
	if config.Enabled {
		proxywasm.AddHttpRequestHeader("x-my-header", "hello")
	}
	return types.HeaderContinue
}
```
### Compile

```bash
go mod tidy
GOOS=wasip1 GOARCH=wasm go build -buildmode=c-shared -o main.wasm ./
```

## Core Concepts

### Plugin Lifecycle

1. **init()** - Register the plugin with `wrapper.SetCtx()`
2. **parseConfig** - Parse the YAML config (auto-converted to JSON)
3. **HTTP processing phases** - Handle requests/responses

### HTTP Processing Phases

| Phase | Trigger | Handler |
|-------|---------|---------|
| Request Headers | Gateway receives client request headers | `ProcessRequestHeaders` |
| Request Body | Gateway receives client request body | `ProcessRequestBody` |
| Response Headers | Gateway receives backend response headers | `ProcessResponseHeaders` |
| Response Body | Gateway receives backend response body | `ProcessResponseBody` |
| Stream Done | HTTP stream completes | `ProcessStreamDone` |

### Action Return Values

| Action | Behavior |
|--------|----------|
| `types.HeaderContinue` | Continue to the next filter |
| `types.HeaderStopIteration` | Stop header processing, wait for the body |
| `types.HeaderStopAllIterationAndWatermark` | Stop all processing and buffer data; call `proxywasm.ResumeHttpRequest/Response()` to resume |
## API Reference

### HttpContext Methods

```go
// Request info (cached, safe to call in any phase)
ctx.Scheme() // :scheme
ctx.Host()   // :authority
ctx.Path()   // :path
ctx.Method() // :method

// Body handling
ctx.HasRequestBody()       // Check if the request has a body
ctx.HasResponseBody()      // Check if the response has a body
ctx.DontReadRequestBody()  // Skip reading the request body
ctx.DontReadResponseBody() // Skip reading the response body
ctx.BufferRequestBody()    // Buffer instead of stream
ctx.BufferResponseBody()   // Buffer instead of stream

// Content detection
ctx.IsWebsocket()          // Check for a WebSocket upgrade
ctx.IsBinaryRequestBody()  // Check for binary content
ctx.IsBinaryResponseBody() // Check for binary content

// Context storage
ctx.SetContext(key, value)
ctx.GetContext(key)
ctx.GetStringContext(key, defaultValue)
ctx.GetBoolContext(key, defaultValue)

// Custom logging
ctx.SetUserAttribute(key, value)
ctx.WriteUserAttributeToLog()
```

### Header/Body Operations (proxywasm)

```go
// Request headers
proxywasm.GetHttpRequestHeader(name)
proxywasm.AddHttpRequestHeader(name, value)
proxywasm.ReplaceHttpRequestHeader(name, value)
proxywasm.RemoveHttpRequestHeader(name)
proxywasm.GetHttpRequestHeaders()
proxywasm.ReplaceHttpRequestHeaders(headers)

// Response headers
proxywasm.GetHttpResponseHeader(name)
proxywasm.AddHttpResponseHeader(name, value)
proxywasm.ReplaceHttpResponseHeader(name, value)
proxywasm.RemoveHttpResponseHeader(name)
proxywasm.GetHttpResponseHeaders()
proxywasm.ReplaceHttpResponseHeaders(headers)

// Request body (only in the body phase)
proxywasm.GetHttpRequestBody(start, size)
proxywasm.ReplaceHttpRequestBody(body)
proxywasm.AppendHttpRequestBody(data)
proxywasm.PrependHttpRequestBody(data)

// Response body (only in the body phase)
proxywasm.GetHttpResponseBody(start, size)
proxywasm.ReplaceHttpResponseBody(body)
proxywasm.AppendHttpResponseBody(data)
proxywasm.PrependHttpResponseBody(data)

// Direct response
proxywasm.SendHttpResponse(statusCode, headers, body, grpcStatus)

// Flow control
proxywasm.ResumeHttpRequest()  // Resume a paused request
proxywasm.ResumeHttpResponse() // Resume a paused response
```
## Common Patterns

### External HTTP Call

See [references/http-client.md](references/http-client.md) for complete HTTP client patterns.

```go
func parseConfig(json gjson.Result, config *MyConfig) error {
	serviceName := json.Get("serviceName").String()
	servicePort := json.Get("servicePort").Int()
	config.client = wrapper.NewClusterClient(wrapper.FQDNCluster{
		FQDN: serviceName,
		Port: servicePort,
	})
	return nil
}

func onHttpRequestHeaders(ctx wrapper.HttpContext, config MyConfig) types.Action {
	err := config.client.Get("/api/check", nil, func(statusCode int, headers http.Header, body []byte) {
		if statusCode != 200 {
			proxywasm.SendHttpResponse(403, nil, []byte("Forbidden"), -1)
			return
		}
		proxywasm.ResumeHttpRequest()
	}, 3000) // timeout in ms

	if err != nil {
		return types.HeaderContinue // fall back on dispatch error
	}
	return types.HeaderStopAllIterationAndWatermark
}
```

### Redis Integration

See [references/redis-client.md](references/redis-client.md) for complete Redis patterns.

```go
func parseConfig(json gjson.Result, config *MyConfig) error {
	config.redis = wrapper.NewRedisClusterClient(wrapper.FQDNCluster{
		FQDN: json.Get("redisService").String(),
		Port: json.Get("redisPort").Int(),
	})
	return config.redis.Init(
		json.Get("username").String(),
		json.Get("password").String(),
		json.Get("timeout").Int(),
	)
}
```
### Multi-level Config

Plugin configuration can be set at different levels in the console: global, domain-level, and route-level. The control plane handles config priority and matching automatically; the config your plugin receives in `parseConfig` is the one already matched for the current request.
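As a concrete illustration of the three levels, a Higress `WasmPlugin` resource carries a global `defaultConfig` plus `matchRules` overrides. This is a hedged sketch; the plugin name, route reference, domain, and `enabled` field are placeholder assumptions, not taken from this repository:

```yaml
apiVersion: extensions.higress.io/v1alpha1
kind: WasmPlugin
metadata:
  name: my-plugin
  namespace: higress-system
spec:
  defaultConfig:            # global level
    enabled: false
  matchRules:
    - ingress:              # route level: wins for this route
        - default/my-route
      config:
        enabled: true
    - domain:               # domain level
        - "*.example.com"
      config:
        enabled: true
```

The plugin itself never sees this layering: `parseConfig` receives only the merged config for whichever rule matched.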
## Local Testing

See [references/local-testing.md](references/local-testing.md) for a Docker Compose setup.

## Advanced Topics

See [references/advanced-patterns.md](references/advanced-patterns.md) for:
- Streaming body processing
- Route call pattern
- Tick functions (periodic tasks)
- Leader election
- Memory management
- Custom logging

## Best Practices

1. **Never call Resume after SendHttpResponse** - The response auto-resumes
2. **Check HasRequestBody() before returning HeaderStopIteration** - Avoids blocking
3. **Use the cached ctx methods** - `ctx.Path()` works in any phase; `GetHttpRequestHeader(":path")` works only in the header phase
4. **Handle external call failures gracefully** - Return `HeaderContinue` on error to avoid blocking
5. **Set appropriate timeouts** - The default HTTP call timeout is 500ms
@@ -0,0 +1,253 @@
# Advanced Patterns

## Streaming Body Processing

Process body chunks as they arrive without buffering:

```go
func init() {
	wrapper.SetCtx(
		"streaming-plugin",
		wrapper.ParseConfig(parseConfig),
		wrapper.ProcessStreamingRequestBody(onStreamingRequestBody),
		wrapper.ProcessStreamingResponseBody(onStreamingResponseBody),
	)
}

func onStreamingRequestBody(ctx wrapper.HttpContext, config MyConfig, chunk []byte, isLastChunk bool) []byte {
	// Modify the chunk and return it
	modified := bytes.ReplaceAll(chunk, []byte("old"), []byte("new"))
	return modified
}

func onStreamingResponseBody(ctx wrapper.HttpContext, config MyConfig, chunk []byte, isLastChunk bool) []byte {
	// Can call external services with NeedPauseStreamingResponse()
	return chunk
}
```
## Buffered Body Processing

Buffer the entire body before processing:

```go
func init() {
	wrapper.SetCtx(
		"buffered-plugin",
		wrapper.ParseConfig(parseConfig),
		wrapper.ProcessRequestBody(onRequestBody),
		wrapper.ProcessResponseBody(onResponseBody),
	)
}

func onRequestBody(ctx wrapper.HttpContext, config MyConfig, body []byte) types.Action {
	// The full request body is available here
	var data map[string]interface{}
	json.Unmarshal(body, &data)

	// Modify and replace
	data["injected"] = "value"
	newBody, _ := json.Marshal(data)
	proxywasm.ReplaceHttpRequestBody(newBody)

	return types.ActionContinue
}
```
## Route Call Pattern

Call the current route's upstream with a modified request:

```go
func onRequestBody(ctx wrapper.HttpContext, config MyConfig, body []byte) types.Action {
	err := ctx.RouteCall("POST", "/modified-path", [][2]string{
		{"Content-Type", "application/json"},
		{"X-Custom", "header"},
	}, body, func(statusCode int, headers [][2]string, body []byte) {
		// Handle the response from the upstream
		proxywasm.SendHttpResponse(statusCode, headers, body, -1)
	})

	if err != nil {
		proxywasm.SendHttpResponse(500, nil, []byte("Route call failed"), -1)
	}
	return types.ActionContinue
}
```
## Tick Functions (Periodic Tasks)

Register periodic background tasks:

```go
func parseConfig(json gjson.Result, config *MyConfig) error {
	// Register tick functions during config parsing
	wrapper.RegisterTickFunc(1000, func() {
		// Executes every 1 second
		log.Info("1s tick")
	})

	wrapper.RegisterTickFunc(5000, func() {
		// Executes every 5 seconds
		log.Info("5s tick")
	})

	return nil
}
```
## Leader Election

For tasks that should run on only one VM instance:

```go
// Keep a reference to the plugin context so tick functions can
// check leadership outside of onPluginStart.
var pluginCtx wrapper.PluginContext

func init() {
	wrapper.SetCtx(
		"leader-plugin",
		wrapper.PrePluginStartOrReload(onPluginStart),
		wrapper.ParseConfig(parseConfig),
	)
}

func onPluginStart(ctx wrapper.PluginContext) error {
	ctx.DoLeaderElection()
	pluginCtx = ctx
	return nil
}

func parseConfig(json gjson.Result, config *MyConfig) error {
	wrapper.RegisterTickFunc(10000, func() {
		if pluginCtx.IsLeader() {
			// Only the leader executes this
			log.Info("Leader task")
		}
	})
	return nil
}
```
## Plugin Context Storage

Store data across requests at the plugin level:

```go
type MyConfig struct {
	// Config fields
}

func init() {
	wrapper.SetCtx(
		"context-plugin",
		wrapper.ParseConfigWithContext(parseConfigWithContext),
		wrapper.ProcessRequestHeaders(onHttpRequestHeaders),
	)
}

func parseConfigWithContext(ctx wrapper.PluginContext, json gjson.Result, config *MyConfig) error {
	// Store in the plugin context (survives across requests)
	ctx.SetContext("initTime", time.Now().Unix())
	return nil
}
```
## Rule-Level Config Isolation

Enable graceful degradation when rule config parsing fails:

```go
func init() {
	wrapper.SetCtx(
		"isolated-plugin",
		wrapper.PrePluginStartOrReload(func(ctx wrapper.PluginContext) error {
			ctx.EnableRuleLevelConfigIsolation()
			return nil
		}),
		wrapper.ParseOverrideConfig(parseGlobal, parseRule),
	)
}

func parseGlobal(json gjson.Result, config *MyConfig) error {
	// Parse the global config
	return nil
}

func parseRule(json gjson.Result, global MyConfig, config *MyConfig) error {
	// Parse the per-rule config, inheriting from global
	*config = global // Copy global defaults
	// Override with rule-specific values
	return nil
}
```
## Memory Management

Configure automatic VM rebuilds to prevent memory leaks:

```go
func init() {
	wrapper.SetCtxWithOptions(
		"memory-managed-plugin",
		wrapper.ParseConfig(parseConfig),
		wrapper.WithRebuildAfterRequests(10000),       // Rebuild after 10k requests
		wrapper.WithRebuildMaxMemBytes(100*1024*1024), // Rebuild at 100MB
		wrapper.WithMaxRequestsPerIoCycle(20),         // Limit concurrent requests
	)
}
```
## Custom Logging

Add structured fields to access logs:

```go
func onHttpRequestHeaders(ctx wrapper.HttpContext, config MyConfig) types.Action {
	// Set custom attributes
	ctx.SetUserAttribute("user_id", "12345")
	ctx.SetUserAttribute("request_type", "api")

	return types.HeaderContinue
}

func onHttpResponseHeaders(ctx wrapper.HttpContext, config MyConfig) types.Action {
	// Write to the access log
	ctx.WriteUserAttributeToLog()

	// Or write to trace spans
	ctx.WriteUserAttributeToTrace()

	return types.HeaderContinue
}
```
## Disable Re-routing

Prevent Envoy from recalculating the route after header modification:

```go
func onHttpRequestHeaders(ctx wrapper.HttpContext, config MyConfig) types.Action {
	// Call this BEFORE modifying headers
	ctx.DisableReroute()

	// Now safe to modify headers without triggering a re-route
	proxywasm.ReplaceHttpRequestHeader(":path", "/new-path")

	return types.HeaderContinue
}
```
## Buffer Limits

Set per-request buffer limits to control memory usage:

```go
func onHttpRequestHeaders(ctx wrapper.HttpContext, config MyConfig) types.Action {
	// Allow larger request bodies for this request
	ctx.SetRequestBodyBufferLimit(10 * 1024 * 1024) // 10MB
	return types.HeaderContinue
}

func onHttpResponseHeaders(ctx wrapper.HttpContext, config MyConfig) types.Action {
	// Allow larger response bodies
	ctx.SetResponseBodyBufferLimit(50 * 1024 * 1024) // 50MB
	return types.HeaderContinue
}
```
179  .claude/skills/higress-wasm-go-plugin/references/http-client.md  Normal file
@@ -0,0 +1,179 @@
# HTTP Client Reference

## Cluster Types

### FQDNCluster (Most Common)

For services registered in Higress with an FQDN:

```go
wrapper.NewClusterClient(wrapper.FQDNCluster{
	FQDN: "my-service.dns", // Service FQDN with suffix
	Port: 8080,
	Host: "optional-host-header", // Optional
})
```

Common FQDN suffixes:
- `.dns` - DNS service
- `.static` - Static IP service (port defaults to 80)
- `.nacos` - Nacos service
### K8sCluster

For Kubernetes services:

```go
wrapper.NewClusterClient(wrapper.K8sCluster{
	ServiceName: "my-service",
	Namespace:   "default",
	Port:        8080,
	Version:     "", // Optional subset version
})
// Generates: outbound|8080||my-service.default.svc.cluster.local
```

### NacosCluster

For Nacos registry services:

```go
wrapper.NewClusterClient(wrapper.NacosCluster{
	ServiceName:   "my-service",
	Group:         "DEFAULT-GROUP",
	NamespaceID:   "public",
	Port:          8080,
	IsExtRegistry: false, // true for EDAS/SAE
})
```
### StaticIpCluster

For static IP services:

```go
wrapper.NewClusterClient(wrapper.StaticIpCluster{
	ServiceName: "my-service",
	Port:        8080,
})
// Generates: outbound|8080||my-service.static
```

### DnsCluster

For DNS-resolved services:

```go
wrapper.NewClusterClient(wrapper.DnsCluster{
	ServiceName: "my-service",
	Domain:      "api.example.com",
	Port:        443,
})
```

### RouteCluster

Use the current route's upstream:

```go
wrapper.NewClusterClient(wrapper.RouteCluster{
	Host: "optional-host-override",
})
```

### TargetCluster

Specify the cluster name directly:

```go
wrapper.NewClusterClient(wrapper.TargetCluster{
	Cluster: "outbound|8080||my-service.dns",
	Host:    "api.example.com",
})
```
## HTTP Methods

```go
client.Get(path, headers, callback, timeout...)
client.Post(path, headers, body, callback, timeout...)
client.Put(path, headers, body, callback, timeout...)
client.Patch(path, headers, body, callback, timeout...)
client.Delete(path, headers, body, callback, timeout...)
client.Head(path, headers, callback, timeout...)
client.Options(path, headers, callback, timeout...)
client.Call(method, path, headers, body, callback, timeout...)
```

## Callback Signature

```go
func(statusCode int, responseHeaders http.Header, responseBody []byte)
```
## Complete Example

```go
type MyConfig struct {
	client      wrapper.HttpClient
	requestPath string
	tokenHeader string
}

func parseConfig(json gjson.Result, config *MyConfig) error {
	config.tokenHeader = json.Get("tokenHeader").String()
	if config.tokenHeader == "" {
		return errors.New("missing tokenHeader")
	}

	config.requestPath = json.Get("requestPath").String()
	if config.requestPath == "" {
		return errors.New("missing requestPath")
	}

	serviceName := json.Get("serviceName").String()
	servicePort := json.Get("servicePort").Int()
	if servicePort == 0 {
		if strings.HasSuffix(serviceName, ".static") {
			servicePort = 80
		}
	}

	config.client = wrapper.NewClusterClient(wrapper.FQDNCluster{
		FQDN: serviceName,
		Port: servicePort,
	})
	return nil
}

func onHttpRequestHeaders(ctx wrapper.HttpContext, config MyConfig) types.Action {
	err := config.client.Get(config.requestPath, nil,
		func(statusCode int, responseHeaders http.Header, responseBody []byte) {
			if statusCode != http.StatusOK {
				log.Errorf("http call failed, status: %d", statusCode)
				proxywasm.SendHttpResponse(http.StatusInternalServerError, nil,
					[]byte("http call failed"), -1)
				return
			}

			token := responseHeaders.Get(config.tokenHeader)
			if token != "" {
				proxywasm.AddHttpRequestHeader(config.tokenHeader, token)
			}
			proxywasm.ResumeHttpRequest()
		})

	if err != nil {
		log.Errorf("http call dispatch failed: %v", err)
		return types.HeaderContinue
	}
	return types.HeaderStopAllIterationAndWatermark
}
```

## Important Notes

1. **Cannot use net/http** - Must use the wrapper's HTTP client
2. **The default timeout is 500ms** - Pass an explicit timeout for longer calls
3. **The callback is async** - Return `HeaderStopAllIterationAndWatermark` and call `ResumeHttpRequest()` in the callback
4. **Error handling** - If the dispatch fails, return `HeaderContinue` to avoid blocking
@@ -0,0 +1,189 @@
# Local Testing with Docker Compose

## Prerequisites

- Docker installed
- A compiled `main.wasm` file

## Setup

Create these files in your plugin directory:

### docker-compose.yaml
```yaml
version: '3.7'
services:
  envoy:
    image: higress-registry.cn-hangzhou.cr.aliyuncs.com/higress/gateway:v2.1.5
    entrypoint: /usr/local/bin/envoy
    command: -c /etc/envoy/envoy.yaml --component-log-level wasm:debug
    depends_on:
      - httpbin
    networks:
      - wasmtest
    ports:
      - "10000:10000"
    volumes:
      - ./envoy.yaml:/etc/envoy/envoy.yaml
      - ./main.wasm:/etc/envoy/main.wasm

  httpbin:
    image: kennethreitz/httpbin:latest
    networks:
      - wasmtest
    ports:
      - "12345:80"

networks:
  wasmtest: {}
```
### envoy.yaml
|
||||
|
||||
```yaml
|
||||
admin:
|
||||
address:
|
||||
socket_address:
|
||||
protocol: TCP
|
||||
address: 0.0.0.0
|
||||
port_value: 9901
|
||||
|
||||
static_resources:
|
||||
listeners:
|
||||
- name: listener_0
|
||||
address:
|
||||
socket_address:
|
||||
protocol: TCP
|
||||
address: 0.0.0.0
|
||||
port_value: 10000
|
||||
filter_chains:
|
||||
- filters:
|
||||
- name: envoy.filters.network.http_connection_manager
|
||||
typed_config:
|
||||
"@type": type.googleapis.com/envoy.extensions.filters.network.http_connection_manager.v3.HttpConnectionManager
|
||||
scheme_header_transformation:
|
||||
scheme_to_overwrite: https
|
||||
stat_prefix: ingress_http
|
||||
route_config:
|
||||
name: local_route
|
||||
virtual_hosts:
|
||||
- name: local_service
|
||||
domains: ["*"]
|
||||
routes:
|
||||
- match:
|
||||
prefix: "/"
|
||||
route:
|
||||
cluster: httpbin
|
||||
http_filters:
|
||||
- name: wasmdemo
|
||||
typed_config:
|
||||
"@type": type.googleapis.com/udpa.type.v1.TypedStruct
|
||||
type_url: type.googleapis.com/envoy.extensions.filters.http.wasm.v3.Wasm
|
||||
value:
|
||||
config:
|
||||
name: wasmdemo
|
||||
vm_config:
|
||||
runtime: envoy.wasm.runtime.v8
|
||||
code:
|
||||
local:
|
||||
filename: /etc/envoy/main.wasm
|
||||
configuration:
|
||||
"@type": "type.googleapis.com/google.protobuf.StringValue"
|
||||
value: |
|
||||
{
|
||||
"mockEnable": false
|
||||
}
|
||||
- name: envoy.filters.http.router
|
||||
typed_config:
|
||||
"@type": type.googleapis.com/envoy.extensions.filters.http.router.v3.Router
|
||||
|
||||
clusters:
|
||||
- name: httpbin
|
||||
connect_timeout: 30s
|
||||
type: LOGICAL_DNS
|
||||
dns_lookup_family: V4_ONLY
|
||||
lb_policy: ROUND_ROBIN
|
||||
load_assignment:
|
||||
cluster_name: httpbin
|
||||
endpoints:
|
||||
- lb_endpoints:
|
||||
- endpoint:
|
||||
address:
|
||||
socket_address:
|
||||
address: httpbin
|
||||
port_value: 80
|
||||
```

## Running

```bash
# Start
docker compose up

# Test without gateway (baseline)
curl http://127.0.0.1:12345/get

# Test with gateway (plugin applied)
curl http://127.0.0.1:10000/get

# Stop
docker compose down
```

## Modifying Plugin Config

1. Edit the `configuration.value` section in `envoy.yaml`
2. Restart: `docker compose restart envoy`

## Viewing Logs

```bash
# Follow Envoy logs
docker compose logs -f envoy

# WASM debug logs (enabled by --component-log-level wasm:debug)
```

## Adding External Services

To test external HTTP/Redis calls, add services to docker-compose.yaml:

```yaml
services:
  # ... existing services ...

  redis:
    image: redis:7-alpine
    networks:
      - wasmtest
    ports:
      - "6379:6379"

  auth-service:
    image: your-auth-service:latest
    networks:
      - wasmtest
```

Then add clusters to envoy.yaml:

```yaml
clusters:
  # ... existing clusters ...

  - name: outbound|6379||redis.static
    connect_timeout: 5s
    type: LOGICAL_DNS
    dns_lookup_family: V4_ONLY
    lb_policy: ROUND_ROBIN
    load_assignment:
      cluster_name: redis
      endpoints:
      - lb_endpoints:
        - endpoint:
            address:
              socket_address:
                address: redis
                port_value: 6379
```
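The cluster name `outbound|6379||redis.static` above is not arbitrary: it follows the `outbound|<port>||<service>.<suffix>` pattern that the wrapper SDK uses to map a service descriptor to an Envoy cluster. A minimal sketch of the naming rule, inferred from the example above (the authoritative mapping lives in the wrapper SDK):

```go
package main

import "fmt"

// clusterName builds an Envoy cluster name in the convention used above:
// outbound|<port>||<service>.<suffix>, e.g. suffix "static" for
// fixed-address services.
func clusterName(service string, port int, suffix string) string {
	return fmt.Sprintf("outbound|%d||%s.%s", port, service, suffix)
}

func main() {
	fmt.Println(clusterName("redis", 6379, "static")) // outbound|6379||redis.static
}
```

If the name in envoy.yaml does not match what the plugin's cluster descriptor produces, dispatch calls fail at runtime, so keep the two in sync.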

---

`.claude/skills/higress-wasm-go-plugin/references/redis-client.md` (new file, 215 lines)

# Redis Client Reference

## Initialization

```go
type MyConfig struct {
	redis wrapper.RedisClient
	qpm   int
}

func parseConfig(json gjson.Result, config *MyConfig) error {
	serviceName := json.Get("serviceName").String()
	servicePort := json.Get("servicePort").Int()
	if servicePort == 0 {
		servicePort = 6379
	}

	config.redis = wrapper.NewRedisClusterClient(wrapper.FQDNCluster{
		FQDN: serviceName,
		Port: servicePort,
	})

	return config.redis.Init(
		json.Get("username").String(),
		json.Get("password").String(),
		json.Get("timeout").Int(), // milliseconds
		// Optional settings:
		// wrapper.WithDataBase(1),
		// wrapper.WithBufferFlushTimeout(3*time.Millisecond),
		// wrapper.WithMaxBufferSizeBeforeFlush(1024),
		// wrapper.WithDisableBuffer(), // For latency-sensitive scenarios
	)
}
```
## Callback Signature

```go
func(response resp.Value)

// Check for errors
if response.Error() != nil {
	// Handle error
}

// Get values
response.Integer() // int
response.String()  // string
response.Bool()    // bool
response.Array()   // []resp.Value
response.Bytes()   // []byte
```

## Available Commands

### Key Operations

```go
redis.Del(key, callback)
redis.Exists(key, callback)
redis.Expire(key, ttlSeconds, callback)
redis.Persist(key, callback)
```

### String Operations

```go
redis.Get(key, callback)
redis.Set(key, value, callback)
redis.SetEx(key, value, ttlSeconds, callback)
redis.SetNX(key, value, ttlSeconds, callback) // ttl=0 means no expiry
redis.MGet(keys, callback)
redis.MSet(kvMap, callback)
redis.Incr(key, callback)
redis.Decr(key, callback)
redis.IncrBy(key, delta, callback)
redis.DecrBy(key, delta, callback)
```
### List Operations

```go
redis.LLen(key, callback)
redis.RPush(key, values, callback)
redis.RPop(key, callback)
redis.LPush(key, values, callback)
redis.LPop(key, callback)
redis.LIndex(key, index, callback)
redis.LRange(key, start, stop, callback)
redis.LRem(key, count, value, callback)
redis.LInsertBefore(key, pivot, value, callback)
redis.LInsertAfter(key, pivot, value, callback)
```

### Hash Operations

```go
redis.HExists(key, field, callback)
redis.HDel(key, fields, callback)
redis.HLen(key, callback)
redis.HGet(key, field, callback)
redis.HSet(key, field, value, callback)
redis.HMGet(key, fields, callback)
redis.HMSet(key, kvMap, callback)
redis.HKeys(key, callback)
redis.HVals(key, callback)
redis.HGetAll(key, callback)
redis.HIncrBy(key, field, delta, callback)
redis.HIncrByFloat(key, field, delta, callback)
```

### Set Operations

```go
redis.SCard(key, callback)
redis.SAdd(key, values, callback)
redis.SRem(key, values, callback)
redis.SIsMember(key, value, callback)
redis.SMembers(key, callback)
redis.SDiff(key1, key2, callback)
redis.SDiffStore(dest, key1, key2, callback)
redis.SInter(key1, key2, callback)
redis.SInterStore(dest, key1, key2, callback)
redis.SUnion(key1, key2, callback)
redis.SUnionStore(dest, key1, key2, callback)
```

### Sorted Set Operations

```go
redis.ZCard(key, callback)
redis.ZAdd(key, memberScoreMap, callback)
redis.ZCount(key, min, max, callback)
redis.ZIncrBy(key, member, delta, callback)
redis.ZScore(key, member, callback)
redis.ZRank(key, member, callback)
redis.ZRevRank(key, member, callback)
redis.ZRem(key, members, callback)
redis.ZRange(key, start, stop, callback)
redis.ZRevRange(key, start, stop, callback)
```

### Lua Script

```go
redis.Eval(script, numkeys, keys, args, callback)
```

### Raw Command

```go
redis.Command([]interface{}{"SET", "key", "value"}, callback)
```
## Rate Limiting Example

```go
func onHttpRequestHeaders(ctx wrapper.HttpContext, config MyConfig) types.Action {
	now := time.Now()
	minuteAligned := now.Truncate(time.Minute)
	timeStamp := strconv.FormatInt(minuteAligned.Unix(), 10)

	err := config.redis.Incr(timeStamp, func(response resp.Value) {
		if response.Error() != nil {
			log.Errorf("redis error: %v", response.Error())
			proxywasm.ResumeHttpRequest()
			return
		}

		count := response.Integer()
		ctx.SetContext("timeStamp", timeStamp)
		ctx.SetContext("callTimeLeft", strconv.Itoa(config.qpm-count))

		if count == 1 {
			// First request in this minute, set expiry
			config.redis.Expire(timeStamp, 60, func(response resp.Value) {
				if response.Error() != nil {
					log.Errorf("expire error: %v", response.Error())
				}
				proxywasm.ResumeHttpRequest()
			})
		} else if count > config.qpm {
			proxywasm.SendHttpResponse(429, [][2]string{
				{"timeStamp", timeStamp},
				{"callTimeLeft", "0"},
			}, []byte("Too many requests\n"), -1)
		} else {
			proxywasm.ResumeHttpRequest()
		}
	})

	if err != nil {
		log.Errorf("redis call failed: %v", err)
		return types.HeaderContinue
	}
	return types.HeaderStopAllIterationAndWatermark
}

func onHttpResponseHeaders(ctx wrapper.HttpContext, config MyConfig) types.Action {
	if ts := ctx.GetContext("timeStamp"); ts != nil {
		proxywasm.AddHttpResponseHeader("timeStamp", ts.(string))
	}
	if left := ctx.GetContext("callTimeLeft"); left != nil {
		proxywasm.AddHttpResponseHeader("callTimeLeft", left.(string))
	}
	return types.HeaderContinue
}
```

## Important Notes

1. **Check Ready()** - `redis.Ready()` returns false if init failed
2. **Auto-reconnect** - Client handles NOAUTH errors and re-authenticates automatically
3. **Buffering** - Default 3ms flush timeout and 1024-byte buffer; use `WithDisableBuffer()` for latency-sensitive scenarios
4. **Error handling** - Always check `response.Error()` in callbacks
---

`.claude/skills/nginx-to-higress-migration/README.md` (new file, 495 lines)

# Nginx to Higress Migration Skill

Complete end-to-end solution for migrating from ingress-nginx to the Higress gateway, featuring intelligent compatibility validation, an automated migration toolchain, and AI-driven capability enhancement.

## Overview

This skill is built on real-world production migration experience, providing:
- 🔍 **Configuration Analysis & Compatibility Assessment**: Automated scanning of nginx Ingress configurations to identify migration risks
- 🧪 **Kind Cluster Simulation**: Fast local verification of configuration compatibility to ensure safe migration
- 🚀 **Gradual Migration Strategy**: Phased migration approach to minimize business risk
- 🤖 **AI-Driven Capability Enhancement**: Automated WASM plugin development to fill gaps in Higress functionality

## Core Advantages

### 🎯 Simple Mode: Zero-Configuration Migration

**For standard Ingress resources with common nginx annotations:**

✅ **100% Annotation Compatibility** - All standard `nginx.ingress.kubernetes.io/*` annotations work out of the box
✅ **Zero Configuration Changes** - Apply your existing Ingress YAML directly to Higress
✅ **Instant Migration** - No learning curve, no manual conversion, no risk
✅ **Parallel Deployment** - Install Higress alongside nginx for safe testing
**Example:**
```yaml
# Your existing nginx Ingress - works immediately on Higress
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  annotations:
    nginx.ingress.kubernetes.io/rewrite-target: /api/$2
    nginx.ingress.kubernetes.io/rate-limit: "100"
    nginx.ingress.kubernetes.io/cors-allow-origin: "*"
spec:
  ingressClassName: nginx  # Same class name, both controllers watch it
  rules:
  - host: api.example.com
    http:
      paths:
      - path: /v1(/|$)(.*)
        pathType: Prefix
        backend:
          service:
            name: backend
            port:
              number: 8080
```
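The `rewrite-target` annotation above is a capture-group substitution: the second group of the path regex is substituted for `$2`. Its effect can be sketched with a plain regular expression (illustrative only; the controller performs this rewrite internally):

```go
package main

import (
	"fmt"
	"regexp"
)

// rewrite applies the path regex /v1(/|$)(.*) with
// rewrite-target /api/$2, as in the Ingress above.
func rewrite(path string) string {
	re := regexp.MustCompile(`/v1(/|$)(.*)`)
	return re.ReplaceAllString(path, "/api/$2")
}

func main() {
	fmt.Println(rewrite("/v1/users/42")) // /api/users/42
}
```

So a request to `/v1/users/42` reaches the backend as `/api/users/42`; the `(/|$)` group also lets a bare `/v1` rewrite to `/api/`.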

**No conversion needed. No manual rewrite. Just deploy and validate.**

### ⚙️ Complex Mode: Full DevOps Automation for Custom Plugins

**When nginx snippets or custom Lua logic require WASM plugins:**

✅ **Automated Requirement Analysis** - AI extracts functionality from nginx snippets
✅ **Code Generation** - Type-safe Go code with the proxy-wasm SDK, generated automatically
✅ **Build & Validation** - Compile, test, and package as OCI images
✅ **Production Deployment** - Push to a registry and deploy the WasmPlugin CRD

**Complete workflow automation:**
```
nginx snippet → AI analysis → Go WASM code → Build → Test → Deploy → Validate
      ↓             ↓              ↓            ↓       ↓       ↓         ↓
   minutes       seconds        seconds     seconds   1min  instant   instant
```

**Example: Custom IP-based routing + HMAC signature validation**

**Original nginx snippet:**
```nginx
location /payment {
    access_by_lua_block {
        local client_ip = ngx.var.remote_addr
        local signature = ngx.req.get_headers()["X-Signature"]
        -- Complex IP routing and HMAC validation logic
        if not validate_signature(signature) then
            ngx.exit(403)
        end
    }
}
```

**AI-generated WASM plugin** (automatic):
1. Analyze requirement: IP routing + HMAC-SHA256 validation
2. Generate Go code with proper error handling
3. Build, test, deploy - **fully automated**

**Result**: Original functionality preserved, business logic unchanged, zero manual coding required.
## Migration Workflow
|
||||
|
||||
### Mode 1: Simple Migration (Standard Ingress)
|
||||
|
||||
**Prerequisites**: Your Ingress uses standard annotations (check with `kubectl get ingress -A -o yaml`)
|
||||
|
||||
**Steps:**
|
||||
```bash
|
||||
# 1. Install Higress alongside nginx (same ingressClass)
|
||||
helm install higress higress/higress \
|
||||
-n higress-system --create-namespace \
|
||||
--set global.ingressClass=nginx \
|
||||
--set global.enableStatus=false
|
||||
|
||||
# 2. Generate validation tests
|
||||
./scripts/generate-migration-test.sh > test.sh
|
||||
|
||||
# 3. Run tests against Higress gateway
|
||||
./test.sh ${HIGRESS_IP}
|
||||
|
||||
# 4. If all tests pass → switch traffic (DNS/LB)
|
||||
# nginx continues running as fallback
|
||||
```
|
||||
|
||||
**Timeline**: 30 minutes for 50+ Ingress resources (including validation)
|
||||
|
||||
### Mode 2: Complex Migration (Custom Snippets/Lua)

**Prerequisites**: Your Ingress uses `server-snippet`, `configuration-snippet`, or Lua logic

**Steps:**
```bash
# 1. Analyze incompatible features
./scripts/analyze-ingress.sh

# 2. For each snippet:
#    - AI reads the snippet
#    - Designs the WASM plugin architecture
#    - Generates type-safe Go code
#    - Builds and validates

# 3. Deploy plugins
kubectl apply -f generated-wasm-plugins/

# 4. Validate + switch traffic
```

**Timeline**: 1-2 hours including AI-driven plugin development

## AI Execution Example

**User**: "Migrate my nginx Ingress to Higress"

**AI Agent Workflow**:

1. **Discovery**
   ```bash
   kubectl get ingress -A -o yaml > backup.yaml
   kubectl get configmap -n ingress-nginx ingress-nginx-controller -o yaml
   ```

2. **Compatibility Analysis**
   - ✅ Standard annotations: direct migration
   - ⚠️ Snippet annotations: require WASM plugins
   - Identify patterns: rate limiting, auth, routing logic

3. **Parallel Deployment**
   ```bash
   helm install higress higress/higress -n higress-system \
     --set global.ingressClass=nginx \
     --set global.enableStatus=false
   ```

4. **Automated Testing**
   ```bash
   ./scripts/generate-migration-test.sh > test.sh
   ./test.sh ${HIGRESS_IP}
   # ✅ 60/60 routes passed
   ```

5. **Plugin Development** (if needed)
   - Read the `higress-wasm-go-plugin` skill
   - Generate Go code for the custom logic
   - Build, validate, deploy
   - Re-test affected routes

6. **Gradual Cutover**
   - Phase 1: 10% traffic → validate
   - Phase 2: 50% traffic → monitor
   - Phase 3: 100% traffic → decommission nginx

## Production Case Studies

### Case 1: E-Commerce API Gateway (60+ Ingress Resources)

**Environment**:
- 60+ Ingress resources
- 3-node HA cluster
- TLS termination for 15+ domains
- Rate limiting, CORS, JWT auth

**Migration**:
```yaml
# Example Ingress (one of 60+)
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: product-api
  annotations:
    nginx.ingress.kubernetes.io/rewrite-target: /$2
    nginx.ingress.kubernetes.io/rate-limit: "1000"
    nginx.ingress.kubernetes.io/cors-allow-origin: "https://shop.example.com"
    nginx.ingress.kubernetes.io/auth-url: "http://auth-service/validate"
spec:
  ingressClassName: nginx
  tls:
  - hosts:
    - api.example.com
    secretName: api-tls
  rules:
  - host: api.example.com
    http:
      paths:
      - path: /api(/|$)(.*)
        pathType: Prefix
        backend:
          service:
            name: product-service
            port:
              number: 8080
```

**Validation in Kind cluster**:
```bash
# Apply directly without modification
kubectl apply -f product-api-ingress.yaml

# Test all functionality
curl https://api.example.com/api/products/123
# ✅ URL rewrite: /products/123 (correct)
# ✅ Rate limiting: active
# ✅ CORS headers: injected
# ✅ Auth validation: working
# ✅ TLS certificate: valid
```

**Results**:

| Metric | Value | Notes |
|--------|-------|-------|
| Ingress resources migrated | 60+ | Zero modification |
| Annotation types supported | 20+ | 100% compatibility |
| TLS certificates | 15+ | Direct secret reuse |
| Configuration changes | **0** | No YAML edits needed |
| Migration time | **30 min** | Including validation |
| Downtime | **0 sec** | Zero-downtime cutover |
| Rollback needed | **0** | All tests passed |

### Case 2: Financial Services with Custom Auth Logic

**Challenge**: The payment service required custom IP-based routing plus HMAC-SHA256 request-signing validation (implemented as an nginx Lua snippet)

**Original nginx configuration**:
```nginx
location /payment/process {
    access_by_lua_block {
        local client_ip = ngx.var.remote_addr
        local signature = ngx.req.get_headers()["X-Payment-Signature"]
        local timestamp = ngx.req.get_headers()["X-Timestamp"]

        -- IP allowlist check
        if not is_allowed_ip(client_ip) then
            ngx.log(ngx.ERR, "Blocked IP: " .. client_ip)
            ngx.exit(403)
        end

        -- HMAC-SHA256 signature validation
        local payload = ngx.var.request_uri .. timestamp
        local expected_sig = compute_hmac_sha256(payload, secret_key)

        if signature ~= expected_sig then
            ngx.log(ngx.ERR, "Invalid signature from: " .. client_ip)
            ngx.exit(403)
        end
    }
}
```

**AI-Driven Plugin Development**:

1. **Requirement Analysis** (AI reads snippet)
   - IP allowlist validation
   - HMAC-SHA256 signature verification
   - Request timestamp validation
   - Error logging requirements

2. **Auto-Generated WASM Plugin** (Go)
   ```go
   // Auto-generated by AI agent (excerpt; isAllowedIP, computeHMAC,
   // and secretKey are defined elsewhere in the generated plugin)
   package main

   import (
   	"crypto/hmac"
   	"crypto/sha256"
   	"encoding/hex"

   	"github.com/tetratelabs/proxy-wasm-go-sdk/proxywasm"
   	"github.com/tetratelabs/proxy-wasm-go-sdk/proxywasm/types"
   )

   type PaymentAuthPlugin struct {
   	types.DefaultHttpContext
   }

   func (ctx *PaymentAuthPlugin) OnHttpRequestHeaders(numHeaders int, endOfStream bool) types.Action {
   	// IP allowlist check
   	clientIP, _ := proxywasm.GetProperty([]string{"source", "address"})
   	if !isAllowedIP(string(clientIP)) {
   		proxywasm.LogError("Blocked IP: " + string(clientIP))
   		proxywasm.SendHttpResponse(403, nil, []byte("Forbidden"), -1)
   		return types.ActionPause
   	}

   	// HMAC signature validation
   	signature, _ := proxywasm.GetHttpRequestHeader("X-Payment-Signature")
   	timestamp, _ := proxywasm.GetHttpRequestHeader("X-Timestamp")
   	uri, _ := proxywasm.GetProperty([]string{"request", "path"})

   	payload := string(uri) + timestamp
   	expectedSig := computeHMAC(payload, secretKey)

   	if signature != expectedSig {
   		proxywasm.LogError("Invalid signature from: " + string(clientIP))
   		proxywasm.SendHttpResponse(403, nil, []byte("Invalid signature"), -1)
   		return types.ActionPause
   	}

   	return types.ActionContinue
   }
   ```

3. **Automated Build & Deployment**
   ```bash
   # AI agent executes automatically:
   go mod tidy
   GOOS=wasip1 GOARCH=wasm go build -buildmode=c-shared -o payment-auth.wasm
   docker build -t registry.example.com/payment-auth:v1 .
   docker push registry.example.com/payment-auth:v1

   kubectl apply -f - <<EOF
   apiVersion: extensions.higress.io/v1alpha1
   kind: WasmPlugin
   metadata:
     name: payment-auth
     namespace: higress-system
   spec:
     url: oci://registry.example.com/payment-auth:v1
     phase: AUTHN
     priority: 100
   EOF
   ```
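The generated plugin above calls a `computeHMAC` helper that the excerpt does not show. A self-contained sketch of what it computes, mirroring the snippet's `compute_hmac_sha256` (the function name and hex encoding are assumptions; match whatever the signing client actually sends):

```go
package main

import (
	"crypto/hmac"
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// computeHMAC returns the hex-encoded HMAC-SHA256 of payload under key.
func computeHMAC(payload, key string) string {
	mac := hmac.New(sha256.New, []byte(key))
	mac.Write([]byte(payload))
	return hex.EncodeToString(mac.Sum(nil))
}

func main() {
	// Same payload construction as the plugin: request path + timestamp.
	sig := computeHMAC("/payment/process"+"1700000000", "secret")
	fmt.Println(len(sig)) // 64: a SHA-256 digest is 32 bytes, hex-encoded
}
```

Comparing signatures with `hmac.Equal` on the raw bytes, rather than `!=` on strings, would also make the check constant-time.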

**Results**:
- ✅ Original functionality preserved (IP check + HMAC validation)
- ✅ Improved security (type-safe code, compiled WASM)
- ✅ Better performance (compiled WASM vs interpreted Lua)
- ✅ Full automation (requirement → deployment in under 10 minutes)
- ✅ Zero business-logic changes required

### Case 3: Multi-Tenant SaaS Platform (Custom Routing)

**Challenge**: Route requests to different backend clusters based on the tenant ID in the JWT token

**AI Solution**:
- Extract the tenant ID from JWT claims
- Generate a WASM plugin for dynamic upstream selection
- Deploy with zero manual coding

**Timeline**: 15 minutes (analysis → code → deploy → validate)

## Key Statistics

### Migration Efficiency

| Metric | Simple Mode | Complex Mode |
|--------|-------------|--------------|
| Configuration compatibility | 100% | 95%+ |
| Manual code changes required | 0 | 0 (AI-generated) |
| Average migration time | 30 min | 1-2 hours |
| Downtime required | 0 | 0 |
| Rollback complexity | Trivial | Simple |

### Production Validation

- **Total Ingress resources migrated**: 200+
- **Environments**: Financial services, e-commerce, SaaS platforms
- **Success rate**: 100% (all production deployments successful)
- **Average configuration compatibility**: 98%
- **Plugin development time saved**: 80% (AI-driven automation)

## When to Use Each Mode

### Use Simple Mode When:
- ✅ Using standard Ingress annotations
- ✅ No custom Lua scripts or snippets
- ✅ Standard features: TLS, routing, rate limiting, CORS, auth
- ✅ Need the fastest migration path

### Use Complex Mode When:
- ⚠️ Using `server-snippet`, `configuration-snippet`, `http-snippet`
- ⚠️ Custom Lua logic in annotations
- ⚠️ Advanced nginx features (variables, complex rewrites)
- ⚠️ Need to preserve custom business logic

## Prerequisites

### For Simple Mode:
- kubectl with cluster access
- helm 3.x

### For Complex Mode (additional):
- Go 1.24+ (for WASM plugin development)
- Docker (for plugin image builds)
- Image registry access (Harbor, DockerHub, ACR, etc.)

## Quick Start

### 1. Analyze Your Current Setup
```bash
# Clone this skill
git clone https://github.com/alibaba/higress.git
cd higress/.claude/skills/nginx-to-higress-migration

# Check for snippet usage (complex mode indicator)
kubectl get ingress -A -o yaml | grep -E "snippet" | wc -l

# If output is 0 → Simple mode
# If output > 0 → Complex mode (AI will handle plugin generation)
```

### 2. Local Validation (Kind)
```bash
# Create Kind cluster
kind create cluster --name higress-test

# Install Higress
helm install higress higress/higress \
  -n higress-system --create-namespace \
  --set global.ingressClass=nginx

# Apply your Ingress resources
kubectl apply -f your-ingress.yaml

# Validate
kubectl port-forward -n higress-system svc/higress-gateway 8080:80 &
curl -H "Host: your-domain.com" http://localhost:8080/
```

### 3. Production Migration
```bash
# Generate test script
./scripts/generate-migration-test.sh > test.sh

# Get Higress IP
HIGRESS_IP=$(kubectl get svc -n higress-system higress-gateway \
  -o jsonpath='{.status.loadBalancer.ingress[0].ip}')

# Run validation
./test.sh ${HIGRESS_IP}

# If all tests pass → switch traffic (DNS/LB)
```

## Best Practices

1. **Always validate locally first** - Kind cluster testing catches 95%+ of issues
2. **Keep nginx running during migration** - Enables instant rollback if needed
3. **Use gradual traffic cutover** - 10% → 50% → 100% with monitoring
4. **Leverage AI for plugin development** - 80% time savings vs manual coding
5. **Document custom plugins** - AI-generated code includes inline documentation

## Common Questions

### Q: Do I need to modify my Ingress YAML?
**A**: No. Standard Ingress resources with common annotations work directly on Higress.

### Q: What about nginx ConfigMap settings?
**A**: The AI agent analyzes the ConfigMap and generates WASM plugins if needed to preserve functionality.

### Q: How do I roll back if something goes wrong?
**A**: Since nginx continues running during migration, just switch traffic back (DNS/LB). Recommended: keep nginx for 1 week post-migration.

### Q: How does WASM plugin performance compare to Lua?
**A**: WASM plugins are compiled (vs interpreted Lua), so they are typically faster and more secure.

### Q: Can I customize the AI-generated plugin code?
**A**: Yes. All generated code is standard Go with a clear structure, easy to modify if needed.

## Related Resources

- [Higress Official Documentation](https://higress.io/)
- [Nginx Ingress Controller](https://kubernetes.github.io/ingress-nginx/)
- [WASM Plugin Development Guide](./SKILL.md)
- [Annotation Compatibility Matrix](./references/annotation-mapping.md)
- [Built-in Plugin Catalog](./references/builtin-plugins.md)

---

**Language**: [English](./README.md) | [中文](./README_CN.md)

---

`.claude/skills/nginx-to-higress-migration/README_CN.md` (new file, 495 lines; Chinese version of the README above, translated below)

# Nginx to Higress Migration Skill

A one-stop solution for migrating from ingress-nginx to the Higress gateway, providing intelligent compatibility validation, an automated migration toolchain, and AI-driven capability enhancement.

## Overview

This skill is built on real-world production migration experience, providing:
- 🔍 **Configuration Analysis & Compatibility Assessment**: Automatically scan nginx Ingress configurations and identify migration risks
- 🧪 **Kind Cluster Simulation**: Quickly verify configuration compatibility locally to ensure a safe migration
- 🚀 **Gradual Migration Strategy**: A phased migration approach that minimizes business risk
- 🤖 **AI-Driven Capability Enhancement**: Automated WASM plugin development to fill gaps in Higress functionality

## Core Advantages

### 🎯 Simple Mode: Zero-Configuration Migration

**For Ingress resources that use standard annotations:**

✅ **100% Annotation Compatibility** - All standard `nginx.ingress.kubernetes.io/*` annotations work out of the box
✅ **Zero Configuration Changes** - Apply your existing Ingress YAML directly to Higress
✅ **Instant Migration** - No learning curve, no manual conversion, no risk
✅ **Parallel Deployment** - Higress runs alongside nginx for safe testing

**Example:**
```yaml
# Your existing nginx Ingress - works immediately on Higress
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  annotations:
    nginx.ingress.kubernetes.io/rewrite-target: /api/$2
    nginx.ingress.kubernetes.io/rate-limit: "100"
    nginx.ingress.kubernetes.io/cors-allow-origin: "*"
spec:
  ingressClassName: nginx  # Same class name, both controllers watch it
  rules:
  - host: api.example.com
    http:
      paths:
      - path: /v1(/|$)(.*)
        pathType: Prefix
        backend:
          service:
            name: backend
            port:
              number: 8080
```

**No conversion needed. No manual rewrite. Just deploy and validate.**

### ⚙️ Complex Mode: Full DevOps Automation for Custom Plugins

**When nginx snippets or custom Lua logic require WASM plugins:**

✅ **Automated Requirement Analysis** - AI extracts functional requirements from nginx snippets
✅ **Code Generation** - Type-safe Go code generated automatically with the proxy-wasm SDK
✅ **Build & Validation** - Compile, test, and package as OCI images
✅ **Production Deployment** - Push to an image registry and deploy the WasmPlugin CRD

**Complete workflow automation:**
```
nginx snippet → AI analysis → Go WASM code → Build → Test → Deploy → Validate
      ↓             ↓              ↓           ↓       ↓       ↓         ↓
   minutes       seconds        seconds      1min    1min  instant   instant
```

**Example: Custom IP-based routing + HMAC signature validation**

**Original nginx snippet:**
```nginx
location /payment {
    access_by_lua_block {
        local client_ip = ngx.var.remote_addr
        local signature = ngx.req.get_headers()["X-Signature"]
        -- Complex IP routing and HMAC validation logic
        if not validate_signature(signature) then
            ngx.exit(403)
        end
    }
}
```

**AI-generated WASM plugin** (fully automatic):
1. Analyze the requirement: IP routing + HMAC-SHA256 validation
2. Generate Go code with proper error handling
3. Build, test, deploy - **fully automated**

**Result**: Original functionality preserved, business logic unchanged, no manual coding required.

## Migration Workflow

### Mode 1: Simple Migration (Standard Ingress)

**Prerequisites**: Your Ingress uses standard annotations (check with `kubectl get ingress -A -o yaml`)

**Steps:**
```bash
# 1. Install Higress alongside nginx (same ingressClass)
helm install higress higress/higress \
  -n higress-system --create-namespace \
  --set global.ingressClass=nginx \
  --set global.enableStatus=false

# 2. Generate validation tests
./scripts/generate-migration-test.sh > test.sh

# 3. Run tests against the Higress gateway
./test.sh ${HIGRESS_IP}

# 4. If all tests pass → switch traffic (DNS/LB)
#    nginx keeps running as a fallback
```

**Timeline**: 30 minutes for 50+ Ingress resources (including validation)

### Mode 2: Complex Migration (Custom Snippets/Lua)

**Prerequisites**: Your Ingress uses `server-snippet`, `configuration-snippet`, or Lua logic

**Steps:**
```bash
# 1. Analyze incompatible features
./scripts/analyze-ingress.sh

# 2. For each snippet:
#    - AI reads the snippet
#    - Designs the WASM plugin architecture
#    - Generates type-safe Go code
#    - Builds and validates

# 3. Deploy plugins
kubectl apply -f generated-wasm-plugins/

# 4. Validate + switch traffic
```

**Timeline**: 1-2 hours, including AI-driven plugin development

## AI Execution Example

**User**: "Help me migrate my nginx Ingress to Higress"

**AI Agent Workflow**:

1. **Discovery**
   ```bash
   kubectl get ingress -A -o yaml > backup.yaml
   kubectl get configmap -n ingress-nginx ingress-nginx-controller -o yaml
   ```
|
||||
|
||||
2. **兼容性分析**
|
||||
- ✅ 标准注解:直接迁移
|
||||
- ⚠️ Snippet 注解:需要 WASM 插件
|
||||
- 识别模式:限流、认证、路由逻辑
|
||||
|
||||
3. **Parallel deployment**
   ```bash
   helm install higress higress/higress -n higress-system \
     --set global.ingressClass=nginx \
     --set global.enableStatus=false
   ```

4. **Automated testing**
   ```bash
   ./scripts/generate-migration-test.sh > test.sh
   ./test.sh ${HIGRESS_IP}
   # ✅ 60/60 routes passed
   ```

5. **Plugin development** (if needed)
   - Read the `higress-wasm-go-plugin` skill
   - Generate Go code for the custom logic
   - Build, validate, deploy
   - Re-test the affected routes

6. **Gradual cutover**
   - Stage 1: 10% traffic → validate
   - Stage 2: 50% traffic → monitor
   - Stage 3: 100% traffic → decommission nginx

## Production Case Studies

### Case 1: E-commerce API Gateway (60+ Ingress Resources)

**Environment**:
- 60+ Ingress resources
- 3-node high-availability cluster
- TLS termination for 15+ domains
- Rate limiting, CORS, JWT authentication

**Migration:**
```yaml
# Example Ingress (one of 60+)
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: product-api
  annotations:
    nginx.ingress.kubernetes.io/rewrite-target: /$2
    nginx.ingress.kubernetes.io/rate-limit: "1000"
    nginx.ingress.kubernetes.io/cors-allow-origin: "https://shop.example.com"
    nginx.ingress.kubernetes.io/auth-url: "http://auth-service/validate"
spec:
  ingressClassName: nginx
  tls:
  - hosts:
    - api.example.com
    secretName: api-tls
  rules:
  - host: api.example.com
    http:
      paths:
      - path: /api(/|$)(.*)
        pathType: Prefix
        backend:
          service:
            name: product-service
            port:
              number: 8080
```

**Validation in a Kind cluster**:
```bash
# Apply directly, no modifications
kubectl apply -f product-api-ingress.yaml

# Test all features
curl https://api.example.com/api/products/123
# ✅ URL rewrite: /products/123 (correct)
# ✅ Rate limiting: active
# ✅ CORS headers: injected
# ✅ Auth validation: working
# ✅ TLS certificate: valid
```

**Results**:

| Metric | Value | Notes |
|--------|-------|-------|
| Ingress resources migrated | 60+ | Zero modifications |
| Annotation types supported | 20+ | 100% compatibility |
| TLS certificates | 15+ | Secrets reused directly |
| Configuration changes | **0** | No YAML edits needed |
| Migration time | **30 minutes** | Including validation |
| Downtime | **0 seconds** | Zero-downtime cutover |
| Rollbacks required | **0** | All tests passed |

### Case 2: Custom Authentication Logic for Financial Services

**Challenge**: The payment service required custom IP-based routing plus HMAC-SHA256 request signature verification, implemented as an nginx Lua snippet.

**Original nginx configuration**:
```nginx
location /payment/process {
    access_by_lua_block {
        local client_ip = ngx.var.remote_addr
        local signature = ngx.req.get_headers()["X-Payment-Signature"]
        local timestamp = ngx.req.get_headers()["X-Timestamp"]

        -- IP allowlist check
        if not is_allowed_ip(client_ip) then
            ngx.log(ngx.ERR, "Blocked IP: " .. client_ip)
            ngx.exit(403)
        end

        -- HMAC-SHA256 signature verification
        local payload = ngx.var.request_uri .. timestamp
        local expected_sig = compute_hmac_sha256(payload, secret_key)

        if signature ~= expected_sig then
            ngx.log(ngx.ERR, "Invalid signature from: " .. client_ip)
            ngx.exit(403)
        end
    }
}
```

**AI-driven plugin development**:

1. **Requirements analysis** (AI reads the snippet)
   - IP allowlist validation
   - HMAC-SHA256 signature verification
   - Request timestamp validation
   - Error logging requirements

2. **Auto-generated WASM plugin** (Go)
   ```go
   // Auto-generated by the AI agent
   package main

   import (
       "crypto/hmac"
       "crypto/sha256"
       "encoding/hex"

       "github.com/tetratelabs/proxy-wasm-go-sdk/proxywasm"
       "github.com/tetratelabs/proxy-wasm-go-sdk/proxywasm/types"
   )

   type PaymentAuthPlugin struct {
       proxywasm.DefaultPluginContext
   }

   func (ctx *PaymentAuthPlugin) OnHttpRequestHeaders(numHeaders int, endOfStream bool) types.Action {
       // IP allowlist check
       clientIP, _ := proxywasm.GetProperty([]string{"source", "address"})
       if !isAllowedIP(string(clientIP)) {
           proxywasm.LogError("Blocked IP: " + string(clientIP))
           proxywasm.SendHttpResponse(403, nil, []byte("Forbidden"), -1)
           return types.ActionPause
       }

       // HMAC signature verification
       signature, _ := proxywasm.GetHttpRequestHeader("X-Payment-Signature")
       timestamp, _ := proxywasm.GetHttpRequestHeader("X-Timestamp")
       uri, _ := proxywasm.GetProperty([]string{"request", "path"})

       payload := string(uri) + timestamp
       expectedSig := computeHMAC(payload, secretKey)

       if signature != expectedSig {
           proxywasm.LogError("Invalid signature from: " + string(clientIP))
           proxywasm.SendHttpResponse(403, nil, []byte("Invalid signature"), -1)
           return types.ActionPause
       }

       return types.ActionContinue
   }
   ```

3. **Automated build and deployment**
   ```bash
   # The AI agent runs automatically:
   go mod tidy
   GOOS=wasip1 GOARCH=wasm go build -buildmode=c-shared -o payment-auth.wasm
   docker build -t registry.example.com/payment-auth:v1 .
   docker push registry.example.com/payment-auth:v1

   kubectl apply -f - <<EOF
   apiVersion: extensions.higress.io/v1alpha1
   kind: WasmPlugin
   metadata:
     name: payment-auth
     namespace: higress-system
   spec:
     url: oci://registry.example.com/payment-auth:v1
     phase: AUTHN
     priority: 100
   EOF
   ```

**Results**:
- ✅ Original functionality preserved (IP check + HMAC verification)
- ✅ Improved security (type-safe code, compiled WASM)
- ✅ Better performance (native WASM vs. interpreted Lua)
- ✅ Fully automated (requirements → deployment in under 10 minutes)
- ✅ No business logic changes required

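The generated plugin excerpt above references helpers (`computeHMAC`, `isAllowedIP`) and configuration (`secretKey`) that are not shown. A minimal standalone sketch of what those helpers could look like, mirroring the Lua `compute_hmac_sha256` and `is_allowed_ip` logic (the CIDR ranges are hypothetical, and a real plugin would need to strip the port from Envoy's `source.address` before parsing):

```go
package main

import (
	"crypto/hmac"
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"net/netip"
)

// computeHMAC returns the hex-encoded HMAC-SHA256 of payload,
// mirroring the Lua compute_hmac_sha256 helper.
func computeHMAC(payload, key string) string {
	mac := hmac.New(sha256.New, []byte(key))
	mac.Write([]byte(payload))
	return hex.EncodeToString(mac.Sum(nil))
}

// isAllowedIP checks the client address against a static CIDR allowlist,
// mirroring the Lua is_allowed_ip helper. The ranges are placeholders.
func isAllowedIP(addr string) bool {
	allowlist := []string{"10.0.0.0/8", "192.168.1.0/24"}
	ip, err := netip.ParseAddr(addr)
	if err != nil {
		return false
	}
	for _, cidr := range allowlist {
		prefix, err := netip.ParsePrefix(cidr)
		if err == nil && prefix.Contains(ip) {
			return true
		}
	}
	return false
}

func main() {
	// HMAC-SHA256 hex digests are always 64 characters.
	fmt.Println(len(computeHMAC("/payment/process1700000000", "secret")))
	fmt.Println(isAllowedIP("192.168.1.42"), isAllowedIP("8.8.8.8"))
}
```

For constant-time comparison in production code, `hmac.Equal` on the raw digests is preferable to string equality.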
### Case 3: Multi-tenant SaaS Platform (Custom Routing)

**Challenge**: Route requests to different backend clusters based on the tenant ID in the JWT token

**AI solution**:
- Extract the tenant ID from the JWT claims
- Generate a WASM plugin for dynamic upstream selection
- Deploy with zero manual coding

**Timeline**: 15 minutes (analysis → code → deployment → validation)

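The claim-extraction step of this case can be sketched as follows. This is illustrative only, not the actual plugin: the claim name `tenant_id` and the upstream naming scheme are assumptions, and a real gateway must verify the JWT signature before trusting any claim.

```python
import base64
import json

def tenant_from_jwt(token: str) -> str:
    """Extract the tenant ID claim from an (unverified) JWT payload.

    Illustrative only: production code must verify the signature first.
    """
    payload_b64 = token.split(".")[1]
    payload_b64 += "=" * (-len(payload_b64) % 4)  # restore base64 padding
    claims = json.loads(base64.urlsafe_b64decode(payload_b64))
    return claims["tenant_id"]

def upstream_for(tenant: str) -> str:
    # Hypothetical tenant -> backend cluster mapping
    return f"{tenant}-backend.tenants.svc.cluster.local"

# Build an unsigned token just for demonstration
header = base64.urlsafe_b64encode(b'{"alg":"none"}').rstrip(b"=").decode()
payload = base64.urlsafe_b64encode(json.dumps({"tenant_id": "acme"}).encode()).rstrip(b"=").decode()
token = f"{header}.{payload}."
print(upstream_for(tenant_from_jwt(token)))  # acme-backend.tenants.svc.cluster.local
```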
## Key Statistics

### Migration Efficiency

| Metric | Simple pattern | Complex pattern |
|--------|----------------|-----------------|
| Configuration compatibility | 100% | 95%+ |
| Manual code changes required | 0 | 0 (AI-generated) |
| Average migration time | 30 minutes | 1-2 hours |
| Downtime required | 0 | 0 |
| Rollback complexity | Simple | Simple |

### Production Validation

- **Total Ingress resources migrated**: 200+
- **Environments**: financial services, e-commerce, SaaS platforms
- **Success rate**: 100% (all production deployments succeeded)
- **Average configuration compatibility**: 98%
- **Plugin development time saved**: 80% (AI-driven automation)

## When to Use Each Pattern

### Use the simple pattern when:
- ✅ Using standard Ingress annotations
- ✅ No custom Lua scripts or snippets
- ✅ Standard features: TLS, routing, rate limiting, CORS, authentication
- ✅ You need the fastest migration path

### Use the complex pattern when:
- ⚠️ Using `server-snippet`, `configuration-snippet`, `http-snippet`
- ⚠️ Custom Lua logic in annotations
- ⚠️ Advanced nginx features (variables, complex rewrites)
- ⚠️ Custom business logic must be preserved

## Prerequisites

### Simple pattern:
- kubectl with cluster access
- helm 3.x

### Complex pattern (additionally requires):
- Go 1.24+ (for WASM plugin development)
- Docker (for plugin image builds)
- Image registry access (Harbor, DockerHub, ACR, etc.)

## Quick Start

### 1. Analyze the current setup
```bash
# Clone this skill
git clone https://github.com/alibaba/higress.git
cd higress/.claude/skills/nginx-to-higress-migration

# Check snippet usage (complex-pattern indicator)
kubectl get ingress -A -o yaml | grep -E "snippet" | wc -l

# If the output is 0 → simple pattern
# If the output is > 0 → complex pattern (AI will handle plugin generation)
```

### 2. Local validation (Kind)
```bash
# Create a Kind cluster
kind create cluster --name higress-test

# Install Higress
helm install higress higress/higress \
  -n higress-system --create-namespace \
  --set global.ingressClass=nginx

# Apply Ingress resources
kubectl apply -f your-ingress.yaml

# Verify
kubectl port-forward -n higress-system svc/higress-gateway 8080:80 &
curl -H "Host: your-domain.com" http://localhost:8080/
```

### 3. Production migration
```bash
# Generate the test script
./scripts/generate-migration-test.sh > test.sh

# Get the Higress IP
HIGRESS_IP=$(kubectl get svc -n higress-system higress-gateway \
  -o jsonpath='{.status.loadBalancer.ingress[0].ip}')

# Run validation
./test.sh ${HIGRESS_IP}

# If all tests pass → switch traffic (DNS/LB)
```

## Best Practices

1. **Always validate locally first** - Kind cluster testing catches 95%+ of issues
2. **Keep nginx running during migration** - instant rollback if needed
3. **Use gradual traffic cutover** - 10% → 50% → 100% with monitoring
4. **Leverage AI for plugin development** - saves 80% of the time vs. manual coding
5. **Document custom plugins** - AI-generated code includes inline documentation

## FAQ

### Q: Do I need to modify my Ingress YAML?
**A**: No. Standard Ingress resources using common annotations run on Higress as-is.

### Q: What about nginx ConfigMap settings?
**A**: The AI agent analyzes the ConfigMap and generates WASM plugins where functionality needs to be preserved.

### Q: How do I roll back if something goes wrong?
**A**: Since nginx keeps running during migration, simply switch traffic back (DNS/LB). Recommendation: keep nginx for one week after migration.

### Q: How does WASM plugin performance compare to Lua?
**A**: WASM plugins are compiled (vs. interpreted Lua), so they are typically faster and safer.

### Q: Can I customize the AI-generated plugin code?
**A**: Yes. All generated code is well-structured, standard Go that is easy to modify if needed.

## Related Resources

- [Higress official documentation](https://higress.io/)
- [Nginx Ingress Controller](https://kubernetes.github.io/ingress-nginx/)
- [WASM plugin development guide](./SKILL.md)
- [Annotation compatibility matrix](./references/annotation-mapping.md)
- [Built-in plugin catalog](./references/builtin-plugins.md)

---

**Language**: [English](./README.md) | [中文](./README_CN.md)
.claude/skills/nginx-to-higress-migration/SKILL.md (new file, 477 lines)
---
name: nginx-to-higress-migration
description: "Migrate from ingress-nginx to Higress in Kubernetes environments. Use when (1) analyzing existing ingress-nginx setup (2) reading nginx Ingress resources and ConfigMaps (3) installing Higress via helm with proper ingressClass (4) identifying unsupported nginx annotations (5) generating WASM plugins for nginx snippets/advanced features (6) building and deploying custom plugins to image registry. Supports full migration workflow with compatibility analysis and plugin generation."
---

# Nginx to Higress Migration

Automate migration from ingress-nginx to Higress in Kubernetes environments.

## ⚠️ Critical Limitation: Snippet Annotations NOT Supported

> **Before you begin:** Higress does **NOT** support the following nginx annotations:
> - `nginx.ingress.kubernetes.io/server-snippet`
> - `nginx.ingress.kubernetes.io/configuration-snippet`
> - `nginx.ingress.kubernetes.io/http-snippet`
>
> These annotations will be **silently ignored**, causing functionality loss!
>
> **Pre-migration check (REQUIRED):**
> ```bash
> kubectl get ingress -A -o yaml | grep -E "snippet" | wc -l
> ```
> If the count is > 0, you MUST plan WASM plugin replacements before migration.
> See [Phase 6](#phase-6-use-built-in-plugins-or-create-custom-wasm-plugin-if-needed) for alternatives.

## Prerequisites

- kubectl configured with cluster access
- helm 3.x installed
- Go 1.24+ (for WASM plugin compilation)
- Docker (for plugin image push)

## Pre-Migration Checklist

### Before Starting

- [ ] Backup all Ingress resources
  ```bash
  kubectl get ingress -A -o yaml > ingress-backup.yaml
  ```
- [ ] Identify snippet usage (see warning above)
- [ ] List all nginx annotations in use
  ```bash
  kubectl get ingress -A -o yaml | grep "nginx.ingress.kubernetes.io" | sort | uniq -c
  ```
- [ ] Verify Higress compatibility for each annotation (see [annotation-mapping.md](references/annotation-mapping.md))
- [ ] Plan WASM plugins for unsupported features
- [ ] Prepare a test environment (Kind/Minikube recommended)

### During Migration

- [ ] Install Higress in parallel with nginx
- [ ] Verify all pods are running in the higress-system namespace
- [ ] Run the test script against the Higress gateway
- [ ] Compare responses between nginx and Higress
- [ ] Deploy any required WASM plugins
- [ ] Configure monitoring/alerting

### After Migration

- [ ] All routes verified working
- [ ] Custom functionality (snippet replacements) tested
- [ ] Monitoring dashboards configured
- [ ] Team trained on Higress operations
- [ ] Documentation updated
- [ ] Rollback procedure tested

## Migration Workflow

### Phase 1: Discovery

```bash
# Check for ingress-nginx installation
kubectl get pods -A | grep ingress-nginx
kubectl get ingressclass

# List all Ingress resources using the nginx class
kubectl get ingress -A -o json | jq '.items[] | select(.spec.ingressClassName=="nginx" or .metadata.annotations["kubernetes.io/ingress.class"]=="nginx")'

# Get the nginx ConfigMap
kubectl get configmap -n ingress-nginx ingress-nginx-controller -o yaml
```

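The jq selection above can also be done in code when post-processing a saved backup; a small sketch of the same filter (field names follow the Kubernetes Ingress schema, the sample objects are made up):

```python
import json

def nginx_class_ingresses(items):
    """Select Ingress objects bound to the nginx class, mirroring the jq
    filter: spec.ingressClassName or the legacy ingress.class annotation."""
    selected = []
    for ing in items:
        spec_class = ing.get("spec", {}).get("ingressClassName")
        anno_class = ing.get("metadata", {}).get("annotations", {}).get(
            "kubernetes.io/ingress.class")
        if spec_class == "nginx" or anno_class == "nginx":
            selected.append(ing["metadata"]["name"])
    return selected

items = [
    {"metadata": {"name": "a", "annotations": {}}, "spec": {"ingressClassName": "nginx"}},
    {"metadata": {"name": "b", "annotations": {"kubernetes.io/ingress.class": "nginx"}}, "spec": {}},
    {"metadata": {"name": "c", "annotations": {}}, "spec": {"ingressClassName": "higress"}},
]
print(nginx_class_ingresses(items))  # ['a', 'b']
```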
### Phase 2: Compatibility Analysis

Run the analysis script to identify unsupported features:

```bash
./scripts/analyze-ingress.sh [namespace]
```

**Key point: no Ingress modification needed!**

Higress natively supports `nginx.ingress.kubernetes.io/*` annotations - your existing Ingress resources work as-is.

See [references/annotation-mapping.md](references/annotation-mapping.md) for the complete list of supported annotations.

**Unsupported annotations** (require a built-in plugin or custom WASM plugin):
- `nginx.ingress.kubernetes.io/server-snippet`
- `nginx.ingress.kubernetes.io/configuration-snippet`
- `nginx.ingress.kubernetes.io/lua-resty-waf*`
- Complex Lua logic in snippets

For these, check [references/builtin-plugins.md](references/builtin-plugins.md) first - Higress may already have a plugin!

### Phase 3: Higress Installation (Parallel with nginx)

Higress natively supports `nginx.ingress.kubernetes.io/*` annotations. Install Higress **alongside** nginx for safe parallel testing.

```bash
# 1. Get the current nginx ingressClass name
INGRESS_CLASS=$(kubectl get ingressclass -o jsonpath='{.items[?(@.spec.controller=="k8s.io/ingress-nginx")].metadata.name}')
echo "Current nginx ingressClass: $INGRESS_CLASS"

# 2. Detect timezone and select the nearest registry
# China/Asia: higress-registry.cn-hangzhou.cr.aliyuncs.com (default)
# North America: higress-registry.us-west-1.cr.aliyuncs.com
# Southeast Asia: higress-registry.ap-southeast-7.cr.aliyuncs.com
TZ_OFFSET=$(date +%z)
case "$TZ_OFFSET" in
  -1*|-0*) REGISTRY="higress-registry.us-west-1.cr.aliyuncs.com" ;;           # Americas
  +07*|+08*|+09*) REGISTRY="higress-registry.cn-hangzhou.cr.aliyuncs.com" ;;  # Asia
  +05*|+06*) REGISTRY="higress-registry.ap-southeast-7.cr.aliyuncs.com" ;;    # Southeast Asia
  *) REGISTRY="higress-registry.cn-hangzhou.cr.aliyuncs.com" ;;               # Default
esac
echo "Using registry: $REGISTRY"

# 3. Add the Higress repo
helm repo add higress https://higress.io/helm-charts
helm repo update

# 4. Install Higress with parallel-safe settings
# Note: override ALL component hubs to use the selected registry
helm install higress higress/higress \
  -n higress-system --create-namespace \
  --set global.ingressClass=${INGRESS_CLASS:-nginx} \
  --set global.hub=${REGISTRY}/higress \
  --set global.enableStatus=false \
  --set higress-core.controller.hub=${REGISTRY}/higress \
  --set higress-core.gateway.hub=${REGISTRY}/higress \
  --set higress-core.pilot.hub=${REGISTRY}/higress \
  --set higress-core.pluginServer.hub=${REGISTRY}/higress \
  --set higress-core.gateway.replicas=2
```

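The registry-selection `case` statement above boils down to a prefix match on the `date +%z` offset; a sketch of the same mapping as a function (same registries and branches as the bash, nothing added):

```python
def registry_for_offset(tz_offset: str) -> str:
    """Pick the nearest Higress image registry from a UTC offset string as
    produced by `date +%z` (e.g. "+0800"), mirroring the bash case above."""
    if tz_offset.startswith(("-1", "-0")):
        return "higress-registry.us-west-1.cr.aliyuncs.com"       # Americas
    if tz_offset.startswith(("+07", "+08", "+09")):
        return "higress-registry.cn-hangzhou.cr.aliyuncs.com"     # Asia
    if tz_offset.startswith(("+05", "+06")):
        return "higress-registry.ap-southeast-7.cr.aliyuncs.com"  # Southeast Asia
    return "higress-registry.cn-hangzhou.cr.aliyuncs.com"         # default

print(registry_for_offset("-0800"))  # higress-registry.us-west-1.cr.aliyuncs.com
print(registry_for_offset("+0800"))  # higress-registry.cn-hangzhou.cr.aliyuncs.com
```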
Key helm values:
- `global.ingressClass`: use the **same** class as ingress-nginx
- `global.hub`: image registry (auto-selected by timezone)
- `global.enableStatus=false`: **disable Ingress status updates** to avoid conflicts with nginx (and reduce API server pressure)
- Override all component hubs to ensure consistent registry usage
- Both nginx and Higress will watch the same Ingress resources
- Higress automatically recognizes `nginx.ingress.kubernetes.io/*` annotations
- Traffic still flows through nginx until you switch the entry point

⚠️ **Note**: After nginx is uninstalled, you can enable status updates:
```bash
helm upgrade higress higress/higress -n higress-system \
  --reuse-values \
  --set global.enableStatus=true
```

#### Kind/Local Environment Setup

In Kind or other local Kubernetes clusters, the LoadBalancer service will stay in the `PENDING` state. Use one of these methods:

**Option 1: Port Forward (recommended for testing)**
```bash
# Forward the Higress gateway to local ports
kubectl port-forward -n higress-system svc/higress-gateway 8080:80 8443:443 &

# Test with a Host header
curl -H "Host: example.com" http://localhost:8080/
```

**Option 2: NodePort**
```bash
# Patch the service to NodePort
kubectl patch svc -n higress-system higress-gateway \
  -p '{"spec":{"type":"NodePort"}}'

# Get the assigned port
NODE_PORT=$(kubectl get svc -n higress-system higress-gateway \
  -o jsonpath='{.spec.ports[?(@.port==80)].nodePort}')

# Test (use the docker container IP for Kind)
curl -H "Host: example.com" http://localhost:${NODE_PORT}/
```

**Option 3: Kind with Port Mapping (requires cluster recreation)**
```yaml
# kind-config.yaml
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
nodes:
- role: control-plane
  extraPortMappings:
  - containerPort: 30080
    hostPort: 80
  - containerPort: 30443
    hostPort: 443
```

### Phase 4: Generate and Run the Test Script

After Higress is running, generate a test script covering all Ingress routes:

```bash
# Generate the test script
./scripts/generate-migration-test.sh > migration-test.sh
chmod +x migration-test.sh

# Get the Higress gateway address
# Option A: if LoadBalancer is supported
HIGRESS_IP=$(kubectl get svc -n higress-system higress-gateway -o jsonpath='{.status.loadBalancer.ingress[0].ip}')

# Option B: if LoadBalancer is NOT supported, use port-forward
kubectl port-forward -n higress-system svc/higress-gateway 8080:80 &
HIGRESS_IP="127.0.0.1:8080"

# Run the tests
./migration-test.sh ${HIGRESS_IP}
```

The test script will:
- Extract all hosts and paths from Ingress resources
- Test each route against the Higress gateway
- Verify response codes and basic functionality
- Report any failures for investigation

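The route-extraction step the generated script performs can be sketched as follows (the actual `generate-migration-test.sh` may differ; field names follow the Ingress schema and the sample object is made up):

```python
def ingress_routes(items):
    """Collect (host, path) pairs from Ingress objects - the route inventory
    a migration test would curl against the Higress gateway."""
    routes = []
    for ing in items:
        for rule in ing.get("spec", {}).get("rules", []):
            host = rule.get("host", "*")
            for p in rule.get("http", {}).get("paths", []):
                routes.append((host, p.get("path", "/")))
    return routes

items = [{
    "spec": {"rules": [{
        "host": "api.example.com",
        "http": {"paths": [{"path": "/api"}, {"path": "/health"}]},
    }]}
}]
for host, path in ingress_routes(items):
    # One curl per route, checking only the status code
    print(f"curl -s -o /dev/null -w '%{{http_code}}' -H 'Host: {host}' http://$HIGRESS_IP{path}")
```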
### Phase 5: Traffic Cutover (User Action Required)

⚠️ **Only proceed after all tests pass!**

Choose your cutover method based on infrastructure:

**Option A: DNS Switch**
```bash
# Update DNS records to point to the Higress gateway IP
# Example: example.com A record -> ${HIGRESS_IP}
```

**Option B: Layer 4 Proxy/Load Balancer Switch**
```bash
# Update the upstream in your L4 proxy (e.g., F5, HAProxy, cloud LB)
# From: nginx-ingress-controller service IP
# To: higress-gateway service IP
```

**Option C: Kubernetes Service Switch** (if external traffic enters via a Service)
```bash
# Update your external-facing Service selector or endpoints
```

### Phase 6: Use Built-in Plugins or Create Custom WASM Plugin (If Needed)

Before writing custom plugins, check whether Higress has a built-in plugin that meets your needs!

#### Built-in Plugins (Recommended First)

Higress provides many built-in plugins. Check [references/builtin-plugins.md](references/builtin-plugins.md) for the full list.

Common replacements for nginx features:

| nginx feature | Higress built-in plugin |
|---------------|------------------------|
| Basic Auth snippet | `basic-auth` |
| IP restriction | `ip-restriction` |
| Rate limiting | `key-rate-limit`, `cluster-key-rate-limit` |
| WAF/ModSecurity | `waf` |
| Request validation | `request-validation` |
| Bot detection | `bot-detect` |
| JWT auth | `jwt-auth` |
| CORS headers | `cors` |
| Custom response | `custom-response` |
| Request/Response transform | `transformer` |

#### Common Snippet Replacements

| nginx snippet pattern | Higress solution |
|----------------------|------------------|
| Custom health endpoint (`location /health`) | WASM plugin: custom-location |
| Add response headers | WASM plugin: custom-response-headers |
| Request validation/blocking | WASM plugin with `OnHttpRequestHeaders` |
| Lua rate limiting | `key-rate-limit` plugin |

#### Custom WASM Plugin (If No Built-in Matches)

When nginx snippets or Lua logic have no built-in equivalent:

1. **Analyze the snippet** - extract nginx directives/Lua code
2. **Generate Go WASM code** - use the higress-wasm-go-plugin skill
3. **Build the plugin**:
   ```bash
   cd plugin-dir
   go mod tidy
   GOOS=wasip1 GOARCH=wasm go build -buildmode=c-shared -o main.wasm ./
   ```

4. **Push to a registry**:

   If you don't have an image registry, install Harbor:
   ```bash
   ./scripts/install-harbor.sh
   # Follow the prompts to install Harbor in your cluster
   ```

   If you have your own registry:
   ```bash
   # Build the OCI image
   docker build -t <registry>/higress-plugin-<name>:v1 .
   docker push <registry>/higress-plugin-<name>:v1
   ```

5. **Deploy the plugin**:
   ```yaml
   apiVersion: extensions.higress.io/v1alpha1
   kind: WasmPlugin
   metadata:
     name: custom-plugin
     namespace: higress-system
   spec:
     url: oci://<registry>/higress-plugin-<name>:v1
     phase: UNSPECIFIED_PHASE
     priority: 100
   ```

See [references/plugin-deployment.md](references/plugin-deployment.md) for detailed plugin deployment.

## Common Snippet Conversions

### Header Manipulation
```nginx
# nginx snippet
more_set_headers "X-Custom: value";
```
→ Use the `headerControl` annotation or generate a plugin with `proxywasm.AddHttpResponseHeader()`.

### Request Validation
```nginx
# nginx snippet
if ($request_uri ~* "pattern") { return 403; }
```
→ Generate a WASM plugin with a request header/path check.

### Rate Limiting with Custom Logic
```nginx
# nginx snippet with Lua
access_by_lua_block { ... }
```
→ Generate a WASM plugin implementing the logic.

See [references/snippet-patterns.md](references/snippet-patterns.md) for common patterns.

## Validation

Before the traffic switch, use the generated test script:

```bash
# Generate the test script
./scripts/generate-migration-test.sh > migration-test.sh
chmod +x migration-test.sh

# Get the Higress gateway IP
HIGRESS_IP=$(kubectl get svc -n higress-system higress-gateway -o jsonpath='{.status.loadBalancer.ingress[0].ip}')

# Run all tests
./migration-test.sh ${HIGRESS_IP}
```

The test script will:
- Test every host/path combination from all Ingress resources
- Report pass/fail for each route
- Provide a summary and next steps

**Only proceed with traffic cutover after all tests pass!**

## Troubleshooting

### Common Issues

#### Q1: Ingress created but routes return 404
**Symptoms:** Ingress shows Ready, but curl returns 404

**Check:**
1. Verify the IngressClass matches the Higress config
   ```bash
   kubectl get ingress <name> -o yaml | grep ingressClassName
   ```
2. Check the controller logs
   ```bash
   kubectl logs -n higress-system -l app=higress-controller --tail=100
   ```
3. Verify the backend service is reachable
   ```bash
   kubectl run test --rm -it --image=curlimages/curl -- \
     curl http://<service>.<namespace>.svc
   ```

#### Q2: rewrite-target not working
**Symptoms:** Path not being rewritten; the backend receives the original path

**Solution:** Ensure `use-regex: "true"` is also set:
```yaml
annotations:
  nginx.ingress.kubernetes.io/rewrite-target: /$2
  nginx.ingress.kubernetes.io/use-regex: "true"
```

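With `use-regex` enabled, the Ingress path is treated as a regular expression and `rewrite-target` substitutes its capture groups; the semantics can be illustrated with the `/api(/|$)(.*)` path used earlier (an illustration of the rewrite behavior, not the controller's actual code):

```python
import re

# Ingress path: /api(/|$)(.*)   rewrite-target: /$2
# Group 2 captures everything after /api, so the prefix is stripped.
path_pattern = r"^/api(/|$)(.*)$"
rewritten = re.sub(path_pattern, r"/\2", "/api/products/123")
print(rewritten)  # /products/123
```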
#### Q3: Snippet annotations silently ignored
**Symptoms:** nginx snippet features not working after migration

**Cause:** Higress does not support snippet annotations (by design, for security)

**Solution:**
- Check [references/builtin-plugins.md](references/builtin-plugins.md) for built-in alternatives
- Create a custom WASM plugin (see Phase 6)

#### Q4: TLS certificate issues
**Symptoms:** HTTPS not working or certificate errors

**Check:**
1. Verify the Secret exists and is of type `kubernetes.io/tls`
   ```bash
   kubectl get secret <secret-name> -o yaml
   ```
2. Check the TLS configuration in the Ingress
   ```bash
   kubectl get ingress <name> -o jsonpath='{.spec.tls}'
   ```

### Useful Debug Commands

```bash
# View Higress controller logs
kubectl logs -n higress-system -l app=higress-controller -c higress-core

# View gateway access logs
kubectl logs -n higress-system -l app=higress-gateway | grep "GET\|POST"

# Check the Envoy config dump
kubectl exec -n higress-system deploy/higress-gateway -c istio-proxy -- \
  curl -s localhost:15000/config_dump | jq '.configs[2].dynamic_listeners'

# View gateway stats
kubectl exec -n higress-system deploy/higress-gateway -c istio-proxy -- \
  curl -s localhost:15000/stats | grep http
```

## Rollback

Since nginx keeps running during migration, rollback is simply switching traffic back:

```bash
# If traffic was switched via DNS:
# - Revert DNS records to the nginx gateway IP

# If traffic was switched via an L4 proxy:
# - Revert the upstream to the nginx service IP

# nginx is still running; no action needed on the k8s side
```

## Post-Migration Cleanup

**Only after traffic has been fully migrated and is stable:**

```bash
# 1. Monitor Higress for a period (recommended: 24-48h)

# 2. Backup nginx resources
kubectl get all -n ingress-nginx -o yaml > ingress-nginx-backup.yaml

# 3. Scale down nginx (keep it for emergency rollback)
kubectl scale deployment -n ingress-nginx ingress-nginx-controller --replicas=0

# 4. (Optional) After an extended stable period, remove nginx
kubectl delete namespace ingress-nginx
```

# Nginx to Higress Annotation Compatibility

## ⚠️ Important: Do NOT Modify Your Ingress Resources!

**Higress natively supports `nginx.ingress.kubernetes.io/*` annotations** - no conversion or modification needed!

The Higress controller uses `ParseStringASAP()`, which first tries the `nginx.ingress.kubernetes.io/*` prefix and then falls back to `higress.io/*`. Your existing Ingress resources work as-is with Higress.

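The lookup order just described can be sketched as follows (a simplified model of the behavior, not the controller's actual Go implementation):

```python
def parse_string_asap(annotations: dict, key: str):
    """Try the nginx annotation prefix first, then fall back to the
    higress prefix - the lookup order ParseStringASAP() is described to use."""
    for prefix in ("nginx.ingress.kubernetes.io/", "higress.io/"):
        value = annotations.get(prefix + key)
        if value is not None:
            return value
    return None

print(parse_string_asap({"nginx.ingress.kubernetes.io/rewrite-target": "/$2"}, "rewrite-target"))  # /$2
print(parse_string_asap({"higress.io/use-regex": "true"}, "use-regex"))  # true
```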
## Fully Compatible Annotations (Work As-Is)

These nginx annotations work directly with Higress without any changes:

| nginx annotation (keep as-is) | Higress also accepts | Notes |
|-------------------------------|---------------------|-------|
| `nginx.ingress.kubernetes.io/rewrite-target` | `higress.io/rewrite-target` | Supports capture groups |
| `nginx.ingress.kubernetes.io/use-regex` | `higress.io/use-regex` | Enable regex path matching |
| `nginx.ingress.kubernetes.io/ssl-redirect` | `higress.io/ssl-redirect` | Force HTTPS |
| `nginx.ingress.kubernetes.io/force-ssl-redirect` | `higress.io/force-ssl-redirect` | Same behavior |
| `nginx.ingress.kubernetes.io/backend-protocol` | `higress.io/backend-protocol` | HTTP/HTTPS/GRPC |
| `nginx.ingress.kubernetes.io/proxy-body-size` | `higress.io/proxy-body-size` | Max body size |

### CORS

| nginx annotation | Higress annotation |
|------------------|-------------------|
| `nginx.ingress.kubernetes.io/enable-cors` | `higress.io/enable-cors` |
| `nginx.ingress.kubernetes.io/cors-allow-origin` | `higress.io/cors-allow-origin` |
| `nginx.ingress.kubernetes.io/cors-allow-methods` | `higress.io/cors-allow-methods` |
| `nginx.ingress.kubernetes.io/cors-allow-headers` | `higress.io/cors-allow-headers` |
| `nginx.ingress.kubernetes.io/cors-expose-headers` | `higress.io/cors-expose-headers` |
| `nginx.ingress.kubernetes.io/cors-allow-credentials` | `higress.io/cors-allow-credentials` |
| `nginx.ingress.kubernetes.io/cors-max-age` | `higress.io/cors-max-age` |

### Timeout & Retry

| nginx annotation | Higress annotation |
|------------------|-------------------|
| `nginx.ingress.kubernetes.io/proxy-connect-timeout` | `higress.io/proxy-connect-timeout` |
| `nginx.ingress.kubernetes.io/proxy-send-timeout` | `higress.io/proxy-send-timeout` |
| `nginx.ingress.kubernetes.io/proxy-read-timeout` | `higress.io/proxy-read-timeout` |
| `nginx.ingress.kubernetes.io/proxy-next-upstream-tries` | `higress.io/proxy-next-upstream-tries` |

### Canary (Grayscale)

| nginx annotation | Higress annotation |
|------------------|-------------------|
| `nginx.ingress.kubernetes.io/canary` | `higress.io/canary` |
| `nginx.ingress.kubernetes.io/canary-weight` | `higress.io/canary-weight` |
| `nginx.ingress.kubernetes.io/canary-header` | `higress.io/canary-header` |
| `nginx.ingress.kubernetes.io/canary-header-value` | `higress.io/canary-header-value` |
| `nginx.ingress.kubernetes.io/canary-header-pattern` | `higress.io/canary-header-pattern` |
| `nginx.ingress.kubernetes.io/canary-by-cookie` | `higress.io/canary-by-cookie` |

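How these canary annotations interact can be sketched with a simplified model: a header match sends the request to the canary outright, otherwise `canary-weight` routes a percentage of traffic (the exact precedence rules are defined by the controller, so treat this as an approximation):

```python
import random

def route_to_canary(headers, canary_header=None, canary_header_value=None, weight=0):
    """Simplified canary decision: header match wins, otherwise weight%
    of requests go to the canary backend."""
    if canary_header and headers.get(canary_header) == canary_header_value:
        return True
    return random.randrange(100) < weight

# Header-based canary always matches:
print(route_to_canary({"x-canary": "always"}, "x-canary", "always", weight=0))  # True
# weight=0 never routes to the canary, weight=100 always does:
print(route_to_canary({}, weight=0), route_to_canary({}, weight=100))  # False True
```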
### Authentication
|
||||
|
||||
| nginx annotation | Higress annotation |
|
||||
|------------------|-------------------|
|
||||
| `nginx.ingress.kubernetes.io/auth-type` | `higress.io/auth-type` |
|
||||
| `nginx.ingress.kubernetes.io/auth-secret` | `higress.io/auth-secret` |
|
||||
| `nginx.ingress.kubernetes.io/auth-realm` | `higress.io/auth-realm` |

### Load Balancing

| nginx annotation | Higress annotation |
|------------------|-------------------|
| `nginx.ingress.kubernetes.io/load-balance` | `higress.io/load-balance` |
| `nginx.ingress.kubernetes.io/upstream-hash-by` | `higress.io/upstream-hash-by` |

### IP Access Control

| nginx annotation | Higress annotation |
|------------------|-------------------|
| `nginx.ingress.kubernetes.io/whitelist-source-range` | `higress.io/whitelist-source-range` |
| `nginx.ingress.kubernetes.io/denylist-source-range` | `higress.io/denylist-source-range` |

### Redirect

| nginx annotation | Higress annotation |
|------------------|-------------------|
| `nginx.ingress.kubernetes.io/permanent-redirect` | `higress.io/permanent-redirect` |
| `nginx.ingress.kubernetes.io/temporal-redirect` | `higress.io/temporal-redirect` |
| `nginx.ingress.kubernetes.io/permanent-redirect-code` | `higress.io/permanent-redirect-code` |

### Header Control

| nginx annotation | Higress annotation |
|------------------|-------------------|
| `nginx.ingress.kubernetes.io/proxy-set-headers` | `higress.io/proxy-set-headers` |
| `nginx.ingress.kubernetes.io/proxy-hide-headers` | `higress.io/proxy-hide-headers` |
| `nginx.ingress.kubernetes.io/proxy-pass-headers` | `higress.io/proxy-pass-headers` |

### Upstream TLS

| nginx annotation | Higress annotation |
|------------------|-------------------|
| `nginx.ingress.kubernetes.io/proxy-ssl-secret` | `higress.io/proxy-ssl-secret` |
| `nginx.ingress.kubernetes.io/proxy-ssl-verify` | `higress.io/proxy-ssl-verify` |

### TLS Protocol & Cipher Control

Higress provides fine-grained TLS control via dedicated annotations:

| nginx annotation | Higress annotation | Notes |
|------------------|-------------------|-------|
| `nginx.ingress.kubernetes.io/ssl-protocols` | (see below) | Use Higress-specific annotations |

**Higress TLS annotations (no nginx equivalent - use these directly):**

| Higress annotation | Description | Example value |
|-------------------|-------------|---------------|
| `higress.io/tls-min-protocol-version` | Minimum TLS version | `TLSv1.2` |
| `higress.io/tls-max-protocol-version` | Maximum TLS version | `TLSv1.3` |
| `higress.io/ssl-cipher` | Allowed cipher suites | `ECDHE-RSA-AES128-GCM-SHA256` |

**Example: Restrict to TLS 1.2+**

```yaml
# nginx (using ssl-protocols)
annotations:
  nginx.ingress.kubernetes.io/ssl-protocols: "TLSv1.2 TLSv1.3"

# Higress (use dedicated annotations)
annotations:
  higress.io/tls-min-protocol-version: "TLSv1.2"
  higress.io/tls-max-protocol-version: "TLSv1.3"
```

**Example: Custom cipher suites**

```yaml
annotations:
  higress.io/ssl-cipher: "ECDHE-RSA-AES128-GCM-SHA256:ECDHE-RSA-AES256-GCM-SHA384"
```

## Unsupported Annotations (Require WASM Plugin)

These annotations have no direct Higress equivalent and require custom WASM plugins:

### Configuration Snippets

```yaml
# NOT supported - requires WASM plugin
nginx.ingress.kubernetes.io/server-snippet: |
  location /custom { ... }
nginx.ingress.kubernetes.io/configuration-snippet: |
  more_set_headers "X-Custom: value";
nginx.ingress.kubernetes.io/stream-snippet: |
  # TCP/UDP snippets
```

### Lua Scripting

```yaml
# NOT supported - convert to WASM plugin
nginx.ingress.kubernetes.io/lua-resty-waf: "active"
nginx.ingress.kubernetes.io/lua-resty-waf-score-threshold: "10"
```

### ModSecurity

```yaml
# NOT supported - use Higress WAF plugin or custom WASM
nginx.ingress.kubernetes.io/enable-modsecurity: "true"
nginx.ingress.kubernetes.io/modsecurity-snippet: |
  SecRule ...
```

### Rate Limiting (Complex)

```yaml
# Basic rate limiting supported via plugin
# Complex Lua-based rate limiting requires WASM
nginx.ingress.kubernetes.io/limit-rps: "10"
nginx.ingress.kubernetes.io/limit-connections: "5"
```

### Other Unsupported

```yaml
# NOT directly supported
nginx.ingress.kubernetes.io/client-body-buffer-size
nginx.ingress.kubernetes.io/proxy-buffering
nginx.ingress.kubernetes.io/proxy-buffers-number
nginx.ingress.kubernetes.io/proxy-buffer-size
nginx.ingress.kubernetes.io/mirror-uri
nginx.ingress.kubernetes.io/mirror-request-body
nginx.ingress.kubernetes.io/grpc-backend
nginx.ingress.kubernetes.io/custom-http-errors
nginx.ingress.kubernetes.io/default-backend
```

## Migration Script

Use this script to analyze Ingress annotations:

```bash
# scripts/analyze-ingress.sh in this skill
./scripts/analyze-ingress.sh <namespace>
```

---

# Higress Built-in Plugins

Before writing custom WASM plugins, check if Higress has a built-in plugin that meets your needs.

**Plugin docs and images**: https://github.com/higress-group/higress-console/tree/main/backend/sdk/src/main/resources/plugins

## Authentication & Authorization

| Plugin | Description | Replaces nginx feature |
|--------|-------------|----------------------|
| `basic-auth` | HTTP Basic Authentication | `auth_basic` directive |
| `jwt-auth` | JWT token validation | JWT Lua scripts |
| `key-auth` | API Key authentication | Custom auth headers |
| `hmac-auth` | HMAC signature authentication | Signature validation |
| `oauth` | OAuth 2.0 authentication | OAuth Lua scripts |
| `oidc` | OpenID Connect | OIDC integration |
| `ext-auth` | External authorization service | `auth_request` directive |
| `opa` | Open Policy Agent integration | Complex auth logic |

## Traffic Control

| Plugin | Description | Replaces nginx feature |
|--------|-------------|----------------------|
| `key-rate-limit` | Rate limiting by key | `limit_req` directive |
| `cluster-key-rate-limit` | Distributed rate limiting | `limit_req` with shared state |
| `ip-restriction` | IP whitelist/blacklist | `allow`/`deny` directives |
| `request-block` | Block requests by pattern | `if` + `return 403` |
| `traffic-tag` | Traffic tagging | Custom headers for routing |
| `bot-detect` | Bot detection & blocking | Bot detection Lua scripts |
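
For example, `key-rate-limit` can often replace a simple `limit_req` setup. A sketch of enabling it via the WasmPlugin CRD follows; the `defaultConfig` keys shown are illustrative, so confirm the exact schema in the plugin's spec.yaml before use:

```yaml
apiVersion: extensions.higress.io/v1alpha1
kind: WasmPlugin
metadata:
  name: key-rate-limit
  namespace: higress-system
spec:
  url: oci://higress-registry.cn-hangzhou.cr.aliyuncs.com/plugins/key-rate-limit:1.0.0
  defaultConfig:
    # Illustrative config: limit per API key carried in a request header
    limit_by_header: x-api-key
    limit_keys:
      - key: consumer-a
        query_per_minute: 100
```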

## Request/Response Modification

| Plugin | Description | Replaces nginx feature |
|--------|-------------|----------------------|
| `transformer` | Transform request/response | `proxy_set_header`, `more_set_headers` |
| `cors` | CORS headers | `add_header` CORS headers |
| `custom-response` | Custom static response | `return` directive |
| `request-validation` | Request parameter validation | Validation Lua scripts |
| `de-graphql` | GraphQL to REST conversion | GraphQL handling |

## Security

| Plugin | Description | Replaces nginx feature |
|--------|-------------|----------------------|
| `waf` | Web Application Firewall | ModSecurity module |
| `geo-ip` | GeoIP-based access control | `geoip` module |

## Caching & Performance

| Plugin | Description | Replaces nginx feature |
|--------|-------------|----------------------|
| `cache-control` | Cache control headers | `expires`, `add_header Cache-Control` |

## AI Features (Higress-specific)

| Plugin | Description |
|--------|-------------|
| `ai-proxy` | AI model proxy |
| `ai-cache` | AI response caching |
| `ai-quota` | AI token quota |
| `ai-token-ratelimit` | AI token rate limiting |
| `ai-transformer` | AI request/response transform |
| `ai-security-guard` | AI content security |
| `ai-statistics` | AI usage statistics |
| `mcp-server` | Model Context Protocol server |

## Using Built-in Plugins

### Via WasmPlugin CRD

```yaml
apiVersion: extensions.higress.io/v1alpha1
kind: WasmPlugin
metadata:
  name: basic-auth-plugin
  namespace: higress-system
spec:
  # Use built-in plugin image
  url: oci://higress-registry.cn-hangzhou.cr.aliyuncs.com/plugins/basic-auth:1.0.0
  phase: AUTHN
  priority: 320
  defaultConfig:
    consumers:
      - name: user1
        credential: "admin:123456"
```

### Via Higress Console

1. Navigate to **Plugins** → **Plugin Market**
2. Find the desired plugin
3. Click **Enable** and configure

## Image Registry Locations

Select the nearest registry based on your location:

| Region | Registry |
|--------|----------|
| China/Default | `higress-registry.cn-hangzhou.cr.aliyuncs.com` |
| North America | `higress-registry.us-west-1.cr.aliyuncs.com` |
| Southeast Asia | `higress-registry.ap-southeast-7.cr.aliyuncs.com` |

Example with regional registry:

```yaml
spec:
  url: oci://higress-registry.us-west-1.cr.aliyuncs.com/plugins/basic-auth:1.0.0
```

## Plugin Configuration Reference

Each plugin has its own configuration schema. View the spec.yaml in the plugin directory:
https://github.com/higress-group/higress-console/tree/main/backend/sdk/src/main/resources/plugins/<plugin-name>/spec.yaml

Or check the README files for detailed documentation.

---

# WASM Plugin Build and Deployment

## Plugin Project Structure

```
my-plugin/
├── main.go          # Plugin entry point
├── go.mod           # Go module
├── go.sum           # Dependencies
├── Dockerfile       # OCI image build
└── wasmplugin.yaml  # K8s deployment manifest
```

## Build Process

### 1. Initialize Project

```bash
mkdir my-plugin && cd my-plugin
go mod init my-plugin

# Set proxy (only needed in China due to network restrictions)
# Skip this step if you're outside China or have direct access to GitHub
go env -w GOPROXY=https://proxy.golang.com.cn,direct

# Get dependencies
go get github.com/higress-group/proxy-wasm-go-sdk@go-1.24
go get github.com/higress-group/wasm-go@main
go get github.com/tidwall/gjson
```

### 2. Write Plugin Code

See the higress-wasm-go-plugin skill for detailed API reference. Basic template:

```go
package main

import (
	"github.com/higress-group/proxy-wasm-go-sdk/proxywasm/types"
	"github.com/higress-group/wasm-go/pkg/wrapper"
	"github.com/tidwall/gjson"
)

func main() {}

func init() {
	wrapper.SetCtx(
		"my-plugin",
		wrapper.ParseConfig(parseConfig),
		wrapper.ProcessRequestHeaders(onHttpRequestHeaders),
	)
}

type MyConfig struct {
	// Config fields
}

func parseConfig(json gjson.Result, config *MyConfig) error {
	// Parse YAML config (converted to JSON)
	return nil
}

func onHttpRequestHeaders(ctx wrapper.HttpContext, config MyConfig) types.Action {
	// Process request
	return types.HeaderContinue
}
```

### 3. Compile to WASM

```bash
go mod tidy
GOOS=wasip1 GOARCH=wasm go build -buildmode=c-shared -o main.wasm ./
```

### 4. Create Dockerfile

```dockerfile
FROM scratch
COPY main.wasm /plugin.wasm
```

### 5. Build and Push Image

#### Option A: Use Your Own Registry

```bash
# User provides registry
REGISTRY=your-registry.com/higress-plugins

# Build
docker build -t ${REGISTRY}/my-plugin:v1 .

# Push
docker push ${REGISTRY}/my-plugin:v1
```

#### Option B: Install Harbor (If No Registry Available)

If you don't have an image registry, we can install Harbor for you:

```bash
# Prerequisites
# - Kubernetes cluster with LoadBalancer or Ingress support
# - Persistent storage (PVC)
# - At least 4GB RAM and 2 CPU cores available

# Install Harbor via Helm
helm repo add harbor https://helm.goharbor.io
helm repo update

# Install with minimal configuration
helm install harbor harbor/harbor \
  --namespace harbor-system --create-namespace \
  --set expose.type=nodePort \
  --set expose.tls.enabled=false \
  --set persistence.enabled=true \
  --set harborAdminPassword=Harbor12345

# Get Harbor access info
export NODE_PORT=$(kubectl get svc -n harbor-system harbor-core -o jsonpath='{.spec.ports[0].nodePort}')
export NODE_IP=$(kubectl get nodes -o jsonpath='{.items[0].status.addresses[0].address}')
echo "Harbor URL: http://${NODE_IP}:${NODE_PORT}"
echo "Username: admin"
echo "Password: Harbor12345"

# Login to Harbor
docker login ${NODE_IP}:${NODE_PORT} -u admin -p Harbor12345

# Create project in Harbor UI (http://${NODE_IP}:${NODE_PORT})
# - Project Name: higress-plugins
# - Access Level: Public

# Build and push plugin
docker build -t ${NODE_IP}:${NODE_PORT}/higress-plugins/my-plugin:v1 .
docker push ${NODE_IP}:${NODE_PORT}/higress-plugins/my-plugin:v1
```

**Note**: For production use, enable TLS and use proper persistent storage.

## Deployment

### WasmPlugin CRD

```yaml
apiVersion: extensions.higress.io/v1alpha1
kind: WasmPlugin
metadata:
  name: my-plugin
  namespace: higress-system
spec:
  # OCI image URL
  url: oci://your-registry.com/higress-plugins/my-plugin:v1

  # Plugin phase (when to execute)
  # UNSPECIFIED_PHASE | AUTHN | AUTHZ | STATS
  phase: UNSPECIFIED_PHASE

  # Priority (higher = earlier execution)
  priority: 100

  # Plugin configuration
  defaultConfig:
    key: value

  # Optional: specific routes/domains
  matchRules:
    - domain:
        - "*.example.com"
      config:
        key: domain-specific-value
    - ingress:
        - default/my-ingress
      config:
        key: ingress-specific-value
```

### Apply to Cluster

```bash
kubectl apply -f wasmplugin.yaml
```

### Verify Deployment

```bash
# Check plugin status
kubectl get wasmplugin -n higress-system

# Check gateway logs
kubectl logs -n higress-system -l app=higress-gateway | grep -i plugin

# Test endpoint
curl -v http://<gateway-ip>/test-path
```

## Troubleshooting

### Plugin Not Loading

```bash
# Check image accessibility
kubectl run test --rm -it --image=your-registry.com/higress-plugins/my-plugin:v1 -- ls

# Check gateway events
kubectl describe pod -n higress-system -l app=higress-gateway
```

### Plugin Errors

```bash
# Enable debug logging
kubectl set env deployment/higress-gateway -n higress-system LOG_LEVEL=debug

# View plugin logs
kubectl logs -n higress-system -l app=higress-gateway -f
```

### Image Pull Issues

```bash
# Create image pull secret if needed
kubectl create secret docker-registry regcred \
  --docker-server=your-registry.com \
  --docker-username=user \
  --docker-password=pass \
  -n higress-system
```

Reference the secret in the WasmPlugin:

```yaml
spec:
  imagePullSecrets:
    - name: regcred
```

## Plugin Configuration via Console

If using Higress Console:

1. Navigate to **Plugins** → **Custom Plugins**
2. Click **Add Plugin**
3. Enter OCI URL: `oci://your-registry.com/higress-plugins/my-plugin:v1`
4. Configure plugin settings
5. Apply to routes/domains as needed

---

# Common Nginx Snippet to WASM Plugin Patterns

## Header Manipulation

### Add Response Header

**Nginx snippet:**
```nginx
more_set_headers "X-Custom-Header: custom-value";
more_set_headers "X-Request-ID: $request_id";
```

**WASM plugin:**
```go
func onHttpResponseHeaders(ctx wrapper.HttpContext, config MyConfig) types.Action {
	proxywasm.AddHttpResponseHeader("X-Custom-Header", "custom-value")

	// For request ID, get from request context
	if reqId, err := proxywasm.GetHttpRequestHeader("x-request-id"); err == nil {
		proxywasm.AddHttpResponseHeader("X-Request-ID", reqId)
	}
	return types.HeaderContinue
}
```

### Remove Headers

**Nginx snippet:**
```nginx
more_clear_headers "Server";
more_clear_headers "X-Powered-By";
```

**WASM plugin:**
```go
func onHttpResponseHeaders(ctx wrapper.HttpContext, config MyConfig) types.Action {
	proxywasm.RemoveHttpResponseHeader("Server")
	proxywasm.RemoveHttpResponseHeader("X-Powered-By")
	return types.HeaderContinue
}
```

### Conditional Header

**Nginx snippet:**
```nginx
if ($http_x_custom_flag = "enabled") {
    more_set_headers "X-Feature: active";
}
```

**WASM plugin:**
```go
func onHttpRequestHeaders(ctx wrapper.HttpContext, config MyConfig) types.Action {
	flag, _ := proxywasm.GetHttpRequestHeader("x-custom-flag")
	if flag == "enabled" {
		proxywasm.AddHttpRequestHeader("X-Feature", "active")
	}
	return types.HeaderContinue
}
```

## Request Validation

### Block by Path Pattern

**Nginx snippet:**
```nginx
if ($request_uri ~* "(\.php|\.asp|\.aspx)$") {
    return 403;
}
```

**WASM plugin:**
```go
import "regexp"

type MyConfig struct {
	BlockPattern *regexp.Regexp
}

func parseConfig(json gjson.Result, config *MyConfig) error {
	pattern := json.Get("blockPattern").String()
	if pattern == "" {
		pattern = `\.(php|asp|aspx)$`
	}
	config.BlockPattern = regexp.MustCompile(pattern)
	return nil
}

func onHttpRequestHeaders(ctx wrapper.HttpContext, config MyConfig) types.Action {
	path := ctx.Path()
	if config.BlockPattern.MatchString(path) {
		proxywasm.SendHttpResponse(403, nil, []byte("Forbidden"), -1)
		return types.HeaderStopAllIterationAndWatermark
	}
	return types.HeaderContinue
}
```

### Block by User Agent

**Nginx snippet:**
```nginx
if ($http_user_agent ~* "(bot|crawler|spider)") {
    return 403;
}
```

**WASM plugin:**
```go
func onHttpRequestHeaders(ctx wrapper.HttpContext, config MyConfig) types.Action {
	ua, _ := proxywasm.GetHttpRequestHeader("user-agent")
	ua = strings.ToLower(ua)

	blockedPatterns := []string{"bot", "crawler", "spider"}
	for _, pattern := range blockedPatterns {
		if strings.Contains(ua, pattern) {
			proxywasm.SendHttpResponse(403, nil, []byte("Blocked"), -1)
			return types.HeaderStopAllIterationAndWatermark
		}
	}
	return types.HeaderContinue
}
```

### Request Size Validation

**Nginx snippet:**
```nginx
if ($content_length > 10485760) {
    return 413;
}
```

**WASM plugin:**
```go
func onHttpRequestHeaders(ctx wrapper.HttpContext, config MyConfig) types.Action {
	clStr, _ := proxywasm.GetHttpRequestHeader("content-length")
	if cl, err := strconv.ParseInt(clStr, 10, 64); err == nil {
		if cl > 10*1024*1024 { // 10MB
			proxywasm.SendHttpResponse(413, nil, []byte("Request too large"), -1)
			return types.HeaderStopAllIterationAndWatermark
		}
	}
	return types.HeaderContinue
}
```

## Request Modification

### URL Rewrite with Logic

**Nginx snippet:**
```nginx
set $backend "default";
if ($http_x_version = "v2") {
    set $backend "v2";
}
rewrite ^/api/(.*)$ /api/$backend/$1 break;
```

**WASM plugin:**
```go
func onHttpRequestHeaders(ctx wrapper.HttpContext, config MyConfig) types.Action {
	version, _ := proxywasm.GetHttpRequestHeader("x-version")
	backend := "default"
	if version == "v2" {
		backend = "v2"
	}

	path := ctx.Path()
	if strings.HasPrefix(path, "/api/") {
		newPath := "/api/" + backend + path[4:]
		proxywasm.ReplaceHttpRequestHeader(":path", newPath)
	}
	return types.HeaderContinue
}
```

### Add Query Parameter

**Nginx snippet:**
```nginx
if ($args !~ "source=") {
    set $args "${args}&source=gateway";
}
```

**WASM plugin:**
```go
func onHttpRequestHeaders(ctx wrapper.HttpContext, config MyConfig) types.Action {
	path := ctx.Path()
	if !strings.Contains(path, "source=") {
		separator := "?"
		if strings.Contains(path, "?") {
			separator = "&"
		}
		newPath := path + separator + "source=gateway"
		proxywasm.ReplaceHttpRequestHeader(":path", newPath)
	}
	return types.HeaderContinue
}
```

## Lua Script Conversion

### Simple Lua Access Check

**Nginx Lua:**
```lua
access_by_lua_block {
    local token = ngx.var.http_authorization
    if not token or token == "" then
        ngx.exit(401)
    end
}
```

**WASM plugin:**
```go
func onHttpRequestHeaders(ctx wrapper.HttpContext, config MyConfig) types.Action {
	token, _ := proxywasm.GetHttpRequestHeader("authorization")
	if token == "" {
		proxywasm.SendHttpResponse(401, [][2]string{
			{"WWW-Authenticate", "Bearer"},
		}, []byte("Unauthorized"), -1)
		return types.HeaderStopAllIterationAndWatermark
	}
	return types.HeaderContinue
}
```

### Lua with Redis

**Nginx Lua:**
```lua
access_by_lua_block {
    local redis = require "resty.redis"
    local red = redis:new()
    red:connect("127.0.0.1", 6379)

    local ip = ngx.var.remote_addr
    local count = red:incr("rate:" .. ip)
    if count > 100 then
        ngx.exit(429)
    end
    red:expire("rate:" .. ip, 60)
}
```

**WASM plugin:**
```go
// See references/redis-client.md in higress-wasm-go-plugin skill
func parseConfig(json gjson.Result, config *MyConfig) error {
	config.redis = wrapper.NewRedisClusterClient(wrapper.FQDNCluster{
		FQDN: json.Get("redisService").String(),
		Port: json.Get("redisPort").Int(),
	})
	return config.redis.Init("", json.Get("redisPassword").String(), 1000)
}

func onHttpRequestHeaders(ctx wrapper.HttpContext, config MyConfig) types.Action {
	ip, _ := proxywasm.GetHttpRequestHeader("x-real-ip")
	if ip == "" {
		ip, _ = proxywasm.GetHttpRequestHeader("x-forwarded-for")
	}

	key := "rate:" + ip
	err := config.redis.Incr(key, func(val int) {
		if val > 100 {
			proxywasm.SendHttpResponse(429, nil, []byte("Rate limited"), -1)
			return
		}
		config.redis.Expire(key, 60, nil)
		proxywasm.ResumeHttpRequest()
	})

	if err != nil {
		return types.HeaderContinue // Fallback on Redis error
	}
	return types.HeaderStopAllIterationAndWatermark
}
```

## Response Modification

### Inject Script/Content

**Nginx snippet:**
```nginx
sub_filter '</head>' '<script src="/tracking.js"></script></head>';
sub_filter_once on;
```

**WASM plugin:**
```go
func init() {
	wrapper.SetCtx(
		"inject-script",
		wrapper.ParseConfig(parseConfig),
		wrapper.ProcessResponseHeaders(onHttpResponseHeaders),
		wrapper.ProcessResponseBody(onHttpResponseBody),
	)
}

func onHttpResponseHeaders(ctx wrapper.HttpContext, config MyConfig) types.Action {
	contentType, _ := proxywasm.GetHttpResponseHeader("content-type")
	if strings.Contains(contentType, "text/html") {
		ctx.BufferResponseBody()
		proxywasm.RemoveHttpResponseHeader("content-length")
	}
	return types.HeaderContinue
}

func onHttpResponseBody(ctx wrapper.HttpContext, config MyConfig, body []byte) types.Action {
	bodyStr := string(body)
	injection := `<script src="/tracking.js"></script></head>`
	newBody := strings.Replace(bodyStr, "</head>", injection, 1)
	proxywasm.ReplaceHttpResponseBody([]byte(newBody))
	return types.BodyContinue
}
```

## Best Practices

1. **Error Handling**: Always handle external call failures gracefully
2. **Performance**: Cache regex patterns in config, avoid recompiling
3. **Timeout**: Set appropriate timeouts for external calls (default 500ms)
4. **Logging**: Use `proxywasm.LogInfo/Warn/Error` for debugging
5. **Testing**: Test locally with Docker Compose before deploying
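
Practice 2 can be sketched in plain Go, independent of the proxy-wasm SDK: compile the pattern once when parsing config, then reuse the compiled object on every request instead of recompiling per call:

```go
package main

import (
	"fmt"
	"regexp"
)

// config holds the pattern compiled once, at config-parse time.
type config struct {
	blockPattern *regexp.Regexp
}

func parseConfig(pattern string) (*config, error) {
	re, err := regexp.Compile(pattern) // compile once, not per request
	if err != nil {
		return nil, err
	}
	return &config{blockPattern: re}, nil
}

// handle reuses the cached regex for every request path.
func handle(cfg *config, path string) bool {
	return cfg.blockPattern.MatchString(path)
}

func main() {
	cfg, err := parseConfig(`\.(php|asp|aspx)$`)
	if err != nil {
		panic(err)
	}
	fmt.Println(handle(cfg, "/admin/index.php")) // true
	fmt.Println(handle(cfg, "/healthz"))         // false
}
```

The same shape applies inside a plugin: do the `regexp.MustCompile` in `parseConfig` and keep only `MatchString` on the request path.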

---

**File**: `.claude/skills/nginx-to-higress-migration/scripts/analyze-ingress.sh` (new executable file, 198 lines)

#!/bin/bash
# Analyze nginx Ingress resources and identify migration requirements

set -e

NAMESPACE="${1:-}"
OUTPUT_FORMAT="${2:-text}"

# Colors
RED='\033[0;31m'
GREEN='\033[0;32m'
YELLOW='\033[1;33m'
BLUE='\033[0;34m'
NC='\033[0m'

# Supported nginx annotations that map to Higress
SUPPORTED_ANNOTATIONS=(
  "rewrite-target"
  "use-regex"
  "ssl-redirect"
  "force-ssl-redirect"
  "backend-protocol"
  "proxy-body-size"
  "enable-cors"
  "cors-allow-origin"
  "cors-allow-methods"
  "cors-allow-headers"
  "cors-expose-headers"
  "cors-allow-credentials"
  "cors-max-age"
  "proxy-connect-timeout"
  "proxy-send-timeout"
  "proxy-read-timeout"
  "proxy-next-upstream-tries"
  "canary"
  "canary-weight"
  "canary-header"
  "canary-header-value"
  "canary-header-pattern"
  "canary-by-cookie"
  "auth-type"
  "auth-secret"
  "auth-realm"
  "load-balance"
  "upstream-hash-by"
  "whitelist-source-range"
  "denylist-source-range"
  "permanent-redirect"
  "temporal-redirect"
  "permanent-redirect-code"
  "proxy-set-headers"
  "proxy-hide-headers"
  "proxy-pass-headers"
  "proxy-ssl-secret"
  "proxy-ssl-verify"
)

# Unsupported annotations requiring WASM plugins
UNSUPPORTED_ANNOTATIONS=(
  "server-snippet"
  "configuration-snippet"
  "stream-snippet"
  "lua-resty-waf"
  "lua-resty-waf-score-threshold"
  "enable-modsecurity"
  "modsecurity-snippet"
  "limit-rps"
  "limit-connections"
  "limit-rate"
  "limit-rate-after"
  "client-body-buffer-size"
  "proxy-buffering"
  "proxy-buffers-number"
  "proxy-buffer-size"
  "custom-http-errors"
  "default-backend"
)

echo -e "${BLUE}========================================${NC}"
echo -e "${BLUE}Nginx to Higress Migration Analysis${NC}"
echo -e "${BLUE}========================================${NC}"
echo ""

# Check for ingress-nginx
echo -e "${YELLOW}Checking for ingress-nginx...${NC}"
if kubectl get pods -A 2>/dev/null | grep -q ingress-nginx; then
  echo -e "${GREEN}✓ ingress-nginx found${NC}"
  kubectl get pods -A | grep ingress-nginx | head -5
else
  echo -e "${RED}✗ ingress-nginx not found${NC}"
fi
echo ""

# Check IngressClass
echo -e "${YELLOW}IngressClass resources:${NC}"
kubectl get ingressclass 2>/dev/null || echo "No IngressClass resources found"
echo ""

# Get Ingress resources
if [ -n "$NAMESPACE" ]; then
  INGRESS_LIST=$(kubectl get ingress -n "$NAMESPACE" -o json 2>/dev/null)
else
  INGRESS_LIST=$(kubectl get ingress -A -o json 2>/dev/null)
fi

if [ -z "$INGRESS_LIST" ] || [ "$(echo "$INGRESS_LIST" | jq '.items | length')" -eq 0 ]; then
  echo -e "${RED}No Ingress resources found${NC}"
  exit 0
fi

TOTAL_INGRESS=$(echo "$INGRESS_LIST" | jq '.items | length')
echo -e "${YELLOW}Found ${TOTAL_INGRESS} Ingress resources${NC}"
echo ""

# Analyze each Ingress
COMPATIBLE_COUNT=0
NEEDS_PLUGIN_COUNT=0

# Use process substitution (not a pipe) so variables set in the loop
# survive in the current shell
while read -r ingress; do
  NAME=$(echo "$ingress" | jq -r '.metadata.name')
  NS=$(echo "$ingress" | jq -r '.metadata.namespace')
  INGRESS_CLASS=$(echo "$ingress" | jq -r '.spec.ingressClassName // .metadata.annotations["kubernetes.io/ingress.class"] // "unknown"')

  # Skip non-nginx ingresses
  if [[ "$INGRESS_CLASS" != "nginx" && "$INGRESS_CLASS" != "unknown" ]]; then
    continue
  fi

  echo -e "${BLUE}-------------------------------------------${NC}"
  echo -e "${BLUE}Ingress: ${NS}/${NAME}${NC}"
  echo -e "IngressClass: ${INGRESS_CLASS}"

  # Get annotations
  ANNOTATIONS=$(echo "$ingress" | jq -r '.metadata.annotations // {}')

  HAS_UNSUPPORTED=false

  # Check each annotation
  while read -r key; do
    # Extract annotation name (remove prefix)
    ANNO_NAME=$(echo "$key" | sed 's/nginx.ingress.kubernetes.io\///' | sed 's/higress.io\///')

    if [[ "$key" == nginx.ingress.kubernetes.io/* ]]; then
      # Check if supported
      IS_SUPPORTED=false
      for supported in "${SUPPORTED_ANNOTATIONS[@]}"; do
        if [[ "$ANNO_NAME" == "$supported" ]]; then
          IS_SUPPORTED=true
          break
        fi
      done

      # Check if explicitly unsupported
      for unsupported in "${UNSUPPORTED_ANNOTATIONS[@]}"; do
        if [[ "$ANNO_NAME" == "$unsupported" ]]; then
          IS_SUPPORTED=false
          HAS_UNSUPPORTED=true
          VALUE=$(echo "$ANNOTATIONS" | jq -r --arg k "$key" '.[$k]')
          echo -e "  ${RED}✗ $ANNO_NAME${NC} (requires WASM plugin)"
          if [[ "$ANNO_NAME" == *"snippet"* ]]; then
            echo -e "    Value preview: $(echo "$VALUE" | head -1)"
          fi
          break
        fi
      done

      if [ "$IS_SUPPORTED" = true ]; then
        echo -e "  ${GREEN}✓ $ANNO_NAME${NC}"
      fi
    fi
  done < <(echo "$ANNOTATIONS" | jq -r 'keys[]')

  if [ "$HAS_UNSUPPORTED" = true ]; then
    echo -e "\n  ${YELLOW}Status: Requires WASM plugin for full compatibility${NC}"
  else
    echo -e "\n  ${GREEN}Status: Fully compatible${NC}"
  fi
  echo ""
done < <(echo "$INGRESS_LIST" | jq -c '.items[]')

echo -e "${BLUE}========================================${NC}"
echo -e "${BLUE}Summary${NC}"
echo -e "${BLUE}========================================${NC}"
echo -e "Total Ingress resources: ${TOTAL_INGRESS}"
echo ""
echo -e "${GREEN}✓ No Ingress modification needed!${NC}"
echo "  Higress natively supports nginx.ingress.kubernetes.io/* annotations."
echo ""
echo -e "${YELLOW}Next Steps:${NC}"
echo "1. Install Higress with the SAME ingressClass as nginx"
echo "   (set global.enableStatus=false to disable Ingress status updates)"
echo "2. For snippets/Lua: check Higress built-in plugins first, then generate custom WASM if needed"
echo "3. Generate and run migration test script"
echo "4. Switch traffic via DNS or L4 proxy after tests pass"
echo "5. After stable period, uninstall nginx and enable status updates (global.enableStatus=true)"
||||
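One Bash detail matters for loops like the annotation scan above: a `while read` loop fed by a pipe runs in a subshell, so flags such as `HAS_UNSUPPORTED` set inside it are lost when the loop ends. Feeding the loop with process substitution keeps it in the current shell. A minimal, self-contained sketch of the difference:

```shell
#!/bin/bash
# Piped while-loop: the body runs in a subshell, so the counter is lost.
count=0
printf 'a\nb\nc\n' | while read -r _; do count=$((count + 1)); done
echo "piped: $count"        # prints "piped: 0"

# Process substitution: the loop runs in the current shell, changes persist.
count=0
while read -r _; do count=$((count + 1)); done < <(printf 'a\nb\nc\n')
echo "substituted: $count"  # prints "substituted: 3"
```

This is why status flags accumulated across annotations must not be set inside a loop on the right-hand side of a pipe.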
210  .claude/skills/nginx-to-higress-migration/scripts/generate-migration-test.sh  Executable file
@@ -0,0 +1,210 @@
#!/bin/bash
# Generate test script for all Ingress routes
# Tests each route against Higress gateway to validate migration

set -e

NAMESPACE="${1:-}"

# Emit the header of the generated test script (colors, usage, test_route helper)
cat << 'HEADER'
#!/bin/bash
# Higress Migration Test Script
# Auto-generated - tests all Ingress routes against Higress gateway

set -e

GATEWAY_IP="${1:-}"
TIMEOUT="${2:-5}"
VERBOSE="${3:-false}"

if [ -z "$GATEWAY_IP" ]; then
    echo "Usage: $0 <higress-gateway-ip[:port]> [timeout] [verbose]"
    echo ""
    echo "Examples:"
    echo "  # With LoadBalancer IP"
    echo "  $0 10.0.0.100 5 true"
    echo ""
    echo "  # With port-forward (run this first: kubectl port-forward -n higress-system svc/higress-gateway 8080:80 &)"
    echo "  $0 127.0.0.1:8080 5 true"
    exit 1
fi

RED='\033[0;31m'
GREEN='\033[0;32m'
YELLOW='\033[1;33m'
NC='\033[0m'

TOTAL=0
PASSED=0
FAILED=0
FAILED_TESTS=()

test_route() {
    local host="$1"
    local path="$2"
    local expected_code="${3:-200}"
    local description="$4"

    TOTAL=$((TOTAL + 1))

    # Build URL
    local url="http://${GATEWAY_IP}${path}"

    # Make request
    local response
    response=$(curl -s -o /dev/null -w "%{http_code}" \
        -H "Host: ${host}" \
        --connect-timeout "${TIMEOUT}" \
        --max-time $((TIMEOUT * 2)) \
        "${url}" 2>/dev/null) || response="000"

    # Check result
    if [ "$response" = "$expected_code" ] || [ "$expected_code" = "*" ]; then
        PASSED=$((PASSED + 1))
        echo -e "${GREEN}✓${NC} [${response}] ${host}${path}"
        if [ "$VERBOSE" = "true" ]; then
            echo "    Expected: ${expected_code}, Got: ${response}"
        fi
    else
        FAILED=$((FAILED + 1))
        FAILED_TESTS+=("${host}${path} (expected ${expected_code}, got ${response})")
        echo -e "${RED}✗${NC} [${response}] ${host}${path}"
        echo "    Expected: ${expected_code}, Got: ${response}"
    fi
}

echo "========================================"
echo "Higress Migration Test"
echo "========================================"
echo "Gateway IP: ${GATEWAY_IP}"
echo "Timeout: ${TIMEOUT}s"
echo ""
echo "Testing routes..."
echo ""

HEADER

# Get Ingress resources
if [ -n "$NAMESPACE" ]; then
    INGRESS_JSON=$(kubectl get ingress -n "$NAMESPACE" -o json 2>/dev/null)
else
    INGRESS_JSON=$(kubectl get ingress -A -o json 2>/dev/null)
fi

if [ -z "$INGRESS_JSON" ] || [ "$(echo "$INGRESS_JSON" | jq '.items | length')" -eq 0 ]; then
    echo "# No Ingress resources found"
    echo "echo 'No Ingress resources found to test'"
    echo "exit 0"
    exit 0
fi

# Generate test cases for each Ingress
echo "$INGRESS_JSON" | jq -c '.items[]' | while read -r ingress; do
    NAME=$(echo "$ingress" | jq -r '.metadata.name')
    NS=$(echo "$ingress" | jq -r '.metadata.namespace')

    echo ""
    echo "# ================================================"
    echo "# Ingress: ${NS}/${NAME}"
    echo "# ================================================"

    # Check for TLS hosts
    TLS_HOSTS=$(echo "$ingress" | jq -r '.spec.tls[]?.hosts[]?' 2>/dev/null | sort -u)

    # Process each rule
    echo "$ingress" | jq -c '.spec.rules[]?' | while read -r rule; do
        HOST=$(echo "$rule" | jq -r '.host // "*"')

        # Process each path
        echo "$rule" | jq -c '.http.paths[]?' | while read -r path_item; do
            # NOTE: do not name this variable PATH - that would clobber the
            # shell's executable search path and break the jq calls below
            ROUTE_PATH=$(echo "$path_item" | jq -r '.path // "/"')
            PATH_TYPE=$(echo "$path_item" | jq -r '.pathType // "Prefix"')
            SERVICE=$(echo "$path_item" | jq -r '.backend.service.name // .backend.serviceName // "unknown"')
            PORT=$(echo "$path_item" | jq -r '.backend.service.port.number // .backend.service.port.name // .backend.servicePort // "80"')

            # Generate test
            # For Prefix paths, test the exact path
            # For Exact paths, test exactly
            # Add a simple 200 or * expectation (can be customized)

            echo ""
            echo "# Path: ${ROUTE_PATH} (${PATH_TYPE}) -> ${SERVICE}:${PORT}"

            # Test the path
            if [ "$PATH_TYPE" = "Exact" ]; then
                echo "test_route \"${HOST}\" \"${ROUTE_PATH}\" \"*\" \"Exact path\""
            else
                # For Prefix, test base path and a subpath
                echo "test_route \"${HOST}\" \"${ROUTE_PATH}\" \"*\" \"Prefix path\""

                # If path doesn't end with /, add a subpath test
                if [[ ! "$ROUTE_PATH" =~ /$ ]] && [ "$ROUTE_PATH" != "/" ]; then
                    echo "test_route \"${HOST}\" \"${ROUTE_PATH}/\" \"*\" \"Prefix path with trailing slash\""
                fi
            fi
        done
    done

    # Check for specific annotations that might need special testing
    REWRITE=$(echo "$ingress" | jq -r '.metadata.annotations["nginx.ingress.kubernetes.io/rewrite-target"] // .metadata.annotations["higress.io/rewrite-target"] // ""')
    if [ -n "$REWRITE" ] && [ "$REWRITE" != "null" ]; then
        echo ""
        echo "# Note: This Ingress has rewrite-target: ${REWRITE}"
        echo "# Verify the rewritten path manually if needed"
    fi

    CANARY=$(echo "$ingress" | jq -r '.metadata.annotations["nginx.ingress.kubernetes.io/canary"] // .metadata.annotations["higress.io/canary"] // ""')
    if [ "$CANARY" = "true" ]; then
        echo ""
        echo "# Note: This is a canary Ingress - test with appropriate headers/cookies"
        CANARY_HEADER=$(echo "$ingress" | jq -r '.metadata.annotations["nginx.ingress.kubernetes.io/canary-header"] // .metadata.annotations["higress.io/canary-header"] // ""')
        CANARY_VALUE=$(echo "$ingress" | jq -r '.metadata.annotations["nginx.ingress.kubernetes.io/canary-header-value"] // .metadata.annotations["higress.io/canary-header-value"] // ""')
        if [ -n "$CANARY_HEADER" ] && [ "$CANARY_HEADER" != "null" ]; then
            echo "# Canary header: ${CANARY_HEADER}=${CANARY_VALUE}"
        fi
    fi
done

# Generate summary section
cat << 'FOOTER'

# ================================================
# Summary
# ================================================
echo ""
echo "========================================"
echo "Test Summary"
echo "========================================"
echo -e "Total: ${TOTAL}"
echo -e "Passed: ${GREEN}${PASSED}${NC}"
echo -e "Failed: ${RED}${FAILED}${NC}"
echo ""

if [ ${FAILED} -gt 0 ]; then
    echo -e "${YELLOW}Failed tests:${NC}"
    for test in "${FAILED_TESTS[@]}"; do
        echo -e "  ${RED}•${NC} $test"
    done
    echo ""
    echo -e "${YELLOW}⚠ Some tests failed. Please investigate before switching traffic.${NC}"
    exit 1
else
    echo -e "${GREEN}✓ All tests passed!${NC}"
    echo ""
    echo "========================================"
    echo -e "${GREEN}Ready for Traffic Cutover${NC}"
    echo "========================================"
    echo ""
    echo "Next steps:"
    echo "1. Switch traffic to Higress gateway:"
    echo "   - DNS: Update A/CNAME records to ${GATEWAY_IP}"
    echo "   - L4 Proxy: Update upstream to ${GATEWAY_IP}"
    echo ""
    echo "2. Monitor for errors after switch"
    echo ""
    echo "3. Once stable, scale down nginx:"
    echo "   kubectl scale deployment -n ingress-nginx ingress-nginx-controller --replicas=0"
    echo ""
fi
FOOTER
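The generated `test_route` helper treats `"*"` as a wildcard expectation, passing on any observed status code (useful when the correct code is unknown before cutover). The comparison reduces to a small, testable rule; `matches` below is an illustrative name, not part of the script:

```shell
#!/bin/bash
# Status-matching rule used by the generated tests: an exact match passes,
# and "*" accepts any observed status (including curl's "000" on failure).
matches() {
  local got="$1" expected="$2"
  [ "$got" = "$expected" ] || [ "$expected" = "*" ]
}

matches 200 200 && echo "exact match passes"
matches 503 '*' && echo "wildcard accepts anything"
matches 404 200 || echo "mismatch fails"
```

Replacing `"*"` with concrete codes in the generated script tightens the test once expected behavior is known.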
261  .claude/skills/nginx-to-higress-migration/scripts/generate-plugin-scaffold.sh  Executable file
@@ -0,0 +1,261 @@
#!/bin/bash
# Generate WASM plugin scaffold for nginx snippet migration

set -e

if [ "$#" -lt 1 ]; then
    echo "Usage: $0 <plugin-name> [output-dir]"
    echo ""
    echo "Example: $0 custom-headers ./plugins"
    exit 1
fi

PLUGIN_NAME="$1"
OUTPUT_DIR="${2:-.}"
PLUGIN_DIR="${OUTPUT_DIR}/${PLUGIN_NAME}"

# Colors
GREEN='\033[0;32m'
YELLOW='\033[1;33m'
NC='\033[0m'

echo -e "${YELLOW}Generating WASM plugin scaffold: ${PLUGIN_NAME}${NC}"

# Create directory
mkdir -p "$PLUGIN_DIR"

# Generate go.mod
cat > "${PLUGIN_DIR}/go.mod" << EOF
module ${PLUGIN_NAME}

go 1.24

require (
	github.com/higress-group/proxy-wasm-go-sdk v1.0.1-0.20241230091623-edc7227eb588
	github.com/higress-group/wasm-go v1.0.1-0.20250107151137-19a0ab53cfec
	github.com/tidwall/gjson v1.18.0
)
EOF

# Generate main.go
cat > "${PLUGIN_DIR}/main.go" << 'EOF'
package main

import (
	"github.com/higress-group/proxy-wasm-go-sdk/proxywasm"
	"github.com/higress-group/proxy-wasm-go-sdk/proxywasm/types"
	"github.com/higress-group/wasm-go/pkg/wrapper"
	"github.com/tidwall/gjson"
)

func main() {}

func init() {
	wrapper.SetCtx(
		"PLUGIN_NAME_PLACEHOLDER",
		wrapper.ParseConfig(parseConfig),
		wrapper.ProcessRequestHeaders(onHttpRequestHeaders),
		wrapper.ProcessRequestBody(onHttpRequestBody),
		wrapper.ProcessResponseHeaders(onHttpResponseHeaders),
		wrapper.ProcessResponseBody(onHttpResponseBody),
	)
}

// PluginConfig holds the plugin configuration
type PluginConfig struct {
	// TODO: Add configuration fields
	// Example:
	//   HeaderName  string
	//   HeaderValue string
	Enabled bool
}

// parseConfig parses the plugin configuration from YAML (converted to JSON)
func parseConfig(json gjson.Result, config *PluginConfig) error {
	// TODO: Parse configuration
	// Example:
	//   config.HeaderName = json.Get("headerName").String()
	//   config.HeaderValue = json.Get("headerValue").String()
	config.Enabled = json.Get("enabled").Bool()

	proxywasm.LogInfof("Plugin config loaded: enabled=%v", config.Enabled)
	return nil
}

// onHttpRequestHeaders is called when request headers are received
func onHttpRequestHeaders(ctx wrapper.HttpContext, config PluginConfig) types.Action {
	if !config.Enabled {
		return types.HeaderContinue
	}

	// TODO: Implement request header processing
	// Example: Add custom header
	//   proxywasm.AddHttpRequestHeader(config.HeaderName, config.HeaderValue)

	// Example: Check path and block
	//   path := ctx.Path()
	//   if strings.Contains(path, "/blocked") {
	//       proxywasm.SendHttpResponse(403, nil, []byte("Forbidden"), -1)
	//       return types.HeaderStopAllIterationAndWatermark
	//   }

	return types.HeaderContinue
}

// onHttpRequestBody is called when the request body is received.
// Remove this function from init() if not needed.
func onHttpRequestBody(ctx wrapper.HttpContext, config PluginConfig, body []byte) types.Action {
	if !config.Enabled {
		return types.BodyContinue
	}

	// TODO: Implement request body processing
	// Example: Log body size
	//   proxywasm.LogInfof("Request body size: %d", len(body))

	return types.BodyContinue
}

// onHttpResponseHeaders is called when response headers are received
func onHttpResponseHeaders(ctx wrapper.HttpContext, config PluginConfig) types.Action {
	if !config.Enabled {
		return types.HeaderContinue
	}

	// TODO: Implement response header processing
	// Example: Add security headers
	//   proxywasm.AddHttpResponseHeader("X-Content-Type-Options", "nosniff")
	//   proxywasm.AddHttpResponseHeader("X-Frame-Options", "DENY")

	return types.HeaderContinue
}

// onHttpResponseBody is called when the response body is received.
// Remove this function from init() if not needed.
func onHttpResponseBody(ctx wrapper.HttpContext, config PluginConfig, body []byte) types.Action {
	if !config.Enabled {
		return types.BodyContinue
	}

	// TODO: Implement response body processing
	// Example: Modify response body
	//   newBody := strings.Replace(string(body), "old", "new", -1)
	//   proxywasm.ReplaceHttpResponseBody([]byte(newBody))

	return types.BodyContinue
}
EOF

# Replace plugin name placeholder
sed -i "s/PLUGIN_NAME_PLACEHOLDER/${PLUGIN_NAME}/g" "${PLUGIN_DIR}/main.go"

# Generate Dockerfile
cat > "${PLUGIN_DIR}/Dockerfile" << 'EOF'
FROM scratch
COPY main.wasm /plugin.wasm
EOF

# Generate build script
cat > "${PLUGIN_DIR}/build.sh" << 'EOF'
#!/bin/bash
set -e

echo "Downloading dependencies..."
go mod tidy

echo "Building WASM plugin..."
GOOS=wasip1 GOARCH=wasm go build -buildmode=c-shared -o main.wasm ./

echo "Build complete: main.wasm"
ls -lh main.wasm
EOF
chmod +x "${PLUGIN_DIR}/build.sh"

# Generate WasmPlugin manifest
cat > "${PLUGIN_DIR}/wasmplugin.yaml" << EOF
apiVersion: extensions.higress.io/v1alpha1
kind: WasmPlugin
metadata:
  name: ${PLUGIN_NAME}
  namespace: higress-system
spec:
  # TODO: Replace with your registry
  url: oci://YOUR_REGISTRY/${PLUGIN_NAME}:v1
  phase: UNSPECIFIED_PHASE
  priority: 100
  defaultConfig:
    enabled: true
    # TODO: Add your configuration
  # Optional: Apply to specific routes/domains
  # matchRules:
  #   - domain:
  #       - "*.example.com"
  #     config:
  #       enabled: true
EOF

# Generate README
cat > "${PLUGIN_DIR}/README.md" << EOF
# ${PLUGIN_NAME}

A Higress WASM plugin migrated from nginx configuration.

## Build

\`\`\`bash
./build.sh
\`\`\`

## Push to Registry

\`\`\`bash
# Set your registry
REGISTRY=your-registry.com/higress-plugins

# Build Docker image
docker build -t \${REGISTRY}/${PLUGIN_NAME}:v1 .

# Push
docker push \${REGISTRY}/${PLUGIN_NAME}:v1
\`\`\`

## Deploy

1. Update \`wasmplugin.yaml\` with your registry URL
2. Apply to cluster:
   \`\`\`bash
   kubectl apply -f wasmplugin.yaml
   \`\`\`

## Configuration

| Field | Type | Default | Description |
|-------|------|---------|-------------|
| enabled | bool | true | Enable/disable plugin |

## TODO

- [ ] Implement plugin logic in main.go
- [ ] Add configuration fields
- [ ] Test locally
- [ ] Push to registry
- [ ] Deploy to cluster
EOF

echo -e "\n${GREEN}✓ Plugin scaffold generated at: ${PLUGIN_DIR}${NC}"
echo ""
echo "Files created:"
echo "  - ${PLUGIN_DIR}/main.go (plugin source)"
echo "  - ${PLUGIN_DIR}/go.mod (Go module)"
echo "  - ${PLUGIN_DIR}/Dockerfile (OCI image)"
echo "  - ${PLUGIN_DIR}/build.sh (build script)"
echo "  - ${PLUGIN_DIR}/wasmplugin.yaml (K8s manifest)"
echo "  - ${PLUGIN_DIR}/README.md (documentation)"
echo ""
echo -e "${YELLOW}Next steps:${NC}"
echo "1. cd ${PLUGIN_DIR}"
echo "2. Edit main.go to implement your logic"
echo "3. Run: ./build.sh"
echo "4. Push image to your registry"
echo "5. Update wasmplugin.yaml with registry URL"
echo "6. Deploy: kubectl apply -f wasmplugin.yaml"
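The scaffold rewrites `PLUGIN_NAME_PLACEHOLDER` with `sed -i`, whose in-place flag differs between GNU sed and BSD/macOS sed (the latter requires `sed -i ''`). A portable variant writes to a temporary file instead; the `replace_placeholder` helper below is illustrative, not part of the script:

```shell
#!/bin/bash
# Portable in-place substitution: sidestep sed -i flag differences by
# writing to a temp file and moving it over the original.
replace_placeholder() {
  local file="$1" name="$2"
  sed "s/PLUGIN_NAME_PLACEHOLDER/${name}/g" "$file" > "${file}.tmp" && mv "${file}.tmp" "$file"
}

tmp=$(mktemp)
printf 'wrapper.SetCtx("PLUGIN_NAME_PLACEHOLDER",\n' > "$tmp"
replace_placeholder "$tmp" custom-headers
cat "$tmp"   # prints: wrapper.SetCtx("custom-headers",
rm -f "$tmp"
```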
157  .claude/skills/nginx-to-higress-migration/scripts/install-harbor.sh  Executable file
@@ -0,0 +1,157 @@
#!/bin/bash
# Install Harbor registry for WASM plugin images
# Only use this if you don't have an existing image registry

set -e

# Colors
RED='\033[0;31m'
GREEN='\033[0;32m'
YELLOW='\033[1;33m'
BLUE='\033[0;34m'
NC='\033[0m'

HARBOR_NAMESPACE="${1:-harbor-system}"
HARBOR_PASSWORD="${2:-Harbor12345}"

echo -e "${BLUE}========================================${NC}"
echo -e "${BLUE}Harbor Registry Installation${NC}"
echo -e "${BLUE}========================================${NC}"
echo ""
echo -e "${YELLOW}This will install Harbor in your cluster.${NC}"
echo ""
echo "Configuration:"
echo "  Namespace: ${HARBOR_NAMESPACE}"
echo "  Admin Password: ${HARBOR_PASSWORD}"
echo "  Exposure: NodePort (no TLS)"
echo "  Persistence: Enabled (default StorageClass)"
echo ""
read -p "Continue? (y/N): " -n 1 -r
echo
if [[ ! $REPLY =~ ^[Yy]$ ]]; then
    echo "Aborted."
    exit 1
fi

# Check prerequisites
echo -e "\n${YELLOW}Checking prerequisites...${NC}"

# Check for helm
if ! command -v helm &> /dev/null; then
    echo -e "${RED}✗ helm not found. Please install helm 3.x${NC}"
    exit 1
fi
echo -e "${GREEN}✓ helm found${NC}"

# Check for kubectl
if ! command -v kubectl &> /dev/null; then
    echo -e "${RED}✗ kubectl not found${NC}"
    exit 1
fi
echo -e "${GREEN}✓ kubectl found${NC}"

# Check cluster access
if ! kubectl get nodes &> /dev/null; then
    echo -e "${RED}✗ Cannot access cluster${NC}"
    exit 1
fi
echo -e "${GREEN}✓ Cluster access OK${NC}"

# Check for a StorageClass (Harbor needs persistent volumes)
if ! kubectl get storageclass -o name | grep -q .; then
    echo -e "${YELLOW}⚠ No StorageClass found. Harbor needs persistent storage.${NC}"
    echo "  You may need to install a storage provisioner first."
    read -p "Continue anyway? (y/N): " -n 1 -r
    echo
    if [[ ! $REPLY =~ ^[Yy]$ ]]; then
        exit 1
    fi
fi

# Add Harbor helm repo
echo -e "\n${YELLOW}Adding Harbor helm repository...${NC}"
helm repo add harbor https://helm.goharbor.io
helm repo update
echo -e "${GREEN}✓ Repository added${NC}"

# Install Harbor (checked with "if !" because a "$?" test after the command
# would never run under "set -e")
echo -e "\n${YELLOW}Installing Harbor...${NC}"
if ! helm install harbor harbor/harbor \
    --namespace "${HARBOR_NAMESPACE}" --create-namespace \
    --set expose.type=nodePort \
    --set expose.tls.enabled=false \
    --set persistence.enabled=true \
    --set harborAdminPassword="${HARBOR_PASSWORD}" \
    --wait --timeout 10m; then
    echo -e "${RED}✗ Harbor installation failed${NC}"
    exit 1
fi

echo -e "${GREEN}✓ Harbor installed successfully${NC}"

# Wait for Harbor to be ready
echo -e "\n${YELLOW}Waiting for Harbor to be ready...${NC}"
kubectl wait --for=condition=ready pod -l app=harbor -n "${HARBOR_NAMESPACE}" --timeout=300s

# Get access information
echo -e "\n${BLUE}========================================${NC}"
echo -e "${BLUE}Harbor Access Information${NC}"
echo -e "${BLUE}========================================${NC}"

NODE_PORT=$(kubectl get svc -n "${HARBOR_NAMESPACE}" harbor-core -o jsonpath='{.spec.ports[0].nodePort}')
NODE_IP=$(kubectl get nodes -o jsonpath='{.items[0].status.addresses[?(@.type=="ExternalIP")].address}')
if [ -z "$NODE_IP" ]; then
    NODE_IP=$(kubectl get nodes -o jsonpath='{.items[0].status.addresses[?(@.type=="InternalIP")].address}')
fi

HARBOR_URL="${NODE_IP}:${NODE_PORT}"

echo ""
echo -e "Harbor URL: ${GREEN}http://${HARBOR_URL}${NC}"
echo -e "Username: ${GREEN}admin${NC}"
echo -e "Password: ${GREEN}${HARBOR_PASSWORD}${NC}"
echo ""

# Test Docker login
echo -e "${YELLOW}Testing Docker login...${NC}"
if docker login "${HARBOR_URL}" -u admin -p "${HARBOR_PASSWORD}" &> /dev/null; then
    echo -e "${GREEN}✓ Docker login successful${NC}"
else
    echo -e "${YELLOW}⚠ Docker login failed. You may need to:${NC}"
    echo "  1. Add '${HARBOR_URL}' to Docker's insecure registries"
    echo "  2. Restart Docker daemon"
    echo ""
    echo "  Edit /etc/docker/daemon.json (Linux) or Docker Desktop settings (Mac/Windows):"
    echo "  {"
    echo "    \"insecure-registries\": [\"${HARBOR_URL}\"]"
    echo "  }"
fi

echo ""
echo -e "${BLUE}========================================${NC}"
echo -e "${BLUE}Next Steps${NC}"
echo -e "${BLUE}========================================${NC}"
echo ""
echo "1. Open Harbor UI: http://${HARBOR_URL}"
echo "2. Login with admin/${HARBOR_PASSWORD}"
echo "3. Create a new project:"
echo "   - Click 'Projects' → 'New Project'"
echo "   - Name: higress-plugins"
echo "   - Access Level: Public"
echo ""
echo "4. Build and push your plugin:"
echo "   docker build -t ${HARBOR_URL}/higress-plugins/my-plugin:v1 ."
echo "   docker push ${HARBOR_URL}/higress-plugins/my-plugin:v1"
echo ""
echo "5. Use in WasmPlugin:"
echo "   url: oci://${HARBOR_URL}/higress-plugins/my-plugin:v1"
echo ""
echo -e "${YELLOW}⚠ Note: This is a basic installation for testing.${NC}"
echo "  For production use:"
echo "  - Enable TLS (set expose.tls.enabled=true)"
echo "  - Use LoadBalancer or Ingress instead of NodePort"
echo "  - Configure proper persistent storage"
echo "  - Set strong admin password"
echo ""
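All of these scripts default their positional arguments with Bash parameter expansion, e.g. `HARBOR_NAMESPACE="${1:-harbor-system}"`. The `:-` form falls back when the argument is unset or empty, which is what makes the arguments optional. A tiny demonstration (`get_ns` is an illustrative name):

```shell
#!/bin/bash
# ${1:-default} falls back when the argument is unset OR empty.
get_ns() { echo "${1:-harbor-system}"; }

get_ns                 # prints "harbor-system"
get_ns ""              # prints "harbor-system" (empty also triggers default)
get_ns my-registry     # prints "my-registry"
```

The related `${1-default}` form (without the colon) would keep an explicitly empty argument instead of replacing it.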
18  .github/workflows/build-and-test-plugin.yaml  vendored
@@ -19,7 +19,7 @@ on:

 jobs:
   lint:
-    runs-on: ubuntu-latest
+    runs-on: ubuntu-22.04
     steps:
       - uses: actions/checkout@v4
       - uses: actions/setup-go@v5
@@ -30,7 +30,7 @@ jobs:
       # - run: make lint

   higress-wasmplugin-test:
-    runs-on: ubuntu-latest
+    runs-on: ubuntu-22.04
     strategy:
       matrix:
         # TODO(Xunzhuo): Enable C WASM Filters in CI
@@ -38,6 +38,18 @@ jobs:
     steps:
       - uses: actions/checkout@v4

+      - name: Disable containerd image store
+        run: |
+          sudo bash -c 'cat > /etc/docker/daemon.json << EOF
+          {
+            "features": {
+              "containerd-snapshotter": false
+            }
+          }
+          EOF'
+          sudo systemctl restart docker
+          docker info -f '{{ .DriverStatus }}'
+
       - name: Free Up GitHub Actions Ubuntu Runner Disk Space 🔧
         uses: jlumbroso/free-disk-space@main
         with:
@@ -79,7 +91,7 @@ jobs:
           command: GOPROXY="https://proxy.golang.org,direct" PLUGIN_TYPE=${{ matrix.wasmPluginType }} make higress-wasmplugin-test

   publish:
-    runs-on: ubuntu-latest
+    runs-on: ubuntu-22.04
     needs: [higress-wasmplugin-test]
     steps:
       - uses: actions/checkout@v4
24  .github/workflows/build-and-test.yaml  vendored
@@ -10,7 +10,7 @@ env:
   GO_VERSION: 1.24
 jobs:
   lint:
-    runs-on: ubuntu-latest
+    runs-on: ubuntu-22.04
     steps:
       - uses: actions/checkout@v4
       - uses: actions/setup-go@v5
@@ -21,7 +21,7 @@ jobs:
       # - run: make lint

   coverage-test:
-    runs-on: ubuntu-latest
+    runs-on: ubuntu-22.04
     steps:
       - uses: actions/checkout@v4

@@ -57,7 +57,7 @@ jobs:

   build:
     # The type of runner that the job will run on
-    runs-on: ubuntu-latest
+    runs-on: ubuntu-22.04
     needs: [lint, coverage-test]
     steps:
       - name: "Checkout ${{ github.ref }}"
@@ -91,17 +91,29 @@ jobs:
           path: out/

   gateway-conformance-test:
-    runs-on: ubuntu-latest
+    runs-on: ubuntu-22.04
     needs: [build]
     steps:
       - uses: actions/checkout@v3

   higress-conformance-test:
-    runs-on: ubuntu-latest
+    runs-on: ubuntu-22.04
     needs: [build]
     steps:
       - uses: actions/checkout@v4

+      - name: Disable containerd image store
+        run: |
+          sudo bash -c 'cat > /etc/docker/daemon.json << EOF
+          {
+            "features": {
+              "containerd-snapshotter": false
+            }
+          }
+          EOF'
+          sudo systemctl restart docker
+          docker info -f '{{ .DriverStatus }}'
+
       - name: Free Up GitHub Actions Ubuntu Runner Disk Space 🔧
         uses: jlumbroso/free-disk-space@main
         with:
@@ -139,7 +151,7 @@ jobs:
         run: GOPROXY="https://proxy.golang.org,direct" make higress-conformance-test

   publish:
-    runs-on: ubuntu-latest
+    runs-on: ubuntu-22.04
     needs: [higress-conformance-test, gateway-conformance-test]
     steps:
       - uses: actions/checkout@v4
@@ -31,7 +31,8 @@ jobs:
       - name: Upload to OSS
         uses: go-choppy/ossutil-github-action@master
         with:
-          ossArgs: 'cp -r -u ./artifact/ oss://higress-website-cn-hongkong/standalone/'
+          ossArgs: 'cp -r -u ./artifact/ oss://higress-ai/standalone/'
           accessKey: ${{ secrets.ACCESS_KEYID }}
           accessSecret: ${{ secrets.ACCESS_KEYSECRET }}
           endpoint: oss-cn-hongkong.aliyuncs.com

5  .github/workflows/deploy-to-oss.yaml  vendored
@@ -19,7 +19,7 @@ jobs:
       - name: Download Helm Charts Index
         uses: go-choppy/ossutil-github-action@master
         with:
-          ossArgs: 'cp oss://higress-website-cn-hongkong/helm-charts/index.yaml ./artifact/'
+          ossArgs: 'cp oss://higress-ai/helm-charts/index.yaml ./artifact/'
           accessKey: ${{ secrets.ACCESS_KEYID }}
           accessSecret: ${{ secrets.ACCESS_KEYSECRET }}
           endpoint: oss-cn-hongkong.aliyuncs.com
@@ -48,7 +48,8 @@ jobs:
       - name: Upload to OSS
         uses: go-choppy/ossutil-github-action@master
         with:
-          ossArgs: 'cp -r -u ./artifact/ oss://higress-website-cn-hongkong/helm-charts/'
+          ossArgs: 'cp -r -u ./artifact/ oss://higress-ai/helm-charts/'
           accessKey: ${{ secrets.ACCESS_KEYID }}
           accessSecret: ${{ secrets.ACCESS_KEYSECRET }}
           endpoint: oss-cn-hongkong.aliyuncs.com
50  .github/workflows/sync-skills-to-oss.yaml  vendored  Normal file
@@ -0,0 +1,50 @@
name: Sync Skills to OSS

on:
  push:
    branches:
      - main
    paths:
      - '.claude/skills/**'
  workflow_dispatch: ~

jobs:
  sync-skills-to-oss:
    runs-on: ubuntu-latest
    environment:
      name: oss
    steps:
      - name: Checkout
        uses: actions/checkout@v4

      - name: Download AI Gateway Install Script
        run: |
          wget -O install.sh https://raw.githubusercontent.com/higress-group/higress-standalone/main/all-in-one/get-ai-gateway.sh
          chmod +x install.sh

      - name: Package Skills
        run: |
          mkdir -p packaged-skills
          for skill_dir in .claude/skills/*/; do
            if [ -d "$skill_dir" ]; then
              skill_name=$(basename "$skill_dir")
              echo "Packaging $skill_name..."
              (cd "$skill_dir" && zip -r "$GITHUB_WORKSPACE/packaged-skills/${skill_name}.zip" .)
            fi
          done

      - name: Sync Skills to OSS
        uses: go-choppy/ossutil-github-action@master
        with:
          ossArgs: 'cp -r -u packaged-skills/ oss://higress-ai/skills/'
          accessKey: ${{ secrets.ACCESS_KEYID }}
          accessSecret: ${{ secrets.ACCESS_KEYSECRET }}
          endpoint: oss-cn-hongkong.aliyuncs.com

      - name: Sync Install Script to OSS
        uses: go-choppy/ossutil-github-action@master
        with:
          ossArgs: 'cp -u install.sh oss://higress-ai/ai-gateway/install.sh'
          accessKey: ${{ secrets.ACCESS_KEYID }}
          accessSecret: ${{ secrets.ACCESS_KEYSECRET }}
          endpoint: oss-cn-hongkong.aliyuncs.com
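The "Package Skills" step derives each zip name from the skill directory's basename before zipping it. That naming logic in isolation, using scratch directories (the paths are examples, not real skills):

```shell
#!/bin/bash
# Derive package names from skill directory basenames, as the
# "Package Skills" step does before zipping each directory.
base=$(mktemp -d)
mkdir -p "$base/.claude/skills/alpha" "$base/.claude/skills/beta"

for skill_dir in "$base"/.claude/skills/*/; do
  skill_name=$(basename "$skill_dir")   # basename strips the trailing slash
  echo "would package ${skill_name}.zip"
done
# prints:
#   would package alpha.zip
#   would package beta.zip
rm -rf "$base"
```

The trailing `/` in the glob matches only directories, which is why the `[ -d ... ]` guard in the workflow is a belt-and-suspenders check.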
6  .gitmodules  vendored
@@ -21,15 +21,15 @@
 [submodule "istio/proxy"]
 	path = istio/proxy
 	url = https://github.com/higress-group/proxy
-	branch = istio-1.19
+	branch = envoy-1.36
 	shallow = true
 [submodule "envoy/go-control-plane"]
 	path = envoy/go-control-plane
 	url = https://github.com/higress-group/go-control-plane
-	branch = istio-1.27
+	branch = envoy-1.36
 	shallow = true
 [submodule "envoy/envoy"]
 	path = envoy/envoy
 	url = https://github.com/higress-group/envoy
-	branch = envoy-1.27
+	branch = envoy-1.36
 	shallow = true
@@ -35,7 +35,8 @@ header:
     - 'hgctl/pkg/manifests'
     - 'pkg/ingress/kube/gateway/istio/testdata'
     - 'release-notes/**'
     - '.cursor/**'
+    - '.claude/**'

 comment: on-failure
 dependency:
@@ -146,7 +146,7 @@ docker-buildx-push: clean-env docker.higress-buildx
 export PARENT_GIT_TAG:=$(shell cat VERSION)
 export PARENT_GIT_REVISION:=$(TAG)

-export ENVOY_PACKAGE_URL_PATTERN?=https://github.com/higress-group/proxy/releases/download/v2.2.0/envoy-symbol-ARCH.tar.gz
+export ENVOY_PACKAGE_URL_PATTERN?=https://github.com/higress-group/proxy/releases/download/v2.2.1/envoy-symbol-ARCH.tar.gz

 build-envoy: prebuild
 	./tools/hack/build-envoy.sh

@@ -200,8 +200,8 @@ install: pre-install
 	helm install higress helm/higress -n higress-system --create-namespace --set 'global.local=true'

 HIGRESS_LATEST_IMAGE_TAG ?= latest
-ENVOY_LATEST_IMAGE_TAG ?= cdf0f16bf622102f89a0d0257834f43f502e4b99
-ISTIO_LATEST_IMAGE_TAG ?= a7525f292c38d7d3380f3ce7ee971ad6e3c46adf
+ENVOY_LATEST_IMAGE_TAG ?= ca6ff3a92e3fa592bff706894b22e0509a69757b
+ISTIO_LATEST_IMAGE_TAG ?= c482b42b9a14885bd6692c6abd01345d50a372f7

 install-dev: pre-install
 	helm install higress helm/core -n higress-system --create-namespace --set 'controller.tag=$(TAG)' --set 'gateway.replicas=1' --set 'pilot.tag=$(ISTIO_LATEST_IMAGE_TAG)' --set 'gateway.tag=$(ENVOY_LATEST_IMAGE_TAG)' --set 'global.local=true'
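`ENVOY_PACKAGE_URL_PATTERN` carries a literal `ARCH` placeholder that the build tooling substitutes per target architecture (the exact substitution lives in `tools/hack/build-envoy.sh`; the snippet below is an illustrative sketch, not that script):

```shell
# Sketch: expanding the ARCH placeholder in ENVOY_PACKAGE_URL_PATTERN.
pattern="https://github.com/higress-group/proxy/releases/download/v2.2.1/envoy-symbol-ARCH.tar.gz"
arch="amd64"  # or arm64, depending on the build target
url=$(printf '%s' "$pattern" | sed "s/ARCH/$arch/")
echo "$url"
```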
README.md: 27 changed lines
@@ -45,7 +45,7 @@ Higress was born within Alibaba to solve the issues of Tengine reload affecting

 You can click the button below to install the enterprise version of Higress:

-[](https://www.aliyun.com/product/apigateway?spm=higress-github.topbar.0.0.0)
+[](https://www.aliyun.com/product/api-gateway?spm=higress-github.topbar.0.0.0)

 If you use open-source Higress and wish to obtain enterprise-level support, you can contact the project maintainer johnlanni's email: **zty98751@alibaba-inc.com** or social media accounts (WeChat ID: **nomadao**, DingTalk ID: **chengtanzty**). Please note **Higress** when adding as a friend :)

@@ -86,6 +86,18 @@ Port descriptions:
 >
 > **Southeast Asia**: `higress-registry.ap-southeast-7.cr.aliyuncs.com`

+> **For Kubernetes deployments**, you can configure the `global.hub` parameter in Helm values to use a mirror registry closer to your region. This applies to both Higress component images and built-in Wasm plugin images:
+>
+> ```bash
+> # Example: Using North America mirror
+> helm install higress -n higress-system higress.io/higress --set global.hub=higress-registry.us-west-1.cr.aliyuncs.com --create-namespace
+> ```
+>
+> Available mirror registries:
+> - **China (Hangzhou)**: `higress-registry.cn-hangzhou.cr.aliyuncs.com` (default)
+> - **North America**: `higress-registry.us-west-1.cr.aliyuncs.com`
+> - **Southeast Asia**: `higress-registry.ap-southeast-7.cr.aliyuncs.com`

 For other installation methods such as Helm deployment under K8s, please refer to the official [Quick Start documentation](https://higress.io/en-us/docs/user/quickstart).

 If you are deploying on the cloud, it is recommended to use the [Enterprise Edition](https://www.aliyun.com/product/apigateway?spm=higress-github.topbar.0.0.0)

@@ -119,7 +131,16 @@ If you are deploying on the cloud, it is recommended to use the Enterprise Edit

 Higress can function as a feature-rich ingress controller, which is compatible with many annotations of K8s' nginx ingress controller.

-[Gateway API](https://gateway-api.sigs.k8s.io/) support is coming soon and will support smooth migration from Ingress API to Gateway API.
+[Gateway API](https://gateway-api.sigs.k8s.io/) is already supported, and it supports a smooth migration from Ingress API to Gateway API.

 Compared to ingress-nginx, the resource overhead has significantly decreased, and the speed at which route changes take effect has improved by ten times.

+> The following resource overhead comparison comes from [sealos](https://github.com/labring).
+>
+> For details, you can read this [article](https://sealos.io/blog/sealos-envoy-vs-nginx-2000-tenants) to understand how sealos migrated the monitoring of **tens of thousands of ingress** resources from nginx ingress to higress.

 

 - **Microservice gateway**:

@@ -173,6 +194,8 @@ Higress would not be possible without the valuable open-source work of projects

 - Higress Console: https://github.com/higress-group/higress-console
 - Higress Standalone: https://github.com/higress-group/higress-standalone
+- Higress Plugin Server: https://github.com/higress-group/plugin-server
+- Higress Wasm Plugin Golang SDK: https://github.com/higress-group/wasm-go

 ### Contributors
@@ -208,6 +208,8 @@ WeChat official account:

 - Higress Console: https://github.com/higress-group/higress-console
 - Higress Standalone: https://github.com/higress-group/higress-standalone
+- Higress Plugin Server: https://github.com/higress-group/plugin-server
+- Higress Wasm Plugin Golang SDK: https://github.com/higress-group/wasm-go

 ### Contributors
README_ZH.md: 20 changed lines
@@ -80,6 +80,24 @@ docker run -d --rm --name higress-ai -v ${PWD}:/data \

 **All of Higress's Docker images are hosted in dedicated repositories and are unaffected by restricted access to Docker Hub in mainland China**

 > If pulling images from `higress-registry.cn-hangzhou.cr.aliyuncs.com` times out, try the following mirror registries:
 >
 > **North America**: `higress-registry.us-west-1.cr.aliyuncs.com`
 >
 > **Southeast Asia**: `higress-registry.ap-southeast-7.cr.aliyuncs.com`

+> **For Kubernetes deployments**, you can configure the `global.hub` parameter in Helm values to use a mirror registry closer to your deployment region. This parameter applies to both Higress component images and built-in Wasm plugin images:
+>
+> ```bash
+> # Example: using the North America mirror
+> helm install higress -n higress-system higress.io/higress --set global.hub=higress-registry.us-west-1.cr.aliyuncs.com --create-namespace
+> ```
+>
+> Available mirror registries:
+> - **China (Hangzhou)**: `higress-registry.cn-hangzhou.cr.aliyuncs.com` (default)
+> - **North America**: `higress-registry.us-west-1.cr.aliyuncs.com`
+> - **Southeast Asia**: `higress-registry.ap-southeast-7.cr.aliyuncs.com`

 For other installation methods, such as Helm deployment on K8s, see the official [Quick Start documentation](https://higress.cn/docs/latest/user/quickstart/).

 If you are deploying on the cloud, it is recommended to use the [Enterprise Edition](https://www.aliyun.com/product/apigateway?spm=higress-github.topbar.0.0.0)

@@ -221,6 +239,8 @@ For other installation methods, such as Helm deployment on K8s, see the official Quick Start documentation.

 - Higress Console: https://github.com/higress-group/higress-console
 - Higress Standalone: https://github.com/higress-group/higress-standalone
+- Higress Plugin Server: https://github.com/higress-group/plugin-server
+- Higress Wasm Plugin Golang SDK: https://github.com/higress-group/wasm-go

 ### Contributors
Submodule envoy/envoy updated: 3fe314c698...b46236685e
Submodule envoy/go-control-plane updated: 90eca02281...af656ebdd1
go.mod: 45 changed lines
@@ -20,7 +20,7 @@ require (
 	github.com/caddyserver/certmagic v0.21.3
 	github.com/dubbogo/go-zookeeper v1.0.4-0.20211212162352-f9d2183d89d5
 	github.com/dubbogo/gost v1.13.1
-	github.com/envoyproxy/go-control-plane/envoy v1.35.0
+	github.com/envoyproxy/go-control-plane/envoy v1.36.0
 	github.com/go-errors/errors v1.5.1
 	github.com/gogo/protobuf v1.3.2
 	github.com/golang/protobuf v1.5.4
@@ -38,10 +38,10 @@ require (
 	github.com/tidwall/gjson v1.17.0
 	go.uber.org/atomic v1.11.0
 	go.uber.org/zap v1.27.0
-	golang.org/x/net v0.44.0
-	google.golang.org/genproto/googleapis/api v0.0.0-20250929231259-57b25ae835d4
-	google.golang.org/grpc v1.76.0
-	google.golang.org/protobuf v1.36.10
+	golang.org/x/net v0.47.0
+	google.golang.org/genproto/googleapis/api v0.0.0-20251029180050-ab9386a59fda
+	google.golang.org/grpc v1.78.0
+	google.golang.org/protobuf v1.36.11
 	istio.io/api v1.27.1-0.20250820125923-f5a5d3a605a9
 	istio.io/client-go v1.27.1-0.20250820130622-12f6d11feb40
 	istio.io/istio v0.0.0
@@ -65,7 +65,7 @@ require (
 	cloud.google.com/go v0.120.0 // indirect
 	cloud.google.com/go/auth v0.16.5 // indirect
 	cloud.google.com/go/auth/oauth2adapt v0.2.8 // indirect
-	cloud.google.com/go/compute/metadata v0.8.4 // indirect
+	cloud.google.com/go/compute/metadata v0.9.0 // indirect
 	cloud.google.com/go/logging v1.13.0 // indirect
 	cloud.google.com/go/longrunning v0.6.7 // indirect
 	dario.cat/mergo v1.0.2 // indirect
@@ -103,11 +103,10 @@ require (
 	github.com/buger/jsonparser v1.1.1 // indirect
-	github.com/cenkalti/backoff/v4 v4.3.0 // indirect
 	github.com/cenkalti/backoff/v5 v5.0.3 // indirect
 	github.com/census-instrumentation/opencensus-proto v0.4.1 // indirect
 	github.com/cespare/xxhash/v2 v2.3.0 // indirect
 	github.com/clbanning/mxj v1.8.4 // indirect
 	github.com/clbanning/mxj/v2 v2.5.5 // indirect
-	github.com/cncf/xds/go v0.0.0-20250501225837-2ac532fd4443 // indirect
+	github.com/cncf/xds/go v0.0.0-20251110193048-8bfbf64dc13e // indirect
 	github.com/containerd/stargz-snapshotter/estargz v0.16.3 // indirect
 	github.com/coreos/go-oidc/v3 v3.14.1 // indirect
 	github.com/davecgh/go-spew v1.1.2-0.20180830191138-d8f796af33cc // indirect
@@ -117,23 +116,23 @@ require (
 	github.com/docker/distribution v2.8.3+incompatible // indirect
 	github.com/docker/docker-credential-helpers v0.9.3 // indirect
 	github.com/emicklei/go-restful/v3 v3.13.0 // indirect
-	github.com/envoyproxy/go-control-plane v0.13.4 // indirect
+	github.com/envoyproxy/go-control-plane v0.14.0 // indirect
 	github.com/envoyproxy/go-control-plane/contrib v0.0.0-20251016030003-90eca0228178 // indirect
-	github.com/envoyproxy/protoc-gen-validate v1.2.1 // indirect
+	github.com/envoyproxy/protoc-gen-validate v1.3.0 // indirect
 	github.com/evanphx/json-patch/v5 v5.9.11 // indirect
 	github.com/fatih/color v1.18.0 // indirect
 	github.com/felixge/httpsnoop v1.0.4 // indirect
 	github.com/franela/goreq v0.0.0-20171204163338-bcd34c9993f8 // indirect
 	github.com/fsnotify/fsnotify v1.9.0 // indirect
 	github.com/fxamacker/cbor/v2 v2.9.0 // indirect
-	github.com/go-jose/go-jose/v4 v4.1.2 // indirect
+	github.com/go-jose/go-jose/v4 v4.1.3 // indirect
 	github.com/go-logr/logr v1.4.3 // indirect
 	github.com/go-logr/stdr v1.2.2 // indirect
 	github.com/go-openapi/jsonpointer v0.21.2 // indirect
 	github.com/go-openapi/jsonreference v0.21.0 // indirect
 	github.com/go-openapi/swag v0.23.1 // indirect
 	github.com/goccy/go-json v0.10.5 // indirect
-	github.com/golang/mock v1.6.0 // indirect
+	github.com/golang/mock v1.7.0-rc.1 // indirect
 	github.com/google/btree v1.1.3 // indirect
 	github.com/google/cel-go v0.26.0 // indirect
 	github.com/google/gnostic-models v0.7.0 // indirect
@@ -220,7 +219,7 @@ require (
 	github.com/yl2chen/cidranger v1.0.2 // indirect
 	github.com/zeebo/blake3 v0.2.3 // indirect
 	go.opencensus.io v0.24.0 // indirect
-	go.opentelemetry.io/auto/sdk v1.1.0 // indirect
+	go.opentelemetry.io/auto/sdk v1.2.1 // indirect
 	go.opentelemetry.io/contrib/instrumentation/google.golang.org/grpc/otelgrpc v0.61.0 // indirect
 	go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp v0.63.0 // indirect
 	go.opentelemetry.io/otel v1.38.0 // indirect
@@ -231,24 +230,24 @@ require (
 	go.opentelemetry.io/otel/sdk v1.38.0 // indirect
 	go.opentelemetry.io/otel/sdk/metric v1.38.0 // indirect
 	go.opentelemetry.io/otel/trace v1.38.0 // indirect
-	go.opentelemetry.io/proto/otlp v1.7.1 // indirect
+	go.opentelemetry.io/proto/otlp v1.9.0 // indirect
 	go.uber.org/multierr v1.11.0 // indirect
 	go.yaml.in/yaml/v2 v2.4.3 // indirect
 	go.yaml.in/yaml/v3 v3.0.4 // indirect
-	golang.org/x/crypto v0.42.0 // indirect
+	golang.org/x/crypto v0.44.0 // indirect
 	golang.org/x/exp v0.0.0-20250808145144-a408d31f581a // indirect
-	golang.org/x/mod v0.28.0 // indirect
-	golang.org/x/oauth2 v0.31.0 // indirect
-	golang.org/x/sync v0.17.0 // indirect
-	golang.org/x/sys v0.36.0 // indirect
-	golang.org/x/term v0.35.0 // indirect
-	golang.org/x/text v0.29.0 // indirect
+	golang.org/x/mod v0.29.0 // indirect
+	golang.org/x/oauth2 v0.32.0 // indirect
+	golang.org/x/sync v0.18.0 // indirect
+	golang.org/x/sys v0.38.0 // indirect
+	golang.org/x/term v0.37.0 // indirect
+	golang.org/x/text v0.31.0 // indirect
 	golang.org/x/time v0.13.0 // indirect
-	golang.org/x/tools v0.37.0 // indirect
+	golang.org/x/tools v0.38.0 // indirect
 	gomodules.xyz/jsonpatch/v2 v2.5.0 // indirect
 	google.golang.org/api v0.250.0 // indirect
 	google.golang.org/genproto v0.0.0-20250603155806-513f23925822 // indirect
-	google.golang.org/genproto/googleapis/rpc v0.0.0-20250922171735-9219d122eba9 // indirect
+	google.golang.org/genproto/googleapis/rpc v0.0.0-20251029180050-ab9386a59fda // indirect
 	gopkg.in/evanphx/json-patch.v4 v4.13.0 // indirect
 	gopkg.in/gcfg.v1 v1.2.3 // indirect
 	gopkg.in/inf.v0 v0.9.1 // indirect
@@ -1,5 +1,5 @@
 apiVersion: v2
-appVersion: 2.1.9
+appVersion: 2.2.0
 description: Helm chart for deploying higress gateways
 icon: https://higress.io/img/higress_logo_small.png
 home: http://higress.io/
@@ -15,4 +15,4 @@ dependencies:
   repository: "file://../redis"
   version: 0.0.1
 type: application
-version: 2.1.9
+version: 2.2.0
@@ -23,7 +23,7 @@ spec:
 {{- end }}
 containers:
   - name: {{ .Chart.Name }}
-    image: "{{ .Values.global.hub }}/{{ .Values.redis.image | default "redis-stack-server" }}:{{ .Values.redis.tag | default .Chart.AppVersion }}"
+    image: "{{ .Values.global.hub }}/higress/{{ .Values.redis.image | default "redis-stack-server" }}:{{ .Values.redis.tag | default .Chart.AppVersion }}"
 {{- if .Values.global.imagePullPolicy }}
 imagePullPolicy: {{ .Values.global.imagePullPolicy }}
 {{- end }}
@@ -3,7 +3,8 @@
 # Declare variables to be passed into your templates.
 global:
   # -- Specify the image registry and pull policy
-  hub: higress-registry.cn-hangzhou.cr.aliyuncs.com/higress
+  # Will inherit from parent chart's global.hub if not set
+  hub: ""
   # -- Specify image pull policy if default behavior isn't desired.
   # Default behavior: latest images will be Always else IfNotPresent.
   imagePullPolicy: ""
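Clearing the subchart's own `hub` works because Helm propagates `global.*` values from a parent chart into its subcharts: with `hub: ""` here, the value set at the top level flows through. A sketch of the parent-chart override (the registry hostname below is one of the documented mirrors; any reachable registry would do):

```yaml
# Parent-chart values sketch: set the registry once at the top level.
# The redis subchart's empty hub then inherits this via Helm's
# global value propagation.
global:
  hub: higress-registry.us-west-1.cr.aliyuncs.com
```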
@@ -39,7 +39,7 @@ template:
 {{- end }}
 containers:
   - name: higress-gateway
-    image: "{{ .Values.gateway.hub | default .Values.global.hub }}/{{ .Values.gateway.image | default "gateway" }}:{{ .Values.gateway.tag | default .Chart.AppVersion }}"
+    image: "{{ .Values.gateway.hub | default .Values.global.hub }}/higress/{{ .Values.gateway.image | default "gateway" }}:{{ .Values.gateway.tag | default .Chart.AppVersion }}"
     args:
       - proxy
       - router

@@ -123,6 +123,8 @@ template:
 - name: LITE_METRICS
   value: "on"
 {{- end }}
+- name: ISTIO_DELTA_XDS
+  value: "{{ .Values.global.enableDeltaXDS }}"
 {{- if include "skywalking.enabled" . }}
 - name: ISTIO_BOOTSTRAP_OVERRIDE
   value: /etc/istio/custom-bootstrap/custom_bootstrap.json

@@ -203,7 +205,7 @@ template:
 {{- if $o11y.enabled }}
 {{- $config := $o11y.promtail }}
 - name: promtail
-  image: {{ $config.image.repository }}:{{ $config.image.tag }}
+  image: {{ $config.image.repository | default (printf "%s/higress/promtail" .Values.global.hub) }}:{{ $config.image.tag }}
   imagePullPolicy: IfNotPresent
   args:
     - -config.file=/etc/promtail/promtail.yaml

@@ -250,6 +252,10 @@ template:
 tolerations:
   {{- toYaml . | nindent 6 }}
 {{- end }}
+{{- with .Values.gateway.topologySpreadConstraints }}
+topologySpreadConstraints:
+  {{- toYaml . | nindent 6 }}
+{{- end }}
 volumes:
   - emptyDir: {}
     name: workload-socket
@@ -144,3 +144,7 @@ rules:
   - apiGroups: [""]
     verbs: [ "get", "watch", "list", "update", "patch", "create", "delete" ]
     resources: [ "serviceaccounts"]
+  # istio leader election need
+  - apiGroups: ["coordination.k8s.io"]
+    resources: ["leases"]
+    verbs: ["get", "update", "patch", "create"]
@@ -38,7 +38,7 @@ spec:
 - name: {{ .Chart.Name }}
   securityContext:
     {{- toYaml .Values.controller.securityContext | nindent 12 }}
-  image: "{{ .Values.controller.hub | default .Values.global.hub }}/{{ .Values.controller.image | default "higress" }}:{{ .Values.controller.tag | default .Chart.AppVersion }}"
+  image: "{{ .Values.controller.hub | default .Values.global.hub }}/higress/{{ .Values.controller.image | default "higress" }}:{{ .Values.controller.tag | default .Chart.AppVersion }}"
   args:
     - "serve"
     - --gatewaySelectorKey=higress

@@ -104,7 +104,7 @@ spec:
 - name: log
   mountPath: /var/log
 - name: discovery
-  image: "{{ .Values.pilot.hub | default .Values.global.hub }}/{{ .Values.pilot.image | default "pilot" }}:{{ .Values.pilot.tag | default .Chart.AppVersion }}"
+  image: "{{ .Values.pilot.hub | default .Values.global.hub }}/higress/{{ .Values.pilot.image | default "pilot" }}:{{ .Values.pilot.tag | default .Chart.AppVersion }}"
 {{- if .Values.global.imagePullPolicy }}
 imagePullPolicy: {{ .Values.global.imagePullPolicy }}
 {{- end }}

@@ -173,6 +173,8 @@ spec:
   value: "{{ .Values.global.xdsMaxRecvMsgSize }}"
 - name: ENBALE_SCOPED_RDS
   value: "{{ .Values.global.enableSRDS }}"
+- name: ISTIO_DELTA_XDS
+  value: "{{ .Values.global.enableDeltaXDS }}"
 - name: ON_DEMAND_RDS
   value: "{{ .Values.global.onDemandRDS }}"
 - name: HOST_RDS_MERGE_SUBSET

@@ -301,6 +303,10 @@ spec:
 tolerations:
   {{- toYaml . | nindent 8 }}
 {{- end }}
+{{- with .Values.controller.topologySpreadConstraints }}
+topologySpreadConstraints:
+  {{- toYaml . | nindent 8 }}
+{{- end }}
 volumes:
   - name: log
     emptyDir: {}
@@ -23,7 +23,7 @@ spec:
 {{- end }}
 containers:
   - name: {{ .Chart.Name }}
-    image: {{ .Values.pluginServer.hub | default .Values.global.hub }}/{{ .Values.pluginServer.image | default "plugin-server" }}:{{ .Values.pluginServer.tag | default "1.0.0" }}
+    image: {{ .Values.pluginServer.hub | default .Values.global.hub }}/higress/{{ .Values.pluginServer.image | default "plugin-server" }}:{{ .Values.pluginServer.tag | default "1.0.0" }}
 {{- if .Values.global.imagePullPolicy }}
 imagePullPolicy: {{ .Values.global.imagePullPolicy }}
 {{- end }}
@@ -24,9 +24,6 @@ spec:
 {{- end }}
-{{- with .Values.gateway.service.externalTrafficPolicy }}
-externalTrafficPolicy: "{{ . }}"
-{{- end }}
 {{- with .Values.gateway.service.loadBalancerClass}}
 loadBalancerClass: "{{ . }}"
 {{- end }}
 type: {{ .Values.gateway.service.type }}
 ports:
@@ -9,6 +9,8 @@ global:
   xdsMaxRecvMsgSize: "104857600"
   defaultUpstreamConcurrencyThreshold: 10000
   enableSRDS: true
+  # -- Whether to enable Istio delta xDS, default is false.
+  enableDeltaXDS: true
   # -- Whether to enable Redis(redis-stack-server) for Higress, default is false.
   enableRedis: false
   enablePluginServer: false

@@ -68,10 +70,14 @@ global:
   #   cpu: 100m
   #   memory: 128Mi

-  # -- Default hub for Istio images.
-  # Releases are published to docker hub under 'istio' project.
-  # Dev builds from prow are on gcr.io
-  hub: higress-registry.cn-hangzhou.cr.aliyuncs.com/higress
+  # -- Default hub (registry) for Higress images.
+  # For Higress deployments, images are pulled from: {hub}/higress/{image}
+  # For built-in plugins, images are pulled from: {hub}/{pluginNamespace}/{plugin-name}
+  # Change this to use a mirror registry closer to your deployment region for faster image pulls.
+  hub: higress-registry.cn-hangzhou.cr.aliyuncs.com
+  # -- Namespace for built-in plugin images. Default is "plugins".
+  # Used by higress-console to configure plugin image path.
+  pluginNamespace: "plugins"

   # -- Specify image pull policy if default behavior isn't desired.
   # Default behavior: latest images will be Always else IfNotPresent.

@@ -364,7 +370,7 @@ global:
     enabled: false
     promtail:
       image:
-        repository: higress-registry.cn-hangzhou.cr.aliyuncs.com/higress/promtail
+        repository: "" # Will use global.hub if not set
         tag: 2.9.4
       port: 3101
       resources:

@@ -379,7 +385,7 @@ global:
   # The default value is "" and when caName="", the CA will be configured by other
   # mechanisms (e.g., environmental variable CA_PROVIDER).
   caName: ""
-  hub: higress-registry.cn-hangzhou.cr.aliyuncs.com/higress
+  hub: "" # Will use global.hub if not set

 clusterName: ""
 # -- meshConfig defines runtime configuration of components, including Istiod and istio-agent behavior

@@ -435,7 +441,7 @@ gateway:
   # -- The readiness timeout seconds
   readinessTimeoutSeconds: 3

-  hub: higress-registry.cn-hangzhou.cr.aliyuncs.com/higress
+  hub: "" # Will use global.hub if not set
   tag: ""
   # -- revision declares which revision this gateway is a part of
   revision: ""

@@ -524,6 +530,8 @@ gateway:

   affinity: {}

+  topologySpreadConstraints: []
+
   # -- If specified, the gateway will act as a network gateway for the given network.
   networkGateway: ""

@@ -555,7 +563,7 @@ controller:
   replicas: 1
   image: higress

-  hub: higress-registry.cn-hangzhou.cr.aliyuncs.com/higress
+  hub: "" # Will use global.hub if not set
   tag: ""
   env: {}

@@ -631,6 +639,8 @@ controller:

   affinity: {}

+  topologySpreadConstraints: []
+
   autoscaling:
     enabled: false
     minReplicas: 1

@@ -649,7 +659,7 @@ pilot:
   rollingMaxSurge: 100%
   rollingMaxUnavailable: 25%

-  hub: higress-registry.cn-hangzhou.cr.aliyuncs.com/higress
+  hub: "" # Will use global.hub if not set
   tag: ""

   # -- Can be a full hub/image:tag

@@ -802,7 +812,7 @@ pluginServer:
   replicas: 2
   image: plugin-server

-  hub: higress-registry.cn-hangzhou.cr.aliyuncs.com/higress
+  hub: "" # Will use global.hub if not set
   tag: ""

   imagePullSecrets: []
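With `global.hub` now holding a bare registry hostname, the templates compose full image references by appending a fixed namespace segment. A sketch of the resulting paths (the plugin name and tags below are illustrative, not values taken from this change):

```shell
# Sketch: image references composed from global.hub after this change.
hub="higress-registry.us-west-1.cr.aliyuncs.com"  # e.g. the North America mirror
plugin_ns="plugins"                               # global.pluginNamespace default
component_image="${hub}/higress/gateway:2.2.0"            # {hub}/higress/{image}
plugin_image="${hub}/${plugin_ns}/request-block:1.0.0"    # {hub}/{pluginNamespace}/{plugin-name}; name/tag hypothetical
echo "$component_image"
echo "$plugin_image"
```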
@@ -1,9 +1,9 @@
 dependencies:
 - name: higress-core
   repository: file://../core
-  version: 2.1.9
+  version: 2.2.0
 - name: higress-console
   repository: https://higress.io/helm-charts/
-  version: 2.1.9
-digest: sha256:d696af6726b40219cc16e7cf8de7400101479dfbd8deb3101d7ee736415b9875
-generated: "2025-11-13T16:33:49.721553+08:00"
+  version: 2.2.0
+digest: sha256:2cb148fa6d52856344e1905d3fea018466c2feb52013e08997c2d5c7d50f2e5d
+generated: "2026-02-11T17:45:59.187965929+08:00"
@@ -1,5 +1,5 @@
 apiVersion: v2
-appVersion: 2.1.9
+appVersion: 2.2.0
 description: Helm chart for deploying Higress gateways
 icon: https://higress.io/img/higress_logo_small.png
 home: http://higress.io/
@@ -12,9 +12,9 @@ sources:
 dependencies:
 - name: higress-core
   repository: "file://../core"
-  version: 2.1.9
+  version: 2.2.0
 - name: higress-console
   repository: "https://higress.io/helm-charts/"
-  version: 2.1.9
+  version: 2.2.0
 type: application
-version: 2.1.9
+version: 2.2.0
@@ -44,7 +44,7 @@ The command removes all the Kubernetes components associated with the chart and
 | controller.autoscaling.minReplicas | int | `1` | |
 | controller.autoscaling.targetCPUUtilizationPercentage | int | `80` | |
 | controller.env | object | `{}` | |
-| controller.hub | string | `"higress-registry.cn-hangzhou.cr.aliyuncs.com/higress"` | |
+| controller.hub | string | `""` | |
 | controller.image | string | `"higress"` | |
 | controller.imagePullSecrets | list | `[]` | |
 | controller.labels | object | `{}` | |
@@ -83,6 +83,7 @@ The command removes all the Kubernetes components associated with the chart and
 | controller.serviceAccount.name | string | `""` | If not set and create is true, a name is generated using the fullname template |
 | controller.tag | string | `""` | |
 | controller.tolerations | list | `[]` | |
+| controller.topologySpreadConstraints | list | `[]` | |
 | downstream | object | `{"connectionBufferLimits":32768,"http2":{"initialConnectionWindowSize":1048576,"initialStreamWindowSize":65535,"maxConcurrentStreams":100},"idleTimeout":180,"maxRequestHeadersKb":60,"routeTimeout":0}` | Downstream config settings |
 | gateway.affinity | object | `{}` | |
 | gateway.annotations | object | `{}` | Annotations to apply to all resources |
@@ -95,7 +96,7 @@ The command removes all the Kubernetes components associated with the chart and
 | gateway.hostNetwork | bool | `false` | |
 | gateway.httpPort | int | `80` | |
 | gateway.httpsPort | int | `443` | |
-| gateway.hub | string | `"higress-registry.cn-hangzhou.cr.aliyuncs.com/higress"` | |
+| gateway.hub | string | `""` | |
 | gateway.image | string | `"gateway"` | |
 | gateway.kind | string | `"Deployment"` | Use a `DaemonSet` or `Deployment` |
 | gateway.labels | object | `{}` | Labels to apply to all resources |
@@ -152,6 +153,7 @@ The command removes all the Kubernetes components associated with the chart and
 | gateway.serviceAccount.name | string | `""` | The name of the service account to use. If not set, the release name is used |
 | gateway.tag | string | `""` | |
 | gateway.tolerations | list | `[]` | |
+| gateway.topologySpreadConstraints | list | `[]` | |
 | gateway.unprivilegedPortSupported | string | `nil` | |
 | global.autoscalingv2API | bool | `true` | whether to use autoscaling/v2 template for HPA settings for internal usage only, not to be configured by users. |
 | global.caAddress | string | `""` | The customized CA address to retrieve certificates for the pods in the cluster. CSR clients such as the Istio Agent and ingress gateways can use this to specify the CA endpoint. If not set explicitly, default to the Istio discovery address. |
@@ -161,6 +163,7 @@ The command removes all the Kubernetes components associated with the chart and
 | global.defaultResources | object | `{"requests":{"cpu":"10m"}}` | A minimal set of requested resources to applied to all deployments so that Horizontal Pod Autoscaler will be able to function (if set). Each component can overwrite these default values by adding its own resources block in the relevant section below and setting the desired resources values. |
 | global.defaultUpstreamConcurrencyThreshold | int | `10000` | |
 | global.disableAlpnH2 | bool | `false` | Whether to disable HTTP/2 in ALPN |
+| global.enableDeltaXDS | bool | `true` | Whether to enable Istio delta xDS, default is false. |
 | global.enableGatewayAPI | bool | `true` | If true, Higress Controller will monitor Gateway API resources as well |
 | global.enableH3 | bool | `false` | |
 | global.enableIPv6 | bool | `false` | |
@@ -175,7 +178,7 @@ The command removes all the Kubernetes components associated with the chart and
 | global.enableStatus | bool | `true` | If true, Higress Controller will update the status field of Ingress resources. When migrating from Nginx Ingress, in order to avoid status field of Ingress objects being overwritten, this parameter needs to be set to false, so Higress won't write the entry IP to the status field of the corresponding Ingress object. |
 | global.externalIstiod | bool | `false` | Configure a remote cluster data plane controlled by an external istiod. When set to true, istiod is not deployed locally and only a subset of the other discovery charts are enabled. |
 | global.hostRDSMergeSubset | bool | `false` | |
-| global.hub | string | `"higress-registry.cn-hangzhou.cr.aliyuncs.com/higress"` | Default hub for Istio images. Releases are published to docker hub under 'istio' project. Dev builds from prow are on gcr.io |
+| global.hub | string | `"higress-registry.cn-hangzhou.cr.aliyuncs.com"` | Default hub (registry) for Higress images. For Higress deployments, images are pulled from: {hub}/higress/{image} For built-in plugins, images are pulled from: {hub}/{pluginNamespace}/{plugin-name} Change this to use a mirror registry closer to your deployment region for faster image pulls. |
 | global.imagePullPolicy | string | `""` | Specify image pull policy if default behavior isn't desired. Default behavior: latest images will be Always else IfNotPresent. |
 | global.imagePullSecrets | list | `[]` | ImagePullSecrets for all ServiceAccount, list of secrets in the same namespace to use for pulling any images in pods that reference this ServiceAccount. For components that don't use ServiceAccounts (i.e. grafana, servicegraph, tracing) ImagePullSecrets will be added to the corresponding Deployment(StatefulSet) objects. Must be set for any cluster configured with private docker registry. |
 | global.ingressClass | string | `"higress"` | IngressClass filters which ingress resources the higress controller watches. The default ingress class is higress. There are some special cases for special ingress class. 1. When the ingress class is set as nginx, the higress controller will watch ingress resources with the nginx ingress class or without any ingress class. 2. When the ingress class is set empty, the higress controller will watch all ingress resources in the k8s cluster. |
@@ -193,13 +196,14 @@ The command removes all the Kubernetes components associated with the chart and
 | global.multiCluster.clusterName | string | `""` | Should be set to the name of the cluster this installation will run in. This is required for sidecar injection to properly label proxies |
 | global.multiCluster.enabled | bool | `true` | Set to true to connect two kubernetes clusters via their respective ingressgateway services when pods in each cluster cannot directly talk to one another. All clusters should be using Istio mTLS and must have a shared root CA for this model to work. |
 | global.network | string | `""` | Network defines the network this cluster belong to. This name corresponds to the networks in the map of mesh networks. |
-| global.o11y | object | `{"enabled":false,"promtail":{"image":{"repository":"higress-registry.cn-hangzhou.cr.aliyuncs.com/higress/promtail","tag":"2.9.4"},"port":3101,"resources":{"limits":{"cpu":"500m","memory":"2Gi"}},"securityContext":{}}}` | Observability (o11y) configurations |
+| global.o11y | object | `{"enabled":false,"promtail":{"image":{"repository":"","tag":"2.9.4"},"port":3101,"resources":{"limits":{"cpu":"500m","memory":"2Gi"}},"securityContext":{}}}` | Observability (o11y) configurations |
 | global.omitSidecarInjectorConfigMap | bool | `false` | |
 | global.onDemandRDS | bool | `false` | |
 | global.oneNamespace | bool | `false` | Whether to restrict the applications namespace the controller manages; If not set, controller watches all namespaces |
 | global.onlyPushRouteCluster | bool | `true` | |
 | global.operatorManageWebhooks | bool | `false` | Configure whether Operator manages webhook configurations. The current behavior of Istiod is to manage its own webhook configurations. When this option is set as true, Istio Operator, instead of webhooks, manages the webhook configurations. When this option is set as false, webhooks manage their own webhook configurations. |
 | global.pilotCertProvider | string | `"istiod"` | Configure the certificate provider for control plane communication. Currently, two providers are supported: "kubernetes" and "istiod". As some platforms may not have kubernetes signing APIs, Istiod is the default |
+| global.pluginNamespace | string | `"plugins"` | Namespace for built-in plugin images. Default is "plugins". Used by higress-console to configure plugin image path. |
 | global.priorityClassName | string | `""` | Kubernetes >=v1.11.0 will create two PriorityClass, including system-cluster-critical and system-node-critical, it is better to configure this in order to make sure your Istio pods will not be killed because of low priority class. Refer to https://kubernetes.io/docs/concepts/configuration/pod-priority-preemption/#priorityclass for more detail. |
 | global.proxy.autoInject | string | `"enabled"` | This controls the 'policy' in the sidecar injector. |
 | global.proxy.clusterDomain | string | `"cluster.local"` | CAUTION: It is important to ensure that all Istio helm charts specify the same clusterDomain value cluster domain. Default value is "cluster.local". |
@@ -245,7 +249,7 @@ The command removes all the Kubernetes components associated with the chart and
 | global.watchNamespace | string | `""` | If not empty, Higress Controller will only watch resources in the specified namespace. When isolating different business systems using K8s namespace, if each namespace requires a standalone gateway instance, this parameter can be used to confine the Ingress watching of Higress within the given namespace. |
|
||||
| global.xdsMaxRecvMsgSize | string | `"104857600"` | |
|
||||
| gzip | object | `{"chunkSize":4096,"compressionLevel":"BEST_COMPRESSION","compressionStrategy":"DEFAULT_STRATEGY","contentType":["text/html","text/css","text/plain","text/xml","application/json","application/javascript","application/xhtml+xml","image/svg+xml"],"disableOnEtagHeader":true,"enable":true,"memoryLevel":5,"minContentLength":1024,"windowBits":12}` | Gzip compression settings |
|
||||
| hub | string | `"higress-registry.cn-hangzhou.cr.aliyuncs.com/higress"` | |
|
||||
| hub | string | `""` | |
|
||||
| meshConfig | object | `{"enablePrometheusMerge":true,"rootNamespace":null,"trustDomain":"cluster.local"}` | meshConfig defines runtime configuration of components, including Istiod and istio-agent behavior See https://istio.io/docs/reference/config/istio.mesh.v1alpha1/ for all available options |
|
||||
| meshConfig.rootNamespace | string | `nil` | The namespace to treat as the administrative root namespace for Istio configuration. When processing a leaf namespace Istio will search for declarations in that namespace first and if none are found it will search in the root namespace. Any matching declaration found in the root namespace is processed as if it were declared in the leaf namespace. |
|
||||
| meshConfig.trustDomain | string | `"cluster.local"` | The trust domain corresponds to the trust root of a system Refer to https://github.com/spiffe/spiffe/blob/master/standards/SPIFFE-ID.md#21-trust-domain |
|
||||
@@ -262,7 +266,7 @@ The command removes all the Kubernetes components associated with the chart and
|
||||
| pilot.env.PILOT_ENABLE_METADATA_EXCHANGE | string | `"false"` | |
|
||||
| pilot.env.PILOT_SCOPE_GATEWAY_TO_NAMESPACE | string | `"false"` | |
|
||||
| pilot.env.VALIDATION_ENABLED | string | `"false"` | |
|
||||
| pilot.hub | string | `"higress-registry.cn-hangzhou.cr.aliyuncs.com/higress"` | |
|
||||
| pilot.hub | string | `""` | |
|
||||
| pilot.image | string | `"pilot"` | Can be a full hub/image:tag |
|
||||
| pilot.jwksResolverExtraRootCA | string | `""` | You can use jwksResolverExtraRootCA to provide a root certificate in PEM format. This will then be trusted by pilot when resolving JWKS URIs. |
|
||||
| pilot.keepaliveMaxServerConnectionAge | string | `"30m"` | The following is used to limit how long a sidecar can be connected to a pilot. It balances out load across pilot instances at the cost of increasing system churn. |
|
||||
@@ -277,7 +281,7 @@ The command removes all the Kubernetes components associated with the chart and
|
||||
| pilot.serviceAnnotations | object | `{}` | |
|
||||
| pilot.tag | string | `""` | |
|
||||
| pilot.traceSampling | float | `1` | |
|
||||
| pluginServer.hub | string | `"higress-registry.cn-hangzhou.cr.aliyuncs.com/higress"` | |
|
||||
| pluginServer.hub | string | `""` | |
|
||||
| pluginServer.image | string | `"plugin-server"` | |
|
||||
| pluginServer.imagePullSecrets | list | `[]` | |
|
||||
| pluginServer.labels | object | `{}` | |
|
||||
|
||||
@@ -0,0 +1,152 @@

## Higress for Kubernetes

Higress is a cloud-native API gateway based on Alibaba's internal gateway practices.

Powered by Istio and Envoy, Higress unifies the three roles of traffic gateway, microservice gateway, and security gateway, greatly reducing the cost of deployment and operations.

## Setting Up the Repository

```console
helm repo add higress.io https://higress.io/helm-charts
helm repo update
```

## Install

Install the chart with the release name `higress`:

```console
helm install higress -n higress-system higress.io/higress --create-namespace --render-subchart-notes
```

## Uninstall

Delete the release named higress:

```console
helm delete higress -n higress-system
```

This command removes all the Kubernetes components associated with the chart and uninstalls the release.

## Values

| Key | Type | Default | Description |
|----|------|---------|-------------|
| clusterName | string | `""` | Cluster name |
| controller.affinity | object | `{}` | Controller affinity settings |
| controller.automaticHttps.email | string | `""` | Email address used for automatic HTTPS |
| controller.automaticHttps.enabled | bool | `true` | Whether to enable automatic HTTPS |
| controller.autoscaling.enabled | bool | `false` | Whether to enable controller autoscaling |
| controller.autoscaling.maxReplicas | int | `5` | Maximum number of replicas |
| controller.autoscaling.minReplicas | int | `1` | Minimum number of replicas |
| controller.autoscaling.targetCPUUtilizationPercentage | int | `80` | Target CPU utilization percentage |
| controller.env | object | `{}` | Environment variables |
| controller.hub | string | `"higress-registry.cn-hangzhou.cr.aliyuncs.com/higress"` | Base address of the image registry |
| controller.image | string | `"higress"` | Image name |
| controller.imagePullSecrets | list | `[]` | List of image pull secrets |
| controller.labels | object | `{}` | Labels |
| controller.name | string | `"higress-controller"` | Controller name |
| controller.nodeSelector | object | `{}` | Node selector |
| controller.podAnnotations | object | `{}` | Pod annotations |
| controller.podLabels | object | `{}` | Labels applied to pods |
| controller.podSecurityContext | object | `{}` | Pod security context |
| controller.ports[0].name | string | `"http"` | Port name |
| controller.ports[0].port | int | `8888` | Port number |
| controller.ports[0].protocol | string | `"TCP"` | Protocol |
| controller.ports[0].targetPort | int | `8888` | Target port |
| controller.ports[1].name | string | `"http-solver"` | Port name |
| controller.ports[1].port | int | `8889` | Port number |
| controller.ports[1].protocol | string | `"TCP"` | Protocol |
| controller.ports[1].targetPort | int | `8889` | Target port |
| controller.ports[2].name | string | `"grpc"` | Port name |
| controller.ports[2].port | int | `15051` | Port number |
| controller.ports[2].protocol | string | `"TCP"` | Protocol |
| controller.ports[2].targetPort | int | `15051` | Target port |
| controller.probe.httpGet.path | string | `"/ready"` | Health check path |
| controller.probe.httpGet.port | int | `8888` | Health check port |
| controller.probe.initialDelaySeconds | int | `1` | Initial delay in seconds |
| controller.probe.periodSeconds | int | `3` | Health check interval in seconds |
| controller.probe.timeoutSeconds | int | `5` | Timeout in seconds |
| controller.rbac.create | bool | `true` | Whether to create RBAC resources |
| controller.replicas | int | `1` | Number of Higress controller pods |
| controller.resources.limits.cpu | string | `"1000m"` | CPU limit |
| controller.resources.limits.memory | string | `"2048Mi"` | Memory limit |
| controller.resources.requests.cpu | string | `"500m"` | CPU request |
| controller.resources.requests.memory | string | `"2048Mi"` | Memory request |
| controller.securityContext | object | `{}` | Security context |
| controller.service.type | string | `"ClusterIP"` | Service type |
| controller.serviceAccount.annotations | object | `{}` | Annotations added to the service account |
| controller.serviceAccount.create | bool | `true` | Whether to create a service account |
| controller.serviceAccount.name | string | `""` | If not set and create is true, a name is generated from the fullname template |
| controller.tag | string | `""` | Image tag |
| controller.tolerations | list | `[]` | List of tolerations |
| downstream.connectionBufferLimits | int | `32768` | Downstream connection buffer limit (bytes) |
| downstream.http2.initialConnectionWindowSize | int | `1048576` | HTTP/2 initial connection window size |
| downstream.http2.initialStreamWindowSize | int | `65535` | HTTP/2 initial stream window size |
| downstream.http2.maxConcurrentStreams | int | `100` | Maximum number of concurrent streams |
| downstream.idleTimeout | int | `180` | Idle timeout (seconds) |
| downstream.maxRequestHeadersKb | int | `60` | Maximum request header size (KB) |
| downstream.routeTimeout | int | `0` | Route timeout |
| gateway.affinity | object | `{}` | Gateway node affinity |
| gateway.annotations | object | `{}` | Annotations applied to all resources |
| gateway.autoscaling.enabled | bool | `false` | Whether to enable gateway autoscaling |
| gateway.autoscaling.maxReplicas | int | `5` | Maximum number of replicas |
| gateway.autoscaling.minReplicas | int | `1` | Minimum number of replicas |
| gateway.autoscaling.targetCPUUtilizationPercentage | int | `80` | Target CPU utilization percentage |
| gateway.containerSecurityContext | string | `nil` | Security context of the gateway container |
| gateway.env | object | `{}` | Pod environment variables |
| gateway.hostNetwork | bool | `false` | Whether to use the host network |
| gateway.httpPort | int | `80` | HTTP service port |
| gateway.httpsPort | int | `443` | HTTPS service port |
| gateway.hub | string | `"higress-registry.cn-hangzhou.cr.aliyuncs.com/higress"` | Base address of the gateway image registry |
| gateway.image | string | `"gateway"` | |
| gateway.kind | string | `"Deployment"` | Workload kind |
| gateway.labels | object | `{}` | Labels applied to all resources |
| gateway.metrics.enabled | bool | `false` | Whether to enable gateway metrics collection |
| gateway.metrics.honorLabels | bool | `false` | Whether to honor existing labels |
| gateway.metrics.interval | string | `""` | Metrics scrape interval |
| gateway.metrics.provider | string | `"monitoring.coreos.com"` | Monitoring provider |
| gateway.metrics.rawSpec | object | `{}` | Additional metrics specification |
| gateway.metrics.relabelConfigs | list | `[]` | Relabel configurations |
| gateway.metrics.relabelings | list | `[]` | Relabeling entries |
| gateway.metrics.podMonitorSelector | object | `{"release":"kube-prometheus-stack"}` | PodMonitor selector. When using the prometheus stack's PodMonitor auto-discovery, the selector must match the label `release: kube-prometheus-stack`, which is the kube-prometheus-stack default |
| gateway.metrics.scrapeTimeout | string | `""` | Scrape timeout |
| gateway.name | string | `"higress-gateway"` | Gateway name |
| gateway.networkGateway | string | `""` | Network gateway specification |
| gateway.nodeSelector | object | `{}` | Node selector |
| gateway.replicas | int | `2` | Number of Higress Gateway pods |
| gateway.resources.limits.cpu | string | `"2000m"` | Container CPU limit |
| gateway.resources.limits.memory | string | `"2048Mi"` | Container memory limit |
| gateway.resources.requests.cpu | string | `"2000m"` | Container CPU request |
| gateway.resources.requests.memory | string | `"2048Mi"` | Container memory request |
| gateway.revision | string | `""` | Revision this gateway belongs to |
| gateway.rollingMaxSurge | string | `"100%"` | Maximum surge percentage |
| gateway.rollingMaxUnavailable | string | `"25%"` | Maximum unavailable percentage |
| gateway.readinessFailureThreshold | int | `30` | Maximum number of consecutive failed probes before the pod is considered unready |
| gateway.readinessInitialDelaySeconds | int | `1` | Delay in seconds before the first readiness probe |
| gateway.readinessPeriodSeconds | int | `2` | Readiness probe interval in seconds |
| gateway.readinessSuccessThreshold | int | `1` | Minimum number of consecutive successful probes to be considered ready |
| gateway.readinessTimeoutSeconds | int | `3` | Readiness probe timeout in seconds |
| gateway.securityContext | string | `nil` | Security context of the gateway pods |
| gateway.service.annotations | object | `{}` | Annotations applied to the service |
| gateway.service.externalTrafficPolicy | string | `""` | External traffic policy |
| gateway.service.loadBalancerClass | string | `""` | Load balancer class |
| gateway.service.loadBalancerIP | string | `""` | Load balancer IP address |
| gateway.service.loadBalancerSourceRanges | list | `[]` | CIDR ranges allowed to access the load balancer |
| gateway.service.ports[0].name | string | `"http2"` | Port name in the service definition |
| gateway.service.ports[0].port | int | `80` | Service port |
| gateway.service.ports[0].protocol | string | `"TCP"` | Protocol |
| gateway.service.ports[0].targetPort | int | `80` | Target port |
| gateway.service.ports[1].name | string | `"https"` | Port name in the service definition |
| gateway.service.ports[1].port | int | `443` | Service port |
| gateway.service.ports[1].protocol | string | `"TCP"` | Protocol |
| gateway.service.ports[1].targetPort | int | `443` | Target port |
| gateway.service.type | string | `"LoadBalancer"` | Service type |
| global.disableAlpnH2 | bool | `false` | Whether to disable http/2 in ALPN |
| global.enableInferenceExtension | bool | `false` | Whether to enable Gateway API Inference Extension support |
| ... | ... | ... | ... |

Since there are many parameters, refer to the complete table for the rest.
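Any of the values in the table above can be overridden at install time with a custom values file. A minimal sketch (the keys come from the table above; the overridden values here are purely illustrative):

```yaml
# values-custom.yaml -- example overrides for the Higress chart
gateway:
  replicas: 3            # scale out the data plane
  service:
    type: NodePort       # e.g. for clusters without a cloud load balancer
controller:
  resources:
    requests:
      cpu: "250m"
      memory: "1024Mi"
global:
  ingressClass: higress  # only watch Ingresses with this class
```

Pass it to the install command shown earlier with `-f values-custom.yaml`.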
hgctl/go.mod: 43 lines changed
@@ -64,16 +64,16 @@ require (
 	github.com/containerd/platforms v0.2.1 // indirect
 	github.com/containerd/ttrpc v1.2.7 // indirect
 	github.com/envoyproxy/go-control-plane/contrib v0.0.0-20251016030003-90eca0228178 // indirect
-	github.com/envoyproxy/go-control-plane/envoy v1.35.0 // indirect
+	github.com/envoyproxy/go-control-plane/envoy v1.36.0 // indirect
 	github.com/fxamacker/cbor/v2 v2.9.0 // indirect
-	github.com/go-jose/go-jose/v4 v4.1.2 // indirect
+	github.com/go-jose/go-jose/v4 v4.1.3 // indirect
 	github.com/gorilla/websocket v1.5.4-0.20250319132907-e064f32e3674 // indirect
 	github.com/moby/sys/userns v0.1.0 // indirect
 	github.com/mxk/go-flowrate v0.0.0-20140419014527-cca7078d478f // indirect
 	github.com/planetscale/vtprotobuf v0.6.1-0.20240409071808-615f978279ca // indirect
 	github.com/santhosh-tekuri/jsonschema/v6 v6.0.2 // indirect
 	github.com/x448/float16 v0.8.4 // indirect
-	go.opentelemetry.io/auto/sdk v1.1.0 // indirect
+	go.opentelemetry.io/auto/sdk v1.2.1 // indirect
 	go.yaml.in/yaml/v2 v2.4.3 // indirect
 	go.yaml.in/yaml/v3 v3.0.4 // indirect
 	gopkg.in/evanphx/json-patch.v4 v4.13.0 // indirect
@@ -111,10 +111,9 @@ require (
 	github.com/beorn7/perks v1.0.1 // indirect
 	github.com/bmatcuk/doublestar/v4 v4.6.0 // indirect
 	github.com/buger/goterm v1.0.4 // indirect
-	github.com/census-instrumentation/opencensus-proto v0.4.1 // indirect
 	github.com/cespare/xxhash/v2 v2.3.0 // indirect
 	github.com/chai2010/gettext-go v1.0.3 // indirect
-	github.com/cncf/xds/go v0.0.0-20250501225837-2ac532fd4443 // indirect
+	github.com/cncf/xds/go v0.0.0-20251110193048-8bfbf64dc13e // indirect
 	github.com/containerd/console v1.0.3 // indirect
 	github.com/containerd/containerd v1.7.27 // indirect
 	github.com/containerd/continuity v0.4.4 // indirect
@@ -132,7 +131,7 @@ require (
 	github.com/docker/go-metrics v0.0.1 // indirect
 	github.com/docker/go-units v0.5.0 // indirect
 	github.com/emicklei/go-restful/v3 v3.13.0 // indirect
-	github.com/envoyproxy/protoc-gen-validate v1.2.1 // indirect
+	github.com/envoyproxy/protoc-gen-validate v1.3.0 // indirect
 	github.com/evanphx/json-patch v5.9.11+incompatible // indirect
 	github.com/exponent-io/jsonpath v0.0.0-20210407135951-1de76d718b3f // indirect
 	github.com/felixge/httpsnoop v1.0.4 // indirect
@@ -152,7 +151,7 @@ require (
 	github.com/gofrs/flock v0.12.1 // indirect
 	github.com/gogo/googleapis v1.4.1 // indirect
 	github.com/gogo/protobuf v1.3.2 // indirect
-	github.com/golang/mock v1.6.0 // indirect
+	github.com/golang/mock v1.7.0-rc.1 // indirect
 	github.com/golang/protobuf v1.5.4 // indirect
 	github.com/google/btree v1.1.3 // indirect
 	github.com/google/cel-go v0.26.0 // indirect
@@ -162,7 +161,6 @@ require (
 	github.com/google/uuid v1.6.0 // indirect
 	github.com/gorilla/mux v1.8.1 // indirect
 	github.com/gosuri/uitable v0.0.4 // indirect
-	github.com/grafana/regexp v0.0.0-20250905093917-f7b3be9d1853 // indirect
 	github.com/gregjones/httpcache v0.0.0-20190611155906-901d90724c79 // indirect
 	github.com/grpc-ecosystem/go-grpc-middleware v1.4.0 // indirect
 	github.com/grpc-ecosystem/grpc-gateway/v2 v2.27.2 // indirect
@@ -231,7 +229,6 @@ require (
 	github.com/prometheus/client_model v0.6.2 // indirect
 	github.com/prometheus/common v0.67.1 // indirect
 	github.com/prometheus/procfs v0.17.0 // indirect
-	github.com/prometheus/prometheus v0.307.1 // indirect
 	github.com/rivo/uniseg v0.4.7 // indirect
 	github.com/rubenv/sql-migrate v1.8.0 // indirect
 	github.com/russross/blackfriday/v2 v2.1.0 // indirect
@@ -268,26 +265,26 @@ require (
 	go.opentelemetry.io/otel/sdk v1.38.0 // indirect
 	go.opentelemetry.io/otel/sdk/metric v1.38.0 // indirect
 	go.opentelemetry.io/otel/trace v1.38.0 // indirect
-	go.opentelemetry.io/proto/otlp v1.7.1 // indirect
+	go.opentelemetry.io/proto/otlp v1.9.0 // indirect
 	go.uber.org/atomic v1.11.0 // indirect
 	go.uber.org/multierr v1.11.0 // indirect
 	go.uber.org/zap v1.27.0 // indirect
-	golang.org/x/crypto v0.42.0 // indirect
+	golang.org/x/crypto v0.44.0 // indirect
 	golang.org/x/exp v0.0.0-20250808145144-a408d31f581a // indirect
-	golang.org/x/mod v0.28.0 // indirect
-	golang.org/x/net v0.44.0 // indirect
-	golang.org/x/oauth2 v0.31.0 // indirect
-	golang.org/x/sync v0.17.0 // indirect
-	golang.org/x/sys v0.36.0 // indirect
-	golang.org/x/term v0.35.0 // indirect
-	golang.org/x/text v0.29.0 // indirect
+	golang.org/x/mod v0.29.0 // indirect
+	golang.org/x/net v0.47.0 // indirect
+	golang.org/x/oauth2 v0.32.0 // indirect
+	golang.org/x/sync v0.18.0 // indirect
+	golang.org/x/sys v0.38.0 // indirect
+	golang.org/x/term v0.37.0 // indirect
+	golang.org/x/text v0.31.0 // indirect
 	golang.org/x/time v0.13.0 // indirect
-	golang.org/x/tools v0.37.0 // indirect
+	golang.org/x/tools v0.38.0 // indirect
 	google.golang.org/genproto v0.0.0-20250603155806-513f23925822 // indirect
-	google.golang.org/genproto/googleapis/api v0.0.0-20250929231259-57b25ae835d4 // indirect
-	google.golang.org/genproto/googleapis/rpc v0.0.0-20250922171735-9219d122eba9 // indirect
-	google.golang.org/grpc v1.76.0 // indirect
-	google.golang.org/protobuf v1.36.10 // indirect
+	google.golang.org/genproto/googleapis/api v0.0.0-20251029180050-ab9386a59fda // indirect
+	google.golang.org/genproto/googleapis/rpc v0.0.0-20251029180050-ab9386a59fda // indirect
+	google.golang.org/grpc v1.78.0 // indirect
+	google.golang.org/protobuf v1.36.11 // indirect
 	gopkg.in/inf.v0 v0.9.1 // indirect
 	gopkg.in/natefinch/lumberjack.v2 v2.2.1 // indirect
 	istio.io/api v1.27.1-0.20250820125923-f5a5d3a605a9 // indirect
hgctl/go.sum: 1926 lines changed (file diff suppressed because it is too large)

Submodule istio/istio updated: 3d7792ae28...77149ea560
Submodule istio/proxy updated: ced6d8167a...4735dd6b87
@@ -16,12 +16,13 @@ package bootstrap

 import (
 	"fmt"
-	"istio.io/istio/pkg/config/mesh/meshwatcher"
-	"istio.io/istio/pkg/kube/krt"
 	"net"
 	"net/http"
 	"time"

+	"istio.io/istio/pkg/config/mesh/meshwatcher"
+	"istio.io/istio/pkg/kube/krt"
+
 	prometheus "github.com/grpc-ecosystem/go-grpc-prometheus"
 	"google.golang.org/grpc"
 	"google.golang.org/grpc/reflection"
@@ -436,10 +437,17 @@ func (s *Server) initHttpServer() error {
 	}
 	s.xdsServer.AddDebugHandlers(s.httpMux, nil, true, nil)
 	s.httpMux.HandleFunc("/ready", s.readyHandler)
-	s.httpMux.HandleFunc("/registry/watcherStatus", s.registryWatcherStatusHandler)
+	s.httpMux.HandleFunc("/registry/watcherStatus", s.withConditionalAuth(s.registryWatcherStatusHandler))
 	return nil
 }

+func (s *Server) withConditionalAuth(handler http.HandlerFunc) http.HandlerFunc {
+	if features.DebugAuth {
+		return s.xdsServer.AllowAuthenticatedOrLocalhost(handler)
+	}
+	return handler
+}
+
 // readyHandler checks whether the http server is ready
 func (s *Server) readyHandler(w http.ResponseWriter, _ *http.Request) {
 	for name, fn := range s.readinessProbes {
@@ -26,8 +26,8 @@ type config struct {
 	matchList             []common.MatchRule
 	enableUserLevelServer bool
 	rateLimitConfig       *handler.MCPRatelimitConfig
-	defaultServer         *common.SSEServer
 	redisClient           *common.RedisClient
+	sharedMCPServer       *common.MCPServer // Created once, thread-safe with sync.RWMutex
 }

 func (c *config) Destroy() {
@@ -110,6 +110,9 @@ func (p *Parser) Parse(any *anypb.Any, callbacks api.ConfigCallbackHandler) (int
 	}
 	GlobalSSEPathSuffix = ssePathSuffix

+	// Create shared MCPServer once during config parsing (thread-safe with sync.RWMutex)
+	conf.sharedMCPServer = common.NewMCPServer(DefaultServerName, Version)
+
 	return conf, nil
 }

@@ -125,9 +128,6 @@ func (p *Parser) Merge(parent interface{}, child interface{}) interface{} {
 	if childConfig.rateLimitConfig != nil {
 		newConfig.rateLimitConfig = childConfig.rateLimitConfig
 	}
-	if childConfig.defaultServer != nil {
-		newConfig.defaultServer = childConfig.defaultServer
-	}
 	return &newConfig
 }
@@ -37,6 +37,7 @@ type filter struct {
 	skipRequestBody    bool
 	skipResponseBody   bool
 	cachedResponseBody []byte
+	sseServer          *common.SSEServer // SSE server instance for this filter (per-request, not shared)

 	userLevelConfig  bool
 	mcpConfigHandler *handler.MCPConfigHandler
@@ -135,11 +136,13 @@ func (f *filter) processMcpRequestHeadersForRestUpstream(header api.RequestHeade
 		trimmed += "?" + rq
 	}

-	f.config.defaultServer = common.NewSSEServer(common.NewMCPServer(DefaultServerName, Version),
+	// Create SSE server instance for this filter (per-request, not shared)
+	// MCPServer is shared (thread-safe), but SSEServer must be per-request (contains request-specific messageEndpoint)
+	f.sseServer = common.NewSSEServer(f.config.sharedMCPServer,
 		common.WithSSEEndpoint(GlobalSSEPathSuffix),
 		common.WithMessageEndpoint(trimmed),
 		common.WithRedisClient(f.config.redisClient))
-	f.serverName = f.config.defaultServer.GetServerName()
+	f.serverName = f.sseServer.GetServerName()
 	body := "SSE connection create"
 	f.callbacks.DecoderFilterCallbacks().SendLocalReply(http.StatusOK, body, nil, 0, "")
 }
@@ -275,9 +278,9 @@ func (f *filter) encodeDataFromRestUpstream(buffer api.BufferInstance, endStream

 	if f.serverName != "" {
 		if f.config.redisClient != nil {
-			// handle default server
+			// handle SSE server for this filter instance
 			buffer.Reset()
-			f.config.defaultServer.HandleSSE(f.callbacks, f.stopChan)
+			f.sseServer.HandleSSE(f.callbacks, f.stopChan)
 			return api.Running
 		} else {
 			_ = buffer.SetString(RedisNotEnabledResponseBody)
@@ -16,40 +16,91 @@ import (
 )

 const (
-	RedisKeyFormat = "higress:global_least_request_table:%s:%s"
-	RedisLua       = `local seed = KEYS[1]
+	RedisKeyFormat          = "higress:global_least_request_table:%s:%s"
+	RedisLastCleanKeyFormat = "higress:global_least_request_table:last_clean_time:%s:%s"
+	RedisLua                = `local seed = tonumber(KEYS[1])
 local hset_key = KEYS[2]
-local current_target = KEYS[3]
-local current_count = 0
+local last_clean_key = KEYS[3]
+local clean_interval = tonumber(KEYS[4])
+local current_target = KEYS[5]
+local healthy_count = tonumber(KEYS[6])
+local enable_detail_log = KEYS[7]

 math.randomseed(seed)

-local function randomBool()
-    return math.random() >= 0.5
-end
+-- 1. Selection
+local current_count = 0
+local same_count_hits = 0

-if redis.call('HEXISTS', hset_key, current_target) == 1 then
-    current_count = redis.call('HGET', hset_key, current_target)
-end
-for i = 4, #KEYS do
-    if redis.call('HEXISTS', hset_key, KEYS[i]) == 1 then
-        local count = redis.call('HGET', hset_key, KEYS[i])
-        if tonumber(count) < tonumber(current_count) then
-            current_target = KEYS[i]
-            current_count = count
-        elseif count == current_count and randomBool() then
-            current_target = KEYS[i]
-        end
-    end
-end
+for i = 8, 8 + healthy_count - 1 do
+    local host = KEYS[i]
+    local count = 0
+    local val = redis.call('HGET', hset_key, host)
+    if val then
+        count = tonumber(val) or 0
+    end
+
+    if same_count_hits == 0 or count < current_count then
+        current_target = host
+        current_count = count
+        same_count_hits = 1
+    elseif count == current_count then
+        same_count_hits = same_count_hits + 1
+        if math.random(same_count_hits) == 1 then
+            current_target = host
+        end
+    end
+end

 redis.call("HINCRBY", hset_key, current_target, 1)
+local new_count = redis.call("HGET", hset_key, current_target)

-return current_target`
+-- Collect host counts for logging
+local host_details = {}
+if enable_detail_log == "1" then
+    local fields = {}
+    for i = 8, #KEYS do
+        table.insert(fields, KEYS[i])
+    end
+    if #fields > 0 then
+        local values = redis.call('HMGET', hset_key, (table.unpack or unpack)(fields))
+        for i, val in ipairs(values) do
+            table.insert(host_details, fields[i])
+            table.insert(host_details, tostring(val or 0))
+        end
+    end
+end
+
+-- 2. Cleanup
+local current_time = math.floor(seed / 1000000)
+local last_clean_time = tonumber(redis.call('GET', last_clean_key) or 0)
+
+if current_time - last_clean_time >= clean_interval then
+    local all_keys = redis.call('HKEYS', hset_key)
+    if #all_keys > 0 then
+        -- Create a lookup table for current hosts (from index 8 onwards)
+        local current_hosts = {}
+        for i = 8, #KEYS do
+            current_hosts[KEYS[i]] = true
+        end
+        -- Remove keys not in current hosts
+        for _, host in ipairs(all_keys) do
+            if not current_hosts[host] then
+                redis.call('HDEL', hset_key, host)
+            end
+        end
+    end
+    redis.call('SET', last_clean_key, current_time)
+end
+
+return {current_target, new_count, host_details}`
 )
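The selection pass of the script walks the healthy hosts (KEYS[8] through KEYS[8 + healthy_count - 1]), keeps the one with the lowest in-flight count, and breaks ties uniformly by reservoir sampling: the k-th host seen at the current minimum wins with probability 1/k (the `math.random(same_count_hits) == 1` check). A standalone Python sketch of just that selection logic, for illustration only (not part of the plugin):

```python
import random

def pick_least_loaded(counts, hosts, rng=None):
    """Least-request selection with uniform tie-breaking.

    counts: mapping host -> in-flight request count; missing hosts count
    as 0, like the Lua script's HGET returning nil. hosts: healthy candidates.
    """
    rng = rng or random.Random()
    chosen, best, ties = None, None, 0
    for host in hosts:
        count = counts.get(host, 0)
        if ties == 0 or count < best:
            chosen, best, ties = host, count, 1   # new minimum found
        elif count == best:
            ties += 1
            if rng.randrange(ties) == 0:          # reservoir sampling: 1/ties
                chosen = host
    return chosen
```

Every host tied at the minimum ends up selected with equal probability, without first collecting the full tied set, which is why the script only needs a single pass and a counter.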

 type GlobalLeastRequestLoadBalancer struct {
-	redisClient wrapper.RedisClient
+	redisClient     wrapper.RedisClient
+	maxRequestCount int64
+	cleanInterval   int64 // seconds
+	enableDetailLog bool
 }

 func NewGlobalLeastRequestLoadBalancer(json gjson.Result) (GlobalLeastRequestLoadBalancer, error) {
@@ -72,6 +123,18 @@ func NewGlobalLeastRequestLoadBalancer(json gjson.Result) (GlobalLeastRequestLoa
 	}
 	// database default is 0
 	database := json.Get("database").Int()
+	lb.maxRequestCount = json.Get("maxRequestCount").Int()
+	lb.cleanInterval = json.Get("cleanInterval").Int()
+	if lb.cleanInterval == 0 {
+		lb.cleanInterval = 60 * 60 // default: 60 minutes
+	} else {
+		lb.cleanInterval = lb.cleanInterval * 60 // convert minutes to seconds
+	}
+	lb.enableDetailLog = true
+	if val := json.Get("enableDetailLog"); val.Exists() {
+		lb.enableDetailLog = val.Bool()
+	}
 	log.Infof("redis client init, serviceFQDN: %s, servicePort: %d, timeout: %d, database: %d, maxRequestCount: %d, cleanInterval: %d minutes, enableDetailLog: %v", serviceFQDN, servicePort, timeout, database, lb.maxRequestCount, lb.cleanInterval/60, lb.enableDetailLog)
 	return lb, lb.redisClient.Init(username, password, int64(timeout), wrapper.WithDataBase(int(database)))
 }
@@ -100,9 +163,11 @@ func (lb GlobalLeastRequestLoadBalancer) HandleHttpRequestBody(ctx wrapper.HttpC
 		ctx.SetContext("error", true)
 		return types.ActionContinue
 	}
+	allHostMap := make(map[string]struct{})
 	// Only healthy hosts can be selected
 	healthyHostArray := []string{}
 	for _, hostInfo := range hostInfos {
+		allHostMap[hostInfo[0]] = struct{}{}
 		if gjson.Get(hostInfo[1], "health_status").String() == "Healthy" {
 			healthyHostArray = append(healthyHostArray, hostInfo[0])
 		}
@@ -113,10 +178,37 @@ func (lb GlobalLeastRequestLoadBalancer) HandleHttpRequestBody(ctx wrapper.HttpC
 	}
 	randomIndex := rand.Intn(len(healthyHostArray))
 	hostSelected := healthyHostArray[randomIndex]
-	keys := []interface{}{time.Now().UnixMicro(), fmt.Sprintf(RedisKeyFormat, routeName, clusterName), hostSelected}
+
+	// KEYS layout: [seed, hset_key, last_clean_key, clean_interval, host_selected, healthy_count, enableDetailLog, ...healthy_hosts, ...unhealthy_hosts]
+	keys := []interface{}{
+		time.Now().UnixMicro(),
+		fmt.Sprintf(RedisKeyFormat, routeName, clusterName),
+		fmt.Sprintf(RedisLastCleanKeyFormat, routeName, clusterName),
+		lb.cleanInterval,
+		hostSelected,
+		len(healthyHostArray),
+		"0",
+	}
+	if lb.enableDetailLog {
+		keys[6] = "1"
+	}
+	for _, v := range healthyHostArray {
+		keys = append(keys, v)
+	}
+	// Append unhealthy hosts (those in allHostMap but not in healthyHostArray)
+	for host := range allHostMap {
+		isHealthy := false
+		for _, hh := range healthyHostArray {
+			if host == hh {
+				isHealthy = true
+				break
+			}
+		}
+		if !isHealthy {
+			keys = append(keys, host)
+		}
+	}
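Read together with the Lua script, the slice built above has a fixed positional layout (Lua's KEYS is 1-indexed, so Go index 6 is KEYS[7], and hosts start at KEYS[8]); the unhealthy tail exists only so the cleanup pass can distinguish live-but-unhealthy hosts from hosts that have left the cluster. A small Python sketch of that layout — `build_keys` is a hypothetical helper for illustration, with the key format strings copied from the source:

```python
import time

def build_keys(route, cluster, seed_host, healthy_hosts, all_hosts,
               clean_interval_s, enable_detail_log):
    """Mirror the KEYS layout the Lua script expects:
    [seed, hset_key, last_clean_key, clean_interval, seed_host,
     healthy_count, detail_flag, *healthy_hosts, *unhealthy_hosts]."""
    keys = [
        time.time_ns() // 1_000,  # microsecond timestamp, doubles as the RNG seed
        "higress:global_least_request_table:%s:%s" % (route, cluster),
        "higress:global_least_request_table:last_clean_time:%s:%s" % (route, cluster),
        clean_interval_s,
        seed_host,                # random healthy host used as the initial pick
        len(healthy_hosts),
        "1" if enable_detail_log else "0",
    ]
    keys += healthy_hosts
    healthy = set(healthy_hosts)
    keys += [h for h in all_hosts if h not in healthy]  # cleanup-only tail
    return keys
```

Because `healthy_count` bounds the selection loop, the unhealthy hosts appended at the end are never candidates for selection; they are only consulted when pruning stale hash fields.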
 	err = lb.redisClient.Eval(RedisLua, len(keys), keys, []interface{}{}, func(response resp.Value) {
 		if err := response.Error(); err != nil {
 			log.Errorf("HGetAll failed: %+v", err)
@@ -124,17 +216,54 @@ func (lb GlobalLeastRequestLoadBalancer) HandleHttpRequestBody(ctx wrapper.HttpC
 			proxywasm.ResumeHttpRequest()
 			return
 		}
-		hostSelected = response.String()
+		valArray := response.Array()
+		if len(valArray) < 2 {
+			log.Errorf("redis eval lua result format error, expected at least [host, count], got: %+v", valArray)
+			ctx.SetContext("error", true)
+			proxywasm.ResumeHttpRequest()
+			return
+		}
+		hostSelected = valArray[0].String()
+		currentCount := valArray[1].Integer()
+
+		// detail log
+		if lb.enableDetailLog && len(valArray) >= 3 {
+			detailLogStr := "host and count: "
+			details := valArray[2].Array()
+			for i := 0; i+1 < len(details); i += 2 {
+				h := details[i].String()
+				c := details[i+1].String()
+				detailLogStr += fmt.Sprintf("{%s: %s}, ", h, c)
+			}
+			log.Debugf("host_selected: %s + 1, %s", hostSelected, detailLogStr)
+		}
+
+		// check rate limit
+		if !lb.checkRateLimit(hostSelected, int64(currentCount), ctx, routeName, clusterName) {
+			ctx.SetContext("error", true)
+			log.Warnf("host_selected: %s, current_count: %d, exceeds max request limit %d", hostSelected, currentCount, lb.maxRequestCount)
+			// return 429
+			proxywasm.SendHttpResponse(429, [][2]string{}, []byte("Exceeded maximum request limit from ai-load-balancer."), -1)
+			ctx.DontReadResponseBody()
+			return
+		}
+
 		if err := proxywasm.SetUpstreamOverrideHost([]byte(hostSelected)); err != nil {
 			ctx.SetContext("error", true)
 			log.Errorf("override upstream host failed, falling back to default lb policy, error: %+v", err)
 			proxywasm.ResumeHttpRequest()
 			return
 		}

 		log.Debugf("host_selected: %s", hostSelected)

 		// finally resume the request
 		ctx.SetContext("host_selected", hostSelected)
 		proxywasm.ResumeHttpRequest()
 	})
 	if err != nil {
 		ctx.SetContext("error", true)
 		log.Errorf("redis eval failed, falling back to default lb policy, error: %+v", err)
 		return types.ActionContinue
 	}
 	return types.ActionPause
@@ -161,7 +290,10 @@ func (lb GlobalLeastRequestLoadBalancer) HandleHttpStreamDone(ctx wrapper.HttpCo
 	if host_selected == "" {
 		log.Errorf("get host_selected failed")
 	} else {
-		lb.redisClient.HIncrBy(fmt.Sprintf(RedisKeyFormat, routeName, clusterName), host_selected, -1, nil)
+		err := lb.redisClient.HIncrBy(fmt.Sprintf(RedisKeyFormat, routeName, clusterName), host_selected, -1, nil)
|
||||
if err != nil {
|
||||
log.Errorf("host_selected: %s - 1, failed to update count from redis: %v", host_selected, err)
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||

@@ -0,0 +1,220 @@
-- Mocking Redis environment
local redis_data = {
    hset = {},
    kv = {}
}

local redis = {
    call = function(cmd, ...)
        local args = {...}
        if cmd == "HGET" then
            local key, field = args[1], args[2]
            return redis_data.hset[field]
        elseif cmd == "HSET" then
            local key, field, val = args[1], args[2], args[3]
            redis_data.hset[field] = val
        elseif cmd == "HINCRBY" then
            local key, field, increment = args[1], args[2], args[3]
            local val = tonumber(redis_data.hset[field] or 0)
            redis_data.hset[field] = tostring(val + increment)
            return redis_data.hset[field]
        elseif cmd == "HKEYS" then
            local keys = {}
            for k, _ in pairs(redis_data.hset) do
                table.insert(keys, k)
            end
            return keys
        elseif cmd == "HDEL" then
            local key, field = args[1], args[2]
            redis_data.hset[field] = nil
        elseif cmd == "GET" then
            return redis_data.kv[args[1]]
        elseif cmd == "HMGET" then
            local key = args[1]
            local res = {}
            for i = 2, #args do
                table.insert(res, redis_data.hset[args[i]])
            end
            return res
        elseif cmd == "SET" then
            redis_data.kv[args[1]] = args[2]
        end
    end
}

-- The actual logic from lb_policy.go
local function run_lb_logic(KEYS)
    local seed = tonumber(KEYS[1])
    local hset_key = KEYS[2]
    local last_clean_key = KEYS[3]
    local clean_interval = tonumber(KEYS[4])
    local current_target = KEYS[5]
    local healthy_count = tonumber(KEYS[6])
    local enable_detail_log = KEYS[7]

    math.randomseed(seed)

    -- 1. Selection
    local current_count = 0
    local same_count_hits = 0

    for i = 8, 8 + healthy_count - 1 do
        local host = KEYS[i]
        local count = 0
        local val = redis.call('HGET', hset_key, host)
        if val then
            count = tonumber(val) or 0
        end

        if same_count_hits == 0 or count < current_count then
            current_target = host
            current_count = count
            same_count_hits = 1
        elseif count == current_count then
            same_count_hits = same_count_hits + 1
            if math.random(same_count_hits) == 1 then
                current_target = host
            end
        end
    end

    redis.call("HINCRBY", hset_key, current_target, 1)
    local new_count = redis.call("HGET", hset_key, current_target)

    -- Collect host counts for logging
    local host_details = {}
    if enable_detail_log == "1" then
        local fields = {}
        for i = 8, #KEYS do
            table.insert(fields, KEYS[i])
        end
        if #fields > 0 then
            local values = redis.call('HMGET', hset_key, (table.unpack or unpack)(fields))
            for i, val in ipairs(values) do
                table.insert(host_details, fields[i])
                table.insert(host_details, tostring(val or 0))
            end
        end
    end

    -- 2. Cleanup
    local current_time = math.floor(seed / 1000000)
    local last_clean_time = tonumber(redis.call('GET', last_clean_key) or 0)

    if current_time - last_clean_time >= clean_interval then
        local all_keys = redis.call('HKEYS', hset_key)
        if #all_keys > 0 then
            -- Create a lookup table for current hosts (from index 8 onwards)
            local current_hosts = {}
            for i = 8, #KEYS do
                current_hosts[KEYS[i]] = true
            end
            -- Remove keys not in current hosts
            for _, host in ipairs(all_keys) do
                if not current_hosts[host] then
                    redis.call('HDEL', hset_key, host)
                end
            end
        end
        redis.call('SET', last_clean_key, current_time)
    end

    return {current_target, new_count, host_details}
end

-- --- Test 1: Load Balancing Distribution ---
print("--- Test 1: Load Balancing Distribution ---")
local hosts = {"host1", "host2", "host3", "host4", "host5"}
local iterations = 100000
local results = {}
for _, h in ipairs(hosts) do results[h] = 0 end

-- Reset redis
redis_data.hset = {}
for _, h in ipairs(hosts) do redis_data.hset[h] = "0" end

print(string.format("Running %d iterations with %d hosts (all counts started at 0)...", iterations, #hosts))

for i = 1, iterations do
    local initial_host = hosts[math.random(#hosts)]
    -- KEYS structure: [seed, hset_key, last_clean_key, clean_interval, host_selected, healthy_count, enable_detail_log, ...healthy_hosts]
    local keys = {i * 1000000, "table_key", "clean_key", 3600, initial_host, #hosts, "1"}
    for _, h in ipairs(hosts) do table.insert(keys, h) end

    local res = run_lb_logic(keys)
    local selected = res[1]
    results[selected] = results[selected] + 1
end

for _, h in ipairs(hosts) do
    local percentage = (results[h] / iterations) * 100
    print(string.format("%s: %6d (%.2f%%)", h, results[h], percentage))
end

-- --- Test 2: IP Cleanup Logic ---
print("\n--- Test 2: IP Cleanup Logic ---")

local function test_cleanup()
    redis_data.hset = {
        ["host1"] = "10",
        ["host2"] = "5",
        ["old_ip_1"] = "1",
        ["old_ip_2"] = "1",
    }
    redis_data.kv["clean_key"] = "1000" -- Last cleaned at 1000s

    local current_hosts = {"host1", "host2"}
    local current_time_us = 1000 * 1000000 + 500 * 1000000 -- 1500s in microseconds (interval is 300s)
    local clean_interval = 300

    print("Initial Redis IPs:", table.concat((function() local res={} for k,_ in pairs(redis_data.hset) do table.insert(res, k) end return res end)(), ", "))

    -- Run logic (the seed is the current time in microseconds)
    local keys = {current_time_us, "table_key", "clean_key", clean_interval, "host1", #current_hosts, "1"}
    for _, h in ipairs(current_hosts) do table.insert(keys, h) end

    run_lb_logic(keys)

    print("After Cleanup Redis IPs:", table.concat((function() local res={} for k,_ in pairs(redis_data.hset) do table.insert(res, k) end table.sort(res) return res end)(), ", "))

    local exists_old1 = redis_data.hset["old_ip_1"] ~= nil
    local exists_old2 = redis_data.hset["old_ip_2"] ~= nil

    if not exists_old1 and not exists_old2 then
        print("Success: Outdated IPs removed.")
    else
        print("Failure: Outdated IPs still exist.")
    end

    print("New last_clean_time:", redis_data.kv["clean_key"])
end

test_cleanup()

-- --- Test 3: No Cleanup if Interval Not Reached ---
print("\n--- Test 3: No Cleanup if Interval Not Reached ---")

local function test_no_cleanup()
    redis_data.hset = {
        ["host1"] = "10",
        ["old_ip_1"] = "1",
    }
    redis_data.kv["clean_key"] = "1000"

    local current_hosts = {"host1"}
    local current_time_us = 1000 * 1000000 + 100 * 1000000 -- 1100s in microseconds (interval 300s, not reached)
    local clean_interval = 300

    local keys = {current_time_us, "table_key", "clean_key", clean_interval, "host1", #current_hosts, "0"}
    for _, h in ipairs(current_hosts) do table.insert(keys, h) end

    run_lb_logic(keys)

    if redis_data.hset["old_ip_1"] then
        print("Success: Cleanup not triggered as expected.")
    else
        print("Failure: Cleanup triggered unexpectedly.")
    end
end

test_no_cleanup()
@@ -0,0 +1,24 @@
package global_least_request

import (
	"fmt"

	"github.com/higress-group/wasm-go/pkg/wrapper"
)

func (lb GlobalLeastRequestLoadBalancer) checkRateLimit(hostSelected string, currentCount int64, ctx wrapper.HttpContext, routeName string, clusterName string) bool {
	// If no maximum request count is configured, always allow the request.
	if lb.maxRequestCount <= 0 {
		return true
	}

	// Rate-limit when the current count exceeds the configured maximum.
	// Note: the Lua script has already incremented the count, so this compares the post-increment value.
	if currentCount > lb.maxRequestCount {
		// Roll back the Redis counter.
		lb.redisClient.HIncrBy(fmt.Sprintf(RedisKeyFormat, routeName, clusterName), hostSelected, -1, nil)
		return false
	}

	return true
}
@@ -26,6 +26,8 @@ description: AI 代理插件配置参考

> 请求路径后缀匹配 `/v1/embeddings` 时,对应文本向量场景,会用 OpenAI 的文本向量协议解析请求 Body,再转换为对应 LLM 厂商的文本向量协议

> 请求路径后缀匹配 `/v1/images/generations` 时,对应文生图场景,会用 OpenAI 的图片生成协议解析请求 Body,再转换为对应 LLM 厂商的图片生成协议

## 运行属性

插件执行阶段:`默认阶段`
@@ -55,6 +57,7 @@ description: AI 代理插件配置参考
| `reasoningContentMode` | string | 非必填 | - | 如何处理大模型服务返回的推理内容。目前支持以下取值:passthrough(正常输出推理内容)、ignore(不输出推理内容)、concat(将推理内容拼接在常规输出内容之前)。默认为 passthrough。仅支持通义千问服务。 |
| `capabilities` | map of string | 非必填 | - | 部分 provider 的部分 AI 能力原生兼容 openai/v1 格式,不需要协议转换,可以直接转发,通过此配置项开启转发。key 表示采用的厂商协议能力,value 表示该厂商此能力实际的 api path。厂商协议能力当前支持:openai/v1/chatcompletions、openai/v1/embeddings、openai/v1/imagegeneration、openai/v1/audiospeech、cohere/v1/rerank |
| `subPath` | string | 非必填 | - | 如果配置了 subPath,将会先移除请求 path 中该前缀,再进行后续处理 |
| `contextCleanupCommands` | array of string | 非必填 | - | 上下文清理命令列表。当请求的 messages 中存在完全匹配任意一个命令的 user 消息时,将该消息及之前所有非 system 消息清理掉,只保留 system 消息和该命令之后的消息。可用于主动清理对话上下文。 |
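
例如,下面的配置示意如何开启 embeddings 能力的直接转发(其中 api path 的取值仅为示意,需按实际厂商接口填写):

```yaml
provider:
  type: openai
  apiTokens:
    - "YOUR_API_TOKEN"
  capabilities:
    # key 为厂商协议能力,value 为该厂商此能力实际的 api path(示意值)
    openai/v1/embeddings: "/v1/embeddings"
```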

`context`的配置字段说明如下:

@@ -221,6 +224,17 @@ Anthropic Claude 所对应的 `type` 为 `claude`。它特有的配置字段如
| 名称 | 数据类型 | 填写要求 | 默认值 | 描述 |
| --------------- | -------- | -------- | ------ | ----------------------------------------- |
| `claudeVersion` | string | 可选 | - | Claude 服务的 API 版本,默认为 2023-06-01 |
| `claudeCodeMode` | boolean | 可选 | false | 启用 Claude Code 模式,用于支持 Claude Code OAuth 令牌认证。启用后将伪装成 Claude Code 客户端发起请求 |

**Claude Code 模式说明**

启用 `claudeCodeMode: true` 时,插件将:
- 使用 Bearer Token 认证替代 x-api-key(适配 Claude Code OAuth 令牌)
- 设置 Claude Code 特定的请求头(user-agent、x-app、anthropic-beta)
- 为请求 URL 添加 `?beta=true` 查询参数
- 自动注入 Claude Code 的系统提示词(如未提供)

这允许在 Higress 中直接使用 Claude Code 的 OAuth Token 进行身份验证。

#### Ollama

@@ -309,7 +323,9 @@ Dify 所对应的 type 为 dify。它特有的配置字段如下:

#### Google Vertex AI

Google Vertex AI 所对应的 type 为 vertex。支持两种认证模式:

**标准模式**(使用 Service Account):

| 名称 | 数据类型 | 填写要求 | 默认值 | 描述 |
|-----------------------------|---------------|--------|--------|-------------------------------------------------------------------------------|
@@ -320,25 +336,56 @@ Google Vertex AI 所对应的 type 为 vertex。它特有的配置字段如下
| `geminiSafetySetting` | map of string | 非必填 | - | Gemini AI 内容过滤和安全级别设定。参考[Safety settings](https://ai.google.dev/gemini-api/docs/safety-settings) |
| `vertexTokenRefreshAhead` | number | 非必填 | - | Vertex access token 刷新提前时间(单位秒) |

**Express Mode**(使用 API Key,简化配置):

Express Mode 是 Vertex AI 推出的简化访问模式,只需 API Key 即可快速开始使用,无需配置 Service Account。详见 [Vertex AI Express Mode 文档](https://cloud.google.com/vertex-ai/generative-ai/docs/start/express-mode/overview)。

| 名称 | 数据类型 | 填写要求 | 默认值 | 描述 |
|-----------------------------|-----------------|--------|--------|-------------------------------------------------------------------------------|
| `apiTokens` | array of string | 必填 | - | Express Mode 使用的 API Key,从 Google Cloud Console 的 API & Services > Credentials 获取 |
| `geminiSafetySetting` | map of string | 非必填 | - | Gemini AI 内容过滤和安全级别设定。参考[Safety settings](https://ai.google.dev/gemini-api/docs/safety-settings) |

**OpenAI 兼容模式**(使用 Vertex AI Chat Completions API):

Vertex AI 提供了 OpenAI 兼容的 Chat Completions API 端点,可以直接使用 OpenAI 格式的请求和响应,无需进行协议转换。详见 [Vertex AI OpenAI 兼容性文档](https://cloud.google.com/vertex-ai/generative-ai/docs/migrate/openai/overview)。

| 名称 | 数据类型 | 填写要求 | 默认值 | 描述 |
|-----------------------------|---------------|--------|--------|-------------------------------------------------------------------------------|
| `vertexOpenAICompatible` | boolean | 非必填 | false | 启用 OpenAI 兼容模式。启用后将使用 Vertex AI 的 OpenAI-compatible Chat Completions API |
| `vertexAuthKey` | string | 必填 | - | 用于认证的 Google Service Account JSON Key |
| `vertexRegion` | string | 必填 | - | Google Cloud 区域(如 us-central1、europe-west4 等) |
| `vertexProjectId` | string | 必填 | - | Google Cloud 项目 ID |
| `vertexAuthServiceName` | string | 必填 | - | 用于 OAuth2 认证的服务名称 |

**注意**:OpenAI 兼容模式与 Express Mode 互斥,不能同时配置 `apiTokens` 和 `vertexOpenAICompatible`。

#### AWS Bedrock

AWS Bedrock 所对应的 type 为 bedrock。它支持两种认证方式:

1. **AWS Signature V4 认证**:使用 `awsAccessKey` 和 `awsSecretKey` 进行 AWS 标准签名认证
2. **Bearer Token 认证**:使用 `apiTokens` 配置 AWS Bearer Token(适用于 IAM Identity Center 等场景)

**注意**:两种认证方式二选一,如果同时配置了 `apiTokens`,将优先使用 Bearer Token 认证方式。

它特有的配置字段如下:

| 名称 | 数据类型 | 填写要求 | 默认值 | 描述 |
|---------------------------|-----------------|-------------------|-------|---------------------------------------------------|
| `apiTokens` | array of string | 与 ak/sk 二选一 | - | AWS Bearer Token,用于 Bearer Token 认证方式 |
| `awsAccessKey` | string | 与 apiTokens 二选一 | - | AWS Access Key,用于 AWS Signature V4 认证 |
| `awsSecretKey` | string | 与 apiTokens 二选一 | - | AWS Secret Access Key,用于 AWS Signature V4 认证 |
| `awsRegion` | string | 必填 | - | AWS 区域,例如:us-east-1 |
| `bedrockAdditionalFields` | map | 非必填 | - | Bedrock 额外模型请求参数 |

#### NVIDIA Triton Inference Server

NVIDIA Triton Inference Server 所对应的 type 为 triton。它特有的配置字段如下:

| 名称 | 数据类型 | 填写要求 | 默认值 | 描述 |
|----------------------|--------|--------|-------|------------------------------------------|
| `tritonModelVersion` | string | 非必填 | - | 用于指定 Triton Server 中 model version |
| `tritonDomain` | string | 非必填 | - | Triton Server 部署的指定请求 Domain |

## 用法示例

@@ -1175,6 +1222,44 @@ URL: `http://your-domain/v1/messages`
}
```

### 使用 Claude Code 模式

Claude Code 是 Anthropic 提供的官方 CLI 工具。通过启用 `claudeCodeMode`,可以使用 Claude Code 的 OAuth Token 进行身份验证:

**配置信息**

```yaml
provider:
  type: claude
  apiTokens:
    - 'sk-ant-oat01-xxxxx' # Claude Code OAuth Token
  claudeCodeMode: true # 启用 Claude Code 模式
```

启用此模式后,插件将自动:
- 使用 Bearer Token 认证(而非 x-api-key)
- 设置 Claude Code 特定的请求头和查询参数
- 注入 Claude Code 的系统提示词(如未提供)

**请求示例**

```json
{
  "model": "claude-sonnet-4-5-20250929",
  "max_tokens": 8192,
  "messages": [
    {
      "role": "user",
      "content": "List files in current directory"
    }
  ]
}
```

插件将自动转换为适合 Claude Code 的请求格式,包括:
- 添加系统提示词:`"You are Claude Code, Anthropic's official CLI for Claude."`
- 设置适当的认证和请求头

### 使用智能协议转换

当目标供应商不原生支持 Claude 协议时,插件会自动进行协议转换:
@@ -1947,7 +2032,7 @@ provider:
}
```

### 使用 OpenAI 协议代理 Google Vertex 服务(标准模式)

**配置信息**

@@ -2009,8 +2094,236 @@ provider:
}
```

### 使用 OpenAI 协议代理 Google Vertex 服务(Express Mode)

Express Mode 是 Vertex AI 的简化访问模式,只需 API Key 即可快速开始使用。

**配置信息**

```yaml
provider:
  type: vertex
  apiTokens:
    - "YOUR_API_KEY"
```

**请求示例**

```json
{
  "model": "gemini-2.5-flash",
  "messages": [
    {
      "role": "user",
      "content": "你好,你是谁?"
    }
  ],
  "stream": false
}
```

**响应示例**

```json
{
  "id": "chatcmpl-0000000000000",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "你好!我是 Gemini,由 Google 开发的人工智能助手。有什么我可以帮您的吗?"
      },
      "finish_reason": "stop"
    }
  ],
  "created": 1729986750,
  "model": "gemini-2.5-flash",
  "object": "chat.completion",
  "usage": {
    "prompt_tokens": 10,
    "completion_tokens": 25,
    "total_tokens": 35
  }
}
```

### 使用 OpenAI 协议代理 Google Vertex 服务(OpenAI 兼容模式)

OpenAI 兼容模式使用 Vertex AI 的 OpenAI-compatible Chat Completions API,请求和响应都使用 OpenAI 格式,无需进行协议转换。

**配置信息**

```yaml
provider:
  type: vertex
  vertexOpenAICompatible: true
  vertexAuthKey: |
    {
      "type": "service_account",
      "project_id": "your-project-id",
      "private_key_id": "your-private-key-id",
      "private_key": "-----BEGIN PRIVATE KEY-----\n...\n-----END PRIVATE KEY-----\n",
      "client_email": "your-service-account@your-project.iam.gserviceaccount.com",
      "token_uri": "https://oauth2.googleapis.com/token"
    }
  vertexRegion: us-central1
  vertexProjectId: your-project-id
  vertexAuthServiceName: your-auth-service-name
  modelMapping:
    "gpt-4": "gemini-2.0-flash"
    "*": "gemini-1.5-flash"
```

**请求示例**

```json
{
  "model": "gpt-4",
  "messages": [
    {
      "role": "user",
      "content": "你好,你是谁?"
    }
  ],
  "stream": false
}
```

**响应示例**

```json
{
  "id": "chatcmpl-abc123",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "你好!我是由 Google 开发的 Gemini 模型。我可以帮助回答问题、提供信息和进行对话。有什么我可以帮您的吗?"
      },
      "finish_reason": "stop"
    }
  ],
  "created": 1729986750,
  "model": "gemini-2.0-flash",
  "object": "chat.completion",
  "usage": {
    "prompt_tokens": 12,
    "completion_tokens": 35,
    "total_tokens": 47
  }
}
```

### 使用 OpenAI 协议代理 Google Vertex 图片生成服务

Vertex AI 支持使用 Gemini 模型进行图片生成。通过 ai-proxy 插件,可以使用 OpenAI 的 `/v1/images/generations` 接口协议来调用 Vertex AI 的图片生成能力。

**配置信息**

```yaml
provider:
  type: vertex
  apiTokens:
    - "YOUR_API_KEY"
  modelMapping:
    "dall-e-3": "gemini-2.0-flash-exp"
  geminiSafetySetting:
    HARM_CATEGORY_HARASSMENT: "OFF"
    HARM_CATEGORY_HATE_SPEECH: "OFF"
    HARM_CATEGORY_SEXUALLY_EXPLICIT: "OFF"
    HARM_CATEGORY_DANGEROUS_CONTENT: "OFF"
```

**使用 curl 请求**

```bash
curl -X POST "http://your-gateway-address/v1/images/generations" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gemini-2.0-flash-exp",
    "prompt": "一只可爱的橘猫在阳光下打盹",
    "size": "1024x1024"
  }'
```

**使用 OpenAI Python SDK**

```python
from openai import OpenAI

client = OpenAI(
    api_key="any-value",  # 可以是任意值,认证由网关处理
    base_url="http://your-gateway-address/v1"
)

response = client.images.generate(
    model="gemini-2.0-flash-exp",
    prompt="一只可爱的橘猫在阳光下打盹",
    size="1024x1024",
    n=1
)

# 获取生成的图片(base64 编码)
image_data = response.data[0].b64_json
print(f"Generated image (base64): {image_data[:100]}...")
```

**响应示例**

```json
{
  "created": 1729986750,
  "data": [
    {
      "b64_json": "iVBORw0KGgoAAAANSUhEUgAABAAAAAQACAIAAADwf7zUAAAA..."
    }
  ],
  "usage": {
    "total_tokens": 1356,
    "input_tokens": 13,
    "output_tokens": 1120
  }
}
```

**支持的尺寸参数**

Vertex AI 支持的宽高比(aspectRatio):`1:1`、`3:2`、`2:3`、`3:4`、`4:3`、`4:5`、`5:4`、`9:16`、`16:9`、`21:9`

Vertex AI 支持的分辨率(imageSize):`1k`、`2k`、`4k`

| OpenAI size 参数 | Vertex AI aspectRatio | Vertex AI imageSize |
|------------------|----------------------|---------------------|
| 256x256 | 1:1 | 1k |
| 512x512 | 1:1 | 1k |
| 1024x1024 | 1:1 | 1k |
| 1792x1024 | 16:9 | 2k |
| 1024x1792 | 9:16 | 2k |
| 2048x2048 | 1:1 | 2k |
| 4096x4096 | 1:1 | 4k |
| 1536x1024 | 3:2 | 2k |
| 1024x1536 | 2:3 | 2k |
| 1024x768 | 4:3 | 1k |
| 768x1024 | 3:4 | 1k |
| 1280x1024 | 5:4 | 1k |
| 1024x1280 | 4:5 | 1k |
| 2560x1080 | 21:9 | 2k |

**注意事项**

- 图片生成使用 Gemini 模型(如 `gemini-2.0-flash-exp`、`gemini-3-pro-image-preview`),不同模型的可用性可能因区域而异
- 返回的图片数据为 base64 编码格式(`b64_json`)
- 可以通过 `geminiSafetySetting` 配置内容安全过滤级别
- 如果需要使用模型映射(如将 `dall-e-3` 映射到 Gemini 模型),可以配置 `modelMapping`

### 使用 OpenAI 协议代理 AWS Bedrock 服务

AWS Bedrock 支持两种认证方式:

#### 方式一:使用 AWS Access Key/Secret Key 认证(AWS Signature V4)

**配置信息**

@@ -2018,7 +2331,21 @@ provider:
```yaml
provider:
  type: bedrock
  awsAccessKey: "YOUR_AWS_ACCESS_KEY_ID"
  awsSecretKey: "YOUR_AWS_SECRET_ACCESS_KEY"
  awsRegion: "us-east-1"
  bedrockAdditionalFields:
    top_k: 200
```

#### 方式二:使用 Bearer Token 认证(适用于 IAM Identity Center 等场景)

**配置信息**

```yaml
provider:
  type: bedrock
  apiTokens:
    - "YOUR_AWS_BEARER_TOKEN"
  awsRegion: "us-east-1"
  bedrockAdditionalFields:
    top_k: 200
```
@@ -2027,7 +2354,7 @@ provider:

```json
{
  "model": "us.anthropic.claude-3-5-haiku-20241022-v1:0",
  "messages": [
    {
      "role": "user",
@@ -2112,11 +2439,92 @@ providers:
}
```

### 使用上下文清理命令

配置上下文清理命令后,用户可以通过发送特定消息来主动清理对话历史,实现"重新开始对话"的效果。

**配置信息**

```yaml
provider:
  type: qwen
  apiTokens:
    - "YOUR_QWEN_API_TOKEN"
  modelMapping:
    "*": "qwen-turbo"
  contextCleanupCommands:
    - "清理上下文"
    - "/clear"
    - "重新开始"
    - "新对话"
```

**请求示例**

当用户发送包含清理命令的请求时:

```json
{
  "model": "gpt-3",
  "messages": [
    {
      "role": "system",
      "content": "你是一个助手"
    },
    {
      "role": "user",
      "content": "你好"
    },
    {
      "role": "assistant",
      "content": "你好!有什么可以帮助你的?"
    },
    {
      "role": "user",
      "content": "今天天气怎么样"
    },
    {
      "role": "assistant",
      "content": "抱歉,我无法获取实时天气信息。"
    },
    {
      "role": "user",
      "content": "清理上下文"
    },
    {
      "role": "user",
      "content": "现在开始新话题,介绍一下你自己"
    }
  ]
}
```

**实际发送给 AI 服务的请求**

插件会自动清理"清理上下文"命令及之前的所有非 system 消息:

```json
{
  "model": "qwen-turbo",
  "messages": [
    {
      "role": "system",
      "content": "你是一个助手"
    },
    {
      "role": "user",
      "content": "现在开始新话题,介绍一下你自己"
    }
  ]
}
```

**说明**

- 清理命令必须完全匹配配置的字符串,部分匹配不会触发清理
- 当存在多个清理命令时,只处理最后一个匹配的命令
- 清理会保留所有 system 消息,删除命令及之前的 user、assistant、tool 消息
- 清理命令之后的所有消息都会保留

## 完整配置示例

@@ -25,6 +25,8 @@ The plugin now supports **automatic protocol detection**, allowing seamless comp

> When the request path suffix matches `/v1/embeddings`, it corresponds to text vector scenarios. The request body will be parsed using OpenAI's text vector protocol and then converted to the corresponding LLM vendor's text vector protocol.

> When the request path suffix matches `/v1/images/generations`, it corresponds to text-to-image scenarios. The request body will be parsed using OpenAI's image generation protocol and then converted to the corresponding LLM vendor's image generation protocol.

## Execution Properties

Plugin execution phase: `Default Phase`
Plugin execution priority: `100`
@@ -50,6 +52,7 @@ Plugin execution priority: `100`
| `context` | object | Optional | - | Configuration for AI conversation context information |
| `customSettings` | array of customSetting | Optional | - | Specifies overrides or fills parameters for AI requests |
| `subPath` | string | Optional | - | If subPath is configured, the prefix will be removed from the request path before further processing. |
| `contextCleanupCommands` | array of string | Optional | - | List of context cleanup commands. When a user message in the request exactly matches any of the configured commands, that message and all non-system messages before it will be removed, keeping only system messages and messages after the command. This enables users to actively clear conversation history. |
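
For example, the following configuration sketch enables a few cleanup commands (the token and command strings are placeholders; commands can be any exact-match strings):

```yaml
provider:
  type: qwen
  apiTokens:
    - "YOUR_QWEN_API_TOKEN"
  modelMapping:
    "*": "qwen-turbo"
  contextCleanupCommands:
    - "/clear"
    - "new conversation"
```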

**Details for the `context` configuration fields:**

@@ -182,11 +185,22 @@ For MiniMax, the corresponding `type` is `minimax`. Its unique configuration fie

#### Anthropic Claude

For Anthropic Claude, the corresponding `type` is `claude`. Its unique configuration fields are:

| Name | Data Type | Filling Requirements | Default Value | Description |
|------------------|-----------|----------------------|---------------|-------------|
| `claudeVersion` | string | Optional | - | The version of the Claude service's API, default is 2023-06-01. |
| `claudeCodeMode` | boolean | Optional | false | Enable Claude Code mode for OAuth token authentication. When enabled, requests will be formatted as Claude Code client requests. |

**Claude Code Mode**

When `claudeCodeMode: true` is enabled, the plugin will:
- Use Bearer Token authentication instead of x-api-key (compatible with Claude Code OAuth tokens)
- Set Claude Code-specific request headers (user-agent, x-app, anthropic-beta)
- Add the `?beta=true` query parameter to request URLs
- Automatically inject the Claude Code system prompt if not provided

This enables direct use of Claude Code OAuth tokens for authentication in Higress.
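
A minimal configuration sketch, mirroring the example in the Chinese documentation (the token value is a placeholder):

```yaml
provider:
  type: claude
  apiTokens:
    - 'sk-ant-oat01-xxxxx' # Claude Code OAuth Token
  claudeCodeMode: true
```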
|
||||
#### Ollama
|
||||
|
||||
@@ -255,7 +269,9 @@ For DeepL, the corresponding `type` is `deepl`. Its unique configuration field i
|
||||
| `targetLang` | string | Required | - | The target language required by the DeepL translation service |
|
||||
|
||||
#### Google Vertex AI
|
||||
For Vertex, the corresponding `type` is `vertex`. Its unique configuration field is:
|
||||
For Vertex, the corresponding `type` is `vertex`. It supports two authentication modes:
|
||||
|
||||
**Standard Mode** (using Service Account):
|
||||
|
||||
| Name | Data Type | Requirement | Default | Description |
|
||||
|-----------------------------|---------------|---------------| ------ |-------------------------------------------------------------------------------------------------------------------------------------------------------------|
|
||||
@@ -266,16 +282,47 @@ For Vertex, the corresponding `type` is `vertex`. Its unique configuration field
|
||||
| `vertexGeminiSafetySetting` | map of string | Optional | - | Gemini model content safety filtering settings. |
|
||||
| `vertexTokenRefreshAhead` | number | Optional | - | Vertex access token refresh ahead time in seconds |
|
||||
|
||||
**Express Mode** (using API Key, simplified configuration):
|
||||
|
||||
Express Mode is a simplified access mode introduced by Vertex AI. You can quickly get started with just an API Key, without configuring a Service Account. See [Vertex AI Express Mode documentation](https://cloud.google.com/vertex-ai/generative-ai/docs/start/express-mode/overview).
|
||||
|
||||
| Name | Data Type | Requirement | Default | Description |
|
||||
|-----------------------------|------------------|---------------| ------ |-------------------------------------------------------------------------------------------------------------------------------------------------------------|
|
||||
| `apiTokens` | array of string | Required | - | API Key for Express Mode, obtained from Google Cloud Console under API & Services > Credentials |
|
||||
| `vertexGeminiSafetySetting` | map of string | Optional | - | Gemini model content safety filtering settings. |
|
||||
|
||||
**OpenAI Compatible Mode** (using Vertex AI Chat Completions API):
|
||||
|
||||
Vertex AI provides an OpenAI-compatible Chat Completions API endpoint, allowing you to use OpenAI format requests and responses directly without protocol conversion. See [Vertex AI OpenAI Compatibility documentation](https://cloud.google.com/vertex-ai/generative-ai/docs/migrate/openai/overview).
|
||||
|
||||
| Name | Data Type | Requirement | Default | Description |
|
||||
|-----------------------------|------------------|---------------| ------ |-------------------------------------------------------------------------------------------------------------------------------------------------------------|
|
||||
| `vertexOpenAICompatible` | boolean | Optional | false | Enable OpenAI compatible mode. When enabled, uses Vertex AI's OpenAI-compatible Chat Completions API |
|
||||
| `vertexAuthKey` | string | Required | - | Google Service Account JSON Key for authentication |
|
||||
| `vertexRegion` | string | Required | - | Google Cloud region (e.g., us-central1, europe-west4) |
|
||||
| `vertexProjectId` | string | Required | - | Google Cloud Project ID |
|
||||
| `vertexAuthServiceName` | string | Required | - | Service name for OAuth2 authentication |
|
||||
|
||||
**Note**: OpenAI Compatible Mode and Express Mode are mutually exclusive. You cannot configure both `apiTokens` and `vertexOpenAICompatible` at the same time.
|
||||
|
||||
#### AWS Bedrock
|
||||
|
||||
For AWS Bedrock, the corresponding `type` is `bedrock`. It supports two authentication methods:
|
||||
1. **AWS Signature V4 Authentication**: Uses `awsAccessKey` and `awsSecretKey` for standard AWS signature authentication
|
||||
2. **Bearer Token Authentication**: Uses `apiTokens` to configure AWS Bearer Token (suitable for IAM Identity Center and similar scenarios)
|
||||
|
||||
**Note**: Choose one of the two authentication methods. If `apiTokens` is configured, Bearer Token authentication will be used preferentially.
|
||||
|
||||
Its unique configuration fields are:
|
||||
|
||||
| Name | Data Type | Requirement | Default | Description |
|
||||
|---------------------------|-----------------|--------------------------|---------|-------------------------------------------------------------------|
|
||||
| `apiTokens` | array of string | Either this or ak/sk | - | AWS Bearer Token for Bearer Token authentication |
|
||||
| `awsAccessKey` | string | Either this or apiTokens | - | AWS Access Key for AWS Signature V4 authentication |
|
||||
| `awsSecretKey` | string | Either this or apiTokens | - | AWS Secret Access Key for AWS Signature V4 authentication |
|
||||
| `awsRegion` | string | Required | - | AWS region, e.g., us-east-1 |
|
||||
| `bedrockAdditionalFields` | map | Optional | - | Additional inference parameters that the model supports |
|
||||
|
||||
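The precedence rule above can be sketched as a small helper (illustrative Python, not the plugin's Go implementation; the key names mirror the configuration fields):

```python
def bedrock_auth_mode(config):
    """Pick the Bedrock auth method per the documented precedence:
    Bearer token wins when apiTokens is set; otherwise fall back to SigV4."""
    if config.get("apiTokens"):
        return "bearer"
    if config.get("awsAccessKey") and config.get("awsSecretKey"):
        return "sigv4"
    raise ValueError("either apiTokens or awsAccessKey + awsSecretKey is required")
```

Note that even when both credential sets are present, Bearer Token authentication takes precedence.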
## Usage Examples
|
||||
|
||||
@@ -1112,6 +1159,44 @@ Both protocol formats will return responses in their respective formats:
|
||||
}
|
||||
```
|
||||
|
||||
### Using Claude Code Mode
|
||||
|
||||
Claude Code is Anthropic's official CLI tool. By enabling `claudeCodeMode`, you can authenticate using Claude Code OAuth tokens:
|
||||
|
||||
**Configuration Information**
|
||||
|
||||
```yaml
|
||||
provider:
|
||||
type: claude
|
||||
apiTokens:
|
||||
- "sk-ant-oat01-xxxxx" # Claude Code OAuth Token
|
||||
claudeCodeMode: true # Enable Claude Code mode
|
||||
```
|
||||
|
||||
Once this mode is enabled, the plugin will automatically:
|
||||
- Use Bearer Token authentication (instead of x-api-key)
|
||||
- Set Claude Code-specific request headers and query parameters
|
||||
- Inject Claude Code system prompt if not provided
|
||||
|
||||
**Request Example**
|
||||
|
||||
```json
|
||||
{
|
||||
"model": "claude-sonnet-4-5-20250929",
|
||||
"max_tokens": 8192,
|
||||
"messages": [
|
||||
{
|
||||
"role": "user",
|
||||
"content": "List files in current directory"
|
||||
}
|
||||
]
|
||||
}
|
||||
```
|
||||
|
||||
The plugin will automatically transform the request into Claude Code format, including:
|
||||
- Adding system prompt: `"You are Claude Code, Anthropic's official CLI for Claude."`
|
||||
- Setting appropriate authentication and request headers
|
||||
|
||||
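The header and path handling described above can be sketched as follows (illustrative Python; the constant values are the ones used by this mode, but the helper itself is hypothetical):

```python
CLAUDE_CODE_UA = "claude-cli/2.1.2 (external, cli)"
CLAUDE_CODE_BETA = "oauth-2025-04-20,interleaved-thinking-2025-05-14,claude-code-20250219"

def claude_code_request(token, path):
    # Bearer auth replaces x-api-key, and beta=true is appended to the path.
    headers = {
        "authorization": "Bearer " + token,
        "user-agent": CLAUDE_CODE_UA,
        "x-app": "cli",
        "anthropic-beta": CLAUDE_CODE_BETA,
    }
    if "beta=true" not in path:
        path += ("&" if "?" in path else "?") + "beta=true"
    return headers, path
```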
### Using Intelligent Protocol Conversion
|
||||
|
||||
When the target provider doesn't natively support Claude protocol, the plugin automatically performs protocol conversion:
|
||||
@@ -1720,7 +1805,7 @@ provider:
|
||||
}
|
||||
```
|
||||
|
||||
### Utilizing OpenAI Protocol Proxy for Google Vertex Services (Standard Mode)
|
||||
**Configuration Information**
|
||||
```yaml
|
||||
provider:
|
||||
@@ -1778,14 +1863,250 @@ provider:
|
||||
}
|
||||
```
|
||||
|
||||
### Utilizing OpenAI Protocol Proxy for Google Vertex Services (Express Mode)
|
||||
|
||||
Express Mode is a simplified access mode for Vertex AI. You only need an API Key to get started quickly.
|
||||
|
||||
**Configuration Information**
|
||||
```yaml
|
||||
provider:
|
||||
type: vertex
|
||||
apiTokens:
|
||||
- "YOUR_API_KEY"
|
||||
```
|
||||
|
||||
**Request Example**
|
||||
```json
|
||||
{
|
||||
"model": "gemini-2.5-flash",
|
||||
"messages": [
|
||||
{
|
||||
"role": "user",
|
||||
"content": "Who are you?"
|
||||
}
|
||||
],
|
||||
"stream": false
|
||||
}
|
||||
```
|
||||
|
||||
**Response Example**
|
||||
```json
|
||||
{
|
||||
"id": "chatcmpl-0000000000000",
|
||||
"choices": [
|
||||
{
|
||||
"index": 0,
|
||||
"message": {
|
||||
"role": "assistant",
|
||||
"content": "Hello! I am Gemini, an AI assistant developed by Google. How can I help you today?"
|
||||
},
|
||||
"finish_reason": "stop"
|
||||
}
|
||||
],
|
||||
"created": 1729986750,
|
||||
"model": "gemini-2.5-flash",
|
||||
"object": "chat.completion",
|
||||
"usage": {
|
||||
"prompt_tokens": 10,
|
||||
"completion_tokens": 25,
|
||||
"total_tokens": 35
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
### Utilizing OpenAI Protocol Proxy for Google Vertex Services (OpenAI Compatible Mode)
|
||||
|
||||
OpenAI Compatible Mode uses Vertex AI's OpenAI-compatible Chat Completions API. Both requests and responses use OpenAI format, requiring no protocol conversion.
|
||||
|
||||
**Configuration Information**
|
||||
```yaml
|
||||
provider:
|
||||
type: vertex
|
||||
vertexOpenAICompatible: true
|
||||
vertexAuthKey: |
|
||||
{
|
||||
"type": "service_account",
|
||||
"project_id": "your-project-id",
|
||||
"private_key_id": "your-private-key-id",
|
||||
"private_key": "-----BEGIN PRIVATE KEY-----\n...\n-----END PRIVATE KEY-----\n",
|
||||
"client_email": "your-service-account@your-project.iam.gserviceaccount.com",
|
||||
"token_uri": "https://oauth2.googleapis.com/token"
|
||||
}
|
||||
vertexRegion: us-central1
|
||||
vertexProjectId: your-project-id
|
||||
vertexAuthServiceName: your-auth-service-name
|
||||
modelMapping:
|
||||
"gpt-4": "gemini-2.0-flash"
|
||||
"*": "gemini-1.5-flash"
|
||||
```
|
||||
|
||||
**Request Example**
|
||||
```json
|
||||
{
|
||||
"model": "gpt-4",
|
||||
"messages": [
|
||||
{
|
||||
"role": "user",
|
||||
"content": "Hello, who are you?"
|
||||
}
|
||||
],
|
||||
"stream": false
|
||||
}
|
||||
```
|
||||
|
||||
**Response Example**
|
||||
```json
|
||||
{
|
||||
"id": "chatcmpl-abc123",
|
||||
"choices": [
|
||||
{
|
||||
"index": 0,
|
||||
"message": {
|
||||
"role": "assistant",
|
||||
"content": "Hello! I am Gemini, an AI model developed by Google. I can help answer questions, provide information, and engage in conversations. How can I assist you today?"
|
||||
},
|
||||
"finish_reason": "stop"
|
||||
}
|
||||
],
|
||||
"created": 1729986750,
|
||||
"model": "gemini-2.0-flash",
|
||||
"object": "chat.completion",
|
||||
"usage": {
|
||||
"prompt_tokens": 12,
|
||||
"completion_tokens": 35,
|
||||
"total_tokens": 47
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
### Utilizing OpenAI Protocol Proxy for Google Vertex Image Generation
|
||||
|
||||
Vertex AI supports image generation using Gemini models. Through the ai-proxy plugin, you can use OpenAI's `/v1/images/generations` API to call Vertex AI's image generation capabilities.
|
||||
|
||||
**Configuration Information**
|
||||
|
||||
```yaml
|
||||
provider:
|
||||
type: vertex
|
||||
apiTokens:
|
||||
- "YOUR_API_KEY"
|
||||
modelMapping:
|
||||
"dall-e-3": "gemini-2.0-flash-exp"
|
||||
geminiSafetySetting:
|
||||
HARM_CATEGORY_HARASSMENT: "OFF"
|
||||
HARM_CATEGORY_HATE_SPEECH: "OFF"
|
||||
HARM_CATEGORY_SEXUALLY_EXPLICIT: "OFF"
|
||||
HARM_CATEGORY_DANGEROUS_CONTENT: "OFF"
|
||||
```
|
||||
|
||||
**Using curl**
|
||||
|
||||
```bash
|
||||
curl -X POST "http://your-gateway-address/v1/images/generations" \
|
||||
-H "Content-Type: application/json" \
|
||||
-d '{
|
||||
"model": "gemini-2.0-flash-exp",
|
||||
"prompt": "A cute orange cat napping in the sunshine",
|
||||
"size": "1024x1024"
|
||||
}'
|
||||
```
|
||||
|
||||
**Using OpenAI Python SDK**
|
||||
|
||||
```python
|
||||
from openai import OpenAI
|
||||
|
||||
client = OpenAI(
|
||||
api_key="any-value", # Can be any value, authentication is handled by the gateway
|
||||
base_url="http://your-gateway-address/v1"
|
||||
)
|
||||
|
||||
response = client.images.generate(
|
||||
model="gemini-2.0-flash-exp",
|
||||
prompt="A cute orange cat napping in the sunshine",
|
||||
size="1024x1024",
|
||||
n=1
|
||||
)
|
||||
|
||||
# Get the generated image (base64 encoded)
|
||||
image_data = response.data[0].b64_json
|
||||
print(f"Generated image (base64): {image_data[:100]}...")
|
||||
```
|
||||
|
||||
**Response Example**
|
||||
|
||||
```json
|
||||
{
|
||||
"created": 1729986750,
|
||||
"data": [
|
||||
{
|
||||
"b64_json": "iVBORw0KGgoAAAANSUhEUgAABAAAAAQACAIAAADwf7zUAAAA..."
|
||||
}
|
||||
],
|
||||
"usage": {
|
||||
"total_tokens": 1356,
|
||||
"input_tokens": 13,
|
||||
"output_tokens": 1120
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
**Supported Size Parameters**
|
||||
|
||||
Vertex AI supported aspect ratios: `1:1`, `3:2`, `2:3`, `3:4`, `4:3`, `4:5`, `5:4`, `9:16`, `16:9`, `21:9`
|
||||
|
||||
Vertex AI supported resolutions (imageSize): `1k`, `2k`, `4k`
|
||||
|
||||
| OpenAI size parameter | Vertex AI aspectRatio | Vertex AI imageSize |
|
||||
|-----------------------|----------------------|---------------------|
|
||||
| 256x256 | 1:1 | 1k |
|
||||
| 512x512 | 1:1 | 1k |
|
||||
| 1024x1024 | 1:1 | 1k |
|
||||
| 1792x1024 | 16:9 | 2k |
|
||||
| 1024x1792 | 9:16 | 2k |
|
||||
| 2048x2048 | 1:1 | 2k |
|
||||
| 4096x4096 | 1:1 | 4k |
|
||||
| 1536x1024 | 3:2 | 2k |
|
||||
| 1024x1536 | 2:3 | 2k |
|
||||
| 1024x768 | 4:3 | 1k |
|
||||
| 768x1024 | 3:4 | 1k |
|
||||
| 1280x1024 | 5:4 | 1k |
|
||||
| 1024x1280 | 4:5 | 1k |
|
||||
| 2560x1080 | 21:9 | 2k |
|
||||
|
||||
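The table above amounts to a lookup from the OpenAI `size` string to a Vertex AI `(aspectRatio, imageSize)` pair; a minimal sketch in Python (the fallback for unlisted sizes is an assumption, not documented behavior):

```python
SIZE_MAP = {
    "256x256": ("1:1", "1k"),    "512x512": ("1:1", "1k"),
    "1024x1024": ("1:1", "1k"),  "1792x1024": ("16:9", "2k"),
    "1024x1792": ("9:16", "2k"), "2048x2048": ("1:1", "2k"),
    "4096x4096": ("1:1", "4k"),  "1536x1024": ("3:2", "2k"),
    "1024x1536": ("2:3", "2k"),  "1024x768": ("4:3", "1k"),
    "768x1024": ("3:4", "1k"),   "1280x1024": ("5:4", "1k"),
    "1024x1280": ("4:5", "1k"),  "2560x1080": ("21:9", "2k"),
}

def to_vertex_size(size):
    # Assumed fallback: square 1k image for sizes not in the table.
    return SIZE_MAP.get(size, ("1:1", "1k"))
```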
**Notes**
|
||||
|
||||
- Image generation uses Gemini models (e.g., `gemini-2.0-flash-exp`, `gemini-3-pro-image-preview`). Model availability may vary by region
|
||||
- The returned image data is in base64 encoded format (`b64_json`)
|
||||
- Content safety filtering levels can be configured via `geminiSafetySetting`
|
||||
- If you need model mapping (e.g., mapping `dall-e-3` to a Gemini model), configure `modelMapping`
|
||||
|
||||
### Utilizing OpenAI Protocol Proxy for AWS Bedrock Services
|
||||
|
||||
AWS Bedrock supports two authentication methods:
|
||||
|
||||
#### Method 1: Using AWS Access Key/Secret Key Authentication (AWS Signature V4)
|
||||
|
||||
**Configuration Information**
|
||||
```yaml
|
||||
provider:
|
||||
type: bedrock
|
||||
awsAccessKey: "YOUR_AWS_ACCESS_KEY_ID"
|
||||
awsSecretKey: "YOUR_AWS_SECRET_ACCESS_KEY"
|
||||
awsRegion: "us-east-1"
|
||||
bedrockAdditionalFields:
|
||||
top_k: 200
|
||||
```
|
||||
|
||||
#### Method 2: Using Bearer Token Authentication (suitable for IAM Identity Center and similar scenarios)
|
||||
|
||||
**Configuration Information**
|
||||
```yaml
|
||||
provider:
|
||||
type: bedrock
|
||||
apiTokens:
|
||||
- "YOUR_AWS_BEARER_TOKEN"
|
||||
awsRegion: "us-east-1"
|
||||
bedrockAdditionalFields:
|
||||
top_k: 200
|
||||
```
|
||||
@@ -1793,7 +2114,7 @@ provider:
|
||||
**Request Example**
|
||||
```json
|
||||
{
|
||||
"model": "us.anthropic.claude-3-5-haiku-20241022-v1:0",
|
||||
"messages": [
|
||||
{
|
||||
"role": "user",
|
||||
@@ -1876,6 +2197,93 @@ providers:
|
||||
}
|
||||
```
|
||||
|
||||
### Using Context Cleanup Commands
|
||||
|
||||
After configuring context cleanup commands, users can actively clear conversation history by sending specific messages, achieving a "start over" effect.
|
||||
|
||||
**Configuration**
|
||||
|
||||
```yaml
|
||||
provider:
|
||||
type: qwen
|
||||
apiTokens:
|
||||
- "YOUR_QWEN_API_TOKEN"
|
||||
modelMapping:
|
||||
"*": "qwen-turbo"
|
||||
contextCleanupCommands:
|
||||
- "clear context"
|
||||
- "/clear"
|
||||
- "start over"
|
||||
- "new conversation"
|
||||
```
|
||||
|
||||
**Request Example**
|
||||
|
||||
When a user sends a request containing a cleanup command:
|
||||
|
||||
```json
|
||||
{
|
||||
"model": "gpt-3",
|
||||
"messages": [
|
||||
{
|
||||
"role": "system",
|
||||
"content": "You are an assistant"
|
||||
},
|
||||
{
|
||||
"role": "user",
|
||||
"content": "Hello"
|
||||
},
|
||||
{
|
||||
"role": "assistant",
|
||||
"content": "Hello! How can I help you?"
|
||||
},
|
||||
{
|
||||
"role": "user",
|
||||
"content": "What's the weather like today"
|
||||
},
|
||||
{
|
||||
"role": "assistant",
|
||||
"content": "Sorry, I cannot get real-time weather information."
|
||||
},
|
||||
{
|
||||
"role": "user",
|
||||
"content": "clear context"
|
||||
},
|
||||
{
|
||||
"role": "user",
|
||||
"content": "Let's start a new topic, introduce yourself"
|
||||
}
|
||||
]
|
||||
}
|
||||
```
|
||||
|
||||
**Actual Request Sent to AI Service**
|
||||
|
||||
The plugin automatically removes the cleanup command and all non-system messages before it:
|
||||
|
||||
```json
|
||||
{
|
||||
"model": "qwen-turbo",
|
||||
"messages": [
|
||||
{
|
||||
"role": "system",
|
||||
"content": "You are an assistant"
|
||||
},
|
||||
{
|
||||
"role": "user",
|
||||
"content": "Let's start a new topic, introduce yourself"
|
||||
}
|
||||
]
|
||||
}
|
||||
```
|
||||
|
||||
**Notes**
|
||||
|
||||
- The cleanup command must exactly match the configured string; partial matches will not trigger cleanup
|
||||
- When multiple cleanup commands exist in messages, only the last matching command is processed
|
||||
- Cleanup preserves all system messages and removes user, assistant, and tool messages before the command
|
||||
- All messages after the cleanup command are preserved
|
||||
|
||||
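The cleanup rules in the notes above can be sketched as follows (illustrative Python, not the plugin's Go implementation):

```python
def cleanup_context(messages, commands):
    """Apply the last exactly-matching cleanup command: keep system messages,
    drop the command and all prior non-system messages, keep everything after."""
    last = -1
    for i, m in enumerate(messages):
        if m["role"] == "user" and m["content"] in commands:
            last = i
    if last == -1:
        return messages  # no exact match, nothing to clean
    kept = [m for m in messages[:last] if m["role"] == "system"]
    return kept + messages[last + 1:]
```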
## Full Configuration Example
|
||||
|
||||
### Kubernetes Example
|
||||
|
||||
@@ -8,7 +8,7 @@ toolchain go1.24.4
|
||||
|
||||
require (
|
||||
github.com/higress-group/proxy-wasm-go-sdk v0.0.0-20251103120604-77e9cce339d2
|
||||
github.com/higress-group/wasm-go v1.0.10-0.20260120033417-1c84f010156d
|
||||
github.com/stretchr/testify v1.9.0
|
||||
github.com/tidwall/gjson v1.18.0
|
||||
)
|
||||
|
||||
@@ -4,8 +4,8 @@ github.com/google/uuid v1.6.0 h1:NIvaJDMOsjHA8n1jAhLSgzrAzy1Hgr+hNrb57e+94F0=
|
||||
github.com/google/uuid v1.6.0/go.mod h1:TIyPZe4MgqvfeYDBFedMoGGpEw/LqOeaOT+nhxU+yHo=
|
||||
github.com/higress-group/proxy-wasm-go-sdk v0.0.0-20251103120604-77e9cce339d2 h1:NY33OrWCJJ+DFiLc+lsBY4Ywor2Ik61ssk6qkGF8Ypo=
|
||||
github.com/higress-group/proxy-wasm-go-sdk v0.0.0-20251103120604-77e9cce339d2/go.mod h1:tRI2LfMudSkKHhyv1uex3BWzcice2s/l8Ah8axporfA=
|
||||
|
||||
github.com/higress-group/wasm-go v1.0.10-0.20260120033417-1c84f010156d h1:LgYbzEBtg0+LEqoebQeMVgAB6H5SgqG+KN+gBhNfKbM=
|
||||
github.com/higress-group/wasm-go v1.0.10-0.20260120033417-1c84f010156d/go.mod h1:uKVYICbRaxTlKqdm8E0dpjbysxM8uCPb9LV26hF3Km8=
|
||||
github.com/pmezard/go-difflib v1.0.0 h1:4DBwDE0NGyQoBHbLQYPwSUPoCMWR5BEzIk/f1lZbAQM=
|
||||
github.com/pmezard/go-difflib v1.0.0/go.mod h1:iKH77koFhYxTK1pcRnkKkqfTogsbg7gZNVY4sRDYZ/4=
|
||||
github.com/stretchr/testify v1.9.0 h1:HtqpIVDClZ4nwg75+f6Lvsy/wHu+3BoSGCbBAcpTsTg=
|
||||
|
||||
@@ -128,3 +128,37 @@ func TestGeneric(t *testing.T) {
|
||||
test.RunGenericOnHttpRequestHeadersTests(t)
|
||||
test.RunGenericOnHttpRequestBodyTests(t)
|
||||
}
|
||||
|
||||
func TestVertex(t *testing.T) {
|
||||
test.RunVertexParseConfigTests(t)
|
||||
test.RunVertexExpressModeOnHttpRequestHeadersTests(t)
|
||||
test.RunVertexExpressModeOnHttpRequestBodyTests(t)
|
||||
test.RunVertexExpressModeOnHttpResponseBodyTests(t)
|
||||
test.RunVertexExpressModeOnStreamingResponseBodyTests(t)
|
||||
test.RunVertexExpressModeImageGenerationRequestBodyTests(t)
|
||||
test.RunVertexExpressModeImageGenerationResponseBodyTests(t)
|
||||
// Vertex Raw mode tests
|
||||
test.RunVertexRawModeOnHttpRequestHeadersTests(t)
|
||||
test.RunVertexRawModeOnHttpRequestBodyTests(t)
|
||||
test.RunVertexRawModeOnHttpResponseBodyTests(t)
|
||||
}
|
||||
|
||||
func TestBedrock(t *testing.T) {
|
||||
test.RunBedrockParseConfigTests(t)
|
||||
test.RunBedrockOnHttpRequestHeadersTests(t)
|
||||
test.RunBedrockOnHttpRequestBodyTests(t)
|
||||
test.RunBedrockOnHttpResponseHeadersTests(t)
|
||||
test.RunBedrockOnHttpResponseBodyTests(t)
|
||||
test.RunBedrockToolCallTests(t)
|
||||
}
|
||||
|
||||
func TestClaude(t *testing.T) {
|
||||
test.RunClaudeParseConfigTests(t)
|
||||
test.RunClaudeOnHttpRequestHeadersTests(t)
|
||||
test.RunClaudeOnHttpRequestBodyTests(t)
|
||||
}
|
||||
|
||||
func TestConsumerAffinity(t *testing.T) {
|
||||
test.RunConsumerAffinityParseConfigTests(t)
|
||||
test.RunConsumerAffinityOnHttpRequestHeadersTests(t)
|
||||
}
|
||||
|
||||
@@ -206,7 +206,16 @@ func (m *azureProvider) transformRequestPath(ctx wrapper.HttpContext, apiName Ap
|
||||
path = strings.ReplaceAll(path, pathAzureModelPlaceholder, model)
|
||||
log.Debugf("azureProvider: model replaced path: %s", path)
|
||||
}
|
||||
if !strings.Contains(path, "?") {
|
||||
// No query string yet
|
||||
path = path + "?" + m.serviceUrl.RawQuery
|
||||
} else if strings.HasSuffix(path, "?") {
|
||||
// Ends with "?" and has no query parameter
|
||||
path = path + m.serviceUrl.RawQuery
|
||||
} else {
|
||||
// Has other query parameters
|
||||
path = path + "&" + m.serviceUrl.RawQuery
|
||||
}
|
||||
log.Debugf("azureProvider: final path: %s", path)
|
||||
|
||||
return path
|
||||
|
||||
@@ -43,8 +43,11 @@ const (
|
||||
type bedrockProviderInitializer struct{}
|
||||
|
||||
func (b *bedrockProviderInitializer) ValidateConfig(config *ProviderConfig) error {
|
||||
hasAkSk := len(config.awsAccessKey) > 0 && len(config.awsSecretKey) > 0
|
||||
hasApiToken := len(config.apiTokens) > 0
|
||||
|
||||
if !hasAkSk && !hasApiToken {
|
||||
return errors.New("missing bedrock access authentication parameters: either apiTokens or (awsAccessKey + awsSecretKey) is required")
|
||||
}
|
||||
if len(config.awsRegion) == 0 {
|
||||
return errors.New("missing bedrock region parameters")
|
||||
@@ -634,6 +637,13 @@ func (b *bedrockProvider) OnRequestHeaders(ctx wrapper.HttpContext, apiName ApiN
|
||||
|
||||
func (b *bedrockProvider) TransformRequestHeaders(ctx wrapper.HttpContext, apiName ApiName, headers http.Header) {
|
||||
util.OverwriteRequestHostHeader(headers, fmt.Sprintf(bedrockDefaultDomain, b.config.awsRegion))
|
||||
|
||||
// If apiTokens is configured, set Bearer token authentication here
|
||||
// This follows the same pattern as other providers (qwen, zhipuai, etc.)
|
||||
// AWS SigV4 authentication is handled in setAuthHeaders because it requires the request body
|
||||
if len(b.config.apiTokens) > 0 {
|
||||
util.OverwriteRequestAuthorizationHeader(headers, "Bearer "+b.config.GetApiTokenInUse(ctx))
|
||||
}
|
||||
}
|
||||
|
||||
func (b *bedrockProvider) OnRequestBody(ctx wrapper.HttpContext, apiName ApiName, body []byte) (types.Action, error) {
|
||||
@@ -659,18 +669,18 @@ func (b *bedrockProvider) TransformResponseBody(ctx wrapper.HttpContext, apiName
|
||||
case ApiNameChatCompletion:
|
||||
return b.onChatCompletionResponseBody(ctx, body)
|
||||
case ApiNameImageGeneration:
|
||||
return b.onImageGenerationResponseBody(body)
|
||||
}
|
||||
return nil, errUnsupportedApiName
|
||||
}
|
||||
|
||||
func (b *bedrockProvider) onImageGenerationResponseBody(body []byte) ([]byte, error) {
|
||||
bedrockResponse := &bedrockImageGenerationResponse{}
|
||||
if err := json.Unmarshal(body, bedrockResponse); err != nil {
|
||||
log.Errorf("unable to unmarshal bedrock image generation response: %v", err)
|
||||
return nil, fmt.Errorf("unable to unmarshal bedrock image generation response: %v", err)
|
||||
}
|
||||
response := b.buildBedrockImageGenerationResponse(bedrockResponse)
|
||||
return json.Marshal(response)
|
||||
}
|
||||
|
||||
@@ -710,7 +720,7 @@ func (b *bedrockProvider) buildBedrockImageGenerationRequest(origRequest *imageG
|
||||
return requestBytes, err
|
||||
}
|
||||
|
||||
func (b *bedrockProvider) buildBedrockImageGenerationResponse(bedrockResponse *bedrockImageGenerationResponse) *imageGenerationResponse {
|
||||
data := make([]imageGenerationData, len(bedrockResponse.Images))
|
||||
for i, image := range bedrockResponse.Images {
|
||||
data[i] = imageGenerationData{
|
||||
@@ -759,7 +769,15 @@ func (b *bedrockProvider) buildBedrockTextGenerationRequest(origRequest *chatCom
|
||||
case roleSystem:
|
||||
systemMessages = append(systemMessages, systemContentBlock{Text: msg.StringContent()})
|
||||
case roleTool:
|
||||
toolResultContent := chatToolMessage2BedrockToolResultContent(msg)
|
||||
if len(messages) > 0 && messages[len(messages)-1].Role == roleUser && messages[len(messages)-1].Content[0].ToolResult != nil {
|
||||
messages[len(messages)-1].Content = append(messages[len(messages)-1].Content, toolResultContent)
|
||||
} else {
|
||||
messages = append(messages, bedrockMessage{
|
||||
Role: roleUser,
|
||||
Content: []bedrockMessageContent{toolResultContent},
|
||||
})
|
||||
}
|
||||
default:
|
||||
messages = append(messages, chatMessage2BedrockMessage(msg))
|
||||
}
|
||||
@@ -1050,7 +1068,7 @@ type tokenUsage struct {
|
||||
TotalTokens int `json:"totalTokens"`
|
||||
}
|
||||
|
||||
func chatToolMessage2BedrockToolResultContent(chatMessage chatMessage) bedrockMessageContent {
|
||||
toolResultContent := &toolResultBlock{}
|
||||
toolResultContent.ToolUseId = chatMessage.ToolCallId
|
||||
if text, ok := chatMessage.Content.(string); ok {
|
||||
@@ -1073,29 +1091,29 @@ func chatToolMessage2BedrockMessage(chatMessage chatMessage) bedrockMessage {
|
||||
} else {
|
||||
log.Warnf("the content type is not supported, current content is %v", chatMessage.Content)
|
||||
}
|
||||
return bedrockMessageContent{
		ToolResult: toolResultContent,
|
||||
}
|
||||
}
|
||||
|
||||
func chatMessage2BedrockMessage(chatMessage chatMessage) bedrockMessage {
|
||||
var result bedrockMessage
|
||||
if len(chatMessage.ToolCalls) > 0 {
|
||||
contents := make([]bedrockMessageContent, 0, len(chatMessage.ToolCalls))
|
||||
for _, toolCall := range chatMessage.ToolCalls {
|
||||
params := map[string]interface{}{}
|
||||
json.Unmarshal([]byte(toolCall.Function.Arguments), ¶ms)
|
||||
contents = append(contents, bedrockMessageContent{
|
||||
ToolUse: &toolUseBlock{
|
||||
Input: params,
|
||||
Name: toolCall.Function.Name,
|
||||
ToolUseId: toolCall.Id,
|
||||
},
|
||||
})
|
||||
}
|
||||
result = bedrockMessage{
|
||||
Role: chatMessage.Role,
|
||||
Content: contents,
|
||||
}
|
||||
} else if chatMessage.IsStringContent() {
|
||||
result = bedrockMessage{
|
||||
@@ -1138,6 +1156,13 @@ func chatMessage2BedrockMessage(chatMessage chatMessage) bedrockMessage {
|
||||
}
|
||||
|
||||
func (b *bedrockProvider) setAuthHeaders(body []byte, headers http.Header) {
|
||||
// Bearer token authentication is already set in TransformRequestHeaders
|
||||
// This function only handles AWS SigV4 authentication which requires the request body
|
||||
if len(b.config.apiTokens) > 0 {
|
||||
return
|
||||
}
|
||||
|
||||
// Use AWS Signature V4 authentication
|
||||
t := time.Now().UTC()
|
||||
amzDate := t.Format("20060102T150405Z")
|
||||
dateStamp := t.Format("20060102")
|
||||
|
||||
@@ -19,6 +19,11 @@ const (
|
||||
claudeDomain = "api.anthropic.com"
|
||||
claudeDefaultVersion = "2023-06-01"
|
||||
claudeDefaultMaxTokens = 4096
|
||||
|
||||
// Claude Code mode constants
|
||||
claudeCodeUserAgent = "claude-cli/2.1.2 (external, cli)"
|
||||
claudeCodeBetaFeatures = "oauth-2025-04-20,interleaved-thinking-2025-05-14,claude-code-20250219"
|
||||
claudeCodeSystemPrompt = "You are Claude Code, Anthropic's official CLI for Claude."
|
||||
)
|
||||
|
||||
type claudeProviderInitializer struct{}
|
||||
@@ -68,8 +73,8 @@ type claudeChatMessageContent struct {
|
||||
Name string `json:"name,omitempty"` // For tool_use
|
||||
Input map[string]interface{} `json:"input,omitempty"` // For tool_use
|
||||
// Tool result fields
|
||||
|
||||
ToolUseId string `json:"tool_use_id,omitempty"` // For tool_result
|
||||
Content *claudeChatMessageContentWr `json:"content,omitempty"` // For tool_result - can be string or array
|
||||
}
|
||||
|
||||
// UnmarshalJSON implements custom JSON unmarshaling for claudeChatMessageContentWr
|
||||
@@ -232,13 +237,13 @@ type claudeTextGenResponse struct {
|
||||
}
|
||||
|
||||
type claudeTextGenContent struct {
|
||||
|
||||
Type string `json:"type,omitempty"`
|
||||
Text *string `json:"text,omitempty"` // Use pointer: empty string outputs "text":"", nil omits field
|
||||
Id string `json:"id,omitempty"` // For tool_use
|
||||
Name string `json:"name,omitempty"` // For tool_use
|
||||
Input *map[string]interface{} `json:"input,omitempty"` // Use pointer: empty map outputs "input":{}, nil omits field
|
||||
Signature *string `json:"signature,omitempty"` // For thinking - use pointer for empty string output
|
||||
Thinking *string `json:"thinking,omitempty"` // For thinking - use pointer for empty string output
|
||||
}
|
||||
|
||||
type claudeTextGenUsage struct {
|
||||
@@ -264,11 +269,12 @@ type claudeTextGenStreamResponse struct {
|
||||
}
|
||||
|
||||
type claudeTextGenDelta struct {
|
||||
|
||||
Type string `json:"type,omitempty"`
|
||||
Text string `json:"text,omitempty"`
|
||||
Thinking string `json:"thinking,omitempty"`
|
||||
PartialJson string `json:"partial_json,omitempty"`
|
||||
StopReason *string `json:"stop_reason,omitempty"`
|
||||
StopSequence json.RawMessage `json:"stop_sequence,omitempty"` // Use RawMessage to output explicit null
|
||||
}
|
||||
|
||||
func (c *claudeProviderInitializer) ValidateConfig(config *ProviderConfig) error {
|
||||
@@ -319,13 +325,36 @@ func (c *claudeProvider) TransformRequestHeaders(ctx wrapper.HttpContext, apiNam
|
||||
util.OverwriteRequestPathHeaderByCapability(headers, string(apiName), c.config.capabilities)
|
||||
util.OverwriteRequestHostHeader(headers, claudeDomain)
|
||||
|
||||
headers.Set("x-api-key", c.config.GetApiTokenInUse(ctx))
|
||||
|
||||
if c.config.apiVersion == "" {
|
||||
c.config.apiVersion = claudeDefaultVersion
|
||||
}
|
||||
|
||||
headers.Set("anthropic-version", c.config.apiVersion)
|
||||
|
||||
// Check if Claude Code mode is enabled
|
||||
if c.config.claudeCodeMode {
|
||||
// Claude Code mode: use OAuth token with Bearer authorization
|
||||
token := c.config.GetApiTokenInUse(ctx)
|
||||
headers.Set("authorization", "Bearer "+token)
|
||||
headers.Del("x-api-key")
|
||||
|
||||
// Set Claude Code specific headers
|
||||
headers.Set("user-agent", claudeCodeUserAgent)
|
||||
headers.Set("x-app", "cli")
|
||||
headers.Set("anthropic-beta", claudeCodeBetaFeatures)
|
||||
|
||||
// Add ?beta=true query parameter to the path
|
||||
currentPath := headers.Get(":path")
|
||||
if currentPath != "" && !strings.Contains(currentPath, "beta=true") {
|
||||
if strings.Contains(currentPath, "?") {
|
||||
headers.Set(":path", currentPath+"&beta=true")
|
||||
} else {
|
||||
headers.Set(":path", currentPath+"?beta=true")
|
||||
}
|
||||
}
|
||||
} else {
|
||||
// Standard mode: use x-api-key
|
||||
headers.Set("x-api-key", c.config.GetApiTokenInUse(ctx))
|
||||
}
|
||||
}
|
||||
|
||||
func (c *claudeProvider) OnRequestBody(ctx wrapper.HttpContext, apiName ApiName, body []byte) (types.Action, error) {
|
||||
@@ -413,18 +442,158 @@ func (c *claudeProvider) buildClaudeTextGenRequest(origRequest *chatCompletionRe
        claudeRequest.MaxTokens = claudeDefaultMaxTokens
    }

    // Convert OpenAI reasoning parameters to Claude thinking configuration
    if origRequest.ReasoningEffort != "" || origRequest.ReasoningMaxTokens > 0 {
        var budgetTokens int
        if origRequest.ReasoningMaxTokens > 0 {
            budgetTokens = origRequest.ReasoningMaxTokens
        } else {
            // Convert reasoning_effort to budget_tokens
            switch origRequest.ReasoningEffort {
            case "low":
                budgetTokens = 1024 // Minimum required by Claude
            case "medium":
                budgetTokens = 8192
            case "high":
                budgetTokens = 16384
            default:
                budgetTokens = 8192 // Default to medium
            }
        }
        // Ensure minimum budget_tokens requirement
        if budgetTokens < 1024 {
            budgetTokens = 1024
        }
        claudeRequest.Thinking = &claudeThinkingConfig{
            Type:         "enabled",
            BudgetTokens: budgetTokens,
        }
    }

    // Track if system message exists in original request
    hasSystemMessage := false
    for _, message := range origRequest.Messages {
        if message.Role == roleSystem {
            hasSystemMessage = true
            // In Claude Code mode, use array format with cache_control
            if c.config.claudeCodeMode {
                claudeRequest.System = &claudeSystemPrompt{
                    ArrayValue: []claudeChatMessageContent{
                        {
                            Type: contentTypeText,
                            Text: message.StringContent(),
                            CacheControl: map[string]interface{}{
                                "type": "ephemeral",
                            },
                        },
                    },
                    IsArray: true,
                }
            } else {
                claudeRequest.System = &claudeSystemPrompt{
                    StringValue: message.StringContent(),
                    IsArray:     false,
                }
            }
            continue
        }

        // Handle OpenAI "tool" role messages - convert to Claude "user" role with tool_result content
        if message.Role == roleTool {
            toolResultContent := claudeChatMessageContent{
                Type:      "tool_result",
                ToolUseId: message.ToolCallId,
            }
            // Tool result content can be string or array
            if message.IsStringContent() {
                toolResultContent.Content = &claudeChatMessageContentWr{
                    StringValue: message.StringContent(),
                    IsString:    true,
                }
            } else {
                // For array content, extract text parts
                var textParts []string
                for _, part := range message.ParseContent() {
                    if part.Type == contentTypeText {
                        textParts = append(textParts, part.Text)
                    }
                }
                toolResultContent.Content = &claudeChatMessageContentWr{
                    StringValue: strings.Join(textParts, "\n"),
                    IsString:    true,
                }
            }

            // Check if the last message is a user message with tool_result, merge if so
            if len(claudeRequest.Messages) > 0 {
                lastMsg := &claudeRequest.Messages[len(claudeRequest.Messages)-1]
                if lastMsg.Role == roleUser && !lastMsg.Content.IsString {
                    // Check if last message contains tool_result
                    hasToolResult := false
                    for _, content := range lastMsg.Content.ArrayValue {
                        if content.Type == "tool_result" {
                            hasToolResult = true
                            break
                        }
                    }
                    if hasToolResult {
                        // Merge with existing tool_result message
                        lastMsg.Content.ArrayValue = append(lastMsg.Content.ArrayValue, toolResultContent)
                        continue
                    }
                }
            }

            // Create new user message with tool_result
            claudeMessage := claudeChatMessage{
                Role:    roleUser,
                Content: NewArrayContent([]claudeChatMessageContent{toolResultContent}),
            }
            claudeRequest.Messages = append(claudeRequest.Messages, claudeMessage)
            continue
        }

        claudeMessage := claudeChatMessage{
            Role: message.Role,
        }

        // Handle assistant messages with tool_calls - convert to Claude tool_use content blocks
        if message.Role == roleAssistant && len(message.ToolCalls) > 0 {
            chatMessageContents := make([]claudeChatMessageContent, 0)

            // Add text content if present
            if message.IsStringContent() && message.StringContent() != "" {
                chatMessageContents = append(chatMessageContents, claudeChatMessageContent{
                    Type: contentTypeText,
                    Text: message.StringContent(),
                })
            }

            // Convert tool_calls to tool_use content blocks
            for _, tc := range message.ToolCalls {
                var inputMap map[string]interface{}
                if tc.Function.Arguments != "" {
                    if err := json.Unmarshal([]byte(tc.Function.Arguments), &inputMap); err != nil {
                        log.Errorf("failed to parse tool call arguments: %v", err)
                        inputMap = make(map[string]interface{})
                    }
                } else {
                    inputMap = make(map[string]interface{})
                }

                chatMessageContents = append(chatMessageContents, claudeChatMessageContent{
                    Type:  "tool_use",
                    Id:    tc.Id,
                    Name:  tc.Function.Name,
                    Input: inputMap,
                })
            }

            claudeMessage.Content = NewArrayContent(chatMessageContents)
            claudeRequest.Messages = append(claudeRequest.Messages, claudeMessage)
            continue
        }

        if message.IsStringContent() {
            claudeMessage.Content = NewStringContent(message.StringContent())
        } else {
@@ -478,6 +647,22 @@ func (c *claudeProvider) buildClaudeTextGenRequest(origRequest *chatCompletionRe
        claudeRequest.Messages = append(claudeRequest.Messages, claudeMessage)
    }

    // In Claude Code mode, add default system prompt if not present
    if c.config.claudeCodeMode && !hasSystemMessage {
        claudeRequest.System = &claudeSystemPrompt{
            ArrayValue: []claudeChatMessageContent{
                {
                    Type: contentTypeText,
                    Text: claudeCodeSystemPrompt,
                    CacheControl: map[string]interface{}{
                        "type": "ephemeral",
                    },
                },
            },
            IsArray: true,
        }
    }

    for _, tool := range origRequest.Tools {
        claudeTool := claudeTool{
            Name: tool.Function.Name,

@@ -499,9 +684,41 @@ func (c *claudeProvider) buildClaudeTextGenRequest(origRequest *chatCompletionRe
}
func (c *claudeProvider) responseClaude2OpenAI(ctx wrapper.HttpContext, origResponse *claudeTextGenResponse) *chatCompletionResponse {
    // Extract text content, thinking content, and tool calls from Claude response
    var textContent string
    var reasoningContent string
    var toolCalls []toolCall
    for _, content := range origResponse.Content {
        switch content.Type {
        case contentTypeText:
            if content.Text != nil {
                textContent = *content.Text
            }
        case "thinking":
            if content.Thinking != nil {
                reasoningContent = *content.Thinking
            }
        case "tool_use":
            var args []byte
            if content.Input != nil {
                args, _ = json.Marshal(*content.Input)
            } else {
                args = []byte("{}")
            }
            toolCalls = append(toolCalls, toolCall{
                Id:   content.Id,
                Type: "function",
                Function: functionCall{
                    Name:      content.Name,
                    Arguments: string(args),
                },
            })
        }
    }

    choice := chatCompletionChoice{
        Index:        0,
        Message:      &chatMessage{Role: roleAssistant, Content: textContent, ReasoningContent: reasoningContent, ToolCalls: toolCalls},
        FinishReason: util.Ptr(stopReasonClaude2OpenAI(origResponse.StopReason)),
    }

@@ -537,6 +754,8 @@ func stopReasonClaude2OpenAI(reason *string) string {
        return finishReasonStop
    case "max_tokens":
        return finishReasonLength
    case "tool_use":
        return finishReasonToolCall
    default:
        return *reason
    }
@@ -563,11 +782,64 @@ func (c *claudeProvider) streamResponseClaude2OpenAI(ctx wrapper.HttpContext, or
        }
        return c.createChatCompletionResponse(ctx, origResponse, choice)

    case "content_block_start":
        // Handle tool_use content block start
        if origResponse.ContentBlock != nil && origResponse.ContentBlock.Type == "tool_use" {
            var index int
            if origResponse.Index != nil {
                index = *origResponse.Index
            }
            choice := chatCompletionChoice{
                Index: index,
                Delta: &chatMessage{
                    ToolCalls: []toolCall{
                        {
                            Index: index,
                            Id:    origResponse.ContentBlock.Id,
                            Type:  "function",
                            Function: functionCall{
                                Name:      origResponse.ContentBlock.Name,
                                Arguments: "",
                            },
                        },
                    },
                },
            }
            return c.createChatCompletionResponse(ctx, origResponse, choice)
        }
        return nil

    case "content_block_delta":
        var index int
        if origResponse.Index != nil {
            index = *origResponse.Index
        }
        // Handle tool_use input_json_delta
        if origResponse.Delta != nil && origResponse.Delta.Type == "input_json_delta" {
            choice := chatCompletionChoice{
                Index: index,
                Delta: &chatMessage{
                    ToolCalls: []toolCall{
                        {
                            Index: index,
                            Function: functionCall{
                                Arguments: origResponse.Delta.PartialJson,
                            },
                        },
                    },
                },
            }
            return c.createChatCompletionResponse(ctx, origResponse, choice)
        }
        // Handle thinking_delta
        if origResponse.Delta != nil && origResponse.Delta.Type == "thinking_delta" {
            choice := chatCompletionChoice{
                Index: index,
                Delta: &chatMessage{Reasoning: origResponse.Delta.Thinking},
            }
            return c.createChatCompletionResponse(ctx, origResponse, choice)
        }
        // Handle text_delta
        choice := chatCompletionChoice{
            Index: index,
            Delta: &chatMessage{Content: origResponse.Delta.Text},

@@ -604,7 +876,7 @@ func (c *claudeProvider) streamResponseClaude2OpenAI(ctx wrapper.HttpContext, or
            TotalTokens: c.usage.TotalTokens,
        },
    }
    case "content_block_stop", "ping":
        log.Debugf("skip processing response type: %s", origResponse.Type)
        return nil
    default:
421  plugins/wasm-go/extensions/ai-proxy/provider/claude_test.go  (Normal file)
@@ -0,0 +1,421 @@
package provider

import (
    "encoding/json"
    "strings"
    "testing"

    "github.com/stretchr/testify/assert"
    "github.com/stretchr/testify/require"
)

func TestClaudeProviderInitializer_ValidateConfig(t *testing.T) {
    initializer := &claudeProviderInitializer{}

    t.Run("valid_config_with_api_tokens", func(t *testing.T) {
        config := &ProviderConfig{
            apiTokens: []string{"test-token"},
        }
        err := initializer.ValidateConfig(config)
        assert.NoError(t, err)
    })

    t.Run("invalid_config_without_api_tokens", func(t *testing.T) {
        config := &ProviderConfig{
            apiTokens: nil,
        }
        err := initializer.ValidateConfig(config)
        assert.Error(t, err)
        assert.Contains(t, err.Error(), "no apiToken found in provider config")
    })

    t.Run("invalid_config_with_empty_api_tokens", func(t *testing.T) {
        config := &ProviderConfig{
            apiTokens: []string{},
        }
        err := initializer.ValidateConfig(config)
        assert.Error(t, err)
        assert.Contains(t, err.Error(), "no apiToken found in provider config")
    })
}

func TestClaudeProviderInitializer_DefaultCapabilities(t *testing.T) {
    initializer := &claudeProviderInitializer{}

    capabilities := initializer.DefaultCapabilities()
    expected := map[string]string{
        string(ApiNameChatCompletion):    PathAnthropicMessages,
        string(ApiNameCompletion):        PathAnthropicComplete,
        string(ApiNameAnthropicMessages): PathAnthropicMessages,
        string(ApiNameEmbeddings):        PathOpenAIEmbeddings,
        string(ApiNameModels):            PathOpenAIModels,
    }

    assert.Equal(t, expected, capabilities)
}

func TestClaudeProviderInitializer_CreateProvider(t *testing.T) {
    initializer := &claudeProviderInitializer{}

    config := ProviderConfig{
        apiTokens: []string{"test-token"},
    }

    provider, err := initializer.CreateProvider(config)
    require.NoError(t, err)
    require.NotNil(t, provider)

    assert.Equal(t, providerTypeClaude, provider.GetProviderType())

    claudeProvider, ok := provider.(*claudeProvider)
    require.True(t, ok)
    assert.NotNil(t, claudeProvider.config.apiTokens)
    assert.Equal(t, []string{"test-token"}, claudeProvider.config.apiTokens)
}

func TestClaudeProvider_GetProviderType(t *testing.T) {
    provider := &claudeProvider{
        config: ProviderConfig{
            apiTokens: []string{"test-token"},
        },
        contextCache: createContextCache(&ProviderConfig{}),
    }

    assert.Equal(t, providerTypeClaude, provider.GetProviderType())
}

// Note: TransformRequestHeaders tests are skipped because they require the WASM runtime.
// The header transformation logic is tested via integration tests instead.
// Here we test the helper functions and logic that can be unit tested.

func TestClaudeCodeMode_HeaderLogic(t *testing.T) {
    // Test the logic for adding the beta=true query parameter
    t.Run("adds_beta_query_param_to_path_without_query", func(t *testing.T) {
        currentPath := "/v1/messages"
        var newPath string
        if currentPath != "" && !strings.Contains(currentPath, "beta=true") {
            if strings.Contains(currentPath, "?") {
                newPath = currentPath + "&beta=true"
            } else {
                newPath = currentPath + "?beta=true"
            }
        } else {
            newPath = currentPath
        }
        assert.Equal(t, "/v1/messages?beta=true", newPath)
    })

    t.Run("adds_beta_query_param_to_path_with_existing_query", func(t *testing.T) {
        currentPath := "/v1/messages?foo=bar"
        var newPath string
        if currentPath != "" && !strings.Contains(currentPath, "beta=true") {
            if strings.Contains(currentPath, "?") {
                newPath = currentPath + "&beta=true"
            } else {
                newPath = currentPath + "?beta=true"
            }
        } else {
            newPath = currentPath
        }
        assert.Equal(t, "/v1/messages?foo=bar&beta=true", newPath)
    })

    t.Run("does_not_duplicate_beta_param", func(t *testing.T) {
        currentPath := "/v1/messages?beta=true"
        var newPath string
        if currentPath != "" && !strings.Contains(currentPath, "beta=true") {
            if strings.Contains(currentPath, "?") {
                newPath = currentPath + "&beta=true"
            } else {
                newPath = currentPath + "?beta=true"
            }
        } else {
            newPath = currentPath
        }
        assert.Equal(t, "/v1/messages?beta=true", newPath)
    })

    t.Run("bearer_token_format", func(t *testing.T) {
        token := "sk-ant-oat01-oauth-token"
        bearerAuth := "Bearer " + token
        assert.Equal(t, "Bearer sk-ant-oat01-oauth-token", bearerAuth)
    })
}

func TestClaudeProvider_BuildClaudeTextGenRequest_StandardMode(t *testing.T) {
    provider := &claudeProvider{
        config: ProviderConfig{
            claudeCodeMode: false,
        },
    }

    t.Run("builds_request_without_injecting_defaults", func(t *testing.T) {
        request := &chatCompletionRequest{
            Model:     "claude-sonnet-4-5-20250929",
            MaxTokens: 8192,
            Stream:    true,
            Messages: []chatMessage{
                {Role: roleUser, Content: "Hello"},
            },
        }

        claudeReq := provider.buildClaudeTextGenRequest(request)

        // Should not have system prompt injected
        assert.Nil(t, claudeReq.System)
        // Should not have tools injected
        assert.Empty(t, claudeReq.Tools)
    })

    t.Run("preserves_existing_system_message", func(t *testing.T) {
        request := &chatCompletionRequest{
            Model:     "claude-sonnet-4-5-20250929",
            MaxTokens: 8192,
            Messages: []chatMessage{
                {Role: roleSystem, Content: "You are a helpful assistant."},
                {Role: roleUser, Content: "Hello"},
            },
        }

        claudeReq := provider.buildClaudeTextGenRequest(request)

        assert.NotNil(t, claudeReq.System)
        assert.False(t, claudeReq.System.IsArray)
        assert.Equal(t, "You are a helpful assistant.", claudeReq.System.StringValue)
    })
}

func TestClaudeProvider_BuildClaudeTextGenRequest_ClaudeCodeMode(t *testing.T) {
    provider := &claudeProvider{
        config: ProviderConfig{
            claudeCodeMode: true,
        },
    }

    t.Run("injects_default_system_prompt_when_missing", func(t *testing.T) {
        request := &chatCompletionRequest{
            Model:     "claude-sonnet-4-5-20250929",
            MaxTokens: 8192,
            Stream:    true,
            Messages: []chatMessage{
                {Role: roleUser, Content: "List files"},
            },
        }

        claudeReq := provider.buildClaudeTextGenRequest(request)

        // Should have default Claude Code system prompt
        require.NotNil(t, claudeReq.System)
        assert.True(t, claudeReq.System.IsArray)
        require.Len(t, claudeReq.System.ArrayValue, 1)
        assert.Equal(t, claudeCodeSystemPrompt, claudeReq.System.ArrayValue[0].Text)
        assert.Equal(t, contentTypeText, claudeReq.System.ArrayValue[0].Type)
        // Should have cache_control
        assert.NotNil(t, claudeReq.System.ArrayValue[0].CacheControl)
        assert.Equal(t, "ephemeral", claudeReq.System.ArrayValue[0].CacheControl["type"])
    })

    t.Run("preserves_existing_system_message_with_cache_control", func(t *testing.T) {
        request := &chatCompletionRequest{
            Model:     "claude-sonnet-4-5-20250929",
            MaxTokens: 8192,
            Messages: []chatMessage{
                {Role: roleSystem, Content: "Custom system prompt"},
                {Role: roleUser, Content: "Hello"},
            },
        }

        claudeReq := provider.buildClaudeTextGenRequest(request)

        // Should preserve custom system prompt but with array format and cache_control
        require.NotNil(t, claudeReq.System)
        assert.True(t, claudeReq.System.IsArray)
        require.Len(t, claudeReq.System.ArrayValue, 1)
        assert.Equal(t, "Custom system prompt", claudeReq.System.ArrayValue[0].Text)
        // Should have cache_control
        assert.NotNil(t, claudeReq.System.ArrayValue[0].CacheControl)
        assert.Equal(t, "ephemeral", claudeReq.System.ArrayValue[0].CacheControl["type"])
    })

    t.Run("full_request_transformation", func(t *testing.T) {
        request := &chatCompletionRequest{
            Model:       "claude-sonnet-4-5-20250929",
            MaxTokens:   8192,
            Stream:      true,
            Temperature: 1.0,
            Messages: []chatMessage{
                {Role: roleUser, Content: "List files in current directory"},
            },
        }

        claudeReq := provider.buildClaudeTextGenRequest(request)

        // Verify complete request structure
        assert.Equal(t, "claude-sonnet-4-5-20250929", claudeReq.Model)
        assert.Equal(t, 8192, claudeReq.MaxTokens)
        assert.True(t, claudeReq.Stream)
        assert.Equal(t, 1.0, claudeReq.Temperature)

        // Verify system prompt
        require.NotNil(t, claudeReq.System)
        assert.True(t, claudeReq.System.IsArray)
        assert.Equal(t, claudeCodeSystemPrompt, claudeReq.System.ArrayValue[0].Text)

        // Verify messages
        require.Len(t, claudeReq.Messages, 1)
        assert.Equal(t, roleUser, claudeReq.Messages[0].Role)

        // Verify no tools are injected by default
        assert.Empty(t, claudeReq.Tools)

        // Verify the request can be serialized to JSON
        jsonBytes, err := json.Marshal(claudeReq)
        require.NoError(t, err)
        assert.NotEmpty(t, jsonBytes)
    })
}

// Note: TransformRequestBody tests are skipped because they require the WASM runtime.
// The request body transformation is tested indirectly through buildClaudeTextGenRequest tests.

// Test constants
func TestClaudeConstants(t *testing.T) {
    assert.Equal(t, "api.anthropic.com", claudeDomain)
    assert.Equal(t, "2023-06-01", claudeDefaultVersion)
    assert.Equal(t, 4096, claudeDefaultMaxTokens)
    assert.Equal(t, "claude", providerTypeClaude)

    // Claude Code mode constants
    assert.Equal(t, "claude-cli/2.1.2 (external, cli)", claudeCodeUserAgent)
    assert.Equal(t, "oauth-2025-04-20,interleaved-thinking-2025-05-14,claude-code-20250219", claudeCodeBetaFeatures)
    assert.Equal(t, "You are Claude Code, Anthropic's official CLI for Claude.", claudeCodeSystemPrompt)
}

func TestClaudeProvider_GetApiName(t *testing.T) {
    provider := &claudeProvider{}

    t.Run("messages_path", func(t *testing.T) {
        assert.Equal(t, ApiNameChatCompletion, provider.GetApiName("/v1/messages"))
        assert.Equal(t, ApiNameChatCompletion, provider.GetApiName("/api/v1/messages"))
    })

    t.Run("complete_path", func(t *testing.T) {
        assert.Equal(t, ApiNameCompletion, provider.GetApiName("/v1/complete"))
    })

    t.Run("models_path", func(t *testing.T) {
        assert.Equal(t, ApiNameModels, provider.GetApiName("/v1/models"))
    })

    t.Run("embeddings_path", func(t *testing.T) {
        assert.Equal(t, ApiNameEmbeddings, provider.GetApiName("/v1/embeddings"))
    })

    t.Run("unknown_path", func(t *testing.T) {
        assert.Equal(t, ApiName(""), provider.GetApiName("/unknown"))
    })
}

func TestClaudeProvider_BuildClaudeTextGenRequest_ToolRoleConversion(t *testing.T) {
    provider := &claudeProvider{
        config: ProviderConfig{
            claudeCodeMode: false,
        },
    }

    t.Run("converts_single_tool_role_to_user_with_tool_result", func(t *testing.T) {
        request := &chatCompletionRequest{
            Model:     "claude-sonnet-4-5-20250929",
            MaxTokens: 1024,
            Messages: []chatMessage{
                {Role: roleUser, Content: "What's the weather?"},
                {Role: roleAssistant, Content: nil, ToolCalls: []toolCall{
                    {Id: "call_123", Type: "function", Function: functionCall{Name: "get_weather", Arguments: `{"city": "Beijing"}`}},
                }},
                {Role: roleTool, ToolCallId: "call_123", Content: "Sunny, 25°C"},
            },
        }

        claudeReq := provider.buildClaudeTextGenRequest(request)

        // Should have 3 messages: user, assistant with tool_use, user with tool_result
        require.Len(t, claudeReq.Messages, 3)

        // First message should be user
        assert.Equal(t, roleUser, claudeReq.Messages[0].Role)

        // Second message should be assistant with tool_use
        assert.Equal(t, roleAssistant, claudeReq.Messages[1].Role)
        require.False(t, claudeReq.Messages[1].Content.IsString)
        require.Len(t, claudeReq.Messages[1].Content.ArrayValue, 1)
        assert.Equal(t, "tool_use", claudeReq.Messages[1].Content.ArrayValue[0].Type)
        assert.Equal(t, "call_123", claudeReq.Messages[1].Content.ArrayValue[0].Id)
        assert.Equal(t, "get_weather", claudeReq.Messages[1].Content.ArrayValue[0].Name)

        // Third message should be user with tool_result
        assert.Equal(t, roleUser, claudeReq.Messages[2].Role)
        require.False(t, claudeReq.Messages[2].Content.IsString)
        require.Len(t, claudeReq.Messages[2].Content.ArrayValue, 1)
        assert.Equal(t, "tool_result", claudeReq.Messages[2].Content.ArrayValue[0].Type)
        assert.Equal(t, "call_123", claudeReq.Messages[2].Content.ArrayValue[0].ToolUseId)
    })

    t.Run("merges_multiple_tool_results_into_single_user_message", func(t *testing.T) {
        request := &chatCompletionRequest{
            Model:     "claude-sonnet-4-5-20250929",
            MaxTokens: 1024,
            Messages: []chatMessage{
                {Role: roleUser, Content: "What's the weather and time?"},
                {Role: roleAssistant, Content: nil, ToolCalls: []toolCall{
                    {Id: "call_1", Type: "function", Function: functionCall{Name: "get_weather", Arguments: `{"city": "Beijing"}`}},
                    {Id: "call_2", Type: "function", Function: functionCall{Name: "get_time", Arguments: `{"timezone": "Asia/Shanghai"}`}},
                }},
                {Role: roleTool, ToolCallId: "call_1", Content: "Sunny, 25°C"},
                {Role: roleTool, ToolCallId: "call_2", Content: "3:00 PM"},
            },
        }

        claudeReq := provider.buildClaudeTextGenRequest(request)

        // Should have 3 messages: user, assistant with 2 tool_use, user with 2 tool_results
        require.Len(t, claudeReq.Messages, 3)

        // Assistant message should have 2 tool_use blocks
        require.Len(t, claudeReq.Messages[1].Content.ArrayValue, 2)
        assert.Equal(t, "tool_use", claudeReq.Messages[1].Content.ArrayValue[0].Type)
        assert.Equal(t, "tool_use", claudeReq.Messages[1].Content.ArrayValue[1].Type)

        // User message should have 2 tool_result blocks merged
        assert.Equal(t, roleUser, claudeReq.Messages[2].Role)
        require.Len(t, claudeReq.Messages[2].Content.ArrayValue, 2)
        assert.Equal(t, "tool_result", claudeReq.Messages[2].Content.ArrayValue[0].Type)
        assert.Equal(t, "call_1", claudeReq.Messages[2].Content.ArrayValue[0].ToolUseId)
        assert.Equal(t, "tool_result", claudeReq.Messages[2].Content.ArrayValue[1].Type)
        assert.Equal(t, "call_2", claudeReq.Messages[2].Content.ArrayValue[1].ToolUseId)
    })

    t.Run("handles_assistant_tool_calls_with_text_content", func(t *testing.T) {
        request := &chatCompletionRequest{
            Model:     "claude-sonnet-4-5-20250929",
            MaxTokens: 1024,
            Messages: []chatMessage{
                {Role: roleUser, Content: "What's the weather?"},
                {Role: roleAssistant, Content: "Let me check the weather for you.", ToolCalls: []toolCall{
                    {Id: "call_123", Type: "function", Function: functionCall{Name: "get_weather", Arguments: `{"city": "Beijing"}`}},
                }},
            },
        }

        claudeReq := provider.buildClaudeTextGenRequest(request)

        require.Len(t, claudeReq.Messages, 2)

        // Assistant message should have both text and tool_use
        assert.Equal(t, roleAssistant, claudeReq.Messages[1].Role)
        require.False(t, claudeReq.Messages[1].Content.IsString)
        require.Len(t, claudeReq.Messages[1].Content.ArrayValue, 2)
        assert.Equal(t, contentTypeText, claudeReq.Messages[1].Content.ArrayValue[0].Type)
        assert.Equal(t, "Let me check the weather for you.", claudeReq.Messages[1].Content.ArrayValue[0].Text)
        assert.Equal(t, "tool_use", claudeReq.Messages[1].Content.ArrayValue[1].Type)
    })
}
@@ -119,6 +119,15 @@ func (c *ClaudeToOpenAIConverter) ConvertClaudeRequestToOpenAI(body []byte) ([]b
            }
            openaiRequest.Messages = append(openaiRequest.Messages, toolMsg)
        }
        // Also add text content if present alongside tool results
        // This handles cases like: [tool_result, tool_result, text]
        if len(conversionResult.textParts) > 0 {
            textMsg := chatMessage{
                Role:    claudeMsg.Role,
                Content: strings.Join(conversionResult.textParts, "\n\n"),
            }
            openaiRequest.Messages = append(openaiRequest.Messages, textMsg)
        }
    }

    // Handle regular content if no tool calls or tool results

@@ -136,7 +145,8 @@ func (c *ClaudeToOpenAIConverter) ConvertClaudeRequestToOpenAI(body []byte) ([]b
    if claudeRequest.System != nil {
        systemMsg := chatMessage{Role: roleSystem}
        if !claudeRequest.System.IsArray {
            // Strip dynamic cch field from billing header to enable caching
            systemMsg.Content = stripCchFromBillingHeader(claudeRequest.System.StringValue)
        } else {
            conversionResult := c.convertContentArray(claudeRequest.System.ArrayValue)
            systemMsg.Content = conversionResult.openaiContents

@@ -183,6 +193,7 @@ func (c *ClaudeToOpenAIConverter) ConvertClaudeRequestToOpenAI(body []byte) ([]b

    if claudeRequest.Thinking.Type == "enabled" {
        openaiRequest.ReasoningMaxTokens = claudeRequest.Thinking.BudgetTokens
        openaiRequest.Thinking = &thinkingParam{Type: "enabled", BudgetToken: claudeRequest.Thinking.BudgetTokens}

        // Set ReasoningEffort based on budget_tokens
        // low: <4096, medium: >=4096 and <16384, high: >=16384

@@ -198,7 +209,10 @@ func (c *ClaudeToOpenAIConverter) ConvertClaudeRequestToOpenAI(body []byte) ([]b
            claudeRequest.Thinking.BudgetTokens, openaiRequest.ReasoningEffort, openaiRequest.ReasoningMaxTokens)
        }
    } else {
        // Explicitly disable thinking when not configured in Claude request
        // This prevents providers like ZhipuAI from enabling thinking by default
        openaiRequest.Thinking = &thinkingParam{Type: "disabled"}
        log.Debugf("[Claude->OpenAI] No thinking config found, explicitly disabled")
    }

    result, err := json.Marshal(openaiRequest)

@@ -253,19 +267,21 @@ func (c *ClaudeToOpenAIConverter) ConvertOpenAIResponseToClaude(ctx wrapper.Http
    }

    if reasoningText != "" {
        emptySignature := ""
        contents = append(contents, claudeTextGenContent{
            Type:      "thinking",
            Signature: &emptySignature, // OpenAI doesn't provide a signature; use a pointer to an empty string
            Thinking:  &reasoningText,
        })
        log.Debugf("[OpenAI->Claude] Added thinking content: %s", reasoningText)
    }

    // Add text content if present
    if choice.Message.StringContent() != "" {
        textContent := choice.Message.StringContent()
        contents = append(contents, claudeTextGenContent{
            Type: "text",
            Text: &textContent,
        })
    }

@@ -288,7 +304,7 @@ func (c *ClaudeToOpenAIConverter) ConvertOpenAIResponseToClaude(ctx wrapper.Http
                Type:  "tool_use",
                Id:    toolCall.Id,
                Name:  toolCall.Function.Name,
                Input: &input,
            })
        }
    }

@@ -338,7 +354,7 @@ func (c *ClaudeToOpenAIConverter) ConvertOpenAIStreamResponseToClaude(ctx wrappe
            Index: &c.thinkingBlockIndex,
        }
        stopData, _ := json.Marshal(stopEvent)
        result.WriteString(fmt.Sprintf("event: %s\ndata: %s\n\n", stopEvent.Type, stopData))
    }
    if c.textBlockStarted && !c.textBlockStopped {
        c.textBlockStopped = true

@@ -348,7 +364,7 @@ func (c *ClaudeToOpenAIConverter) ConvertOpenAIStreamResponseToClaude(ctx wrappe
            Index: &c.textBlockIndex,
        }
        stopData, _ := json.Marshal(stopEvent)
        result.WriteString(fmt.Sprintf("event: %s\ndata: %s\n\n", stopEvent.Type, stopData))
    }
    // Send final content_block_stop events for any remaining unclosed tool calls
    for index, toolCall := range c.toolCallStates {

@@ -360,7 +376,7 @@ func (c *ClaudeToOpenAIConverter) ConvertOpenAIStreamResponseToClaude(ctx wrappe
            Index: &toolCall.claudeContentIndex,
        }
        stopData, _ := json.Marshal(stopEvent)
        result.WriteString(fmt.Sprintf("event: %s\ndata: %s\n\n", stopEvent.Type, stopData))
    }
}
@@ -370,12 +386,12 @@ func (c *ClaudeToOpenAIConverter) ConvertOpenAIStreamResponseToClaude(ctx wrappe
|
||||
messageDelta := &claudeTextGenStreamResponse{
|
||||
Type: "message_delta",
|
||||
Delta: &claudeTextGenDelta{
|
||||
Type: "message_delta",
|
||||
StopReason: c.pendingStopReason,
|
||||
StopReason: c.pendingStopReason,
|
||||
StopSequence: json.RawMessage("null"),
|
||||
},
|
||||
}
|
||||
stopData, _ := json.Marshal(messageDelta)
|
||||
result.WriteString(fmt.Sprintf("data: %s\n\n", stopData))
|
||||
result.WriteString(fmt.Sprintf("event: %s\ndata: %s\n\n", messageDelta.Type, stopData))
|
||||
c.pendingStopReason = nil
|
||||
}
|
||||
|
||||
@@ -386,7 +402,7 @@ func (c *ClaudeToOpenAIConverter) ConvertOpenAIStreamResponseToClaude(ctx wrappe
|
||||
Type: "message_stop",
|
||||
}
|
||||
stopData, _ := json.Marshal(messageStopEvent)
|
||||
result.WriteString(fmt.Sprintf("data: %s\n\n", stopData))
|
||||
result.WriteString(fmt.Sprintf("event: %s\ndata: %s\n\n", messageStopEvent.Type, stopData))
|
||||
}
|
||||
|
||||
// Reset all state for next request
|
||||
@@ -515,13 +531,14 @@ func (c *ClaudeToOpenAIConverter) buildClaudeStreamResponse(ctx wrapper.HttpCont
|
||||
c.nextContentIndex++
|
||||
c.thinkingBlockStarted = true
|
||||
log.Debugf("[OpenAI->Claude] Generated content_block_start event for thinking at index %d", c.thinkingBlockIndex)
|
||||
emptyStr := ""
|
||||
responses = append(responses, &claudeTextGenStreamResponse{
|
||||
Type: "content_block_start",
|
||||
Index: &c.thinkingBlockIndex,
|
||||
ContentBlock: &claudeTextGenContent{
|
||||
Type: "thinking",
|
||||
Signature: "", // OpenAI doesn't provide signature
|
||||
Thinking: "",
|
||||
Signature: &emptyStr, // Use pointer for empty string output
|
||||
Thinking: &emptyStr, // Use pointer for empty string output
|
||||
},
|
||||
})
|
||||
}
|
||||
@@ -532,8 +549,8 @@ func (c *ClaudeToOpenAIConverter) buildClaudeStreamResponse(ctx wrapper.HttpCont
|
||||
Type: "content_block_delta",
|
||||
Index: &c.thinkingBlockIndex,
|
||||
Delta: &claudeTextGenDelta{
|
||||
Type: "thinking_delta", // Use thinking_delta for reasoning content
|
||||
Text: reasoningText,
|
||||
Type: "thinking_delta",
|
||||
Thinking: reasoningText, // Use Thinking field, not Text
|
||||
},
|
||||
})
|
||||
}
|
||||
@@ -564,12 +581,13 @@ func (c *ClaudeToOpenAIConverter) buildClaudeStreamResponse(ctx wrapper.HttpCont
|
||||
c.nextContentIndex++
|
||||
c.textBlockStarted = true
|
||||
log.Debugf("[OpenAI->Claude] Generated content_block_start event for text at index %d", c.textBlockIndex)
|
||||
emptyText := ""
|
||||
responses = append(responses, &claudeTextGenStreamResponse{
|
||||
Type: "content_block_start",
|
||||
Index: &c.textBlockIndex,
|
||||
ContentBlock: &claudeTextGenContent{
|
||||
Type: "text",
|
||||
Text: "",
|
||||
Text: &emptyText,
|
||||
},
|
||||
})
|
||||
}
|
||||
@@ -588,6 +606,30 @@ func (c *ClaudeToOpenAIConverter) buildClaudeStreamResponse(ctx wrapper.HttpCont
|
||||
|
||||
// Handle tool calls in streaming response
|
||||
if choice.Delta != nil && len(choice.Delta.ToolCalls) > 0 {
|
||||
// Ensure message_start is sent before any content blocks
|
||||
if !c.messageStartSent {
|
||||
c.messageId = openaiResponse.Id
|
||||
c.messageStartSent = true
|
||||
message := &claudeTextGenResponse{
|
||||
Id: openaiResponse.Id,
|
||||
Type: "message",
|
||||
Role: "assistant",
|
||||
Model: openaiResponse.Model,
|
||||
Content: []claudeTextGenContent{},
|
||||
}
|
||||
if openaiResponse.Usage != nil {
|
||||
message.Usage = claudeTextGenUsage{
|
||||
InputTokens: openaiResponse.Usage.PromptTokens,
|
||||
OutputTokens: 0,
|
||||
}
|
||||
}
|
||||
responses = append(responses, &claudeTextGenStreamResponse{
|
||||
Type: "message_start",
|
||||
Message: message,
|
||||
})
|
||||
log.Debugf("[OpenAI->Claude] Generated message_start event before tool calls for id: %s", openaiResponse.Id)
|
||||
}
|
||||
|
||||
// Initialize toolCallStates if needed
|
||||
if c.toolCallStates == nil {
|
||||
c.toolCallStates = make(map[int]*toolCallInfo)
|
||||
@@ -722,7 +764,9 @@ func (c *ClaudeToOpenAIConverter) buildClaudeStreamResponse(ctx wrapper.HttpCont
|
||||
}
|
||||
|
||||
// Handle usage information
|
||||
if openaiResponse.Usage != nil && choice.FinishReason == nil {
|
||||
// Note: Some providers may send usage in the same chunk as finish_reason,
|
||||
// so we check for usage regardless of whether finish_reason is present
|
||||
if openaiResponse.Usage != nil {
|
||||
log.Debugf("[OpenAI->Claude] Processing usage info - input: %d, output: %d",
|
||||
openaiResponse.Usage.PromptTokens, openaiResponse.Usage.CompletionTokens)
|
||||
|
||||
@@ -730,7 +774,7 @@ func (c *ClaudeToOpenAIConverter) buildClaudeStreamResponse(ctx wrapper.HttpCont
|
||||
messageDelta := &claudeTextGenStreamResponse{
|
||||
Type: "message_delta",
|
||||
Delta: &claudeTextGenDelta{
|
||||
Type: "message_delta",
|
||||
StopSequence: json.RawMessage("null"), // Explicit null per Claude spec
|
||||
},
|
||||
Usage: &claudeTextGenUsage{
|
||||
InputTokens: openaiResponse.Usage.PromptTokens,
|
||||
@@ -789,10 +833,12 @@ func (c *ClaudeToOpenAIConverter) convertContentArray(claudeContents []claudeCha
|
||||
switch claudeContent.Type {
|
||||
case "text":
|
||||
if claudeContent.Text != "" {
|
||||
result.textParts = append(result.textParts, claudeContent.Text)
|
||||
// Strip dynamic cch field from billing header to enable caching
|
||||
processedText := stripCchFromBillingHeader(claudeContent.Text)
|
||||
result.textParts = append(result.textParts, processedText)
|
||||
result.openaiContents = append(result.openaiContents, chatMessageContent{
|
||||
Type: contentTypeText,
|
||||
Text: claudeContent.Text,
|
||||
Text: processedText,
|
||||
CacheControl: claudeContent.CacheControl,
|
||||
})
|
||||
}
|
||||
@@ -884,6 +930,7 @@ func (c *ClaudeToOpenAIConverter) startToolCall(toolState *toolCallInfo) []*clau
|
||||
toolState.claudeContentIndex, toolState.id, toolState.name)
|
||||
|
||||
// Send content_block_start
|
||||
emptyInput := map[string]interface{}{}
|
||||
responses = append(responses, &claudeTextGenStreamResponse{
|
||||
Type: "content_block_start",
|
||||
Index: &toolState.claudeContentIndex,
|
||||
@@ -891,7 +938,7 @@ func (c *ClaudeToOpenAIConverter) startToolCall(toolState *toolCallInfo) []*clau
|
||||
Type: "tool_use",
|
||||
Id: toolState.id,
|
||||
Name: toolState.name,
|
||||
Input: map[string]interface{}{}, // Empty input as per Claude spec
|
||||
Input: &emptyInput, // Empty input as per Claude spec
|
||||
},
|
||||
})
|
||||
|
||||
@@ -910,3 +957,42 @@ func (c *ClaudeToOpenAIConverter) startToolCall(toolState *toolCallInfo) []*clau
|
||||
|
||||
return responses
|
||||
}
|
||||
|
||||
// stripCchFromBillingHeader removes the dynamic cch field from x-anthropic-billing-header text
|
||||
// to enable caching. The cch value changes on every request, which would break prompt caching.
|
||||
// Example input: "x-anthropic-billing-header: cc_version=2.1.37.3a3; cc_entrypoint=claude-vscode; cch=abc123;"
|
||||
// Example output: "x-anthropic-billing-header: cc_version=2.1.37.3a3; cc_entrypoint=claude-vscode;"
|
||||
func stripCchFromBillingHeader(text string) string {
|
||||
const billingHeaderPrefix = "x-anthropic-billing-header:"
|
||||
|
||||
// Check if this is a billing header
|
||||
if !strings.HasPrefix(text, billingHeaderPrefix) {
|
||||
return text
|
||||
}
|
||||
|
||||
// Remove cch=xxx pattern (may appear with or without trailing semicolon)
|
||||
// Pattern: ; cch=<any-non-semicolon-chars> followed by ; or end of string
|
||||
result := text
|
||||
|
||||
// Try to find and remove ; cch=... pattern
|
||||
// We need to handle both "; cch=xxx;" and "; cch=xxx" (at end)
|
||||
for {
|
||||
cchIdx := strings.Index(result, "; cch=")
|
||||
if cchIdx == -1 {
|
||||
break
|
||||
}
|
||||
|
||||
// Find the end of cch value (next semicolon or end of string)
|
||||
start := cchIdx + 2 // skip "; "
|
||||
end := strings.Index(result[start:], ";")
|
||||
if end == -1 {
|
||||
// cch is at the end, remove from "; cch=" to end
|
||||
result = result[:cchIdx]
|
||||
} else {
|
||||
// cch is followed by more content, remove "; cch=xxx" part
|
||||
result = result[:cchIdx] + result[start+end:]
|
||||
}
|
||||
}
|
||||
|
||||
return result
|
||||
}
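The stripping loop above can be exercised in isolation. Below is a minimal standalone sketch of the same algorithm (the helper name `stripCch` is illustrative, reproduced here so it can be run outside the plugin):

```go
package main

import (
	"fmt"
	"strings"
)

// stripCch mirrors stripCchFromBillingHeader: it removes every
// "; cch=<value>" segment from a billing-header string, leaving
// the other key=value pairs intact.
func stripCch(text string) string {
	const prefix = "x-anthropic-billing-header:"
	if !strings.HasPrefix(text, prefix) {
		return text
	}
	result := text
	for {
		idx := strings.Index(result, "; cch=")
		if idx == -1 {
			break
		}
		start := idx + 2 // skip "; "
		if end := strings.Index(result[start:], ";"); end == -1 {
			result = result[:idx] // cch was the last segment
		} else {
			result = result[:idx] + result[start+end:]
		}
	}
	return result
}

func main() {
	in := "x-anthropic-billing-header: cc_version=2.1.37.3a3; cc_entrypoint=claude-vscode; cch=abc123;"
	fmt.Println(stripCch(in))
	// → x-anthropic-billing-header: cc_version=2.1.37.3a3; cc_entrypoint=claude-vscode;
}
```

Because the loop keeps searching until no `; cch=` remains, repeated cch fields are all removed in one pass over the header.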

@@ -388,6 +388,7 @@ func TestClaudeToOpenAIConverter_ConvertClaudeRequestToOpenAI(t *testing.T) {

	t.Run("convert_tool_result_with_actual_error_data", func(t *testing.T) {
		// Test using the actual JSON data from the error log to ensure our fix works
		// This tests the fix for issue #3344 - text content alongside tool_result should be preserved
		claudeRequest := `{
			"model": "anthropic/claude-sonnet-4",
			"messages": [{
@@ -415,14 +416,20 @@ func TestClaudeToOpenAIConverter_ConvertClaudeRequestToOpenAI(t *testing.T) {
		err = json.Unmarshal(result, &openaiRequest)
		require.NoError(t, err)

		// Should have one tool message (the text content is included in the same message array)
		require.Len(t, openaiRequest.Messages, 1)
		// Should have two messages: tool message + user message with text content
		// This is the fix for issue #3344 - text content alongside tool_result is preserved
		require.Len(t, openaiRequest.Messages, 2)

		// Should be tool message
		// First should be tool message
		toolMsg := openaiRequest.Messages[0]
		assert.Equal(t, "tool", toolMsg.Role)
		assert.Contains(t, toolMsg.Content, "three.js")
		assert.Equal(t, "toolu_vrtx_01UbCfwoTgoDBqbYEwkVaxd5", toolMsg.ToolCallId)

		// Second should be user message with text content
		userMsg := openaiRequest.Messages[1]
		assert.Equal(t, "user", userMsg.Role)
		assert.Equal(t, "继续", userMsg.Content)
	})

	t.Run("convert_multiple_tool_calls", func(t *testing.T) {
@@ -617,7 +624,7 @@ func TestClaudeToOpenAIConverter_ConvertOpenAIResponseToClaude(t *testing.T) {
		// First content should be text
		textContent := claudeResponse.Content[0]
		assert.Equal(t, "text", textContent.Type)
		assert.Equal(t, "I'll analyze the README file to understand this project's purpose.", textContent.Text)
		assert.Equal(t, "I'll analyze the README file to understand this project's purpose.", *textContent.Text)

		// Second content should be tool_use
		toolContent := claudeResponse.Content[1]
@@ -627,7 +634,7 @@ func TestClaudeToOpenAIConverter_ConvertOpenAIResponseToClaude(t *testing.T) {

		// Verify tool arguments
		require.NotNil(t, toolContent.Input)
		assert.Equal(t, "/Users/zhangty/git/higress/README.md", toolContent.Input["file_path"])
		assert.Equal(t, "/Users/zhangty/git/higress/README.md", (*toolContent.Input)["file_path"])
	})
}

@@ -830,21 +837,147 @@ func TestClaudeToOpenAIConverter_ConvertReasoningResponseToClaude(t *testing.T)
				// First should be thinking
				thinkingContent := claudeResponse.Content[0]
				assert.Equal(t, "thinking", thinkingContent.Type)
				assert.Equal(t, "", thinkingContent.Signature) // OpenAI doesn't provide signature
				assert.Contains(t, thinkingContent.Thinking, "Let me think about this step by step")
				require.NotNil(t, thinkingContent.Signature)
				assert.Equal(t, "", *thinkingContent.Signature) // OpenAI doesn't provide signature
				require.NotNil(t, thinkingContent.Thinking)
				assert.Contains(t, *thinkingContent.Thinking, "Let me think about this step by step")

				// Second should be text
				textContent := claudeResponse.Content[1]
				assert.Equal(t, "text", textContent.Type)
				assert.Equal(t, tt.expectedText, textContent.Text)
				require.NotNil(t, textContent.Text)
				assert.Equal(t, tt.expectedText, *textContent.Text)
			} else {
				// Should only have text content
				assert.Len(t, claudeResponse.Content, 1)

				textContent := claudeResponse.Content[0]
				assert.Equal(t, "text", textContent.Type)
				assert.Equal(t, tt.expectedText, textContent.Text)
				require.NotNil(t, textContent.Text)
				assert.Equal(t, tt.expectedText, *textContent.Text)
			}
		})
	}
}

func TestClaudeToOpenAIConverter_StripCchFromSystemMessage(t *testing.T) {
	converter := &ClaudeToOpenAIConverter{}

	t.Run("string_system_with_billing_header", func(t *testing.T) {
		// Test that cch field is stripped from string format system message
		claudeRequest := `{
			"model": "claude-sonnet-4",
			"max_tokens": 1024,
			"system": [
				{
					"type": "text",
					"text": "x-anthropic-billing-header: cc_version=2.1.37.3a3; cc_entrypoint=claude-vscode; cch=abc123;"
				}
			],
			"messages": [{
				"role": "user",
				"content": "Hello"
			}]
		}`

		result, err := converter.ConvertClaudeRequestToOpenAI([]byte(claudeRequest))
		require.NoError(t, err)

		var openaiRequest chatCompletionRequest
		err = json.Unmarshal(result, &openaiRequest)
		require.NoError(t, err)

		require.Len(t, openaiRequest.Messages, 2)

		// First message should be system with cch stripped
		systemMsg := openaiRequest.Messages[0]
		assert.Equal(t, "system", systemMsg.Role)

		// The system content should have cch removed
		contentArray, ok := systemMsg.Content.([]interface{})
		require.True(t, ok, "System content should be an array")
		require.Len(t, contentArray, 1)

		contentMap, ok := contentArray[0].(map[string]interface{})
		require.True(t, ok)
		assert.Equal(t, "text", contentMap["type"])
		assert.Equal(t, "x-anthropic-billing-header: cc_version=2.1.37.3a3; cc_entrypoint=claude-vscode;", contentMap["text"])
		assert.NotContains(t, contentMap["text"], "cch=")
	})

	t.Run("plain_string_system_unchanged", func(t *testing.T) {
		// Test that normal system messages are not modified
		claudeRequest := `{
			"model": "claude-sonnet-4",
			"max_tokens": 1024,
			"system": "You are a helpful assistant.",
			"messages": [{
				"role": "user",
				"content": "Hello"
			}]
		}`

		result, err := converter.ConvertClaudeRequestToOpenAI([]byte(claudeRequest))
		require.NoError(t, err)

		var openaiRequest chatCompletionRequest
		err = json.Unmarshal(result, &openaiRequest)
		require.NoError(t, err)

		// First message should be system with original content
		systemMsg := openaiRequest.Messages[0]
		assert.Equal(t, "system", systemMsg.Role)
		assert.Equal(t, "You are a helpful assistant.", systemMsg.Content)
	})
}

func TestStripCchFromBillingHeader(t *testing.T) {
	tests := []struct {
		name string
		input string
		expected string
	}{
		{
			name: "billing header with cch at end",
			input: "x-anthropic-billing-header: cc_version=2.1.37.3a3; cc_entrypoint=claude-vscode; cch=abc123;",
			expected: "x-anthropic-billing-header: cc_version=2.1.37.3a3; cc_entrypoint=claude-vscode;",
		},
		{
			name: "billing header with cch at end without trailing semicolon",
			input: "x-anthropic-billing-header: cc_version=2.1.37.3a3; cc_entrypoint=claude-vscode; cch=abc123",
			expected: "x-anthropic-billing-header: cc_version=2.1.37.3a3; cc_entrypoint=claude-vscode",
		},
		{
			name: "billing header with cch in middle",
			input: "x-anthropic-billing-header: cc_version=2.1.37.3a3; cch=abc123; cc_entrypoint=claude-vscode;",
			expected: "x-anthropic-billing-header: cc_version=2.1.37.3a3; cc_entrypoint=claude-vscode;",
		},
		{
			name: "billing header without cch",
			input: "x-anthropic-billing-header: cc_version=2.1.37.3a3; cc_entrypoint=claude-vscode;",
			expected: "x-anthropic-billing-header: cc_version=2.1.37.3a3; cc_entrypoint=claude-vscode;",
		},
		{
			name: "non-billing header text unchanged",
			input: "This is a normal system prompt",
			expected: "This is a normal system prompt",
		},
		{
			name: "empty string unchanged",
			input: "",
			expected: "",
		},
		{
			name: "billing header with multiple cch fields",
			input: "x-anthropic-billing-header: cc_version=2.1.37.3a3; cch=first; cc_entrypoint=claude-vscode; cch=second;",
			expected: "x-anthropic-billing-header: cc_version=2.1.37.3a3; cc_entrypoint=claude-vscode;",
		},
	}

	for _, tt := range tests {
		t.Run(tt.name, func(t *testing.T) {
			result := stripCchFromBillingHeader(tt.input)
			assert.Equal(t, tt.expected, result)
		})
	}
}

@@ -605,7 +605,7 @@ func (c *ProviderConfig) SetApiTokenInUse(ctx wrapper.HttpContext) {
	if c.isFailoverEnabled() {
		apiToken = c.GetGlobalRandomToken()
	} else {
		apiToken = c.GetRandomToken()
		apiToken = c.GetOrSetTokenWithContext(ctx)
	}
	log.Debugf("Use apiToken %s to send request", apiToken)
	ctx.SetContext(c.failover.ctxApiTokenInUse, apiToken)

@@ -30,7 +13,7 @@ (
)

type NonOpenAIStyleOptions struct {
	ReasoningMaxTokens int `json:"reasoning_max_tokens,omitempty"`
	ReasoningMaxTokens int `json:"reasoning_max_tokens,omitempty"`
	Thinking *thinkingParam `json:"thinking,omitempty"`
}

type thinkingParam struct {
	Type string `json:"type,omitempty"`
	BudgetToken int `json:"budget_token,omitempty"`
}

type chatCompletionRequest struct {

@@ -7,6 +7,7 @@ import (
	"strings"

	"github.com/alibaba/higress/plugins/wasm-go/extensions/ai-proxy/util"
	"github.com/higress-group/proxy-wasm-go-sdk/proxywasm"
	"github.com/higress-group/proxy-wasm-go-sdk/proxywasm/types"
	"github.com/higress-group/wasm-go/pkg/log"
	"github.com/higress-group/wasm-go/pkg/wrapper"
@@ -134,8 +135,63 @@ func (m *openaiProvider) TransformRequestHeaders(ctx wrapper.HttpContext, apiNam
	} else {
		util.OverwriteRequestHostHeader(headers, defaultOpenaiDomain)
	}

	var token string

	// 1. If apiTokens is configured, use it first
	if len(m.config.apiTokens) > 0 {
		util.OverwriteRequestAuthorizationHeader(headers, "Bearer "+m.config.GetApiTokenInUse(ctx))
		token = m.config.GetApiTokenInUse(ctx)
		if token == "" {
			log.Warnf("[openaiProvider.TransformRequestHeaders] apiTokens count > 0 but GetApiTokenInUse returned empty")
		}
	} else {
		// If no apiToken is configured, try to extract from original request headers

		// 2. If authHeaderKey is configured, use the specified header
		if m.config.authHeaderKey != "" {
			if apiKey, err := proxywasm.GetHttpRequestHeader(m.config.authHeaderKey); err == nil && apiKey != "" {
				token = apiKey
				log.Debugf("[openaiProvider.TransformRequestHeaders] Using token from configured header: %s", m.config.authHeaderKey)
			}
		}

		// 3. If authHeaderKey is not configured, check default headers in priority order
		if token == "" {
			defaultHeaders := []string{"x-api-key", "x-authorization"}
			for _, headerName := range defaultHeaders {
				if apiKey, err := proxywasm.GetHttpRequestHeader(headerName); err == nil && apiKey != "" {
					token = apiKey
					log.Debugf("[openaiProvider.TransformRequestHeaders] Using token from %s header", headerName)
					break
				}
			}
		}

		// 4. Finally check Authorization header
		if token == "" {
			if auth, err := proxywasm.GetHttpRequestHeader("Authorization"); err == nil && auth != "" {
				// Extract token from "Bearer <token>" format
				if strings.HasPrefix(auth, "Bearer ") {
					token = strings.TrimPrefix(auth, "Bearer ")
					log.Debugf("[openaiProvider.TransformRequestHeaders] Using token from Authorization header (Bearer format)")
				} else {
					token = auth
					log.Debugf("[openaiProvider.TransformRequestHeaders] Using token from Authorization header (no Bearer prefix)")
				}
			}
		}
	}

	// 5. Set Authorization header (avoid duplicate Bearer prefix)
	if token != "" {
		// Check if token already contains Bearer prefix
		if !strings.HasPrefix(token, "Bearer ") {
			token = "Bearer " + token
		}
		util.OverwriteRequestAuthorizationHeader(headers, token)
		log.Debugf("[openaiProvider.TransformRequestHeaders] Set Authorization header successfully")
	} else {
		log.Warnf("[openaiProvider.TransformRequestHeaders] No auth token available - neither configured in apiTokens nor in request headers")
	}
	headers.Del("Content-Length")
}
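The extraction order above (configured tokens first, then a custom auth header, then the default headers, then `Authorization`) can be sketched as a pure function. This is an illustrative sketch, not the plugin's actual API: `resolveToken` is a hypothetical name, and the plain header map stands in for the proxy-wasm header calls.

```go
package main

import (
	"fmt"
	"strings"
)

// resolveToken mirrors the priority chain in TransformRequestHeaders:
// 1. a token configured on the provider wins;
// 2. otherwise a custom auth header, if one is configured;
// 3. otherwise the default headers x-api-key, then x-authorization;
// 4. otherwise the Authorization header, with any "Bearer " prefix stripped.
func resolveToken(configured []string, authHeaderKey string, headers map[string]string) string {
	if len(configured) > 0 {
		return configured[0] // the plugin selects per request; the first entry stands in here
	}
	if authHeaderKey != "" {
		if v := headers[authHeaderKey]; v != "" {
			return v
		}
	}
	for _, name := range []string{"x-api-key", "x-authorization"} {
		if v := headers[name]; v != "" {
			return v
		}
	}
	if auth := headers["Authorization"]; auth != "" {
		return strings.TrimPrefix(auth, "Bearer ")
	}
	return ""
}

func main() {
	h := map[string]string{"Authorization": "Bearer sk-fallback", "x-api-key": "sk-key"}
	fmt.Println(resolveToken(nil, "", h)) // x-api-key outranks Authorization
}
```

Step 5 then re-adds the `Bearer ` prefix exactly once before overwriting the outgoing `Authorization` header, which is why step 4 strips it here.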

@@ -2,8 +2,10 @@ package provider

import (
	"bytes"
	"encoding/json"
	"errors"
	"fmt"
	"hash/fnv"
	"math/rand"
	"net/http"
	"path"
@@ -70,6 +72,7 @@ (
	ApiNameGeminiStreamGenerateContent ApiName = "gemini/v1beta/streamgeneratecontent"
	ApiNameAnthropicMessages ApiName = "anthropic/v1/messages"
	ApiNameAnthropicComplete ApiName = "anthropic/v1/complete"
	ApiNameVertexRaw ApiName = "vertex/raw"

	// OpenAI
	PathOpenAIPrefix = "/v1"
@@ -150,6 +153,7 @@ (
	protocolOriginal = "original"

	roleSystem = "system"
	roleDeveloper = "developer"
	roleAssistant = "assistant"
	roleUser = "user"
	roleTool = "tool"
@@ -192,6 +196,12 @@ type providerInitializer interface {
var (
	errUnsupportedApiName = errors.New("unsupported API name")

	// Providers that support the "developer" role. Other providers will have "developer" roles converted to "system".
	developerRoleSupportedProviders = map[string]bool{
		providerTypeOpenAI: true,
		providerTypeAzure: true,
	}

	providerInitializers = map[string]providerInitializer{
		providerTypeMoonshot: &moonshotProviderInitializer{},
		providerTypeAzure: &azureProviderInitializer{},
@@ -387,12 +397,18 @@ type ProviderConfig struct {
	// @Title zh-CN Vertex token刷新提前时间
	// @Description zh-CN 用于Google服务账号认证,access token过期时间判定提前刷新,单位为秒,默认值为60秒
	vertexTokenRefreshAhead int64 `required:"false" yaml:"vertexTokenRefreshAhead" json:"vertexTokenRefreshAhead"`
	// @Title zh-CN Vertex AI OpenAI兼容模式
	// @Description zh-CN 启用后将使用Vertex AI的OpenAI兼容API,请求和响应均使用OpenAI格式,无需协议转换。与Express Mode(apiTokens)互斥。
	vertexOpenAICompatible bool `required:"false" yaml:"vertexOpenAICompatible" json:"vertexOpenAICompatible"`
	// @Title zh-CN 翻译服务需指定的目标语种
	// @Description zh-CN 翻译结果的语种,目前仅适用于DeepL服务。
	targetLang string `required:"false" yaml:"targetLang" json:"targetLang"`
	// @Title zh-CN 指定服务返回的响应需满足的JSON Schema
	// @Description zh-CN 目前仅适用于OpenAI部分模型服务。参考:https://platform.openai.com/docs/guides/structured-outputs
	responseJsonSchema map[string]interface{} `required:"false" yaml:"responseJsonSchema" json:"responseJsonSchema"`
	// @Title zh-CN 自定义认证Header名称
	// @Description zh-CN 用于从请求中提取认证token的自定义header名称。如不配置,则按默认优先级检查 x-api-key、x-authorization、anthropic-api-key 和 Authorization header。
	authHeaderKey string `required:"false" yaml:"authHeaderKey" json:"authHeaderKey"`
	// @Title zh-CN 自定义大模型参数配置
	// @Description zh-CN 用于填充或者覆盖大模型调用时的参数
	customSettings []CustomSetting
@@ -414,6 +430,9 @@ type ProviderConfig struct {
	// @Title zh-CN generic Provider 对应的Host
	// @Description zh-CN 仅适用于generic provider,用于覆盖请求转发的目标Host
	genericHost string `required:"false" yaml:"genericHost" json:"genericHost"`
	// @Title zh-CN 上下文清理命令
	// @Description zh-CN 配置清理命令文本列表,当请求的 messages 中存在完全匹配任意一个命令的 user 消息时,将该消息及之前所有非 system 消息清理掉,实现主动清理上下文的效果
	contextCleanupCommands []string `required:"false" yaml:"contextCleanupCommands" json:"contextCleanupCommands"`
	// @Title zh-CN 首包超时
	// @Description zh-CN 流式请求中收到上游服务第一个响应包的超时时间,单位为毫秒。默认值为 0,表示不开启首包超时
	firstByteTimeout uint32 `required:"false" yaml:"firstByteTimeout" json:"firstByteTimeout"`
@@ -432,6 +451,15 @@ type ProviderConfig struct {
	// @Title zh-CN 豆包服务域名
	// @Description zh-CN 仅适用于豆包服务,默认转发域名为 ark.cn-beijing.volces.com
	doubaoDomain string `required:"false" yaml:"doubaoDomain" json:"doubaoDomain"`
	// @Title zh-CN Claude Code 模式
	// @Description zh-CN 仅适用于Claude服务。启用后将伪装成Claude Code客户端发起请求,支持使用Claude Code的OAuth Token进行认证。
	claudeCodeMode bool `required:"false" yaml:"claudeCodeMode" json:"claudeCodeMode"`
	// @Title zh-CN 智谱AI服务域名
	// @Description zh-CN 仅适用于智谱AI服务。默认为 open.bigmodel.cn(中国),可配置为 api.z.ai(国际)
	zhipuDomain string `required:"false" yaml:"zhipuDomain" json:"zhipuDomain"`
	// @Title zh-CN 智谱AI Code Plan 模式
	// @Description zh-CN 仅适用于智谱AI服务。启用后将使用 /api/coding/paas/v4/chat/completions 接口
	zhipuCodePlanMode bool `required:"false" yaml:"zhipuCodePlanMode" json:"zhipuCodePlanMode"`
}

func (c *ProviderConfig) GetId() string {
@@ -454,6 +482,10 @@ func (c *ProviderConfig) GetVllmServerHost() string {
	return c.vllmServerHost
}

func (c *ProviderConfig) GetContextCleanupCommands() []string {
	return c.contextCleanupCommands
}

func (c *ProviderConfig) IsOpenAIProtocol() bool {
	return c.protocol == protocolOpenAI
}
@@ -540,6 +572,7 @@ func (c *ProviderConfig) FromJson(json gjson.Result) {
	if c.vertexTokenRefreshAhead == 0 {
		c.vertexTokenRefreshAhead = 60
	}
	c.vertexOpenAICompatible = json.Get("vertexOpenAICompatible").Bool()
	c.targetLang = json.Get("targetLang").String()

	if schemaValue, ok := json.Get("responseJsonSchema").Value().(map[string]interface{}); ok {
@@ -631,6 +664,15 @@ func (c *ProviderConfig) FromJson(json gjson.Result) {
	c.vllmServerHost = json.Get("vllmServerHost").String()
	c.vllmCustomUrl = json.Get("vllmCustomUrl").String()
	c.doubaoDomain = json.Get("doubaoDomain").String()
	c.claudeCodeMode = json.Get("claudeCodeMode").Bool()
	c.zhipuDomain = json.Get("zhipuDomain").String()
	c.zhipuCodePlanMode = json.Get("zhipuCodePlanMode").Bool()
	c.contextCleanupCommands = make([]string, 0)
	for _, cmd := range json.Get("contextCleanupCommands").Array() {
		if cmd.String() != "" {
			c.contextCleanupCommands = append(c.contextCleanupCommands, cmd.String())
		}
	}
}

func (c *ProviderConfig) Validate() error {
@@ -665,12 +707,45 @@ func (c *ProviderConfig) Validate() error {
func (c *ProviderConfig) GetOrSetTokenWithContext(ctx wrapper.HttpContext) string {
	ctxApiKey := ctx.GetContext(ctxKeyApiKey)
	if ctxApiKey == nil {
		ctxApiKey = c.GetRandomToken()
		token := c.selectApiToken(ctx)
		ctxApiKey = token
		ctx.SetContext(ctxKeyApiKey, ctxApiKey)
	}
	return ctxApiKey.(string)
}

// selectApiToken selects an API token based on the request context
// For stateful APIs, it uses consumer affinity if available
func (c *ProviderConfig) selectApiToken(ctx wrapper.HttpContext) string {
	// Get API name from context if available
	ctxApiName := ctx.GetContext(CtxKeyApiName)
	var apiName string
	if ctxApiName != nil {
		// ctxApiName is of type ApiName, need to convert to string
		apiName = string(ctxApiName.(ApiName))
	}

	// For stateful APIs, try to use consumer affinity
	if isStatefulAPI(apiName) {
		consumer := c.getConsumerFromContext(ctx)
		if consumer != "" {
			return c.GetTokenWithConsumerAffinity(ctx, consumer)
		}
	}

	// Fall back to random selection
	return c.GetRandomToken()
}

// getConsumerFromContext retrieves the consumer identifier from the request context
func (c *ProviderConfig) getConsumerFromContext(ctx wrapper.HttpContext) string {
	consumer, err := proxywasm.GetHttpRequestHeader("x-mse-consumer")
	if err == nil && consumer != "" {
		return consumer
	}
	return ""
}

func (c *ProviderConfig) GetRandomToken() string {
	apiTokens := c.apiTokens
	count := len(apiTokens)
@@ -684,6 +759,50 @@ func (c *ProviderConfig) GetRandomToken() string {
	}
}

// isStatefulAPI checks if the given API name is a stateful API that requires consumer affinity
func isStatefulAPI(apiName string) bool {
	// These APIs maintain session state and should be routed to the same provider consistently
	statefulAPIs := map[string]bool{
		string(ApiNameResponses): true, // Response API - uses previous_response_id
		string(ApiNameFiles): true, // Files API - maintains file state
		string(ApiNameRetrieveFile): true, // File retrieval - depends on file upload
		string(ApiNameRetrieveFileContent): true, // File content - depends on file upload
		string(ApiNameBatches): true, // Batch API - maintains batch state
		string(ApiNameRetrieveBatch): true, // Batch status - depends on batch creation
		string(ApiNameCancelBatch): true, // Batch operations - depends on batch state
		string(ApiNameFineTuningJobs): true, // Fine-tuning - maintains job state
		string(ApiNameRetrieveFineTuningJob): true, // Fine-tuning job status
		string(ApiNameFineTuningJobEvents): true, // Fine-tuning events
		string(ApiNameFineTuningJobCheckpoints): true, // Fine-tuning checkpoints
		string(ApiNameCancelFineTuningJob): true, // Cancel fine-tuning job
		string(ApiNameResumeFineTuningJob): true, // Resume fine-tuning job
	}
	return statefulAPIs[apiName]
}

// GetTokenWithConsumerAffinity selects an API token based on consumer affinity
// If x-mse-consumer header is present and API is stateful, it will consistently select the same token
func (c *ProviderConfig) GetTokenWithConsumerAffinity(ctx wrapper.HttpContext, consumer string) string {
	apiTokens := c.apiTokens
	count := len(apiTokens)
	switch count {
	case 0:
		return ""
	case 1:
		return apiTokens[0]
	default:
		// Use FNV-1a hash for consistent token selection
		h := fnv.New32a()
		h.Write([]byte(consumer))
		hashValue := h.Sum32()
		index := int(hashValue) % count
		if index < 0 {
			index += count
		}
		return apiTokens[index]
	}
}
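The consumer-affinity selection can be demonstrated standalone. This sketch repeats the FNV-1a mapping above with an illustrative helper name, `pickToken`; note the `index < 0` guard only matters on platforms where `int` is 32 bits, since only there can `int(hashValue)` be negative.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// pickToken maps a consumer name to a stable index, so the same consumer
// always lands on the same token as long as the token list is unchanged.
func pickToken(consumer string, tokens []string) string {
	switch len(tokens) {
	case 0:
		return ""
	case 1:
		return tokens[0]
	default:
		h := fnv.New32a()
		h.Write([]byte(consumer)) // FNV-1a over the consumer identifier
		index := int(h.Sum32()) % len(tokens)
		if index < 0 { // only reachable when int is 32-bit
			index += len(tokens)
		}
		return tokens[index]
	}
}

func main() {
	tokens := []string{"tok-a", "tok-b", "tok-c"}
	// Deterministic: repeated calls for one consumer return the same token.
	fmt.Println(pickToken("consumer-1", tokens) == pickToken("consumer-1", tokens))
	// → true
}
```

This is the design point behind `isStatefulAPI`: stateful endpoints (files, batches, fine-tuning, responses) must keep hitting the upstream account that holds their state, so hashing the `x-mse-consumer` value beats random selection there.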
|
||||
|
||||
func (c *ProviderConfig) IsOriginal() bool {
|
||||
return c.protocol == protocolOriginal
|
||||
}
|
||||
@@ -813,6 +932,34 @@ func doGetMappedModel(model string, modelMapping map[string]string) string {
	return ""
}

// isDeveloperRoleSupported checks if the provider supports the "developer" role.
func isDeveloperRoleSupported(providerType string) bool {
	return developerRoleSupportedProviders[providerType]
}

// convertDeveloperRoleToSystem converts "developer" roles to the "system" role
// in the request body. It is used for providers that do not support the
// "developer" role.
func convertDeveloperRoleToSystem(body []byte) ([]byte, error) {
	request := &chatCompletionRequest{}
	if err := json.Unmarshal(body, request); err != nil {
		return body, fmt.Errorf("unable to unmarshal request for developer role conversion: %v", err)
	}

	converted := false
	for i := range request.Messages {
		if request.Messages[i].Role == roleDeveloper {
			request.Messages[i].Role = roleSystem
			converted = true
		}
	}

	if converted {
		return json.Marshal(request)
	}

	return body, nil
}
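A reduced sketch of the same unmarshal-rewrite-marshal round-trip. The `message` and `request` types here are simplified stand-ins for the plugin's chat types; note that any field absent from the struct is dropped on re-marshal, which is why the real code only re-serializes when a role was actually converted:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// message and request are reduced stand-ins for the plugin's chat types.
type message struct {
	Role    string `json:"role"`
	Content string `json:"content"`
}

type request struct {
	Messages []message `json:"messages"`
}

// convertDeveloperRole rewrites "developer" roles to "system" and
// re-serializes only when something actually changed, so untouched
// bodies pass through byte-for-byte.
func convertDeveloperRole(body []byte) ([]byte, error) {
	var req request
	if err := json.Unmarshal(body, &req); err != nil {
		return body, err
	}
	converted := false
	for i := range req.Messages {
		if req.Messages[i].Role == "developer" {
			req.Messages[i].Role = "system"
			converted = true
		}
	}
	if converted {
		return json.Marshal(&req)
	}
	return body, nil
}

func main() {
	in := []byte(`{"messages":[{"role":"developer","content":"be terse"}]}`)
	out, _ := convertDeveloperRole(in)
	fmt.Println(string(out)) // {"messages":[{"role":"system","content":"be terse"}]}
}
```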

func ExtractStreamingEvents(ctx wrapper.HttpContext, chunk []byte) []StreamEvent {
	body := chunk
	if bufferedStreamingBody, has := ctx.GetContext(ctxKeyStreamingBody).([]byte); has {
@@ -941,6 +1088,28 @@ func (c *ProviderConfig) handleRequestBody(
		log.Debugf("[Auto Protocol] converted Claude request body to OpenAI format")
	}

	// handle context cleanup commands for chat completion requests
	if apiName == ApiNameChatCompletion && len(c.contextCleanupCommands) > 0 {
		body, err = cleanupContextMessages(body, c.contextCleanupCommands)
		if err != nil {
			log.Warnf("[contextCleanup] failed to clean up context messages: %v", err)
			// Continue processing even if cleanup fails
			err = nil
		}
	}

	// convert the developer role to the system role for providers that don't support it
	if apiName == ApiNameChatCompletion && !isDeveloperRoleSupported(c.typ) {
		body, err = convertDeveloperRoleToSystem(body)
		if err != nil {
			log.Warnf("[developerRole] failed to convert developer role to system: %v", err)
			// Continue processing even if conversion fails
			err = nil
		} else {
			log.Debugf("[developerRole] converted developer role to system for provider: %s", c.typ)
		}
	}

	// use the OpenAI protocol (either original OpenAI or converted from Claude)
	if handler, ok := provider.(TransformRequestBodyHandler); ok {
		body, err = handler.TransformRequestBody(ctx, apiName, body)

plugins/wasm-go/extensions/ai-proxy/provider/provider_test.go (new file, 275 lines)
@@ -0,0 +1,275 @@
package provider

import (
	"testing"

	"github.com/stretchr/testify/assert"
)

func TestIsStatefulAPI(t *testing.T) {
	tests := []struct {
		name     string
		apiName  string
		expected bool
	}{
		// Stateful APIs - should return true
		{name: "responses_api", apiName: string(ApiNameResponses), expected: true},
		{name: "files_api", apiName: string(ApiNameFiles), expected: true},
		{name: "retrieve_file_api", apiName: string(ApiNameRetrieveFile), expected: true},
		{name: "retrieve_file_content_api", apiName: string(ApiNameRetrieveFileContent), expected: true},
		{name: "batches_api", apiName: string(ApiNameBatches), expected: true},
		{name: "retrieve_batch_api", apiName: string(ApiNameRetrieveBatch), expected: true},
		{name: "cancel_batch_api", apiName: string(ApiNameCancelBatch), expected: true},
		{name: "fine_tuning_jobs_api", apiName: string(ApiNameFineTuningJobs), expected: true},
		{name: "retrieve_fine_tuning_job_api", apiName: string(ApiNameRetrieveFineTuningJob), expected: true},
		{name: "fine_tuning_job_events_api", apiName: string(ApiNameFineTuningJobEvents), expected: true},
		{name: "fine_tuning_job_checkpoints_api", apiName: string(ApiNameFineTuningJobCheckpoints), expected: true},
		{name: "cancel_fine_tuning_job_api", apiName: string(ApiNameCancelFineTuningJob), expected: true},
		{name: "resume_fine_tuning_job_api", apiName: string(ApiNameResumeFineTuningJob), expected: true},
		// Non-stateful APIs - should return false
		{name: "chat_completion_api", apiName: string(ApiNameChatCompletion), expected: false},
		{name: "completion_api", apiName: string(ApiNameCompletion), expected: false},
		{name: "embeddings_api", apiName: string(ApiNameEmbeddings), expected: false},
		{name: "models_api", apiName: string(ApiNameModels), expected: false},
		{name: "image_generation_api", apiName: string(ApiNameImageGeneration), expected: false},
		{name: "audio_speech_api", apiName: string(ApiNameAudioSpeech), expected: false},
		// Empty/unknown API - should return false
		{name: "empty_api_name", apiName: "", expected: false},
		{name: "unknown_api_name", apiName: "unknown/api", expected: false},
	}

	for _, tt := range tests {
		t.Run(tt.name, func(t *testing.T) {
			result := isStatefulAPI(tt.apiName)
			assert.Equal(t, tt.expected, result)
		})
	}
}

func TestGetTokenWithConsumerAffinity(t *testing.T) {
	tests := []struct {
		name      string
		apiTokens []string
		consumer  string
		wantEmpty bool
		wantToken string // If not empty, the expected token (single-token case)
	}{
		{name: "no_tokens_returns_empty", apiTokens: []string{}, consumer: "consumer1", wantEmpty: true},
		{name: "nil_tokens_returns_empty", apiTokens: nil, consumer: "consumer1", wantEmpty: true},
		{name: "single_token_always_returns_same_token", apiTokens: []string{"token1"}, consumer: "consumer1", wantToken: "token1"},
		{name: "single_token_with_different_consumer", apiTokens: []string{"token1"}, consumer: "consumer2", wantToken: "token1"},
		// Gets one of the tokens, consistently
		{name: "multiple_tokens_consistent_for_same_consumer", apiTokens: []string{"token1", "token2", "token3"}, consumer: "consumer1", wantEmpty: false},
		{name: "multiple_tokens_different_consumers_may_get_different_tokens", apiTokens: []string{"token1", "token2"}, consumer: "consumerA", wantEmpty: false},
	}

	for _, tt := range tests {
		t.Run(tt.name, func(t *testing.T) {
			config := &ProviderConfig{
				apiTokens: tt.apiTokens,
			}

			result := config.GetTokenWithConsumerAffinity(nil, tt.consumer)

			if tt.wantEmpty {
				assert.Empty(t, result)
			} else if tt.wantToken != "" {
				assert.Equal(t, tt.wantToken, result)
			} else {
				assert.NotEmpty(t, result)
				assert.Contains(t, tt.apiTokens, result)
			}
		})
	}
}

func TestGetTokenWithConsumerAffinity_Consistency(t *testing.T) {
	// Test that the same consumer always gets the same token (consistency)
	config := &ProviderConfig{
		apiTokens: []string{"token1", "token2", "token3", "token4", "token5"},
	}

	t.Run("same_consumer_gets_same_token_repeatedly", func(t *testing.T) {
		consumer := "test-consumer"
		var firstResult string

		// Call multiple times and verify consistency
		for i := 0; i < 10; i++ {
			result := config.GetTokenWithConsumerAffinity(nil, consumer)
			if i == 0 {
				firstResult = result
			}
			assert.Equal(t, firstResult, result, "Consumer should consistently get the same token")
		}
	})

	t.Run("different_consumers_distribute_across_tokens", func(t *testing.T) {
		// Use multiple consumers and verify they distribute across tokens
		consumers := []string{"consumer1", "consumer2", "consumer3", "consumer4", "consumer5", "consumer6", "consumer7", "consumer8", "consumer9", "consumer10"}
		tokenCounts := make(map[string]int)

		for _, consumer := range consumers {
			token := config.GetTokenWithConsumerAffinity(nil, consumer)
			tokenCounts[token]++
		}

		// Verify all tokens returned are valid
		for token := range tokenCounts {
			assert.Contains(t, config.apiTokens, token)
		}

		// With 10 consumers and 5 tokens we expect some spread: not necessarily
		// a perfect distribution, but more than one token should be used.
		assert.GreaterOrEqual(t, len(tokenCounts), 2, "Should use at least 2 different tokens")
	})

	t.Run("empty_consumer_still_returns_token", func(t *testing.T) {
		config := &ProviderConfig{
			apiTokens: []string{"token1", "token2"},
		}
		result := config.GetTokenWithConsumerAffinity(nil, "")
		// An empty consumer still maps to a token (hash of the empty string)
		assert.NotEmpty(t, result)
		assert.Contains(t, []string{"token1", "token2"}, result)
	})
}

func TestGetTokenWithConsumerAffinity_HashDistribution(t *testing.T) {
	// Test that the hash function maps every consumer to a valid token
	config := &ProviderConfig{
		apiTokens: []string{"token1", "token2", "token3"},
	}

	// Test specific consumers to verify hash behavior
	consumers := []string{"user-alice", "user-bob", "user-charlie", "service-api-v1", "service-api-v2"}

	for _, consumer := range consumers {
		t.Run("consumer_"+consumer, func(t *testing.T) {
			result := config.GetTokenWithConsumerAffinity(nil, consumer)
			assert.Contains(t, config.apiTokens, result)
		})
	}
}
@@ -334,6 +334,11 @@ func (m *qwenProvider) buildChatCompletionResponse(ctx wrapper.HttpContext, qwen
}

func (m *qwenProvider) buildChatCompletionStreamingResponse(ctx wrapper.HttpContext, qwenResponse *qwenTextGenResponse, incrementalStreaming bool) []*chatCompletionResponse {
	if len(qwenResponse.Output.Choices) == 0 {
		log.Warnf("qwen response has no choices, request_id: %s", qwenResponse.RequestId)
		return nil
	}

	baseMessage := chatCompletionResponse{
		Id:      qwenResponse.RequestId,
		Created: time.Now().UnixMilli() / 1000,

@@ -73,6 +73,73 @@ func insertContextMessage(request *chatCompletionRequest, content string) {
	}
}

// cleanupContextMessages cleans up context messages according to the configured
// cleanup commands: it finds the last user message that exactly matches any of
// the cleanupCommands, then removes that message and every non-system message
// before it, keeping only the system messages.
func cleanupContextMessages(body []byte, cleanupCommands []string) ([]byte, error) {
	if len(cleanupCommands) == 0 {
		return body, nil
	}

	request := &chatCompletionRequest{}
	if err := json.Unmarshal(body, request); err != nil {
		return body, fmt.Errorf("unable to unmarshal request for context cleanup: %v", err)
	}

	if len(request.Messages) == 0 {
		return body, nil
	}

	// Scan backwards for the last user message matching any cleanup command
	cleanupIndex := -1
	for i := len(request.Messages) - 1; i >= 0; i-- {
		msg := request.Messages[i]
		if msg.Role == roleUser {
			content := msg.StringContent()
			for _, cmd := range cleanupCommands {
				if content == cmd {
					cleanupIndex = i
					break
				}
			}
			if cleanupIndex != -1 {
				break
			}
		}
	}

	// No matching cleanup command found
	if cleanupIndex == -1 {
		return body, nil
	}

	log.Debugf("[contextCleanup] found cleanup command at index %d, cleaning up messages", cleanupIndex)

	// Build the new message list:
	// 1. Before cleanupIndex, keep only system messages (everything else is dropped)
	// 2. Drop the cleanup-command message at cleanupIndex itself
	// 3. Keep every message after cleanupIndex
	var newMessages []chatMessage

	// Messages before cleanupIndex: keep only system messages
	for i := 0; i < cleanupIndex; i++ {
		msg := request.Messages[i]
		if msg.Role == roleSystem {
			newMessages = append(newMessages, msg)
		}
	}

	// Skip the cleanup-command message itself and keep everything after it
	for i := cleanupIndex + 1; i < len(request.Messages); i++ {
		newMessages = append(newMessages, request.Messages[i])
	}

	request.Messages = newMessages
	log.Debugf("[contextCleanup] messages after cleanup: %d", len(newMessages))

	return json.Marshal(request)
}
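The slicing behavior can be shown without JSON. A minimal sketch over a simplified message type (the `msg` type and `cleanup` helper are illustrative stand-ins, not the plugin's types):

```go
package main

import "fmt"

type msg struct{ Role, Content string }

// cleanup keeps only system messages before the last user message that
// exactly matches a cleanup command, drops the command message itself,
// and keeps everything after it. If no command matches, the history is
// returned unchanged.
func cleanup(messages []msg, commands []string) []msg {
	isCommand := func(c string) bool {
		for _, cmd := range commands {
			if c == cmd {
				return true
			}
		}
		return false
	}
	cleanupIndex := -1
	for i := len(messages) - 1; i >= 0; i-- {
		if messages[i].Role == "user" && isCommand(messages[i].Content) {
			cleanupIndex = i
			break
		}
	}
	if cleanupIndex == -1 {
		return messages
	}
	var out []msg
	for i := 0; i < cleanupIndex; i++ {
		if messages[i].Role == "system" {
			out = append(out, messages[i])
		}
	}
	return append(out, messages[cleanupIndex+1:]...)
}

func main() {
	history := []msg{
		{"system", "you are an assistant"},
		{"user", "hello"},
		{"assistant", "hi"},
		{"user", "/clear"},
		{"user", "new question"},
	}
	for _, m := range cleanup(history, []string{"/clear"}) {
		fmt.Println(m.Role, m.Content)
	}
	// Output: the system prompt and "new question" survive; everything
	// between them, including the /clear command itself, is dropped.
}
```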

func ReplaceResponseBody(body []byte) error {
	log.Debugf("response body: %s", string(body))
	err := proxywasm.ReplaceHttpResponseBody(body)

@@ -0,0 +1,253 @@
package provider

import (
	"encoding/json"
	"testing"

	"github.com/stretchr/testify/assert"
	"github.com/stretchr/testify/require"
)

func TestCleanupContextMessages(t *testing.T) {
	t.Run("empty_cleanup_commands", func(t *testing.T) {
		body := []byte(`{"messages":[{"role":"user","content":"hello"}]}`)
		result, err := cleanupContextMessages(body, []string{})
		assert.NoError(t, err)
		assert.Equal(t, body, result)
	})

	t.Run("no_matching_command", func(t *testing.T) {
		body := []byte(`{"messages":[{"role":"system","content":"你是助手"},{"role":"user","content":"hello"}]}`)
		result, err := cleanupContextMessages(body, []string{"清理上下文", "/clear"})
		assert.NoError(t, err)
		assert.Equal(t, body, result)
	})

	t.Run("cleanup_with_single_command", func(t *testing.T) {
		input := chatCompletionRequest{
			Messages: []chatMessage{
				{Role: "system", Content: "你是一个助手"},
				{Role: "user", Content: "你好"},
				{Role: "assistant", Content: "你好!"},
				{Role: "user", Content: "清理上下文"},
				{Role: "user", Content: "新问题"},
			},
		}
		body, err := json.Marshal(input)
		require.NoError(t, err)

		result, err := cleanupContextMessages(body, []string{"清理上下文"})
		assert.NoError(t, err)

		var output chatCompletionRequest
		err = json.Unmarshal(result, &output)
		require.NoError(t, err)

		assert.Len(t, output.Messages, 2)
		assert.Equal(t, "system", output.Messages[0].Role)
		assert.Equal(t, "你是一个助手", output.Messages[0].Content)
		assert.Equal(t, "user", output.Messages[1].Role)
		assert.Equal(t, "新问题", output.Messages[1].Content)
	})

	t.Run("cleanup_with_multiple_commands_match_first", func(t *testing.T) {
		input := chatCompletionRequest{
			Messages: []chatMessage{
				{Role: "system", Content: "你是一个助手"},
				{Role: "user", Content: "你好"},
				{Role: "assistant", Content: "你好!"},
				{Role: "user", Content: "/clear"},
				{Role: "user", Content: "新问题"},
			},
		}
		body, err := json.Marshal(input)
		require.NoError(t, err)

		result, err := cleanupContextMessages(body, []string{"清理上下文", "/clear", "重新开始"})
		assert.NoError(t, err)

		var output chatCompletionRequest
		err = json.Unmarshal(result, &output)
		require.NoError(t, err)

		assert.Len(t, output.Messages, 2)
		assert.Equal(t, "system", output.Messages[0].Role)
		assert.Equal(t, "user", output.Messages[1].Role)
		assert.Equal(t, "新问题", output.Messages[1].Content)
	})

	t.Run("cleanup_removes_tool_messages", func(t *testing.T) {
		input := chatCompletionRequest{
			Messages: []chatMessage{
				{Role: "system", Content: "你是一个助手"},
				{Role: "user", Content: "查天气"},
				{Role: "assistant", Content: ""},
				{Role: "tool", Content: "北京 25°C"},
				{Role: "assistant", Content: "北京今天25度"},
				{Role: "user", Content: "清理上下文"},
				{Role: "user", Content: "新问题"},
			},
		}
		body, err := json.Marshal(input)
		require.NoError(t, err)

		result, err := cleanupContextMessages(body, []string{"清理上下文"})
		assert.NoError(t, err)

		var output chatCompletionRequest
		err = json.Unmarshal(result, &output)
		require.NoError(t, err)

		assert.Len(t, output.Messages, 2)
		assert.Equal(t, "system", output.Messages[0].Role)
		assert.Equal(t, "user", output.Messages[1].Role)
	})

	t.Run("cleanup_keeps_multiple_system_messages", func(t *testing.T) {
		input := chatCompletionRequest{
			Messages: []chatMessage{
				{Role: "system", Content: "系统提示1"},
				{Role: "system", Content: "系统提示2"},
				{Role: "user", Content: "你好"},
				{Role: "assistant", Content: "你好!"},
				{Role: "user", Content: "清理上下文"},
				{Role: "user", Content: "新问题"},
			},
		}
		body, err := json.Marshal(input)
		require.NoError(t, err)

		result, err := cleanupContextMessages(body, []string{"清理上下文"})
		assert.NoError(t, err)

		var output chatCompletionRequest
		err = json.Unmarshal(result, &output)
		require.NoError(t, err)

		assert.Len(t, output.Messages, 3)
		assert.Equal(t, "system", output.Messages[0].Role)
		assert.Equal(t, "系统提示1", output.Messages[0].Content)
		assert.Equal(t, "system", output.Messages[1].Role)
		assert.Equal(t, "系统提示2", output.Messages[1].Content)
		assert.Equal(t, "user", output.Messages[2].Role)
	})

	t.Run("cleanup_finds_last_matching_command", func(t *testing.T) {
		input := chatCompletionRequest{
			Messages: []chatMessage{
				{Role: "system", Content: "你是一个助手"},
				{Role: "user", Content: "清理上下文"},
				{Role: "user", Content: "中间问题"},
				{Role: "assistant", Content: "中间回答"},
				{Role: "user", Content: "清理上下文"},
				{Role: "user", Content: "最后问题"},
			},
		}
		body, err := json.Marshal(input)
		require.NoError(t, err)

		result, err := cleanupContextMessages(body, []string{"清理上下文"})
		assert.NoError(t, err)

		var output chatCompletionRequest
		err = json.Unmarshal(result, &output)
		require.NoError(t, err)

		// Should match the last cleanup command, keeping the system message and "最后问题"
		assert.Len(t, output.Messages, 2)
		assert.Equal(t, "system", output.Messages[0].Role)
		assert.Equal(t, "user", output.Messages[1].Role)
		assert.Equal(t, "最后问题", output.Messages[1].Content)
	})

	t.Run("cleanup_at_end_of_messages", func(t *testing.T) {
		input := chatCompletionRequest{
			Messages: []chatMessage{
				{Role: "system", Content: "你是一个助手"},
				{Role: "user", Content: "你好"},
				{Role: "assistant", Content: "你好!"},
				{Role: "user", Content: "清理上下文"},
			},
		}
		body, err := json.Marshal(input)
		require.NoError(t, err)

		result, err := cleanupContextMessages(body, []string{"清理上下文"})
		assert.NoError(t, err)

		var output chatCompletionRequest
		err = json.Unmarshal(result, &output)
		require.NoError(t, err)

		// The cleanup command is the last message, so only the system message remains
		assert.Len(t, output.Messages, 1)
		assert.Equal(t, "system", output.Messages[0].Role)
	})

	t.Run("cleanup_without_system_message", func(t *testing.T) {
		input := chatCompletionRequest{
			Messages: []chatMessage{
				{Role: "user", Content: "你好"},
				{Role: "assistant", Content: "你好!"},
				{Role: "user", Content: "清理上下文"},
				{Role: "user", Content: "新问题"},
			},
		}
		body, err := json.Marshal(input)
		require.NoError(t, err)

		result, err := cleanupContextMessages(body, []string{"清理上下文"})
		assert.NoError(t, err)

		var output chatCompletionRequest
		err = json.Unmarshal(result, &output)
		require.NoError(t, err)

		// No system message, so only messages after the cleanup command remain
		assert.Len(t, output.Messages, 1)
		assert.Equal(t, "user", output.Messages[0].Role)
		assert.Equal(t, "新问题", output.Messages[0].Content)
	})

	t.Run("cleanup_with_empty_messages", func(t *testing.T) {
		input := chatCompletionRequest{
			Messages: []chatMessage{},
		}
		body, err := json.Marshal(input)
		require.NoError(t, err)

		result, err := cleanupContextMessages(body, []string{"清理上下文"})
		assert.NoError(t, err)

		var output chatCompletionRequest
		err = json.Unmarshal(result, &output)
		require.NoError(t, err)

		assert.Len(t, output.Messages, 0)
	})

	t.Run("cleanup_command_partial_match_not_triggered", func(t *testing.T) {
		input := chatCompletionRequest{
			Messages: []chatMessage{
				{Role: "system", Content: "你是一个助手"},
				{Role: "user", Content: "请清理上下文吧"},
				{Role: "assistant", Content: "好的"},
			},
		}
		body, err := json.Marshal(input)
		require.NoError(t, err)

		result, err := cleanupContextMessages(body, []string{"清理上下文"})
		assert.NoError(t, err)

		// A partial match must not trigger cleanup
		assert.Equal(t, body, result)
	})

	t.Run("invalid_json_body", func(t *testing.T) {
		body := []byte(`invalid json`)
		result, err := cleanupContextMessages(body, []string{"清理上下文"})
		assert.Error(t, err)
		assert.Equal(t, body, result)
	})
}
@@ -21,14 +21,21 @@ import (
	"github.com/higress-group/wasm-go/pkg/log"
	"github.com/higress-group/wasm-go/pkg/wrapper"
	"github.com/tidwall/gjson"
	"github.com/tidwall/sjson"
)

const (
	vertexAuthDomain = "oauth2.googleapis.com"
	vertexDomain     = "aiplatform.googleapis.com"
	// /v1/projects/{PROJECT_ID}/locations/{REGION}/publishers/google/models/{MODEL_ID}:{ACTION}
	vertexPathTemplate          = "/v1/projects/%s/locations/%s/publishers/google/models/%s:%s"
	vertexPathAnthropicTemplate = "/v1/projects/%s/locations/%s/publishers/anthropic/models/%s:%s"
	// Express Mode path templates (no project/location)
	vertexExpressPathTemplate          = "/v1/publishers/google/models/%s:%s"
	vertexExpressPathAnthropicTemplate = "/v1/publishers/anthropic/models/%s:%s"
	// OpenAI-compatible endpoint path template
	// /v1beta1/projects/{PROJECT_ID}/locations/{LOCATION}/endpoints/openapi/chat/completions
	vertexOpenAICompatiblePathTemplate = "/v1beta1/projects/%s/locations/%s/endpoints/openapi/chat/completions"
	vertexChatCompletionAction         = "generateContent"
	vertexChatCompletionStreamAction   = "streamGenerateContent?alt=sse"
	vertexAnthropicMessageAction       = "rawPredict"
@@ -36,12 +43,44 @@ const (
	vertexEmbeddingAction = "predict"
	vertexGlobalRegion    = "global"
	contextClaudeMarker           = "isClaudeRequest"
	contextOpenAICompatibleMarker = "isOpenAICompatibleRequest"
	contextVertexRawMarker        = "isVertexRawRequest"
	vertexAnthropicVersion = "vertex-2023-10-16"
)

// vertexRawPathRegex matches native Vertex AI REST API paths of the form
// [any prefix]/{api-version}/projects/{project}/locations/{location}/publishers/{publisher}/models/{model}:{action}
// An arbitrary basePath prefix is allowed, for compatibility with the basePathHandling option.
var vertexRawPathRegex = regexp.MustCompile(`^.*/([^/]+)/projects/([^/]+)/locations/([^/]+)/publishers/([^/]+)/models/([^/:]+):([^/?]+)`)
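The capture groups can be checked directly with the same pattern; in a standalone sketch (the sample path, including its `/api` prefix, is hypothetical), group 2 is the project, group 5 the model, and group 6 the action, with the `[^/:]` and `[^/?]` classes stopping at the `:` separator and any query string:

```go
package main

import (
	"fmt"
	"regexp"
)

// Same pattern as the provider's vertexRawPathRegex.
var vertexRawPathRegex = regexp.MustCompile(`^.*/([^/]+)/projects/([^/]+)/locations/([^/]+)/publishers/([^/]+)/models/([^/:]+):([^/?]+)`)

func main() {
	// Hypothetical path with an arbitrary "/api" basePath prefix.
	path := "/api/v1/projects/my-project/locations/us-central1/publishers/google/models/gemini-1.5-pro:streamGenerateContent?alt=sse"
	m := vertexRawPathRegex.FindStringSubmatch(path)
	if m != nil {
		// m[1]=api-version, m[2]=project, m[3]=location, m[4]=publisher, m[5]=model, m[6]=action
		fmt.Println(m[2], m[5], m[6])
	}
}
```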

type vertexProviderInitializer struct{}

func (v *vertexProviderInitializer) ValidateConfig(config *ProviderConfig) error {
	// Express Mode: if apiTokens is configured, API key authentication is used
	if len(config.apiTokens) > 0 {
		// Express Mode is mutually exclusive with the OpenAI-compatible mode
		if config.vertexOpenAICompatible {
			return errors.New("vertexOpenAICompatible is not compatible with Express Mode (apiTokens)")
		}
		// Express Mode requires no further configuration
		return nil
	}

	// OpenAI-compatible mode: OAuth configuration is required
	if config.vertexOpenAICompatible {
		if config.vertexAuthKey == "" {
			return errors.New("missing vertexAuthKey in vertex provider config for OpenAI compatible mode")
		}
		if config.vertexRegion == "" || config.vertexProjectId == "" {
			return errors.New("missing vertexRegion or vertexProjectId in vertex provider config for OpenAI compatible mode")
		}
		if config.vertexAuthServiceName == "" {
			return errors.New("missing vertexAuthServiceName in vertex provider config for OpenAI compatible mode")
		}
		return nil
	}

	// Standard mode: keep the original validation logic
	if config.vertexAuthKey == "" {
		return errors.New("missing vertexAuthKey in vertex provider config")
	}
@@ -56,26 +95,47 @@ func (v *vertexProviderInitializer) ValidateConfig(config *ProviderConfig) error

func (v *vertexProviderInitializer) DefaultCapabilities() map[string]string {
	return map[string]string{
		string(ApiNameChatCompletion):  vertexPathTemplate,
		string(ApiNameEmbeddings):      vertexPathTemplate,
		string(ApiNameImageGeneration): vertexPathTemplate,
		string(ApiNameVertexRaw):       "", // an empty string keeps the original path, with no path rewriting
	}
}

func (v *vertexProviderInitializer) CreateProvider(config ProviderConfig) (Provider, error) {
	config.setDefaultCapabilities(v.DefaultCapabilities())

	provider := &vertexProvider{
		config:       config,
		contextCache: createContextCache(&config),
		claude: &claudeProvider{
			config:       config,
			contextCache: createContextCache(&config),
		},
	}

	// Only standard mode needs the OAuth client (Express Mode authenticates via apiTokens)
	if !provider.isExpressMode() {
		provider.client = wrapper.NewClusterClient(wrapper.DnsCluster{
			Domain:      vertexAuthDomain,
			ServiceName: config.vertexAuthServiceName,
			Port:        443,
		})
	}

	return provider, nil
}

// isExpressMode reports whether Express Mode is enabled.
// If apiTokens is configured, Express Mode (API key authentication) is used.
func (v *vertexProvider) isExpressMode() bool {
	return len(v.config.apiTokens) > 0
}

// isOpenAICompatibleMode reports whether the OpenAI-compatible mode is enabled,
// i.e. Vertex AI's OpenAI-compatible Chat Completions API is used.
func (v *vertexProvider) isOpenAICompatibleMode() bool {
	return v.config.vertexOpenAICompatible
}

type vertexProvider struct {
@@ -90,6 +150,12 @@ func (v *vertexProvider) GetProviderType() string {
}

func (v *vertexProvider) GetApiName(path string) ApiName {
	// Match native Vertex AI REST API paths first, allowing any basePath prefix:
	// [any prefix]/{api-version}/projects/{project}/locations/{location}/publishers/{publisher}/models/{model}:{action}
	// This must run before the other action checks, because actions such as
	// :predict and :generateContent would otherwise match those rules.
	if vertexRawPathRegex.MatchString(path) {
		return ApiNameVertexRaw
	}
	if strings.HasSuffix(path, vertexChatCompletionAction) || strings.HasSuffix(path, vertexChatCompletionStreamAction) {
		return ApiNameChatCompletion
	}
@@ -106,11 +172,19 @@ func (v *vertexProvider) OnRequestHeaders(ctx wrapper.HttpContext, apiName ApiNa

func (v *vertexProvider) TransformRequestHeaders(ctx wrapper.HttpContext, apiName ApiName, headers http.Header) {
	var finalVertexDomain string
	if v.isExpressMode() {
		// Express Mode: fixed domain without a region prefix
		finalVertexDomain = vertexDomain
	} else {
		// Standard mode: region-prefixed domain
		if v.config.vertexRegion != vertexGlobalRegion {
			finalVertexDomain = fmt.Sprintf("%s-%s", v.config.vertexRegion, vertexDomain)
		} else {
			finalVertexDomain = vertexDomain
		}
	}
	util.OverwriteRequestHostHeader(headers, finalVertexDomain)
}

@@ -150,12 +224,66 @@ func (v *vertexProvider) OnRequestBody(ctx wrapper.HttpContext, apiName ApiName,
	if !v.config.isSupportedAPI(apiName) {
		return types.ActionContinue, errUnsupportedApiName
	}

	// Vertex Raw mode: pass the request body through and only handle OAuth.
	// Used for direct access to the Vertex AI REST API without protocol conversion.
	// Note: this check must come before IsOriginal(), because Vertex Raw mode is
	// typically used together with the original protocol.
	if apiName == ApiNameVertexRaw {
		ctx.SetContext(contextVertexRawMarker, true)
		// Express Mode does not need OAuth authentication
		if v.isExpressMode() {
			return types.ActionContinue, nil
		}
		// Standard mode needs an OAuth token
		cached, err := v.getToken()
		if cached {
			return types.ActionContinue, nil
		}
		if err == nil {
			return types.ActionPause, nil
		}
		return types.ActionContinue, err
	}

	if v.config.IsOriginal() {
		return types.ActionContinue, nil
	}

	headers := util.GetRequestHeaders()

	// OpenAI-compatible mode: do not convert the request body; only set the
	// path and apply model mapping
	if v.isOpenAICompatibleMode() {
		ctx.SetContext(contextOpenAICompatibleMarker, true)
		body, err := v.onOpenAICompatibleRequestBody(ctx, apiName, body, headers)
		headers.Set("Content-Length", fmt.Sprint(len(body)))
		util.ReplaceRequestHeaders(headers)
		_ = proxywasm.ReplaceHttpRequestBody(body)
		if err != nil {
			return types.ActionContinue, err
		}
		// The OpenAI-compatible mode needs an OAuth token
		cached, err := v.getToken()
		if cached {
			return types.ActionContinue, nil
		}
		if err == nil {
			return types.ActionPause, nil
		}
		return types.ActionContinue, err
	}

	body, err := v.TransformRequestBodyHeaders(ctx, apiName, body, headers)
	headers.Set("Content-Length", fmt.Sprint(len(body)))

	if v.isExpressMode() {
		// Express Mode: no Authorization header is needed; the API key is in the URL
		headers.Del("Authorization")
		util.ReplaceRequestHeaders(headers)
		_ = proxywasm.ReplaceHttpRequestBody(body)
		return types.ActionContinue, err
	}

	// Standard mode: an OAuth token is required
	util.ReplaceRequestHeaders(headers)
	_ = proxywasm.ReplaceHttpRequestBody(body)
	if err != nil {
@@ -172,13 +300,44 @@ func (v *vertexProvider) OnRequestBody(ctx wrapper.HttpContext, apiName ApiName,
|
||||
}
|
||||
|
||||
func (v *vertexProvider) TransformRequestBodyHeaders(ctx wrapper.HttpContext, apiName ApiName, body []byte, headers http.Header) ([]byte, error) {
-	if apiName == ApiNameChatCompletion {
+	switch apiName {
+	case ApiNameChatCompletion:
 		return v.onChatCompletionRequestBody(ctx, body, headers)
-	} else {
+	case ApiNameEmbeddings:
 		return v.onEmbeddingsRequestBody(ctx, body, headers)
+	case ApiNameImageGeneration:
+		return v.onImageGenerationRequestBody(ctx, body, headers)
+	default:
+		return body, nil
	}
}

// onOpenAICompatibleRequestBody handles requests in OpenAI compatible mode.
// It does not transform the request body format; it only applies model mapping and sets the path.
func (v *vertexProvider) onOpenAICompatibleRequestBody(ctx wrapper.HttpContext, apiName ApiName, body []byte, headers http.Header) ([]byte, error) {
	if apiName != ApiNameChatCompletion {
		return nil, fmt.Errorf("OpenAI compatible mode only supports chat completions API")
	}

	// Parse the request to apply model mapping
	request := &chatCompletionRequest{}
	if err := v.config.parseRequestAndMapModel(ctx, request, body); err != nil {
		return nil, err
	}

	// Set the OpenAI compatible endpoint path
	path := v.getOpenAICompatibleRequestPath()
	util.OverwriteRequestPathHeader(headers, path)

	// If the model was mapped, update the model field in the request body
	if request.Model != "" {
		body, _ = sjson.SetBytes(body, "model", request.Model)
	}

	// Keep the OpenAI format and return the body directly (the model field may have been updated)
	return body, nil
}

func (v *vertexProvider) onChatCompletionRequestBody(ctx wrapper.HttpContext, body []byte, headers http.Header) ([]byte, error) {
	request := &chatCompletionRequest{}
	err := v.config.parseRequestAndMapModel(ctx, request, body)
@@ -219,7 +378,126 @@ func (v *vertexProvider) onEmbeddingsRequestBody(ctx wrapper.HttpContext, body [
	return json.Marshal(vertexRequest)
}

func (v *vertexProvider) onImageGenerationRequestBody(ctx wrapper.HttpContext, body []byte, headers http.Header) ([]byte, error) {
	request := &imageGenerationRequest{}
	if err := v.config.parseRequestAndMapModel(ctx, request, body); err != nil {
		return nil, err
	}
	// Image generation does not use the streaming endpoint; a complete response is required
	path := v.getRequestPath(ApiNameImageGeneration, request.Model, false)
	util.OverwriteRequestPathHeader(headers, path)

	vertexRequest := v.buildVertexImageGenerationRequest(request)
	return json.Marshal(vertexRequest)
}

func (v *vertexProvider) buildVertexImageGenerationRequest(request *imageGenerationRequest) *vertexChatRequest {
	// Build safety settings
	safetySettings := make([]vertexChatSafetySetting, 0)
	for category, threshold := range v.config.geminiSafetySetting {
		safetySettings = append(safetySettings, vertexChatSafetySetting{
			Category:  category,
			Threshold: threshold,
		})
	}

	// Parse the size parameter
	aspectRatio, imageSize := v.parseImageSize(request.Size)

	// Determine the output MIME type
	mimeType := "image/png"
	if request.OutputFormat != "" {
		switch request.OutputFormat {
		case "jpeg", "jpg":
			mimeType = "image/jpeg"
		case "webp":
			mimeType = "image/webp"
		default:
			mimeType = "image/png"
		}
	}

	vertexRequest := &vertexChatRequest{
		Contents: []vertexChatContent{{
			Role: roleUser,
			Parts: []vertexPart{{
				Text: request.Prompt,
			}},
		}},
		SafetySettings: safetySettings,
		GenerationConfig: vertexChatGenerationConfig{
			Temperature:        1.0,
			MaxOutputTokens:    32768,
			ResponseModalities: []string{"TEXT", "IMAGE"},
			ImageConfig: &vertexImageConfig{
				AspectRatio: aspectRatio,
				ImageSize:   imageSize,
				ImageOutputOptions: &vertexImageOutputOptions{
					MimeType: mimeType,
				},
				PersonGeneration: "ALLOW_ALL",
			},
		},
	}

	return vertexRequest
}

// parseImageSize parses an OpenAI-style size string (e.g. "1024x1024") into Vertex AI's aspectRatio and imageSize.
// Vertex AI supported aspectRatio values: 1:1, 3:2, 2:3, 3:4, 4:3, 4:5, 5:4, 9:16, 16:9, 21:9
// Vertex AI supported imageSize values: 1k, 2k, 4k
func (v *vertexProvider) parseImageSize(size string) (aspectRatio, imageSize string) {
	// Defaults
	aspectRatio = "1:1"
	imageSize = "1k"

	if size == "" {
		return
	}

	// Predefined size mapping (OpenAI standard sizes)
	sizeMapping := map[string]struct {
		aspectRatio string
		imageSize   string
	}{
		// OpenAI DALL-E standard sizes
		"256x256":   {"1:1", "1k"},
		"512x512":   {"1:1", "1k"},
		"1024x1024": {"1:1", "1k"},
		"1792x1024": {"16:9", "2k"},
		"1024x1792": {"9:16", "2k"},
		// Extended size support
		"2048x2048": {"1:1", "2k"},
		"4096x4096": {"1:1", "4k"},
		// 3:2 and 2:3 ratios
		"1536x1024": {"3:2", "2k"},
		"1024x1536": {"2:3", "2k"},
		// 4:3 and 3:4 ratios
		"1024x768":  {"4:3", "1k"},
		"768x1024":  {"3:4", "1k"},
		"1365x1024": {"4:3", "1k"},
		"1024x1365": {"3:4", "1k"},
		// 5:4 and 4:5 ratios
		"1280x1024": {"5:4", "1k"},
		"1024x1280": {"4:5", "1k"},
		// 21:9 ultrawide ratio
		"2560x1080": {"21:9", "2k"},
	}

	if mapping, ok := sizeMapping[size]; ok {
		return mapping.aspectRatio, mapping.imageSize
	}

	return
}

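The table-plus-default behavior of parseImageSize can be exercised in isolation. The sketch below re-declares a reduced copy of the mapping as a free function (outside the provider type, with a shortened table) purely for illustration:

```go
package main

import "fmt"

// parseImageSize mirrors the mapping logic above: known OpenAI-style
// "WxH" strings map to an (aspectRatio, imageSize) pair; anything
// unrecognized falls back to the "1:1"/"1k" defaults.
func parseImageSize(size string) (aspectRatio, imageSize string) {
	aspectRatio, imageSize = "1:1", "1k"
	if size == "" {
		return
	}
	sizeMapping := map[string][2]string{
		"1024x1024": {"1:1", "1k"},
		"1792x1024": {"16:9", "2k"},
		"1024x1792": {"9:16", "2k"},
		"2560x1080": {"21:9", "2k"},
	}
	if m, ok := sizeMapping[size]; ok {
		return m[0], m[1]
	}
	return
}

func main() {
	ar, sz := parseImageSize("1792x1024")
	fmt.Println(ar, sz) // 16:9 2k
	ar, sz = parseImageSize("999x999") // unknown size -> defaults
	fmt.Println(ar, sz) // 1:1 1k
}
```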
func (v *vertexProvider) OnStreamingResponseBody(ctx wrapper.HttpContext, name ApiName, chunk []byte, isLastChunk bool) ([]byte, error) {
	// OpenAI compatible mode: pass the response through, but decode Unicode escape sequences.
	// The Vertex AI OpenAI-compatible API returns ASCII-safe JSON, encoding non-ASCII characters as \uXXXX.
	if ctx.GetContext(contextOpenAICompatibleMarker) != nil && ctx.GetContext(contextOpenAICompatibleMarker).(bool) {
		return util.DecodeUnicodeEscapesInSSE(chunk), nil
	}

	if ctx.GetContext(contextClaudeMarker) != nil && ctx.GetContext(contextClaudeMarker).(bool) {
		return v.claude.OnStreamingResponseBody(ctx, name, chunk, isLastChunk)
	}
@@ -260,13 +538,25 @@ func (v *vertexProvider) OnStreamingResponseBody(ctx wrapper.HttpContext, name A
}

func (v *vertexProvider) TransformResponseBody(ctx wrapper.HttpContext, apiName ApiName, body []byte) ([]byte, error) {
	// OpenAI compatible mode: pass the response through, but decode Unicode escape sequences.
	// The Vertex AI OpenAI-compatible API returns ASCII-safe JSON, encoding non-ASCII characters as \uXXXX.
	if ctx.GetContext(contextOpenAICompatibleMarker) != nil && ctx.GetContext(contextOpenAICompatibleMarker).(bool) {
		return util.DecodeUnicodeEscapes(body), nil
	}

	if ctx.GetContext(contextClaudeMarker) != nil && ctx.GetContext(contextClaudeMarker).(bool) {
		return v.claude.TransformResponseBody(ctx, apiName, body)
	}
-	if apiName == ApiNameChatCompletion {
+	switch apiName {
+	case ApiNameChatCompletion:
 		return v.onChatCompletionResponseBody(ctx, body)
-	} else {
+	case ApiNameEmbeddings:
 		return v.onEmbeddingsResponseBody(ctx, body)
+	case ApiNameImageGeneration:
+		return v.onImageGenerationResponseBody(ctx, body)
+	default:
+		return body, nil
	}
}

@@ -359,6 +649,54 @@ func (v *vertexProvider) buildEmbeddingsResponse(ctx wrapper.HttpContext, vertex
	return &response
}

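The implementation of util.DecodeUnicodeEscapes is not part of this diff; the sketch below is only a rough stdlib illustration of what decoding \uXXXX sequences involves. It deliberately ignores UTF-16 surrogate pairs (e.g. emoji) and escaped backslashes, which a production decoder would have to handle:

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
)

// decodeUnicodeEscapes replaces literal \uXXXX sequences with the runes
// they encode. Illustration only: surrogate pairs and pre-escaped
// backslashes are not handled here.
func decodeUnicodeEscapes(s string) string {
	re := regexp.MustCompile(`\\u[0-9a-fA-F]{4}`)
	return re.ReplaceAllStringFunc(s, func(m string) string {
		n, err := strconv.ParseUint(m[2:], 16, 32)
		if err != nil {
			return m
		}
		return string(rune(n))
	})
}

func main() {
	fmt.Println(decodeUnicodeEscapes(`{"content":"\u4f60\u597d"}`))
	// {"content":"你好"}
}
```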
func (v *vertexProvider) onImageGenerationResponseBody(ctx wrapper.HttpContext, body []byte) ([]byte, error) {
	// Use gjson to extract fields directly, avoiding a full deserialization of large base64 payloads.
	// This significantly reduces memory allocations and copies.
	response := v.buildImageGenerationResponseFromJSON(body)
	return json.Marshal(response)
}

// buildImageGenerationResponseFromJSON extracts the image generation response from the raw JSON using gjson.
// Compared with a full json.Unmarshal, this approach is more memory-efficient.
func (v *vertexProvider) buildImageGenerationResponseFromJSON(body []byte) *imageGenerationResponse {
	result := gjson.ParseBytes(body)
	data := make([]imageGenerationData, 0)

	// Iterate over all candidates and extract the image data
	candidates := result.Get("candidates")
	candidates.ForEach(func(_, candidate gjson.Result) bool {
		parts := candidate.Get("content.parts")
		parts.ForEach(func(_, part gjson.Result) bool {
			// Skip reasoning parts (thought: true)
			if part.Get("thought").Bool() {
				return true
			}
			// Extract the image data
			inlineData := part.Get("inlineData.data")
			if inlineData.Exists() && inlineData.String() != "" {
				data = append(data, imageGenerationData{
					B64: inlineData.String(),
				})
			}
			return true
		})
		return true
	})

	// Extract usage information
	usage := result.Get("usageMetadata")

	return &imageGenerationResponse{
		Created: time.Now().UnixMilli() / 1000,
		Data:    data,
		Usage: &imageGenerationUsage{
			TotalTokens:  int(usage.Get("totalTokenCount").Int()),
			InputTokens:  int(usage.Get("promptTokenCount").Int()),
			OutputTokens: int(usage.Get("candidatesTokenCount").Int()),
		},
	}
}

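The same selective extraction can be approximated with the standard library by declaring only the fields that are read; struct names below are hypothetical and the Vertex response is reduced to the fields used above. Note that unlike gjson, encoding/json still scans the whole document, so this is an analogue of the access pattern rather than of the allocation profile:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Only the fields we need are declared; unknown fields are skipped by
// encoding/json -- the same idea the gjson version exploits: never
// materialize what you don't read.
type part struct {
	Thought    bool `json:"thought"`
	InlineData *struct {
		Data string `json:"data"`
	} `json:"inlineData"`
}

type candidate struct {
	Content struct {
		Parts []part `json:"parts"`
	} `json:"content"`
}

type response struct {
	Candidates []candidate `json:"candidates"`
}

// extractImages collects base64 image payloads, skipping reasoning
// parts, mirroring the thought check above.
func extractImages(raw []byte) ([]string, error) {
	var resp response
	if err := json.Unmarshal(raw, &resp); err != nil {
		return nil, err
	}
	var images []string
	for _, c := range resp.Candidates {
		for _, p := range c.Content.Parts {
			if p.Thought || p.InlineData == nil || p.InlineData.Data == "" {
				continue
			}
			images = append(images, p.InlineData.Data)
		}
	}
	return images, nil
}

func main() {
	raw := []byte(`{"candidates":[{"content":{"parts":[
		{"thought":true,"text":"planning"},
		{"inlineData":{"data":"aGVsbG8="}}
	]}}]}`)
	images, err := extractImages(raw)
	if err != nil {
		panic(err)
	}
	fmt.Println(images) // [aGVsbG8=]
}
```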
func (v *vertexProvider) buildChatCompletionStreamResponse(ctx wrapper.HttpContext, vertexResp *vertexChatResponse) *chatCompletionResponse {
	var choice chatCompletionChoice
	choice.Delta = &chatMessage{}
@@ -422,19 +760,62 @@ func (v *vertexProvider) getAhthropicRequestPath(apiName ApiName, modelId string
	} else {
		action = vertexAnthropicMessageAction
	}
-	return fmt.Sprintf(vertexPathAnthropicTemplate, v.config.vertexProjectId, v.config.vertexRegion, modelId, action)
+
+	if v.isExpressMode() {
+		// Express Mode: simplified path + API key parameter
+		basePath := fmt.Sprintf(vertexExpressPathAnthropicTemplate, modelId, action)
+		apiKey := v.config.GetRandomToken()
+		// If the action already contains "?", join with "&"
+		var fullPath string
+		if strings.Contains(action, "?") {
+			fullPath = basePath + "&key=" + apiKey
+		} else {
+			fullPath = basePath + "?key=" + apiKey
+		}
+		return fullPath
+	}
+
+	path := fmt.Sprintf(vertexPathAnthropicTemplate, v.config.vertexProjectId, v.config.vertexRegion, modelId, action)
+	return path
}

func (v *vertexProvider) getRequestPath(apiName ApiName, modelId string, stream bool) string {
	action := ""
-	if apiName == ApiNameEmbeddings {
+	switch apiName {
+	case ApiNameEmbeddings:
 		action = vertexEmbeddingAction
-	} else if stream {
-		action = vertexChatCompletionStreamAction
-	} else {
+	case ApiNameImageGeneration:
+		// Image generation uses the non-streaming endpoint; a complete response is required
 		action = vertexChatCompletionAction
+	default:
+		if stream {
+			action = vertexChatCompletionStreamAction
+		} else {
+			action = vertexChatCompletionAction
+		}
	}
-	return fmt.Sprintf(vertexPathTemplate, v.config.vertexProjectId, v.config.vertexRegion, modelId, action)
+
+	if v.isExpressMode() {
+		// Express Mode: simplified path + API key parameter
+		basePath := fmt.Sprintf(vertexExpressPathTemplate, modelId, action)
+		apiKey := v.config.GetRandomToken()
+		// If the action already contains "?" (e.g. streamGenerateContent?alt=sse), join with "&"
+		var fullPath string
+		if strings.Contains(action, "?") {
+			fullPath = basePath + "&key=" + apiKey
+		} else {
+			fullPath = basePath + "?key=" + apiKey
+		}
+		return fullPath
+	}
+
+	path := fmt.Sprintf(vertexPathTemplate, v.config.vertexProjectId, v.config.vertexRegion, modelId, action)
+	return path
}

// getOpenAICompatibleRequestPath returns the request path for OpenAI compatible mode
func (v *vertexProvider) getOpenAICompatibleRequestPath() string {
	return fmt.Sprintf(vertexOpenAICompatiblePathTemplate, v.config.vertexProjectId, v.config.vertexRegion)
}

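The `?` vs `&` joining rule used when appending the Express Mode key parameter is easy to get wrong. A minimal standalone sketch of the same logic (paths and key value hypothetical; the original checks the action string, which is equivalent here since the action is the only part of the path that can carry a query):

```go
package main

import (
	"fmt"
	"strings"
)

// appendKey joins a key parameter onto a path, using "&" when the path
// already carries a query string (e.g. ends in
// "streamGenerateContent?alt=sse") and "?" otherwise.
func appendKey(basePath, apiKey string) string {
	if strings.Contains(basePath, "?") {
		return basePath + "&key=" + apiKey
	}
	return basePath + "?key=" + apiKey
}

func main() {
	fmt.Println(appendKey("/v1/models/m:streamGenerateContent?alt=sse", "K"))
	// /v1/models/m:streamGenerateContent?alt=sse&key=K
	fmt.Println(appendKey("/v1/models/m:generateContent", "K"))
	// /v1/models/m:generateContent?key=K
}
```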
func (v *vertexProvider) buildVertexChatRequest(request *chatCompletionRequest) *vertexChatRequest {
@@ -521,7 +902,7 @@ func (v *vertexProvider) buildVertexChatRequest(request *chatCompletionRequest)
				})
			}
		case contentTypeImageUrl:
-			vpart, err := convertImageContent(part.ImageUrl.Url)
+			vpart, err := convertMediaContent(part.ImageUrl.Url)
			if err != nil {
				log.Errorf("unable to convert image content: %v", err)
			} else {
@@ -636,12 +1017,25 @@ type vertexChatSafetySetting struct {
}

type vertexChatGenerationConfig struct {
-	Temperature     float64              `json:"temperature,omitempty"`
-	TopP            float64              `json:"topP,omitempty"`
-	TopK            int                  `json:"topK,omitempty"`
-	CandidateCount  int                  `json:"candidateCount,omitempty"`
-	MaxOutputTokens int                  `json:"maxOutputTokens,omitempty"`
-	ThinkingConfig  vertexThinkingConfig `json:"thinkingConfig,omitempty"`
+	Temperature        float64              `json:"temperature,omitempty"`
+	TopP               float64              `json:"topP,omitempty"`
+	TopK               int                  `json:"topK,omitempty"`
+	CandidateCount     int                  `json:"candidateCount,omitempty"`
+	MaxOutputTokens    int                  `json:"maxOutputTokens,omitempty"`
+	ThinkingConfig     vertexThinkingConfig `json:"thinkingConfig,omitempty"`
+	ResponseModalities []string             `json:"responseModalities,omitempty"`
+	ImageConfig        *vertexImageConfig   `json:"imageConfig,omitempty"`
}

type vertexImageConfig struct {
	AspectRatio        string                    `json:"aspectRatio,omitempty"`
	ImageSize          string                    `json:"imageSize,omitempty"`
	ImageOutputOptions *vertexImageOutputOptions `json:"imageOutputOptions,omitempty"`
	PersonGeneration   string                    `json:"personGeneration,omitempty"`
}

type vertexImageOutputOptions struct {
	MimeType string `json:"mimeType,omitempty"`
}

type vertexThinkingConfig struct {

@@ -852,32 +1246,106 @@ func setCachedAccessToken(key string, accessToken string, expireTime int64) erro
	return proxywasm.SetSharedData(key, data, cas)
}

-func convertImageContent(imageUrl string) (vertexPart, error) {
+// convertMediaContent converts an OpenAI-format media URL into the Vertex AI format.
+// It supports images, video, audio, and other media types.
+func convertMediaContent(mediaUrl string) (vertexPart, error) {
	part := vertexPart{}
-	if strings.HasPrefix(imageUrl, "http") {
-		arr := strings.Split(imageUrl, ".")
-		mimeType := "image/" + arr[len(arr)-1]
+	if strings.HasPrefix(mediaUrl, "http") {
+		mimeType := detectMimeTypeFromURL(mediaUrl)
		part.FileData = &fileData{
			MimeType: mimeType,
-			FileUri:  imageUrl,
+			FileUri:  mediaUrl,
		}
		return part, nil
	} else {
		// Base64 data URL format: data:<mimeType>;base64,<data>
		re := regexp.MustCompile(`^data:([^;]+);base64,`)
-		matches := re.FindStringSubmatch(imageUrl)
+		matches := re.FindStringSubmatch(mediaUrl)
		if len(matches) < 2 {
-			return part, fmt.Errorf("invalid base64 format")
+			return part, fmt.Errorf("invalid base64 format, expected data:<mimeType>;base64,<data>")
		}

-		mimeType := matches[1] // e.g. image/png
+		mimeType := matches[1] // e.g. image/png, video/mp4, audio/mp3
		parts := strings.Split(mimeType, "/")
		if len(parts) < 2 {
-			return part, fmt.Errorf("invalid mimeType")
+			return part, fmt.Errorf("invalid mimeType: %s", mimeType)
		}
		part.InlineData = &blob{
			MimeType: mimeType,
-			Data:     strings.TrimPrefix(imageUrl, matches[0]),
+			Data:     strings.TrimPrefix(mediaUrl, matches[0]),
		}
		return part, nil
	}
}

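The data-URL branch above can be exercised on its own; a minimal sketch of the same regex split (helper name hypothetical):

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// splitDataURL splits a "data:<mimeType>;base64,<data>" URL into its
// MIME type and payload, mirroring the regex used by convertMediaContent.
func splitDataURL(u string) (mimeType, data string, err error) {
	re := regexp.MustCompile(`^data:([^;]+);base64,`)
	m := re.FindStringSubmatch(u)
	if len(m) < 2 {
		return "", "", fmt.Errorf("invalid base64 format, expected data:<mimeType>;base64,<data>")
	}
	return m[1], strings.TrimPrefix(u, m[0]), nil
}

func main() {
	mime, data, err := splitDataURL("data:image/png;base64,iVBORw0KGgo=")
	fmt.Println(mime, data, err) // image/png iVBORw0KGgo= <nil>
}
```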
// detectMimeTypeFromURL detects the MIME type from the URL's file extension.
// It supports image, video, audio, and document types.
func detectMimeTypeFromURL(url string) string {
	// Strip the query string and fragment identifier
	if idx := strings.Index(url, "?"); idx != -1 {
		url = url[:idx]
	}
	if idx := strings.Index(url, "#"); idx != -1 {
		url = url[:idx]
	}

	// Take the last path segment
	lastSlash := strings.LastIndex(url, "/")
	if lastSlash != -1 {
		url = url[lastSlash+1:]
	}

	// Take the extension
	lastDot := strings.LastIndex(url, ".")
	if lastDot == -1 || lastDot == len(url)-1 {
		return "application/octet-stream"
	}
	ext := strings.ToLower(url[lastDot+1:])

	// Extension-to-MIME-type mapping
	mimeTypes := map[string]string{
		// Image formats
		"jpg":  "image/jpeg",
		"jpeg": "image/jpeg",
		"png":  "image/png",
		"gif":  "image/gif",
		"webp": "image/webp",
		"bmp":  "image/bmp",
		"svg":  "image/svg+xml",
		"ico":  "image/x-icon",
		"heic": "image/heic",
		"heif": "image/heif",
		"tiff": "image/tiff",
		"tif":  "image/tiff",
		// Video formats
		"mp4":  "video/mp4",
		"mpeg": "video/mpeg",
		"mpg":  "video/mpeg",
		"mov":  "video/quicktime",
		"avi":  "video/x-msvideo",
		"wmv":  "video/x-ms-wmv",
		"webm": "video/webm",
		"mkv":  "video/x-matroska",
		"flv":  "video/x-flv",
		"3gp":  "video/3gpp",
		"3g2":  "video/3gpp2",
		"m4v":  "video/x-m4v",
		// Audio formats
		"mp3":  "audio/mpeg",
		"wav":  "audio/wav",
		"ogg":  "audio/ogg",
		"flac": "audio/flac",
		"aac":  "audio/aac",
		"m4a":  "audio/mp4",
		"wma":  "audio/x-ms-wma",
		"opus": "audio/opus",
		// Document formats
		"pdf": "application/pdf",
	}

	if mimeType, ok := mimeTypes[ext]; ok {
		return mimeType
	}

	return "application/octet-stream"
}

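The steps above (strip query/fragment, isolate the last path segment, look up the lowercased extension) can be sketched as a standalone program with a reduced table. Go's standard library also ships `mime.TypeByExtension`, which covers many of the same extensions, though a fixed table keeps the output deterministic across platforms:

```go
package main

import (
	"fmt"
	"strings"
)

// detectMime mirrors detectMimeTypeFromURL above on a reduced table:
// strip query/fragment, take the last path segment's extension, and
// look it up, defaulting to application/octet-stream.
func detectMime(url string) string {
	if i := strings.IndexAny(url, "?#"); i != -1 {
		url = url[:i]
	}
	if i := strings.LastIndex(url, "/"); i != -1 {
		url = url[i+1:]
	}
	i := strings.LastIndex(url, ".")
	if i == -1 || i == len(url)-1 {
		return "application/octet-stream"
	}
	table := map[string]string{
		"jpg": "image/jpeg",
		"png": "image/png",
		"mp4": "video/mp4",
		"mp3": "audio/mpeg",
	}
	if t, ok := table[strings.ToLower(url[i+1:])]; ok {
		return t
	}
	return "application/octet-stream"
}

func main() {
	fmt.Println(detectMime("https://cdn.example.com/v/clip.MP4?sig=abc")) // video/mp4
	fmt.Println(detectMime("https://cdn.example.com/readme"))             // application/octet-stream
}
```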
@@ -8,11 +8,15 @@ import (
	"github.com/alibaba/higress/plugins/wasm-go/extensions/ai-proxy/util"
	"github.com/higress-group/proxy-wasm-go-sdk/proxywasm/types"
	"github.com/higress-group/wasm-go/pkg/wrapper"
	"github.com/tidwall/gjson"
	"github.com/tidwall/sjson"
)

const (
-	zhipuAiDomain                = "open.bigmodel.cn"
+	zhipuAiDefaultDomain         = "open.bigmodel.cn"
	zhipuAiInternationalDomain   = "api.z.ai"
	zhipuAiChatCompletionPath    = "/api/paas/v4/chat/completions"
	zhipuAiCodePlanPath          = "/api/coding/paas/v4/chat/completions"
	zhipuAiEmbeddingsPath        = "/api/paas/v4/embeddings"
	zhipuAiAnthropicMessagesPath = "/api/anthropic/v1/messages"
)

@@ -26,16 +30,20 @@ func (m *zhipuAiProviderInitializer) ValidateConfig(config *ProviderConfig) erro
	return nil
}

-func (m *zhipuAiProviderInitializer) DefaultCapabilities() map[string]string {
+func (m *zhipuAiProviderInitializer) DefaultCapabilities(codePlanMode bool) map[string]string {
+	chatPath := zhipuAiChatCompletionPath
+	if codePlanMode {
+		chatPath = zhipuAiCodePlanPath
+	}
	return map[string]string{
-		string(ApiNameChatCompletion): zhipuAiChatCompletionPath,
+		string(ApiNameChatCompletion): chatPath,
		string(ApiNameEmbeddings):     zhipuAiEmbeddingsPath,
		// string(ApiNameAnthropicMessages): zhipuAiAnthropicMessagesPath,
	}
}

func (m *zhipuAiProviderInitializer) CreateProvider(config ProviderConfig) (Provider, error) {
-	config.setDefaultCapabilities(m.DefaultCapabilities())
+	config.setDefaultCapabilities(m.DefaultCapabilities(config.zhipuCodePlanMode))
	return &zhipuAiProvider{
		config:       config,
		contextCache: createContextCache(&config),

@@ -65,13 +73,35 @@ func (m *zhipuAiProvider) OnRequestBody(ctx wrapper.HttpContext, apiName ApiName

func (m *zhipuAiProvider) TransformRequestHeaders(ctx wrapper.HttpContext, apiName ApiName, headers http.Header) {
	util.OverwriteRequestPathHeaderByCapability(headers, string(apiName), m.config.capabilities)
-	util.OverwriteRequestHostHeader(headers, zhipuAiDomain)
+	// Use configured domain or default to China domain
+	domain := m.config.zhipuDomain
+	if domain == "" {
+		domain = zhipuAiDefaultDomain
+	}
+	util.OverwriteRequestHostHeader(headers, domain)
	util.OverwriteRequestAuthorizationHeader(headers, "Bearer "+m.config.GetApiTokenInUse(ctx))
	headers.Del("Content-Length")
}

func (m *zhipuAiProvider) TransformRequestBody(ctx wrapper.HttpContext, apiName ApiName, body []byte) ([]byte, error) {
	if apiName != ApiNameChatCompletion {
		return m.config.defaultTransformRequestBody(ctx, apiName, body)
	}

	// Check if reasoning_effort is set
	reasoningEffort := gjson.GetBytes(body, "reasoning_effort").String()
	if reasoningEffort != "" {
		// Add thinking config for ZhipuAI
		body, _ = sjson.SetBytes(body, "thinking", map[string]string{"type": "enabled"})
		// Remove reasoning_effort field as ZhipuAI doesn't recognize it
		body, _ = sjson.DeleteBytes(body, "reasoning_effort")
	}

	return m.config.defaultTransformRequestBody(ctx, apiName, body)
}

func (m *zhipuAiProvider) GetApiName(path string) ApiName {
-	if strings.Contains(path, zhipuAiChatCompletionPath) {
+	if strings.Contains(path, zhipuAiChatCompletionPath) || strings.Contains(path, zhipuAiCodePlanPath) {
		return ApiNameChatCompletion
	}
	if strings.Contains(path, zhipuAiEmbeddingsPath) {

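The reasoning_effort rewrite above uses gjson/sjson; the same transformation can be sketched with the standard library alone (map-based, so the original field order is not preserved and json.Marshal emits keys alphabetically):

```go
package main

import (
	"encoding/json"
	"fmt"
)

// rewriteReasoningEffort mimics the TransformRequestBody change above:
// when reasoning_effort is present, add ZhipuAI's thinking config and
// drop the field ZhipuAI does not recognize.
func rewriteReasoningEffort(body []byte) ([]byte, error) {
	var m map[string]any
	if err := json.Unmarshal(body, &m); err != nil {
		return nil, err
	}
	if v, ok := m["reasoning_effort"].(string); ok && v != "" {
		m["thinking"] = map[string]string{"type": "enabled"}
		delete(m, "reasoning_effort")
	}
	return json.Marshal(m)
}

func main() {
	out, err := rewriteReasoningEffort([]byte(`{"model":"glm-4","reasoning_effort":"high"}`))
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out))
	// {"model":"glm-4","thinking":{"type":"enabled"}}
}
```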
@@ -343,7 +343,7 @@ func RunAzureOnHttpRequestHeadersTests(t *testing.T) {
		// Verify that the path was handled correctly
		pathValue, hasPath := test.GetHeaderValue(requestHeaders, ":path")
		require.True(t, hasPath, "Path header should exist")
-		require.Contains(t, pathValue, "/openai/deployments/test-deployment/chat/completions", "Path should contain Azure deployment path")
+		require.Equal(t, "/openai/deployments/test-deployment/chat/completions?api-version=2024-02-15-preview", pathValue, "Path should equal Azure deployment path")

		// Verify that Content-Length was removed
		_, hasContentLength := test.GetHeaderValue(requestHeaders, "Content-Length")
@@ -443,8 +443,7 @@ func RunAzureOnHttpRequestBodyTests(t *testing.T) {
		requestHeaders := host.GetRequestHeaders()
		pathValue, hasPath := test.GetHeaderValue(requestHeaders, ":path")
		require.True(t, hasPath, "Path header should exist")
-		require.Contains(t, pathValue, "/openai/deployments/test-deployment/chat/completions", "Path should contain Azure deployment path")
-		require.Contains(t, pathValue, "api-version=2024-02-15-preview", "Path should contain API version")
+		require.Equal(t, pathValue, "/openai/deployments/test-deployment/chat/completions?api-version=2024-02-15-preview", "Path should contain Azure deployment path")
	})

	// Test Azure OpenAI request body handling (different models)
@@ -577,7 +576,7 @@ func RunAzureOnHttpRequestBodyTests(t *testing.T) {
		requestHeaders := host.GetRequestHeaders()
		pathValue, hasPath := test.GetHeaderValue(requestHeaders, ":path")
		require.True(t, hasPath, "Path header should exist")
-		require.Contains(t, pathValue, "/openai/deployments/deployment-only/chat/completions", "Path should use default deployment")
+		require.Equal(t, pathValue, "/openai/deployments/deployment-only/chat/completions?api-version=2024-02-15-preview", "Path should use default deployment")
	})

	// Test Azure OpenAI request body handling (domain-only configuration)
@@ -613,7 +612,42 @@ func RunAzureOnHttpRequestBodyTests(t *testing.T) {
		requestHeaders := host.GetRequestHeaders()
		pathValue, hasPath := test.GetHeaderValue(requestHeaders, ":path")
		require.True(t, hasPath, "Path header should exist")
-		require.Contains(t, pathValue, "/openai/deployments/gpt-3.5-turbo/chat/completions", "Path should use model from request body")
+		require.Equal(t, pathValue, "/openai/deployments/gpt-3.5-turbo/chat/completions?api-version=2024-02-15-preview", "Path should use model from request body")
	})

	// Test Azure OpenAI model-independent request handling (domain-only configuration)
	t.Run("azure domain only model independent", func(t *testing.T) {
		host, status := test.NewTestHost(azureDomainOnlyConfig)
		defer host.Reset()
		require.Equal(t, types.OnPluginStartStatusOK, status)

		// Set request headers
		action := host.CallOnHttpRequestHeaders([][2]string{
			{":authority", "example.com"},
			{":path", "/v1/files?limit=10&purpose=assistants"},
			{":method", "GET"},
		})
		require.Equal(t, types.HeaderStopIteration, action)

		// Verify that the request path has api-version appended
		requestHeaders := host.GetRequestHeaders()
		pathValue, hasPath := test.GetHeaderValue(requestHeaders, ":path")
		require.True(t, hasPath, "Path header should exist")
		require.Equal(t, pathValue, "/openai/files?limit=10&purpose=assistants&api-version=2024-02-15-preview", "Path should have api-version appended")

		// Set request headers
		action = host.CallOnHttpRequestHeaders([][2]string{
			{":authority", "example.com"},
			{":path", "/v1/files?"},
			{":method", "GET"},
		})
		require.Equal(t, types.HeaderStopIteration, action)

		// Verify that the request path has api-version appended
		requestHeaders = host.GetRequestHeaders()
		pathValue, hasPath = test.GetHeaderValue(requestHeaders, ":path")
		require.True(t, hasPath, "Path header should exist")
		require.Equal(t, pathValue, "/openai/files?api-version=2024-02-15-preview", "Path should have api-version appended")
	})
})
}
@@ -827,10 +861,8 @@ func RunAzureBasePathHandlingTests(t *testing.T) {
		require.NotContains(t, pathValue, "/azure-gpt4",
			"After body stage: basePath should be removed from path")
		// Under the openai protocol, the path is converted to Azure's path format
-		require.Contains(t, pathValue, "/openai/deployments/gpt-4/chat/completions",
+		require.Equal(t, pathValue, "/openai/deployments/gpt-4/chat/completions?api-version=2024-02-15-preview",
			"Path should be transformed to Azure format")
-		require.Contains(t, pathValue, "api-version=2024-02-15-preview",
-			"Path should contain API version")
	})

	// Test that basePath prepend works correctly under the original protocol