feat(ai-proxy): add cooldownDuration support for failover token recovery (#3700)

Signed-off-by: wydream <yaodiwu618@gmail.com>
Signed-off-by: woody <yaodiwu618@gmail.com>
2026-05-28 14:47:29 +08:00 · 2026-05-20 18:11:11 +08:00
parent e7651f3d3e
commit 739d47ba9c
6 changed files with 890 additions and 48 deletions
--- a/plugins/wasm-go/extensions/ai-proxy/README_EN.md
+++ b/plugins/wasm-go/extensions/ai-proxy/README_EN.md
@@ -51,6 +51,7 @@ Plugin execution priority: `100`
 | `protocol`       | string                 | Optional    | -       | API contract provided by the plugin. Currently supports the following values: openai (default, uses OpenAI's interface contract), original (uses the raw interface contract of the target service provider). **Note: Auto protocol detection is now supported, no need to configure this field to support both OpenAI and Claude protocols**                                                                                                                                                                               |
 | `context`        | object                 | Optional    | -       | Configuration for AI conversation context information                                                                                                                                                                                                                                                                                                                                     |
 | `customSettings` | array of customSetting | Optional    | -       | Specifies overrides or fills parameters for AI requests                                                                                                                                                                                                                                                                                                                                   |
+| `failover`       | object                 | Optional    | -       | Configures apiToken failover. When an apiToken becomes unavailable, it is removed from the available token list and restored after a successful health check or after the cooldown period expires.                                                                                                                                                                                        |
 | `subPath`        | string                 | Optional    | -       | If subPath is configured, the prefix will be removed from the request path before further processing.                                                                                                                                                                                                                                                                                     |
 | `contextCleanupCommands` | array of string | Optional    | -       | List of context cleanup commands. When a user message in the request exactly matches any of the configured commands, that message and all non-system messages before it will be removed, keeping only system messages and messages after the command. This enables users to actively clear conversation history.                                                                           |

@@ -84,6 +85,21 @@ The `custom-setting` adheres to the following table, replacing the corresponding
 If raw mode is enabled, `custom-setting` will directly alter the JSON content using the input `name` and `value`, without any restrictions or modifications to the parameter names.
 For most protocols, `custom-setting` modifies or fills parameters at the root path of the JSON content. For the `qwen` protocol, ai-proxy configures under the `parameters` subpath. For the `gemini` protocol, it configures under the `generation_config` subpath.

+**Details for the `failover` configuration fields:**
+
+| Name                  | Data Type       | Requirement                                      | Default        | Description                                                                                                          |
+| --------------------- | --------------- | ------------------------------------------------ | -------------- | -------------------------------------------------------------------------------------------------------------------- |
+| `enabled`             | bool            | Optional                                         | false          | Whether to enable apiToken failover.                                                                                 |
+| `failureThreshold`    | int             | Optional                                         | 3              | Number of consecutive request failures required before triggering failover.                                           |
+| `successThreshold`    | int             | Optional                                         | 1              | Number of successful health checks required before restoring an unavailable apiToken.                                 |
+| `healthCheckInterval` | int             | Optional                                         | 5000           | Health check interval in milliseconds.                                                                               |
+| `healthCheckTimeout`  | int             | Optional                                         | 5000           | Health check timeout in milliseconds.                                                                                |
+| `healthCheckModel`    | string          | Required when failover is enabled unless `cooldownDuration` is configured | - | Model used for health checks. When configured, unavailable apiTokens can be restored after passing health checks.     |
+| `cooldownDuration`    | int             | Required when failover is enabled unless `healthCheckModel` is configured | 0 | Cooldown duration in milliseconds after an apiToken becomes unavailable. When greater than 0, the apiToken is restored automatically after the cooldown expires. |
+| `failoverOnStatus`    | array of string | Optional                                         | ["4.*", "5.*"] | Response status codes that trigger failover for original requests. Regular expressions are supported.                 |
+
+At least one of `healthCheckModel` and `cooldownDuration` must be configured when failover is enabled. If both are configured, an apiToken can be restored either by a successful health check or after the cooldown period expires.
+
 ### Provider-Specific Configurations

 #### OpenAI
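
For illustration, the `failover` fields documented in the README_EN.md hunks above could be combined as in the following minimal sketch. It relies on cooldown-only recovery, so `healthCheckModel` is omitted and `cooldownDuration` is set to a value greater than 0. The surrounding `provider` block, the `type` and `apiTokens` keys, the placeholder token values, and the 60000 ms cooldown are assumptions chosen for the example and do not come from this commit.

```yaml
# Hypothetical ai-proxy configuration; the keys outside `failover` are assumed.
provider:
  type: openai                 # assumed provider type
  apiTokens:                   # assumed key; placeholder tokens
    - "YOUR_API_TOKEN_1"
    - "YOUR_API_TOKEN_2"
  failover:
    enabled: true
    failureThreshold: 3        # documented default: 3 consecutive failures
    cooldownDuration: 60000    # example value in milliseconds; restore the token after the cooldown
    failoverOnStatus:          # documented default patterns
      - "4.*"
      - "5.*"
```

Per the table above, adding `healthCheckModel` alongside `cooldownDuration` would allow an apiToken to be restored by whichever condition is met first: a successful health check or the cooldown expiring.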