mirror of
https://github.com/alibaba/higress.git
synced 2026-05-28 14:47:29 +08:00
feat(ai-proxy): add cooldownDuration support for failover token recovery (#3700)
Signed-off-by: wydream <yaodiwu618@gmail.com>
Signed-off-by: woody <yaodiwu618@gmail.com>
@@ -51,6 +51,7 @@ Plugin execution priority: `100`
| `protocol` | string | Optional | - | API contract provided by the plugin. Supported values: `openai` (default; uses OpenAI's interface contract) and `original` (uses the raw interface contract of the target service provider). **Note: the protocol is now detected automatically, so this field does not need to be set to support both the OpenAI and Claude protocols.** |
| `context` | object | Optional | - | Configuration for AI conversation context information. |
| `customSettings` | array of customSetting | Optional | - | Overrides or fills in parameters for AI requests. |
| `failover` | object | Optional | - | Configures apiToken failover. When an apiToken becomes unavailable, it is removed from the available token list and restored after a successful health check or after the cooldown period expires. |
| `subPath` | string | Optional | - | If `subPath` is configured, this prefix is removed from the request path before further processing. |
| `contextCleanupCommands` | array of string | Optional | - | List of context cleanup commands. When a user message in the request exactly matches any of the configured commands, that message and all non-system messages before it are removed, keeping only system messages and messages after the command. This lets users actively clear the conversation history. |
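
The `contextCleanupCommands` behavior described in the table can be illustrated with a minimal Python sketch. This is not the plugin's implementation: the function name `apply_cleanup` and the message shape (`role`/`content` dictionaries) are assumptions, and so is the choice to cut at the last matching command when several are present, which the description does not specify.

```python
def apply_cleanup(messages, commands):
    """Return messages with history cleared at a cleanup command.

    Assumption: if several user messages match, the LAST match is the cut point.
    """
    cut = -1
    for i, msg in enumerate(messages):
        # A command only counts when a user message matches it exactly.
        if msg.get("role") == "user" and msg.get("content") in commands:
            cut = i
    if cut < 0:
        # No command found: the conversation is left unchanged.
        return list(messages)
    # Keep system messages from before the command, drop everything else there,
    # drop the command itself, and keep all messages after it.
    kept_system = [m for m in messages[:cut] if m.get("role") == "system"]
    return kept_system + messages[cut + 1:]
```

With the hypothetical command `/clear`, a conversation of system prompt, two earlier turns, `/clear`, and a new question reduces to the system prompt plus the new question.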
@@ -84,6 +85,21 @@ The `custom-setting` adheres to the following table, replacing the corresponding

If raw mode is enabled, `custom-setting` will directly alter the JSON content using the input `name` and `value`, without any restrictions or modifications to the parameter names.

For most protocols, `custom-setting` modifies or fills parameters at the root path of the JSON content. For the `qwen` protocol, ai-proxy configures under the `parameters` subpath. For the `gemini` protocol, it configures under the `generation_config` subpath.
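
The placement rule above (root path for most protocols, `parameters` for `qwen`, `generation_config` for `gemini`) can be sketched in Python as follows. This is an illustration only: `apply_custom_setting` and `SUBPATH_BY_PROTOCOL` are hypothetical names, the sketch always overwrites the value, and it uses `name` as given (as in raw mode) rather than modeling per-protocol parameter-name replacement.

```python
# Hypothetical lookup: protocols whose settings live under a subpath.
SUBPATH_BY_PROTOCOL = {
    "qwen": "parameters",
    "gemini": "generation_config",
}


def apply_custom_setting(body, protocol, name, value):
    """Write name=value into the request body at the protocol's location.

    Mutates and returns `body`. Assumption: the name is used as-is.
    """
    subpath = SUBPATH_BY_PROTOCOL.get(protocol)
    if subpath is None:
        target = body  # most protocols: root of the JSON content
    else:
        target = body.setdefault(subpath, {})  # create the subpath if missing
    target[name] = value
    return body
```

For example, setting `max_tokens` to `100` on an empty body yields `{"max_tokens": 100}` for `openai` and `{"parameters": {"max_tokens": 100}}` for `qwen`.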
**Details for the `failover` configuration fields:**

| Name | Data Type | Requirement | Default | Description |
| --- | --- | --- | --- | --- |
| `enabled` | bool | Optional | false | Whether to enable apiToken failover. |
| `failureThreshold` | int | Optional | 3 | Number of consecutive request failures required before triggering failover. |
| `successThreshold` | int | Optional | 1 | Number of successful health checks required before restoring an unavailable apiToken. |
| `healthCheckInterval` | int | Optional | 5000 | Health check interval in milliseconds. |
| `healthCheckTimeout` | int | Optional | 5000 | Health check timeout in milliseconds. |
| `healthCheckModel` | string | Required when failover is enabled unless `cooldownDuration` is configured | - | Model used for health checks. When configured, unavailable apiTokens can be restored after passing health checks. |
| `cooldownDuration` | int | Required when failover is enabled unless `healthCheckModel` is configured | 0 | Cooldown duration in milliseconds after an apiToken becomes unavailable. When greater than 0, the apiToken is restored automatically after the cooldown expires. |
| `failoverOnStatus` | array of string | Optional | `["4.*", "5.*"]` | Response status codes that trigger failover for original requests. Regular expressions are supported. |

At least one of `healthCheckModel` and `cooldownDuration` must be configured when failover is enabled. If both are configured, an apiToken can be restored either by a successful health check or after the cooldown period expires.
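
The validation rule and cooldown-based recovery described above can be sketched in Python. This is a simplified model under stated assumptions, not the plugin's code: the class and function names are hypothetical, status patterns are matched against the whole status code (the source only says regular expressions are supported), and health-check recovery is left out entirely.

```python
import re
from dataclasses import dataclass, field


@dataclass
class FailoverConfig:
    # Field names mirror the table; defaults are the documented ones.
    enabled: bool = False
    failure_threshold: int = 3
    health_check_model: str = ""
    cooldown_duration_ms: int = 0
    failover_on_status: list = field(default_factory=lambda: ["4.*", "5.*"])

    def validate(self):
        # At least one recovery mechanism is needed when failover is enabled.
        if self.enabled and not self.health_check_model and self.cooldown_duration_ms <= 0:
            raise ValueError("set healthCheckModel or cooldownDuration when failover is enabled")


def triggers_failover(cfg, status):
    # Assumption: each pattern must match the entire status code.
    return any(re.fullmatch(p, str(status)) for p in cfg.failover_on_status)


class TokenPool:
    """Hypothetical tracker for available and cooling-down apiTokens."""

    def __init__(self, cfg, tokens):
        self.cfg = cfg
        self.available = list(tokens)
        self.failures = {t: 0 for t in tokens}
        self.unavailable_since = {}  # token -> time it was removed, in ms

    def record_response(self, token, status, now_ms):
        if not triggers_failover(self.cfg, status):
            self.failures[token] = 0  # only consecutive failures count
            return
        self.failures[token] += 1
        if self.failures[token] >= self.cfg.failure_threshold and token in self.available:
            self.available.remove(token)
            self.unavailable_since[token] = now_ms

    def restore_expired(self, now_ms):
        if self.cfg.cooldown_duration_ms <= 0:
            return  # cooldown recovery is disabled
        for token, since in list(self.unavailable_since.items()):
            if now_ms - since >= self.cfg.cooldown_duration_ms:
                del self.unavailable_since[token]
                self.failures[token] = 0
                self.available.append(token)
```

In this model, a token that fails `failureThreshold` times in a row leaves the available list and returns once `cooldownDuration` milliseconds have passed, even when no `healthCheckModel` is configured.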

### Provider-Specific Configurations

#### OpenAI