mirror of
https://github.com/alibaba/higress.git
synced 2026-05-28 14:47:29 +08:00
refactor(ai-cache): update README files to match latest config parsing code (#3730)
This commit is contained in:
@@ -3,46 +3,243 @@ title: AI Cache
|
||||
keywords: [higress,ai cache]
|
||||
description: AI Cache Plugin Configuration Reference
|
||||
---
|
||||
|
||||
## Function Description
|
||||
LLM result caching plugin, the default configuration can be directly used for result caching under the OpenAI protocol, and it supports caching of both streaming and non-streaming responses.
|
||||
|
||||
LLM result caching plugin. The default configuration can be directly used for OpenAI protocol result caching, and supports caching of both streaming and non-streaming responses.
|
||||
|
||||
**Tips**
|
||||
|
||||
When carrying the request header `x-higress-skip-ai-cache: on`, the current request will not use content from the cache but will be directly forwarded to the backend service. Additionally, the response content from this request will not be cached.
|
||||
|
||||
## Runtime Properties
|
||||
|
||||
Plugin Execution Phase: `Authentication Phase`
|
||||
Plugin Execution Priority: `10`
|
||||
|
||||
## Configuration Description
|
||||
| Name | Type | Requirement | Default | Description |
|
||||
| -------- | -------- | -------- | -------- | -------- |
|
||||
| cacheKeyFrom.requestBody | string | optional | "messages.@reverse.0.content" | Extracts a string from the request Body based on [GJSON PATH](https://github.com/tidwall/gjson/blob/master/SYNTAX.md) syntax |
|
||||
| cacheValueFrom.responseBody | string | optional | "choices.0.message.content" | Extracts a string from the response Body based on [GJSON PATH](https://github.com/tidwall/gjson/blob/master/SYNTAX.md) syntax |
|
||||
| cacheStreamValueFrom.responseBody | string | optional | "choices.0.delta.content" | Extracts a string from the streaming response Body based on [GJSON PATH](https://github.com/tidwall/gjson/blob/master/SYNTAX.md) syntax |
|
||||
| cacheKeyPrefix | string | optional | "higress-ai-cache:" | Prefix for the Redis cache key |
|
||||
| cacheTTL | integer | optional | 0 | Cache expiration time in seconds, default value is 0, which means never expire |
|
||||
| redis.serviceName | string | required | - | The complete FQDN name of the Redis service, including the service type, e.g., my-redis.dns, redis.my-ns.svc.cluster.local |
|
||||
| redis.servicePort | integer | optional | 6379 | Redis service port |
|
||||
| redis.timeout | integer | optional | 1000 | Timeout for requests to Redis, in milliseconds |
|
||||
| redis.username | string | optional | - | Username for logging into Redis |
|
||||
| redis.database | int | optional | 0 | The database ID used, limited to Redis, for example, configured as 1, corresponds to `SELECT 1`. |
|
||||
| redis.password | string | optional | - | Password for logging into Redis |
|
||||
| returnResponseTemplate | string | optional | `{"id":"from-cache","choices":[%s],"model":"gpt-4o","object":"chat.completion","usage":{"prompt_tokens":0,"completion_tokens":0,"total_tokens":0}}` | Template for returning HTTP response, with %s marking the part to be replaced by cache value |
|
||||
| returnStreamResponseTemplate | string | optional | `data:{"id":"from-cache","choices":[{"index":0,"delta":{"role":"assistant","content":"%s"},"finish_reason":"stop"}],"model":"gpt-4o","object":"chat.completion","usage":{"prompt_tokens":0,"completion_tokens":0,"total_tokens":0}}\n\ndata:[DONE]\n\n` | Template for returning streaming HTTP response, with %s marking the part to be replaced by cache value |
|
||||
|
||||
The configuration is divided into 3 parts: Vector Database (vector), Text Embedding Service (embedding), and Cache Service (cache). It also provides fine-grained LLM request/response extraction parameter configurations.
|
||||
|
||||
This plugin supports both vector database-based semantic caching and string matching-based caching methods. If both vector database and cache database are configured, the cache database is used first, and the vector database capability is used when cache misses occur.
|
||||
|
||||
*Note*: Vector database (vector) and cache database (cache) cannot both be empty, otherwise this plugin cannot provide caching services.
|
||||
|
||||
| Name | Type | Requirement | Default | Description |
|
||||
| --- | --- | --- | --- | --- |
|
||||
| vector | object | optional | - | Vector storage service configuration, see Vector Database Service section below |
|
||||
| embedding | object | optional | - | Text embedding service configuration, see Text Embedding Service section below |
|
||||
| cache | object | optional | - | Cache service configuration, see Cache Service section below |
|
||||
| cacheKeyStrategy | string | optional | "lastQuestion" | Strategy for generating cache key from historical questions. Options: "lastQuestion" (use last question), "allQuestions" (concatenate all questions), or "disabled" (disable caching) |
|
||||
| enableSemanticCache | bool | optional | false | Whether to enable semantic caching. If disabled, string matching is used to find cache, requiring cache service configuration. Automatically enabled when a vector provider is configured |
|
||||
|
||||
Depending on whether semantic caching is needed, you can configure component combinations as follows:
|
||||
1. `cache`: Enable string matching cache only
|
||||
2. `vector (+ embedding)`: Enable semantic caching. If `vector` does not provide string representation service, you need to configure `embedding` service separately
|
||||
3. `vector (+ embedding) + cache`: Enable semantic caching and use cache service to store LLM responses for acceleration
|
||||
|
||||
If you do not configure a related component, you can ignore the `required` fields of that component.
|
||||
|
||||
## Vector Database Service (vector)
|
||||
|
||||
| Name | Type | Requirement | Default | Description |
|
||||
| --- | --- | --- | --- | --- |
|
||||
| vector.type | string | required | - | Vector storage service provider type, e.g., dashvector, chroma, elasticsearch, weaviate, pinecone, qdrant, milvus |
|
||||
| vector.serviceName | string | required | - | Vector storage service name |
|
||||
| vector.serviceHost | string | optional | - | Vector storage service domain. Required for some providers (e.g., dashvector, pinecone) |
|
||||
| vector.servicePort | int64 | optional | 443 | Vector storage service port |
|
||||
| vector.apiKey | string | optional | - | Vector storage service API Key |
|
||||
| vector.topK | int | optional | 1 | Return TopK results |
|
||||
| vector.timeout | uint32 | optional | 10000 | Timeout for requesting vector storage service, in milliseconds. Default is 10000 (10 seconds) |
|
||||
| vector.collectionID | string | optional | - | Vector storage service Collection ID |
|
||||
| vector.threshold | float64 | optional | 1000 | Vector similarity measurement threshold |
|
||||
| vector.thresholdRelation | string | optional | "lt" | Similarity measurement comparison method. Similarity measurement methods include `Cosine`, `DotProduct`, `Euclidean`, etc. The first two have higher similarity with larger values, while the latter has higher similarity with smaller values. Use `gt` for `Cosine` and `DotProduct`, and `lt` for `Euclidean`. All options include `lt` (less than), `lte` (less than or equal to), `gt` (greater than), `gte` (greater than or equal to) |
|
||||
| vector.esUsername | string | optional | - | ElasticSearch username, only for elasticsearch type |
|
||||
| vector.esPassword | string | optional | - | ElasticSearch password, only for elasticsearch type |
|
||||
|
||||
## Text Embedding Service (embedding)
|
||||
|
||||
| Name | Type | Requirement | Default | Description |
|
||||
| --- | --- | --- | --- | --- |
|
||||
| embedding.type | string | required | - | Text embedding service type, e.g., dashscope, openai, azure, cohere, ollama, huggingface, textin, xfyun |
|
||||
| embedding.serviceName | string | required | - | Text embedding service name |
|
||||
| embedding.serviceHost | string | optional | - | Text embedding service domain |
|
||||
| embedding.servicePort | int64 | optional | 443 | Text embedding service port. Default varies by provider; ollama defaults to 11434 |
|
||||
| embedding.timeout | uint32 | optional | 10000 | Timeout for requesting text embedding service, in milliseconds. Default is 10000 (10 seconds) |
|
||||
| embedding.model | string | optional | - | Model name for text embedding service |
|
||||
| embedding.apiKey | string | optional | - | API Key for text embedding service |
|
||||
|
||||
## Cache Service (cache)
|
||||
|
||||
| Name | Type | Requirement | Default | Description |
|
||||
| --- | --- | --- | --- | --- |
|
||||
| cache.type | string | required | - | Cache service type, e.g., redis |
|
||||
| cache.serviceName | string | required | - | Cache service name |
|
||||
| cache.serviceHost | string | optional | - | Cache service domain |
|
||||
| cache.servicePort | int64 | optional | 6379 | Cache service port. If serviceName ends with .static, default is 80 |
|
||||
| cache.username | string | optional | - | Cache service username |
|
||||
| cache.password | string | optional | - | Cache service password |
|
||||
| cache.timeout | uint32 | optional | 10000 | Cache service timeout, in milliseconds. Default is 10000 (10 seconds) |
|
||||
| cache.cacheTTL | int | optional | 0 | Cache expiration time, in seconds. Default is 0 (never expire) |
|
||||
| cache.cacheKeyPrefix | string | optional | "higress-ai-cache:" | Prefix for cache keys |
|
||||
| cache.database | int | optional | 0 | Database ID to use, only for Redis. For example, configure as 1 for `SELECT 1` |
|
||||
|
||||
## Other Configurations
|
||||
|
||||
| Name | Type | Requirement | Default | Description |
|
||||
| --- | --- | --- | --- | --- |
|
||||
| cacheKeyFrom | string | optional | "messages.@reverse.0.content" | Extract string from request Body using [GJSON PATH](https://github.com/tidwall/gjson/blob/master/SYNTAX.md) syntax |
|
||||
| cacheValueFrom | string | optional | "choices.0.message.content" | Extract string from response Body using [GJSON PATH](https://github.com/tidwall/gjson/blob/master/SYNTAX.md) syntax |
|
||||
| cacheStreamValueFrom | string | optional | "choices.0.delta.content" | Extract string from streaming response Body using [GJSON PATH](https://github.com/tidwall/gjson/blob/master/SYNTAX.md) syntax |
|
||||
| cacheToolCallsFrom | string | optional | "choices.0.delta.content.tool_calls" | Extract string from streaming response Body using [GJSON PATH](https://github.com/tidwall/gjson/blob/master/SYNTAX.md) syntax |
|
||||
| responseTemplate | string | optional | `{"id":"from-cache","choices":[{"index":0,"message":{"role":"assistant","content":"%s"},"finish_reason":"stop"}],"model":"from-cache","object":"chat.completion","usage":{"prompt_tokens":0,"completion_tokens":0,"total_tokens":0}}` | Template for returning HTTP response, with %s marking the part to be replaced by cache value |
|
||||
| streamResponseTemplate | string | optional | `data:{"id":"from-cache","choices":[{"index":0,"delta":{"role":"assistant","content":"%s"},"finish_reason":"stop"}],"model":"from-cache","object":"chat.completion","usage":{"prompt_tokens":0,"completion_tokens":0,"total_tokens":0}}\n\ndata:[DONE]\n\n` | Template for returning streaming HTTP response, with %s marking the part to be replaced by cache value |
|
||||
|
||||
## Text Embedding Provider Specific Configurations
|
||||
|
||||
### Azure OpenAI
|
||||
|
||||
For Azure OpenAI, set `embedding.type` to `azure`. You need to first create an [Azure OpenAI account](https://portal.azure.com/#view/Microsoft_Azure_ProjectOxford/CognitiveServicesHub/~/overview), then select and deploy a model in [Azure AI Foundry](https://ai.azure.com/resource/deployments). Click on your deployed model to see the target URI and key in the endpoint section. Please enter the host from the URI in `embedding.serviceHost` and the key in `embedding.apiKey`.
|
||||
|
||||
A complete URI example is https://YOUR_RESOURCE_NAME.openai.azure.com/openai/deployments/YOUR_DEPLOYMENT_NAME/embeddings?api-version=2024-10-21. You need to enter `YOUR_RESOURCE_NAME.openai.azure.com` in `embedding.serviceHost`.
|
||||
|
||||
Specific configuration fields:
|
||||
|
||||
| Name | Data Type | Requirement | Default | Description |
|
||||
| ---------------------- | -------- | -------- | ------ | ------- |
|
||||
| `embedding.apiVersion` | string | required | - | API version, the api-version value from the obtained URI |
|
||||
|
||||
Note that you must specify `embedding.serviceHost`, such as `YOUR_RESOURCE_NAME.openai.azure.com`. The default model is `text-embedding-ada-002`. For other models, specify in `embedding.model`.
|
||||
|
||||
### Cohere
|
||||
|
||||
For Cohere, set `embedding.type` to `cohere`. There are no specific configuration fields. You need to create an [API Key](https://docs.cohere.com/reference/embed) and enter it in `embedding.apiKey`.
|
||||
|
||||
### OpenAI
|
||||
|
||||
For OpenAI, set `embedding.type` to `openai`. There are no specific configuration fields. You need to create an [API Key](https://platform.openai.com/settings/organization/api-keys) and enter it in `embedding.apiKey`. An API Key example is `sk-xxxxxxx`.
|
||||
|
||||
### Ollama
|
||||
|
||||
For Ollama, set `embedding.type` to `ollama`. There are no specific configuration fields.
|
||||
|
||||
### Hugging Face
|
||||
|
||||
For Hugging Face, set `embedding.type` to `huggingface`. There are no specific configuration fields. You need to create an [hf_token](https://huggingface.co/blog/getting-started-with-embeddings) and enter it in `embedding.apiKey`. An hf_token example is `hf_xxxxxxx`.
|
||||
|
||||
`embedding.model` defaults to `sentence-transformers/all-MiniLM-L6-v2`.
|
||||
|
||||
### DashScope
|
||||
|
||||
For DashScope, set `embedding.type` to `dashscope`. You need to create an [API Key](https://help.aliyun.com/document_detail/2712195.html) and enter it in `embedding.apiKey`.
|
||||
|
||||
`embedding.model` defaults to `text-embedding-v2`. Other models like `text-embedding-v1` can also be used.
|
||||
|
||||
### TextIn
|
||||
|
||||
For TextIn, set `embedding.type` to `textin`. You need to first obtain [`app-id` and `secret-code`](https://www.textin.com/document/acge_text_embedding).
|
||||
|
||||
Specific configuration fields:
|
||||
|
||||
| Name | Data Type | Requirement | Default | Description |
|
||||
| ------------------------------- | -------- | -------- | ------ | ------------------ |
|
||||
| `embedding.textinAppId` | string | required | - | Application ID, obtained app-id |
|
||||
| `embedding.textinSecretCode` | string | required | - | Secret for calling API, obtained secret-code |
|
||||
| `embedding.textinMatryoshkaDim` | int | required | - | Dimension of returned single vector |
|
||||
|
||||
### Xfyun (讯飞星火)
|
||||
|
||||
For Xfyun, set `embedding.type` to `xfyun`. You need to first create an [application](https://console.xfyun.cn/services/emb) to obtain `APPID`, `APISecret`, and `APIKey`, and enter `APIKey` in `embedding.apiKey`.
|
||||
|
||||
Specific configuration fields:
|
||||
|
||||
| Name | Data Type | Requirement | Default | Description |
|
||||
| --------------------- | -------- | -------- | ------ | -------------------- |
|
||||
| `embedding.appId` | string | required | - | Application ID, obtained APPID |
|
||||
| `embedding.apiSecret` | string | required | - | Secret for calling API, obtained APISecret |
|
||||
|
||||
## Vector Database Provider Specific Configurations
|
||||
|
||||
### Chroma
|
||||
|
||||
For Chroma, set `vector.type` to `chroma`. There are no specific configuration fields. You need to create a Collection in advance and fill in the Collection ID in `vector.collectionID`. A Collection ID example is `52bbb8b3-724c-477b-a4ce-d5b578214612`.
|
||||
|
||||
### DashVector
|
||||
|
||||
For DashVector, set `vector.type` to `dashvector`. There are no specific configuration fields. You need to create a Collection in advance and fill in the `Collection Name` in `vector.collectionID`.
|
||||
|
||||
### ElasticSearch
|
||||
|
||||
For ElasticSearch, set `vector.type` to `elasticsearch`. You need to create an Index in advance and fill in the Index Name in `vector.collectionID`.
|
||||
|
||||
It currently relies on the [KNN](https://www.elastic.co/guide/en/elasticsearch/reference/current/knn-search.html) method. Please ensure your ES version supports `KNN`. It has been tested on version `8.16`.
|
||||
|
||||
Specific configuration fields:
|
||||
|
||||
| Name | Data Type | Requirement | Default | Description |
|
||||
|-------------------|----------|----------|--------|-------------------------------------------------------------------------------|
|
||||
| `vector.esUsername` | string | optional | - | ElasticSearch username |
|
||||
| `vector.esPassword` | string | optional | - | ElasticSearch password |
|
||||
|
||||
`vector.esUsername` and `vector.esPassword` are used for Basic authentication. API Key authentication is also supported. When `vector.apiKey` is filled in, API Key authentication is enabled. For SaaS versions, you need to fill in the `encoded` value.
|
||||
|
||||
### Milvus
|
||||
|
||||
For Milvus, set `vector.type` to `milvus`. There are no specific configuration fields. You need to create a Collection in advance and fill in the Collection Name in `vector.collectionID`.
|
||||
|
||||
### Pinecone
|
||||
|
||||
For Pinecone, set `vector.type` to `pinecone`. There are no specific configuration fields. You need to create an Index in advance and fill in the Index access domain in `vector.serviceHost`.
|
||||
|
||||
The `Namespace` parameter in Pinecone is configured through the plugin's `vector.collectionID`. If `vector.collectionID` is not filled in, it defaults to the Default Namespace.
|
||||
|
||||
### Qdrant
|
||||
|
||||
For Qdrant, set `vector.type` to `qdrant`. There are no specific configuration fields. You need to create a Collection in advance and fill in the Collection Name in `vector.collectionID`.
|
||||
|
||||
### Weaviate
|
||||
|
||||
For Weaviate, set `vector.type` to `weaviate`. There are no specific configuration fields. You need to create a Collection in advance and fill in the Collection Name in `vector.collectionID`.
|
||||
|
||||
Note that Weaviate automatically capitalizes the first letter, so when filling in `collectionID`, the first letter should be capitalized.
|
||||
|
||||
If using SaaS, you need to fill in the `vector.serviceHost` parameter.
|
||||
|
||||
## Configuration Example
|
||||
```yaml
|
||||
redis:
|
||||
serviceName: my-redis.dns
|
||||
timeout: 2000
|
||||
|
||||
### Basic Configuration
|
||||
```yaml
|
||||
embedding:
|
||||
type: dashscope
|
||||
serviceName: my_dashscope.dns
|
||||
apiKey: [Your Key]
|
||||
|
||||
vector:
|
||||
type: dashvector
|
||||
serviceName: my_dashvector.dns
|
||||
collectionID: [Your Collection ID]
|
||||
serviceHost: [Your domain]
|
||||
apiKey: [Your key]
|
||||
|
||||
cache:
|
||||
type: redis
|
||||
serviceName: my_redis.dns
|
||||
servicePort: 6379
|
||||
database: 1
|
||||
```
|
||||
timeout: 100
|
||||
```
|
||||
|
||||
## Advanced Usage
|
||||
The current default cache key is based on the GJSON PATH expression: `messages.@reverse.0.content`, meaning to get the content of the first item after reversing the messages array;
|
||||
GJSON PATH supports conditional syntax, for instance, if you want to take the content of the last role as user as the key, it can be written as: `messages.@reverse.#(role=="user").content`;
|
||||
If you want to concatenate all the content with role as user into an array as the key, it can be written as: `messages.@reverse.#(role=="user")#.content`;
|
||||
It also supports pipeline syntax, for example, if you want to take the second role as user as the key, it can be written as: `messages.@reverse.#(role=="user")#.content|1`.
|
||||
For more usage, you can refer to the [official documentation](https://github.com/tidwall/gjson/blob/master/SYNTAX.md) and use the [GJSON Playground](https://gjson.dev/) for syntax testing.
|
||||
|
||||
The current default cache key is extracted based on the GJSON PATH expression: `messages.@reverse.0.content`, which means reversing the messages array and taking the content of the first item.
|
||||
|
||||
GJSON PATH supports conditional syntax. For example, to get the content of the last role as user as the key, you can write: `messages.@reverse.#(role=="user").content`;
|
||||
|
||||
If you want to concatenate all content with role as user into an array as the key, you can write: `messages.@reverse.#(role=="user")#.content`;
|
||||
|
||||
It also supports pipeline syntax. For example, to get the second-to-last role as user as the key, you can write: `messages.@reverse.#(role=="user")#.content|1`.
|
||||
|
||||
For more usage, please refer to the [official documentation](https://github.com/tidwall/gjson/blob/master/SYNTAX.md). You can use the [GJSON Playground](https://gjson.dev/) for syntax testing.
|
||||
|
||||
## FAQ
|
||||
|
||||
1. If the returned error is `error status returned by host: bad argument`, please check if `serviceName` correctly includes the service type suffix (.dns, etc.).
|
||||
|
||||
Reference in New Issue
Block a user