| title | keywords | description |
|---|---|---|
| AI Cache | | AI Cache Plugin Configuration Reference |
## Function Description

LLM result caching plugin. The default configuration can be used directly for OpenAI-protocol result caching, and caching of both streaming and non-streaming responses is supported.
> **Tip**
>
> When a request carries the header `x-higress-skip-ai-cache: on`, it will not use cached content and will be forwarded directly to the backend service; the response to such a request will not be cached either.
## Runtime Properties
Plugin Execution Phase: Authentication Phase
Plugin Execution Priority: 10
## Configuration Description
The configuration is divided into three parts: the vector database (`vector`), the text embedding service (`embedding`), and the cache service (`cache`). It also provides fine-grained parameters for extracting content from LLM requests and responses.

This plugin supports both semantic caching based on a vector database and caching based on string matching. If both a vector database and a cache database are configured, the cache database is tried first, and the vector database is consulted on a cache miss.

Note: `vector` and `cache` cannot both be empty; otherwise this plugin cannot provide any caching service.
| Name | Type | Requirement | Default | Description |
|---|---|---|---|---|
| vector | object | optional | - | Vector storage service configuration, see Vector Database Service section below |
| embedding | object | optional | - | Text embedding service configuration, see Text Embedding Service section below |
| cache | object | optional | - | Cache service configuration, see Cache Service section below |
| cacheKeyStrategy | string | optional | "lastQuestion" | Strategy for generating cache key from historical questions. Options: "lastQuestion" (use last question), "allQuestions" (concatenate all questions), or "disabled" (disable caching) |
| enableSemanticCache | bool | optional | false | Whether to enable semantic caching. If disabled, string matching is used to find cache, requiring cache service configuration. Automatically enabled when a vector provider is configured |
Depending on whether semantic caching is needed, you can configure component combinations as follows:
- `cache`: enable string-matching cache only
- `vector (+ embedding)`: enable semantic caching. If the `vector` provider does not offer a text embedding service, you need to configure an `embedding` service separately
- `vector (+ embedding) + cache`: enable semantic caching and use the cache service to store LLM responses for acceleration
If you do not configure a related component, you can ignore the required fields of that component.
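For example, a minimal string-matching-only setup needs just the `cache` block (the service name below is a placeholder):

```yaml
# String-matching cache only: no vector or embedding configured,
# so enableSemanticCache stays at its default (false).
cache:
  type: redis
  serviceName: my_redis.dns   # placeholder; must include the service type suffix
  servicePort: 6379
  timeout: 1000
```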
### Vector Database Service (`vector`)
| Name | Type | Requirement | Default | Description |
|---|---|---|---|---|
| vector.type | string | required | - | Vector storage service provider type, e.g., dashvector, chroma, elasticsearch, weaviate, pinecone, qdrant, milvus |
| vector.serviceName | string | required | - | Vector storage service name |
| vector.serviceHost | string | optional | - | Vector storage service domain. Required for some providers (e.g., dashvector, pinecone) |
| vector.servicePort | int64 | optional | 443 | Vector storage service port |
| vector.apiKey | string | optional | - | Vector storage service API Key |
| vector.topK | int | optional | 1 | Return TopK results |
| vector.timeout | uint32 | optional | 10000 | Timeout for requesting vector storage service, in milliseconds. Default is 10000 (10 seconds) |
| vector.collectionID | string | optional | - | Vector storage service Collection ID |
| vector.threshold | float64 | optional | 1000 | Vector similarity measurement threshold |
| vector.thresholdRelation | string | optional | "lt" | Similarity measurement comparison method. Similarity measurement methods include Cosine, DotProduct, Euclidean, etc. The first two have higher similarity with larger values, while the latter has higher similarity with smaller values. Use gt for Cosine and DotProduct, and lt for Euclidean. All options include lt (less than), lte (less than or equal to), gt (greater than), gte (greater than or equal to) |
| vector.esUsername | string | optional | - | ElasticSearch username, only for elasticsearch type |
| vector.esPassword | string | optional | - | ElasticSearch password, only for elasticsearch type |
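To make the threshold semantics concrete, here is a minimal Python sketch of the comparison implied by `vector.threshold` and `vector.thresholdRelation`. The names mirror the config fields, but the logic is illustrative only, not the plugin's actual Go implementation:

```python
# Sketch of the cache-hit decision implied by vector.threshold and
# vector.thresholdRelation (illustrative; not the plugin's real code).
RELATIONS = {
    "lt":  lambda score, t: score < t,
    "lte": lambda score, t: score <= t,
    "gt":  lambda score, t: score > t,
    "gte": lambda score, t: score >= t,
}

def is_cache_hit(score: float, threshold: float, relation: str = "lt") -> bool:
    """Return True when the similarity score passes the configured threshold."""
    return RELATIONS[relation](score, threshold)

# Euclidean distance: smaller means more similar, so "lt" is appropriate.
print(is_cache_hit(0.3, 1000, "lt"))    # True
# Cosine / DotProduct: larger means more similar, so use "gt".
print(is_cache_hit(0.92, 0.8, "gt"))    # True
```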
### Text Embedding Service (`embedding`)
| Name | Type | Requirement | Default | Description |
|---|---|---|---|---|
| embedding.type | string | required | - | Text embedding service type, e.g., dashscope, openai, azure, cohere, ollama, huggingface, textin, xfyun |
| embedding.serviceName | string | required | - | Text embedding service name |
| embedding.serviceHost | string | optional | - | Text embedding service domain |
| embedding.servicePort | int64 | optional | 443 | Text embedding service port. Default varies by provider; ollama defaults to 11434 |
| embedding.timeout | uint32 | optional | 10000 | Timeout for requesting text embedding service, in milliseconds. Default is 10000 (10 seconds) |
| embedding.model | string | optional | - | Model name for text embedding service |
| embedding.apiKey | string | optional | - | API Key for text embedding service |
### Cache Service (`cache`)
| Name | Type | Requirement | Default | Description |
|---|---|---|---|---|
| cache.type | string | required | - | Cache service type, e.g., redis |
| cache.serviceName | string | required | - | Cache service name |
| cache.serviceHost | string | optional | - | Cache service domain |
| cache.servicePort | int64 | optional | 6379 | Cache service port. If serviceName ends with .static, default is 80 |
| cache.username | string | optional | - | Cache service username |
| cache.password | string | optional | - | Cache service password |
| cache.timeout | uint32 | optional | 10000 | Cache service timeout, in milliseconds. Default is 10000 (10 seconds) |
| cache.cacheTTL | int | optional | 0 | Cache expiration time, in seconds. Default is 0 (never expire) |
| cache.cacheKeyPrefix | string | optional | "higress-ai-cache:" | Prefix for cache keys |
| cache.database | int | optional | 0 | Database ID to use, only for Redis. For example, configure as 1 for SELECT 1 |
### Other Configurations
| Name | Type | Requirement | Default | Description |
|---|---|---|---|---|
| cacheKeyFrom | string | optional | "messages.@reverse.0.content" | Extract string from request Body using GJSON PATH syntax |
| cacheValueFrom | string | optional | "choices.0.message.content" | Extract string from response Body using GJSON PATH syntax |
| cacheStreamValueFrom | string | optional | "choices.0.delta.content" | Extract string from streaming response Body using GJSON PATH syntax |
| cacheToolCallsFrom | string | optional | "choices.0.delta.content.tool_calls" | Extract string from streaming response Body using GJSON PATH syntax |
| responseTemplate | string | optional | `{"id":"from-cache","choices":[{"index":0,"message":{"role":"assistant","content":"%s"},"finish_reason":"stop"}],"model":"from-cache","object":"chat.completion","usage":{"prompt_tokens":0,"completion_tokens":0,"total_tokens":0}}` | Template for the HTTP response to return, with `%s` marking the part to be replaced by the cache value |
| streamResponseTemplate | string | optional | `data:{"id":"from-cache","choices":[{"index":0,"delta":{"role":"assistant","content":"%s"},"finish_reason":"stop"}],"model":"from-cache","object":"chat.completion","usage":{"prompt_tokens":0,"completion_tokens":0,"total_tokens":0}}\n\ndata:[DONE]\n\n` | Template for the streaming HTTP response to return, with `%s` marking the part to be replaced by the cache value |
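The `%s` substitution can be illustrated with a short Python sketch using the default non-streaming template (the cached answer below is made up):

```python
import json

# The %s placeholder in responseTemplate is replaced with the cached value
# before the response is returned; this reproduces that substitution with
# the plugin's default (non-streaming) template.
response_template = (
    '{"id":"from-cache","choices":[{"index":0,"message":{"role":"assistant",'
    '"content":"%s"},"finish_reason":"stop"}],"model":"from-cache",'
    '"object":"chat.completion","usage":{"prompt_tokens":0,'
    '"completion_tokens":0,"total_tokens":0}}'
)

cached_answer = "Higress is a cloud-native API gateway."
body = response_template % cached_answer

parsed = json.loads(body)
print(parsed["choices"][0]["message"]["content"])
```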
## Text Embedding Provider Specific Configurations
### Azure OpenAI
For Azure OpenAI, set embedding.type to azure. You need to first create an Azure OpenAI account, then select and deploy a model in Azure AI Foundry. Click on your deployed model to see the target URI and key in the endpoint section. Please enter the host from the URI in embedding.serviceHost and the key in embedding.apiKey.
A complete URI example is https://YOUR_RESOURCE_NAME.openai.azure.com/openai/deployments/YOUR_DEPLOYMENT_NAME/embeddings?api-version=2024-10-21. You need to enter YOUR_RESOURCE_NAME.openai.azure.com in embedding.serviceHost.
Specific configuration fields:
| Name | Data Type | Requirement | Default | Description |
|---|---|---|---|---|
| embedding.apiVersion | string | required | - | API version, the api-version value from the obtained URI |
Note that you must specify embedding.serviceHost, such as YOUR_RESOURCE_NAME.openai.azure.com. The default model is text-embedding-ada-002. For other models, specify in embedding.model.
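Putting the above together, a sketch of an Azure `embedding` block (the service name and key are placeholders):

```yaml
embedding:
  type: azure
  serviceName: my_azure.dns            # placeholder service name
  serviceHost: YOUR_RESOURCE_NAME.openai.azure.com
  apiKey: [Your key]
  apiVersion: "2024-10-21"             # api-version value from the target URI
  model: text-embedding-ada-002        # default; set a different model if needed
```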
### Cohere
For Cohere, set embedding.type to cohere. There are no specific configuration fields. You need to create an API Key and enter it in embedding.apiKey.
### OpenAI
For OpenAI, set embedding.type to openai. There are no specific configuration fields. You need to create an API Key and enter it in embedding.apiKey. An API Key example is sk-xxxxxxx.
### Ollama
For Ollama, set embedding.type to ollama. There are no specific configuration fields.
### Hugging Face
For Hugging Face, set embedding.type to huggingface. There are no specific configuration fields. You need to create an hf_token and enter it in embedding.apiKey. An hf_token example is hf_xxxxxxx.
embedding.model defaults to sentence-transformers/all-MiniLM-L6-v2.
### DashScope
For DashScope, set embedding.type to dashscope. You need to create an API Key and enter it in embedding.apiKey.
embedding.model defaults to text-embedding-v2. Other models like text-embedding-v1 can also be used.
### TextIn
For TextIn, set embedding.type to textin. You need to first obtain app-id and secret-code.
Specific configuration fields:
| Name | Data Type | Requirement | Default | Description |
|---|---|---|---|---|
| embedding.textinAppId | string | required | - | Application ID, the obtained app-id |
| embedding.textinSecretCode | string | required | - | Secret for calling the API, the obtained secret-code |
| embedding.textinMatryoshkaDim | int | required | - | Dimension of the returned vector |
### Xfyun (iFlytek Spark)

For Xfyun, set embedding.type to xfyun. You need to first create an application to obtain the APPID, APISecret, and APIKey, and enter the APIKey in embedding.apiKey.
Specific configuration fields:
| Name | Data Type | Requirement | Default | Description |
|---|---|---|---|---|
| embedding.appId | string | required | - | Application ID, the obtained APPID |
| embedding.apiSecret | string | required | - | Secret for calling the API, the obtained APISecret |
## Vector Database Provider Specific Configurations
### Chroma
For Chroma, set vector.type to chroma. There are no specific configuration fields. You need to create a Collection in advance and fill in the Collection ID in vector.collectionID. A Collection ID example is 52bbb8b3-724c-477b-a4ce-d5b578214612.
### DashVector
For DashVector, set vector.type to dashvector. There are no specific configuration fields. You need to create a Collection in advance and fill in the Collection Name in vector.collectionID.
### ElasticSearch
For ElasticSearch, set vector.type to elasticsearch. You need to create an Index in advance and fill in the Index Name in vector.collectionID.
It currently relies on the KNN method. Please ensure your ES version supports KNN. It has been tested on version 8.16.
Specific configuration fields:
| Name | Data Type | Requirement | Default | Description |
|---|---|---|---|---|
| vector.esUsername | string | optional | - | ElasticSearch username |
| vector.esPassword | string | optional | - | ElasticSearch password |
`vector.esUsername` and `vector.esPassword` are used for Basic authentication. API Key authentication is also supported: when `vector.apiKey` is set, API Key authentication is enabled. For the SaaS version, you need to fill in the encoded API Key value.
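As an illustration, a hypothetical `vector` block for ElasticSearch with Basic authentication (all values are placeholders):

```yaml
vector:
  type: elasticsearch
  serviceName: my_es.dns          # placeholder service name
  serviceHost: [Your ES domain]
  collectionID: [Your Index Name]
  esUsername: elastic             # Basic auth; alternatively set apiKey instead
  esPassword: [Your password]
```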
### Milvus
For Milvus, set vector.type to milvus. There are no specific configuration fields. You need to create a Collection in advance and fill in the Collection Name in vector.collectionID.
### Pinecone
For Pinecone, set vector.type to pinecone. There are no specific configuration fields. You need to create an Index in advance and fill in the Index access domain in vector.serviceHost.
The Namespace parameter in Pinecone is configured through the plugin's vector.collectionID. If vector.collectionID is not filled in, it defaults to the Default Namespace.
### Qdrant
For Qdrant, set vector.type to qdrant. There are no specific configuration fields. You need to create a Collection in advance and fill in the Collection Name in vector.collectionID.
### Weaviate
For Weaviate, set vector.type to weaviate. There are no specific configuration fields. You need to create a Collection in advance and fill in the Collection Name in vector.collectionID.
Note that Weaviate automatically capitalizes the first letter of collection names, so the first letter of the value you fill in for vector.collectionID should be capitalized.
If using SaaS, you need to fill in the vector.serviceHost parameter.
## Configuration Example
### Basic Configuration
```yaml
embedding:
  type: dashscope
  serviceName: my_dashscope.dns
  apiKey: [Your Key]
vector:
  type: dashvector
  serviceName: my_dashvector.dns
  collectionID: [Your Collection ID]
  serviceHost: [Your domain]
  apiKey: [Your key]
cache:
  type: redis
  serviceName: my_redis.dns
  servicePort: 6379
  timeout: 100
```
### Advanced Usage
The default cache key is extracted with the GJSON PATH expression `messages.@reverse.0.content`, which reverses the `messages` array and takes the `content` of the first item (i.e., the last message).

GJSON PATH supports conditional syntax. For example, to use the content of the last message whose role is `user` as the key, write `messages.@reverse.#(role=="user").content`.

To concatenate the content of all messages whose role is `user` into an array as the key, write `messages.@reverse.#(role=="user")#.content`.

Pipe syntax is also supported. For example, to use the second-to-last message whose role is `user` as the key, write `messages.@reverse.#(role=="user")#.content|1`.

For more usage, refer to the official GJSON documentation. You can use the GJSON Playground to test the syntax.
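The expressions above can be checked with a small Python sketch that mimics what they resolve to (the plugin itself evaluates GJSON in Go; the request body here is made up for illustration):

```python
import json

# A made-up OpenAI-style request body.
body = json.dumps({
    "model": "gpt-4o",
    "messages": [
        {"role": "user", "content": "What is Higress?"},
        {"role": "assistant", "content": "An API gateway."},
        {"role": "user", "content": "Does it cache LLM responses?"},
    ],
})
messages = json.loads(body)["messages"]

# messages.@reverse.0.content -> content of the last message
default_key = list(reversed(messages))[0]["content"]

# messages.@reverse.#(role=="user").content -> content of the last user message
last_user = next(m["content"] for m in reversed(messages) if m["role"] == "user")

# messages.@reverse.#(role=="user")#.content -> all user contents, newest first
all_user = [m["content"] for m in reversed(messages) if m["role"] == "user"]

print(default_key)  # Does it cache LLM responses?
print(all_user)     # ['Does it cache LLM responses?', 'What is Higress?']
```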
## FAQ
- If the returned error is `error status returned by host: bad argument`, check whether `serviceName` correctly includes the service type suffix (`.dns`, etc.).