File: higress/plugins/wasm-go/extensions/ai-cache

title: AI Cache
keywords: higress, ai cache
description: AI Cache Plugin Configuration Reference

Function Description

LLM result caching plugin. The default configuration can be directly used for OpenAI protocol result caching, and supports caching of both streaming and non-streaming responses.

Tips

When the request carries the header `x-higress-skip-ai-cache: on`, the request will not be answered from the cache but will be forwarded directly to the backend service. Additionally, the response to this request will not be cached.
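For example, a single request can bypass the cache like this (the gateway address and model name below are placeholders for illustration):

```shell
# Bypass the AI cache for one request; its response is also not cached.
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "x-higress-skip-ai-cache: on" \
  -d '{
    "model": "gpt-4o",
    "messages": [{"role": "user", "content": "Hello"}]
  }'
```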

Runtime Properties

Plugin Execution Phase: Authentication Phase
Plugin Execution Priority: 10

Configuration Description

The configuration is divided into 3 parts: Vector Database (vector), Text Embedding Service (embedding), and Cache Service (cache). It also provides fine-grained LLM request/response extraction parameter configurations.

This plugin supports both semantic caching based on a vector database and caching based on exact string matching. If both a vector database and a cache service are configured, the cache service is queried first; on a cache miss, the vector database is used for semantic lookup.

Note: Vector database (vector) and cache database (cache) cannot both be empty, otherwise this plugin cannot provide caching services.

| Name | Type | Requirement | Default | Description |
|------|------|-------------|---------|-------------|
| vector | object | optional | - | Vector storage service configuration, see the Vector Database Service section below |
| embedding | object | optional | - | Text embedding service configuration, see the Text Embedding Service section below |
| cache | object | optional | - | Cache service configuration, see the Cache Service section below |
| cacheKeyStrategy | string | optional | "lastQuestion" | Strategy for generating the cache key from historical questions. Options: "lastQuestion" (use the last question), "allQuestions" (concatenate all questions), or "disabled" (disable caching) |
| enableSemanticCache | bool | optional | false | Whether to enable semantic caching. If disabled, string matching is used for cache lookup, which requires a cache service configuration. Automatically enabled when a vector provider is configured |

Depending on whether semantic caching is needed, you can configure component combinations as follows:

  1. cache: Enable string matching cache only
  2. vector (+ embedding): Enable semantic caching. If the vector provider does not offer its own text embedding service, you need to configure an embedding service separately
  3. vector (+ embedding) + cache: Enable semantic caching and use cache service to store LLM responses for acceleration

If you do not configure a related component, you can ignore the required fields of that component.
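For example, combination 2 (semantic caching with a separate embedding service) might look like the following sketch; all service names, ports, and keys are placeholders:

```yaml
# Semantic caching only: vector provider plus a separate embedding service.
embedding:
  type: dashscope
  serviceName: my_dashscope.dns
  apiKey: YOUR_EMBEDDING_KEY
vector:
  type: milvus
  serviceName: my_milvus.dns
  servicePort: 19530
  collectionID: my_collection
```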

Vector Database Service (vector)

| Name | Type | Requirement | Default | Description |
|------|------|-------------|---------|-------------|
| vector.type | string | required | - | Vector storage service provider type, e.g., dashvector, chroma, elasticsearch, weaviate, pinecone, qdrant, milvus |
| vector.serviceName | string | required | - | Vector storage service name |
| vector.serviceHost | string | optional | - | Vector storage service domain. Required for some providers (e.g., dashvector, pinecone) |
| vector.servicePort | int64 | optional | 443 | Vector storage service port |
| vector.apiKey | string | optional | - | Vector storage service API Key |
| vector.topK | int | optional | 1 | Number of TopK results to return |
| vector.timeout | uint32 | optional | 10000 | Timeout for requests to the vector storage service, in milliseconds. Default is 10000 (10 seconds) |
| vector.collectionID | string | optional | - | Vector storage service Collection ID |
| vector.threshold | float64 | optional | 1000 | Vector similarity threshold |
| vector.thresholdRelation | string | optional | "lt" | Comparison operator for the similarity measure. Cosine and DotProduct similarity grow with larger values, while Euclidean distance indicates higher similarity at smaller values; use gt for Cosine and DotProduct and lt for Euclidean. Options: lt (less than), lte (less than or equal), gt (greater than), gte (greater than or equal) |
| vector.esUsername | string | optional | - | ElasticSearch username, only for the elasticsearch type |
| vector.esPassword | string | optional | - | ElasticSearch password, only for the elasticsearch type |

Text Embedding Service (embedding)

| Name | Type | Requirement | Default | Description |
|------|------|-------------|---------|-------------|
| embedding.type | string | required | - | Text embedding service type, e.g., dashscope, openai, azure, cohere, ollama, huggingface, textin, xfyun |
| embedding.serviceName | string | required | - | Text embedding service name |
| embedding.serviceHost | string | optional | - | Text embedding service domain |
| embedding.servicePort | int64 | optional | 443 | Text embedding service port. Default varies by provider; ollama defaults to 11434 |
| embedding.timeout | uint32 | optional | 10000 | Timeout for requests to the text embedding service, in milliseconds. Default is 10000 (10 seconds) |
| embedding.model | string | optional | - | Model name for the text embedding service |
| embedding.apiKey | string | optional | - | API Key for the text embedding service |

Cache Service (cache)

| Name | Type | Requirement | Default | Description |
|------|------|-------------|---------|-------------|
| cache.type | string | required | - | Cache service type, e.g., redis |
| cache.serviceName | string | required | - | Cache service name |
| cache.serviceHost | string | optional | - | Cache service domain |
| cache.servicePort | int64 | optional | 6379 | Cache service port. If serviceName ends with .static, default is 80 |
| cache.username | string | optional | - | Cache service username |
| cache.password | string | optional | - | Cache service password |
| cache.timeout | uint32 | optional | 10000 | Cache service timeout, in milliseconds. Default is 10000 (10 seconds) |
| cache.cacheTTL | int | optional | 0 | Cache expiration time, in seconds. Default is 0 (never expire) |
| cache.cacheKeyPrefix | string | optional | "higress-ai-cache:" | Prefix for cache keys |
| cache.database | int | optional | 0 | Database ID to use, only for Redis. For example, configure 1 for SELECT 1 |

Other Configurations

| Name | Type | Requirement | Default | Description |
|------|------|-------------|---------|-------------|
| cacheKeyFrom | string | optional | `messages.@reverse.0.content` | Extract a string from the request Body using GJSON PATH syntax |
| cacheValueFrom | string | optional | `choices.0.message.content` | Extract a string from the response Body using GJSON PATH syntax |
| cacheStreamValueFrom | string | optional | `choices.0.delta.content` | Extract a string from the streaming response Body using GJSON PATH syntax |
| cacheToolCallsFrom | string | optional | `choices.0.delta.content.tool_calls` | Extract tool calls from the streaming response Body using GJSON PATH syntax |
| responseTemplate | string | optional | `{"id":"from-cache","choices":[{"index":0,"message":{"role":"assistant","content":"%s"},"finish_reason":"stop"}],"model":"from-cache","object":"chat.completion","usage":{"prompt_tokens":0,"completion_tokens":0,"total_tokens":0}}` | Template for the HTTP response, with %s marking the part to be replaced by the cached value |
| streamResponseTemplate | string | optional | `data:{"id":"from-cache","choices":[{"index":0,"delta":{"role":"assistant","content":"%s"},"finish_reason":"stop"}],"model":"from-cache","object":"chat.completion","usage":{"prompt_tokens":0,"completion_tokens":0,"total_tokens":0}}\n\ndata:[DONE]\n\n` | Template for the streaming HTTP response, with %s marking the part to be replaced by the cached value |

Text Embedding Provider Specific Configurations

Azure OpenAI

For Azure OpenAI, set embedding.type to azure. You need to first create an Azure OpenAI account, then select and deploy a model in Azure AI Foundry. Click on your deployed model to see the target URI and key in the endpoint section. Please enter the host from the URI in embedding.serviceHost and the key in embedding.apiKey.

A complete URI example is https://YOUR_RESOURCE_NAME.openai.azure.com/openai/deployments/YOUR_DEPLOYMENT_NAME/embeddings?api-version=2024-10-21. You need to enter YOUR_RESOURCE_NAME.openai.azure.com in embedding.serviceHost.

Specific configuration fields:

| Name | Data Type | Requirement | Default | Description |
|------|-----------|-------------|---------|-------------|
| embedding.apiVersion | string | required | - | API version, the api-version value from the obtained URI |

Note that you must specify embedding.serviceHost, such as YOUR_RESOURCE_NAME.openai.azure.com. The default model is text-embedding-ada-002. For other models, specify in embedding.model.
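Putting this together, an Azure OpenAI embedding configuration might look like the following sketch; the resource name, service name, and key are placeholders:

```yaml
embedding:
  type: azure
  serviceName: my_azure.dns
  serviceHost: YOUR_RESOURCE_NAME.openai.azure.com
  apiKey: YOUR_AZURE_KEY
  apiVersion: "2024-10-21"
  model: text-embedding-ada-002
```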

Cohere

For Cohere, set embedding.type to cohere. There are no specific configuration fields. You need to create an API Key and enter it in embedding.apiKey.

OpenAI

For OpenAI, set embedding.type to openai. There are no specific configuration fields. You need to create an API Key and enter it in embedding.apiKey. An API Key example is sk-xxxxxxx.

Ollama

For Ollama, set embedding.type to ollama. There are no specific configuration fields.
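Since Ollama is typically self-hosted, serviceHost and servicePort (default 11434) usually point at your own deployment. A sketch, with the host, service name, and model as placeholders:

```yaml
embedding:
  type: ollama
  serviceName: my_ollama.static
  serviceHost: 127.0.0.1
  servicePort: 11434
  model: all-minilm
```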

Hugging Face

For Hugging Face, set embedding.type to huggingface. There are no specific configuration fields. You need to create an hf_token and enter it in embedding.apiKey. An hf_token example is hf_xxxxxxx.

embedding.model defaults to sentence-transformers/all-MiniLM-L6-v2.

DashScope

For DashScope, set embedding.type to dashscope. You need to create an API Key and enter it in embedding.apiKey.

embedding.model defaults to text-embedding-v2. Other models like text-embedding-v1 can also be used.

TextIn

For TextIn, set embedding.type to textin. You need to first obtain app-id and secret-code.

Specific configuration fields:

| Name | Data Type | Requirement | Default | Description |
|------|-----------|-------------|---------|-------------|
| embedding.textinAppId | string | required | - | Application ID, the obtained app-id |
| embedding.textinSecretCode | string | required | - | Secret for calling the API, the obtained secret-code |
| embedding.textinMatryoshkaDim | int | required | - | Dimension of the returned vector |
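A TextIn configuration sketch; the service name, credentials, and dimension are placeholders:

```yaml
embedding:
  type: textin
  serviceName: my_textin.dns
  textinAppId: YOUR_APP_ID
  textinSecretCode: YOUR_SECRET_CODE
  textinMatryoshkaDim: 1024
```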

Xfyun (iFLYTEK Spark)

For Xfyun, set embedding.type to xfyun. You need to first create an application to obtain APPID, APISecret, and APIKey, and enter APIKey in embedding.apiKey.

Specific configuration fields:

| Name | Data Type | Requirement | Default | Description |
|------|-----------|-------------|---------|-------------|
| embedding.appId | string | required | - | Application ID, the obtained APPID |
| embedding.apiSecret | string | required | - | Secret for calling the API, the obtained APISecret |
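An Xfyun configuration sketch; the service name and credentials are placeholders:

```yaml
embedding:
  type: xfyun
  serviceName: my_xfyun.dns
  apiKey: YOUR_APIKEY
  appId: YOUR_APPID
  apiSecret: YOUR_APISECRET
```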

Vector Database Provider Specific Configurations

Chroma

For Chroma, set vector.type to chroma. There are no specific configuration fields. You need to create a Collection in advance and fill in the Collection ID in vector.collectionID. A Collection ID example is 52bbb8b3-724c-477b-a4ce-d5b578214612.
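A Chroma configuration sketch, using the Collection ID format shown above; the service name and port are placeholders (a self-hosted Chroma instance commonly listens on 8000 rather than the plugin's default 443):

```yaml
vector:
  type: chroma
  serviceName: my_chroma.dns
  servicePort: 8000
  collectionID: 52bbb8b3-724c-477b-a4ce-d5b578214612
```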

DashVector

For DashVector, set vector.type to dashvector. There are no specific configuration fields. You need to create a Collection in advance and fill in the Collection Name in vector.collectionID.

ElasticSearch

For ElasticSearch, set vector.type to elasticsearch. You need to create an Index in advance and fill in the Index Name in vector.collectionID.

It currently relies on the KNN method. Please ensure your ES version supports KNN. It has been tested on version 8.16.

Specific configuration fields:

| Name | Data Type | Requirement | Default | Description |
|------|-----------|-------------|---------|-------------|
| vector.esUsername | string | optional | - | ElasticSearch username |
| vector.esPassword | string | optional | - | ElasticSearch password |

vector.esUsername and vector.esPassword are used for Basic authentication. API Key authentication is also supported: when vector.apiKey is filled in, API Key authentication is enabled. For SaaS versions, you need to fill in the encoded value.
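An ElasticSearch configuration sketch using Basic authentication; the service name, index name, and credentials are placeholders:

```yaml
vector:
  type: elasticsearch
  serviceName: my_es.dns
  collectionID: my_index
  esUsername: elastic
  esPassword: YOUR_PASSWORD
```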

Milvus

For Milvus, set vector.type to milvus. There are no specific configuration fields. You need to create a Collection in advance and fill in the Collection Name in vector.collectionID.

Pinecone

For Pinecone, set vector.type to pinecone. There are no specific configuration fields. You need to create an Index in advance and fill in the Index access domain in vector.serviceHost.

The Namespace parameter in Pinecone is configured through the plugin's vector.collectionID. If vector.collectionID is not filled in, it defaults to the Default Namespace.
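A Pinecone configuration sketch; the service name, Index access domain, key, and namespace are placeholders:

```yaml
vector:
  type: pinecone
  serviceName: my_pinecone.dns
  serviceHost: YOUR_INDEX_HOST  # the Index access domain from the Pinecone console
  apiKey: YOUR_API_KEY
  collectionID: my_namespace    # omit to use the Default Namespace
```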

Qdrant

For Qdrant, set vector.type to qdrant. There are no specific configuration fields. You need to create a Collection in advance and fill in the Collection Name in vector.collectionID.

Weaviate

For Weaviate, set vector.type to weaviate. There are no specific configuration fields. You need to create a Collection in advance and fill in the Collection Name in vector.collectionID.

Note that Weaviate automatically capitalizes the first letter of collection names, so the value filled in vector.collectionID must also start with a capital letter.

If using SaaS, you need to fill in the vector.serviceHost parameter.
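A Weaviate SaaS configuration sketch; the service name, cluster endpoint, and key are placeholders. Note the capitalized collection name:

```yaml
vector:
  type: weaviate
  serviceName: my_weaviate.dns
  serviceHost: YOUR_CLUSTER_ENDPOINT  # required for SaaS
  apiKey: YOUR_API_KEY
  collectionID: MyCollection
```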

Configuration Example

Basic Configuration

```yaml
embedding:
  type: dashscope
  serviceName: my_dashscope.dns
  apiKey: [Your Key]

vector:
  type: dashvector
  serviceName: my_dashvector.dns
  collectionID: [Your Collection ID]
  serviceHost: [Your domain]
  apiKey: [Your key]

cache:
  type: redis
  serviceName: my_redis.dns
  servicePort: 6379
  timeout: 100
```

Advanced Usage

By default, the cache key is extracted with the GJSON PATH expression `messages.@reverse.0.content`, which reverses the messages array and takes the content of its first item, i.e., the content of the last message.

GJSON PATH supports conditional syntax. For example, to use the content of the last message whose role is user as the key, you can write `messages.@reverse.#(role=="user").content`.

To collect the content of all messages with role user into an array as the key, you can write `messages.@reverse.#(role=="user")#.content`.

Pipe syntax is also supported. For example, to use the second-to-last message with role user as the key, you can write `messages.@reverse.#(role=="user")#.content|1`.

For more usage, please refer to the official GJSON documentation. You can use the GJSON Playground to test the syntax.
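For instance, to key the cache on the last user message rather than the last message of any role, the default extraction path can be overridden in the plugin configuration like this:

```yaml
cacheKeyFrom: messages.@reverse.#(role=="user").content
```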

FAQ

  1. If the returned error is `error status returned by host: bad argument`, please check whether serviceName correctly includes the service type suffix (`.dns`, etc.).