refactor(ai-cache): update README files to match latest config parsing code (#3730)

This commit is contained in:
Kent Dong
2026-04-20 09:45:39 +08:00
committed by GitHub
parent 65405965b6
commit 8b8a710305
2 changed files with 290 additions and 97 deletions


---
title: AI Cache
keywords: [higress,ai cache]
description: AI Cache Plugin Configuration Reference
---
**Note**
> If compiled with tinygo, the data plane's proxy wasm version must be >= 0.2.100, and the build must include the version tag, e.g. `tinygo build -o main.wasm -scheduler=none -target=wasi -gc=custom -tags="custommalloc nottinygc_finalizer proxy_wasm_version_0_2_100" ./`
## Function Description
LLM result caching plugin. The default configuration can be directly used for OpenAI protocol result caching, and supports caching of both streaming and non-streaming responses.
| Name | Type | Requirement | Default | Description |
| --- | --- | --- | --- | --- |
| vector | object | optional | - | Vector storage service configuration, see Vector Database Service section below |
| embedding | object | optional | - | Text embedding service configuration, see Text Embedding Service section below |
| cache | object | optional | - | Cache service configuration, see Cache Service section below |
| cacheKeyStrategy | string | optional | "lastQuestion" | Strategy for generating the cache key from historical questions. Options: "lastQuestion" (use the last question), "allQuestions" (concatenate all questions), or "disabled" (disable caching) |
| enableSemanticCache | bool | optional | false | Whether to enable semantic caching. If disabled, string matching is used to look up the cache, which requires a cache service. Automatically enabled when a vector provider is configured |
Depending on whether semantic caching is needed, you can configure component combinations as follows:
1. `cache`: enable string matching cache only
2. `vector (+ embedding)`: enable semantic caching; if `vector` does not provide an embedding service, you need to configure the `embedding` service separately
3. `vector (+ embedding) + cache`: enable semantic caching and use the cache service to store LLM responses for acceleration
Note: if a component is not configured, the `required` fields of that component can be ignored.
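As an illustrative sketch, combination 3 above could be configured as follows; the service names, domain, and keys are placeholders, not defaults:

```yaml
# Combination 3: semantic cache (vector + embedding) plus a cache service
vector:
  type: dashvector
  serviceName: my_dashvector.dns
  serviceHost: [Your domain]
  apiKey: [Your key]
  collectionID: [Your Collection ID]
embedding:
  type: dashscope
  serviceName: my_dashscope.dns
  apiKey: [Your key]
cache:
  type: redis
  serviceName: my_redis.dns
  servicePort: 6379
```

For combination 1, keep only the `cache` block; for combination 2, keep only `vector` (and `embedding` if needed).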
## Vector Database Service (vector)
| Name | Type | Requirement | Default | Description |
| --- | --- | --- | --- | --- |
| vector.type | string | required | - | Vector storage service provider type, e.g., dashvector, chroma, elasticsearch, weaviate, pinecone, qdrant, milvus |
| vector.serviceName | string | required | - | Vector storage service name |
| vector.serviceHost | string | optional | - | Vector storage service domain. Required by some providers (e.g., dashvector, pinecone) |
| vector.servicePort | int64 | optional | 443 | Vector storage service port |
| vector.apiKey | string | optional | - | Vector storage service API Key |
| vector.topK | int | optional | 1 | Number of TopK results to return |
| vector.timeout | uint32 | optional | 10000 | Timeout for requests to the vector storage service, in milliseconds. Default is 10000 (10 seconds) |
| vector.collectionID | string | optional | - | Vector storage service Collection ID |
| vector.threshold | float64 | optional | 1000 | Vector similarity threshold |
| vector.thresholdRelation | string | optional | "lt" | Comparison operator for the similarity threshold. Similarity metrics include `Cosine`, `DotProduct`, and `Euclidean`; for the first two, larger values mean higher similarity, while for `Euclidean` smaller values mean higher similarity. Use `gt` for `Cosine` and `DotProduct`, and `lt` for `Euclidean`. Options: `lt` (less than), `lte` (less than or equal to), `gt` (greater than), `gte` (greater than or equal to) |
| vector.esUsername | string | optional | - | ElasticSearch username, only for the elasticsearch type |
| vector.esPassword | string | optional | - | ElasticSearch password, only for the elasticsearch type |
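As a sketch of the table above, a `vector` block for a Euclidean-style provider might look like this; all bracketed values are placeholders:

```yaml
vector:
  type: dashvector
  serviceName: my_dashvector.dns
  serviceHost: [Your domain]
  apiKey: [Your key]
  collectionID: [Your Collection ID]
  topK: 1
  threshold: 1000
  thresholdRelation: lt   # Euclidean distance: smaller values mean more similar
```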
## Text Embedding Service (embedding)
| Name | Type | Requirement | Default | Description |
| --- | --- | --- | --- | --- |
| embedding.type | string | required | - | Text embedding service type, e.g., dashscope, openai, azure, cohere, ollama, huggingface, textin, xfyun |
| embedding.serviceName | string | required | - | Text embedding service name |
| embedding.serviceHost | string | optional | - | Text embedding service domain |
| embedding.servicePort | int64 | optional | 443 | Text embedding service port. The default varies by provider; ollama defaults to 11434 |
| embedding.timeout | uint32 | optional | 10000 | Timeout for requests to the text embedding service, in milliseconds. Default is 10000 (10 seconds) |
| embedding.model | string | optional | - | Model name for the text embedding service |
| embedding.apiKey | string | optional | - | API Key for the text embedding service |
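A minimal `embedding` block sketch based on the table above; the service name and key are placeholders:

```yaml
embedding:
  type: dashscope
  serviceName: my_dashscope.dns
  apiKey: [Your key]
  # model is optional; the provider's default model is used if omitted
```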
## Cache Service (cache)
| Name | Type | Requirement | Default | Description |
| --- | --- | --- | --- | --- |
| cache.type | string | required | - | Cache service type, e.g., redis |
| cache.serviceName | string | required | - | Cache service name |
| cache.serviceHost | string | optional | - | Cache service domain |
| cache.servicePort | int64 | optional | 6379 | Cache service port. If serviceName ends with .static, the default is 80 |
| cache.username | string | optional | - | Cache service username |
| cache.password | string | optional | - | Cache service password |
| cache.timeout | uint32 | optional | 10000 | Cache service timeout, in milliseconds. Default is 10000 (10 seconds) |
| cache.cacheTTL | int | optional | 0 | Cache expiration time, in seconds. Default is 0 (never expire) |
| cache.cacheKeyPrefix | string | optional | "higress-ai-cache:" | Prefix for cache keys |
| cache.database | int | optional | 0 | Database ID to use, only for redis. For example, 1 corresponds to `SELECT 1` |
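A sketch of a Redis-backed `cache` block based on the table above; the TTL and database values are illustrative, not defaults:

```yaml
cache:
  type: redis
  serviceName: my_redis.dns
  servicePort: 6379
  password: [Your password]   # optional
  cacheTTL: 3600              # illustrative: expire entries after one hour (default 0 = never)
  database: 1                 # redis only; corresponds to SELECT 1
```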
## Other Configurations
| Name | Type | Requirement | Default | Description |
| --- | --- | --- | --- | --- |
| cacheKeyFrom | string | optional | "messages.@reverse.0.content" | Extracts a string from the request body using [GJSON PATH](https://github.com/tidwall/gjson/blob/master/SYNTAX.md) syntax |
| cacheValueFrom | string | optional | "choices.0.message.content" | Extracts a string from the response body using [GJSON PATH](https://github.com/tidwall/gjson/blob/master/SYNTAX.md) syntax |
| cacheStreamValueFrom | string | optional | "choices.0.delta.content" | Extracts a string from the streaming response body using [GJSON PATH](https://github.com/tidwall/gjson/blob/master/SYNTAX.md) syntax |
| cacheToolCallsFrom | string | optional | "choices.0.delta.content.tool_calls" | Extracts a string from the streaming response body using [GJSON PATH](https://github.com/tidwall/gjson/blob/master/SYNTAX.md) syntax |
| responseTemplate | string | optional | `{"id":"from-cache","choices":[{"index":0,"message":{"role":"assistant","content":"%s"},"finish_reason":"stop"}],"model":"from-cache","object":"chat.completion","usage":{"prompt_tokens":0,"completion_tokens":0,"total_tokens":0}}` | Template for the returned HTTP response, with %s marking the part to be replaced by the cache value |
| streamResponseTemplate | string | optional | `data:{"id":"from-cache","choices":[{"index":0,"delta":{"role":"assistant","content":"%s"},"finish_reason":"stop"}],"model":"from-cache","object":"chat.completion","usage":{"prompt_tokens":0,"completion_tokens":0,"total_tokens":0}}\n\ndata:[DONE]\n\n` | Template for the returned streaming HTTP response, with %s marking the part to be replaced by the cache value |
## Text Embedding Provider-Specific Configurations
### Azure OpenAI
For Azure OpenAI, set `embedding.type` to `azure`. You need to create an [Azure OpenAI account](https://portal.azure.com/#view/Microsoft_Azure_ProjectOxford/CognitiveServicesHub/~/overview) in advance, then select and deploy a model in [Azure AI Foundry](https://ai.azure.com/resource/deployments). Click on your deployed model to see the target URI and key in the endpoint section. Enter the host from the URI in `embedding.serviceHost` and the key in `embedding.apiKey`.
A complete URI example is https://YOUR_RESOURCE_NAME.openai.azure.com/openai/deployments/YOUR_DEPLOYMENT_NAME/embeddings?api-version=2024-10-21. You need to enter `YOUR_RESOURCE_NAME.openai.azure.com` in `embedding.serviceHost`.
Its specific configuration fields are as follows:
| Name | Data Type | Requirement | Default | Description |
| ---------------------- | -------- | -------- | ------ | ------- |
| `embedding.apiVersion` | string | required | - | API version; the api-version value from the obtained URI |
Note that you must specify `embedding.serviceHost`, such as `YOUR_RESOURCE_NAME.openai.azure.com`. The default model is `text-embedding-ada-002`; to use another model, specify it in `embedding.model`.
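Putting the Azure fields together, a sketch might look like the following; the service name is a hypothetical placeholder:

```yaml
embedding:
  type: azure
  serviceName: my_azure.dns                        # hypothetical service name
  serviceHost: YOUR_RESOURCE_NAME.openai.azure.com
  apiKey: [Your key]
  apiVersion: 2024-10-21                           # api-version from your endpoint URI
```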
### OpenAI
For OpenAI, set `embedding.type` to `openai`. There are no specific configuration fields. You need to create an [API Key](https://platform.openai.com/settings/organization/api-keys) in advance and enter it in `embedding.apiKey`. An API Key example is `sk-xxxxxxx`.
### Ollama
For Ollama, set `embedding.type` to `ollama`. There are no specific configuration fields.
### Hugging Face
For Hugging Face, set `embedding.type` to `huggingface`. There are no specific configuration fields. You need to create an [hf_token](https://huggingface.co/blog/getting-started-with-embeddings) in advance and enter it in `embedding.apiKey`. An hf_token example is `hf_xxxxxxx`.
`embedding.model` defaults to `sentence-transformers/all-MiniLM-L6-v2`.
### DashScope
For DashScope, set `embedding.type` to `dashscope`. You need to create an [API Key](https://help.aliyun.com/document_detail/2712195.html) in advance and enter it in `embedding.apiKey`.
`embedding.model` defaults to `text-embedding-v2`; other models such as `text-embedding-v1` are also available.
### TextIn
For TextIn, set `embedding.type` to `textin`. You need to obtain the [`app-id` and `secret-code`](https://www.textin.com/document/acge_text_embedding) in advance.
Its specific configuration fields are as follows:
| Name | Data Type | Requirement | Default | Description |
| ------------------------------- | -------- | -------- | ------ | ------------------ |
| `embedding.textinAppId` | string | required | - | Application ID; the obtained app-id |
| `embedding.textinSecretCode` | string | required | - | Secret for calling the API; the obtained secret-code |
| `embedding.textinMatryoshkaDim` | int | required | - | Dimension of each returned vector |
### Xfyun (讯飞星火)
For Xfyun, set `embedding.type` to `xfyun`. You need to create an [application](https://console.xfyun.cn/services/emb) in advance to obtain the `APPID`, `APISecret`, and `APIKey`, and enter the `APIKey` in `embedding.apiKey`.
Its specific configuration fields are as follows:
| Name | Data Type | Requirement | Default | Description |
| --------------------- | -------- | -------- | ------ | -------------------- |
| `embedding.appId` | string | required | - | Application ID; the obtained APPID |
| `embedding.apiSecret` | string | required | - | Secret for calling the API; the obtained APISecret |
## Vector Database Provider-Specific Configurations
For ElasticSearch, set `vector.type` to `elasticsearch`. You need to create an Index in advance and fill in the Index Name in `vector.collectionID`.
It currently relies on the [KNN](https://www.elastic.co/guide/en/elasticsearch/reference/current/knn-search.html) method. Please ensure your ES version supports `KNN`; it has been tested on version `8.16`.
Its specific configuration fields are as follows:
| Name | Data Type | Requirement | Default | Description |
|-------------------|----------|----------|--------|------|
| `vector.esUsername` | string | optional | - | ElasticSearch username |
| `vector.esPassword` | string | optional | - | ElasticSearch password |
The `Namespace` parameter in Pinecone is configured through the plugin's `vector.collectionID`. If `vector.collectionID` is not filled in, the Default Namespace is used.
For Qdrant, set `vector.type` to `qdrant`. There are no specific configuration fields. You need to create a Collection in advance and fill in the Collection Name in `vector.collectionID`.
### Weaviate
For Weaviate, set `vector.type` to `weaviate`. There are no specific configuration fields. You need to create a Collection in advance and fill in the Collection Name in `vector.collectionID`.
Note that Weaviate automatically capitalizes the first letter, so when filling in `collectionID`, the first letter must be capitalized.
```yaml
vector:
  type: dashvector
  serviceName: my_dashvector.dns
  collectionID: [Your Collection ID]
  serviceHost: [Your domain]
  apiKey: [Your key]
cache:
  type: redis
  serviceName: my_redis.dns
  servicePort: 6379
  database: 1
```
## Advanced Usage
The current default cache key is extracted using the GJSON PATH expression `messages.@reverse.0.content`, which means reversing the messages array and taking the content of the first item.


---
title: AI Cache
keywords: [higress,ai cache]
description: AI Cache Plugin Configuration Reference
---
## Function Description
LLM result caching plugin. The default configuration can be directly used for OpenAI protocol result caching, and supports caching of both streaming and non-streaming responses.
**Tips**
When carrying the request header `x-higress-skip-ai-cache: on`, the current request will not use content from the cache but will be directly forwarded to the backend service. Additionally, the response content from this request will not be cached.
## Runtime Properties
Plugin Execution Phase: `Authentication Phase`
Plugin Execution Priority: `10`
## Configuration Description
The configuration is divided into 3 parts: Vector Database (vector), Text Embedding Service (embedding), and Cache Service (cache). It also provides fine-grained LLM request/response extraction parameter configurations.
This plugin supports both vector database-based semantic caching and string matching-based caching methods. If both vector database and cache database are configured, the cache database is used first, and the vector database capability is used when cache misses occur.
*Note*: Vector database (vector) and cache database (cache) cannot both be empty, otherwise this plugin cannot provide caching services.
| Name | Type | Requirement | Default | Description |
| --- | --- | --- | --- | --- |
| vector | object | optional | - | Vector storage service configuration, see Vector Database Service section below |
| embedding | object | optional | - | Text embedding service configuration, see Text Embedding Service section below |
| cache | object | optional | - | Cache service configuration, see Cache Service section below |
| cacheKeyStrategy | string | optional | "lastQuestion" | Strategy for generating cache key from historical questions. Options: "lastQuestion" (use last question), "allQuestions" (concatenate all questions), or "disabled" (disable caching) |
| enableSemanticCache | bool | optional | false | Whether to enable semantic caching. If disabled, string matching is used to find cache, requiring cache service configuration. Automatically enabled when a vector provider is configured |
Depending on whether semantic caching is needed, you can configure component combinations as follows:
1. `cache`: Enable string matching cache only
2. `vector (+ embedding)`: Enable semantic caching. If `vector` does not provide string representation service, you need to configure `embedding` service separately
3. `vector (+ embedding) + cache`: Enable semantic caching and use cache service to store LLM responses for acceleration
If you do not configure a related component, you can ignore the `required` fields of that component.
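As an illustrative sketch, combination 3 above could be configured as follows; the service names, domain, and keys are placeholders, not defaults:

```yaml
# Combination 3: semantic cache (vector + embedding) plus a cache service
vector:
  type: dashvector
  serviceName: my_dashvector.dns
  serviceHost: [Your domain]
  apiKey: [Your key]
  collectionID: [Your Collection ID]
embedding:
  type: dashscope
  serviceName: my_dashscope.dns
  apiKey: [Your key]
cache:
  type: redis
  serviceName: my_redis.dns
  servicePort: 6379
```

For combination 1, keep only the `cache` block; for combination 2, keep only `vector` (and `embedding` if needed).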
## Vector Database Service (vector)
| Name | Type | Requirement | Default | Description |
| --- | --- | --- | --- | --- |
| vector.type | string | required | - | Vector storage service provider type, e.g., dashvector, chroma, elasticsearch, weaviate, pinecone, qdrant, milvus |
| vector.serviceName | string | required | - | Vector storage service name |
| vector.serviceHost | string | optional | - | Vector storage service domain. Required for some providers (e.g., dashvector, pinecone) |
| vector.servicePort | int64 | optional | 443 | Vector storage service port |
| vector.apiKey | string | optional | - | Vector storage service API Key |
| vector.topK | int | optional | 1 | Return TopK results |
| vector.timeout | uint32 | optional | 10000 | Timeout for requesting vector storage service, in milliseconds. Default is 10000 (10 seconds) |
| vector.collectionID | string | optional | - | Vector storage service Collection ID |
| vector.threshold | float64 | optional | 1000 | Vector similarity measurement threshold |
| vector.thresholdRelation | string | optional | "lt" | Similarity measurement comparison method. Similarity measurement methods include `Cosine`, `DotProduct`, `Euclidean`, etc. The first two have higher similarity with larger values, while the latter has higher similarity with smaller values. Use `gt` for `Cosine` and `DotProduct`, and `lt` for `Euclidean`. All options include `lt` (less than), `lte` (less than or equal to), `gt` (greater than), `gte` (greater than or equal to) |
| vector.esUsername | string | optional | - | ElasticSearch username, only for elasticsearch type |
| vector.esPassword | string | optional | - | ElasticSearch password, only for elasticsearch type |
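As a sketch of the table above, a `vector` block for a Euclidean-style provider might look like this; all bracketed values are placeholders:

```yaml
vector:
  type: dashvector
  serviceName: my_dashvector.dns
  serviceHost: [Your domain]
  apiKey: [Your key]
  collectionID: [Your Collection ID]
  topK: 1
  threshold: 1000
  thresholdRelation: lt   # Euclidean distance: smaller values mean more similar
```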
## Text Embedding Service (embedding)
| Name | Type | Requirement | Default | Description |
| --- | --- | --- | --- | --- |
| embedding.type | string | required | - | Text embedding service type, e.g., dashscope, openai, azure, cohere, ollama, huggingface, textin, xfyun |
| embedding.serviceName | string | required | - | Text embedding service name |
| embedding.serviceHost | string | optional | - | Text embedding service domain |
| embedding.servicePort | int64 | optional | 443 | Text embedding service port. Default varies by provider; ollama defaults to 11434 |
| embedding.timeout | uint32 | optional | 10000 | Timeout for requesting text embedding service, in milliseconds. Default is 10000 (10 seconds) |
| embedding.model | string | optional | - | Model name for text embedding service |
| embedding.apiKey | string | optional | - | API Key for text embedding service |
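A minimal `embedding` block sketch based on the table above; the service name and key are placeholders:

```yaml
embedding:
  type: dashscope
  serviceName: my_dashscope.dns
  apiKey: [Your key]
  # model is optional; the provider's default model is used if omitted
```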
## Cache Service (cache)
| Name | Type | Requirement | Default | Description |
| --- | --- | --- | --- | --- |
| cache.type | string | required | - | Cache service type, e.g., redis |
| cache.serviceName | string | required | - | Cache service name |
| cache.serviceHost | string | optional | - | Cache service domain |
| cache.servicePort | int64 | optional | 6379 | Cache service port. If serviceName ends with .static, default is 80 |
| cache.username | string | optional | - | Cache service username |
| cache.password | string | optional | - | Cache service password |
| cache.timeout | uint32 | optional | 10000 | Cache service timeout, in milliseconds. Default is 10000 (10 seconds) |
| cache.cacheTTL | int | optional | 0 | Cache expiration time, in seconds. Default is 0 (never expire) |
| cache.cacheKeyPrefix | string | optional | "higress-ai-cache:" | Prefix for cache keys |
| cache.database | int | optional | 0 | Database ID to use, only for Redis. For example, configure as 1 for `SELECT 1` |
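A sketch of a Redis-backed `cache` block based on the table above; the TTL and database values are illustrative, not defaults:

```yaml
cache:
  type: redis
  serviceName: my_redis.dns
  servicePort: 6379
  password: [Your password]   # optional
  cacheTTL: 3600              # illustrative: expire entries after one hour (default 0 = never)
  database: 1                 # redis only; corresponds to SELECT 1
```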
## Other Configurations
| Name | Type | Requirement | Default | Description |
| --- | --- | --- | --- | --- |
| cacheKeyFrom | string | optional | "messages.@reverse.0.content" | Extract string from request Body using [GJSON PATH](https://github.com/tidwall/gjson/blob/master/SYNTAX.md) syntax |
| cacheValueFrom | string | optional | "choices.0.message.content" | Extract string from response Body using [GJSON PATH](https://github.com/tidwall/gjson/blob/master/SYNTAX.md) syntax |
| cacheStreamValueFrom | string | optional | "choices.0.delta.content" | Extract string from streaming response Body using [GJSON PATH](https://github.com/tidwall/gjson/blob/master/SYNTAX.md) syntax |
| cacheToolCallsFrom | string | optional | "choices.0.delta.content.tool_calls" | Extract string from streaming response Body using [GJSON PATH](https://github.com/tidwall/gjson/blob/master/SYNTAX.md) syntax |
| responseTemplate | string | optional | `{"id":"from-cache","choices":[{"index":0,"message":{"role":"assistant","content":"%s"},"finish_reason":"stop"}],"model":"from-cache","object":"chat.completion","usage":{"prompt_tokens":0,"completion_tokens":0,"total_tokens":0}}` | Template for returning HTTP response, with %s marking the part to be replaced by cache value |
| streamResponseTemplate | string | optional | `data:{"id":"from-cache","choices":[{"index":0,"delta":{"role":"assistant","content":"%s"},"finish_reason":"stop"}],"model":"from-cache","object":"chat.completion","usage":{"prompt_tokens":0,"completion_tokens":0,"total_tokens":0}}\n\ndata:[DONE]\n\n` | Template for returning streaming HTTP response, with %s marking the part to be replaced by cache value |
## Text Embedding Provider Specific Configurations
### Azure OpenAI
For Azure OpenAI, set `embedding.type` to `azure`. You need to first create an [Azure OpenAI account](https://portal.azure.com/#view/Microsoft_Azure_ProjectOxford/CognitiveServicesHub/~/overview), then select and deploy a model in [Azure AI Foundry](https://ai.azure.com/resource/deployments). Click on your deployed model to see the target URI and key in the endpoint section. Please enter the host from the URI in `embedding.serviceHost` and the key in `embedding.apiKey`.
A complete URI example is https://YOUR_RESOURCE_NAME.openai.azure.com/openai/deployments/YOUR_DEPLOYMENT_NAME/embeddings?api-version=2024-10-21. You need to enter `YOUR_RESOURCE_NAME.openai.azure.com` in `embedding.serviceHost`.
Specific configuration fields:
| Name | Data Type | Requirement | Default | Description |
| ---------------------- | -------- | -------- | ------ | ------- |
| `embedding.apiVersion` | string | required | - | API version, the api-version value from the obtained URI |
Note that you must specify `embedding.serviceHost`, such as `YOUR_RESOURCE_NAME.openai.azure.com`. The default model is `text-embedding-ada-002`. For other models, specify in `embedding.model`.
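Putting the Azure fields together, a sketch might look like the following; the service name is a hypothetical placeholder:

```yaml
embedding:
  type: azure
  serviceName: my_azure.dns                        # hypothetical service name
  serviceHost: YOUR_RESOURCE_NAME.openai.azure.com
  apiKey: [Your key]
  apiVersion: 2024-10-21                           # api-version from your endpoint URI
```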
### Cohere
For Cohere, set `embedding.type` to `cohere`. There are no specific configuration fields. You need to create an [API Key](https://docs.cohere.com/reference/embed) and enter it in `embedding.apiKey`.
### OpenAI
For OpenAI, set `embedding.type` to `openai`. There are no specific configuration fields. You need to create an [API Key](https://platform.openai.com/settings/organization/api-keys) and enter it in `embedding.apiKey`. An API Key example is `sk-xxxxxxx`.
### Ollama
For Ollama, set `embedding.type` to `ollama`. There are no specific configuration fields.
### Hugging Face
For Hugging Face, set `embedding.type` to `huggingface`. There are no specific configuration fields. You need to create an [hf_token](https://huggingface.co/blog/getting-started-with-embeddings) and enter it in `embedding.apiKey`. An hf_token example is `hf_xxxxxxx`.
`embedding.model` defaults to `sentence-transformers/all-MiniLM-L6-v2`.
### DashScope
For DashScope, set `embedding.type` to `dashscope`. You need to create an [API Key](https://help.aliyun.com/document_detail/2712195.html) and enter it in `embedding.apiKey`.
`embedding.model` defaults to `text-embedding-v2`. Other models like `text-embedding-v1` can also be used.
### TextIn
For TextIn, set `embedding.type` to `textin`. You need to first obtain [`app-id` and `secret-code`](https://www.textin.com/document/acge_text_embedding).
Specific configuration fields:
| Name | Data Type | Requirement | Default | Description |
| ------------------------------- | -------- | -------- | ------ | ------------------ |
| `embedding.textinAppId` | string | required | - | Application ID, obtained app-id |
| `embedding.textinSecretCode` | string | required | - | Secret for calling API, obtained secret-code |
| `embedding.textinMatryoshkaDim` | int | required | - | Dimension of returned single vector |
### Xfyun (讯飞星火)
For Xfyun, set `embedding.type` to `xfyun`. You need to first create an [application](https://console.xfyun.cn/services/emb) to obtain `APPID`, `APISecret`, and `APIKey`, and enter `APIKey` in `embedding.apiKey`.
Specific configuration fields:
| Name | Data Type | Requirement | Default | Description |
| --------------------- | -------- | -------- | ------ | -------------------- |
| `embedding.appId` | string | required | - | Application ID, obtained APPID |
| `embedding.apiSecret` | string | required | - | Secret for calling API, obtained APISecret |
## Vector Database Provider Specific Configurations
### Chroma
For Chroma, set `vector.type` to `chroma`. There are no specific configuration fields. You need to create a Collection in advance and fill in the Collection ID in `vector.collectionID`. A Collection ID example is `52bbb8b3-724c-477b-a4ce-d5b578214612`.
### DashVector
For DashVector, set `vector.type` to `dashvector`. There are no specific configuration fields. You need to create a Collection in advance and fill in the `Collection Name` in `vector.collectionID`.
### ElasticSearch
For ElasticSearch, set `vector.type` to `elasticsearch`. You need to create an Index in advance and fill in the Index Name in `vector.collectionID`.
It currently relies on the [KNN](https://www.elastic.co/guide/en/elasticsearch/reference/current/knn-search.html) method. Please ensure your ES version supports `KNN`. It has been tested on version `8.16`.
Specific configuration fields:
| Name | Data Type | Requirement | Default | Description |
|-------------------|----------|----------|--------|-------------------------------------------------------------------------------|
| `vector.esUsername` | string | optional | - | ElasticSearch username |
| `vector.esPassword` | string | optional | - | ElasticSearch password |
`vector.esUsername` and `vector.esPassword` are used for Basic authentication. API Key authentication is also supported. When `vector.apiKey` is filled in, API Key authentication is enabled. For SaaS versions, you need to fill in the `encoded` value.
### Milvus
For Milvus, set `vector.type` to `milvus`. There are no specific configuration fields. You need to create a Collection in advance and fill in the Collection Name in `vector.collectionID`.
### Pinecone
For Pinecone, set `vector.type` to `pinecone`. There are no specific configuration fields. You need to create an Index in advance and fill in the Index access domain in `vector.serviceHost`.
The `Namespace` parameter in Pinecone is configured through the plugin's `vector.collectionID`. If `vector.collectionID` is not filled in, it defaults to the Default Namespace.
### Qdrant
For Qdrant, set `vector.type` to `qdrant`. There are no specific configuration fields. You need to create a Collection in advance and fill in the Collection Name in `vector.collectionID`.
### Weaviate
For Weaviate, set `vector.type` to `weaviate`. There are no specific configuration fields. You need to create a Collection in advance and fill in the Collection Name in `vector.collectionID`.
Note that Weaviate automatically capitalizes the first letter, so when filling in `collectionID`, the first letter should be capitalized.
If using SaaS, you need to fill in the `vector.serviceHost` parameter.
## Configuration Example
### Basic Configuration
```yaml
embedding:
  type: dashscope
  serviceName: my_dashscope.dns
  apiKey: [Your Key]
vector:
  type: dashvector
  serviceName: my_dashvector.dns
  collectionID: [Your Collection ID]
  serviceHost: [Your domain]
  apiKey: [Your key]
cache:
  type: redis
  serviceName: my_redis.dns
  servicePort: 6379
  database: 1
```
## Advanced Usage
The current default cache key is extracted based on the GJSON PATH expression: `messages.@reverse.0.content`, which means reversing the messages array and taking the content of the first item.
GJSON PATH supports conditional syntax. For example, to get the content of the last role as user as the key, you can write: `messages.@reverse.#(role=="user").content`;
If you want to concatenate all content with role as user into an array as the key, you can write: `messages.@reverse.#(role=="user")#.content`;
It also supports pipeline syntax. For example, to get the second-to-last role as user as the key, you can write: `messages.@reverse.#(role=="user")#.content|1`.
For more usage, please refer to the [official documentation](https://github.com/tidwall/gjson/blob/master/SYNTAX.md). You can use the [GJSON Playground](https://gjson.dev/) for syntax testing.
## FAQ
1. If the returned error is `error status returned by host: bad argument`, please check if `serviceName` correctly includes the service type suffix (.dns, etc.).