diff --git a/plugins/wasm-go/extensions/ai-cache/README.md b/plugins/wasm-go/extensions/ai-cache/README.md
index 25b10ad9a..a340d1c15 100644
--- a/plugins/wasm-go/extensions/ai-cache/README.md
+++ b/plugins/wasm-go/extensions/ai-cache/README.md
@@ -1,4 +1,3 @@
-## 简介
 ---
 title: AI 缓存
 keywords: [higress,ai cache]
@@ -7,9 +6,7 @@ description: AI 缓存插件配置参考
 
 **Note**
 
-> 需要数据面的proxy wasm版本大于等于0.2.100
-> 编译时,需要带上版本的tag,例如:`tinygo build -o main.wasm -scheduler=none -target=wasi -gc=custom -tags="custommalloc nottinygc_finalizer proxy_wasm_version_0_2_100" ./`
->
+> 若使用 tinygo 编译,则需要数据面的proxy wasm版本大于等于0.2.100,且编译时需要带上版本的tag,例如:`tinygo build -o main.wasm -scheduler=none -target=wasi -gc=custom -tags="custommalloc nottinygc_finalizer proxy_wasm_version_0_2_100" ./`
 
 ## 功能说明
@@ -36,16 +33,16 @@ LLM 结果缓存插件,默认配置方式可以直接用于 openai 协议的
 | Name | Type | Requirement | Default | Description |
 | --- | --- | --- | --- | --- |
-| vector | string | optional | "" | 向量存储服务提供者类型,例如 dashvector |
-| embedding | string | optional | "" | 请求文本向量化服务类型,例如 dashscope |
-| cache | string | optional | "" | 缓存服务类型,例如 redis |
+| vector | object | optional | - | 向量存储服务配置,详见下文向量数据库服务配置 |
+| embedding | object | optional | - | 文本向量化服务配置,详见下文文本向量化服务配置 |
+| cache | object | optional | - | 缓存服务配置,详见下文缓存服务配置 |
 | cacheKeyStrategy | string | optional | "lastQuestion" | 决定如何根据历史问题生成缓存键的策略。可选值: "lastQuestion" (使用最后一个问题), "allQuestions" (拼接所有问题) 或 "disabled" (禁用缓存) |
-| enableSemanticCache | bool | optional | true | 是否启用语义化缓存, 若不启用,则使用字符串匹配的方式来查找缓存,此时需要配置cache服务 |
+| enableSemanticCache | bool | optional | false | 是否启用语义化缓存。若不启用,则使用字符串匹配的方式来查找缓存,此时需要配置cache服务。当配置了 vector provider 时,默认自动开启 |
 
 根据是否需要启用语义缓存,可以只配置组件的组合为:
 1. `cache`: 仅启用字符串匹配缓存
-3. `vector (+ embedding)`: 启用语义化缓存, 其中若 `vector` 未提供字符串表征服务,则需要自行配置 `embedding` 服务
-2. `vector (+ embedding) + cache`: 启用语义化缓存并用缓存服务存储LLM响应以加速
+2. `vector (+ embedding)`: 启用语义化缓存, 其中若 `vector` 未提供字符串表征服务,则需要自行配置 `embedding` 服务
+3. `vector (+ embedding) + cache`: 启用语义化缓存并用缓存服务存储LLM响应以加速
 
 注意若不配置相关组件,则可以忽略相应组件的`required`字段。
 
@@ -53,66 +50,69 @@ LLM 结果缓存插件,默认配置方式可以直接用于 openai 协议的
 ## 向量数据库服务(vector)
 | Name | Type | Requirement | Default | Description |
 | --- | --- | --- | --- | --- |
-| vector.type | string | required | "" | 向量存储服务提供者类型,例如 dashvector |
-| vector.serviceName | string | required | "" | 向量存储服务名称 |
-| vector.serviceHost | string | required | "" | 向量存储服务域名 |
+| vector.type | string | required | - | 向量存储服务提供者类型,例如 dashvector、chroma、elasticsearch、weaviate、pinecone、qdrant、milvus |
+| vector.serviceName | string | required | - | 向量存储服务名称 |
+| vector.serviceHost | string | optional | - | 向量存储服务域名。部分 provider(如 dashvector、pinecone)要求必填 |
 | vector.servicePort | int64 | optional | 443 | 向量存储服务端口 |
-| vector.apiKey | string | optional | "" | 向量存储服务 API Key |
-| vector.topK | int | optional | 1 | 返回TopK结果,默认为 1 |
+| vector.apiKey | string | optional | - | 向量存储服务 API Key |
+| vector.topK | int | optional | 1 | 返回TopK结果 |
 | vector.timeout | uint32 | optional | 10000 | 请求向量存储服务的超时时间,单位为毫秒。默认值是10000,即10秒 |
-| vector.collectionID | string | optional | "" | 向量存储服务 Collection ID |
+| vector.collectionID | string | optional | - | 向量存储服务 Collection ID |
 | vector.threshold | float64 | optional | 1000 | 向量相似度度量阈值 |
-| vector.thresholdRelation | string | optional | lt | 相似度度量方式有 `Cosine`, `DotProduct`, `Euclidean` 等,前两者值越大相似度越高,后者值越小相似度越高。对于 `Cosine` 和 `DotProduct` 选择 `gt`,对于 `Euclidean` 则选择 `lt`。默认为 `lt`,所有条件包括 `lt` (less than,小于)、`lte` (less than or equal to,小等于)、`gt` (greater than,大于)、`gte` (greater than or equal to,大等于) |
+| vector.thresholdRelation | string | optional | "lt" | 相似度度量比较方式。相似度度量方式有 `Cosine`, `DotProduct`, `Euclidean` 等,前两者值越大相似度越高,后者值越小相似度越高。对于 `Cosine` 和 `DotProduct` 选择 `gt`,对于 `Euclidean` 则选择 `lt`。所有可选值包括 `lt` (less than,小于)、`lte` (less than or equal to,小等于)、`gt` (greater than,大于)、`gte` (greater than or equal to,大等于) |
+| vector.esUsername | string | optional | - | ElasticSearch 用户名,仅用于 elasticsearch 类型 |
+| vector.esPassword | string | optional | - | ElasticSearch 密码,仅用于 elasticsearch 类型 |
 
 ## 文本向量化服务(embedding)
 | Name | Type | Requirement | Default | Description |
 | --- | --- | --- | --- | --- |
-| embedding.type | string | required | "" | 请求文本向量化服务类型,例如 dashscope |
-| embedding.serviceName | string | required | "" | 请求文本向量化服务名称 |
-| embedding.serviceHost | string | optional | "" | 请求文本向量化服务域名 |
-| embedding.servicePort | int64 | optional | 443 | 请求文本向量化服务端口 |
-| embedding.apiKey | string | optional | "" | 请求文本向量化服务的 API Key |
+| embedding.type | string | required | - | 请求文本向量化服务类型,例如 dashscope、openai、azure、cohere、ollama、huggingface、textin、xfyun |
+| embedding.serviceName | string | required | - | 请求文本向量化服务名称 |
+| embedding.serviceHost | string | optional | - | 请求文本向量化服务域名 |
+| embedding.servicePort | int64 | optional | 443 | 请求文本向量化服务端口。不同 provider 默认值可能不同,ollama 默认为 11434 |
 | embedding.timeout | uint32 | optional | 10000 | 请求文本向量化服务的超时时间,单位为毫秒。默认值是10000,即10秒 |
-| embedding.model | string | optional | "" | 请求文本向量化服务的模型名称 |
+| embedding.model | string | optional | - | 请求文本向量化服务的模型名称 |
+| embedding.apiKey | string | optional | - | 请求文本向量化服务的 API Key |
 
 ## 缓存服务(cache)
-| cache.type | string | required | "" | 缓存服务类型,例如 redis |
+| Name | Type | Requirement | Default | Description |
 | --- | --- | --- | --- | --- |
-| cache.serviceName | string | required | "" | 缓存服务名称 |
-| cache.serviceHost | string | required | "" | 缓存服务域名 |
-| cache.servicePort | int64 | optional | 6379 | 缓存服务端口 |
-| cache.username | string | optional | "" | 缓存服务用户名 |
-| cache.password | string | optional | "" | 缓存服务密码 |
+| cache.type | string | required | - | 缓存服务类型,例如 redis |
+| cache.serviceName | string | required | - | 缓存服务名称 |
+| cache.serviceHost | string | optional | - | 缓存服务域名 |
+| cache.servicePort | int64 | optional | 6379 | 缓存服务端口。若 serviceName 以 .static 结尾,则默认值为 80 |
+| cache.username | string | optional | - | 缓存服务用户名 |
+| cache.password | string | optional | - | 缓存服务密码 |
 | cache.timeout | uint32 | optional | 10000 | 缓存服务的超时时间,单位为毫秒。默认值是10000,即10秒 |
-| cache.cacheTTL | int | optional | 0 | 缓存过期时间,单位为秒。默认值是 0,即 永不过期|
-| cache.cacheKeyPrefix | string | optional | "higress-ai-cache:" | 缓存 Key 的前缀,默认值为 "higress-ai-cache:" |
+| cache.cacheTTL | int | optional | 0 | 缓存过期时间,单位为秒。默认值是 0,即永不过期 |
+| cache.cacheKeyPrefix | string | optional | "higress-ai-cache:" | 缓存 Key 的前缀 |
 | cache.database | int | optional | 0 | 使用的数据库id,仅限redis,例如配置为1,对应`SELECT 1` |
 
 ## 其他配置
-| Name | Type | Requirement | Default | Description |
+| Name | Type | Requirement | Default | Description |
 | --- | --- | --- | --- | --- |
-| cacheKeyFrom | string | optional | "messages.@reverse.0.content" | 从请求 Body 中基于 [GJSON PATH](https://github.com/tidwall/gjson/blob/master/SYNTAX.md) 语法提取字符串 |
-| cacheValueFrom | string | optional | "choices.0.message.content" | 从响应 Body 中基于 [GJSON PATH](https://github.com/tidwall/gjson/blob/master/SYNTAX.md) 语法提取字符串 |
-| cacheStreamValueFrom | string | optional | "choices.0.delta.content" | 从流式响应 Body 中基于 [GJSON PATH](https://github.com/tidwall/gjson/blob/master/SYNTAX.md) 语法提取字符串 |
-| cacheToolCallsFrom | string | optional | "choices.0.delta.content.tool_calls" | 从请求 Body 中基于 [GJSON PATH](https://github.com/tidwall/gjson/blob/master/SYNTAX.md) 语法提取字符串 |
-| responseTemplate | string | optional | `{"id":"ai-cache.hit","choices":[{"index":0,"message":{"role":"assistant","content":%s},"finish_reason":"stop"}],"model":"gpt-4o","object":"chat.completion","usage":{"prompt_tokens":0,"completion_tokens":0,"total_tokens":0}}` | 返回 HTTP 响应的模版,用 %s 标记需要被 cache value 替换的部分 |
-| streamResponseTemplate | string | optional | `data:{"id":"ai-cache.hit","choices":[{"index":0,"delta":{"role":"assistant","content":%s},"finish_reason":"stop"}],"model":"gpt-4o","object":"chat.completion","usage":{"prompt_tokens":0,"completion_tokens":0,"total_tokens":0}}\n\ndata:[DONE]\n\n` | 返回流式 HTTP 响应的模版,用 %s 标记需要被 cache value 替换的部分 |
+| cacheKeyFrom | string | optional | "messages.@reverse.0.content" | 从请求 Body 中基于 [GJSON PATH](https://github.com/tidwall/gjson/blob/master/SYNTAX.md) 语法提取字符串 |
+| cacheValueFrom | string | optional | "choices.0.message.content" | 从响应 Body 中基于 [GJSON PATH](https://github.com/tidwall/gjson/blob/master/SYNTAX.md) 语法提取字符串 |
+| cacheStreamValueFrom | string | optional | "choices.0.delta.content" | 从流式响应 Body 中基于 [GJSON PATH](https://github.com/tidwall/gjson/blob/master/SYNTAX.md) 语法提取字符串 |
+| cacheToolCallsFrom | string | optional | "choices.0.delta.content.tool_calls" | 从流式响应 Body 中基于 [GJSON PATH](https://github.com/tidwall/gjson/blob/master/SYNTAX.md) 语法提取字符串 |
+| responseTemplate | string | optional | `{"id":"from-cache","choices":[{"index":0,"message":{"role":"assistant","content":"%s"},"finish_reason":"stop"}],"model":"from-cache","object":"chat.completion","usage":{"prompt_tokens":0,"completion_tokens":0,"total_tokens":0}}` | 返回 HTTP 响应的模版,用 %s 标记需要被 cache value 替换的部分 |
+| streamResponseTemplate | string | optional | `data:{"id":"from-cache","choices":[{"index":0,"delta":{"role":"assistant","content":"%s"},"finish_reason":"stop"}],"model":"from-cache","object":"chat.completion","usage":{"prompt_tokens":0,"completion_tokens":0,"total_tokens":0}}\n\ndata:[DONE]\n\n` | 返回流式 HTTP 响应的模版,用 %s 标记需要被 cache value 替换的部分 |
 
 ## 文本向量化提供商特有配置
 
 ### Azure OpenAI
-Azure OpenAI 所对应的 `embedding.type` 为 `azure`。它需要提前创建[Azure OpenAI 账户](https://portal.azure.com/#view/Microsoft_Azure_ProjectOxford/CognitiveServicesHub/~/overview),然后您需要在[Azure AI Foundry](https://ai.azure.com/resource/deployments)中挑选一个模型并将其部署,点击您部署好的模型,您可以在终结点中看到目标 URI 以及密钥。请将 URI 中的 host 填入`embedding.serviceHost`,密钥填入`apiKey`。
+Azure OpenAI 所对应的 `embedding.type` 为 `azure`。它需要提前创建[Azure OpenAI 账户](https://portal.azure.com/#view/Microsoft_Azure_ProjectOxford/CognitiveServicesHub/~/overview),然后您需要在[Azure AI Foundry](https://ai.azure.com/resource/deployments)中挑选一个模型并将其部署,点击您部署好的模型,您可以在终结点中看到目标 URI 以及密钥。请将 URI 中的 host 填入`embedding.serviceHost`,密钥填入`embedding.apiKey`。
 
 一个完整的 URI 示例为 https://YOUR_RESOURCE_NAME.openai.azure.com/openai/deployments/YOUR_DEPLOYMENT_NAME/embeddings?api-version=2024-10-21,您需要将`YOUR_RESOURCE_NAME.openai.azure.com`填入`embedding.serviceHost`。
 
 它特有的配置字段如下:
 
-| 名称 | 数据类型 | 填写要求 | 默认值 | 描述 | 填写值 |
-| ---------------------- | -------- | -------- | ------ | ------- | ---------------------------- |
-| `embedding.apiVersion` | string | 必填 | - | api版本 | 获取到的URI中api-version的值 |
+| 名称 | 数据类型 | 填写要求 | 默认值 | 描述 |
+| ---------------------- | -------- | -------- | ------ | ------- |
+| `embedding.apiVersion` | string | 必填 | - | api版本,获取到的URI中api-version的值 |
 
 需要注意的是您必须要指定`embedding.serviceHost`,如`YOUR_RESOURCE_NAME.openai.azure.com`。模型默认使用了`text-embedding-ada-002`,如需其他模型,请在`embedding.model`中进行指定。
 
@@ -122,7 +122,7 @@ Cohere 所对应的 `embedding.type` 为 `cohere`。它并无特有的配置字
 
 ### OpenAI
 
-OpenAI 所对应的 `embedding.type` 为 `openai`。它并无特有的配置字段。需要提前创建 [API Key](https://platform.openai.com/settings/organization/api-keys),并将其填入`embedding.apiKey`,一个 API Key 的示例为` sk-xxxxxxx`。
+OpenAI 所对应的 `embedding.type` 为 `openai`。它并无特有的配置字段。需要提前创建 [API Key](https://platform.openai.com/settings/organization/api-keys),并将其填入`embedding.apiKey`,一个 API Key 的示例为`sk-xxxxxxx`。
 
 ### Ollama
 
@@ -130,32 +130,38 @@ Ollama 所对应的 `embedding.type` 为 `ollama`。它并无特有的配置字
 
 ### Hugging Face
 
-Hugging Face 所对应的 `embedding.type` 为 `huggingface`。它并无特有的配置字段。需要提前创建 [hf_token](https://huggingface.co/blog/getting-started-with-embeddings),并将其填入`embedding.apiKey`,一个 hf_token 的示例为` hf_xxxxxxx`。
+Hugging Face 所对应的 `embedding.type` 为 `huggingface`。它并无特有的配置字段。需要提前创建 [hf_token](https://huggingface.co/blog/getting-started-with-embeddings),并将其填入`embedding.apiKey`,一个 hf_token 的示例为`hf_xxxxxxx`。
 
 `embedding.model`默认指定为`sentence-transformers/all-MiniLM-L6-v2`
 
-### Textln
+### DashScope
 
-Textln 所对应的 `embedding.type` 为 `textln`。它需要提前获取[`app-id` 和`secret-code`](https://www.textin.com/document/acge_text_embedding)。
+DashScope 所对应的 `embedding.type` 为 `dashscope`。需要提前创建 [API Key](https://help.aliyun.com/document_detail/2712195.html),并将其填入`embedding.apiKey`。
+
+`embedding.model`默认指定为`text-embedding-v2`,还可选用`text-embedding-v1`等模型。
+
+### TextIn
+
+TextIn 所对应的 `embedding.type` 为 `textin`。它需要提前获取[`app-id` 和`secret-code`](https://www.textin.com/document/acge_text_embedding)。
 
 它特有的配置字段如下:
 
-| 名称 | 数据类型 | 填写要求 | 默认值 | 描述 | 填写值 |
-| ------------------------------- | -------- | -------- | ------ | -------------------- | ------------------ |
-| `embedding.textinAppId` | string | 必填 | - | 应用 ID | 获取的 app-id |
-| `embedding.textinSecretCode` | string | 必填 | - | 调用 API 所需 Secret | 获取的 secret-code |
-| `embedding.textinMatryoshkaDim` | int | 必填 | - | 返回的单个向量长度 | |
+| 名称 | 数据类型 | 填写要求 | 默认值 | 描述 |
+| ------------------------------- | -------- | -------- | ------ | ------------------ |
+| `embedding.textinAppId` | string | 必填 | - | 应用 ID,获取的 app-id |
+| `embedding.textinSecretCode` | string | 必填 | - | 调用 API 所需 Secret,获取的 secret-code |
+| `embedding.textinMatryoshkaDim` | int | 必填 | - | 返回的单个向量长度 |
 
 ### 讯飞星火
 
-讯飞星火 所对应的 `embedding.type` 为 `xfyun`。它需要提前创建[应用](https://console.xfyun.cn/services/emb),获取`APPID` 、`APISecret`和`APIKey`,并将`APIKey`填入`embedding.apiKey`中。
+讯飞星火 所对应的 `embedding.type` 为 `xfyun`。它需要提前创建[应用](https://console.xfyun.cn/services/emb),获取`APPID`、`APISecret`和`APIKey`,并将`APIKey`填入`embedding.apiKey`中。
 
 它特有的配置字段如下:
 
-| 名称 | 数据类型 | 填写要求 | 默认值 | 描述 | 填写值 |
-| --------------------- | -------- | -------- | ------ | -------------------- | ---------------- |
-| `embedding.appId` | string | 必填 | - | 应用 ID | 获取的 APPID |
-| `embedding.apiSecret` | string | 必填 | - | 调用 API 所需 Secret | 获取的 APISecret |
+| 名称 | 数据类型 | 填写要求 | 默认值 | 描述 |
+| --------------------- | -------- | -------- | ------ | -------------------- |
+| `embedding.appId` | string | 必填 | - | 应用 ID,获取的 APPID |
+| `embedding.apiSecret` | string | 必填 | - | 调用 API 所需 Secret,获取的 APISecret |
 
 ## 向量数据库提供商特有配置
 
@@ -171,9 +177,9 @@ ElasticSearch 所对应的 `vector.type` 为 `elasticsearch`。需要提前创建 Index,并填写 Index Name 至配置项 `vector.collectionID`。
 当前依赖于 [KNN](https://www.elastic.co/guide/en/elasticsearch/reference/current/knn-search.html) 方法,请保证 ES 版本支持 `KNN`,当前已在 `8.16` 版本测试。
 
 它特有的配置字段如下:
-| 名称 | 数据类型 | 填写要求 | 默认值 | 描述 |
+| 名称 | 数据类型 | 填写要求 | 默认值 | 描述 |
 |-------------------|----------|----------|--------|-------------------------------------------------------------------------------|
-| `vector.esUsername` | string | 非必填 | - | ElasticSearch 用户名 |
+| `vector.esUsername` | string | 非必填 | - | ElasticSearch 用户名 |
 | `vector.esPassword` | string | 非必填 | - | ElasticSearch 密码 |
 
@@ -191,8 +197,7 @@ Pinecone 中的 `Namespace` 参数通过插件的 `vector.collectionID` 进行
 Qdrant 所对应的 `vector.type` 为 `qdrant`。它并无特有的配置字段。需要提前创建 Collection,并填写 Collection Name 至配置项 `vector.collectionID`。
 
 ### Weaviate
-Weaviate 所对应的 `vector.type` 为 `weaviate`。它并无特有的配置字段。
-需要提前创建 Collection,并填写 Collection Name 至配置项 `vector.collectionID`。
+Weaviate 所对应的 `vector.type` 为 `weaviate`。它并无特有的配置字段。需要提前创建 Collection,并填写 Collection Name 至配置项 `vector.collectionID`。
 
 需要注意的是 Weaviate 会设置首字母自动大写,在填写配置 `collectionID` 的时候需要将首字母设置为大写。
 
@@ -210,7 +215,7 @@ vector:
   type: dashvector
   serviceName: my_dashvector.dns
   collectionID: [Your Collection ID]
-  serviceDomain: [Your domain]
+  serviceHost: [Your domain]
   apiKey: [Your key]
 
 cache:
@@ -221,15 +226,6 @@ cache:
 ```
 
-旧版本配置兼容
-```yaml
-redis:
-  serviceName: my_redis.dns
-  servicePort: 6379
-  timeout: 100
-  database: 1
-```
-
 ## 进阶用法
 
 当前默认的缓存 key 是基于 GJSON PATH 的表达式:`messages.@reverse.0.content` 提取,含义是把 messages 数组反转后取第一项的 content;
diff --git a/plugins/wasm-go/extensions/ai-cache/README_EN.md b/plugins/wasm-go/extensions/ai-cache/README_EN.md
index d48f9f71b..1ebf2b8e6 100644
--- a/plugins/wasm-go/extensions/ai-cache/README_EN.md
+++ b/plugins/wasm-go/extensions/ai-cache/README_EN.md
@@ -3,46 +3,243 @@ title: AI Cache
 keywords: [higress,ai cache]
 description: AI Cache Plugin Configuration Reference
 ---
+
 ## Function Description
-LLM result caching plugin, the default configuration can be directly used for result caching under the OpenAI protocol, and it supports caching of both streaming and non-streaming responses.
+
+LLM result caching plugin. The default configuration can be directly used for OpenAI protocol result caching, and supports caching of both streaming and non-streaming responses.
 
 **Tips**
 
 When carrying the request header `x-higress-skip-ai-cache: on`, the current request will not use content from the cache but will be directly forwarded to the backend service. Additionally, the response content from this request will not be cached.
 
 ## Runtime Properties
+
 Plugin Execution Phase: `Authentication Phase`
 Plugin Execution Priority: `10`
 
 ## Configuration Description
-| Name | Type | Requirement | Default | Description |
-| -------- | -------- | -------- | -------- | -------- |
-| cacheKeyFrom.requestBody | string | optional | "messages.@reverse.0.content" | Extracts a string from the request Body based on [GJSON PATH](https://github.com/tidwall/gjson/blob/master/SYNTAX.md) syntax |
-| cacheValueFrom.responseBody | string | optional | "choices.0.message.content" | Extracts a string from the response Body based on [GJSON PATH](https://github.com/tidwall/gjson/blob/master/SYNTAX.md) syntax |
-| cacheStreamValueFrom.responseBody | string | optional | "choices.0.delta.content" | Extracts a string from the streaming response Body based on [GJSON PATH](https://github.com/tidwall/gjson/blob/master/SYNTAX.md) syntax |
-| cacheKeyPrefix | string | optional | "higress-ai-cache:" | Prefix for the Redis cache key |
-| cacheTTL | integer | optional | 0 | Cache expiration time in seconds, default value is 0, which means never expire |
-| redis.serviceName | string | required | - | The complete FQDN name of the Redis service, including the service type, e.g., my-redis.dns, redis.my-ns.svc.cluster.local |
-| redis.servicePort | integer | optional | 6379 | Redis service port |
-| redis.timeout | integer | optional | 1000 | Timeout for requests to Redis, in milliseconds |
-| redis.username | string | optional | - | Username for logging into Redis |
-| redis.database | int | optional | 0 | The database ID used, limited to Redis, for example, configured as 1, corresponds to `SELECT 1`. |
-| redis.password | string | optional | - | Password for logging into Redis |
-| returnResponseTemplate | string | optional | `{"id":"from-cache","choices":[%s],"model":"gpt-4o","object":"chat.completion","usage":{"prompt_tokens":0,"completion_tokens":0,"total_tokens":0}}` | Template for returning HTTP response, with %s marking the part to be replaced by cache value |
-| returnStreamResponseTemplate | string | optional | `data:{"id":"from-cache","choices":[{"index":0,"delta":{"role":"assistant","content":"%s"},"finish_reason":"stop"}],"model":"gpt-4o","object":"chat.completion","usage":{"prompt_tokens":0,"completion_tokens":0,"total_tokens":0}}\n\ndata:[DONE]\n\n` | Template for returning streaming HTTP response, with %s marking the part to be replaced by cache value |
+
+The configuration is divided into 3 parts: Vector Database (vector), Text Embedding Service (embedding), and Cache Service (cache). It also provides fine-grained LLM request/response extraction parameter configurations.
+
+This plugin supports both vector database-based semantic caching and string matching-based caching methods. If both vector database and cache database are configured, the cache database is used first, and the vector database capability is used when cache misses occur.
+
+*Note*: Vector database (vector) and cache database (cache) cannot both be empty, otherwise this plugin cannot provide caching services.
+
+| Name | Type | Requirement | Default | Description |
+| --- | --- | --- | --- | --- |
+| vector | object | optional | - | Vector storage service configuration, see Vector Database Service section below |
+| embedding | object | optional | - | Text embedding service configuration, see Text Embedding Service section below |
+| cache | object | optional | - | Cache service configuration, see Cache Service section below |
+| cacheKeyStrategy | string | optional | "lastQuestion" | Strategy for generating cache key from historical questions. Options: "lastQuestion" (use last question), "allQuestions" (concatenate all questions), or "disabled" (disable caching) |
+| enableSemanticCache | bool | optional | false | Whether to enable semantic caching. If disabled, string matching is used to find cache, requiring cache service configuration. Automatically enabled when a vector provider is configured |
+
+Depending on whether semantic caching is needed, you can configure component combinations as follows:
+1. `cache`: Enable string matching cache only
+2. `vector (+ embedding)`: Enable semantic caching. If `vector` does not provide string representation service, you need to configure `embedding` service separately
+3. `vector (+ embedding) + cache`: Enable semantic caching and use cache service to store LLM responses for acceleration
+
+If you do not configure a related component, you can ignore the `required` fields of that component.
+
+## Vector Database Service (vector)
+
+| Name | Type | Requirement | Default | Description |
+| --- | --- | --- | --- | --- |
+| vector.type | string | required | - | Vector storage service provider type, e.g., dashvector, chroma, elasticsearch, weaviate, pinecone, qdrant, milvus |
+| vector.serviceName | string | required | - | Vector storage service name |
+| vector.serviceHost | string | optional | - | Vector storage service domain. Required for some providers (e.g., dashvector, pinecone) |
+| vector.servicePort | int64 | optional | 443 | Vector storage service port |
+| vector.apiKey | string | optional | - | Vector storage service API Key |
+| vector.topK | int | optional | 1 | Return TopK results |
+| vector.timeout | uint32 | optional | 10000 | Timeout for requesting vector storage service, in milliseconds. Default is 10000 (10 seconds) |
+| vector.collectionID | string | optional | - | Vector storage service Collection ID |
+| vector.threshold | float64 | optional | 1000 | Vector similarity measurement threshold |
+| vector.thresholdRelation | string | optional | "lt" | Similarity measurement comparison method. Similarity measurement methods include `Cosine`, `DotProduct`, `Euclidean`, etc. The first two have higher similarity with larger values, while the latter has higher similarity with smaller values. Use `gt` for `Cosine` and `DotProduct`, and `lt` for `Euclidean`. All options include `lt` (less than), `lte` (less than or equal to), `gt` (greater than), `gte` (greater than or equal to) |
+| vector.esUsername | string | optional | - | ElasticSearch username, only for elasticsearch type |
+| vector.esPassword | string | optional | - | ElasticSearch password, only for elasticsearch type |
+
+## Text Embedding Service (embedding)
+
+| Name | Type | Requirement | Default | Description |
+| --- | --- | --- | --- | --- |
+| embedding.type | string | required | - | Text embedding service type, e.g., dashscope, openai, azure, cohere, ollama, huggingface, textin, xfyun |
+| embedding.serviceName | string | required | - | Text embedding service name |
+| embedding.serviceHost | string | optional | - | Text embedding service domain |
+| embedding.servicePort | int64 | optional | 443 | Text embedding service port. Default varies by provider; ollama defaults to 11434 |
+| embedding.timeout | uint32 | optional | 10000 | Timeout for requesting text embedding service, in milliseconds. Default is 10000 (10 seconds) |
+| embedding.model | string | optional | - | Model name for text embedding service |
+| embedding.apiKey | string | optional | - | API Key for text embedding service |
+
+## Cache Service (cache)
+
+| Name | Type | Requirement | Default | Description |
+| --- | --- | --- | --- | --- |
+| cache.type | string | required | - | Cache service type, e.g., redis |
+| cache.serviceName | string | required | - | Cache service name |
+| cache.serviceHost | string | optional | - | Cache service domain |
+| cache.servicePort | int64 | optional | 6379 | Cache service port. If serviceName ends with .static, default is 80 |
+| cache.username | string | optional | - | Cache service username |
+| cache.password | string | optional | - | Cache service password |
+| cache.timeout | uint32 | optional | 10000 | Cache service timeout, in milliseconds. Default is 10000 (10 seconds) |
+| cache.cacheTTL | int | optional | 0 | Cache expiration time, in seconds. Default is 0 (never expire) |
+| cache.cacheKeyPrefix | string | optional | "higress-ai-cache:" | Prefix for cache keys |
+| cache.database | int | optional | 0 | Database ID to use, only for Redis. For example, configure as 1 for `SELECT 1` |
+
+## Other Configurations
+
+| Name | Type | Requirement | Default | Description |
+| --- | --- | --- | --- | --- |
+| cacheKeyFrom | string | optional | "messages.@reverse.0.content" | Extract string from request Body using [GJSON PATH](https://github.com/tidwall/gjson/blob/master/SYNTAX.md) syntax |
+| cacheValueFrom | string | optional | "choices.0.message.content" | Extract string from response Body using [GJSON PATH](https://github.com/tidwall/gjson/blob/master/SYNTAX.md) syntax |
+| cacheStreamValueFrom | string | optional | "choices.0.delta.content" | Extract string from streaming response Body using [GJSON PATH](https://github.com/tidwall/gjson/blob/master/SYNTAX.md) syntax |
+| cacheToolCallsFrom | string | optional | "choices.0.delta.content.tool_calls" | Extract string from streaming response Body using [GJSON PATH](https://github.com/tidwall/gjson/blob/master/SYNTAX.md) syntax |
+| responseTemplate | string | optional | `{"id":"from-cache","choices":[{"index":0,"message":{"role":"assistant","content":"%s"},"finish_reason":"stop"}],"model":"from-cache","object":"chat.completion","usage":{"prompt_tokens":0,"completion_tokens":0,"total_tokens":0}}` | Template for returning HTTP response, with %s marking the part to be replaced by cache value |
+| streamResponseTemplate | string | optional | `data:{"id":"from-cache","choices":[{"index":0,"delta":{"role":"assistant","content":"%s"},"finish_reason":"stop"}],"model":"from-cache","object":"chat.completion","usage":{"prompt_tokens":0,"completion_tokens":0,"total_tokens":0}}\n\ndata:[DONE]\n\n` | Template for returning streaming HTTP response, with %s marking the part to be replaced by cache value |
+
+## Text Embedding Provider Specific Configurations
+
+### Azure OpenAI
+
+For Azure OpenAI, set `embedding.type` to `azure`. You need to first create an [Azure OpenAI account](https://portal.azure.com/#view/Microsoft_Azure_ProjectOxford/CognitiveServicesHub/~/overview), then select and deploy a model in [Azure AI Foundry](https://ai.azure.com/resource/deployments). Click on your deployed model to see the target URI and key in the endpoint section. Please enter the host from the URI in `embedding.serviceHost` and the key in `embedding.apiKey`.
+
+A complete URI example is https://YOUR_RESOURCE_NAME.openai.azure.com/openai/deployments/YOUR_DEPLOYMENT_NAME/embeddings?api-version=2024-10-21. You need to enter `YOUR_RESOURCE_NAME.openai.azure.com` in `embedding.serviceHost`.
+
+Specific configuration fields:
+
+| Name | Data Type | Requirement | Default | Description |
+| ---------------------- | -------- | -------- | ------ | ------- |
+| `embedding.apiVersion` | string | required | - | API version, the api-version value from the obtained URI |
+
+Note that you must specify `embedding.serviceHost`, such as `YOUR_RESOURCE_NAME.openai.azure.com`. The default model is `text-embedding-ada-002`. For other models, specify in `embedding.model`.
+
+### Cohere
+
+For Cohere, set `embedding.type` to `cohere`. There are no specific configuration fields. You need to create an [API Key](https://docs.cohere.com/reference/embed) and enter it in `embedding.apiKey`.
+
+### OpenAI
+
+For OpenAI, set `embedding.type` to `openai`. There are no specific configuration fields. You need to create an [API Key](https://platform.openai.com/settings/organization/api-keys) and enter it in `embedding.apiKey`. An API Key example is `sk-xxxxxxx`.
+
+### Ollama
+
+For Ollama, set `embedding.type` to `ollama`. There are no specific configuration fields.
+
+### Hugging Face
+
+For Hugging Face, set `embedding.type` to `huggingface`. There are no specific configuration fields. You need to create an [hf_token](https://huggingface.co/blog/getting-started-with-embeddings) and enter it in `embedding.apiKey`. An hf_token example is `hf_xxxxxxx`.
+
+`embedding.model` defaults to `sentence-transformers/all-MiniLM-L6-v2`.
+
+### DashScope
+
+For DashScope, set `embedding.type` to `dashscope`. You need to create an [API Key](https://help.aliyun.com/document_detail/2712195.html) and enter it in `embedding.apiKey`.
+
+`embedding.model` defaults to `text-embedding-v2`. Other models like `text-embedding-v1` can also be used.
+
+### TextIn
+
+For TextIn, set `embedding.type` to `textin`. You need to first obtain [`app-id` and `secret-code`](https://www.textin.com/document/acge_text_embedding).
+
+Specific configuration fields:
+
+| Name | Data Type | Requirement | Default | Description |
+| ------------------------------- | -------- | -------- | ------ | ------------------ |
+| `embedding.textinAppId` | string | required | - | Application ID, obtained app-id |
+| `embedding.textinSecretCode` | string | required | - | Secret for calling API, obtained secret-code |
+| `embedding.textinMatryoshkaDim` | int | required | - | Dimension of returned single vector |
+
+### Xfyun (讯飞星火)
+
+For Xfyun, set `embedding.type` to `xfyun`. You need to first create an [application](https://console.xfyun.cn/services/emb) to obtain `APPID`, `APISecret`, and `APIKey`, and enter `APIKey` in `embedding.apiKey`.
+
+Specific configuration fields:
+
+| Name | Data Type | Requirement | Default | Description |
+| --------------------- | -------- | -------- | ------ | -------------------- |
+| `embedding.appId` | string | required | - | Application ID, obtained APPID |
+| `embedding.apiSecret` | string | required | - | Secret for calling API, obtained APISecret |
+
+## Vector Database Provider Specific Configurations
+
+### Chroma
+
+For Chroma, set `vector.type` to `chroma`. There are no specific configuration fields. You need to create a Collection in advance and fill in the Collection ID in `vector.collectionID`. A Collection ID example is `52bbb8b3-724c-477b-a4ce-d5b578214612`.
+
+### DashVector
+
+For DashVector, set `vector.type` to `dashvector`. There are no specific configuration fields. You need to create a Collection in advance and fill in the `Collection Name` in `vector.collectionID`.
+
+### ElasticSearch
+
+For ElasticSearch, set `vector.type` to `elasticsearch`. You need to create an Index in advance and fill in the Index Name in `vector.collectionID`.
+
+It currently relies on the [KNN](https://www.elastic.co/guide/en/elasticsearch/reference/current/knn-search.html) method. Please ensure your ES version supports `KNN`. It has been tested on version `8.16`.
+
+Specific configuration fields:
+
+| Name | Data Type | Requirement | Default | Description |
+|-------------------|----------|----------|--------|-------------------------------------------------------------------------------|
+| `vector.esUsername` | string | optional | - | ElasticSearch username |
+| `vector.esPassword` | string | optional | - | ElasticSearch password |
+
+`vector.esUsername` and `vector.esPassword` are used for Basic authentication. API Key authentication is also supported. When `vector.apiKey` is filled in, API Key authentication is enabled. For SaaS versions, you need to fill in the `encoded` value.
+
+### Milvus
+
+For Milvus, set `vector.type` to `milvus`. There are no specific configuration fields. You need to create a Collection in advance and fill in the Collection Name in `vector.collectionID`.
+
+### Pinecone
+
+For Pinecone, set `vector.type` to `pinecone`. There are no specific configuration fields. You need to create an Index in advance and fill in the Index access domain in `vector.serviceHost`.
+
+The `Namespace` parameter in Pinecone is configured through the plugin's `vector.collectionID`. If `vector.collectionID` is not filled in, it defaults to the Default Namespace.
+
+### Qdrant
+
+For Qdrant, set `vector.type` to `qdrant`. There are no specific configuration fields. You need to create a Collection in advance and fill in the Collection Name in `vector.collectionID`.
+
+### Weaviate
+
+For Weaviate, set `vector.type` to `weaviate`. There are no specific configuration fields. You need to create a Collection in advance and fill in the Collection Name in `vector.collectionID`.
+
+Note that Weaviate automatically capitalizes the first letter, so when filling in `collectionID`, the first letter should be capitalized.
+
+If using SaaS, you need to fill in the `vector.serviceHost` parameter.
 
 ## Configuration Example
-```yaml
-redis:
-  serviceName: my-redis.dns
-  timeout: 2000
+
+### Basic Configuration
+
+```yaml
+embedding:
+  type: dashscope
+  serviceName: my_dashscope.dns
+  apiKey: [Your Key]
+
+vector:
+  type: dashvector
+  serviceName: my_dashvector.dns
+  collectionID: [Your Collection ID]
+  serviceHost: [Your domain]
+  apiKey: [Your key]
+
+cache:
+  type: redis
+  serviceName: my_redis.dns
   servicePort: 6379
-  database: 1
-```
+  timeout: 100
+```
 
 ## Advanced Usage
-The current default cache key is based on the GJSON PATH expression: `messages.@reverse.0.content`, meaning to get the content of the first item after reversing the messages array;
-GJSON PATH supports conditional syntax, for instance, if you want to take the content of the last role as user as the key, it can be written as: `messages.@reverse.#(role=="user").content`;
-If you want to concatenate all the content with role as user into an array as the key, it can be written as: `messages.@reverse.#(role=="user")#.content`;
-It also supports pipeline syntax, for example, if you want to take the second role as user as the key, it can be written as: `messages.@reverse.#(role=="user")#.content|1`.
-For more usage, you can refer to the [official documentation](https://github.com/tidwall/gjson/blob/master/SYNTAX.md) and use the [GJSON Playground](https://gjson.dev/) for syntax testing.
+
+The current default cache key is extracted based on the GJSON PATH expression: `messages.@reverse.0.content`, which means reversing the messages array and taking the content of the first item.
+
+GJSON PATH supports conditional syntax. For example, to use the content of the last message whose role is user as the key, you can write: `messages.@reverse.#(role=="user").content`;
+
+If you want to concatenate all content with role as user into an array as the key, you can write: `messages.@reverse.#(role=="user")#.content`;
+
+It also supports pipeline syntax. For example, to use the second-to-last message whose role is user as the key, you can write: `messages.@reverse.#(role=="user")#.content|1`.
+
+For more usage, please refer to the [official documentation](https://github.com/tidwall/gjson/blob/master/SYNTAX.md). You can use the [GJSON Playground](https://gjson.dev/) for syntax testing.
+
+## FAQ
+
+1. If the returned error is `error status returned by host: bad argument`, please check whether `serviceName` correctly includes the service type suffix (.dns, etc.).
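As an editor's note on the Advanced Usage section of this PR: the plugin evaluates the `cacheKeyFrom` expressions with GJSON in Go on the gateway, so the following is only an illustrative Python sketch (with a made-up sample request body) that mirrors what each GJSON PATH expression selects — not the plugin's implementation.

```python
# Illustrative Python equivalents of the GJSON PATH cache-key expressions.
# The sample request body below is hypothetical, for demonstration only.

request_body = {
    "model": "gpt-4o",
    "messages": [
        {"role": "system", "content": "You are helpful."},
        {"role": "user", "content": "What is Higress?"},
        {"role": "assistant", "content": "An API gateway."},
        {"role": "user", "content": "Does it cache LLM responses?"},
    ],
}

# @reverse puts the newest message first
reversed_msgs = list(reversed(request_body["messages"]))

# messages.@reverse.0.content -> content of the last message (default key)
default_key = reversed_msgs[0]["content"]

# messages.@reverse.#(role=="user").content -> last user message
last_user_key = next(m["content"] for m in reversed_msgs if m["role"] == "user")

# messages.@reverse.#(role=="user")#.content -> all user contents, newest first
all_user_keys = [m["content"] for m in reversed_msgs if m["role"] == "user"]

# ...|1 -> second element of that array, i.e. the second-to-last user message
second_to_last_user_key = all_user_keys[1]

print(default_key)              # Does it cache LLM responses?
print(second_to_last_user_key)  # What is Higress?
```

This makes it easier to sanity-check a custom `cacheKeyFrom` expression before deploying: reproduce the selection logic on a captured request body and confirm the extracted key is what you expect.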