higress

mirror of https://github.com/alibaba/higress.git synced 2026-02-06 23:21:08 +08:00

README_EN.md

title: AI Cache
keywords: higress, ai cache
description: AI Cache Plugin Configuration Reference

Function Description

This plugin caches LLM results. The default configuration works out of the box for result caching under the OpenAI protocol, and it supports caching of both streaming and non-streaming responses.

Tips

When a request carries the header x-higress-skip-ai-cache: on, it does not use cached content and is forwarded directly to the backend service. The response to that request is not cached either.
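
As a minimal sketch of how a client might opt out of caching for a single call, the Go snippet below sets that header on an outgoing request. Only the header name and value come from the tip above; the helper name newUncachedRequest, the gateway URL, and the request body are placeholders chosen for illustration.

```go
package main

import (
	"fmt"
	"net/http"
	"strings"
)

// newUncachedRequest (hypothetical helper) builds a POST request that asks
// the ai-cache plugin to bypass the cache for this one call.
// The URL and body are supplied by the caller; only the header is from the README.
func newUncachedRequest(url, body string) (*http.Request, error) {
	req, err := http.NewRequest(http.MethodPost, url, strings.NewReader(body))
	if err != nil {
		return nil, err
	}
	req.Header.Set("Content-Type", "application/json")
	req.Header.Set("x-higress-skip-ai-cache", "on")
	return req, nil
}

func main() {
	// Placeholder gateway address and payload, for illustration only.
	req, err := newUncachedRequest(
		"http://gateway.example.com/v1/chat/completions",
		`{"model":"gpt-4o","messages":[{"role":"user","content":"hello"}]}`,
	)
	if err != nil {
		panic(err)
	}
	fmt.Println(req.Header.Get("x-higress-skip-ai-cache"))
}
```

The request is only constructed here, not sent; passing it to an http.Client would be the next step in a real caller.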

Runtime Properties

Plugin Execution Phase: Authentication Phase
Plugin Execution Priority: 10

Configuration Description

Name	Type	Requirement	Default	Description
cacheKeyFrom.requestBody	string	optional	"messages.@reverse.0.content"	Extracts a string from the request Body based on GJSON PATH syntax
cacheValueFrom.responseBody	string	optional	"choices.0.message.content"	Extracts a string from the response Body based on GJSON PATH syntax
cacheStreamValueFrom.responseBody	string	optional	"choices.0.delta.content"	Extracts a string from the streaming response Body based on GJSON PATH syntax
cacheKeyPrefix	string	optional	"higress-ai-cache:"	Prefix for the Redis cache key
cacheTTL	integer	optional	0	Cache expiration time in seconds; the default value of 0 means entries never expire
redis.serviceName	string	required	-	The full FQDN of the Redis service, including the service type, e.g., my-redis.dns or redis.my-ns.svc.cluster.local
redis.servicePort	integer	optional	6379	Redis service port
redis.timeout	integer	optional	1000	Timeout for requests to Redis, in milliseconds
redis.username	string	optional	-	Username for logging into Redis
redis.database	integer	optional	0	ID of the Redis database to use; for example, a value of 1 corresponds to `SELECT 1`.
redis.password	string	optional	-	Password for logging into Redis
returnResponseTemplate	string	optional	`{"id":"from-cache","choices":[%s],"model":"gpt-4o","object":"chat.completion","usage":{"prompt_tokens":0,"completion_tokens":0,"total_tokens":0}}`	Template for returning HTTP response, with %s marking the part to be replaced by cache value
returnStreamResponseTemplate	string	optional	`data:{"id":"from-cache","choices":[{"index":0,"delta":{"role":"assistant","content":"%s"},"finish_reason":"stop"}],"model":"gpt-4o","object":"chat.completion","usage":{"prompt_tokens":0,"completion_tokens":0,"total_tokens":0}}\n\ndata:[DONE]\n\n`	Template for returning streaming HTTP response, with %s marking the part to be replaced by cache value
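
The sketch below illustrates how the key prefix and the streaming template could fit together on a cache hit. It rests on two assumptions that the table does not confirm: that the Redis key is the prefix followed by the extracted key text, and that %s is filled printf-style with the cached value escaped as JSON string content. All function names are hypothetical; the constants are copied from the defaults above.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// Defaults copied from the configuration table. The \n sequences of the
// template are written here as real newlines.
const (
	defaultKeyPrefix      = "higress-ai-cache:"
	defaultStreamTemplate = `data:{"id":"from-cache","choices":[{"index":0,"delta":{"role":"assistant","content":"%s"},"finish_reason":"stop"}],"model":"gpt-4o","object":"chat.completion","usage":{"prompt_tokens":0,"completion_tokens":0,"total_tokens":0}}` + "\n\ndata:[DONE]\n\n"
)

// redisKey assumes the key is simply prefix + extracted text.
func redisKey(prefix, extracted string) string {
	return prefix + extracted
}

// escapeForJSONString escapes s so it can sit between the quotes that
// surround %s in the template.
func escapeForJSONString(s string) string {
	b, _ := json.Marshal(s) // marshaling a plain string does not fail
	return string(b[1 : len(b)-1])
}

// renderStreamHit fills the template with the escaped cached value.
func renderStreamHit(template, cached string) string {
	return fmt.Sprintf(template, escapeForJSONString(cached))
}

// contentOf parses the first "data:" event of a rendered response and
// returns choices[0].delta.content, to check that escaping round-trips.
func contentOf(rendered string) (string, error) {
	first, _, _ := strings.Cut(rendered, "\n\n")
	payload := strings.TrimPrefix(first, "data:")
	var event struct {
		Choices []struct {
			Delta struct {
				Content string `json:"content"`
			} `json:"delta"`
		} `json:"choices"`
	}
	if err := json.Unmarshal([]byte(payload), &event); err != nil {
		return "", err
	}
	if len(event.Choices) == 0 {
		return "", fmt.Errorf("no choices in event")
	}
	return event.Choices[0].Delta.Content, nil
}

func main() {
	fmt.Println(redisKey(defaultKeyPrefix, "hello"))
	out := renderStreamHit(defaultStreamTemplate, "line one\nsaid \"hi\"")
	content, err := contentOf(out)
	if err != nil {
		panic(err)
	}
	fmt.Printf("%q\n", content)
}
```

Escaping matters because a cached answer that contains quotes or newlines would otherwise produce invalid JSON once substituted into the template.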

Configuration Example

redis:
  serviceName: my-redis.dns
  timeout: 2000
  servicePort: 6379
  database: 1

Advanced Usage

The default cache key is taken from the GJSON PATH expression messages.@reverse.0.content, which reverses the messages array and takes the content of its first item, that is, the content of the last message.
GJSON PATH supports conditional syntax. For instance, to use the content of the last message whose role is user as the key, write: messages.@reverse.#(role=="user").content
To use an array of the content of all messages whose role is user as the key, write: messages.@reverse.#(role=="user")#.content
Pipe syntax is also supported. For example, to use the content of the second user message in the reversed array (the second-to-last user message) as the key, write: messages.@reverse.#(role=="user")#.content|1
For more usage, refer to the official GJSON documentation, and use the GJSON Playground to test path expressions.
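
To make these selections concrete, the sketch below reproduces in plain Go (standard library only, not GJSON itself) which values the expressions above pick from a sample messages array. The type names, function names, and sample conversation are illustrative assumptions.

```go
package main

import (
	"encoding/json"
	"fmt"
)

type message struct {
	Role    string `json:"role"`
	Content string `json:"content"`
}

type chatRequest struct {
	Messages []message `json:"messages"`
}

// lastContent mirrors messages.@reverse.0.content.
func lastContent(msgs []message) string {
	if len(msgs) == 0 {
		return ""
	}
	return msgs[len(msgs)-1].Content
}

// userContentsReversed mirrors messages.@reverse.#(role=="user")#.content:
// all user contents, newest first.
func userContentsReversed(msgs []message) []string {
	var out []string
	for i := len(msgs) - 1; i >= 0; i-- {
		if msgs[i].Role == "user" {
			out = append(out, msgs[i].Content)
		}
	}
	return out
}

// nthUserFromEnd mirrors messages.@reverse.#(role=="user")#.content|n.
// With n == 0 it matches messages.@reverse.#(role=="user").content.
func nthUserFromEnd(msgs []message, n int) string {
	users := userContentsReversed(msgs)
	if n < 0 || n >= len(users) {
		return ""
	}
	return users[n]
}

func main() {
	body := `{"messages":[
		{"role":"system","content":"system prompt"},
		{"role":"user","content":"first question"},
		{"role":"assistant","content":"first answer"},
		{"role":"user","content":"second question"}]}`

	var req chatRequest
	if err := json.Unmarshal([]byte(body), &req); err != nil {
		panic(err)
	}

	fmt.Println(lastContent(req.Messages))          // second question
	fmt.Println(nthUserFromEnd(req.Messages, 0))    // second question
	fmt.Println(nthUserFromEnd(req.Messages, 1))    // first question
	fmt.Println(userContentsReversed(req.Messages)) // [second question first question]
}
```

In this sample the default key and the role-filtered key coincide because the conversation ends with a user message; they differ when the last message has another role.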