Compare commits

...

8 Commits

| Author | SHA1 | Message | Date |
|---|---|---|---|
| HaoJie Liu | a57173ce28 | feat(ai-proxy): support Amazon Bedrock (#2039) | 2025-04-22 22:36:14 +08:00 |
| mirror | 3a8d8f5b94 | update mcp descriptions (#2105) | 2025-04-22 17:01:41 +08:00 |
| Kent Dong | 1c37c361e1 | feat: Support extracting model argument from body in multipart/form-data format (#1940) | 2025-04-22 13:52:50 +08:00 |
| Se7en | b8133a95b2 | feat: optimize elasticsearch ai-search plugin and update related docs" (#2100) | 2025-04-22 13:33:38 +08:00 |
| johnlanni | 36d5d391b8 | update README.md | 2025-04-21 09:59:37 +08:00 |
| johnlanni | 1da9a07866 | update README | 2025-04-21 09:56:23 +08:00 |
| ZeruiYang | 8620838f8b | fix: update module replacements (#2090) | 2025-04-19 18:13:42 +08:00 |
| waTErMo0n | e7d2005382 | feat:Getting MatchLabels dynamically via gatewaySelectorKey/Value #1857 (#1883) | 2025-04-18 17:46:47 +08:00 |
24 changed files with 1739 additions and 118 deletions

View File

@@ -22,13 +22,21 @@
English | <a href="README_ZH.md">中文</a> | <a href="README_JP.md">日本語</a>
</p>
## What is Higress?
Higress is a cloud-native API gateway based on Istio and Envoy, which can be extended with Wasm plugins written in Go/Rust/JS. It provides dozens of ready-to-use general-purpose plugins and an out-of-the-box console (try the [demo here](http://demo.higress.io/)).
Higress was born within Alibaba to solve the issues of Tengine reload affecting long-connection services and insufficient load balancing capabilities for gRPC/Dubbo.
### Core Use Cases
Alibaba Cloud has built its cloud-native API gateway product based on Higress, providing 99.99% gateway high availability guarantee service capabilities for a large number of enterprise customers.
Higress's AI gateway capabilities support all [mainstream model providers](https://github.com/alibaba/higress/tree/main/plugins/wasm-go/extensions/ai-proxy/provider), both domestic and international. It also supports hosting MCP (Model Context Protocol) Servers through its plugin mechanism, enabling AI Agents to easily call various tools and services. With the [openapi-to-mcp tool](https://github.com/higress-group/openapi-to-mcpserver), you can quickly convert OpenAPI specifications into remote MCP servers for hosting. Higress provides unified management for both LLM API and MCP API.
Higress's AI gateway capabilities support all [mainstream model providers](https://github.com/alibaba/higress/tree/main/plugins/wasm-go/extensions/ai-proxy/provider), both domestic and international, as well as self-built DeepSeek models based on vllm/ollama. Within Alibaba Cloud, it supports AI businesses such as the Tongyi Qianwen APP, Bailian large model API, and machine learning PAI platform. It also serves leading AIGC enterprises (such as Zero One Infinite) and AI products (such as FastGPT).
**🌟 Try it now at [https://mcp.higress.ai/](https://mcp.higress.ai/)** to experience Higress-hosted Remote MCP Servers firsthand:
![Higress MCP Server Platform](https://img.alicdn.com/imgextra/i2/O1CN01nmVa0a1aChgpyyWOX_!!6000000003294-0-tps-3430-1742.jpg)
### Enterprise Adoption
Higress was born within Alibaba to solve the issues of Tengine reload affecting long-connection services and insufficient load balancing capabilities for gRPC/Dubbo. Within Alibaba Cloud, Higress's AI gateway capabilities support core AI applications such as Tongyi Bailian model studio, machine learning PAI platform, and other critical AI services. Alibaba Cloud has built its cloud-native API gateway product based on Higress, providing 99.99% gateway high availability guarantee service capabilities for a large number of enterprise customers.
## Summary
@@ -64,32 +72,28 @@ For other installation methods such as Helm deployment under K8s, please refer t
## Use Cases
- **AI Gateway**:
Higress can connect to all LLM model providers, both domestic and international, using a unified protocol, while also providing rich AI observability, multi-model load balancing/fallback, AI token rate limiting, AI caching, and other capabilities:
![](https://img.alicdn.com/imgextra/i2/O1CN01izmBNX1jbHT7lP3Yr_!!6000000004566-0-tps-1920-1080.jpg)
- **MCP Server Hosting**:
Higress, as an Envoy-based API gateway, supports hosting MCP Servers through its plugin mechanism. MCP (Model Context Protocol) is essentially an AI-friendly API that enables AI Agents to more easily call various tools and services. Higress provides unified capabilities for authentication, authorization, rate limiting, and observability for tool calls, simplifying the development and deployment of AI applications.
Higress hosts MCP Servers through its plugin mechanism, enabling AI Agents to easily call various tools and services. With the [openapi-to-mcp tool](https://github.com/higress-group/openapi-to-mcpserver), you can quickly convert OpenAPI specifications into remote MCP servers.
![](https://img.alicdn.com/imgextra/i1/O1CN01wv8H4g1mS4MUzC1QC_!!6000000004952-2-tps-1764-597.png)
**🌟 Try it now!** Experience Higress-hosted Remote MCP Servers at [https://mcp.higress.ai/](https://mcp.higress.ai/)
![Higress MCP Server Platform](https://img.alicdn.com/imgextra/i2/O1CN01nmVa0a1aChgpyyWOX_!!6000000003294-0-tps-3430-1742.jpg)
By hosting MCP Servers with Higress, you can achieve:
- Unified authentication and authorization mechanisms, ensuring the security of AI tool calls
- Fine-grained rate limiting to prevent abuse and resource exhaustion
- Comprehensive audit logs recording all tool call behaviors
- Rich observability for monitoring the performance and health of tool calls
- Simplified deployment and management through Higress's plugin mechanism for quickly adding new MCP Servers
- Dynamic updates without disruption: Thanks to Envoy's friendly handling of long connections and Wasm plugin's dynamic update mechanism, MCP Server logic can be updated on-the-fly without any traffic disruption or connection drops
Key benefits of hosting MCP Servers with Higress:
- Unified authentication and authorization mechanisms
- Fine-grained rate limiting to prevent abuse
- Comprehensive audit logs for all tool calls
- Rich observability for monitoring performance
- Simplified deployment through Higress's plugin mechanism
- Dynamic updates without disruption or connection drops
[Learn more...](https://higress.cn/en/ai/mcp-quick-start/?spm=36971b57.7beea2de.0.0.d85f20a94jsWGm)
- **AI Gateway**:
Higress connects to all LLM model providers using a unified protocol, with AI observability, multi-model load balancing, token rate limiting, and caching capabilities:
![](https://img.alicdn.com/imgextra/i2/O1CN01izmBNX1jbHT7lP3Yr_!!6000000004566-0-tps-1920-1080.jpg)
- **Kubernetes ingress controller**:
Higress can function as a feature-rich ingress controller, which is compatible with many annotations of K8s' nginx ingress controller.

View File

@@ -22,15 +22,21 @@
</p>
## Higressとは
Higressは、IstioとEnvoyをベースにしたクラウドネイティブAPIゲートウェイで、Go/Rust/JSなどを使用してWasmプラグインを作成できます。数十の既製の汎用プラグインと、すぐに使用できるコンソールを提供しています(デモは[こちら](http://demo.higress.io/))。
Higressは、Tengineのリロードが長時間接続のビジネスに影響を与える問題や、gRPC/Dubboの負荷分散能力の不足を解決するために、Alibaba内部で誕生しました。
### 主な使用シナリオ
Alibaba Cloudは、Higressを基盤にクラウドネイティブAPIゲートウェイ製品を構築し、多くの企業顧客に99.99%のゲートウェイ高可用性保証サービスを提供しています。
HigressのAIゲートウェイ機能は、国内外のすべての[主要モデルプロバイダー](https://github.com/alibaba/higress/tree/main/plugins/wasm-go/extensions/ai-proxy/provider)をサポートし、vllm/ollamaなどに基づく自己構築DeepSeekモデルにも対応しています。また、プラグインメカニズムを通じてMCP(Model Context Protocol)サーバーをホストすることもでき、AI Agentが様々なツールやサービスを簡単に呼び出せるようにします。[openapi-to-mcpツール](https://github.com/higress-group/openapi-to-mcpserver)を使用すると、OpenAPI仕様を迅速にリモートMCPサーバーに変換してホスティングできます。HigressはLLM APIとMCP APIの統一管理を提供します。
Higressは、AIゲートウェイ機能を基盤に、Tongyi Qianwen APP、Bailian大規模モデルAPI、機械学習PAIプラットフォームなどのAIビジネスをサポートしています。また、国内の主要なAIGC企業(ZeroOneなど)やAI製品(FastGPTなど)にもサービスを提供しています。
**🌟 今すぐ[https://mcp.higress.ai/](https://mcp.higress.ai/)で体験**してください。HigressがホストするリモートMCPサーバーを直接体験できます:
![](https://img.alicdn.com/imgextra/i2/O1CN011AbR8023V8R5N0HcA_!!6000000007260-2-tps-1080-606.png)
![Higress MCP Server Platform](https://img.alicdn.com/imgextra/i2/O1CN01nmVa0a1aChgpyyWOX_!!6000000003294-0-tps-3430-1742.jpg)
### 企業での採用
Higressは、Tengineのリロードが長時間接続のビジネスに影響を与える問題や、gRPC/Dubboの負荷分散能力の不足を解決するために、Alibaba内部で誕生しました。Alibaba Cloud内では、HigressのAIゲートウェイ機能がTongyi Qianwen APP、Tongyi Bailian Model Studio、機械学習PAIプラットフォームなどの中核的なAIアプリケーションをサポートしています。また、国内の主要なAIGC企業(ZeroOneなど)やAI製品(FastGPTなど)にもサービスを提供しています。Alibaba Cloudは、Higressを基盤にクラウドネイティブAPIゲートウェイ製品を構築し、多くの企業顧客に99.99%のゲートウェイ高可用性保証サービスを提供しています。
## 目次
@@ -79,10 +85,6 @@ K8sでのHelmデプロイなどの他のインストール方法については
![](https://img.alicdn.com/imgextra/i3/O1CN01K4qPUX1OliZa8KIPw_!!6000000001746-2-tps-1581-615.png)
**🌟 今すぐ試してみよう!** [https://mcp.higress.ai/](https://mcp.higress.ai/) でHigressがホストするリモートMCPサーバーを体験できます。このプラットフォームでは、HigressがどのようにリモートMCPサーバーをホストおよび管理するかを直接体験できます。
![Higress MCP サーバープラットフォーム](https://img.alicdn.com/imgextra/i2/O1CN01nmVa0a1aChgpyyWOX_!!6000000003294-0-tps-3430-1742.jpg)
Higressを使用してMCP Serverをホストすることで、以下のことが実現できます
- 統一された認証と認可メカニズム、AIツール呼び出しのセキュリティを確保
- きめ細かいレート制限、乱用やリソース枯渇を防止

View File

@@ -28,15 +28,21 @@
</p>
## Higress 是什么?
Higress 是一款云原生 API 网关,内核基于 Istio 和 Envoy,可以用 Go/Rust/JS 等编写 Wasm 插件,提供了数十个现成的通用插件,以及开箱即用的控制台(demo 点[这里](http://demo.higress.io/))。
Higress 在阿里内部为解决 Tengine reload 对长连接业务有损,以及 gRPC/Dubbo 负载均衡能力不足而诞生。
### 核心使用场景
阿里云基于 Higress 构建了云原生 API 网关产品,为大量企业客户提供 99.99% 的网关高可用保障服务能力。
Higress 的 AI 网关能力支持国内外所有[主流模型供应商](https://github.com/alibaba/higress/tree/main/plugins/wasm-go/extensions/ai-proxy/provider)和基于 vllm/ollama 等自建的 DeepSeek 模型。同时,Higress 支持通过插件方式托管 MCP (Model Context Protocol) 服务器,使 AI Agent 能够更容易地调用各种工具和服务。借助 [openapi-to-mcp 工具](https://github.com/higress-group/openapi-to-mcpserver),您可以快速将 OpenAPI 规范转换为远程 MCP 服务器进行托管。Higress 提供了对 LLM API 和 MCP API 的统一管理。
Higress 的 AI 网关能力支持国内外所有[主流模型供应商](https://github.com/alibaba/higress/tree/main/plugins/wasm-go/extensions/ai-proxy/provider)和基于 vllm/ollama 等自建的 DeepSeek 模型;在阿里云内部支撑了通义千问 APP、百炼大模型 API、机器学习 PAI 平台等 AI 业务。同时服务国内头部的 AIGC 企业(如零一万物),以及 AI 产品(如 FastGPT)。
**🌟 立即体验 [https://mcp.higress.ai/](https://mcp.higress.ai/)** 基于 Higress 托管的远程 MCP 服务器:
![](https://img.alicdn.com/imgextra/i2/O1CN011AbR8023V8R5N0HcA_!!6000000007260-2-tps-1080-606.png)
![Higress MCP 服务器平台](https://img.alicdn.com/imgextra/i2/O1CN01nmVa0a1aChgpyyWOX_!!6000000003294-0-tps-3430-1742.jpg)
### 生产环境采用
Higress 在阿里内部为解决 Tengine reload 对长连接业务有损,以及 gRPC/Dubbo 负载均衡能力不足而诞生。在阿里云内部,Higress 的 AI 网关能力支撑了通义千问 APP、通义百炼模型工作室、机器学习 PAI 平台等核心 AI 应用。同时服务国内头部的 AIGC 企业(如零一万物),以及 AI 产品(如 FastGPT)。阿里云基于 Higress 构建了云原生 API 网关产品,为大量企业客户提供 99.99% 的网关高可用保障服务能力。
## Summary
@@ -89,10 +95,6 @@ K8s 下使用 Helm 部署等其他安装方式可以参考官网 [Quick Start
![](https://img.alicdn.com/imgextra/i3/O1CN01K4qPUX1OliZa8KIPw_!!6000000001746-2-tps-1581-615.png)
**🌟 立即体验!** 在 [https://mcp.higress.ai/](https://mcp.higress.ai/) 体验 Higress 托管的远程 MCP 服务器。这个平台让您可以体验基于 Higress 托管的远程 MCP 服务器的效果。
![Higress MCP 服务器平台](https://img.alicdn.com/imgextra/i2/O1CN01nmVa0a1aChgpyyWOX_!!6000000003294-0-tps-3430-1742.jpg)
通过 Higress 托管 MCP Server可以实现
- 统一的认证和鉴权机制,确保 AI 工具调用的安全性
- 精细化的速率限制,防止滥用和资源耗尽

View File

@@ -152,6 +152,7 @@ type IngressConfig struct {
httpsConfigMgr *cert.ConfigMgr
commonOptions common.Options
// templateProcessor processes template variables in config
templateProcessor *TemplateProcessor
@@ -197,6 +198,7 @@ func NewIngressConfig(localKubeClient kube.Client, xdsUpdater istiomodel.XDSUpda
namespace: namespace,
wasmPlugins: make(map[string]*extensions.WasmPlugin),
http2rpcs: make(map[string]*higressv1.Http2Rpc),
commonOptions: options,
}
// Initialize secret config manager
@@ -904,7 +906,7 @@ func (m *IngressConfig) convertIstioWasmPlugin(obj *higressext.WasmPlugin) (*ext
result := &extensions.WasmPlugin{
Selector: &istiotype.WorkloadSelector{
MatchLabels: map[string]string{
"higress": m.namespace + "-higress-gateway",
m.commonOptions.GatewaySelectorKey: m.commonOptions.GatewaySelectorValue,
},
},
Url: obj.Url,
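The hunk above replaces the hardcoded `higress` workload-selector label with a key/value pair taken from the gateway options. A minimal Go sketch of that fallback logic (the `Options` type and field names here are illustrative, not the actual Higress structs):

```go
package main

import "fmt"

// Options mirrors the idea of configurable selector key/value pairs
// introduced by the gatewaySelectorKey/Value change (illustrative type).
type Options struct {
	GatewaySelectorKey   string
	GatewaySelectorValue string
}

// buildMatchLabels returns the WasmPlugin selector labels: the configured
// key/value when present, otherwise the previous hardcoded behavior.
func buildMatchLabels(namespace string, opts Options) map[string]string {
	if opts.GatewaySelectorKey != "" {
		return map[string]string{opts.GatewaySelectorKey: opts.GatewaySelectorValue}
	}
	return map[string]string{"higress": namespace + "-higress-gateway"}
}

func main() {
	fmt.Println(buildMatchLabels("higress-system", Options{}))
	fmt.Println(buildMatchLabels("higress-system",
		Options{GatewaySelectorKey: "app", GatewaySelectorValue: "my-gateway"}))
}
```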

View File

@@ -55,6 +55,7 @@ constexpr std::string_view Host(":authority");
constexpr std::string_view Path(":path");
constexpr std::string_view EnvoyOriginalPath("x-envoy-original-path");
constexpr std::string_view Accept("accept");
constexpr std::string_view ContentDisposition("content-disposition");
constexpr std::string_view ContentMD5("content-md5");
constexpr std::string_view ContentType("content-type");
constexpr std::string_view ContentLength("content-length");
@@ -68,6 +69,7 @@ constexpr std::string_view StrictTransportSecurity("strict-transport-security");
namespace ContentTypeValues {
constexpr std::string_view Grpc{"application/grpc"};
constexpr std::string_view Json{"application/json"};
constexpr std::string_view MultipartFormData{"multipart/form-data"};
} // namespace ContentTypeValues
class PercentEncoding {

View File

@@ -16,6 +16,7 @@
#include <array>
#include <limits>
#include <regex>
#include "absl/strings/str_cat.h"
#include "absl/strings/str_split.h"
@@ -123,6 +124,7 @@ bool PluginRootContext::configure(size_t configuration_size) {
}
FilterHeadersStatus PluginRootContext::onHeader(
PluginContext& ctx,
const ModelRouterConfigRule& rule) {
if (!Wasm::Common::Http::hasRequestBody()) {
return FilterHeadersStatus::Continue;
@@ -150,19 +152,49 @@ FilterHeadersStatus PluginRootContext::onHeader(
if (!enable) {
return FilterHeadersStatus::Continue;
}
auto content_type_value =
auto content_type_ptr =
getRequestHeader(Wasm::Common::Http::Header::ContentType);
if (!absl::StrContains(content_type_value->view(),
auto content_type_value = content_type_ptr->view();
LOG_DEBUG(absl::StrCat("Content-Type: ", content_type_value));
if (absl::StrContains(content_type_value,
Wasm::Common::Http::ContentTypeValues::Json)) {
return FilterHeadersStatus::Continue;
ctx.mode_ = MODE_JSON;
LOG_DEBUG("Enable JSON mode.");
removeRequestHeader(Wasm::Common::Http::Header::ContentLength);
setFilterState(SetDecoderBufferLimitKey, DefaultMaxBodyBytes);
LOG_INFO(absl::StrCat("SetRequestBodyBufferLimit: ", DefaultMaxBodyBytes));
return FilterHeadersStatus::StopIteration;
}
removeRequestHeader(Wasm::Common::Http::Header::ContentLength);
setFilterState(SetDecoderBufferLimitKey, DefaultMaxBodyBytes);
LOG_INFO(absl::StrCat("SetRequestBodyBufferLimit: ", DefaultMaxBodyBytes));
return FilterHeadersStatus::StopIteration;
if (absl::StrContains(content_type_value,
Wasm::Common::Http::ContentTypeValues::MultipartFormData)) {
// Get the boundary from the content type
auto boundary_start = content_type_value.find("boundary=");
if (boundary_start == std::string::npos) {
LOG_WARN(absl::StrCat("No boundary found in a multipart/form-data content-type: ", content_type_value));
return FilterHeadersStatus::Continue;
}
boundary_start += 9;
auto boundary_end = content_type_value.find(';', boundary_start);
if (boundary_end == std::string::npos) {
boundary_end = content_type_value.size();
}
auto boundary_length = boundary_end - boundary_start;
if (boundary_length < 1 || boundary_length > 70) {
// See https://www.w3.org/Protocols/rfc1341/7_2_Multipart.html
LOG_WARN(absl::StrCat("Invalid boundary value in a multipart/form-data content-type: ", content_type_value));
return FilterHeadersStatus::Continue;
}
auto boundary_value = content_type_value.substr(boundary_start, boundary_end - boundary_start);
ctx.mode_ = MODE_MULTIPART;
ctx.boundary_ = boundary_value;
LOG_DEBUG(absl::StrCat("Enable multipart/form-data mode. Boundary=", boundary_value));
removeRequestHeader(Wasm::Common::Http::Header::ContentLength);
return FilterHeadersStatus::StopIteration;
}
return FilterHeadersStatus::Continue;
}
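The boundary-parsing steps above (find `boundary=`, cut at the next `;`, reject lengths outside the 1-70 range that the multipart RFC allows) can be sketched as a standalone function. This is an illustrative Go version of the same logic, not the plugin's actual code:

```go
package main

import (
	"fmt"
	"strings"
)

// extractBoundary pulls the boundary parameter out of a multipart/form-data
// Content-Type value. It returns "" when the parameter is missing or its
// length falls outside the 1..70 character limit from RFC 2046.
func extractBoundary(contentType string) string {
	const marker = "boundary="
	start := strings.Index(contentType, marker)
	if start < 0 {
		return ""
	}
	start += len(marker)
	end := strings.IndexByte(contentType[start:], ';')
	if end < 0 {
		end = len(contentType) - start
	}
	boundary := contentType[start : start+end]
	if len(boundary) < 1 || len(boundary) > 70 {
		return ""
	}
	return boundary
}

func main() {
	fmt.Println(extractBoundary("multipart/form-data; boundary=xyz; charset=utf-8")) // xyz
}
```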
FilterDataStatus PluginRootContext::onBody(const ModelRouterConfigRule& rule,
FilterDataStatus PluginRootContext::onJsonBody(const ModelRouterConfigRule& rule,
std::string_view body) {
const auto& model_key = rule.model_key_;
const auto& add_provider_header = rule.add_provider_header_;
@@ -198,10 +230,85 @@ FilterDataStatus PluginRootContext::onBody(const ModelRouterConfigRule& rule,
return FilterDataStatus::Continue;
}
FilterDataStatus PluginRootContext::onMultipartBody(
PluginContext& ctx,
const ModelRouterConfigRule& rule,
WasmDataPtr& body,
bool end_stream) {
const auto& add_provider_header = rule.add_provider_header_;
const auto& model_to_header = rule.model_to_header_;
const auto boundary = ctx.boundary_;
const auto body_view = body->view();
const auto model_param_header = absl::StrCat("Content-Disposition: form-data; name=\"", rule.model_key_, "\"");
for (size_t pos = 0; (pos = body_view.find(boundary, pos)) != std::string_view::npos;) {
LOG_DEBUG(absl::StrCat("Found boundary at ", pos));
pos += boundary.length();
size_t end_pos = body_view.find(boundary, pos);
if (end_pos == std::string_view::npos) {
end_pos = body_view.length();
}
std::string_view part = body_view.substr(pos, end_pos - pos);
LOG_DEBUG(absl::StrCat("Part: ", part));
auto part_pos = pos;
pos = end_pos;
// Check if this part contains the model parameter
if (!absl::StrContains(part, model_param_header)) {
LOG_DEBUG("Part does not contain model parameter");
continue;
}
size_t value_start = part.find(CRLF_CRLF);
if (value_start == std::string_view::npos) {
LOG_DEBUG("No value start found in part");
break;
}
value_start += 4; // Skip the "\r\n\r\n"
// model parameter should be only one line
size_t value_end = part.find(CRLF, value_start);
if (value_end == std::string_view::npos) {
LOG_DEBUG("No value end found in part");
break;
}
auto model_value = part.substr(value_start, value_end - value_start);
LOG_DEBUG(absl::StrCat("Model value: ", model_value));
if (!model_to_header.empty()) {
replaceRequestHeader(model_to_header, model_value);
}
if (!add_provider_header.empty()) {
auto pos = model_value.find('/');
if (pos != std::string::npos) {
const auto& provider = model_value.substr(0, pos);
const auto& model = model_value.substr(pos + 1);
replaceRequestHeader(add_provider_header, provider);
size_t new_size = 0;
auto new_buffer_data = absl::StrCat(body_view.substr(0, part_pos + value_start), model, body_view.substr(part_pos + value_end));
auto result = setBuffer(WasmBufferType::HttpRequestBody, 0, std::numeric_limits<size_t>::max(), new_buffer_data, &new_size);
LOG_DEBUG(absl::StrCat("model route to provider:", provider,
", model:", model));
LOG_DEBUG(absl::StrCat("result=", result, " new_size=", new_size));
} else {
LOG_DEBUG(absl::StrCat("model route to provider not work, model:",
model_value));
}
}
// We are done now. We can stop processing the body.
LOG_DEBUG(absl::StrCat("Done processing multipart body after caching ", body_view.length() , " bytes."));
ctx.mode_ = MODE_BYPASS;
return FilterDataStatus::Continue;
}
if (end_stream) {
LOG_DEBUG("No model parameter found in the body");
return FilterDataStatus::Continue;
}
return FilterDataStatus::StopIterationAndBuffer;
}
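`onMultipartBody` walks boundary-delimited parts, looks for the `Content-Disposition` header naming the model field, then reads the single-line value between the blank line ending the part headers and the next CRLF. A hedged Go sketch of that scan (illustrative helper, assuming CRLF line endings as in the tests below):

```go
package main

import (
	"fmt"
	"strings"
)

// findModelValue scans a buffered multipart body for the part whose
// Content-Disposition names the given field (e.g. "model") and returns its
// single-line value: locate the part, skip past the CRLF CRLF that ends the
// part headers, and read up to the next CRLF.
func findModelValue(body, boundary, field string) (string, bool) {
	header := `Content-Disposition: form-data; name="` + field + `"`
	for pos := 0; ; {
		i := strings.Index(body[pos:], boundary)
		if i < 0 {
			return "", false
		}
		pos += i + len(boundary)
		end := strings.Index(body[pos:], boundary)
		if end < 0 {
			end = len(body) - pos
		}
		part := body[pos : pos+end]
		pos += end
		if !strings.Contains(part, header) {
			continue // this part holds some other field
		}
		vs := strings.Index(part, "\r\n\r\n")
		if vs < 0 {
			return "", false
		}
		vs += 4 // skip the blank line after the part headers
		ve := strings.Index(part[vs:], "\r\n")
		if ve < 0 {
			return "", false
		}
		return part[vs : vs+ve], true
	}
}

func main() {
	b := "--B"
	body := b + "\r\nContent-Disposition: form-data; name=\"model\"\r\n\r\nqwen/qwen-turbo\r\n" + b + "--\r\n"
	fmt.Println(findModelValue(body, b, "model")) // qwen/qwen-turbo true
}
```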
FilterHeadersStatus PluginContext::onRequestHeaders(uint32_t, bool) {
auto* rootCtx = rootContext();
return rootCtx->onHeaders([rootCtx, this](const auto& config) {
auto ret = rootCtx->onHeader(config);
auto ret = rootCtx->onHeader(*this, config);
if (ret == FilterHeadersStatus::StopIteration) {
this->config_ = &config;
}
@@ -214,14 +321,28 @@ FilterDataStatus PluginContext::onRequestBody(size_t body_size,
if (config_ == nullptr) {
return FilterDataStatus::Continue;
}
body_total_size_ += body_size;
if (!end_stream) {
return FilterDataStatus::StopIterationAndBuffer;
}
auto body =
getBufferBytes(WasmBufferType::HttpRequestBody, 0, body_total_size_);
auto* rootCtx = rootContext();
return rootCtx->onBody(*config_, body->view());
body_total_size_ += body_size;
switch (mode_) {
case MODE_JSON:
{
if (!end_stream) {
return FilterDataStatus::StopIterationAndBuffer;
}
auto body =
getBufferBytes(WasmBufferType::HttpRequestBody, 0, body_total_size_);
return rootCtx->onJsonBody(*config_, body->view());
}
case MODE_MULTIPART:
{
auto body =
getBufferBytes(WasmBufferType::HttpRequestBody, 0, body_total_size_);
return rootCtx->onMultipartBody(*this, *config_, body, end_stream);
}
case MODE_BYPASS:
default:
return FilterDataStatus::Continue;
}
}
#ifdef NULL_PLUGIN

View File

@@ -36,6 +36,13 @@ namespace model_router {
#endif
#define MODE_BYPASS 0
#define MODE_JSON 1
#define MODE_MULTIPART 2
#define CRLF ("\r\n")
#define CRLF_CRLF ("\r\n\r\n")
struct ModelRouterConfigRule {
std::string model_key_ = "model";
std::string add_provider_header_;
@@ -45,6 +52,8 @@ struct ModelRouterConfigRule {
"/audio/speech", "/fine_tuning/jobs", "/moderations"};
};
class PluginContext;
// PluginRootContext is the root context for all streams processed by the
// thread. It has the same lifetime as the worker thread and acts as target for
// interactions that outlives individual stream, e.g. timer, async calls.
@@ -55,8 +64,9 @@ class PluginRootContext : public RootContext,
: RootContext(id, root_id) {}
~PluginRootContext() {}
bool onConfigure(size_t) override;
FilterHeadersStatus onHeader(const ModelRouterConfigRule&);
FilterDataStatus onBody(const ModelRouterConfigRule&, std::string_view);
FilterHeadersStatus onHeader(PluginContext& ctx, const ModelRouterConfigRule&);
FilterDataStatus onJsonBody(const ModelRouterConfigRule&, std::string_view);
FilterDataStatus onMultipartBody(PluginContext& ctx, const ModelRouterConfigRule& rule, WasmDataPtr& body, bool end_stream);
bool configure(size_t);
private:
@@ -69,6 +79,8 @@ class PluginContext : public Context {
explicit PluginContext(uint32_t id, RootContext* root) : Context(id, root) {}
FilterHeadersStatus onRequestHeaders(uint32_t, bool) override;
FilterDataStatus onRequestBody(size_t, bool) override;
int mode_;
std::string boundary_;
private:
inline PluginRootContext* rootContext() {

View File

@@ -15,6 +15,7 @@
#include "extensions/model_router/plugin.h"
#include <cstddef>
#include <regex>
#include "gmock/gmock.h"
#include "gtest/gtest.h"
@@ -86,7 +87,7 @@ class ModelRouterTest : public ::testing::Test {
.WillByDefault([&](WasmHeaderMapType, std::string_view header,
std::string_view* result) {
if (header == "content-type") {
*result = "application/json";
*result = content_type_;
} else if (header == "content-length") {
*result = "1024";
} else if (header == ":path") {
@@ -125,6 +126,7 @@ class ModelRouterTest : public ::testing::Test {
std::unique_ptr<PluginContext> context_;
std::string route_name_;
std::string path_;
std::string content_type_ = "application/json";
BufferBase body_;
BufferBase config_;
};
@@ -133,7 +135,7 @@ TEST_F(ModelRouterTest, RewriteModelAndHeader) {
std::string configuration = R"(
{
"addProviderHeader": "x-higress-llm-provider"
})";
})";
config_.set(configuration);
EXPECT_TRUE(root_context_->configure(configuration.size()));
@@ -155,14 +157,14 @@ TEST_F(ModelRouterTest, RewriteModelAndHeader) {
body_.set(request_json);
EXPECT_EQ(context_->onRequestHeaders(0, false),
FilterHeadersStatus::StopIteration);
EXPECT_EQ(context_->onRequestBody(28, true), FilterDataStatus::Continue);
EXPECT_EQ(context_->onRequestBody(request_json.length(), true), FilterDataStatus::Continue);
}
TEST_F(ModelRouterTest, ModelToHeader) {
std::string configuration = R"(
{
"modelToHeader": "x-higress-llm-model"
})";
})";
config_.set(configuration);
EXPECT_TRUE(root_context_->configure(configuration.size()));
@@ -181,14 +183,14 @@ TEST_F(ModelRouterTest, ModelToHeader) {
body_.set(request_json);
EXPECT_EQ(context_->onRequestHeaders(0, false),
FilterHeadersStatus::StopIteration);
EXPECT_EQ(context_->onRequestBody(28, true), FilterDataStatus::Continue);
EXPECT_EQ(context_->onRequestBody(request_json.length(), true), FilterDataStatus::Continue);
}
TEST_F(ModelRouterTest, IgnorePath) {
std::string configuration = R"(
{
"addProviderHeader": "x-higress-llm-provider"
})";
})";
config_.set(configuration);
EXPECT_TRUE(root_context_->configure(configuration.size()));
@@ -208,7 +210,7 @@ TEST_F(ModelRouterTest, IgnorePath) {
body_.set(request_json);
EXPECT_EQ(context_->onRequestHeaders(0, false),
FilterHeadersStatus::Continue);
EXPECT_EQ(context_->onRequestBody(28, true), FilterDataStatus::Continue);
EXPECT_EQ(context_->onRequestBody(request_json.length(), true), FilterDataStatus::Continue);
}
TEST_F(ModelRouterTest, RouteLevelRewriteModelAndHeader) {
@@ -242,7 +244,178 @@ TEST_F(ModelRouterTest, RouteLevelRewriteModelAndHeader) {
route_name_ = "route-a";
EXPECT_EQ(context_->onRequestHeaders(0, false),
FilterHeadersStatus::StopIteration);
EXPECT_EQ(context_->onRequestBody(28, true), FilterDataStatus::Continue);
EXPECT_EQ(context_->onRequestBody(request_json.length(), true), FilterDataStatus::Continue);
}
TEST_F(ModelRouterTest, RewriteModelAndHeaderMultipartFormData) {
std::string configuration = R"({
"addProviderHeader": "x-higress-llm-provider"
})";
config_.set(configuration);
EXPECT_TRUE(root_context_->configure(configuration.size()));
path_ = "/v1/chat/completions";
content_type_ = "multipart/form-data; boundary=--------------------------100751621174704322650451";
std::string request_data = std::regex_replace(R"(
----------------------------100751621174704322650451
Content-Disposition: form-data; name="purpose"
batch
----------------------------100751621174704322650451
Content-Disposition: form-data; name="model"
qwen/qwen-turbo
----------------------------100751621174704322650451
Content-Disposition: form-data; name="file"; filename="test-data.json"
Content-Type: application/json
[
]
----------------------------100751621174704322650451--
)", std::regex("\n"), "\r\n"); // Multipart data requires CRLF line endings
EXPECT_CALL(*mock_context_,
setBuffer(testing::_, testing::_, testing::_, testing::_))
.WillOnce([&](WasmBufferType, size_t start, size_t length, std::string_view body) {
std::cerr << "===============" << "\n";
std::cerr << body << "\n";
std::cerr << "===============" << "\n";
EXPECT_EQ(start, 0);
EXPECT_EQ(length, std::numeric_limits<size_t>::max());
auto expected_body= std::regex_replace(R"(
----------------------------100751621174704322650451
Content-Disposition: form-data; name="purpose"
batch
----------------------------100751621174704322650451
Content-Disposition: form-data; name="model"
qwen-turbo
)", std::regex("\n"), "\r\n"); // Multipart data requires CRLF line endings
EXPECT_EQ(body, expected_body);
return WasmResult::Ok;
});
EXPECT_CALL(*mock_context_,
replaceHeaderMapValue(testing::_,
std::string_view("x-higress-llm-provider"),
std::string_view("qwen")));
body_.set(request_data);
EXPECT_EQ(context_->onRequestHeaders(0, false),
FilterHeadersStatus::StopIteration);
auto last_body_size = 0;
auto body = request_data.substr(0, request_data.find("batch") + 5 + 2 /* batch + CRLF */);
body_.set(body);
EXPECT_EQ(context_->onRequestBody(body.size() - last_body_size, false), FilterDataStatus::StopIterationAndBuffer);
last_body_size = body.size();
body = request_data.substr(0, request_data.find("\"model\"") + 5 + 2 + 2 /* "model" + CRLF + CRLF */);
body_.set(body);
EXPECT_EQ(context_->onRequestBody(body.size() - last_body_size, false), FilterDataStatus::StopIterationAndBuffer);
last_body_size = body.size();
body = request_data.substr(0, request_data.find("qwen") + 4 /* "qwen" */);
body_.set(body);
EXPECT_EQ(context_->onRequestBody(body.size() - last_body_size, false), FilterDataStatus::StopIterationAndBuffer);
last_body_size = body.size();
body = request_data.substr(0, request_data.find("qwen-turbo") + 10 /* "qwen-turbo" */);
body_.set(body);
EXPECT_EQ(context_->onRequestBody(body.size() - last_body_size, false), FilterDataStatus::StopIterationAndBuffer);
last_body_size = body.size();
body = request_data.substr(0, request_data.find("qwen-turbo") + 10 + 2 /* "qwen-turbo" + CRLF */);
body_.set(body);
EXPECT_EQ(context_->onRequestBody(body.size() - last_body_size, false), FilterDataStatus::Continue);
last_body_size = body.size();
body = request_data.substr(0, request_data.find("qwen-turbo") + 10 + 2 + 50 /* "qwen-turbo" + CRLF + boundary */);
body_.set(body);
EXPECT_EQ(context_->onRequestBody(body.size() - last_body_size, false), FilterDataStatus::Continue);
last_body_size = body.size();
body_.set(request_data);
EXPECT_EQ(context_->onRequestBody(body.size() - last_body_size, true), FilterDataStatus::Continue);
}
TEST_F(ModelRouterTest, ModelToHeaderMultipartFormData) {
std::string configuration = R"(
{
"modelToHeader": "x-higress-llm-model"
})";
config_.set(configuration);
EXPECT_TRUE(root_context_->configure(configuration.size()));
path_ = "/v1/chat/completions";
content_type_ = "multipart/form-data; boundary=--------------------------100751621174704322650451";
std::string request_data = std::regex_replace(R"(
----------------------------100751621174704322650451
Content-Disposition: form-data; name="purpose"
batch
----------------------------100751621174704322650451
Content-Disposition: form-data; name="model"
qwen-max
----------------------------100751621174704322650451
Content-Disposition: form-data; name="file"; filename="test-data.json"
Content-Type: application/json
[
]
----------------------------100751621174704322650451--
)", std::regex("\n"), "\r\n"); // Multipart data requires CRLF line endings
EXPECT_CALL(*mock_context_,
setBuffer(testing::_, testing::_, testing::_, testing::_))
.Times(0);
EXPECT_CALL(
*mock_context_,
replaceHeaderMapValue(testing::_, std::string_view("x-higress-llm-model"),
std::string_view("qwen-max")));
EXPECT_EQ(context_->onRequestHeaders(0, false),
FilterHeadersStatus::StopIteration);
auto last_body_size = 0;
auto body = request_data.substr(0, request_data.find("batch") + 5 + 2 /* batch + CRLF */);
body_.set(body);
EXPECT_EQ(context_->onRequestBody(body.size() - last_body_size, false), FilterDataStatus::StopIterationAndBuffer);
last_body_size = body.size();
body = request_data.substr(0, request_data.find("\"model\"") + 5 + 2 + 2 /* "model" + CRLF + CRLF */);
body_.set(body);
EXPECT_EQ(context_->onRequestBody(body.size() - last_body_size, false), FilterDataStatus::StopIterationAndBuffer);
last_body_size = body.size();
body = request_data.substr(0, request_data.find("qwen") + 4 /* "qwen" */);
body_.set(body);
EXPECT_EQ(context_->onRequestBody(body.size() - last_body_size, false), FilterDataStatus::StopIterationAndBuffer);
last_body_size = body.size();
body = request_data.substr(0, request_data.find("qwen-max") + 8 /* "qwen-max" */);
body_.set(body);
EXPECT_EQ(context_->onRequestBody(body.size() - last_body_size, false), FilterDataStatus::StopIterationAndBuffer);
last_body_size = body.size();
body = request_data.substr(0, request_data.find("qwen-max") + 8 + 2 /* "qwen-max" + CRLF */);
body_.set(body);
EXPECT_EQ(context_->onRequestBody(body.size() - last_body_size, false), FilterDataStatus::Continue);
last_body_size = body.size();
body = request_data.substr(0, request_data.find("qwen-max") + 8 + 2 + 50 /* "qwen-max" + CRLF + boundary */);
body_.set(body);
EXPECT_EQ(context_->onRequestBody(body.size() - last_body_size, false), FilterDataStatus::Continue);
last_body_size = body.size();
body_.set(request_data);
EXPECT_EQ(context_->onRequestBody(body.size() - last_body_size, true), FilterDataStatus::Continue);
}
} // namespace model_router

View File

@@ -0,0 +1,817 @@
package provider
import (
"bytes"
"crypto/hmac"
"crypto/sha256"
"encoding/binary"
"encoding/hex"
"encoding/json"
"errors"
"fmt"
"hash"
"hash/crc32"
"io"
"net/http"
"strings"
"time"
"github.com/alibaba/higress/plugins/wasm-go/extensions/ai-proxy/util"
"github.com/alibaba/higress/plugins/wasm-go/pkg/log"
"github.com/alibaba/higress/plugins/wasm-go/pkg/wrapper"
"github.com/higress-group/proxy-wasm-go-sdk/proxywasm"
"github.com/higress-group/proxy-wasm-go-sdk/proxywasm/types"
)
const (
httpPostMethod = "POST"
awsService = "bedrock"
// bedrock-runtime.{awsRegion}.amazonaws.com
bedrockDefaultDomain = "bedrock-runtime.%s.amazonaws.com"
// converse path: /model/{modelId}/converse
bedrockChatCompletionPath = "/model/%s/converse"
// converseStream path: /model/{modelId}/converse-stream
bedrockStreamChatCompletionPath = "/model/%s/converse-stream"
bedrockSignedHeaders = "host;x-amz-date"
)
type bedrockProviderInitializer struct {
}
func (b *bedrockProviderInitializer) ValidateConfig(config *ProviderConfig) error {
if len(config.awsAccessKey) == 0 || len(config.awsSecretKey) == 0 {
return errors.New("missing bedrock access authentication parameters")
}
if len(config.awsRegion) == 0 {
return errors.New("missing bedrock region parameters")
}
return nil
}
func (b *bedrockProviderInitializer) DefaultCapabilities() map[string]string {
return map[string]string{
string(ApiNameChatCompletion): bedrockChatCompletionPath,
}
}
func (b *bedrockProviderInitializer) CreateProvider(config ProviderConfig) (Provider, error) {
config.setDefaultCapabilities(b.DefaultCapabilities())
return &bedrockProvider{
config: config,
contextCache: createContextCache(&config),
}, nil
}
type bedrockProvider struct {
config ProviderConfig
contextCache *contextCache
}
func (b *bedrockProvider) OnStreamingResponseBody(ctx wrapper.HttpContext, name ApiName, chunk []byte, isLastChunk bool) ([]byte, error) {
events := extractAmazonEventStreamEvents(ctx, chunk)
if len(events) == 0 {
return chunk, fmt.Errorf("no events extracted")
}
var responseBuilder strings.Builder
for _, event := range events {
outputEvent, err := b.convertEventFromBedrockToOpenAI(ctx, event)
if err != nil {
log.Errorf("[onStreamingResponseBody] failed to process streaming event: %v\n%s", err, chunk)
return chunk, err
}
responseBuilder.WriteString(string(outputEvent))
}
return []byte(responseBuilder.String()), nil
}
func (b *bedrockProvider) convertEventFromBedrockToOpenAI(ctx wrapper.HttpContext, bedrockEvent ConverseStreamEvent) ([]byte, error) {
choices := make([]chatCompletionChoice, 0)
chatChoice := chatCompletionChoice{
Delta: &chatMessage{},
}
if bedrockEvent.Role != nil {
chatChoice.Delta.Role = *bedrockEvent.Role
}
if bedrockEvent.Delta != nil {
chatChoice.Delta = &chatMessage{Content: bedrockEvent.Delta.Text}
}
if bedrockEvent.StopReason != nil {
chatChoice.FinishReason = stopReasonBedrock2OpenAI(*bedrockEvent.StopReason)
}
choices = append(choices, chatChoice)
requestId := ctx.GetStringContext("X-Amzn-Requestid", "")
openAIFormattedChunk := &chatCompletionResponse{
Id: requestId,
Created: time.Now().Unix(),
Model: ctx.GetStringContext(ctxKeyFinalRequestModel, ""),
SystemFingerprint: "",
Object: objectChatCompletion,
Choices: choices,
}
if bedrockEvent.Usage != nil {
openAIFormattedChunk.Choices = choices[:0]
openAIFormattedChunk.Usage = usage{
CompletionTokens: bedrockEvent.Usage.OutputTokens,
PromptTokens: bedrockEvent.Usage.InputTokens,
TotalTokens: bedrockEvent.Usage.TotalTokens,
}
}
openAIFormattedChunkBytes, _ := json.Marshal(openAIFormattedChunk)
var openAIChunk strings.Builder
openAIChunk.WriteString(ssePrefix)
openAIChunk.WriteString(string(openAIFormattedChunkBytes))
openAIChunk.WriteString("\n\n")
return []byte(openAIChunk.String()), nil
}
type ConverseStreamEvent struct {
ContentBlockIndex int `json:"contentBlockIndex,omitempty"`
Delta *converseStreamEventContentBlockDelta `json:"delta,omitempty"`
Role *string `json:"role,omitempty"`
StopReason *string `json:"stopReason,omitempty"`
Usage *tokenUsage `json:"usage,omitempty"`
Start *contentBlockStart `json:"start,omitempty"`
}
type converseStreamEventContentBlockDelta struct {
Text *string `json:"text,omitempty"`
ToolUse *toolUseBlockDelta `json:"toolUse,omitempty"`
}
type toolUseBlockStart struct {
Name string `json:"name"`
ToolUseID string `json:"toolUseId"`
}
type contentBlockStart struct {
ToolUse *toolUseBlockStart `json:"toolUse,omitempty"`
}
type toolUseBlockDelta struct {
Input string `json:"input"`
}
func extractAmazonEventStreamEvents(ctx wrapper.HttpContext, chunk []byte) []ConverseStreamEvent {
body := chunk
if bufferedStreamingBody, has := ctx.GetContext(ctxKeyStreamingBody).([]byte); has {
body = append(bufferedStreamingBody, chunk...)
}
r := bytes.NewReader(body)
var events []ConverseStreamEvent
var lastRead int64 = -1
messageBuffer := make([]byte, 1024)
defer func() {
log.Infof("extractAmazonEventStreamEvents: lastRead=%d, r.Size=%d", lastRead, r.Size())
ctx.SetContext(ctxKeyStreamingBody, nil)
}()
for {
msg, err := decodeMessage(r, messageBuffer)
if err != nil {
if err == io.EOF {
break
}
log.Errorf("failed to decode message: %v", err)
break
}
var event ConverseStreamEvent
if err = json.Unmarshal(msg.Payload, &event); err == nil {
events = append(events, event)
}
lastRead = r.Size() - int64(r.Len())
}
return events
}
type bedrockStreamMessage struct {
Headers headers
Payload []byte
}
type EventFrame struct {
TotalLength uint32
HeadersLength uint32
PreludeCRC uint32
Headers map[string]interface{}
Payload []byte
PayloadCRC uint32
}
type headers []header
type header struct {
Name string
Value Value
}
func (hs *headers) Set(name string, value Value) {
var i int
for ; i < len(*hs); i++ {
if (*hs)[i].Name == name {
(*hs)[i].Value = value
return
}
}
*hs = append(*hs, header{
Name: name, Value: value,
})
}
func decodeMessage(reader io.Reader, payloadBuf []byte) (m bedrockStreamMessage, err error) {
crc := crc32.New(crc32.MakeTable(crc32.IEEE))
hashReader := io.TeeReader(reader, crc)
prelude, err := decodePrelude(hashReader, crc)
if err != nil {
return bedrockStreamMessage{}, err
}
if prelude.HeadersLen > 0 {
lr := io.LimitReader(hashReader, int64(prelude.HeadersLen))
m.Headers, err = decodeHeaders(lr)
if err != nil {
return bedrockStreamMessage{}, err
}
}
if payloadLen := prelude.PayloadLen(); payloadLen > 0 {
buf, err := decodePayload(payloadBuf, io.LimitReader(hashReader, int64(payloadLen)))
if err != nil {
return bedrockStreamMessage{}, err
}
m.Payload = buf
}
msgCRC := crc.Sum32()
if err := validateCRC(reader, msgCRC); err != nil {
return bedrockStreamMessage{}, err
}
return m, nil
}
func decodeHeaders(r io.Reader) (headers, error) {
hs := headers{}
for {
name, err := decodeHeaderName(r)
if err != nil {
if err == io.EOF {
// EOF while getting header name means no more headers
break
}
return nil, err
}
value, err := decodeHeaderValue(r)
if err != nil {
return nil, err
}
hs.Set(name, value)
}
return hs, nil
}
func decodeHeaderValue(r io.Reader) (Value, error) {
var raw rawValue
typ, err := decodeUint8(r)
if err != nil {
return nil, err
}
raw.Type = valueType(typ)
var v Value
switch raw.Type {
case stringValueType:
var tv StringValue
err = tv.decode(r)
v = tv
default:
log.Errorf("unknown value type %d", raw.Type)
}
// Error could be EOF, let caller deal with it
return v, err
}
type Value interface {
Get() interface{}
}
type StringValue string
func (v StringValue) Get() interface{} {
return string(v)
}
func (v *StringValue) decode(r io.Reader) error {
s, err := decodeStringValue(r)
if err != nil {
return err
}
*v = StringValue(s)
return nil
}
func decodeBytesValue(r io.Reader) ([]byte, error) {
var raw rawValue
var err error
raw.Len, err = decodeUint16(r)
if err != nil {
return nil, err
}
buf := make([]byte, raw.Len)
_, err = io.ReadFull(r, buf)
if err != nil {
return nil, err
}
return buf, nil
}
func decodeUint16(r io.Reader) (uint16, error) {
var b [2]byte
bs := b[:]
_, err := io.ReadFull(r, bs)
if err != nil {
return 0, err
}
return binary.BigEndian.Uint16(bs), nil
}
func decodeStringValue(r io.Reader) (string, error) {
v, err := decodeBytesValue(r)
return string(v), err
}
type rawValue struct {
Type valueType
Len uint16 // Only set for variable length slices
Value []byte // byte representation of value, BigEndian encoding.
}
type valueType uint8
const (
trueValueType valueType = iota
falseValueType
int8ValueType // Byte
int16ValueType // Short
int32ValueType // Integer
int64ValueType // Long
bytesValueType
stringValueType
timestampValueType
uuidValueType
)
func decodeHeaderName(r io.Reader) (string, error) {
var n headerName
var err error
n.Len, err = decodeUint8(r)
if err != nil {
return "", err
}
name := n.Name[:n.Len]
if _, err := io.ReadFull(r, name); err != nil {
return "", err
}
return string(name), nil
}
func decodeUint8(r io.Reader) (uint8, error) {
type byteReader interface {
ReadByte() (byte, error)
}
if br, ok := r.(byteReader); ok {
v, err := br.ReadByte()
return v, err
}
var b [1]byte
_, err := io.ReadFull(r, b[:])
return b[0], err
}
const maxHeaderNameLen = 255
type headerName struct {
Len uint8
Name [maxHeaderNameLen]byte
}
func decodePayload(buf []byte, r io.Reader) ([]byte, error) {
w := bytes.NewBuffer(buf[0:0])
_, err := io.Copy(w, r)
return w.Bytes(), err
}
type messagePrelude struct {
Length uint32
HeadersLen uint32
PreludeCRC uint32
}
func (p messagePrelude) ValidateLens() error {
if p.Length < 16 {
return fmt.Errorf("message prelude length too small: want at least 16, have: %v", int(p.Length))
}
return nil
}
func (p messagePrelude) PayloadLen() uint32 {
return p.Length - p.HeadersLen - 16
}
func decodePrelude(r io.Reader, crc hash.Hash32) (messagePrelude, error) {
var p messagePrelude
var err error
p.Length, err = decodeUint32(r)
if err != nil {
return messagePrelude{}, err
}
p.HeadersLen, err = decodeUint32(r)
if err != nil {
return messagePrelude{}, err
}
if err := p.ValidateLens(); err != nil {
return messagePrelude{}, err
}
preludeCRC := crc.Sum32()
if err := validateCRC(r, preludeCRC); err != nil {
return messagePrelude{}, err
}
p.PreludeCRC = preludeCRC
return p, nil
}
func decodeUint32(r io.Reader) (uint32, error) {
var b [4]byte
bs := b[:]
_, err := io.ReadFull(r, bs)
if err != nil {
return 0, err
}
return binary.BigEndian.Uint32(bs), nil
}
func validateCRC(r io.Reader, expect uint32) error {
msgCRC, err := decodeUint32(r)
if err != nil {
return err
}
if msgCRC != expect {
return fmt.Errorf("message checksum mismatch")
}
return nil
}
func (b *bedrockProvider) TransformResponseHeaders(ctx wrapper.HttpContext, apiName ApiName, headers http.Header) {
ctx.SetContext("X-Amzn-Requestid", headers.Get("X-Amzn-Requestid"))
if headers.Get("Content-Type") == "application/vnd.amazon.eventstream" {
headers.Set("Content-Type", "text/event-stream; charset=utf-8")
}
headers.Del("Content-Length")
}
func (b *bedrockProvider) GetProviderType() string {
return providerTypeBedrock
}
func (b *bedrockProvider) OnRequestHeaders(ctx wrapper.HttpContext, apiName ApiName) error {
b.config.handleRequestHeaders(b, ctx, apiName)
return nil
}
func (b *bedrockProvider) TransformRequestHeaders(ctx wrapper.HttpContext, apiName ApiName, headers http.Header) {
util.OverwriteRequestHostHeader(headers, fmt.Sprintf(bedrockDefaultDomain, b.config.awsRegion))
}
func (b *bedrockProvider) OnRequestBody(ctx wrapper.HttpContext, apiName ApiName, body []byte) (types.Action, error) {
if !b.config.isSupportedAPI(apiName) {
return types.ActionContinue, errUnsupportedApiName
}
return b.config.handleRequestBody(b, b.contextCache, ctx, apiName, body)
}
func (b *bedrockProvider) insertHttpContextMessage(body []byte, content string, onlyOneSystemBeforeFile bool) ([]byte, error) {
request := &bedrockTextGenRequest{}
if err := json.Unmarshal(body, request); err != nil {
return nil, fmt.Errorf("unable to unmarshal request: %v", err)
}
if len(request.System) > 0 {
request.System = append(request.System, systemContentBlock{Text: content})
} else {
request.System = []systemContentBlock{{Text: content}}
}
requestBytes, err := json.Marshal(request)
if err != nil {
return nil, err
}
b.setAuthHeaders(requestBytes, nil)
return requestBytes, nil
}
func (b *bedrockProvider) TransformRequestBodyHeaders(ctx wrapper.HttpContext, apiName ApiName, body []byte, headers http.Header) ([]byte, error) {
switch apiName {
case ApiNameChatCompletion:
return b.onChatCompletionRequestBody(ctx, body, headers)
default:
return b.config.defaultTransformRequestBody(ctx, apiName, body)
}
}
func (b *bedrockProvider) TransformResponseBody(ctx wrapper.HttpContext, apiName ApiName, body []byte) ([]byte, error) {
if apiName == ApiNameChatCompletion {
return b.onChatCompletionResponseBody(ctx, body)
}
return nil, errUnsupportedApiName
}
func (b *bedrockProvider) onChatCompletionResponseBody(ctx wrapper.HttpContext, body []byte) ([]byte, error) {
bedrockResponse := &bedrockConverseResponse{}
if err := json.Unmarshal(body, bedrockResponse); err != nil {
log.Errorf("unable to unmarshal bedrock response: %v", err)
return nil, fmt.Errorf("unable to unmarshal bedrock response: %v", err)
}
response := b.buildChatCompletionResponse(ctx, bedrockResponse)
return json.Marshal(response)
}
func (b *bedrockProvider) onChatCompletionRequestBody(ctx wrapper.HttpContext, body []byte, headers http.Header) ([]byte, error) {
request := &chatCompletionRequest{}
err := b.config.parseRequestAndMapModel(ctx, request, body)
if err != nil {
return nil, err
}
streaming := request.Stream
headers.Set("Accept", "*/*")
if streaming {
util.OverwriteRequestPathHeader(headers, fmt.Sprintf(bedrockStreamChatCompletionPath, request.Model))
} else {
util.OverwriteRequestPathHeader(headers, fmt.Sprintf(bedrockChatCompletionPath, request.Model))
}
return b.buildBedrockTextGenerationRequest(request, headers)
}
func (b *bedrockProvider) buildBedrockTextGenerationRequest(origRequest *chatCompletionRequest, headers http.Header) ([]byte, error) {
messages := make([]bedrockMessage, 0, len(origRequest.Messages))
for i := range origRequest.Messages {
messages = append(messages, chatMessage2BedrockMessage(origRequest.Messages[i]))
}
request := &bedrockTextGenRequest{
Messages: messages,
InferenceConfig: bedrockInferenceConfig{
MaxTokens: origRequest.MaxTokens,
Temperature: origRequest.Temperature,
TopP: origRequest.TopP,
},
AdditionalModelRequestFields: map[string]interface{}{},
PerformanceConfig: PerformanceConfiguration{
Latency: "standard",
},
}
requestBytes, err := json.Marshal(request)
if err != nil {
return nil, err
}
b.setAuthHeaders(requestBytes, headers)
return requestBytes, nil
}
func (b *bedrockProvider) buildChatCompletionResponse(ctx wrapper.HttpContext, bedrockResponse *bedrockConverseResponse) *chatCompletionResponse {
var outputContent string
if len(bedrockResponse.Output.Message.Content) > 0 {
outputContent = bedrockResponse.Output.Message.Content[0].Text
}
choices := []chatCompletionChoice{
{
Index: 0,
Message: &chatMessage{
Role: bedrockResponse.Output.Message.Role,
Content: outputContent,
},
FinishReason: stopReasonBedrock2OpenAI(bedrockResponse.StopReason),
},
}
requestId := ctx.GetStringContext("X-Amzn-Requestid", "")
return &chatCompletionResponse{
Id: requestId,
Created: time.Now().Unix(),
Model: ctx.GetStringContext(ctxKeyFinalRequestModel, ""),
SystemFingerprint: "",
Object: objectChatCompletion,
Choices: choices,
Usage: usage{
PromptTokens: bedrockResponse.Usage.InputTokens,
CompletionTokens: bedrockResponse.Usage.OutputTokens,
TotalTokens: bedrockResponse.Usage.TotalTokens,
},
}
}
func stopReasonBedrock2OpenAI(reason string) string {
switch reason {
case "end_turn":
return finishReasonStop
case "stop_sequence":
return finishReasonStop
case "max_tokens":
return finishReasonLength
default:
return reason
}
}
type bedrockTextGenRequest struct {
Messages []bedrockMessage `json:"messages"`
System []systemContentBlock `json:"system,omitempty"`
InferenceConfig bedrockInferenceConfig `json:"inferenceConfig,omitempty"`
AdditionalModelRequestFields map[string]interface{} `json:"additionalModelRequestFields,omitempty"`
PerformanceConfig PerformanceConfiguration `json:"performanceConfig,omitempty"`
}
type PerformanceConfiguration struct {
Latency string `json:"latency,omitempty"`
}
type bedrockMessage struct {
Role string `json:"role"`
Content []bedrockMessageContent `json:"content"`
}
type bedrockMessageContent struct {
Text string `json:"text,omitempty"`
Image *imageBlock `json:"image,omitempty"`
}
type systemContentBlock struct {
Text string `json:"text,omitempty"`
}
type imageBlock struct {
Format string `json:"format,omitempty"`
Source imageSource `json:"source,omitempty"`
}
type imageSource struct {
Bytes string `json:"bytes,omitempty"`
}
type bedrockInferenceConfig struct {
StopSequences []string `json:"stopSequences,omitempty"`
MaxTokens int `json:"maxTokens,omitempty"`
Temperature float64 `json:"temperature,omitempty"`
TopP float64 `json:"topP,omitempty"`
}
type bedrockConverseResponse struct {
Metrics converseMetrics `json:"metrics"`
Output converseOutputMemberMessage `json:"output"`
StopReason string `json:"stopReason"`
Usage tokenUsage `json:"usage"`
}
type converseMetrics struct {
LatencyMs int `json:"latencyMs"`
}
type converseOutputMemberMessage struct {
Message message `json:"message"`
}
type message struct {
Content []contentBlockMemberText `json:"content"`
Role string `json:"role"`
}
type contentBlockMemberText struct {
Text string `json:"text"`
}
type tokenUsage struct {
InputTokens int `json:"inputTokens,omitempty"`
OutputTokens int `json:"outputTokens,omitempty"`
TotalTokens int `json:"totalTokens"`
}
func chatMessage2BedrockMessage(chatMessage chatMessage) bedrockMessage {
if chatMessage.IsStringContent() {
return bedrockMessage{
Role: chatMessage.Role,
Content: []bedrockMessageContent{{Text: chatMessage.StringContent()}},
}
} else {
var contents []bedrockMessageContent
openaiContent := chatMessage.ParseContent()
for _, part := range openaiContent {
var content bedrockMessageContent
if part.Type == contentTypeText {
content.Text = part.Text
} else {
log.Warnf("unsupported content type: %s", part.Type)
continue
}
contents = append(contents, content)
}
return bedrockMessage{
Role: chatMessage.Role,
Content: contents,
}
}
}
func (b *bedrockProvider) setAuthHeaders(body []byte, headers http.Header) {
t := time.Now().UTC()
amzDate := t.Format("20060102T150405Z")
dateStamp := t.Format("20060102")
path, _ := proxywasm.GetHttpRequestHeader(":path")
if headers != nil {
path = headers.Get(":path")
}
signature := b.generateSignature(path, amzDate, dateStamp, body)
if headers != nil {
headers.Set("X-Amz-Date", amzDate)
headers.Set("Authorization", fmt.Sprintf("AWS4-HMAC-SHA256 Credential=%s/%s/%s/%s/aws4_request, SignedHeaders=%s, Signature=%s", b.config.awsAccessKey, dateStamp, b.config.awsRegion, awsService, bedrockSignedHeaders, signature))
} else {
_ = proxywasm.ReplaceHttpRequestHeader("X-Amz-Date", amzDate)
_ = proxywasm.ReplaceHttpRequestHeader("Authorization", fmt.Sprintf("AWS4-HMAC-SHA256 Credential=%s/%s/%s/%s/aws4_request, SignedHeaders=%s, Signature=%s", b.config.awsAccessKey, dateStamp, b.config.awsRegion, awsService, bedrockSignedHeaders, signature))
}
}
func (b *bedrockProvider) generateSignature(path, amzDate, dateStamp string, body []byte) string {
hashedPayload := sha256Hex(body)
path = urlEncoding(path)
endpoint := fmt.Sprintf(bedrockDefaultDomain, b.config.awsRegion)
canonicalHeaders := fmt.Sprintf("host:%s\nx-amz-date:%s\n", endpoint, amzDate)
canonicalRequest := fmt.Sprintf("%s\n%s\n\n%s\n%s\n%s",
httpPostMethod, path, canonicalHeaders, bedrockSignedHeaders, hashedPayload)
credentialScope := fmt.Sprintf("%s/%s/%s/aws4_request", dateStamp, b.config.awsRegion, awsService)
hashedCanonReq := sha256Hex([]byte(canonicalRequest))
stringToSign := fmt.Sprintf("AWS4-HMAC-SHA256\n%s\n%s\n%s",
amzDate, credentialScope, hashedCanonReq)
signingKey := getSignatureKey(b.config.awsSecretKey, dateStamp, b.config.awsRegion, awsService)
signature := hmacHex(signingKey, stringToSign)
return signature
}
func urlEncoding(rawStr string) string {
encodedStr := strings.ReplaceAll(rawStr, ":", "%3A")
encodedStr = strings.ReplaceAll(encodedStr, "+", "%2B")
encodedStr = strings.ReplaceAll(encodedStr, "=", "%3D")
encodedStr = strings.ReplaceAll(encodedStr, "&", "%26")
encodedStr = strings.ReplaceAll(encodedStr, "$", "%24")
encodedStr = strings.ReplaceAll(encodedStr, "@", "%40")
return encodedStr
}
func getSignatureKey(key, dateStamp, region, service string) []byte {
kDate := hmacSha256([]byte("AWS4"+key), dateStamp)
kRegion := hmacSha256(kDate, region)
kService := hmacSha256(kRegion, service)
kSigning := hmacSha256(kService, "aws4_request")
return kSigning
}
func hmacSha256(key []byte, data string) []byte {
h := hmac.New(sha256.New, key)
h.Write([]byte(data))
return h.Sum(nil)
}
func sha256Hex(data []byte) string {
h := sha256.New()
h.Write(data)
return hex.EncodeToString(h.Sum(nil))
}
func hmacHex(key []byte, data string) string {
h := hmac.New(sha256.New, key)
h.Write([]byte(data))
return hex.EncodeToString(h.Sum(nil))
}


@@ -68,6 +68,7 @@ const (
providerTypeCoze = "coze"
providerTypeTogetherAI = "together-ai"
providerTypeDify = "dify"
providerTypeBedrock = "bedrock"
protocolOpenAI = "openai"
protocolOriginal = "original"
@@ -138,6 +139,7 @@ var (
providerTypeCoze: &cozeProviderInitializer{},
providerTypeTogetherAI: &togetherAIProviderInitializer{},
providerTypeDify: &difyProviderInitializer{},
providerTypeBedrock: &bedrockProviderInitializer{},
}
)
@@ -242,6 +244,15 @@ type ProviderConfig struct {
// @Title zh-CN hunyuan api id for authorization
// @Description zh-CN 仅适用于Hun Yuan AI服务鉴权
hunyuanAuthId string `required:"false" yaml:"hunyuanAuthId" json:"hunyuanAuthId"`
// @Title zh-CN Amazon Bedrock AccessKey for authorization
// @Description zh-CN 仅适用于Amazon Bedrock服务鉴权API key/id 参考https://docs.aws.amazon.com/zh_cn/IAM/latest/UserGuide/reference_sigv.html
awsAccessKey string `required:"false" yaml:"awsAccessKey" json:"awsAccessKey"`
// @Title zh-CN Amazon Bedrock SecretKey for authorization
// @Description zh-CN 仅适用于Amazon Bedrock服务鉴权
awsSecretKey string `required:"false" yaml:"awsSecretKey" json:"awsSecretKey"`
// @Title zh-CN Amazon Bedrock Region
// @Description zh-CN 仅适用于Amazon Bedrock服务访问
awsRegion string `required:"false" yaml:"awsRegion" json:"awsRegion"`
// @Title zh-CN minimax API type
// @Description zh-CN 仅适用于 minimax 服务。minimax API 类型v2 和 pro 中选填一项,默认值为 v2
minimaxApiType string `required:"false" yaml:"minimaxApiType" json:"minimaxApiType"`
@@ -346,6 +357,9 @@ func (c *ProviderConfig) FromJson(json gjson.Result) {
c.claudeVersion = json.Get("claudeVersion").String()
c.hunyuanAuthId = json.Get("hunyuanAuthId").String()
c.hunyuanAuthKey = json.Get("hunyuanAuthKey").String()
c.awsAccessKey = json.Get("awsAccessKey").String()
c.awsSecretKey = json.Get("awsSecretKey").String()
c.awsRegion = json.Get("awsRegion").String()
c.minimaxApiType = json.Get("minimaxApiType").String()
c.minimaxGroupId = json.Get("minimaxGroupId").String()
c.cloudflareAccountId = json.Get("cloudflareAccountId").String()


@@ -75,18 +75,22 @@ description: higress 支持通过集成搜索引擎Google/Bing/Arxiv/Elastics
## Elasticsearch 特定配置
| 名称 | 数据类型 | 填写要求 | 默认值 | 描述 |
|------|----------|----------|--------|-----------------------|
| index | string | 必填 | - | 要搜索的Elasticsearch索引名称 |
| contentField | string | 必填 | - | 要查询的内容字段名称 |
| semanticTextField | string | 必填 | - | 要查询的 embedding 字段名称 |
| linkField | string | 必填 | - | 结果链接字段名称 |
| titleField | string | 必填 | - | 结果标题字段名称 |
| username | string | 选填 | - | Elasticsearch 用户名 |
| password | string | 选填 | - | Elasticsearch 密码 |
| 名称 | 数据类型 | 填写要求 | 默认值 | 描述 |
|------|----------|------|--------|------------------------------------|
| index | string | 必填 | - | 要搜索的 Elasticsearch 索引名称 |
| contentField | string | 必填 | - | 要查询的内容字段名称 |
| semanticTextField | string | 必填 | - | 要查询的 embedding 字段名称 |
| linkField | string | 选填 | - | 结果链接字段名称,当配置 `needReference` 时需要填写 |
| titleField | string | 选填 | - | 结果标题字段名称,当配置 `needReference` 时需要填写 |
| username | string | 选填 | - | Elasticsearch 用户名 |
| password | string | 选填 | - | Elasticsearch 密码 |
混合搜索中使用的 [Reciprocal Rank Fusion (RRF)](https://www.elastic.co/guide/en/elasticsearch/reference/8.17/rrf.html) 查询要求 Elasticsearch 的版本在 8.8 及以上。
目前文档向量化依赖于 Elasticsearch 的 Embedding 模型,该功能需要 Elasticsearch 企业版 License或可使用 30 天的 Trial License。安装 Elasticsearch 内置 Embedding 模型的步骤可参考[该文档](https://www.elastic.co/docs/explore-analyze/machine-learning/nlp/ml-nlp-elser#alternative-download-deploy);若需安装第三方 Embedding 模型,可参考[该文档](https://www.elastic.co/docs/explore-analyze/machine-learning/nlp/ml-nlp-text-emb-vector-search-example)。
有关 ai-search 插件集成 Elasticsearch 的完整教程,请参考:[使用 LangChain + Higress + Elasticsearch 构建 RAG 应用](https://cr7258.github.io/blogs/original/2025/15-rag-higress-es-langchain)。
## Quark 特定配置
| 名称 | 数据类型 | 填写要求 | 默认值 | 描述 |
@@ -204,13 +208,9 @@ searchFrom:
searchFrom:
- type: elasticsearch
serviceName: "es-svc.static"
# 固定地址服务的端口默认是80
servicePort: 80
index: "knowledge_base"
contentField: "content"
semanticTextField: "semantic_text"
linkField: "url"
titleField: "title"
# username: "elastic"
# password: "password"
```


@@ -80,13 +80,17 @@ It is strongly recommended to enable this feature when using Arxiv or Elasticsea
| index | string | Required | - | Elasticsearch index name to search |
| contentField | string | Required | - | Content field name to query |
| semanticTextField | string | Required | - | Embedding field name to query |
| linkField | string | Required | - | Result link field name |
| titleField | string | Required | - | Result title field name |
| linkField | string | Optional | - | Result link field name, needed when `needReference` is configured |
| titleField | string | Optional | - | Result title field name, needed when `needReference` is configured |
| username | string | Optional | - | Elasticsearch username |
| password | string | Optional | - | Elasticsearch password |
The [Reciprocal Rank Fusion (RRF)](https://www.elastic.co/guide/en/elasticsearch/reference/8.17/rrf.html) query used in hybrid search requires Elasticsearch version 8.8 or higher.
Currently, document vectorization relies on Elasticsearch's embedding model, which requires an Elasticsearch Enterprise license or a 30-day Trial license. To install the built-in embedding model in Elasticsearch, please refer to [this documentation](https://www.elastic.co/docs/explore-analyze/machine-learning/nlp/ml-nlp-elser#alternative-download-deploy). If you want to install a third-party embedding model, please refer to [this guide](https://www.elastic.co/docs/explore-analyze/machine-learning/nlp/ml-nlp-text-emb-vector-search-example).
For a complete tutorial on integrating the ai-search plugin with Elasticsearch, please refer to: [Building a RAG Application with LangChain + Higress + Elasticsearch](https://cr7258.github.io/blogs/original/2025/15-rag-higress-es-langchain).
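For reference, a hybrid-search request body combining a standard `match` query with a `semantic` query under an RRF retriever looks roughly like the following sketch (field names `content` and `semantic_text` stand in for your configured `contentField` and `semanticTextField`):

```json
{
  "_source": { "excludes": "semantic_text" },
  "retriever": {
    "rrf": {
      "retrievers": [
        { "standard": { "query": { "match": { "content": "user query" } } } },
        { "standard": { "query": { "semantic": { "field": "semantic_text", "query": "user query" } } } }
      ]
    }
  }
}
```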
## Quark Specific Configuration
| Name | Data Type | Requirement | Default Value | Description |
@@ -203,13 +207,9 @@ Note that excessive concurrency may lead to rate limiting, adjust according to a
searchFrom:
- type: elasticsearch
serviceName: "es-svc.static"
# static ip service use 80 as default port
servicePort: 80
index: "knowledge_base"
contentField: "content"
semanticTextField: "semantic_text"
linkField: "url"
titleField: "title"
# username: "elastic"
# password: "password"
```


@@ -27,7 +27,7 @@ type ElasticsearchSearch struct {
password string
}
func NewElasticsearchSearch(config *gjson.Result) (*ElasticsearchSearch, error) {
func NewElasticsearchSearch(config *gjson.Result, needReference bool) (*ElasticsearchSearch, error) {
engine := &ElasticsearchSearch{}
serviceName := config.Get("serviceName").String()
if serviceName == "" {
@@ -35,7 +35,13 @@ func NewElasticsearchSearch(config *gjson.Result) (*ElasticsearchSearch, error)
}
servicePort := config.Get("servicePort").Int()
if servicePort == 0 {
return nil, errors.New("servicePort not found")
if strings.HasSuffix(serviceName, ".static") {
servicePort = 80
} else if strings.HasSuffix(serviceName, ".dns") {
servicePort = 443
} else {
return nil, errors.New("servicePort not found")
}
}
engine.client = wrapper.NewClusterClient(wrapper.FQDNCluster{
FQDN: serviceName,
@@ -54,14 +60,18 @@ func NewElasticsearchSearch(config *gjson.Result) (*ElasticsearchSearch, error)
if engine.semanticTextField == "" {
return nil, errors.New("semanticTextField not found")
}
engine.linkField = config.Get("linkField").String()
if engine.linkField == "" {
return nil, errors.New("linkField not found")
}
engine.titleField = config.Get("titleField").String()
if engine.titleField == "" {
return nil, errors.New("titleField not found")
if needReference {
engine.linkField = config.Get("linkField").String()
if engine.linkField == "" {
return nil, errors.New("linkField not found")
}
engine.titleField = config.Get("titleField").String()
if engine.titleField == "" {
return nil, errors.New("titleField not found")
}
}
engine.timeoutMillisecond = uint32(config.Get("timeoutMillisecond").Uint())
if engine.timeoutMillisecond == 0 {
engine.timeoutMillisecond = 5000
@@ -93,6 +103,9 @@ func (e ElasticsearchSearch) generateAuthorizationHeader() string {
func (e ElasticsearchSearch) generateQueryBody(ctx engine.SearchContext) string {
queryText := strings.Join(ctx.Querys, " ")
return fmt.Sprintf(`{
"_source":{
"excludes": "%s"
},
"retriever": {
"rrf": {
"retrievers": [
@@ -118,7 +131,7 @@ func (e ElasticsearchSearch) generateQueryBody(ctx engine.SearchContext) string
]
}
}
}`, e.contentField, queryText, e.semanticTextField, queryText)
}`, e.semanticTextField, e.contentField, queryText, e.semanticTextField, queryText)
}
func (e ElasticsearchSearch) CallArgs(ctx engine.SearchContext) engine.CallArgs {
@@ -145,9 +158,7 @@ func (e ElasticsearchSearch) ParseResult(ctx engine.SearchContext, response []by
Link: source.Get(e.linkField).String(),
Content: source.Get(e.contentField).String(),
}
if result.Valid() {
results = append(results, result)
}
results = append(results, result)
}
return results
}


@@ -185,7 +185,7 @@ func parseConfig(json gjson.Result, config *Config, log wrapper.Log) error {
arxivExists = true
onlyQuark = false
case "elasticsearch":
searchEngine, err := elasticsearch.NewElasticsearchSearch(&e)
searchEngine, err := elasticsearch.NewElasticsearchSearch(&e, config.needReference)
if err != nil {
return fmt.Errorf("elasticsearch search engine init failed:%s", err)
}


@@ -6,7 +6,7 @@ replace quark-search => ../quark-search
replace amap-tools => ../amap-tools
replace github.com/alibaba/higress/plugins/wasm-go => /Users/zhangty/tmp/higress/plugins/wasm-go
replace github.com/alibaba/higress/plugins/wasm-go => ../..
require (
amap-tools v0.0.0-00010101000000-000000000000


@@ -28,7 +28,7 @@ On the user's MCP Client interface, add the generated SSE URL to the MCP Server
```json
"mcpServers": {
"bravesearch": {
"url": "http://mcp.higress.ai/mcp-brave-search/{generate_key}",
"url": "http://mcp.higress.ai/mcp-bravesearch/{generate_key}",
}
}
```


@@ -30,7 +30,7 @@
```json
"mcpServers": {
"bravesearch": {
"url": "http://mcp.higress.ai/mcp-brave-search/{generate_key}",
"url": "http://mcp.higress.ai/mcp-bravesearch/{generate_key}",
}
}
```


@@ -4,9 +4,9 @@ server:
apiKey: ""
tools:
- name: brave_web_search
description: "使用Brave Search API进行网页搜索适用于一般查询、新闻、文章和在线内容。支持分页、内容过滤和新鲜度控制。每次请求最多返回20条结果。"
description: "使用Brave Search API进行网页搜索适用于一般查询、新闻、文章和在线内容。支持分页、内容过滤和新鲜度控制。"
args:
- name: query
- name: q
description: "搜索查询最多400字符50个词"
type: string
required: true
@@ -20,6 +20,11 @@ tools:
type: integer
required: false
default: 0
- name: search_lang
description: "搜索语言"
type: string
required: false
enum: ["en", "zh-hans"]
requestTemplate:
url: "https://api.search.brave.com/res/v1/web/search"
method: GET
@@ -27,8 +32,6 @@ tools:
headers:
- key: Accept
value: "application/json"
- key: Accept-Encoding
value: "gzip"
- key: X-Subscription-Token
value: "{{.config.apiKey}}"
responseTemplate:
@@ -39,38 +42,80 @@ tools:
- **描述**: {{ $item.description }}
- **URL**: {{ $item.url }}
{{- end }}
{{- if .locations.results }}
{{- range $index, $item := .locations.results }}
## 结果 {{add $index 1}}
- **locationID**: {{ $item.id }}
{{- end }}
{{- end }}
- name: brave_local_search
description: "使用BraveLocal Search API搜索本地商家和地点。适用于与物理位置、商家、餐厅、服务等相关的查询。返回详细信息包括商家名称、地址、评分、评论数、电话号码、营业时间等。如果没有本地结果,会自动回退到网页搜索。"
- name: brave_local_search_pois
description: "使用Brave Local Search API搜索本地POI兴趣点信息包括名称、地址、电话、评分等信息。"
args:
- name: query
description: "本地搜索查询(例如'Central Park附近的披萨'"
type: string
- name: ids
description: "Location ID列表通过brave_web_search获取"
type: array
required: true
- name: count
description: "结果数量1-20默认5"
type: integer
- name: search_lang
description: "搜索语言"
type: string
required: false
default: 5
default: "en"
- name: ui_lang
description: "响应语言"
type: string
required: false
default: "en-US"
requestTemplate:
url: "https://api.search.brave.com/res/v1/web/search"
url: "https://api.search.brave.com/res/v1/local/pois"
method: GET
argsToUrlParam: true
headers:
- key: Accept
value: "application/json"
- key: Accept-Encoding
value: "gzip"
- key: X-Subscription-Token
value: "{{.config.apiKey}}"
responseTemplate:
body: |
{{- range $index, $item := .results }}
## Result {{add $index 1}}
## POI {{add $index 1}}
- **Name**: {{ $item.name }}
- **Address**: {{ $item.address.streetAddress }}, {{ $item.address.addressLocality }}, {{ $item.address.addressRegion }} {{ $item.address.postalCode }}
- **Phone**: {{ $item.phone }}
- **Rating**: {{ $item.rating.ratingValue }} ({{ $item.rating.ratingCount }} reviews)
- **Price range**: {{ $item.priceRange }}
- **Opening hours**: {{ join $item.openingHours ", " }}
{{- end }}
- name: brave_local_search_descriptions
description: "使用Brave Local Search API获取本地POI的描述信息。"
args:
- name: ids
description: "Location ID列表通过brave_web_search获取"
type: array
required: true
- name: search_lang
description: "搜索语言"
type: string
required: false
default: "en"
- name: ui_lang
description: "Response language"
type: string
required: false
default: "en-US"
requestTemplate:
url: "https://api.search.brave.com/res/v1/local/descriptions"
method: GET
argsToUrlParam: true
headers:
- key: Accept
value: "application/json"
- key: X-Subscription-Token
value: "{{.config.apiKey}}"
responseTemplate:
body: |
{{- range $id, $desc := .descriptions }}
## Description {{ $id }}
{{ $desc }}
{{- end }}
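
Tying the tools above together: per the argument descriptions, a client first calls brave_web_search to obtain location IDs, then passes them to brave_local_search_pois. A sketch of the second call, framed as an MCP `tools/call` request (the ID and language values are placeholders):

```json
{
  "jsonrpc": "2.0",
  "id": 2,
  "method": "tools/call",
  "params": {
    "name": "brave_local_search_pois",
    "arguments": {
      "ids": ["location-id-from-brave_web_search"],
      "search_lang": "en"
    }
  }
}
```

With `argsToUrlParam: true`, these arguments should end up as query parameters on `https://api.search.brave.com/res/v1/local/pois`, with the `X-Subscription-Token` header attached from the configured apiKey.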

View File

@@ -0,0 +1,58 @@
# LibreChat MCP Server
An implementation of the LibreChat Code Interpreter MCP server that follows the OpenAPI specification, providing code execution and file management capabilities.
## Features
- Supports code execution in multiple programming languages
- Supports file upload, download and deletion
- Provides detailed API response information
## Usage Guide
### Get API-KEY
1. Register for a LibreChat account ([visit the official website](https://code.librechat.ai))
2. Choose a plan, then generate an API Key in the developer console
### Generate SSE URL
On the MCP Server interface, log in and enter the API-KEY to generate the URL.
### Configure MCP Client
On the user's MCP Client interface, add the generated SSE URL to the MCP Server list.
```json
"mcpServers": {
"librechat": {
"url": "http://mcp.higress.ai/mcp-librechat/{generate_key}",
}
}
```
### Available Tools
#### delete_file
Delete the specified file
Parameters:
- fileId: File ID (required)
- session_id: Session ID (required)
#### executeCode
Execute code in the specified programming language
Parameters:
- code: Source code to execute (required)
- lang: Programming language (required, options: c, cpp, d, f90, go, java, js, php, py, rs, ts, r)
- args: Command line arguments (optional)
- entity_id: Assistant/agent identifier (optional)
- files: Array of file references (optional)
- user_id: User identifier (optional)
#### get_file
Get file information
Parameters:
- session_id: Session ID (required)
- detail: Detail information (optional)
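
As an illustration, an MCP client invokes the tools above with a standard `tools/call` request; the argument names come from the parameter lists, and the values here are placeholders:

```json
{
  "jsonrpc": "2.0",
  "id": 1,
  "method": "tools/call",
  "params": {
    "name": "executeCode",
    "arguments": {
      "code": "print('hello from the interpreter')",
      "lang": "py"
    }
  }
}
```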

View File

@@ -0,0 +1,58 @@
# LibreChat MCP Server
An implementation of the LibreChat Code Interpreter MCP server based on the OpenAPI specification, providing code execution and file management capabilities.
## Features
- Supports code execution in multiple programming languages
- Supports file upload, download, and deletion
- Provides detailed API response information
## Usage Guide
### Get API-KEY
1. Register for a LibreChat account ([visit the official website](https://code.librechat.ai))
2. Choose a paid plan in the console and create an API Key
### Generate SSE URL
On the MCP Server page, log in and enter the API-KEY to generate the URL.
### Configure MCP Client
In your MCP Client, add the generated SSE URL to the MCP Server list.
```json
"mcpServers": {
"librechat": {
"url": "http://mcp.higress.ai/mcp-librechat/{generate_key}"
}
}
```
### Available Tools
#### delete_file
Delete the specified file
Parameters:
- fileId: File ID (required)
- session_id: Session ID (required)
#### executeCode
Execute code in the specified programming language
Parameters:
- code: Source code to execute (required)
- lang: Programming language (required; options: c, cpp, d, f90, go, java, js, php, py, rs, ts, r)
- args: Command-line arguments (optional)
- entity_id: Assistant/agent identifier (optional)
- files: Array of file references (optional)
- user_id: User identifier (optional)
#### get_file
Get file information
Parameters:
- session_id: Session ID (required)
- detail: Detail information (optional)

View File

@@ -0,0 +1,145 @@
server:
name: librechat-api-server
config:
apiKey: ""
tools:
- name: delete_file
description: Delete a file
args:
- name: fileId
description: ""
type: string
required: true
position: path
- name: session_id
description: ""
type: string
required: true
position: path
requestTemplate:
url: https://api.librechat.ai/v1/files/{session_id}/{fileId}
method: DELETE
headers:
- key: x-api-key
value: "{{ .config.apiKey }}"
responseTemplate: {}
- name: executeCode
description: Execute code - Execute code with the specified language and parameters
args:
- name: args
description: Optional command line arguments to pass to the program
type: string
position: body
- name: code
description: The source code to be executed
type: string
required: true
position: body
- name: entity_id
description: Optional assistant/agent identifier for file sharing and reference. Must be a valid nanoid-compatible string.
type: string
position: body
- name: files
description: Array of file references to be used during execution
type: array
items:
type: object
position: body
- name: lang
description: The programming language of the code
type: string
required: true
enum: ["c","cpp","d","f90","go","java","js","php","py","rs","ts","r"]
position: body
- name: user_id
description: Optional user identifier
type: string
position: body
requestTemplate:
url: https://api.librechat.ai/v1/exec
method: POST
headers:
- key: Content-Type
value: application/json
- key: x-api-key
value: "{{ .config.apiKey }}"
responseTemplate:
prependBody: |+
# API Response Information
Below is the response from an API call. To help you understand the data, I've provided:
1. A detailed description of all fields in the response structure
2. The complete API response
## Response Structure
> Content-Type: application/json
- **files**: (Type: array)
- **files[].id**: (Type: string)
- **files[].name**: (Type: string)
- **files[].path**: (Type: string)
- **language**: (Type: string)
- **run**: (Type: object)
- **run.code**: (Type: integer)
- **run.cpu_time**: (Type: number)
- **run.memory**: (Type: integer)
- **run.message**: (Type: string)
- **run.output**: (Type: string)
- **run.signal**: (Type: string)
- **run.status**: (Type: string)
- **run.stderr**: (Type: string)
- **run.stdout**: (Type: string)
- **run.wall_time**: (Type: number)
- **session_id**: (Type: string)
- **version**: (Type: string)
## Original Response
- name: get_file
description: Get file information
args:
- name: detail
description: ""
type: string
position: query
- name: session_id
description: ""
type: string
required: true
position: path
requestTemplate:
url: https://api.librechat.ai/v1/files/{session_id}
method: GET
headers:
- key: x-api-key
value: "{{ .config.apiKey }}"
responseTemplate:
prependBody: |+
# API Response Information
Below is the response from an API call. To help you understand the data, I've provided:
1. A detailed description of all fields in the response structure
2. The complete API response
## Response Structure
> Content-Type: application/json
- **items**: Array of items (Type: array)
- **items.content**: (Type: string)
- **items.contentType**: (Type: string)
- **items.etag**: (Type: string)
- **items.id**: (Type: string)
- **items.lastModified**: (Type: string)
- **items.metadata**: (Type: object)
- **items.metadata.content-type**: (Type: string)
- **items.metadata.original-filename**: (Type: string)
- **items.name**: (Type: string)
- **items.session_id**: (Type: string)
- **items.size**: (Type: number)
## Original Response
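
To illustrate the path templating in the requestTemplate URLs above: assuming `{session_id}` and `{fileId}` are substituted from the call arguments (the IDs below are placeholders), a `delete_file` invocation such as

```json
{
  "jsonrpc": "2.0",
  "id": 3,
  "method": "tools/call",
  "params": {
    "name": "delete_file",
    "arguments": {
      "session_id": "sess-123",
      "fileId": "file-456"
    }
  }
}
```

would be forwarded as `DELETE https://api.librechat.ai/v1/files/sess-123/file-456` with the configured `x-api-key` header.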

View File

@@ -0,0 +1,32 @@
# WolframAlpha MCP Server
An implementation of the Model Context Protocol (MCP) server that integrates [WolframAlpha](https://www.wolframalpha.com/), providing natural language computation and knowledge query capabilities.
## Features
- Supports natural language queries in mathematics, physics, chemistry, geography, history, art, astronomy, and more
- Performs mathematical calculations, date and unit conversions, formula solving, etc.
- Supports image result display
- Automatically converts complex queries into simplified keyword queries
- Supports multilingual queries (automatically translates to English for processing, returns results in original language)
## Usage Guide
### Get AppID
1. Register for a WolframAlpha developer account [Create a Wolfram ID](https://account.wolfram.com/login/create)
2. Generate LLM-API AppID [Get An App ID](https://developer.wolframalpha.com/access)
### Generate SSE URL
On the MCP Server interface, log in and enter the AppID to generate the URL.
### Configure MCP Client
On the user's MCP Client interface, add the generated SSE URL to the MCP Server list.
```json
"mcpServers": {
"wolframalpha": {
"url": "http://mcp.higress.ai/mcp-wolframalpha/{generate_key}",
}
}

View File

@@ -0,0 +1,32 @@
# WolframAlpha MCP Server
An implementation of the Model Context Protocol (MCP) server that integrates [WolframAlpha](https://www.wolframalpha.com/), providing natural language computation and knowledge query capabilities.
## Features
- Supports natural language queries in mathematics, physics, chemistry, geography, history, art, astronomy, and more
- Performs mathematical calculations, date and unit conversions, formula solving, etc.
- Supports image result display
- Automatically converts complex queries into simplified keyword queries
- Supports multilingual queries (automatically translates to English for processing, returns results in the original language)
## Usage Guide
### Get AppID
1. Register for a WolframAlpha developer account ([Create a Wolfram ID](https://account.wolfram.com/login/create))
2. Generate an LLM-API App ID ([Get An App ID](https://developer.wolframalpha.com/access))
### Generate SSE URL
On the MCP Server page, log in and enter the AppID to generate the URL.
### Configure MCP Client
In your MCP Client, add the generated SSE URL to the MCP Server list.
```json
"mcpServers": {
"wolframalpha": {
"url": "http://mcp.higress.ai/mcp-wolframalpha/{generate_key}"
}
}
```

View File

@@ -0,0 +1,91 @@
server:
name: wolframalpha-api-server
config:
appid: ""
tools:
- name: get_llm-api
description: |+
Submit a query to WolframAlpha LLM API - Submit a natural language query with an AppID and input to WolframAlpha.
- WolframAlpha understands natural language queries about entities in chemistry, physics, geography, history, art, astronomy, and more.
- WolframAlpha performs mathematical calculations, date and unit conversions, formula solving, etc.
- Convert inputs to simplified keyword queries whenever possible (e.g. convert "how many people live in France" to "France population").
- Send queries in English only; translate non-English queries before sending, then respond in the original language.
- Display image URLs with Markdown syntax: ![URL]
- ALWAYS use this exponent notation: `6*10^14`, NEVER `6e14`.
- ALWAYS use {"input": query} structure for queries to Wolfram endpoints; `query` must ONLY be a single-line string.
- ALWAYS use proper Markdown formatting for all math, scientific, and chemical formulas, symbols, etc.: '$$ [expression] $$' for standalone cases and '\( [expression] \)' when inline.
- Never mention your knowledge cutoff date; Wolfram may return more recent data.
- Use ONLY single-letter variable names, with or without integer subscript (e.g., n, n1, n_1).
- Use named physical constants (e.g., 'speed of light') without numerical substitution.
- Include a space between compound units (e.g., "Ω m" for "ohm*meter").
- To solve for a variable in an equation with units, consider solving a corresponding equation without units; exclude counting units (e.g., books), include genuine units (e.g., kg).
- If data for multiple properties is needed, make separate calls for each property.
- If a WolframAlpha result is not relevant to the query:
-- If Wolfram provides multiple 'Assumptions' for a query, choose the more relevant one(s) without explaining the initial result. If you are unsure, ask the user to choose.
-- Re-send the exact same 'input' with NO modifications, and add the 'assumption' parameter, formatted as a list, with the relevant values.
-- ONLY simplify or rephrase the initial query if a more relevant 'Assumption' or other input suggestions are not provided.
-- Do not explain each step unless user input is needed. Proceed directly to making a better API call based on the available assumptions.
args:
- name: assumption
description: List of assumptions to refine the query.
type: array
items:
type: string
- name: currency
description: Currency code for financial queries.
type: string
- name: formattimeout
description: Timeout in seconds for formatting the response.
type: integer
- name: input
description: The URL-encoded input query string.
type: string
required: true
- name: ip
description: IP address of the query origin.
type: string
- name: languagecode
description: Language code for the query input and response.
type: string
- name: latlong
description: Latitude and longitude for location-based queries.
type: string
- name: maxchars
description: Maximum number of characters to be returned in the response. Defaults to 6800 characters.
type: integer
- name: timezone
description: Timezone for the query.
type: string
- name: units
description: Preferred units for result data (e.g., metric or imperial).
type: string
requestTemplate:
argsToUrlParam: true
url: https://www.wolframalpha.com/api/v1/llm-api
method: GET
headers:
- key: Authorization
value: "Bearer {{.config.appid}}"
responseTemplate:
prependBody: |+
# API Response Information
Below is the response from an API call. To help you understand the data, I've provided:
1. A detailed description of all fields in the response structure
2. The complete API response
## Response Structure
> Content-Type: application/json
- **images**: List of image URLs related to the query. (Type: array)
- **images[]**: Items of type string
- **inputInterpretation**: WolframAlpha's interpretation of the input query. (Type: string)
- **link**: A link back to the full WolframAlpha results page for this query. (Type: string)
- **query**: The query that was submitted. (Type: string)
- **result**: The computed result for the query. (Type: string)
## Original Response
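
A sketch of how a client might call the tool defined above, following the single-line `{"input": query}` convention from the description (the query and character limit are placeholder values):

```json
{
  "jsonrpc": "2.0",
  "id": 4,
  "method": "tools/call",
  "params": {
    "name": "get_llm-api",
    "arguments": {
      "input": "France population",
      "maxchars": 2000
    }
  }
}
```

With `argsToUrlParam: true`, this becomes a request like `GET https://www.wolframalpha.com/api/v1/llm-api?input=France+population&maxchars=2000` with the `Authorization: Bearer` header built from the configured appid.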