higress

jiazhizhong/higress

Fork 0

mirror of https://github.com/alibaba/higress.git synced 2026-02-06 23:21:08 +08:00

Files

History

rinfx 76ada0b844 add trace_span_key & as_seperate_log_field configuration for ai-statistics (#2488 )

2025-06-25 09:28:14 +08:00

.gitignore

add plugin: ai-statistics (#1011 )

2024-06-18 17:53:04 +08:00

go.mod

Enhance the compatibility of AI observability plugins with different LLM suppliers (#2088 )

2025-04-18 16:19:59 +08:00

go.sum

Enhance the compatibility of AI observability plugins with different LLM suppliers (#2088 )

2025-04-18 16:19:59 +08:00

main.go

add trace_span_key & as_seperate_log_field configuration for ai-statistics (#2488 )

2025-06-25 09:28:14 +08:00

README_EN.md

add trace_span_key & as_seperate_log_field configuration for ai-statistics (#2488 )

2025-06-25 09:28:14 +08:00

README.md

add trace_span_key & as_seperate_log_field configuration for ai-statistics (#2488 )

2025-06-25 09:28:14 +08:00

VERSION

upgrade to istio 1.19 (#1211 )

2024-08-26 09:51:47 +08:00

README_EN.md

title, keywords, description

title

keywords

description

AI Statistics

higress

observability

AI Statistics plugin configuration reference

Introduction

Provides basic AI observability capabilities, including metric, log, and trace. The ai-proxy plug-in needs to be connected afterwards. If the ai-proxy plug-in is not connected, the user needs to configure it accordingly to take effect.

Runtime Properties

Plugin Phase: CUSTOM Plugin Priority: 200

Configuration instructions

The default request of the plug-in conforms to the openai protocol format and provides the following basic observable values. Users do not need special configuration:

metric: It provides indicators such as input token, output token, rt of the first token (streaming request), total request rt, etc., and supports observation in the four dimensions of gateway, routing, service, and model.
log: Provides input_token, output_token, model, llm_service_duration, llm_first_token_duration and other fields

Users can also expand observable values through configuration:

Name	Type	Required	Default	Description
`attributes`	[]Attribute	optional	-	Information that the user wants to record in log/span
`disable_openai_usage`	bool	optional	false	When using a non-OpenAI-compatible protocol, the support for model and token is non-standard. Setting the configuration to true can prevent errors.

Attribute Configuration instructions:

Name	Type	Required	Default	Description
`key`	string	required	-	attribute key
`value_source`	string	required	-	attribute value source, optional values are `fixed_value`, `request_header`, `request_body`, `response_header`, `response_body`, `response_streaming_body`
`value`	string	required	-	how to get attribute value
`default_value`	string	optional	-	default value for attribute
`rule`	string	optional	-	Rule to extract attribute from streaming response, optional values are `first`, `replace`, `append`
`apply_to_log`	bool	optional	false	Whether to record the extracted information in the log
`apply_to_span`	bool	optional	false	Whether to record the extracted information in the link tracking span
`trace_span_key`	string	optional	-	span attribute key, default is the value of `key`
`as_separate_log_field`	bool	optional	false	Whether to use a separate log field, the field name is equal to the value of `key`

The meanings of various values for value_source are as follows:

fixed_value: fixed value
request_header: The attribute is obtained through the http request header
request_body: The attribute is obtained through the http request body
response_header: The attribute is obtained through the http response header
response_body: The attribute is obtained through the http response body
response_streaming_body: The attribute is obtained through the http streaming response body

When value_source is response_streaming_body, rule should be configured to specify how to obtain the specified value from the streaming body. The meaning of the value is as follows:

first: extract value from the first valid chunk
replace: extract value from the last valid chunk
append: join value pieces from all valid chunks

Configuration example

If you want to record ai-statistic related statistical values in the gateway access log, you need to modify log_format and add a new field based on the original log_format. The example is as follows:

'{"ai_log":"%FILTER_STATE(wasm.ai_log:PLAIN)%"}'

If the field is set with as_separate_log_field, for example:

attributes:
  - key: consumer
    value_source: request_header
    value: x-mse-consumer
    apply_to_log: true
    as_separate_log_field: true

Then to print in the log, you need to set log_format additionally:

'{"consumer":"%FILTER_STATE(wasm.consumer:PLAIN)%"}'

Empty

Metric

# counter, cumulative count of input tokens
route_upstream_model_consumer_metric_input_token{ai_route="ai-route-aliyun.internal",ai_cluster="outbound|443||llm-aliyun.internal.dns",ai_model="qwen-turbo",ai_consumer="none"} 24

# counter, cumulative count of output tokens
route_upstream_model_consumer_metric_output_token{ai_route="ai-route-aliyun.internal",ai_cluster="outbound|443||llm-aliyun.internal.dns",ai_model="qwen-turbo",ai_consumer="none"} 507

# counter, cumulative total duration of both streaming and non-streaming requests
route_upstream_model_consumer_metric_llm_service_duration{ai_route="ai-route-aliyun.internal",ai_cluster="outbound|443||llm-aliyun.internal.dns",ai_model="qwen-turbo",ai_consumer="none"} 6470

# counter, cumulative count of both streaming and non-streaming requests
route_upstream_model_consumer_metric_llm_duration_count{ai_route="ai-route-aliyun.internal",ai_cluster="outbound|443||llm-aliyun.internal.dns",ai_model="qwen-turbo",ai_consumer="none"} 2

# counter, cumulative latency of the first token in streaming requests
route_upstream_model_consumer_metric_llm_first_token_duration{ai_route="ai-route-aliyun.internal",ai_cluster="outbound|443||llm-aliyun.internal.dns",ai_model="qwen-turbo",ai_consumer="none"} 340

# counter, cumulative count of streaming requests
route_upstream_model_consumer_metric_llm_stream_duration_count{ai_route="ai-route-aliyun.internal",ai_cluster="outbound|443||llm-aliyun.internal.dns",ai_model="qwen-turbo",ai_consumer="none"} 1

Below are some example usages of these metrics:

Average latency of the first token in streaming requests:

irate(route_upstream_model_consumer_metric_llm_first_token_duration[2m])
/
irate(route_upstream_model_consumer_metric_llm_stream_duration_count[2m])

Average process duration of both streaming and non-streaming requests:

irate(route_upstream_model_consumer_metric_llm_service_duration[2m])
/
irate(route_upstream_model_consumer_metric_llm_duration_count[2m])

Log

{
  "ai_log":"{\"model\":\"qwen-turbo\",\"input_token\":\"10\",\"output_token\":\"69\",\"llm_first_token_duration\":\"309\",\"llm_service_duration\":\"1955\"}"
}

Trace

When the configuration is empty, no additional attributes will be added to the span.

Extract token usage information from non-openai protocols

When setting the protocol to original in ai-proxy, taking Alibaba Cloud Bailian as an example, you can make the following configuration to specify how to extract model, input_token, output_token

attributes:
  - key: model
    value_source: response_body
    value: usage.models.0.model_id
    apply_to_log: true
    apply_to_span: false
  - key: input_token
    value_source: response_body
    value: usage.models.0.input_tokens
    apply_to_log: true
    apply_to_span: false
  - key: output_token
    value_source: response_body
    value: usage.models.0.output_tokens
    apply_to_log: true
    apply_to_span: false

Metric

route_upstream_model_consumer_metric_input_token{ai_route="bailian",ai_cluster="qwen",ai_model="qwen-max"} 343
route_upstream_model_consumer_metric_output_token{ai_route="bailian",ai_cluster="qwen",ai_model="qwen-max"} 153
route_upstream_model_consumer_metric_llm_service_duration{ai_route="bailian",ai_cluster="qwen",ai_model="qwen-max"} 3725
route_upstream_model_consumer_metric_llm_duration_count{ai_route="bailian",ai_cluster="qwen",ai_model="qwen-max"} 1

Log

{
  "ai_log": "{\"model\":\"qwen-max\",\"input_token\":\"343\",\"output_token\":\"153\",\"llm_service_duration\":\"19110\"}"
}

Trace

Three additional attributes model, input_token, and output_token can be seen in the trace spans.

Cooperate with authentication and authentication record consumer

attributes:
  - key: consumer
    value_source: request_header
    value: x-mse-consumer
    apply_to_log: true

Record questions and answers

attributes:
  - key: question
    value_source: request_body
    value: messages.@reverse.0.content
    apply_to_log: true
  - key: answer
    value_source: response_streaming_body
    value: choices.0.delta.content
    rule: append
    apply_to_log: true
  - key: answer
    value_source: response_body
    value: choices.0.message.content
    apply_to_log: true

README_EN.md Unescape Escape

Introduction

Runtime Properties

Configuration instructions

Configuration example

Empty

Metric

Log

Trace

Extract token usage information from non-openai protocols

Metric

Log

Trace

Cooperate with authentication and authentication record consumer

Record questions and answers

README_EN.md