higress/plugins/wasm-go/extensions/ai-image-reader/README_EN.md at main

mirror of https://github.com/alibaba/higress.git synced 2026-02-06 23:21:08 +08:00

kai2321 1fe5eb6e13 Implement AI-image-reader plugin (#1925 )

2025-06-25 19:28:02 +08:00

title: AI IMAGE READER
keywords: AI GATEWAY, AI IMAGE READER
description: AI IMAGE READER Plugin Configuration Reference

Function Description

The AI-IMAGE-READER plugin integrates with an OCR service so that images in a request can be read as text. It currently supports Alibaba Cloud's qwen-vl-ocr model on Dashscope as the OCR backend. The processing flow is shown in the figure below:

Running Attributes

Plugin execution phase: Default Phase
Plugin execution priority: 400

Configuration Description

Name	Data Type	Requirement	Default Value	Description
`apiKey`	string	Required	-	Token for authenticating access to OCR services.
`type`	string	Required	-	Provider type of the backend OCR service (e.g., dashscope).
`serviceHost`	string	Required	-	Host of the backend OCR service.
`serviceName`	string	Required	-	Name of the backend OCR service.
`servicePort`	int	Required	-	Port of the backend OCR service.
`model`	string	Required	-	Model name of the backend OCR service (e.g., qwen-vl-ocr).
`timeout`	int	Required	10000	API call timeout duration (milliseconds).

Example

{
  "apiKey": "YOUR_API_KEY",
  "type": "dashscope",
  "model": "qwen-vl-ocr",
  "timeout": 10000,
  "serviceHost": "dashscope.aliyuncs.com",
  "serviceName": "dashscope",
  "servicePort": 443
}
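As a rough aid when preparing a configuration, the sketch below checks a config dictionary against the field names and data types listed in the table. The helper names `check_config` and `is_int_like` are hypothetical, and the decision to accept digit-only strings for integer fields is an assumption made for this sketch; the plugin's own validation rules are not described here.

```python
# Field names and types are taken from the configuration table.
# The checking logic is a hypothetical illustration, not the plugin's own code.
FIELD_TYPES = {
    "apiKey": str,
    "type": str,
    "serviceHost": str,
    "serviceName": str,
    "servicePort": int,
    "model": str,
    "timeout": int,
}


def is_int_like(value) -> bool:
    """Accept real integers and digit-only strings such as "443" (an assumption)."""
    if isinstance(value, bool):
        return False
    return isinstance(value, int) or (isinstance(value, str) and value.isdigit())


def check_config(config: dict) -> list:
    """Return a list of problems; an empty list means every field is present and well typed."""
    problems = []
    for name, expected in FIELD_TYPES.items():
        if name not in config:
            problems.append(f"missing field: {name}")
        elif expected is int and not is_int_like(config[name]):
            problems.append(f"{name} should be an integer")
        elif expected is str and not isinstance(config[name], str):
            problems.append(f"{name} should be a string")
    return problems


example = {
    "apiKey": "YOUR_API_KEY",
    "type": "dashscope",
    "model": "qwen-vl-ocr",
    "timeout": 10000,
    "serviceHost": "dashscope.aliyuncs.com",
    "serviceName": "dashscope",
    "servicePort": 443,
}

print(check_config(example))                               # []
print(check_config({**example, "servicePort": "https"}))   # ['servicePort should be an integer']
```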

Requests must follow the OpenAI API protocol specification:

Pass images via URL:

messages=[{
    "role": "user",
    "content": [
        {"type": "text", "text": "What's in this image?"},
        {
            "type": "image_url",
            "image_url": {
                "url": "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241108/ctdzex/biaozhun.jpg",
            },
        },
    ],
}],
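The `messages` fragment above is only one argument of a chat completion call. A minimal sketch of a complete request, built with the Python standard library, might look like the following. `GATEWAY_URL` and `MODEL` are placeholders, not values from this document; substitute the address of your gateway route and the model it serves.

```python
import json
import urllib.request

# Hypothetical values: replace with your gateway address and the model behind it.
GATEWAY_URL = "http://localhost:8080/v1/chat/completions"
MODEL = "your-llm-model"


def build_image_url_request(image_url: str, question: str) -> urllib.request.Request:
    """Build (but do not send) an OpenAI-style chat request that references an image by URL."""
    body = {
        "model": MODEL,
        "messages": [{
            "role": "user",
            "content": [
                {"type": "text", "text": question},
                {"type": "image_url", "image_url": {"url": image_url}},
            ],
        }],
    }
    return urllib.request.Request(
        GATEWAY_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_image_url_request(
    "https://example.com/page.jpg",  # placeholder image URL
    "What's in this image?",
)
# To actually send it: urllib.request.urlopen(request, timeout=10)
```

Building the request separately from sending it keeps the payload easy to inspect before any network call is made.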

Pass images via Base64:

messages=[
    {
        "role": "user",
        "content": [
            { "type": "text", "text": "what's in this image?" },
            {
                "type": "image_url",
                "image_url": {
                    "url": f"data:image/jpeg;base64,{base64_image}",
                },
            },
        ],
    }
],
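The Base64 fragment above assumes a `base64_image` variable that already holds the encoded image. One way to produce it, sketched with the standard library, is shown below. The helper names `to_data_url` and `image_message` are hypothetical, and the three bytes used in the demo are just the JPEG magic number standing in for a real file.

```python
import base64


def to_data_url(image_bytes: bytes, mime_type: str = "image/jpeg") -> str:
    """Encode raw image bytes as a data URL usable in an image_url part."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"


def image_message(image_bytes: bytes, question: str) -> dict:
    """Build a user message that carries the question and the image inline."""
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": question},
            {"type": "image_url", "image_url": {"url": to_data_url(image_bytes)}},
        ],
    }


# In practice, read the bytes from a file: open("photo.jpg", "rb").read()
message = image_message(b"\xff\xd8\xff", "what's in this image?")
print(message["content"][1]["image_url"]["url"])  # data:image/jpeg;base64,/9j/
```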

The following example shows how ai-image-reader enhances a request. The original request is:

What is the content of the image?

Without the ai-image-reader plugin, the LLM returns:

Sorry, as a text-based AI assistant, I cannot view image content. You can describe the content of the image, and I will do my best to help you identify it.

With the ai-image-reader plugin, the LLM returns:

Thank you for sharing the image! Mastering shell scripting is highly beneficial for Linux system administrators as it automates tasks, boosts efficiency, and cuts down manual work. For home Linux users, command-line skills are equally important for quick and efficient operations. This book will teach you to handle system management tasks with shell scripts and operate in the Linux command line. Hope it aids your Linux system management learning! Feel free to ask if you have more questions.
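The difference between the two responses suggests that the text recognized in the image reaches the LLM as part of the prompt. This document does not describe how the plugin rewrites the request, so the sketch below is only one plausible reading, not the plugin's actual behavior: each `image_url` part is replaced by a text part holding OCR output. The function names, the injected prompt wording, and the `fake_ocr` stub (which stands in for a call to the OCR backend) are all hypothetical.

```python
from typing import Callable


def replace_images_with_text(messages: list, ocr: Callable[[str], str]) -> list:
    """Return a copy of messages where each image_url part becomes a text part with OCR output."""
    rewritten = []
    for message in messages:
        content = message.get("content")
        if not isinstance(content, list):
            rewritten.append(dict(message))  # plain-string content: nothing to rewrite
            continue
        parts = []
        for part in content:
            if part.get("type") == "image_url":
                text = ocr(part["image_url"]["url"])
                parts.append({"type": "text", "text": f"Text recognized in the image:\n{text}"})
            else:
                parts.append(part)
        rewritten.append({**message, "content": parts})
    return rewritten


def fake_ocr(url: str) -> str:
    # Stand-in for the OCR backend (for example qwen-vl-ocr); returns canned text.
    return "Mastering shell scripting helps Linux system administrators."


original = [{
    "role": "user",
    "content": [
        {"type": "text", "text": "What is the content of the image?"},
        {"type": "image_url", "image_url": {"url": "https://example.com/page.jpg"}},
    ],
}]

enhanced = replace_images_with_text(original, fake_ocr)
for part in enhanced[0]["content"]:
    print(part["type"])  # prints "text" twice
```

A text-only LLM can answer from the enhanced messages because no image parts remain, which matches the second response in the example.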
