AI IMAGE READER Plugin Configuration Reference

Function Description

The ai-image-reader plugin is implemented by integrating with an OCR service. It currently supports Alibaba Cloud's qwen-vl-ocr model, served through DashScope, as the OCR backend. The process is shown in the figure below:

Running Attributes

Plugin execution phase: Default Phase
Plugin execution priority: 400

Configuration Description

| Name | Data Type | Requirement | Default Value | Description |
| --- | --- | --- | --- | --- |
| apiKey | string | Required | - | Token for authenticating access to the OCR service. |
| type | string | Required | - | Provider type of the backend OCR service (e.g., dashscope). |
| serviceHost | string | Required | - | Host of the backend OCR service. |
| serviceName | string | Required | - | Name of the backend OCR service. |
| servicePort | int | Required | - | Port of the backend OCR service. |
| model | string | Required | - | Model name of the backend OCR service (e.g., qwen-vl-ocr). |
| timeout | int | Required | 10000 | API call timeout duration (milliseconds). |

Example

{
  "apiKey": "YOUR_API_KEY",
  "type": "dashscope",
  "model": "qwen-vl-ocr",
  "timeout": 10000,
  "serviceHost": "dashscope.aliyuncs.com",
  "serviceName": "dashscope",
  "servicePort": 443
}
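The field types listed in the configuration table can be checked before a configuration is deployed. The following is a minimal sketch and not part of the plugin: `validate_config`, `REQUIRED_FIELDS`, and `DEFAULT_TIMEOUT_MS` are hypothetical names, and falling back to 10000 ms when `timeout` is omitted is an assumption based on the default value in the table.

```python
# Hypothetical validation helper; field names and types mirror the configuration table.
REQUIRED_FIELDS = {
    "apiKey": str,
    "type": str,
    "serviceHost": str,
    "serviceName": str,
    "servicePort": int,
    "model": str,
}

DEFAULT_TIMEOUT_MS = 10000  # default value listed in the table


def validate_config(config: dict) -> dict:
    """Check field presence and types; return a copy with the timeout default applied."""
    errors = []
    for name, expected_type in REQUIRED_FIELDS.items():
        if name not in config:
            errors.append(f"missing required field: {name}")
        elif type(config[name]) is not expected_type:
            errors.append(
                f"{name} should be {expected_type.__name__}, "
                f"got {type(config[name]).__name__}"
            )

    # Assumption: use the documented default when timeout is omitted.
    timeout = config.get("timeout", DEFAULT_TIMEOUT_MS)
    if type(timeout) is not int or timeout <= 0:
        errors.append("timeout should be a positive int (milliseconds)")

    if errors:
        raise ValueError("; ".join(errors))
    return {**config, "timeout": timeout}


config = {
    "apiKey": "YOUR_API_KEY",
    "type": "dashscope",
    "model": "qwen-vl-ocr",
    "serviceHost": "dashscope.aliyuncs.com",
    "serviceName": "dashscope",
    "servicePort": 443,  # int, as the table specifies
}
validated = validate_config(config)
print(validated["timeout"])
```

A string port such as "443" is rejected by this check, because the table declares servicePort as int.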

Requests must follow the OpenAI API protocol specification.

Pass images via URL:

messages=[{
    "role": "user",
    "content": [
        {"type": "text", "text": "What's in this image?"},
        {
            "type": "image_url",
            "image_url": {
                "url": "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241108/ctdzex/biaozhun.jpg",
            },
        },
    ],
}],
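To show where the `messages` fragment above sits in a full request, here is a minimal sketch that assembles an OpenAI-style chat completion body as a plain dictionary. The helper name `build_image_url_request`, the model name `your-llm-model`, and the `example.com` image URL are placeholders chosen for illustration, not values defined by the plugin.

```python
import json


def build_image_url_request(model: str, question: str, image_url: str) -> dict:
    """Assemble a chat completion body whose user message carries text plus an image URL."""
    return {
        "model": model,
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": question},
                    {"type": "image_url", "image_url": {"url": image_url}},
                ],
            }
        ],
    }


# Placeholder values; replace them with a real model name and image URL.
body = build_image_url_request(
    "your-llm-model",
    "What's in this image?",
    "https://example.com/sample.jpg",
)
print(json.dumps(body, indent=2))
```

The resulting JSON can be sent as the body of a POST request to whichever OpenAI-compatible route the gateway exposes.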

Pass images via Base64:

messages=[
    {
        "role": "user",
        "content": [
            { "type": "text", "text": "what's in this image?" },
            {
                "type": "image_url",
                "image_url": {
                    "url": f"data:image/jpeg;base64,{base64_image}",
                },
            },
        ],
    }
],
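The `base64_image` variable in the fragment above has to hold the Base64-encoded image bytes. One way to produce the full data URL is sketched below using only the Python standard library; the helper name `to_data_url` is hypothetical, and the MIME type must match the actual image format.

```python
import base64


def to_data_url(image_bytes: bytes, mime_type: str = "image/jpeg") -> str:
    """Encode raw image bytes as a data URL for the image_url field."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"


# The first three bytes of a JPEG file, used here instead of a real image.
sample = b"\xff\xd8\xff"
print(to_data_url(sample))
```

For a real file, read it in binary mode first, for example `open(path, "rb").read()`, and pass the resulting bytes to the helper.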

The following is an example of using ai-image-reader for enhancement. The original request was:

What is the content of the image?

The result returned by the LLM without the ai-image-reader plugin is:

Sorry, as a text-based AI assistant, I cannot view image content. You can describe the content of the image, and I will do my best to help you identify it.

The result returned by the LLM after processing by the ai-image-reader plugin is:

Thank you for sharing the image! Mastering shell scripting is highly beneficial for Linux system administrators as it automates tasks, boosts efficiency, and cuts down manual work. For home Linux users, command-line skills are equally important for quick and efficient operations. This book will teach you to handle system management tasks with shell scripts and operate in the Linux command line. Hope it aids your Linux system management learning! Feel free to ask if you have more questions.