Higress AI Gateway - Troubleshooting

Common issues and solutions for Higress AI Gateway deployment and operation.

Container Issues

Container fails to start

Check Docker is running:

docker info

Check port availability (no output means nothing is listening on port 8080 yet):

netstat -tlnp | grep 8080

View container logs:

docker logs higress-ai-gateway

Gateway not responding

Check container status:

docker ps -a

Verify port mapping:

docker port higress-ai-gateway

Test locally:

curl http://localhost:8080/v1/models
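The three checks above can be folded into a single probe. The following is a minimal sketch, not part of the Higress tooling: `probe_gateway` and `classify_http_code` are hypothetical helper names, and the hints are generic HTTP interpretations rather than documented gateway behavior.

```shell
# Probe the models endpoint and print only the HTTP status code.
# curl prints 000 when it received no HTTP response at all
# (connection refused, timeout, DNS failure).
probe_gateway() {
  curl -s -o /dev/null -w '%{http_code}' --max-time 5 \
    "${1:-http://localhost:8080/v1/models}"
}

# Map a status code to a hint about where to look next.
# These hints are assumptions based on plain HTTP semantics.
classify_http_code() {
  case "$1" in
    000)     echo "no-response: check container status and port mapping" ;;
    2??)     echo "ok: gateway is answering" ;;
    401|403) echo "reachable: request rejected, check credentials" ;;
    404)     echo "reachable: no route matched this path" ;;
    5??)     echo "reachable: gateway or upstream error, check logs" ;;
    *)       echo "reachable: unexpected status $1" ;;
  esac
}

# Usage:
#   classify_http_code "$(probe_gateway)"
```

Any result other than `no-response` means the container and port mapping are working, so the problem is more likely in routing or provider configuration.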

File System Issues

"too many open files" error from API server

Symptom:

panic: unable to create REST storage for a resource due to too many open files, will die

or

command failed err="failed to create shared file watcher: too many open files"

Root Cause:

The system's fs.inotify.max_user_instances limit is too low. This commonly occurs on systems with many Docker containers, as each container can consume inotify instances.

Check current limit:

cat /proc/sys/fs/inotify/max_user_instances

The default is often 128, which is insufficient when running multiple containers.
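Before raising the limit, it can help to see how close the host is to it. The sketch below counts open inotify instances by looking for `anon_inode:inotify` file descriptors under `/proc`. The helper names and the 80% threshold are illustrative assumptions. The limit applies per user, while this count covers every process the caller can inspect, so treat the result as an approximation (run it as root to include processes owned by other users).

```shell
# Count inotify instances held by processes visible to the current user.
# Each instance shows up as an fd symlink pointing at "anon_inode:inotify".
count_inotify_instances() {
  find /proc/[0-9]*/fd -lname 'anon_inode:inotify' 2>/dev/null | wc -l
}

# Compare usage with the limit; the 80% threshold is an arbitrary choice.
inotify_status() {
  local used="$1" limit="$2"
  if [ "$used" -ge $(( limit * 80 / 100 )) ]; then
    echo "near-limit"
  else
    echo "ok"
  fi
}

# Usage:
#   inotify_status "$(count_inotify_instances)" \
#                  "$(cat /proc/sys/fs/inotify/max_user_instances)"
```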

Solution:

Increase the inotify instance limit to 8192:

# Temporarily (until next reboot)
sudo sysctl -w fs.inotify.max_user_instances=8192

# Permanently (survives reboots)
echo "fs.inotify.max_user_instances = 8192" | sudo tee -a /etc/sysctl.conf
sudo sysctl -p

Verify:

cat /proc/sys/fs/inotify/max_user_instances
# Should output: 8192

Restart the container:

docker restart higress-ai-gateway

Additional inotify tunables (if still experiencing issues):

# Increase max watches per user
sudo sysctl -w fs.inotify.max_user_watches=524288

# Increase max queued events
sudo sysctl -w fs.inotify.max_queued_events=32768

To make these permanent as well:

echo "fs.inotify.max_user_watches = 524288" | sudo tee -a /etc/sysctl.conf
echo "fs.inotify.max_queued_events = 32768" | sudo tee -a /etc/sysctl.conf
sudo sysctl -p
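Note that `tee -a` appends a new line every time it runs, so repeating the commands above leaves duplicate entries in `/etc/sysctl.conf`. One way to avoid that is sketched below; `set_sysctl_key` is a hypothetical helper, not something shipped with Higress, and it rewrites the target file, so try it on a copy first.

```shell
# Set "KEY = VALUE" in FILE, replacing any existing assignment of KEY
# instead of appending a duplicate. Comment lines are left untouched.
set_sysctl_key() {
  local file="$1" key="$2" value="$3" tmp
  tmp=$(mktemp) || return 1
  if [ -f "$file" ]; then
    # Keep every line whose name (text before "=", whitespace removed)
    # differs from KEY.
    awk -F'=' -v k="$key" \
      '{ name = $1; gsub(/[ \t]/, "", name); if (name != k) print }' \
      "$file" > "$tmp"
  fi
  printf '%s = %s\n' "$key" "$value" >> "$tmp"
  # Overwrite in place so the file keeps its owner and permissions.
  cat "$tmp" > "$file"
  rm -f "$tmp"
}

# Usage (from a script run as root, followed by `sysctl -p`):
#   set_sysctl_key /etc/sysctl.conf fs.inotify.max_user_instances 8192
```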

Plugin Issues

Plugin not recognized

Verify plugin installation:

For Clawdbot:

ls -la ~/.clawdbot/extensions/higress-ai-gateway

For OpenClaw:

ls -la ~/.openclaw/extensions/higress-ai-gateway

Check package.json:

Ensure package.json contains the correct extension field:

  • Clawdbot: "clawdbot.extensions"
  • OpenClaw: "openclaw.extensions"

Restart the runtime:

# Restart Clawdbot gateway
clawdbot gateway restart

# Or OpenClaw gateway
openclaw gateway restart

Routing Issues

Auto-routing not working

Confirm model is in list:

# Check if higress/auto is available
clawdbot models list | grep "higress/auto"

Check routing rules exist:

./get-ai-gateway.sh route list

Verify default model is configured:

./get-ai-gateway.sh config list

Check gateway logs:

docker logs higress-ai-gateway | grep -i routing

View access logs:

tail -f ./higress/logs/access.log

Configuration Issues

Timezone detection fails

Manually check timezone:

timedatectl show --property=Timezone --value

Or check timezone file:

cat /etc/timezone

Fallback behavior:

  • If detection fails, the Hangzhou mirror is used by default
  • Manual override: Set IMAGE_REPO environment variable

Manual repository selection:

# For China/Asia
IMAGE_REPO="higress-registry.cn-hangzhou.cr.aliyuncs.com/higress/all-in-one"

# For Southeast Asia
IMAGE_REPO="higress-registry.ap-southeast-7.cr.aliyuncs.com/higress/all-in-one"

# For North America
IMAGE_REPO="higress-registry.us-west-1.cr.aliyuncs.com/higress/all-in-one"

# Use in deployment
IMAGE_REPO="$IMAGE_REPO" ./get-ai-gateway.sh start --non-interactive ...
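The manual selection above can also be scripted. The sketch below is only an illustration of the idea: the mapping from timezone to mirror is an assumption and is not taken from `get-ai-gateway.sh`, and `detect_timezone` and `pick_image_repo` are hypothetical names. Only the three registry URLs and the Hangzhou fallback come from this document.

```shell
# Try timedatectl first, then fall back to /etc/timezone.
# Prints nothing if neither source is available.
detect_timezone() {
  timedatectl show --property=Timezone --value 2>/dev/null \
    || cat /etc/timezone 2>/dev/null
}

# Pick a mirror for a timezone name. The zone lists are illustrative
# assumptions; anything unrecognized (including an empty value) falls
# back to the Hangzhou mirror.
pick_image_repo() {
  case "$1" in
    America/*)
      echo "higress-registry.us-west-1.cr.aliyuncs.com/higress/all-in-one" ;;
    Asia/Singapore|Asia/Bangkok|Asia/Jakarta|Asia/Manila|Asia/Kuala_Lumpur)
      echo "higress-registry.ap-southeast-7.cr.aliyuncs.com/higress/all-in-one" ;;
    *)
      echo "higress-registry.cn-hangzhou.cr.aliyuncs.com/higress/all-in-one" ;;
  esac
}

# Usage:
#   IMAGE_REPO="$(pick_image_repo "$(detect_timezone)")"
```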

Performance Issues

Slow image downloads

Check selected repository:

echo "$IMAGE_REPO"

Manually select closest mirror:

See Configuration Issues → Timezone detection fails for manual repository selection.

High memory usage

Check container stats:

docker stats higress-ai-gateway

View resource limits:

docker inspect higress-ai-gateway | grep -E '"(Memory|MemorySwap|NanoCpus)"'
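Docker reports the memory limit in bytes and uses 0 to mean that no limit is set, which is easy to misread. The helper below is a small sketch for turning that number into something readable; `format_mem_limit` is a hypothetical name, while `--format '{{.HostConfig.Memory}}'` is standard `docker inspect` template syntax.

```shell
# Convert the byte count reported by docker inspect into a readable limit.
# A value of 0 means the container has no memory limit.
format_mem_limit() {
  local bytes="$1"
  if [ "$bytes" -eq 0 ]; then
    echo "unlimited"
  else
    echo "$(( bytes / 1024 / 1024 )) MiB"
  fi
}

# Usage:
#   format_mem_limit \
#     "$(docker inspect --format '{{.HostConfig.Memory}}' higress-ai-gateway)"
```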

Set memory limits:

# Stop container
./get-ai-gateway.sh stop

# Manually restart with limits
docker run -d \
  --name higress-ai-gateway \
  --memory="4g" \
  --memory-swap="4g" \
  ...

Log Analysis

Access logs location

# Default location
./higress/logs/access.log

# View real-time logs
tail -f ./higress/logs/access.log

Container logs

# View all logs
docker logs higress-ai-gateway

# Follow logs
docker logs -f higress-ai-gateway

# Last 100 lines
docker logs --tail 100 higress-ai-gateway

# With timestamps
docker logs -t higress-ai-gateway
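When the log is long, it is often quicker to look only at recent lines that resemble the failure messages shown earlier in this guide. The filter below is a rough sketch: `filter_problem_lines` is a hypothetical helper and the pattern list is an assumption that will miss some problems and match some harmless lines.

```shell
# Keep only lines that look like failures. The patterns are taken from
# the symptoms in this guide (panic, "command failed", "too many open
# files") plus a generic "error".
filter_problem_lines() {
  grep -iE 'panic|error|fail|too many open files'
}

# Usage: scan the last 10 minutes of container output. The 2>&1 is needed
# because docker logs writes the container's stderr stream to stderr.
#   docker logs --since 10m higress-ai-gateway 2>&1 | filter_problem_lines
```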

Network Issues

Cannot connect to gateway

Verify container is running:

docker ps | grep higress-ai-gateway

Check port bindings:

docker port higress-ai-gateway
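The host port is not necessarily 8080: if the container was started with a different mapping, requests to `localhost:8080` will fail even though the gateway is healthy. The sketch below extracts the published host port from `docker port` output, which has the form `8080/tcp -> 0.0.0.0:8080`; `host_ports_for` is a hypothetical helper name.

```shell
# Read `docker port <container>` output on stdin and print the host
# port(s) published for the given container TCP port. IPv4 and IPv6
# bindings of the same port are collapsed into one line.
host_ports_for() {
  awk -v want="$1/tcp" \
    '$1 == want { n = split($3, parts, ":"); print parts[n] }' | sort -u
}

# Usage:
#   docker port higress-ai-gateway | host_ports_for 8080
```

Empty output means container port 8080 is not published to the host at all.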

Test from inside container:

docker exec higress-ai-gateway curl localhost:8080/v1/models

Check firewall rules (on hosts that use ufw):

# Check if port is accessible
sudo ufw status | grep 8080

# Allow port (if needed)
sudo ufw allow 8080/tcp

DNS resolution issues

Test from container:

docker exec higress-ai-gateway ping -c 3 api.openai.com

Check DNS settings:

docker exec higress-ai-gateway cat /etc/resolv.conf

Getting Help

If you're still experiencing issues:

  1. Collect logs:

    docker logs higress-ai-gateway > gateway.log 2>&1
    cp ./higress/logs/access.log access.log
    
  2. Check system info:

    docker version
    docker info
    uname -a
    cat /proc/sys/fs/inotify/max_user_instances
    
  3. Report issue: