Higress AI Gateway - Troubleshooting
Common issues and solutions for Higress AI Gateway deployment and operation.
Container Issues
Container fails to start
Check Docker is running:
docker info
Check port availability:
netstat -tlnp | grep 8080
View container logs:
docker logs higress-ai-gateway
Gateway not responding
Check container status:
docker ps -a
Verify port mapping:
docker port higress-ai-gateway
Test locally:
curl http://localhost:8080/v1/models
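Right after a start or restart, the gateway may need a few seconds before it answers, so a single curl can fail even when nothing is wrong. The following is a minimal sketch of a retry wrapper for the local test above. The name `retry` and its arguments are hypothetical and not part of the Higress tooling; the attempt count and delay are illustrative.

```shell
# retry ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Runs COMMAND until it succeeds or ATTEMPTS is exhausted.
# Hypothetical helper for illustration; not part of the Higress tooling.
retry() {
    attempts=$1
    delay=$2
    shift 2
    n=1
    while [ "$n" -le "$attempts" ]; do
        if "$@"; then
            return 0
        fi
        echo "attempt $n/$attempts failed" >&2
        # Do not sleep after the final attempt.
        if [ "$n" -lt "$attempts" ]; then
            sleep "$delay"
        fi
        n=$((n + 1))
    done
    return 1
}
```

For example, `retry 10 3 curl -fsS -o /dev/null http://localhost:8080/v1/models` polls the endpoint up to ten times, three seconds apart; `-f` makes curl fail on HTTP error responses so they count as a failed attempt.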
File System Issues
"too many open files" error from API server
Symptom:
panic: unable to create REST storage for a resource due to too many open files, will die
or
command failed err="failed to create shared file watcher: too many open files"
Root Cause:
The system's fs.inotify.max_user_instances limit is too low. This commonly occurs on systems with many Docker containers, as each container can consume inotify instances.
Check current limit:
cat /proc/sys/fs/inotify/max_user_instances
Default is often 128, which is insufficient when running multiple containers.
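The check above can be wrapped in a small function that compares the current value against a target, which is convenient in a pre-flight script. This is a sketch only: `check_inotify_limit` is a hypothetical name, and the optional file argument exists purely so the function can be exercised against a test file.

```shell
# check_inotify_limit MINIMUM [FILE]
# Compares the inotify instance limit with MINIMUM.
# FILE defaults to the procfs path shown above (assumption: Linux host).
check_inotify_limit() {
    minimum=$1
    file=${2:-/proc/sys/fs/inotify/max_user_instances}
    if [ ! -r "$file" ]; then
        echo "cannot read $file" >&2
        return 2
    fi
    current=$(cat "$file")
    if [ "$current" -lt "$minimum" ]; then
        echo "LOW: max_user_instances=$current (want >= $minimum)"
        return 1
    fi
    echo "OK: max_user_instances=$current"
}
```

Calling `check_inotify_limit 8192` returns a non-zero status when the limit is below the value recommended in this guide.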
Solution:
Increase the inotify instance limit to 8192:
# Temporarily (until next reboot)
sudo sysctl -w fs.inotify.max_user_instances=8192
# Permanently (survives reboots)
echo "fs.inotify.max_user_instances = 8192" | sudo tee -a /etc/sysctl.conf
sudo sysctl -p
Verify:
cat /proc/sys/fs/inotify/max_user_instances
# Should output: 8192
Restart the container:
docker restart higress-ai-gateway
Additional inotify tunables (if still experiencing issues):
# Increase max watches per user
sudo sysctl -w fs.inotify.max_user_watches=524288
# Increase max queued events
sudo sysctl -w fs.inotify.max_queued_events=32768
To make these permanent as well:
echo "fs.inotify.max_user_watches = 524288" | sudo tee -a /etc/sysctl.conf
echo "fs.inotify.max_queued_events = 32768" | sudo tee -a /etc/sysctl.conf
sudo sysctl -p
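Before raising the limits further, it can help to see which processes actually hold inotify instances. On Linux, each instance appears under /proc/<pid>/fd as a symlink to anon_inode:inotify, so counting those links gives a rough picture. The function names below are hypothetical; treat this as a sketch, and run it as root to see processes owned by other users.

```shell
# List fd symlinks that point at an inotify instance.
# find exits non-zero on unreadable fd directories; that is ignored here.
_inotify_fds() {
    find /proc/[0-9]*/fd -lname 'anon_inode:inotify' 2>/dev/null || true
}

# inotify_top [N]: print "COUNT PID COMMAND" for the N processes
# (default 10) holding the most inotify instances.
inotify_top() {
    _inotify_fds | cut -d/ -f3 | sort | uniq -c | sort -rn |
        head -n "${1:-10}" |
        while read -r count pid; do
            echo "$count $pid $(cat "/proc/$pid/comm" 2>/dev/null)"
        done
}

# inotify_total: number of inotify instances visible to the current user.
inotify_total() {
    _inotify_fds | wc -l | tr -d ' '
}
```

If `inotify_total` run as root is close to the value in max_user_instances, the limit is the likely bottleneck; `inotify_top` then shows which processes to look at.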
Plugin Issues
Plugin not recognized
Verify plugin installation:
For Clawdbot:
ls -la ~/.clawdbot/extensions/higress-ai-gateway
For OpenClaw:
ls -la ~/.openclaw/extensions/higress-ai-gateway
Check package.json:
Ensure package.json contains the correct extension field:
- Clawdbot: "clawdbot.extensions"
- OpenClaw: "openclaw.extensions"
Restart the runtime:
# Restart Clawdbot gateway
clawdbot gateway restart
# Or OpenClaw gateway
openclaw gateway restart
Routing Issues
Auto-routing not working
Confirm model is in list:
# Check if higress/auto is available
clawdbot models list | grep "higress/auto"
Check routing rules exist:
./get-ai-gateway.sh route list
Verify default model is configured:
./get-ai-gateway.sh config list
Check gateway logs:
docker logs higress-ai-gateway | grep -i routing
View access logs:
tail -f ./higress/logs/access.log
Configuration Issues
Timezone detection fails
Manually check timezone:
timedatectl show --property=Timezone --value
Or check timezone file:
cat /etc/timezone
Fallback behavior:
- If detection fails, defaults to Hangzhou mirror
- Manual override: set the IMAGE_REPO environment variable
Manual repository selection:
# For China/Asia
IMAGE_REPO="higress-registry.cn-hangzhou.cr.aliyuncs.com/higress/all-in-one"
# For Southeast Asia
IMAGE_REPO="higress-registry.ap-southeast-7.cr.aliyuncs.com/higress/all-in-one"
# For North America
IMAGE_REPO="higress-registry.us-west-1.cr.aliyuncs.com/higress/all-in-one"
# Use in deployment
IMAGE_REPO="$IMAGE_REPO" ./get-ai-gateway.sh start --non-interactive ...
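The selection can also be scripted. The sketch below maps an IANA timezone name to one of the three mirrors listed above and falls back to Hangzhou for anything else. The zone-to-mirror mapping is an illustrative assumption, not the installer's actual logic, and both function names are hypothetical.

```shell
# select_image_repo TIMEZONE
# Prints a mirror for an IANA timezone name. The mapping below is an
# illustrative assumption, not the installer's actual logic.
select_image_repo() {
    case "$1" in
        America/*)
            echo "higress-registry.us-west-1.cr.aliyuncs.com/higress/all-in-one" ;;
        Asia/Bangkok|Asia/Singapore|Asia/Jakarta|Asia/Ho_Chi_Minh|Asia/Manila|Asia/Kuala_Lumpur)
            echo "higress-registry.ap-southeast-7.cr.aliyuncs.com/higress/all-in-one" ;;
        *)
            # Unknown or empty timezone: fall back to the Hangzhou mirror.
            echo "higress-registry.cn-hangzhou.cr.aliyuncs.com/higress/all-in-one" ;;
    esac
}

# detect_timezone: try the two detection methods shown above, in order.
# Prints nothing if both fail.
detect_timezone() {
    timedatectl show --property=Timezone --value 2>/dev/null ||
        cat /etc/timezone 2>/dev/null ||
        true
}
```

With both functions defined, `IMAGE_REPO="$(select_image_repo "$(detect_timezone)")"` sets the variable, and an empty detection result yields the Hangzhou mirror, matching the fallback behavior described above.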
Performance Issues
Slow image downloads
Check selected repository:
echo "$IMAGE_REPO"
Manually select closest mirror:
See Configuration Issues → Timezone detection fails for manual repository selection.
High memory usage
Check container stats:
docker stats higress-ai-gateway
View resource limits:
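# Values are in bytes; 0 means no limit is set
docker inspect --format 'Memory={{.HostConfig.Memory}} MemorySwap={{.HostConfig.MemorySwap}}' higress-ai-gateway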
Set memory limits:
# Stop container
./get-ai-gateway.sh stop
# Manually restart with limits
docker run -d \
--name higress-ai-gateway \
--memory="4g" \
--memory-swap="4g" \
...
Log Analysis
Access logs location
# Default location
./higress/logs/access.log
# View real-time logs
tail -f ./higress/logs/access.log
Container logs
# View all logs
docker logs higress-ai-gateway
# Follow logs
docker logs -f higress-ai-gateway
# Last 100 lines
docker logs --tail 100 higress-ai-gateway
# With timestamps
docker logs -t higress-ai-gateway
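Container logs can be long, so a quick filter for known failure signatures saves time. The sketch below searches for the error strings quoted earlier in this guide ("too many open files", "panic:", "command failed"); the function name is hypothetical and the pattern list is a starting point, not an exhaustive catalogue.

```shell
# scan_gateway_log: read log text on stdin and print matching lines,
# prefixed with their line numbers. Exits 0 if anything matched,
# 1 otherwise (grep's own exit status).
# The patterns come from the error messages quoted in this guide.
scan_gateway_log() {
    grep -n -i -E 'too many open files|panic:|command failed'
}
```

A typical invocation is `docker logs higress-ai-gateway 2>&1 | scan_gateway_log`; the `2>&1` matters because docker logs replays the container's stderr on stderr.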
Network Issues
Cannot connect to gateway
Verify container is running:
docker ps | grep higress-ai-gateway
Check port bindings:
docker port higress-ai-gateway
Test from inside container:
docker exec higress-ai-gateway curl localhost:8080/v1/models
Check firewall rules:
# Check whether ufw has a rule for the port
sudo ufw status | grep 8080
# Allow port (if needed)
sudo ufw allow 8080/tcp
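One thing the port-binding check can reveal is a port published only on a loopback address, which works from the host itself but not from other machines. The sketch below reads docker port output and lists such ports. It assumes the usual "8080/tcp -> 127.0.0.1:8080" line format; the function name is hypothetical, and whether your deployment binds to loopback depends on how the container was started.

```shell
# loopback_only_ports: read `docker port` output on stdin and print the
# container ports that are published only on a loopback address.
# Expects lines such as "8080/tcp -> 127.0.0.1:8080".
loopback_only_ports() {
    awk '
        $2 == "->" {
            port = $1
            seen[port] = 1
            # Any non-loopback host address makes the port reachable remotely.
            if ($3 !~ /^(127\.|\[::1\])/) external[port] = 1
        }
        END {
            for (port in seen)
                if (!(port in external)) print port
        }
    '
}
```

Run it as `docker port higress-ai-gateway | loopback_only_ports`; empty output means every published port has at least one non-loopback binding.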
DNS resolution issues
Test from container:
docker exec higress-ai-gateway ping -c 3 api.openai.com
Check DNS settings:
docker exec higress-ai-gateway cat /etc/resolv.conf
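When reading the resolv.conf output, the nameserver lines are what matter. The following sketch extracts them and checks for Docker's embedded resolver (127.0.0.11), which Docker configures for containers on user-defined networks. Both function names are hypothetical.

```shell
# list_nameservers [FILE]: print the nameserver addresses from a
# resolv.conf-style file (default: /etc/resolv.conf; "-" reads stdin).
list_nameservers() {
    awk '$1 == "nameserver" { print $2 }' "${1:-/etc/resolv.conf}"
}

# uses_docker_dns [FILE]: succeed if Docker's embedded resolver
# (127.0.0.11) is one of the configured nameservers.
uses_docker_dns() {
    list_nameservers "$1" | grep -qxF '127.0.0.11'
}
```

To inspect the container rather than the host, pipe the file in: `docker exec higress-ai-gateway cat /etc/resolv.conf | list_nameservers -`.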
Getting Help
If you're still experiencing issues:
Collect logs:
docker logs higress-ai-gateway > gateway.log 2>&1
cat ./higress/logs/access.log > access.log
Check system info:
docker version
docker info
uname -a
cat /proc/sys/fs/inotify/max_user_instances
Report issue:
- Repository: https://github.com/higress-group/higress-standalone
- Include: logs, system info, deployment command used
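The collection steps above can be bundled into one function so that a report always contains the same files. This is a sketch: the function name and the default output directory are hypothetical, and any step whose tool or file is missing is skipped rather than treated as an error.

```shell
# collect_diagnostics [OUTDIR]: gather the logs and system info listed
# above into OUTDIR (default: ./higress-diagnostics).
# Steps whose tool or file is missing are skipped instead of failing.
collect_diagnostics() {
    outdir=${1:-./higress-diagnostics}
    mkdir -p "$outdir" || return 1

    {
        echo "== uname -a =="
        uname -a
        if [ -r /proc/sys/fs/inotify/max_user_instances ]; then
            echo "== fs.inotify.max_user_instances =="
            cat /proc/sys/fs/inotify/max_user_instances
        fi
        if command -v docker >/dev/null 2>&1; then
            echo "== docker version =="
            docker version 2>&1 || true
            echo "== docker info =="
            docker info 2>&1 || true
        fi
    } > "$outdir/system-info.txt"

    if command -v docker >/dev/null 2>&1; then
        docker logs higress-ai-gateway > "$outdir/gateway.log" 2>&1 || true
    fi
    if [ -r ./higress/logs/access.log ]; then
        cp ./higress/logs/access.log "$outdir/access.log"
    fi
    echo "diagnostics written to $outdir"
}
```

Run `collect_diagnostics` from the directory that contains ./higress so the access log is picked up, then attach the resulting files to the issue together with the deployment command used.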