mirror of
https://github.com/alibaba/higress.git
synced 2026-05-28 22:57:31 +08:00
# Higress AI Gateway - Troubleshooting

Common issues and solutions for Higress AI Gateway deployment and operation.

## Container Issues

### Container fails to start

**Check Docker is running:**
```bash
docker info
```

**Check port availability:**
```bash
netstat -tlnp | grep 8080
```

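Where `netstat` is not installed (many recent distributions ship only `ss`), the same check can be sketched as a small helper. `port_in_use` is a hypothetical name, not part of Higress. It matches the port exactly, so a listener on 18080 is not reported as 8080.

```bash
# Hypothetical helper: exit 0 if a TCP listener is bound to the given port.
port_in_use() {
  port="$1"
  if command -v ss >/dev/null 2>&1; then
    listing=$(ss -tln 2>/dev/null)
  elif command -v netstat >/dev/null 2>&1; then
    listing=$(netstat -tln 2>/dev/null)
  else
    echo "neither ss nor netstat found" >&2
    return 2
  fi
  # Column 4 is the local address in both tools; match ":<port>" at its end.
  printf '%s\n' "$listing" | awk -v port="$port" '
    $4 ~ (":" port "$") { found = 1 }
    END { exit found ? 0 : 1 }'
}

if port_in_use 8080; then
  echo "port 8080 is already in use"
else
  echo "port 8080 looks free (or could not be checked)"
fi
```
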
**View container logs:**
```bash
docker logs higress-ai-gateway
```

### Gateway not responding

**Check container status:**
```bash
docker ps -a
```

**Verify port mapping:**
```bash
docker port higress-ai-gateway
```

**Test locally:**
```bash
curl http://localhost:8080/v1/models
```

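A container can take a few seconds after starting before it accepts requests, so a single `curl` may fail even on a healthy deployment. A minimal retry sketch follows; `wait_for` is a hypothetical helper, and the attempt count and delay in the example are arbitrary values.

```bash
# Hypothetical helper: run a command until it succeeds or attempts run out.
# Usage: wait_for <attempts> <delay-seconds> <command> [args...]
wait_for() {
  attempts="$1"
  delay="$2"
  shift 2
  i=1
  while [ "$i" -le "$attempts" ]; do
    if "$@" >/dev/null 2>&1; then
      return 0
    fi
    i=$((i + 1))
    # Sleep only if another attempt follows.
    [ "$i" -le "$attempts" ] && sleep "$delay"
  done
  return 1
}

# Example (not executed here):
#   wait_for 10 3 curl -fsS http://localhost:8080/v1/models && echo "gateway is responding"
```

With `-f`, `curl` treats HTTP 4xx/5xx responses as failures, so the loop keeps retrying until the endpoint answers successfully or the attempts are used up.
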
## File System Issues

### "too many open files" error from API server

**Symptom:**
```
panic: unable to create REST storage for a resource due to too many open files, will die
```
or
```
command failed err="failed to create shared file watcher: too many open files"
```

**Root Cause:**

The system's `fs.inotify.max_user_instances` limit is too low. This commonly occurs on systems with many Docker containers, as each container can consume inotify instances.

**Check current limit:**
```bash
cat /proc/sys/fs/inotify/max_user_instances
```

The default is often 128, which can be insufficient when running multiple containers.

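To see which processes hold inotify instances before raising the limit, the open file descriptors under `/proc` can be counted. This is a sketch assuming a `find` that supports `-lname` (GNU findutils does); `count_inotify_by_pid` is a hypothetical name, and without root only your own processes are visible.

```bash
# Hypothetical helper: print "<count> <pid>" for every process holding
# inotify instances, highest count first. The proc root is a parameter
# only so the function can be pointed at a different directory.
count_inotify_by_pid() {
  proc_root="${1:-/proc}"
  # Each inotify instance appears as a symlink <root>/<pid>/fd/<n>
  # whose target is "anon_inode:inotify".
  find "$proc_root"/[0-9]*/fd -lname 'anon_inode:inotify' 2>/dev/null \
    | awk -F/ '{ count[$(NF-2)]++ } END { for (pid in count) print count[pid], pid }' \
    | sort -rn
}

# Show the ten heaviest users.
count_inotify_by_pid | head -n 10
```
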
**Solution:**

Increase the inotify instance limit to 8192:

```bash
# Temporarily (until next reboot)
sudo sysctl -w fs.inotify.max_user_instances=8192

# Permanently (survives reboots)
echo "fs.inotify.max_user_instances = 8192" | sudo tee -a /etc/sysctl.conf
sudo sysctl -p
```

**Verify:**
```bash
cat /proc/sys/fs/inotify/max_user_instances
# Should output: 8192
```

**Restart the container:**
```bash
docker restart higress-ai-gateway
```

**Additional inotify tunables** (if still experiencing issues):
```bash
# Increase max watches per user
sudo sysctl -w fs.inotify.max_user_watches=524288

# Increase max queued events
sudo sysctl -w fs.inotify.max_queued_events=32768
```

To make these permanent as well:
```bash
echo "fs.inotify.max_user_watches = 524288" | sudo tee -a /etc/sysctl.conf
echo "fs.inotify.max_queued_events = 32768" | sudo tee -a /etc/sysctl.conf
sudo sysctl -p
```

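Appending to `/etc/sysctl.conf` with `tee -a` adds a duplicate line each time the command is repeated. On distributions that read `/etc/sysctl.d/`, a drop-in file is an alternative worth considering. The sketch below only prints the settings from this section; the file name `99-higress-inotify.conf` is an assumption chosen for illustration.

```bash
# Print the inotify settings from this section in sysctl.conf syntax.
render_inotify_conf() {
  cat <<'EOF'
# inotify limits for hosts running many containers
fs.inotify.max_user_instances = 8192
fs.inotify.max_user_watches = 524288
fs.inotify.max_queued_events = 32768
EOF
}

# To apply (the file name is hypothetical):
#   render_inotify_conf | sudo tee /etc/sysctl.d/99-higress-inotify.conf
#   sudo sysctl --system
render_inotify_conf
```

Because `tee` is used without `-a` here, rerunning the command overwrites the drop-in file instead of appending to it.
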
## Plugin Issues

### Plugin not recognized

**Verify plugin installation:**

For Clawdbot:
```bash
ls -la ~/.clawdbot/extensions/higress-ai-gateway
```

For OpenClaw:
```bash
ls -la ~/.openclaw/extensions/higress-ai-gateway
```

**Check package.json:**

Ensure `package.json` contains the correct extension field:
- Clawdbot: `"clawdbot.extensions"`
- OpenClaw: `"openclaw.extensions"`

**Restart the runtime:**
```bash
# Restart Clawdbot gateway
clawdbot gateway restart

# Or OpenClaw gateway
openclaw gateway restart
```

## Routing Issues

### Auto-routing not working

**Confirm model is in list:**
```bash
# Check if higress/auto is available
clawdbot models list | grep "higress/auto"
```

**Check routing rules exist:**
```bash
./get-ai-gateway.sh route list
```

**Verify default model is configured:**
```bash
./get-ai-gateway.sh config list
```

**Check gateway logs:**
```bash
docker logs higress-ai-gateway | grep -i routing
```

**View access logs:**
```bash
tail -f ./higress/logs/access.log
```

## Configuration Issues

### Timezone detection fails

**Manually check timezone:**
```bash
timedatectl show --property=Timezone --value
```

**Or check timezone file:**
```bash
cat /etc/timezone
```

**Fallback behavior:**
- If detection fails, the script defaults to the Hangzhou mirror
- Manual override: set the `IMAGE_REPO` environment variable

**Manual repository selection:**
```bash
# Choose one of the following assignments.

# For China/Asia
IMAGE_REPO="higress-registry.cn-hangzhou.cr.aliyuncs.com/higress/all-in-one"

# For Southeast Asia
IMAGE_REPO="higress-registry.ap-southeast-7.cr.aliyuncs.com/higress/all-in-one"

# For North America
IMAGE_REPO="higress-registry.us-west-1.cr.aliyuncs.com/higress/all-in-one"

# Use in deployment
IMAGE_REPO="$IMAGE_REPO" ./get-ai-gateway.sh start --non-interactive ...
```

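The detection and fallback behavior described in this section can be sketched as a mapping from timezone to mirror. The actual rules inside `get-ai-gateway.sh` are not shown in this guide, so the timezone patterns below are assumptions, and `detect_timezone` and `select_image_repo` are hypothetical names. Only the three registry addresses and the Hangzhou default come from the text above.

```bash
# Sketch only: the timezone-to-mirror rules are assumptions, not the script's logic.
detect_timezone() {
  if command -v timedatectl >/dev/null 2>&1; then
    timedatectl show --property=Timezone --value 2>/dev/null && return 0
  fi
  # Fall back to the timezone file; print nothing if it is missing.
  cat /etc/timezone 2>/dev/null || true
}

select_image_repo() {
  case "$1" in
    America/*)
      # Assumption: every America/* zone is served by the North America mirror.
      echo "higress-registry.us-west-1.cr.aliyuncs.com/higress/all-in-one" ;;
    Asia/Singapore|Asia/Bangkok|Asia/Jakarta|Asia/Manila|Asia/Kuala_Lumpur|Asia/Ho_Chi_Minh)
      echo "higress-registry.ap-southeast-7.cr.aliyuncs.com/higress/all-in-one" ;;
    *)
      # Includes the empty string, i.e. detection failed.
      echo "higress-registry.cn-hangzhou.cr.aliyuncs.com/higress/all-in-one" ;;
  esac
}

# An IMAGE_REPO already set in the environment takes precedence.
IMAGE_REPO="${IMAGE_REPO:-$(select_image_repo "$(detect_timezone)")}"
echo "Using image repository: $IMAGE_REPO"
```

Keeping the override check as the last step means a manually exported `IMAGE_REPO` is never replaced by the detected value.
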
## Performance Issues

### Slow image downloads

**Check selected repository:**
```bash
echo "$IMAGE_REPO"
```

**Manually select closest mirror:**

See [Configuration Issues → Timezone detection fails](#timezone-detection-fails) for manual repository selection.

### High memory usage

**Check container stats:**
```bash
docker stats higress-ai-gateway
```

**View resource limits:**
```bash
docker inspect higress-ai-gateway | grep -A 10 "HostConfig"
```

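`grep -A 10` on the full `docker inspect` output may not reach the memory fields. A more direct sketch reads `HostConfig.Memory` with a Go template; Docker reports this value in bytes, with `0` meaning no limit. `format_mem_limit` is a hypothetical helper added only for readability.

```bash
# Hypothetical helper: render a byte count from docker inspect as MiB.
format_mem_limit() {
  case "$1" in
    ''|*[!0-9]*) echo "unknown"; return 1 ;;
    0) echo "unlimited" ;;
    *) echo "$(( $1 / 1024 / 1024 )) MiB" ;;
  esac
}

if command -v docker >/dev/null 2>&1; then
  # Empty when the container does not exist or the daemon is unreachable.
  limit=$(docker inspect --format '{{.HostConfig.Memory}}' higress-ai-gateway 2>/dev/null) || limit=""
  echo "memory limit: $(format_mem_limit "$limit")"
fi
```
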
**Set memory limits:**
```bash
# Stop container
./get-ai-gateway.sh stop

# Manually restart with limits
docker run -d \
  --name higress-ai-gateway \
  --memory="4g" \
  --memory-swap="4g" \
  ...
```

## Log Analysis

### Access logs location

```bash
# Default location
./higress/logs/access.log

# View real-time logs
tail -f ./higress/logs/access.log
```

### Container logs

```bash
# View all logs
docker logs higress-ai-gateway

# Follow logs
docker logs -f higress-ai-gateway

# Last 100 lines
docker logs --tail 100 higress-ai-gateway

# With timestamps
docker logs -t higress-ai-gateway
```

## Network Issues

### Cannot connect to gateway

**Verify container is running:**
```bash
docker ps | grep higress-ai-gateway
```

**Check port bindings:**
```bash
docker port higress-ai-gateway
```

**Test from inside container:**
```bash
docker exec higress-ai-gateway curl localhost:8080/v1/models
```

**Check firewall rules:**
```bash
# Check whether a firewall rule exists for the port
sudo ufw status | grep 8080

# Allow port (if needed)
sudo ufw allow 8080/tcp
```

### DNS resolution issues

**Test from container:**
```bash
docker exec higress-ai-gateway ping -c 3 api.openai.com
```

**Check DNS settings:**
```bash
docker exec higress-ai-gateway cat /etc/resolv.conf
```

## Getting Help

If you're still experiencing issues:

1. **Collect logs:**
   ```bash
   docker logs higress-ai-gateway > gateway.log 2>&1
   cat ./higress/logs/access.log > access.log
   ```

2. **Check system info:**
   ```bash
   docker version
   docker info
   uname -a
   cat /proc/sys/fs/inotify/max_user_instances
   ```

3. **Report issue:**
   - Repository: https://github.com/higress-group/higress-standalone
   - Include: logs, system info, deployment command used

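
The collection steps above can be combined into one helper that writes everything into a single directory. This is a sketch: `collect_diagnostics` and the default directory name `higress-diagnostics` are hypothetical, while the container name and log path are the ones used throughout this guide.

```bash
# Hypothetical helper: gather logs and system info into one directory.
collect_diagnostics() {
  out_dir="${1:-higress-diagnostics}"
  mkdir -p "$out_dir" || return 1

  {
    echo "== uname -a =="
    uname -a
    echo "== fs.inotify.max_user_instances =="
    cat /proc/sys/fs/inotify/max_user_instances 2>/dev/null || echo "unavailable"
    if command -v docker >/dev/null 2>&1; then
      echo "== docker version =="
      docker version 2>&1 || true
      echo "== docker info =="
      docker info 2>&1 || true
    else
      echo "docker not found in PATH"
    fi
  } > "$out_dir/system-info.txt"

  # Container logs, if Docker is available.
  if command -v docker >/dev/null 2>&1; then
    docker logs higress-ai-gateway > "$out_dir/gateway.log" 2>&1 || true
  fi
  # Access log, if present at the default location.
  if [ -f ./higress/logs/access.log ]; then
    cp ./higress/logs/access.log "$out_dir/access.log"
  fi
  echo "diagnostics written to $out_dir"
}
```

Run `collect_diagnostics` from the directory that contains `./higress`, then attach the resulting files together with the deployment command when reporting the issue.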