Mirror of https://github.com/alibaba/higress.git, synced 2026-05-28 06:37:26 +08:00
docs: add inotify max_user_instances troubleshooting to higress-clawdbot-integration skill (#3440)
@@ -420,29 +420,12 @@ Selected plugin registry: higress-registry.us-west-1.cr.aliyuncs.com
 
 ## Troubleshooting
 
-### Container fails to start
-- Check Docker is running: `docker info`
-- Check port availability: `netstat -tlnp | grep 8080`
-- View container logs: `docker logs higress-ai-gateway`
-
-### Gateway not responding
-- Check container status: `docker ps -a`
-- Verify port mapping: `docker port higress-ai-gateway`
-- Test locally: `curl http://localhost:8080/v1/models`
-
-### Plugin not recognized
-- Verify plugin is installed at `~/.clawdbot/extensions/higress-ai-gateway` or `~/.openclaw/extensions/higress-ai-gateway`
-- Check `package.json` contains correct extension field (`clawdbot.extensions` or `openclaw.extensions`)
-- Restart Clawdbot/OpenClaw after installation
-
-### Auto-routing not working
-- Confirm `higress/auto` is in your model list
-- Check routing rules exist: `./get-ai-gateway.sh route list`
-- Verify default model is configured
-- Check gateway logs for routing decisions
-
-### Timezone detection fails
-- Manually check timezone: `timedatectl show --property=Timezone --value`
-- Or check `/etc/timezone` file
-- Fallback to default Hangzhou mirror if detection fails
-- Consider manually setting `IMAGE_REPO` environment variable if auto-detection is incorrect
+For detailed troubleshooting guides, see [TROUBLESHOOTING.md](references/TROUBLESHOOTING.md).
+
+Common issues:
+- **Container fails to start**: Check Docker status, port availability, and container logs
+- **"too many open files" error**: Increase `fs.inotify.max_user_instances` to 8192
+- **Gateway not responding**: Verify container status and port mapping
+- **Plugin not recognized**: Check installation path and restart runtime
+- **Auto-routing not working**: Verify model list and routing rules
+- **Timezone detection fails**: Manually set `IMAGE_REPO` environment variable

@@ -0,0 +1,325 @@
# Higress AI Gateway - Troubleshooting

Common issues and solutions for Higress AI Gateway deployment and operation.

## Container Issues

### Container fails to start

**Check Docker is running:**
```bash
docker info
```

**Check port availability:**
```bash
netstat -tlnp | grep 8080
```

**View container logs:**
```bash
docker logs higress-ai-gateway
```

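On hosts where `netstat` is not installed (it ships in the `net-tools` package, which many recent distributions omit), `ss` from iproute2 reports the same listening sockets. The following is a minimal sketch rather than part of the gateway tooling: the helper name `port_is_listening` is hypothetical, and it assumes the usual `ss -tln` column layout in which the fourth column is the local `address:port`.

```bash
#!/usr/bin/env bash
# Hypothetical helper: reads `ss -tln`-style output on stdin and succeeds if
# some LISTEN socket's local address (4th column) ends in ":<port>".
port_is_listening() {
  local port="$1"
  awk -v suffix=":${port}" '
    $1 == "LISTEN" && substr($4, length($4) - length(suffix) + 1) == suffix { found = 1 }
    END { exit (found ? 0 : 1) }
  '
}
```

For example, `ss -tln | port_is_listening 8080 && echo "port 8080 is already in use"`. Matching on the `:<port>` suffix keeps port 80 from matching 8080.
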
### Gateway not responding

**Check container status:**
```bash
docker ps -a
```

**Verify port mapping:**
```bash
docker port higress-ai-gateway
```

**Test locally:**
```bash
curl http://localhost:8080/v1/models
```

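Right after a start or restart, the gateway may need a short while before it answers, so a single `curl` can fail even when nothing is wrong. A small retry wrapper makes the local test less sensitive to timing. This is a sketch under assumptions: `retry` is a hypothetical helper name, and the attempt count and delay in the usage note are arbitrary.

```bash
#!/usr/bin/env bash
# Hypothetical helper: run a probe command up to $1 times, sleeping $2 seconds
# between attempts. Returns 0 as soon as the probe succeeds, 1 otherwise.
retry() {
  local attempts="$1" delay="$2"
  shift 2
  local i
  for ((i = 1; i <= attempts; i++)); do
    if "$@"; then
      return 0
    fi
    # No need to sleep after the final attempt.
    if ((i < attempts)); then
      sleep "$delay"
    fi
  done
  return 1
}
```

For example, `retry 15 2 curl -fsS -o /dev/null http://localhost:8080/v1/models` probes the endpoint for up to about 30 seconds; `-f` makes `curl` return non-zero on HTTP error statuses.
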
## File System Issues

### "too many open files" error from API server

**Symptom:**
```
panic: unable to create REST storage for a resource due to too many open files, will die
```
or
```
command failed err="failed to create shared file watcher: too many open files"
```

**Root Cause:**

The system's `fs.inotify.max_user_instances` limit is too low. This commonly occurs on systems with many Docker containers, as each container can consume inotify instances.

**Check current limit:**
```bash
cat /proc/sys/fs/inotify/max_user_instances
```

The default is often 128, which can be too low when running multiple containers.

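To see which processes are using up the limit, you can count the file descriptors that point at `anon_inode:inotify` under `/proc`. The sketch below is one way to do that; the function name `inotify_instances_by_pid` is hypothetical, and it relies on GNU `find`'s `-lname` test. The limit itself applies per user, so counts from processes owned by the same user add up.

```bash
#!/usr/bin/env bash
# Print "count pid" lines for processes that hold inotify instances,
# busiest first. Each inotify instance appears as an fd symlink whose
# target is "anon_inode:inotify". The proc root is a parameter (default /proc).
inotify_instances_by_pid() {
  local proc_root="${1:-/proc}"
  find "$proc_root"/[0-9]*/fd -lname 'anon_inode:inotify' 2>/dev/null \
    | awk -F/ '{ count[$(NF-2)]++ } END { for (pid in count) print count[pid], pid }' \
    | sort -rn
}
```

Running `sudo bash -c "$(declare -f inotify_instances_by_pid); inotify_instances_by_pid"` includes processes owned by other users; without root, only your own processes are readable.
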
**Solution:**

Increase the inotify instance limit to 8192:

```bash
# Temporarily (until next reboot)
sudo sysctl -w fs.inotify.max_user_instances=8192

# Permanently (survives reboots)
echo "fs.inotify.max_user_instances = 8192" | sudo tee -a /etc/sysctl.conf
sudo sysctl -p
```

**Verify:**
```bash
cat /proc/sys/fs/inotify/max_user_instances
# Should output: 8192
```

**Restart the container:**
```bash
docker restart higress-ai-gateway
```

**Additional inotify tunables** (if still experiencing issues):
```bash
# Increase max watches per user
sudo sysctl -w fs.inotify.max_user_watches=524288

# Increase max queued events
sudo sysctl -w fs.inotify.max_queued_events=32768
```

To make these permanent as well:
```bash
echo "fs.inotify.max_user_watches = 524288" | sudo tee -a /etc/sysctl.conf
echo "fs.inotify.max_queued_events = 32768" | sudo tee -a /etc/sysctl.conf
sudo sysctl -p
```

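Because `tee -a` appends, re-running the permanent commands leaves duplicate entries in `/etc/sysctl.conf`. If you apply these settings from a script, an update-or-append helper avoids that. The following is a sketch only: `set_sysctl_entry` is a hypothetical name, and it assumes GNU `sed` (for `-i -E`).

```bash
#!/usr/bin/env bash
# Hypothetical helper: set "key = value" in a sysctl-style config file,
# replacing an existing entry for the key instead of appending a duplicate.
set_sysctl_entry() {
  local file="$1" key="$2" value="$3"
  local escaped_key="${key//./\\.}"   # dots are regex metacharacters
  touch "$file"
  if grep -Eq "^[[:space:]]*${escaped_key}[[:space:]]*=" "$file"; then
    sed -i -E "s|^[[:space:]]*${escaped_key}[[:space:]]*=.*|${key} = ${value}|" "$file"
  else
    printf '%s = %s\n' "$key" "$value" >> "$file"
  fi
}
```

Run as root, this could target a drop-in file such as `/etc/sysctl.d/99-inotify.conf` (an illustrative file name) followed by `sysctl --system`, which reloads all sysctl configuration files.
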
## Plugin Issues

### Plugin not recognized

**Verify plugin installation:**

For Clawdbot:
```bash
ls -la ~/.clawdbot/extensions/higress-ai-gateway
```

For OpenClaw:
```bash
ls -la ~/.openclaw/extensions/higress-ai-gateway
```

**Check package.json:**

Ensure `package.json` contains the correct extension field:
- Clawdbot: `"clawdbot.extensions"`
- OpenClaw: `"openclaw.extensions"`

**Restart the runtime:**
```bash
# Restart Clawdbot gateway
clawdbot gateway restart

# Or OpenClaw gateway
openclaw gateway restart
```

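The two installation checks above can be combined into one pass over both candidate locations. This is a sketch, not part of either runtime: `find_plugin_dirs` is a hypothetical helper, and it only checks that a `package.json` is present, not that the extension field inside it is correct.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print each candidate plugin directory under the given
# home directory that exists and contains a package.json.
# Returns non-zero if none is found.
find_plugin_dirs() {
  local home_dir="${1:-$HOME}" runtime dir found=1
  for runtime in clawdbot openclaw; do
    dir="$home_dir/.$runtime/extensions/higress-ai-gateway"
    if [ -f "$dir/package.json" ]; then
      printf '%s\n' "$dir"
      found=0
    fi
  done
  return "$found"
}
```

For example, `find_plugin_dirs || echo "plugin not installed for Clawdbot or OpenClaw"`.
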
## Routing Issues

### Auto-routing not working

**Confirm model is in list:**
```bash
# Check if higress/auto is available
clawdbot models list | grep "higress/auto"
```

**Check routing rules exist:**
```bash
./get-ai-gateway.sh route list
```

**Verify default model is configured:**
```bash
./get-ai-gateway.sh config list
```

**Check gateway logs:**
```bash
# docker logs replays the container's stderr on stderr, so merge it before filtering
docker logs higress-ai-gateway 2>&1 | grep -i routing
```

**View access logs:**
```bash
tail -f ./higress/logs/access.log
```

## Configuration Issues

### Timezone detection fails

**Manually check timezone:**
```bash
timedatectl show --property=Timezone --value
```

**Or check timezone file:**
```bash
cat /etc/timezone
```

**Fallback behavior:**
- If detection fails, defaults to Hangzhou mirror
- Manual override: Set `IMAGE_REPO` environment variable

**Manual repository selection:**
```bash
# Set exactly one of the following, depending on your region.

# For China/Asia
IMAGE_REPO="higress-registry.cn-hangzhou.cr.aliyuncs.com/higress/all-in-one"

# For Southeast Asia
IMAGE_REPO="higress-registry.ap-southeast-7.cr.aliyuncs.com/higress/all-in-one"

# For North America
IMAGE_REPO="higress-registry.us-west-1.cr.aliyuncs.com/higress/all-in-one"

# Use in deployment
IMAGE_REPO="$IMAGE_REPO" ./get-ai-gateway.sh start --non-interactive ...
```

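If you script the manual selection, a `case` statement on the timezone string is one simple approach. The sketch below is illustrative only and is not the actual detection logic of `get-ai-gateway.sh`: the function name `suggest_image_repo` and the timezone groupings are assumptions, while the three registry hosts are the ones listed above.

```bash
#!/usr/bin/env bash
# Illustrative only: NOT the logic used by get-ai-gateway.sh.
# The timezone-to-region groupings are assumptions made for this example.
suggest_image_repo() {
  local tz="$1" region
  case "$tz" in
    America/*)
      region="us-west-1" ;;
    Asia/Bangkok|Asia/Singapore|Asia/Jakarta|Asia/Ho_Chi_Minh|Asia/Manila|Asia/Kuala_Lumpur)
      region="ap-southeast-7" ;;
    *)
      region="cn-hangzhou" ;;   # Hangzhou is the documented fallback
  esac
  printf 'higress-registry.%s.cr.aliyuncs.com/higress/all-in-one\n' "$region"
}
```

For example, `IMAGE_REPO="$(suggest_image_repo "$(timedatectl show --property=Timezone --value)")"`. An empty or unrecognized timezone falls through to the Hangzhou mirror, matching the fallback behavior described above.
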
## Performance Issues

### Slow image downloads

**Check selected repository:**
```bash
echo "$IMAGE_REPO"
```

**Manually select closest mirror:**

See [Configuration Issues → Timezone detection fails](#timezone-detection-fails) for manual repository selection.

### High memory usage

**Check container stats:**
```bash
docker stats higress-ai-gateway
```

**View resource limits:**
```bash
docker inspect higress-ai-gateway | grep -A 10 "HostConfig"
```

**Set memory limits:**
```bash
# Stop container
./get-ai-gateway.sh stop

# Manually restart with limits
docker run -d \
  --name higress-ai-gateway \
  --memory="4g" \
  --memory-swap="4g" \
  ...
```

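Grepping the full `docker inspect` output works, but the memory limit can also be read directly with a Go template. Docker stores it in `HostConfig.Memory` as a byte count, with `0` meaning no limit. The helper below, whose name `describe_memory_limit` is hypothetical, is a small sketch that turns that number into something readable.

```bash
#!/usr/bin/env bash
# Convert the byte count Docker stores in HostConfig.Memory into a readable
# value. Docker reports 0 when no memory limit is set.
describe_memory_limit() {
  local bytes="$1"
  if [ "$bytes" -eq 0 ]; then
    echo "unlimited"
  else
    echo "$((bytes / 1024 / 1024)) MiB"
  fi
}
```

For example, `describe_memory_limit "$(docker inspect --format '{{.HostConfig.Memory}}' higress-ai-gateway)"` prints `4096 MiB` for a container started with `--memory="4g"`.
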
## Log Analysis

### Access logs location

```bash
# Default location: ./higress/logs/access.log

# View real-time logs
tail -f ./higress/logs/access.log
```

### Container logs

```bash
# View all logs
docker logs higress-ai-gateway

# Follow logs
docker logs -f higress-ai-gateway

# Last 100 lines
docker logs --tail 100 higress-ai-gateway

# With timestamps
docker logs -t higress-ai-gateway
```

## Network Issues

### Cannot connect to gateway

**Verify container is running:**
```bash
docker ps | grep higress-ai-gateway
```

**Check port bindings:**
```bash
docker port higress-ai-gateway
```

**Test from inside container:**
```bash
docker exec higress-ai-gateway curl localhost:8080/v1/models
```

**Check firewall rules:**
```bash
# Check if port is accessible
sudo ufw status | grep 8080

# Allow port (if needed)
sudo ufw allow 8080/tcp
```

### DNS resolution issues

**Test from container:**
```bash
docker exec higress-ai-gateway ping -c 3 api.openai.com
```

**Check DNS settings:**
```bash
docker exec higress-ai-gateway cat /etc/resolv.conf
```

## Getting Help

If you're still experiencing issues:

1. **Collect logs:**
   ```bash
   docker logs higress-ai-gateway > gateway.log 2>&1
   cp ./higress/logs/access.log access.log
   ```

2. **Check system info:**
   ```bash
   docker version
   docker info
   uname -a
   cat /proc/sys/fs/inotify/max_user_instances
   ```

3. **Report issue:**
   - Repository: https://github.com/higress-group/higress-standalone
   - Include: logs, system info, deployment command used
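
The collection steps above can be bundled into one function so that everything lands in a single directory that is easy to attach to an issue. This is a minimal sketch: `collect_diagnostics` and the output file names are hypothetical, and steps whose tool or file is missing are skipped rather than treated as errors.

```bash
#!/usr/bin/env bash
# Hypothetical helper: gather the diagnostics listed above into one directory.
# Usage: collect_diagnostics <output-dir> [container-name]
collect_diagnostics() {
  local out_dir="$1" container="${2:-higress-ai-gateway}"
  local limit_file=/proc/sys/fs/inotify/max_user_instances
  mkdir -p "$out_dir"

  uname -a > "$out_dir/uname.txt"

  # The inotify limit file only exists on Linux.
  if [ -r "$limit_file" ]; then
    cat "$limit_file" > "$out_dir/inotify_max_user_instances.txt"
  fi

  # Docker output (including any error messages) is captured if the CLI is installed.
  if command -v docker >/dev/null 2>&1; then
    docker version > "$out_dir/docker-version.txt" 2>&1
    docker info > "$out_dir/docker-info.txt" 2>&1
    docker logs "$container" > "$out_dir/gateway.log" 2>&1
  fi

  # Access log at the default location, relative to the current directory.
  if [ -f ./higress/logs/access.log ]; then
    cp ./higress/logs/access.log "$out_dir/access.log"
  fi
  return 0
}
```

For example, `collect_diagnostics ./higress-diagnostics` followed by `tar czf higress-diagnostics.tar.gz higress-diagnostics` produces a single archive.
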