mirror of
https://github.com/alibaba/higress.git
synced 2026-03-05 09:00:47 +08:00
docs: add inotify max_user_instances troubleshooting to higress-clawdbot-integration skill (#3440)
This commit is contained in:
@@ -420,29 +420,12 @@ Selected plugin registry: higress-registry.us-west-1.cr.aliyuncs.com
|
||||
|
||||
## Troubleshooting
|
||||
|
||||
### Container fails to start
|
||||
- Check Docker is running: `docker info`
|
||||
- Check port availability: `netstat -tlnp | grep 8080`
|
||||
- View container logs: `docker logs higress-ai-gateway`
|
||||
For detailed troubleshooting guides, see [TROUBLESHOOTING.md](references/TROUBLESHOOTING.md).
|
||||
|
||||
### Gateway not responding
|
||||
- Check container status: `docker ps -a`
|
||||
- Verify port mapping: `docker port higress-ai-gateway`
|
||||
- Test locally: `curl http://localhost:8080/v1/models`
|
||||
|
||||
### Plugin not recognized
|
||||
- Verify plugin is installed at `~/.clawdbot/extensions/higress-ai-gateway` or `~/.openclaw/extensions/higress-ai-gateway`
|
||||
- Check `package.json` contains correct extension field (`clawdbot.extensions` or `openclaw.extensions`)
|
||||
- Restart Clawdbot/OpenClaw after installation
|
||||
|
||||
### Auto-routing not working
|
||||
- Confirm `higress/auto` is in your model list
|
||||
- Check routing rules exist: `./get-ai-gateway.sh route list`
|
||||
- Verify default model is configured
|
||||
- Check gateway logs for routing decisions
|
||||
|
||||
### Timezone detection fails
|
||||
- Manually check timezone: `timedatectl show --property=Timezone --value`
|
||||
- Or check `/etc/timezone` file
|
||||
- Fallback to default Hangzhou mirror if detection fails
|
||||
- Consider manually setting `IMAGE_REPO` environment variable if auto-detection is incorrect
|
||||
Common issues:
|
||||
- **Container fails to start**: Check Docker status, port availability, and container logs
|
||||
- **"too many open files" error**: Increase `fs.inotify.max_user_instances` to 8192
|
||||
- **Gateway not responding**: Verify container status and port mapping
|
||||
- **Plugin not recognized**: Check installation path and restart runtime
|
||||
- **Auto-routing not working**: Verify model list and routing rules
|
||||
- **Timezone detection fails**: Manually set `IMAGE_REPO` environment variable
|
||||
|
||||
@@ -0,0 +1,325 @@
|
||||
# Higress AI Gateway - Troubleshooting
|
||||
|
||||
Common issues and solutions for Higress AI Gateway deployment and operation.
|
||||
|
||||
## Container Issues
|
||||
|
||||
### Container fails to start
|
||||
|
||||
**Check Docker is running:**
|
||||
```bash
|
||||
docker info
|
||||
```
|
||||
|
||||
**Check port availability:**
|
||||
```bash
|
||||
netstat -tlnp | grep 8080
|
||||
```
|
||||
|
||||
**View container logs:**
|
||||
```bash
|
||||
docker logs higress-ai-gateway
|
||||
```
|
||||
|
||||
### Gateway not responding
|
||||
|
||||
**Check container status:**
|
||||
```bash
|
||||
docker ps -a
|
||||
```
|
||||
|
||||
**Verify port mapping:**
|
||||
```bash
|
||||
docker port higress-ai-gateway
|
||||
```
|
||||
|
||||
**Test locally:**
|
||||
```bash
|
||||
curl http://localhost:8080/v1/models
|
||||
```
|
||||
|
||||
## File System Issues
|
||||
|
||||
### "too many open files" error from API server
|
||||
|
||||
**Symptom:**
|
||||
```
|
||||
panic: unable to create REST storage for a resource due to too many open files, will die
|
||||
```
|
||||
or
|
||||
```
|
||||
command failed err="failed to create shared file watcher: too many open files"
|
||||
```
|
||||
|
||||
**Root Cause:**
|
||||
|
||||
The system's `fs.inotify.max_user_instances` limit is too low. This commonly occurs on systems with many Docker containers, as each container can consume inotify instances.
|
||||
|
||||
**Check current limit:**
|
||||
```bash
|
||||
cat /proc/sys/fs/inotify/max_user_instances
|
||||
```
|
||||
|
||||
Default is often 128, which is insufficient when running multiple containers.
|
||||
|
||||
**Solution:**
|
||||
|
||||
Increase the inotify instance limit to 8192:
|
||||
|
||||
```bash
|
||||
# Temporarily (until next reboot)
|
||||
sudo sysctl -w fs.inotify.max_user_instances=8192
|
||||
|
||||
# Permanently (survives reboots)
|
||||
echo "fs.inotify.max_user_instances = 8192" | sudo tee -a /etc/sysctl.conf
|
||||
sudo sysctl -p
|
||||
```
|
||||
|
||||
**Verify:**
|
||||
```bash
|
||||
cat /proc/sys/fs/inotify/max_user_instances
|
||||
# Should output: 8192
|
||||
```
|
||||
|
||||
**Restart the container:**
|
||||
```bash
|
||||
docker restart higress-ai-gateway
|
||||
```
|
||||
|
||||
**Additional inotify tunables** (if still experiencing issues):
|
||||
```bash
|
||||
# Increase max watches per user
|
||||
sudo sysctl -w fs.inotify.max_user_watches=524288
|
||||
|
||||
# Increase max queued events
|
||||
sudo sysctl -w fs.inotify.max_queued_events=32768
|
||||
```
|
||||
|
||||
To make these permanent as well:
|
||||
```bash
|
||||
echo "fs.inotify.max_user_watches = 524288" | sudo tee -a /etc/sysctl.conf
|
||||
echo "fs.inotify.max_queued_events = 32768" | sudo tee -a /etc/sysctl.conf
|
||||
sudo sysctl -p
|
||||
```
|
||||
|
||||
## Plugin Issues
|
||||
|
||||
### Plugin not recognized
|
||||
|
||||
**Verify plugin installation:**
|
||||
|
||||
For Clawdbot:
|
||||
```bash
|
||||
ls -la ~/.clawdbot/extensions/higress-ai-gateway
|
||||
```
|
||||
|
||||
For OpenClaw:
|
||||
```bash
|
||||
ls -la ~/.openclaw/extensions/higress-ai-gateway
|
||||
```
|
||||
|
||||
**Check package.json:**
|
||||
|
||||
Ensure `package.json` contains the correct extension field:
|
||||
- Clawdbot: `"clawdbot.extensions"`
|
||||
- OpenClaw: `"openclaw.extensions"`
|
||||
|
||||
**Restart the runtime:**
|
||||
```bash
|
||||
# Restart Clawdbot gateway
|
||||
clawdbot gateway restart
|
||||
|
||||
# Or OpenClaw gateway
|
||||
openclaw gateway restart
|
||||
```
|
||||
|
||||
## Routing Issues
|
||||
|
||||
### Auto-routing not working
|
||||
|
||||
**Confirm model is in list:**
|
||||
```bash
|
||||
# Check if higress/auto is available
|
||||
clawdbot models list | grep "higress/auto"
|
||||
```
|
||||
|
||||
**Check routing rules exist:**
|
||||
```bash
|
||||
./get-ai-gateway.sh route list
|
||||
```
|
||||
|
||||
**Verify default model is configured:**
|
||||
```bash
|
||||
./get-ai-gateway.sh config list
|
||||
```
|
||||
|
||||
**Check gateway logs:**
|
||||
```bash
|
||||
docker logs higress-ai-gateway | grep -i routing
|
||||
```
|
||||
|
||||
**View access logs:**
|
||||
```bash
|
||||
tail -f ./higress/logs/access.log
|
||||
```
|
||||
|
||||
## Configuration Issues
|
||||
|
||||
### Timezone detection fails
|
||||
|
||||
**Manually check timezone:**
|
||||
```bash
|
||||
timedatectl show --property=Timezone --value
|
||||
```
|
||||
|
||||
**Or check timezone file:**
|
||||
```bash
|
||||
cat /etc/timezone
|
||||
```
|
||||
|
||||
**Fallback behavior:**
|
||||
- If detection fails, defaults to Hangzhou mirror
|
||||
- Manual override: Set `IMAGE_REPO` environment variable
|
||||
|
||||
**Manual repository selection:**
|
||||
```bash
|
||||
# For China/Asia
|
||||
IMAGE_REPO="higress-registry.cn-hangzhou.cr.aliyuncs.com/higress/all-in-one"
|
||||
|
||||
# For Southeast Asia
|
||||
IMAGE_REPO="higress-registry.ap-southeast-7.cr.aliyuncs.com/higress/all-in-one"
|
||||
|
||||
# For North America
|
||||
IMAGE_REPO="higress-registry.us-west-1.cr.aliyuncs.com/higress/all-in-one"
|
||||
|
||||
# Use in deployment
|
||||
IMAGE_REPO="$IMAGE_REPO" ./get-ai-gateway.sh start --non-interactive ...
|
||||
```
|
||||
|
||||
## Performance Issues
|
||||
|
||||
### Slow image downloads
|
||||
|
||||
**Check selected repository:**
|
||||
```bash
|
||||
echo $IMAGE_REPO
|
||||
```
|
||||
|
||||
**Manually select closest mirror:**
|
||||
|
||||
See [Configuration Issues → Timezone detection fails](#timezone-detection-fails) for manual repository selection.
|
||||
|
||||
### High memory usage
|
||||
|
||||
**Check container stats:**
|
||||
```bash
|
||||
docker stats higress-ai-gateway
|
||||
```
|
||||
|
||||
**View resource limits:**
|
||||
```bash
|
||||
docker inspect higress-ai-gateway | grep -A 10 "HostConfig"
|
||||
```
|
||||
|
||||
**Set memory limits:**
|
||||
```bash
|
||||
# Stop container
|
||||
./get-ai-gateway.sh stop
|
||||
|
||||
# Manually restart with limits
|
||||
docker run -d \
|
||||
--name higress-ai-gateway \
|
||||
--memory="4g" \
|
||||
--memory-swap="4g" \
|
||||
...
|
||||
```
|
||||
|
||||
## Log Analysis
|
||||
|
||||
### Access logs location
|
||||
|
||||
```bash
|
||||
# Default location
|
||||
./higress/logs/access.log
|
||||
|
||||
# View real-time logs
|
||||
tail -f ./higress/logs/access.log
|
||||
```
|
||||
|
||||
### Container logs
|
||||
|
||||
```bash
|
||||
# View all logs
|
||||
docker logs higress-ai-gateway
|
||||
|
||||
# Follow logs
|
||||
docker logs -f higress-ai-gateway
|
||||
|
||||
# Last 100 lines
|
||||
docker logs --tail 100 higress-ai-gateway
|
||||
|
||||
# With timestamps
|
||||
docker logs -t higress-ai-gateway
|
||||
```
|
||||
|
||||
## Network Issues
|
||||
|
||||
### Cannot connect to gateway
|
||||
|
||||
**Verify container is running:**
|
||||
```bash
|
||||
docker ps | grep higress-ai-gateway
|
||||
```
|
||||
|
||||
**Check port bindings:**
|
||||
```bash
|
||||
docker port higress-ai-gateway
|
||||
```
|
||||
|
||||
**Test from inside container:**
|
||||
```bash
|
||||
docker exec higress-ai-gateway curl localhost:8080/v1/models
|
||||
```
|
||||
|
||||
**Check firewall rules:**
|
||||
```bash
|
||||
# Check if port is accessible
|
||||
sudo ufw status | grep 8080
|
||||
|
||||
# Allow port (if needed)
|
||||
sudo ufw allow 8080/tcp
|
||||
```
|
||||
|
||||
### DNS resolution issues
|
||||
|
||||
**Test from container:**
|
||||
```bash
|
||||
docker exec higress-ai-gateway ping -c 3 api.openai.com
|
||||
```
|
||||
|
||||
**Check DNS settings:**
|
||||
```bash
|
||||
docker exec higress-ai-gateway cat /etc/resolv.conf
|
||||
```
|
||||
|
||||
## Getting Help
|
||||
|
||||
If you're still experiencing issues:
|
||||
|
||||
1. **Collect logs:**
|
||||
```bash
|
||||
docker logs higress-ai-gateway > gateway.log 2>&1
|
||||
cat ./higress/logs/access.log > access.log
|
||||
```
|
||||
|
||||
2. **Check system info:**
|
||||
```bash
|
||||
docker version
|
||||
docker info
|
||||
uname -a
|
||||
cat /proc/sys/fs/inotify/max_user_instances
|
||||
```
|
||||
|
||||
3. **Report issue:**
|
||||
- Repository: https://github.com/higress-group/higress-standalone
|
||||
- Include: logs, system info, deployment command used
|
||||
Reference in New Issue
Block a user