Commit Graph

8 Commits

Author SHA1 Message Date
程序员阿江(Relakkes)
8e93438fe5 Keep PR 900 overrides bounded and opt-in
The PR adds API limit overrides and static proxy support, but the review found that the default proxy provider changed to an invalid static placeholder and the new API fields accepted unbounded values. This keeps the existing proxy default intact, makes static proxy explicit via config or CLI, validates API limit ranges, and adds focused regression coverage for both paths.

Constraint: PR branch must remain contributor-branch compatible and avoid adding dependencies

Rejected: Keep static as the default provider | breaks existing --enable_ip_proxy defaults with an invalid placeholder URL

Rejected: Accept arbitrary integer limits | lets API callers request negative or excessive crawl sizes

Confidence: high

Scope-risk: narrow

Directive: Do not change proxy provider defaults when adding new providers; new providers should be opt-in and covered by provider-specific tests

Tested: uv run pytest tests/test_api_limits.py tests/test_static_proxy_provider.py

Tested: uv run pytest tests

Tested: uv run pytest test/test_utils.py

Tested: uv run python -m compileall api cmd_arg config proxy tests

Tested: git diff --cached --check

Not-tested: Live crawler run against external platforms or real proxy vendor endpoints
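
The bounded, opt-in limit overrides described in this commit could be sketched roughly as follows. This is a minimal illustration, not the code from the PR: the field names `max_notes_count` and `max_comments_count`, the `StartRequest` class, the `validate_limit` helper, and the upper bounds are all assumptions, since the commit message does not state the actual names or ranges.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical upper bounds; the commit message does not give the real values.
MAX_NOTES = 1000
MAX_COMMENTS = 500


def validate_limit(name: str, value: Optional[int], upper: int) -> Optional[int]:
    """Accept None (no override) or an integer in the range 1..upper."""
    if value is None:
        return None
    # bool is a subclass of int, so reject it explicitly.
    if isinstance(value, bool) or not isinstance(value, int):
        raise ValueError(f"{name} must be an integer")
    if not 1 <= value <= upper:
        raise ValueError(f"{name} must be between 1 and {upper}, got {value}")
    return value


@dataclass
class StartRequest:
    """Hypothetical start-task payload; overrides default to None (opt-in)."""

    max_notes_count: Optional[int] = None
    max_comments_count: Optional[int] = None

    def __post_init__(self) -> None:
        self.max_notes_count = validate_limit(
            "max_notes_count", self.max_notes_count, MAX_NOTES
        )
        self.max_comments_count = validate_limit(
            "max_comments_count", self.max_comments_count, MAX_COMMENTS
        )
```

Leaving both fields at `None` keeps the configured defaults in effect, so callers that never send an override see no behavior change, while negative or oversized values are rejected before a crawl starts.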
2026-05-29 21:27:52 +08:00
钟保罗
ec432eb63e feat: add post/video count and comment count override support to the start-task API 2026-05-19 20:57:07 +08:00
程序员阿江(Relakkes)
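0282e626c9 feat: add JSONL storage format support and change the default storage format to jsonl
JSONL (JSON Lines) stores one JSON object per line and is written in append mode,
so existing data never has to be read back; with large data volumes it performs far better than the JSON format.

- Add the core AsyncFileWriter.write_to_jsonl() method
- Add a JsonlStoreImplement class for each of the 7 platforms and register it with the factory
- Change the config default from json to jsonl and update the CLI/API enums to match
- Add jsonl to the guard condition in db_session.py so it no longer triggers a spurious ValueError
- Let word cloud generation read JSONL files, preferring jsonl and falling back to json
- Keep the existing json option unchanged for backward compatibility
- Update the related docs and tests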
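
The append-only behavior that makes JSONL cheaper than a single JSON array can be illustrated with a small standard-library sketch. It is synchronous and is not the project's `AsyncFileWriter.write_to_jsonl()`; the helper names `append_jsonl` and `read_jsonl` are assumptions made for this example.

```python
import json
from typing import Any, Dict, Iterator


def append_jsonl(path: str, record: Dict[str, Any]) -> None:
    """Append one record as a single JSON line without reading existing data."""
    with open(path, "a", encoding="utf-8") as f:
        # json.dumps without indent emits no raw newlines, so one record = one line.
        f.write(json.dumps(record, ensure_ascii=False) + "\n")


def read_jsonl(path: str) -> Iterator[Dict[str, Any]]:
    """Yield records one line at a time, skipping blank lines."""
    with open(path, "r", encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:
                yield json.loads(line)
```

A plain JSON array file has to be parsed and rewritten in full to add an item, whereas each call here only writes the new line, which is why the cost of a write does not grow with the size of the file.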
2026-03-03 23:31:07 +08:00
Doiiars
70a6ca55bb feat(database): add PostgreSQL support and fix Windows subprocess encoding 2026-01-09 00:41:59 +08:00
程序员阿江(Relakkes)
57b688fea4 feat: webui support light theme 2026-01-06 11:16:48 +08:00
程序员阿江(Relakkes)
157ddfb21b i18n: translate all Chinese comments, docstrings, and logger messages to English
Comprehensive translation of Chinese text to English across the entire codebase:

- api/: FastAPI server documentation and logger messages
- cache/: Cache abstraction layer comments and docstrings
- database/: Database models and MongoDB store documentation
- media_platform/: All platform crawlers (Bilibili, Douyin, Kuaishou, Tieba, Weibo, Xiaohongshu, Zhihu)
- model/: Data model documentation
- proxy/: Proxy pool and provider documentation
- store/: Data storage layer comments
- tools/: Utility functions and browser automation
- test/: Test file documentation

Preserved: Chinese disclaimer header (lines 10-18) for legal compliance

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-26 23:27:19 +08:00
程序员阿江(Relakkes)
11500ef57a fix: #799 2025-12-24 11:45:07 +08:00
程序员阿江(Relakkes)
508675a251 feat(api): add WebUI API server with built frontend
- Add FastAPI server with WebSocket support for real-time logs
- Add crawler management API endpoints (start/stop/status)
- Add data browsing API endpoints (list files, preview, download)
- Include pre-built WebUI assets for serving frontend

API endpoints:
- POST /api/crawler/start - Start crawler task
- POST /api/crawler/stop - Stop crawler task
- GET /api/crawler/status - Get crawler status
- WS /api/ws/logs - Real-time log streaming
- GET /api/data/files - List data files
- GET /api/data/stats - Get data statistics
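
As a rough illustration of how a client might address the HTTP endpoints listed above, the sketch below only builds request objects with the standard library and sends nothing. The base URL `http://localhost:8080`, the `ENDPOINTS` mapping, and the `build_request` helper are assumptions for this example; the commit does not specify a host, port, or request bodies, and the WebSocket endpoint is left out because `urllib` does not speak that protocol.

```python
from urllib.request import Request

# Hypothetical base URL; the commit message does not state host or port.
BASE_URL = "http://localhost:8080"

# HTTP method and path for each REST endpoint named in the commit message.
ENDPOINTS = {
    "start": ("POST", "/api/crawler/start"),
    "stop": ("POST", "/api/crawler/stop"),
    "status": ("GET", "/api/crawler/status"),
    "files": ("GET", "/api/data/files"),
    "stats": ("GET", "/api/data/stats"),
}


def build_request(name: str, base_url: str = BASE_URL) -> Request:
    """Build (but do not send) a request for one of the named endpoints."""
    method, path = ENDPOINTS[name]
    return Request(base_url.rstrip("/") + path, method=method)
```

Passing the result to `urllib.request.urlopen` would perform the call against a running server; the start endpoint presumably expects a JSON body whose schema is not described in this commit.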

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-19 00:02:08 +08:00