程序员阿江-Relakkes
51a7d94de8
Merge pull request #821 from wanzirong/feature/max-concurrency-param
...
feat: add --max_concurrency_num parameter to control the number of concurrent crawlers
2026-01-31 00:31:15 +08:00
wanzirong
df39d293de
Rename --max_concurrency to --max_concurrency_num for naming consistency
2026-01-30 11:15:06 +08:00
wanzirong
79048e265e
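feat: add parameter to control the number of concurrent crawlers
...
- Add --max_concurrency command-line parameter
- Used to control the number of concurrent crawlers
- Default value is 1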
2026-01-30 11:15:05 +08:00
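A minimal sketch of how such a flag could be declared, assuming a plain `argparse` parser (the project's actual CLI library is not shown in this log). It uses the final name `--max_concurrency_num` from the later rename commit and the default of 1 stated above; `build_parser`, `parse_concurrency`, and the lower-bound check are hypothetical.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a hypothetical parser exposing the concurrency flag."""
    parser = argparse.ArgumentParser(description="hypothetical crawler CLI")
    parser.add_argument(
        "--max_concurrency_num",
        type=int,
        default=1,  # default value stated in the commit body
        help="maximum number of crawlers running concurrently",
    )
    return parser


def parse_concurrency(argv: list[str]) -> int:
    """Parse argv and return a validated concurrency limit."""
    args = build_parser().parse_args(argv)
    # Assumption: a limit below 1 is treated as a user error.
    if args.max_concurrency_num < 1:
        raise ValueError("--max_concurrency_num must be at least 1")
    return args.max_concurrency_num
```

Calling `parse_concurrency([])` falls back to the default, while `parse_concurrency(["--max_concurrency_num", "4"])` returns the requested limit.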
程序员阿江-Relakkes
94553fd818
Merge pull request #817 from wanzirong/dev
...
feat: add command-line parameter to control the number of comments crawled
2026-01-21 16:49:13 +08:00
wanzirong
90f72536ba
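refactor: simplify command-line parameter naming
...
- Rename --max_comments_per_post to --max_comments_count_singlenotes to match the config option name
- Remove the --xhs_sort_type parameter (not needed for now)
- Keep the code concise and drop unnecessary features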
2026-01-21 16:30:07 +08:00
wanzirong
f7d27ab43a
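feat: add command-line parameter support
...
- Add --max_comments_per_post parameter to control the number of comments crawled per post
- Add --xhs_sort_type parameter to control the Xiaohongshu sort order
- Fix how CRAWLER_MAX_COMMENTS_COUNT_SINGLENOTES is imported in the Xiaohongshu core.py:
access it through the config module instead of importing it directly, so the command-line parameter takes effect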
2026-01-21 16:23:47 +08:00
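The import fix above relies on a general Python behavior: `from module import NAME` copies the value at import time, so a later override on the module is not seen through that name. A minimal sketch, assuming a stand-in module called `demo_config` and an illustrative initial value of 10 (both hypothetical; only the constant name comes from the commit):

```python
import sys
import types

# Hypothetical stand-in for the project's config module.
config = types.ModuleType("demo_config")
config.CRAWLER_MAX_COMMENTS_COUNT_SINGLENOTES = 10  # illustrative value
sys.modules["demo_config"] = config

# Direct import: binds the current value (10) to a separate local name.
from demo_config import CRAWLER_MAX_COMMENTS_COUNT_SINGLENOTES as direct_value


def apply_cli_override(value: int) -> None:
    """Simulate a CLI parser writing the parsed value onto the config module."""
    config.CRAWLER_MAX_COMMENTS_COUNT_SINGLENOTES = value


def limit_via_direct_import() -> int:
    """Reads the name bound at import time, so it misses later overrides."""
    return direct_value


def limit_via_module() -> int:
    """Looks the attribute up on the module at call time, so it sees overrides."""
    return config.CRAWLER_MAX_COMMENTS_COUNT_SINGLENOTES


apply_cli_override(50)
```

After the override, `limit_via_module()` reflects the new value while `limit_via_direct_import()` still returns the value captured at import time.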
程序员阿江(Relakkes)
be5b786a74
docs: update docs
2026-01-19 12:23:04 +08:00
程序员阿江-Relakkes
04fb716a44
Merge pull request #815 from 2470370075g-ux/fix-typo
...
Fix typo
2026-01-18 22:24:57 +08:00
WangXX
1f89713b90
Fix typo
2026-01-18 22:22:31 +08:00
程序员阿江-Relakkes
00a9e19139
Merge pull request #809 from orbisai0security/fix-cve-2023-50447-requirements.txt
...
[Security] Fix CRITICAL vulnerability: CVE-2023-50447
2026-01-13 14:40:23 +08:00
orbisai0security
8a2c349d67
fix: resolve critical vulnerability CVE-2023-50447
...
Automatically generated security fix
2026-01-12 15:10:10 +00:00
程序员阿江(Relakkes)
4de2a325a9
feat: ks comment api upgrade to v2
2026-01-09 21:09:39 +08:00
程序员阿江-Relakkes
2517e51ed4
Merge pull request #805 from MissMyDearBear/feature-bear
...
fix the login status error after scanning the QR code
2026-01-09 14:18:16 +08:00
Alen Bear
e3d7fa7bed
Merge branch 'NanmiCoder:main' into feature-bear
2026-01-09 14:14:37 +08:00
bear
a59b385615
fix the login status error after scanning the QR code
2026-01-09 14:11:47 +08:00
程序员阿江-Relakkes
7c240747b6
Merge pull request #807 from DoiiarX/main
...
feat(database): add PostgreSQL support and fix Windows subprocess encoding
2026-01-09 10:53:57 +08:00
Doiiars
70a6ca55bb
feat(database): add PostgreSQL support and fix Windows subprocess encoding
2026-01-09 00:41:59 +08:00
程序员阿江(Relakkes)
57b688fea4
feat: webui support light theme
2026-01-06 11:16:48 +08:00
程序员阿江(Relakkes)
ee4539c8fa
chore: stop tracking .DS_Store
2026-01-06 11:11:49 +08:00
程序员阿江(Relakkes)
c895f53e22
fix: #803
2026-01-05 22:29:34 +08:00
程序员阿江(Relakkes)
99db95c499
fix: 'utf-8' codec can't decode error
2026-01-04 10:48:15 +08:00
程序员阿江-Relakkes
483c5ec8c6
Merge pull request #802 from Cae1anSou/fix/douyin-concurrent-comments
...
fix: fetch Douyin comments concurrently after each page instead of waiting for all pages
2026-01-03 22:38:26 +08:00
Caelan_Windows
c56b8c4c5d
fix(douyin): fetch comments concurrently after each page instead of waiting for all pages
...
- Moved batch_get_note_comments call inside the pagination loop
- Comments are now fetched immediately after each page of videos is processed
- This allows real-time observation of comment crawling progress
- Improves data availability by not waiting for all video data to be collected first
2026-01-03 01:47:24 +08:00
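A minimal sketch of the control-flow change described above, with comments fetched inside the pagination loop. Only the name `batch_get_note_comments` comes from the commit; its signature, `fetch_video_page`, `crawl`, and the fake IDs are hypothetical stand-ins with no network access.

```python
import asyncio


async def fetch_video_page(page: int) -> list[str]:
    """Hypothetical stand-in: return the video IDs found on one result page."""
    await asyncio.sleep(0)
    return [f"video-{page}-{i}" for i in range(2)]


async def batch_get_note_comments(video_ids: list[str], log: list[str]) -> None:
    """Hypothetical stand-in: fetch comments for all given videos concurrently."""

    async def fetch_one(video_id: str) -> None:
        await asyncio.sleep(0)
        log.append(f"comments:{video_id}")

    await asyncio.gather(*(fetch_one(v) for v in video_ids))


async def crawl(pages: int, log: list[str]) -> None:
    for page in range(1, pages + 1):
        video_ids = await fetch_video_page(page)
        log.append(f"page:{page}")
        # Comments are fetched right after each page, not after the whole loop.
        await batch_get_note_comments(video_ids, log)


log: list[str] = []
asyncio.run(crawl(2, log))
```

In the resulting `log`, the comment entries for page 1 appear before the `page:2` entry, which is the progress visibility the commit aims for.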
程序员阿江(Relakkes)
a47c119303
docs: update
2025-12-30 17:10:13 +08:00
程序员阿江(Relakkes)
157ddfb21b
i18n: translate all Chinese comments, docstrings, and logger messages to English
...
Comprehensive translation of Chinese text to English across the entire codebase:
- api/: FastAPI server documentation and logger messages
- cache/: Cache abstraction layer comments and docstrings
- database/: Database models and MongoDB store documentation
- media_platform/: All platform crawlers (Bilibili, Douyin, Kuaishou, Tieba, Weibo, Xiaohongshu, Zhihu)
- model/: Data model documentation
- proxy/: Proxy pool and provider documentation
- store/: Data storage layer comments
- tools/: Utility functions and browser automation
- test/: Test file documentation
Preserved: Chinese disclaimer header (lines 10-18) for legal compliance
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-26 23:27:19 +08:00
程序员阿江(Relakkes)
1544d13dd5
docs: update README.md
2025-12-26 22:41:32 +08:00
程序员阿江(Relakkes)
55d8c7783f
feat: weibo full context support
2025-12-26 19:22:24 +08:00
程序员阿江(Relakkes)
ff1b681311
fix: weibo get note image fixed
2025-12-26 00:47:20 +08:00
程序员阿江(Relakkes)
11500ef57a
fix: #799
2025-12-24 11:45:07 +08:00
程序员阿江(Relakkes)
b9663c6a6d
fix: #798
2025-12-22 17:44:35 +08:00
程序员阿江(Relakkes)
1a38ae12bd
docs: update README.md
2025-12-19 00:23:55 +08:00
程序员阿江(Relakkes)
4ceb94f9c8
docs: WebUI support documentation
2025-12-19 00:15:53 +08:00
程序员阿江(Relakkes)
508675a251
feat(api): add WebUI API server with built frontend
...
- Add FastAPI server with WebSocket support for real-time logs
- Add crawler management API endpoints (start/stop/status)
- Add data browsing API endpoints (list files, preview, download)
- Include pre-built WebUI assets for serving frontend
API endpoints:
- POST /api/crawler/start - Start crawler task
- POST /api/crawler/stop - Stop crawler task
- GET /api/crawler/status - Get crawler status
- WS /api/ws/logs - Real-time log streaming
- GET /api/data/files - List data files
- GET /api/data/stats - Get data statistics
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-19 00:02:08 +08:00
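A minimal sketch of a client-side route table for the HTTP endpoints listed above, using only the standard library. The base URL, `ENDPOINTS`, and `build_request` are hypothetical. Request bodies are not described in this log, so none are set, and the WebSocket endpoint is left out because it is not a plain HTTP request.

```python
from urllib.parse import urljoin
from urllib.request import Request

# Assumption: host and port are placeholders, not taken from the commit.
BASE_URL = "http://localhost:8080"

# Method and path pairs copied from the commit body.
ENDPOINTS = {
    "start": ("POST", "/api/crawler/start"),
    "stop": ("POST", "/api/crawler/stop"),
    "status": ("GET", "/api/crawler/status"),
    "files": ("GET", "/api/data/files"),
    "stats": ("GET", "/api/data/stats"),
}


def build_request(name: str, base_url: str = BASE_URL) -> Request:
    """Build (but do not send) a request object for a named endpoint."""
    method, path = ENDPOINTS[name]
    return Request(urljoin(base_url, path), method=method)
```

A request built this way can be passed to `urllib.request.urlopen` once a server is actually running at the assumed address.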
程序员阿江(Relakkes)
eb66e57f60
feat(cmd): add --headless, --specified_id, --creator_id CLI options
...
- Add --headless option to control headless mode for Playwright and CDP
- Add --specified_id option for detail mode video/post IDs (comma-separated)
- Add --creator_id option for creator mode IDs (comma-separated)
- Auto-configure platform-specific ID lists (XHS, Bilibili, Douyin, Weibo, Kuaishou)
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-18 23:59:14 +08:00
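A minimal sketch of parsing the comma-separated ID options, assuming `argparse` (the project's actual CLI library is not shown in this log). `split_ids` and `parse_ids` are hypothetical helpers. Skipping empty segments and trimming whitespace are assumptions about the desired behavior.

```python
import argparse


def split_ids(raw: str) -> list[str]:
    """Split a comma-separated string into IDs, dropping blanks and whitespace."""
    return [part.strip() for part in raw.split(",") if part.strip()]


def parse_ids(argv: list[str]) -> tuple[list[str], list[str]]:
    """Return (specified IDs, creator IDs) parsed from argv."""
    parser = argparse.ArgumentParser(description="hypothetical crawler CLI")
    # argparse applies `type` to the raw string given on the command line.
    parser.add_argument("--specified_id", type=split_ids, default=[])
    parser.add_argument("--creator_id", type=split_ids, default=[])
    args = parser.parse_args(argv)
    return args.specified_id, args.creator_id
```

The resulting lists could then be copied into whichever platform-specific ID list applies, as the last bullet of the commit describes.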
程序员阿江(Relakkes)
a8930555ac
style: increase aside width and ad image size
...
- Increase the right-hand aside width from 256px to 300px
- Increase the ad image width from 200px to 280px
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-18 13:29:36 +08:00
程序员阿江(Relakkes)
fb66ef016d
docs: add vitepress-plugin-mermaid for Mermaid diagram rendering
...
- Add vitepress-plugin-mermaid and mermaid dependencies
- Update the VitePress config to support Mermaid diagram rendering
- Add a link to the project architecture document in the sidebar
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-18 13:25:21 +08:00
程序员阿江(Relakkes)
26c511e35f
docs: add project architecture documentation with Mermaid diagrams
...
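Add project architecture documentation, including:
- System architecture overview diagram
- Data flow diagram
- Crawler base class hierarchy and lifecycle diagrams
- Storage layer architecture diagram
- Proxy, login, and cache system diagrams
- Module dependency diagram
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>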
2025-12-18 13:16:32 +08:00
程序员阿江(Relakkes)
08fcf68b98
docs: update README.md
2025-12-17 12:12:29 +08:00
程序员阿江(Relakkes)
2426095123
docs: update README.md
2025-12-17 11:04:26 +08:00
程序员阿江(Relakkes)
3c75d4f1d0
docs: update docs style
2025-12-16 14:49:14 +08:00
程序员阿江(Relakkes)
332a07ce62
docs: update docs
2025-12-16 14:41:28 +08:00
程序员阿江(Relakkes)
8a0fd49b96
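refactor: extract the application runner and improve exit cleanup
...
- Add tools/app_runner.py to unify signal, cancellation, and cleanup-timeout logic
- Slim main.py down to the business entry point and resource cleanup implementation
- CDPBrowserManager no longer overrides existing SIGINT/SIGTERM handlers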
2025-12-15 18:06:57 +08:00
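A minimal sketch of the cleanup-timeout part of such a runner, assuming an asyncio entry point. `run_app` and its parameters are hypothetical and do not reflect the actual contents of tools/app_runner.py. Signal handling is left out.

```python
import asyncio
from typing import Awaitable, Callable

AsyncFn = Callable[[], Awaitable[None]]


async def run_app(main: AsyncFn, cleanup: AsyncFn, cleanup_timeout: float = 5.0) -> bool:
    """Run main(), then always run cleanup() with an upper time bound.

    Returns True if cleanup finished within the timeout, False otherwise.
    """
    try:
        await main()
    finally:
        try:
            # Bound cleanup so a hung resource cannot block process exit.
            await asyncio.wait_for(cleanup(), timeout=cleanup_timeout)
            cleaned = True
        except asyncio.TimeoutError:
            cleaned = False
    return cleaned
```

Bounding the cleanup step means a browser or connection that fails to close delays exit by at most `cleanup_timeout` seconds.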
程序员阿江(Relakkes)
9ade3b3eef
chore: use playwright sign xhs and update dependency
2025-12-09 14:47:48 +08:00
程序员阿江-Relakkes
2600c48359
fix: xhs sub comment sign error
...
fix: params argument and path issues
2025-12-03 11:02:52 +08:00
MEI
ff9a1624f1
fix: params argument and path issues
2025-12-03 10:31:32 +08:00
程序员阿江-Relakkes
630d4c1614
Merge pull request #789 from NanmiCoder/feature/test_new_rule_01
...
docs: update data store
2025-11-28 22:35:45 +08:00
程序员阿江(Relakkes)
f14242c239
docs: update data store
2025-11-28 22:21:20 +08:00
程序员阿江(Relakkes)
29832ded91
chore: add codeowner rules
2025-11-28 22:17:40 +08:00
程序员阿江(Relakkes)
11f2802624
docs: update README.md
2025-11-28 18:16:04 +08:00
程序员阿江-Relakkes
ab19494883
Merge pull request #785 from hsparks-codes/feat/update_readme
...
docs: Move data storage section to separate guide
2025-11-28 18:07:56 +08:00