Commit Graph

717 Commits

Author SHA1 Message Date
程序员阿江-Relakkes
51a7d94de8 Merge pull request #821 from wanzirong/feature/max-concurrency-param
feat: 添加并发爬虫数量控制参数 --max_concurrency_num
2026-01-31 00:31:15 +08:00
wanzirong
df39d293de 修改--max_concurrency为--max_concurrency_num,保持命名一致 2026-01-30 11:15:06 +08:00
wanzirong
79048e265e feat: 添加并发爬虫数量控制参数
- 新增 --max_concurrency 命令行参数
- 用于控制并发爬虫数量
- 默认值为 1
2026-01-30 11:15:05 +08:00
程序员阿江-Relakkes
94553fd818 Merge pull request #817 from wanzirong/dev
feat: 添加命令行参数控制评论爬取数量
2026-01-21 16:49:13 +08:00
wanzirong
90f72536ba refactor: 简化命令行参数命名
- 将 --max_comments_per_post 重命名为 --max_comments_count_singlenotes,与配置项名称保持一致
- 移除 --xhs_sort_type 参数(暂不需要)
- 保持代码简洁,减少不必要的功能
2026-01-21 16:30:07 +08:00
wanzirong
f7d27ab43a feat: 添加命令行参数支持
- 添加 --max_comments_per_post 参数用于控制每个帖子爬取的评论数量
- 添加 --xhs_sort_type 参数用于控制小红书排序方式
- 修复小红书 core.py 中 CRAWLER_MAX_COMMENTS_COUNT_SINGLENOTES 的导入方式
  从直接导入改为通过 config 模块访问,使命令行参数能正确生效
2026-01-21 16:23:47 +08:00
程序员阿江(Relakkes)
be5b786a74 docs: update docs 2026-01-19 12:23:04 +08:00
程序员阿江-Relakkes
04fb716a44 Merge pull request #815 from 2470370075g-ux/fix-typo
修复拼写错误
2026-01-18 22:24:57 +08:00
WangXX
1f89713b90 修复拼写错误 2026-01-18 22:22:31 +08:00
程序员阿江-Relakkes
00a9e19139 Merge pull request #809 from orbisai0security/fix-cve-2023-50447-requirements.txt
[Security] Fix CRITICAL vulnerability: CVE-2023-50447
2026-01-13 14:40:23 +08:00
orbisai0security
8a2c349d67 fix: resolve critical vulnerability CVE-2023-50447
Automatically generated security fix
2026-01-12 15:10:10 +00:00
程序员阿江(Relakkes)
4de2a325a9 feat: ks comment api upgrade to v2 2026-01-09 21:09:39 +08:00
程序员阿江-Relakkes
2517e51ed4 Merge pull request #805 from MissMyDearBear/feature-bear
fix the login status error after scan the QR code
2026-01-09 14:18:16 +08:00
Alen Bear
e3d7fa7bed Merge branch 'NanmiCoder:main' into feature-bear 2026-01-09 14:14:37 +08:00
bear
a59b385615 fix the login status error after scan the QR code 2026-01-09 14:11:47 +08:00
程序员阿江-Relakkes
7c240747b6 Merge pull request #807 from DoiiarX/main
feat(database): add PostgreSQL support and fix Windows subprocess encoding
2026-01-09 10:53:57 +08:00
Doiiars
70a6ca55bb feat(database): add PostgreSQL support and fix Windows subprocess encoding 2026-01-09 00:41:59 +08:00
程序员阿江(Relakkes)
57b688fea4 feat: webui support light theme 2026-01-06 11:16:48 +08:00
程序员阿江(Relakkes)
ee4539c8fa chore: stop tracking .DS_Store 2026-01-06 11:11:49 +08:00
程序员阿江(Relakkes)
c895f53e22 fix: #803 2026-01-05 22:29:34 +08:00
程序员阿江(Relakkes)
99db95c499 fix: 'utf-8' codec can't decode error 2026-01-04 10:48:15 +08:00
程序员阿江-Relakkes
483c5ec8c6 Merge pull request #802 from Cae1anSou/fix/douyin-concurrent-comments
fix: fetch Douyin comments concurrently after each page instead of waiting for all pages
2026-01-03 22:38:26 +08:00
Caelan_Windows
c56b8c4c5d fix(douyin): fetch comments concurrently after each page instead of waiting for all pages
- Moved batch_get_note_comments call inside the pagination loop
- Comments are now fetched immediately after each page of videos is processed
- This allows real-time observation of comment crawling progress
- Improves data availability by not waiting for all video data to be collected first
2026-01-03 01:47:24 +08:00
程序员阿江(Relakkes)
a47c119303 docs: update 2025-12-30 17:10:13 +08:00
程序员阿江(Relakkes)
157ddfb21b i18n: translate all Chinese comments, docstrings, and logger messages to English
Comprehensive translation of Chinese text to English across the entire codebase:

- api/: FastAPI server documentation and logger messages
- cache/: Cache abstraction layer comments and docstrings
- database/: Database models and MongoDB store documentation
- media_platform/: All platform crawlers (Bilibili, Douyin, Kuaishou, Tieba, Weibo, Xiaohongshu, Zhihu)
- model/: Data model documentation
- proxy/: Proxy pool and provider documentation
- store/: Data storage layer comments
- tools/: Utility functions and browser automation
- test/: Test file documentation

Preserved: Chinese disclaimer header (lines 10-18) for legal compliance

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-26 23:27:19 +08:00
程序员阿江(Relakkes)
1544d13dd5 docs: update README.md 2025-12-26 22:41:32 +08:00
程序员阿江(Relakkes)
55d8c7783f feat: webo full context support 2025-12-26 19:22:24 +08:00
程序员阿江(Relakkes)
ff1b681311 fix: weibo get note image fixed 2025-12-26 00:47:20 +08:00
程序员阿江(Relakkes)
11500ef57a fix: #799 2025-12-24 11:45:07 +08:00
程序员阿江(Relakkes)
b9663c6a6d fix: #798 2025-12-22 17:44:35 +08:00
程序员阿江(Relakkes)
1a38ae12bd docs: update README.md 2025-12-19 00:23:55 +08:00
程序员阿江(Relakkes)
4ceb94f9c8 docs: webui 支持文档 2025-12-19 00:15:53 +08:00
程序员阿江(Relakkes)
508675a251 feat(api): add WebUI API server with built frontend
- Add FastAPI server with WebSocket support for real-time logs
- Add crawler management API endpoints (start/stop/status)
- Add data browsing API endpoints (list files, preview, download)
- Include pre-built WebUI assets for serving frontend

API endpoints:
- POST /api/crawler/start - Start crawler task
- POST /api/crawler/stop - Stop crawler task
- GET /api/crawler/status - Get crawler status
- WS /api/ws/logs - Real-time log streaming
- GET /api/data/files - List data files
- GET /api/data/stats - Get data statistics

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-19 00:02:08 +08:00
程序员阿江(Relakkes)
eb66e57f60 feat(cmd): add --headless, --specified_id, --creator_id CLI options
- Add --headless option to control headless mode for Playwright and CDP
- Add --specified_id option for detail mode video/post IDs (comma-separated)
- Add --creator_id option for creator mode IDs (comma-separated)
- Auto-configure platform-specific ID lists (XHS, Bilibili, Douyin, Weibo, Kuaishou)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-18 23:59:14 +08:00
程序员阿江(Relakkes)
a8930555ac style: increase aside width and ad image size
- 增加右侧 aside 宽度从 256px 到 300px
- 增加广告图片宽度从 200px 到 280px

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-18 13:29:36 +08:00
程序员阿江(Relakkes)
fb66ef016d docs: add vitepress-plugin-mermaid for Mermaid diagram rendering
- 添加 vitepress-plugin-mermaid 和 mermaid 依赖
- 更新 VitePress 配置以支持 Mermaid 图表渲染
- 在 sidebar 中添加项目架构文档链接

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-18 13:25:21 +08:00
程序员阿江(Relakkes)
26c511e35f docs: add project architecture documentation with Mermaid diagrams
添加项目架构文档,包含:
- 系统架构总览图
- 数据流向图
- 爬虫基类体系和生命周期图
- 存储层架构图
- 代理、登录、缓存系统图
- 模块依赖关系图

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-18 13:16:32 +08:00
程序员阿江(Relakkes)
08fcf68b98 docs: update README.md 2025-12-17 12:12:29 +08:00
程序员阿江(Relakkes)
2426095123 docs: update README.md 2025-12-17 11:04:26 +08:00
程序员阿江(Relakkes)
3c75d4f1d0 docs: update docs style 2025-12-16 14:49:14 +08:00
程序员阿江(Relakkes)
332a07ce62 docs: update docs 2025-12-16 14:41:28 +08:00
程序员阿江(Relakkes)
8a0fd49b96 refactor: 抽离应用 runner 并优化退出清理
- 新增 tools/app_runner.py 统一信号/取消/清理超时逻辑
- main.py 精简为业务入口与资源清理实现
- CDPBrowserManager 不再覆盖已有 SIGINT/SIGTERM 处理器
2025-12-15 18:06:57 +08:00
程序员阿江(Relakkes)
9ade3b3eef chore: use playwright sign xhs and update dependency 2025-12-09 14:47:48 +08:00
程序员阿江-Relakkes
2600c48359 fix: xhs sub comment sign error
fix: params参数以及路径问题
2025-12-03 11:02:52 +08:00
MEI
ff9a1624f1 fix: params参数以及路径问题 2025-12-03 10:31:32 +08:00
程序员阿江-Relakkes
630d4c1614 Merge pull request #789 from NanmiCoder/feature/test_new_rule_01
docs: update data store
2025-11-28 22:35:45 +08:00
程序员阿江(Relakkes)
f14242c239 docs: update data store 2025-11-28 22:21:20 +08:00
程序员阿江(Relakkes)
29832ded91 chore: add coderowner rules 2025-11-28 22:17:40 +08:00
程序员阿江(Relakkes)
11f2802624 docs: update README.md 2025-11-28 18:16:04 +08:00
程序员阿江-Relakkes
ab19494883 Merge pull request #785 from hsparks-codes/feat/update_readme
docs: Move data storage section to separate guide
2025-11-28 18:07:56 +08:00