701 Commits

Author SHA1 Message Date
Doiiars
70a6ca55bb feat(database): add PostgreSQL support and fix Windows subprocess encoding 2026-01-09 00:41:59 +08:00
程序员阿江(Relakkes)
57b688fea4 feat: webui support light theme 2026-01-06 11:16:48 +08:00
程序员阿江(Relakkes)
ee4539c8fa chore: stop tracking .DS_Store 2026-01-06 11:11:49 +08:00
程序员阿江(Relakkes)
c895f53e22 fix: #803 2026-01-05 22:29:34 +08:00
程序员阿江(Relakkes)
99db95c499 fix: 'utf-8' codec can't decode error 2026-01-04 10:48:15 +08:00
程序员阿江-Relakkes
483c5ec8c6 Merge pull request #802 from Cae1anSou/fix/douyin-concurrent-comments
fix: fetch Douyin comments concurrently after each page instead of waiting for all pages
2026-01-03 22:38:26 +08:00
Caelan_Windows
c56b8c4c5d fix(douyin): fetch comments concurrently after each page instead of waiting for all pages
- Moved batch_get_note_comments call inside the pagination loop
- Comments are now fetched immediately after each page of videos is processed
- This allows real-time observation of comment crawling progress
- Improves data availability by not waiting for all video data to be collected first
2026-01-03 01:47:24 +08:00
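The change in c56b8c4c5d can be illustrated with a minimal sketch. It assumes hypothetical `fetch_page` and `get_comments` helpers that return fake data, and the `batch_get_note_comments` shown here is a stand-in that shares only its name with the project's function, not its signature.

```python
import asyncio


async def fetch_page(page: int) -> list[str]:
    """Hypothetical helper: return the video IDs found on one result page."""
    await asyncio.sleep(0)
    return [f"video-{page}-{i}" for i in range(2)]


async def get_comments(video_id: str) -> list[str]:
    """Hypothetical helper: return the comments of a single video."""
    await asyncio.sleep(0)
    return [f"comment on {video_id}"]


async def batch_get_note_comments(video_ids: list[str]) -> list[str]:
    """Stand-in: fetch comments for one batch of videos concurrently."""
    batches = await asyncio.gather(*(get_comments(v) for v in video_ids))
    return [comment for batch in batches for comment in batch]


async def crawl(pages: int) -> list[str]:
    collected: list[str] = []
    for page in range(1, pages + 1):
        video_ids = await fetch_page(page)
        # Comments are fetched inside the pagination loop, right after each
        # page, instead of once after every page has been collected.
        collected.extend(await batch_get_note_comments(video_ids))
    return collected


if __name__ == "__main__":
    print(asyncio.run(crawl(2)))
```

With this shape, comments for page 1 are available before page 2 is requested, which is what makes progress observable while the crawl is still running.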
程序员阿江(Relakkes)
a47c119303 docs: update 2025-12-30 17:10:13 +08:00
程序员阿江(Relakkes)
157ddfb21b i18n: translate all Chinese comments, docstrings, and logger messages to English
Comprehensive translation of Chinese text to English across the entire codebase:

- api/: FastAPI server documentation and logger messages
- cache/: Cache abstraction layer comments and docstrings
- database/: Database models and MongoDB store documentation
- media_platform/: All platform crawlers (Bilibili, Douyin, Kuaishou, Tieba, Weibo, Xiaohongshu, Zhihu)
- model/: Data model documentation
- proxy/: Proxy pool and provider documentation
- store/: Data storage layer comments
- tools/: Utility functions and browser automation
- test/: Test file documentation

Preserved: Chinese disclaimer header (lines 10-18) for legal compliance

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-26 23:27:19 +08:00
程序员阿江(Relakkes)
1544d13dd5 docs: update README.md 2025-12-26 22:41:32 +08:00
程序员阿江(Relakkes)
55d8c7783f feat: weibo full context support 2025-12-26 19:22:24 +08:00
程序员阿江(Relakkes)
ff1b681311 fix: weibo get note image fixed 2025-12-26 00:47:20 +08:00
程序员阿江(Relakkes)
11500ef57a fix: #799 2025-12-24 11:45:07 +08:00
程序员阿江(Relakkes)
b9663c6a6d fix: #798 2025-12-22 17:44:35 +08:00
程序员阿江(Relakkes)
1a38ae12bd docs: update README.md 2025-12-19 00:23:55 +08:00
程序员阿江(Relakkes)
4ceb94f9c8 docs: webui support documentation 2025-12-19 00:15:53 +08:00
程序员阿江(Relakkes)
508675a251 feat(api): add WebUI API server with built frontend
- Add FastAPI server with WebSocket support for real-time logs
- Add crawler management API endpoints (start/stop/status)
- Add data browsing API endpoints (list files, preview, download)
- Include pre-built WebUI assets for serving frontend

API endpoints:
- POST /api/crawler/start - Start crawler task
- POST /api/crawler/stop - Stop crawler task
- GET /api/crawler/status - Get crawler status
- WS /api/ws/logs - Real-time log streaming
- GET /api/data/files - List data files
- GET /api/data/stats - Get data statistics

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-19 00:02:08 +08:00
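A client for the endpoints listed in 508675a251 might be sketched as below. Only the methods and paths come from the commit message; the base URL, the `build_request` and `logs_ws_url` helpers, and any request payload are assumptions made for illustration. The sketch only constructs requests and does not send them.

```python
import json
from typing import Optional
from urllib.request import Request

# Assumption: the commit message does not state the host or port.
BASE_URL = "http://localhost:8080"

# Method and path pairs copied from the commit message.
ENDPOINTS = {
    "start": ("POST", "/api/crawler/start"),
    "stop": ("POST", "/api/crawler/stop"),
    "status": ("GET", "/api/crawler/status"),
    "files": ("GET", "/api/data/files"),
    "stats": ("GET", "/api/data/stats"),
}
LOGS_WS_PATH = "/api/ws/logs"


def build_request(name: str, payload: Optional[dict] = None) -> Request:
    """Build (but do not send) an HTTP request for a named endpoint."""
    method, path = ENDPOINTS[name]
    data = json.dumps(payload).encode("utf-8") if payload is not None else None
    headers = {"Content-Type": "application/json"} if data is not None else {}
    return Request(BASE_URL + path, data=data, headers=headers, method=method)


def logs_ws_url() -> str:
    """Derive the WebSocket URL for log streaming from the HTTP base URL."""
    return "ws" + BASE_URL[len("http"):] + LOGS_WS_PATH


if __name__ == "__main__":
    # The payload shape is hypothetical; the real schema is not in the commit.
    request = build_request("start", {"platform": "xhs"})
    print(request.get_method(), request.full_url)
    print(logs_ws_url())
```

Passing the result of `build_request` to `urllib.request.urlopen` would perform the call against a running server.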
程序员阿江(Relakkes)
eb66e57f60 feat(cmd): add --headless, --specified_id, --creator_id CLI options
- Add --headless option to control headless mode for Playwright and CDP
- Add --specified_id option for detail mode video/post IDs (comma-separated)
- Add --creator_id option for creator mode IDs (comma-separated)
- Auto-configure platform-specific ID lists (XHS, Bilibili, Douyin, Weibo, Kuaishou)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-18 23:59:14 +08:00
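The three options added in eb66e57f60 could be parsed roughly as follows. This is a sketch using `argparse`; the project's actual CLI library, defaults, and whether `--headless` takes an explicit value are not stated in the commit and are assumed here. `str_to_bool` and `split_ids` are hypothetical helpers.

```python
import argparse


def str_to_bool(value: str) -> bool:
    """Hypothetical helper: parse common true/false spellings."""
    lowered = value.strip().lower()
    if lowered in {"1", "true", "yes", "y"}:
        return True
    if lowered in {"0", "false", "no", "n"}:
        return False
    raise argparse.ArgumentTypeError(f"not a boolean: {value!r}")


def split_ids(value: str) -> list[str]:
    """Hypothetical helper: split a comma-separated ID list, dropping blanks."""
    return [part.strip() for part in value.split(",") if part.strip()]


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="crawler options (sketch)")
    # Assumption: --headless takes an explicit value and defaults to True.
    parser.add_argument("--headless", type=str_to_bool, default=True)
    # Detail mode: comma-separated video/post IDs.
    parser.add_argument("--specified_id", type=split_ids, default=[])
    # Creator mode: comma-separated creator IDs.
    parser.add_argument("--creator_id", type=split_ids, default=[])
    return parser


if __name__ == "__main__":
    print(build_parser().parse_args())
```

The parsed lists would then be copied into whichever platform-specific ID setting applies, which is the auto-configuration step the commit mentions.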
程序员阿江(Relakkes)
a8930555ac style: increase aside width and ad image size
- Increase the right-side aside width from 256px to 300px
- Increase the ad image width from 200px to 280px

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-18 13:29:36 +08:00
程序员阿江(Relakkes)
fb66ef016d docs: add vitepress-plugin-mermaid for Mermaid diagram rendering
- Add vitepress-plugin-mermaid and mermaid dependencies
- Update the VitePress config to support Mermaid diagram rendering
- Add a link to the project architecture documentation in the sidebar

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-18 13:25:21 +08:00
程序员阿江(Relakkes)
26c511e35f docs: add project architecture documentation with Mermaid diagrams
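Add project architecture documentation, including:
- System architecture overview diagram
- Data flow diagram
- Crawler base class hierarchy and lifecycle diagram
- Storage layer architecture diagram
- Proxy, login, and cache system diagrams
- Module dependency diagram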

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-18 13:16:32 +08:00
程序员阿江(Relakkes)
08fcf68b98 docs: update README.md 2025-12-17 12:12:29 +08:00
程序员阿江(Relakkes)
2426095123 docs: update README.md 2025-12-17 11:04:26 +08:00
程序员阿江(Relakkes)
3c75d4f1d0 docs: update docs style 2025-12-16 14:49:14 +08:00
程序员阿江(Relakkes)
332a07ce62 docs: update docs 2025-12-16 14:41:28 +08:00
程序员阿江(Relakkes)
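8a0fd49b96 refactor: extract the application runner and improve exit cleanup
- Add tools/app_runner.py to unify signal, cancellation, and cleanup-timeout logic
- Slim main.py down to the business entry point and the resource cleanup implementation
- CDPBrowserManager no longer overrides existing SIGINT/SIGTERM handlers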
2025-12-15 18:06:57 +08:00
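A runner that combines signal handling, cancellation, and a cleanup timeout, as 8a0fd49b96 describes, might look like the sketch below. `run_app` and its parameters are hypothetical names; the real tools/app_runner.py is not shown in this log and may be structured differently.

```python
import asyncio
import signal
from typing import Awaitable, Callable


async def run_app(
    main: Callable[[], Awaitable[None]],
    cleanup: Callable[[], Awaitable[None]],
    cleanup_timeout: float = 5.0,
) -> bool:
    """Run main(), cancel it on SIGINT/SIGTERM, then run cleanup() with a timeout.

    Returns True if cleanup finished within the timeout, False otherwise.
    """
    loop = asyncio.get_running_loop()
    main_task = asyncio.ensure_future(main())

    installed = []
    for sig in (signal.SIGINT, signal.SIGTERM):
        try:
            loop.add_signal_handler(sig, main_task.cancel)
            installed.append(sig)
        except (NotImplementedError, RuntimeError):
            # Not supported on this event loop or outside the main thread.
            pass

    cleaned_up = False
    try:
        await main_task
    except asyncio.CancelledError:
        pass  # main() was cancelled, for example by a signal
    finally:
        try:
            # Bound the cleanup so a hung resource cannot block exit forever.
            await asyncio.wait_for(cleanup(), timeout=cleanup_timeout)
            cleaned_up = True
        except asyncio.TimeoutError:
            pass
        for sig in installed:
            loop.remove_signal_handler(sig)
    return cleaned_up
```

An entry point would then reduce to something like `asyncio.run(run_app(crawl, close_resources))`, where both callables are supplied by the application.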
程序员阿江(Relakkes)
9ade3b3eef chore: use playwright sign xhs and update dependency 2025-12-09 14:47:48 +08:00
程序员阿江-Relakkes
2600c48359 fix: xhs sub comment sign error
fix: params argument and path issues
2025-12-03 11:02:52 +08:00
MEI
ff9a1624f1 fix: params argument and path issues 2025-12-03 10:31:32 +08:00
程序员阿江-Relakkes
630d4c1614 Merge pull request #789 from NanmiCoder/feature/test_new_rule_01
docs: update data store
2025-11-28 22:35:45 +08:00
程序员阿江(Relakkes)
f14242c239 docs: update data store 2025-11-28 22:21:20 +08:00
程序员阿江(Relakkes)
29832ded91 chore: add codeowner rules 2025-11-28 22:17:40 +08:00
程序员阿江(Relakkes)
11f2802624 docs: update README.md 2025-11-28 18:16:04 +08:00
程序员阿江-Relakkes
ab19494883 Merge pull request #785 from hsparks-codes/feat/update_readme
docs: Move data storage section to separate guide
2025-11-28 18:07:56 +08:00
hsparks.codes
2bc9297812 docs: Move data storage section to separate guide
- Create comprehensive data storage guide (docs/data_storage_guide.md)
- Update README.md with link to storage guide instead of full details
- Update README_en.md with link to storage guide
- Bilingual guide (Chinese and English) in single document
- Includes all storage options: CSV, JSON, Excel, SQLite, MySQL
- Detailed usage examples and documentation links

This change improves README readability by moving detailed storage
information to a dedicated document while keeping main README concise.
2025-11-28 10:18:09 +01:00
程序员阿江-Relakkes
ba64c8ff9c Merge pull request #784 from NanmiCoder/feature/excel-export-and-tests
feat: excel store with other platform
2025-11-28 15:15:31 +08:00
程序员阿江-Relakkes
ebbf86d67b Merge pull request #783 from hsparks-codes/feature/excel-export-and-tests
feat: Add Excel export functionality and unit tests
2025-11-28 15:14:25 +08:00
程序员阿江(Relakkes)
6e858c1a00 feat: excel store with other platform 2025-11-28 15:12:36 +08:00
hsparks.codes
324f09cf9f fix: Update tests to handle openpyxl color format and ContextVar
- Fix header color assertion to check only RGB values (not alpha channel)
- Remove ContextVar mock as it cannot be patched in Python 3.11+
- All 17 tests now passing successfully
2025-11-28 05:04:00 +01:00
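The header color fix in 324f09cf9f amounts to comparing only the RGB digits of a color string. A minimal sketch, assuming the color comes back as an 8-digit aRGB hex string whose first two digits are the alpha channel; `rgb_part` is a hypothetical helper and the color value is made up for illustration.

```python
def rgb_part(color: str) -> str:
    """Return the six RGB hex digits of a 6-digit RGB or 8-digit aRGB string."""
    value = color.upper()
    if len(value) not in (6, 8):
        raise ValueError(f"expected 6 or 8 hex digits, got {color!r}")
    # The last six digits are RGB; any leading two digits are the alpha channel.
    return value[-6:]


if __name__ == "__main__":
    expected = "4472C4"  # illustrative blue, not taken from the project
    for stored in ("FF4472C4", "004472C4"):
        assert rgb_part(stored) == expected
    print("header color matches regardless of alpha")
```

Asserting on `rgb_part(...)` keeps a test stable whether the library fills in the alpha channel as `00` or `FF`.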
hsparks.codes
46ef86ddef feat: Add Excel export functionality and unit tests
Features:
- Excel export with formatted multi-sheet workbooks (Contents, Comments, Creators)
- Professional styling: blue headers, auto-width columns, borders, text wrapping
- Smart export: empty sheets automatically removed
- Support for all platforms (xhs, dy, ks, bili, wb, tieba, zhihu)

Testing:
- Added pytest framework with asyncio support
- Unit tests for Excel store functionality
- Unit tests for store factory pattern
- Shared fixtures for test data
- Test coverage for edge cases

Documentation:
- Comprehensive Excel export guide (docs/excel_export_guide.md)
- Updated README.md and README_en.md with Excel examples
- Updated config comments to include excel option

Dependencies:
- Added openpyxl>=3.1.2 for Excel support
- Added pytest>=7.4.0 and pytest-asyncio>=0.21.0 for testing

This contribution adds immediate value for users who need data analysis
capabilities and establishes a testing foundation for future development.
2025-11-28 04:44:12 +01:00
程序员阿江-Relakkes
31a092c653 Merge pull request #782 from NanmiCoder/fix/xhs-sign-20251127
feat: xhs sign playwright version
2025-11-27 11:05:24 +08:00
程序员阿江(Relakkes)
f989ce0788 feat: xhs sign playwright version 2025-11-27 10:53:08 +08:00
程序员阿江-Relakkes
15b98fa511 ip proxy expired logic switch
Fix/proxy 20251125
2025-11-26 16:05:01 +08:00
程序员阿江(Relakkes)
f1e7124654 fix: proxy extract error 2025-11-26 16:01:54 +08:00
程序员阿江(Relakkes)
6eef02d08c feat: ip proxy expired check 2025-11-25 12:39:10 +08:00
程序员阿江(Relakkes)
1da347cbf8 docs: update index.md 2025-11-22 09:12:25 +08:00
程序员阿江(Relakkes)
422cc92dd1 docs: update README 2025-11-22 08:20:09 +08:00
程序员阿江(Relakkes)
13d2302c9c docs: update README 2025-11-18 17:56:55 +08:00
程序员阿江(Relakkes)
ff8c92daad chore: add copyright to every file 2025-11-18 12:24:02 +08:00
程序员阿江(Relakkes)
5288bddb42 refactor: weibo search #771 2025-11-17 17:24:47 +08:00