701 Commits

Author SHA1 Message Date
程序员阿江-Relakkes
dbbc33a0df Merge pull request #674 from persist-1/chore
chore: add Chinese display support for the --help output; add the "music_download_url" field to the "douyin_aweme" table and implement the feature
2025-07-25 17:23:03 +08:00
persist-1
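19df1734f1 chore: add Chinese display support for --help and the music_download_url field to the douyin_aweme table
- Add Chinese display support for command-line arguments to improve the user experience
- Add a music_download_url field to the douyin_aweme table to store the download link for a video's music
- Update the related database schema files (tables.sql, sqlite_tables.sql)
- Implement music download URL extraction and integrate it into the data storage flow
2025-07-24 22:39:53 +08:00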
程序员阿江(Relakkes)
fc06c783f5 fix: fixed xhs req headers 2025-07-23 13:28:58 +08:00
程序员阿江(Relakkes)
b41896f4f3 docs: add a sponsor 2025-07-23 13:17:19 +08:00
程序员阿江(Relakkes)
a4d9aaa34a refactor: xhs update 2025-07-21 21:26:16 +08:00
程序员阿江(Relakkes)
26a43358cb chore: update config 2025-07-20 14:34:56 +08:00
程序员阿江(Relakkes)
13b00f7a36 refactor: config update 2025-07-18 23:26:52 +08:00
程序员阿江(Relakkes)
122978b35c Merge pull request #652 from gaoxiaobei/dev
feat(bilibili): Add flexible search modes and fix limit logic
2025-07-18 21:41:20 +08:00
gaoxiaobei
8105b053ed Merge remote-tracking branch 'origin/dev' into devdev 2025-07-18 17:37:29 +08:00
gaoxiaobei
7176956e51 Merge branch 'NanmiCoder:main' into dev 2025-07-18 17:32:04 +08:00
gaoxiaobei
b913db64bb refactor(config): move platform-specific configs to separate files
- Remove platform-specific configurations from base_config.py
- Create separate config files for each platform in their respective directories
- Update import statements in core files to use new platform-specific config modules
- Clean up unused and deprecated configuration options
2025-07-18 17:27:37 +08:00
程序员阿江(Relakkes)
2753e7631e Merge pull request #664 from cfl-chenfangliang/feat-dyCommentBug
feat: fix missing geographic location in Douyin second-level comments
2025-07-18 15:53:33 +08:00
chenfangliang
aa54dad9a5 feat: fix missing geographic location in Douyin second-level comments 2025-07-18 10:48:43 +08:00
gaoxiaobei
5daae04c7d fix: conflict resolution errors 2025-07-17 16:54:17 +08:00
gaoxiaobei
1dc8c1789f docs(config): update Bilibili search mode options
- Clarify the three search mode options for Bilibili
- Add note about setting MAX_NOTES_PER_DAY in bilibili config
2025-07-17 07:51:27 +08:00
gaoxiaobei
29b6cee408 Merge pull request #1 from gaoxiaobei/dev
Enhance robustness.
2025-07-17 06:45:55 +08:00
gaoxiaobei
6ced357096 Merge branch 'main' into dev 2025-07-17 06:45:30 +08:00
gaoxiaobei
9fb396c7d1 fix(media_platform): handle edge cases and improve error handling for Bilibili client and crawler
- BilibiliClient:
  - Improve wbi_img_urls handling for better compatibility
  - Add error handling for missing or invalid 'is_end' and 'next' in comment cursor

- BilibiliCrawler:
  - Fix daily limit logic for keyword-based searches
  - Improve logging and break conditions for max notes count limits
  - Ensure proper tracking of total notes crawled for each keyword
2025-07-17 06:40:56 +08:00
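The cursor handling described in the commit above can be sketched roughly as follows. This is a minimal illustration, not the project's code: the function name `parse_comment_cursor` and the assumed response shape (a `cursor` object holding `is_end` and `next`) are hypothetical, and the choice to stop paging on malformed data is one possible policy.

```python
from typing import Any, Tuple


def parse_comment_cursor(payload: Any) -> Tuple[bool, int]:
    """Return (is_end, next_page) from a comment response.

    Hypothetical helper: if 'cursor', 'is_end' or 'next' is missing or has
    an unexpected type, report the end of the comment list so the caller
    stops paging instead of crashing.
    """
    cursor = payload.get("cursor") if isinstance(payload, dict) else None
    if not isinstance(cursor, dict):
        return True, 0

    is_end = cursor.get("is_end")
    next_page = cursor.get("next")
    # bool is a subclass of int in Python, so exclude it explicitly for 'next'.
    next_is_valid = isinstance(next_page, int) and not isinstance(next_page, bool)
    if not isinstance(is_end, bool) or not next_is_valid:
        return True, 0
    return is_end, next_page
```

Treating malformed cursors as "end of list" trades completeness for robustness; a caller that prefers to retry could raise an exception here instead.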
gaoxiaobei
fb846e9060 Merge branch 'NanmiCoder:main' into main 2025-07-17 06:39:04 +08:00
程序员阿江(Relakkes)
08c28e6f7b Merge pull request #658 from persist-1/feature/sqlite-support
Add support for a local SQLite database (so database operations still work when running a MySQL service is inconvenient)
2025-07-16 20:35:58 +08:00
买定不离手
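9457455c18 fix: fix the SQLite database initialization issue and regenerate the database file
- db.py: add a corrupted-database-file check and cleanup logic to the init_table_schema function, so that SQLite initialization creates a clean database file
- schema/sqlite_tables.db: regenerate the complete SQLite database file, including table structures and indexes for all platforms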
2025-07-16 19:48:52 +08:00
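A corrupted-file check of the kind this commit describes might look like the sketch below. It is an assumption-laden illustration using only the standard `sqlite3` module; the helper name `ensure_clean_sqlite_file` is hypothetical and the actual logic in `db.py` may differ.

```python
import os
import sqlite3


def ensure_clean_sqlite_file(db_path: str) -> bool:
    """Remove db_path if it exists but cannot be read as a SQLite database.

    Hypothetical helper. Returns True when a broken file was removed, so the
    caller can recreate the schema from scratch.
    """
    if not os.path.exists(db_path):
        return False
    try:
        conn = sqlite3.connect(db_path)
        try:
            # Reading the schema version forces SQLite to parse the file header.
            conn.execute("PRAGMA schema_version").fetchone()
        finally:
            conn.close()
        return False
    except sqlite3.DatabaseError:
        # Typically "file is not a database": discard the broken file.
        os.remove(db_path)
        return True
```

This only detects files SQLite cannot open at all; a stricter variant could run `PRAGMA integrity_check` at the cost of a full scan.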
程序员阿江(Relakkes)
c5509ab91f fix: update ip proxy valid url for #662 2025-07-16 11:06:58 +08:00
程序员阿江(Relakkes)
c795b1316a fix: import error for #663 2025-07-16 10:58:11 +08:00
程序员阿江(Relakkes)
3184e0a8d9 fix: sign data too long error #659 2025-07-14 18:43:17 +08:00
gaoxiaobei
4d743f6c17 debug & resume default configuration 2025-07-14 08:00:48 +08:00
买定不离手
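3365095c62 fix: improve SQLite SQL statement adaptation for the Bilibili and Douyin platforms
- Update store/bilibili/bilibili_store_sql.py to optimize the SQLite SQL statements and query logic for the Bilibili platform
- Update store/douyin/douyin_store_sql.py to fix SQL statement compatibility issues in SQLite data storage for the Douyin platform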
2025-07-14 03:51:19 +08:00
买定不离手
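1298022410 refactor: update each platform's store module initialization to support SQLite
- Update store/bilibili/__init__.py to import the SQLite store implementation class and related modules
- Update store/douyin/__init__.py to integrate the SQLite data storage interface for the Douyin platform
- Update store/kuaishou/__init__.py to add the import declaration for the Kuaishou platform's SQLite storage module
- Update store/tieba/__init__.py to bring in the Tieba platform's SQLite database operation module
- Update store/weibo/__init__.py to integrate the Weibo platform's SQLite storage module
- Update store/xhs/__init__.py to import the Xiaohongshu platform's SQLite data storage implementation
- Update store/zhihu/__init__.py to integrate the Zhihu platform's SQLite database storage module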
2025-07-14 03:51:08 +08:00
买定不离手
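1673bd5c0c feat: enhance SQLite database configuration and command-line argument support
- Update cmd_arg/arg.py to add command-line argument parsing for the SQLite database option
- Update config/base_config.py to integrate the base configuration items and default settings for the SQLite database
- Update config/db_config.py to extend the database configuration with SQLite connection and parameter management
- Update pyproject.toml to add version management and project configuration for SQLite-related dependencies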
2025-07-14 03:50:54 +08:00
买定不离手
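191dd5998d docs: update the README documents to cover SQLite database storage
- Update README.md: add a note on SQLite database support to the data storage section, emphasizing that it is lightweight and well suited to personal use
- Update README_en.md: add an introduction to the SQLite database in the data storage section, with English usage guidance and examples
- Update README_es.md: add a description of the SQLite database to the data storage section, with Spanish configuration and usage instructions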
2025-07-14 03:50:32 +08:00
买定不离手
c6b96b7e28 chore: update the project dependency lock file
- Update uv.lock: sync project dependency versions
2025-07-14 03:37:10 +08:00
买定不离手
36d4a086dd docs: add SQLite support notes to the default documentation
- Update docs/index.md to describe SQLite database support
- Add SQLite usage examples and configuration notes
2025-07-14 03:36:59 +08:00
买定不离手
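f5fbbb36ba fix: fix SQLite database initialization and shutdown logic
- Update main.py: fix the database initialization condition to support the sqlite option
- Update db.py: add SQLite database initialization and shutdown support
2025-07-14 03:36:48 +08:00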
买定不离手
6f274d476b feat: add SQLite store implementation files for each platform
- Add store/bilibili/bilibili_store_impl.py: Bilibili SQLite store implementation
- Add store/douyin/douyin_store_impl.py: Douyin SQLite store implementation
- Add store/kuaishou/kuaishou_store_impl.py: Kuaishou SQLite store implementation
- Add store/tieba/tieba_store_impl.py: Tieba SQLite store implementation
- Add store/weibo/weibo_store_impl.py: Weibo SQLite store implementation
- Add store/xhs/xhs_store_impl.py: Xiaohongshu SQLite store implementation
- Add store/zhihu/zhihu_store_impl.py: Zhihu SQLite store implementation
2025-07-14 03:36:36 +08:00
买定不离手
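fb938f38aa feat: update each platform's store SQL files to support SQLite
- Update store/kuaishou/kuaishou_store_sql.py: SQLite adaptation for the Kuaishou platform
- Update store/tieba/tieba_store_sql.py: SQLite adaptation for the Tieba platform
- Update store/weibo/weibo_store_sql.py: SQLite adaptation for the Weibo platform
- Update store/xhs/xhs_store_sql.py: SQLite adaptation for the Xiaohongshu platform
- Update store/zhihu/zhihu_store_sql.py: SQLite adaptation for the Zhihu platform
2025-07-14 03:36:20 +08:00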
买定不离手
3a2959d86c feat: add core files for SQLite database support
- Add async_sqlite_db.py: wrapper for asynchronous SQLite database operations
- Add schema/sqlite_tables.sql: SQLite table structure definitions
- Add schema/sqlite_tables.db: SQLite database file
2025-07-14 03:36:06 +08:00
gaoxiaobei
e91ec750bb feat: Enhance Bilibili crawler with retry logic and robustness
This commit introduces several improvements to enhance the stability and functionality of the Bilibili crawler.

- **Add Retry Logic:** Implement a retry mechanism with exponential backoff when fetching video comments. This makes the crawler more resilient to transient network issues or API errors.
- **Improve Error Handling:** Add a `try...except` block to handle potential `JSONDecodeError` in the Bilibili client, preventing crashes when the API returns an invalid response.
- **Ensure Clean Shutdown:** Refactor `main.py` to use a `try...finally` block, guaranteeing that the crawler and database connections are properly closed on exit, error, or `KeyboardInterrupt`.
- **Update Default Config:** Adjust default configuration values to increase concurrency, enable word cloud generation by default, and refine the Bilibili search mode for more practical usage.
2025-07-13 10:42:15 +08:00
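The retry mechanism with exponential backoff mentioned in this commit could be sketched as below. The helper name `fetch_with_retry`, its parameters, and the choice of which exceptions count as transient are all assumptions made for illustration; the project's own implementation may use a different approach or a retry library.

```python
import asyncio
import json
from typing import Awaitable, Callable, TypeVar

T = TypeVar("T")


async def fetch_with_retry(
    fetch: Callable[[], Awaitable[T]],
    max_attempts: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], Awaitable[None]] = asyncio.sleep,
) -> T:
    """Call fetch() up to max_attempts times, doubling the wait after each failure.

    Hypothetical helper: ConnectionError and json.JSONDecodeError are treated
    as transient here; any other exception propagates immediately.
    """
    for attempt in range(max_attempts):
        try:
            return await fetch()
        except (ConnectionError, json.JSONDecodeError):
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            # Waits base_delay, 2 * base_delay, 4 * base_delay, ...
            await sleep(base_delay * 2 ** attempt)
    raise ValueError("max_attempts must be at least 1")
```

The `sleep` parameter exists only so the delay can be replaced in tests; in normal use the default `asyncio.sleep` applies.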
gaoxiaobei
d0d7293926 feat(bilibili): Add flexible search modes and fix limit logic
Refactors the Bilibili keyword search functionality to provide more flexible crawling strategies and corrects a flaw in how crawl limits were applied.

Previously, the `ALL_DAY` boolean flag offered a rigid choice for time-based searching and contained a logical issue where `CRAWLER_MAX_NOTES_COUNT` was incorrectly applied on a per-day basis instead of as an overall total.

This commit introduces the `BILI_SEARCH_MODE` configuration option with three distinct modes:
- `normal`: The default search behavior without time constraints.
- `all_in_time_range`: Maximizes data collection within a specified date range, replicating the original intent of `ALL_DAY=True`.
- `daily_limit_in_time_range`: A new mode that strictly enforces both the daily `MAX_NOTES_PER_DAY` and the total `CRAWLER_MAX_NOTES_COUNT` limits across the entire date range.

This change resolves the limit logic bug and gives users more precise control over the crawling process.

Changes include:
- Modified `config/base_config.py` to replace `ALL_DAY` with `BILI_SEARCH_MODE`.
- Refactored `media_platform/bilibili/core.py` to implement the new search mode logic.
2025-07-13 06:07:13 +08:00
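As a rough illustration of how the three `BILI_SEARCH_MODE` values might translate into crawl limits, consider the sketch below. Only the mode names and the two limit settings come from the commit message; the enum, the function `remaining_quota_for_day`, and in particular the assumption that `normal` is capped by the overall total while `all_in_time_range` is uncapped are hypothetical.

```python
from enum import Enum
from typing import Optional


class BiliSearchMode(str, Enum):
    NORMAL = "normal"
    ALL_IN_TIME_RANGE = "all_in_time_range"
    DAILY_LIMIT_IN_TIME_RANGE = "daily_limit_in_time_range"


def remaining_quota_for_day(
    mode: str,
    crawled_today: int,
    crawled_total: int,
    max_notes_per_day: int,
    crawler_max_notes_count: int,
) -> Optional[int]:
    """Return how many more notes may be crawled today; None means no cap.

    Hypothetical helper illustrating the limit logic, not the project's code.
    """
    selected = BiliSearchMode(mode)  # raises ValueError for an unknown mode
    if selected is BiliSearchMode.DAILY_LIMIT_IN_TIME_RANGE:
        # Both the daily limit and the overall total apply across the range.
        left_today = max_notes_per_day - crawled_today
        left_total = crawler_max_notes_count - crawled_total
        return max(0, min(left_today, left_total))
    if selected is BiliSearchMode.ALL_IN_TIME_RANGE:
        # Assumed: collect as much as possible within the date range.
        return None
    # Assumed for "normal": only the overall total applies.
    return max(0, crawler_max_notes_count - crawled_total)
```

Tracking the overall total separately from the per-day count is what keeps `CRAWLER_MAX_NOTES_COUNT` from being applied once per day, which is the flaw the commit describes.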
gaoxiaobei
e103bfa1f3 Merge branch 'NanmiCoder:main' into main 2025-07-13 05:41:21 +08:00
程序员阿江(Relakkes)
dd8a3f5db8 docs: add a Sponsor 2025-07-12 23:26:30 +08:00
gaoxiaobei
cad9fc7af8 feat: Add daily limit for video/post crawling in Bilibili and base config 2025-07-12 14:50:59 +08:00
程序员阿江(Relakkes)
ec0d29cf0f Merge pull request #642 from cllei12/weibo-search-type
Add a configuration option for selecting the Weibo search type
2025-07-07 15:05:29 +08:00
程序员阿江(Relakkes)
0d21a27b6e Merge pull request #641 from cllei12/main
Update playwright version to support Ubuntu 24.04
2025-07-07 15:03:42 +08:00
Lei Cao
355ed183dd Add a configuration option for selecting the Weibo search type 2025-07-05 22:14:31 +00:00
Lei Cao
eb03a4f68d Update playwright version to support Ubuntu 24.04 2025-07-05 21:17:52 +00:00
程序员阿江(Relakkes)
3cb0e2f91f Merge pull request #637 from Mirza-Samad-Ahmed-Baig/fix/bilibili-creator-videos
refactor(bilibili): process creator videos in batches
2025-07-05 00:13:49 +08:00
mirza-samad-ahmed-baig
7edf3bcc15 refactor(bilibili): process creator videos in batches 2025-07-04 21:04:10 +05:00
程序员阿江(Relakkes)
66a68fbb13 docs: update multi language badges size 2025-07-04 14:15:03 +08:00
程序员阿江(Relakkes)
8dcc540797 Merge pull request #635 from Root-FTW/main
🌐 Add multilingual documentation support (English & Spanish)
2025-07-04 13:54:31 +08:00
Root-FTW
4a110abebb feat: Add language navigation links to all README files
- Add prominent language selection section at the top of each README
- Include flag emojis and clear language indicators (🇨🇳 中文, 🇺🇸 English, 🇪🇸 Español)
- Format as horizontal table for easy scanning and navigation
- Show current language with arrow indicator (← Current/当前/Actual)
- Use relative links that work on both GitHub and local repositories
- Improve discoverability of multilingual documentation
- Consistent navigation across all three language versions
2025-07-03 17:14:41 -07:00
Root-FTW
3b7726365c feat: Add localized README files in English and Spanish
- Add README_en.md: Complete English translation of project documentation
- Add README_es.md: Complete Spanish translation of project documentation
- Maintain exact same structure, formatting, and technical accuracy as original
- Preserve all markdown formatting, links, code examples, and legal disclaimers
- Keep original Chinese README.md unchanged
- Support for English and Spanish-speaking developers while maintaining educational focus
2025-07-03 17:09:08 -07:00