Commit Graph

717 Commits

Author SHA1 Message Date
买定不离手
3a2959d86c feat: Add core files for SQLite database support
- Add async_sqlite_db.py: wrapper for asynchronous SQLite database operations
- Add schema/sqlite_tables.sql: SQLite table schema definitions
- Add schema/sqlite_tables.db: SQLite database file
2025-07-14 03:36:06 +08:00
gaoxiaobei
e91ec750bb feat: Enhance Bilibili crawler with retry logic and robustness
This commit introduces several improvements to enhance the stability and functionality of the Bilibili crawler.

- **Add Retry Logic:** Implement a retry mechanism with exponential backoff when fetching video comments. This makes the crawler more resilient to transient network issues or API errors.
- **Improve Error Handling:** Add a `try...except` block to handle potential `JSONDecodeError` in the Bilibili client, preventing crashes when the API returns an invalid response.
- **Ensure Clean Shutdown:** Refactor `main.py` to use a `try...finally` block, guaranteeing that the crawler and database connections are properly closed on exit, error, or `KeyboardInterrupt`.
- **Update Default Config:** Adjust default configuration values to increase concurrency, enable word cloud generation by default, and refine the Bilibili search mode for more practical usage.
2025-07-13 10:42:15 +08:00
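The retry behaviour described in commit e91ec750bb can be illustrated roughly as follows. This is a minimal sketch, not the code from the commit: the helper name `fetch_with_retry`, its parameters, and the set of exceptions treated as retryable are all assumptions made for illustration.

```python
import asyncio
import json


async def fetch_with_retry(fetch, max_retries=3, base_delay=1.0, sleep=asyncio.sleep):
    """Await the zero-argument coroutine function `fetch`, retrying on failure.

    Hypothetical helper: the name, parameters, and retryable exception list
    are illustrative assumptions, not taken from the crawler's source.
    """
    for attempt in range(max_retries + 1):
        try:
            return await fetch()
        except (ConnectionError, TimeoutError, json.JSONDecodeError):
            # Out of retries: let the caller see the last error.
            if attempt == max_retries:
                raise
            # Exponential backoff: base_delay, 2*base_delay, 4*base_delay, ...
            await sleep(base_delay * (2 ** attempt))
```

The `sleep` parameter exists only so the delay can be swapped out (for example in tests); in normal use the default `asyncio.sleep` applies. A production version would probably also add random jitter to the delay so that concurrent tasks do not retry in lockstep.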
gaoxiaobei
d0d7293926 feat(bilibili): Add flexible search modes and fix limit logic
Refactors the Bilibili keyword search functionality to provide more flexible crawling strategies and corrects a flaw in how crawl limits were applied.

Previously, the `ALL_DAY` boolean flag offered a rigid choice for time-based searching and contained a logical issue where `CRAWLER_MAX_NOTES_COUNT` was incorrectly applied on a per-day basis instead of as an overall total.

This commit introduces the `BILI_SEARCH_MODE` configuration option with three distinct modes:
- `normal`: The default search behavior without time constraints.
- `all_in_time_range`: Maximizes data collection within a specified date range, replicating the original intent of `ALL_DAY=True`.
- `daily_limit_in_time_range`: A new mode that strictly enforces both the daily `MAX_NOTES_PER_DAY` and the total `CRAWLER_MAX_NOTES_COUNT` limits across the entire date range.

This change resolves the limit logic bug and gives users more precise control over the crawling process.

Changes include:
- Modified `config/base_config.py` to replace `ALL_DAY` with `BILI_SEARCH_MODE`.
- Refactored `media_platform/bilibili/core.py` to implement the new search mode logic.
2025-07-13 06:07:13 +08:00
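The limit logic that commit d0d7293926 describes for `daily_limit_in_time_range` can be sketched as below. This is an illustration under stated assumptions, not the implementation in `media_platform/bilibili/core.py`: the function `plan_daily_limits` and its arguments are hypothetical, and the number of results available on each day is passed in as a plain list rather than fetched from the API.

```python
from typing import List


def plan_daily_limits(available_per_day: List[int],
                      max_notes_per_day: int,
                      max_notes_total: int) -> List[int]:
    """Return how many items to take on each day of the date range.

    Hypothetical helper: each day is capped by the per-day limit
    (MAX_NOTES_PER_DAY), and the sum over all days is capped by the
    overall limit (CRAWLER_MAX_NOTES_COUNT).
    """
    taken = []
    remaining = max_notes_total
    for available in available_per_day:
        # Never exceed the day's supply, the daily cap, or what is left
        # of the overall budget.
        count = min(available, max_notes_per_day, remaining)
        taken.append(count)
        remaining -= count
    return taken


print(plan_daily_limits([50, 10, 80], max_notes_per_day=30, max_notes_total=60))
# prints [30, 10, 20]
```

The point of tracking `remaining` across days is that the total limit is applied once over the whole range, rather than being reset for every day as in the behaviour the commit message reports as a bug.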
gaoxiaobei
e103bfa1f3 Merge branch 'NanmiCoder:main' into main 2025-07-13 05:41:21 +08:00
程序员阿江(Relakkes)
dd8a3f5db8 docs: add a Sponsor 2025-07-12 23:26:30 +08:00
gaoxiaobei
cad9fc7af8 feat: Add daily limit for video/post crawling in Bilibili and base config 2025-07-12 14:50:59 +08:00
程序员阿江(Relakkes)
ec0d29cf0f Merge pull request #642 from cllei12/weibo-search-type
Add a configuration option for selecting the Weibo search type
2025-07-07 15:05:29 +08:00
程序员阿江(Relakkes)
0d21a27b6e Merge pull request #641 from cllei12/main
Update playwright version to support Ubuntu 24.04
2025-07-07 15:03:42 +08:00
Lei Cao
355ed183dd Add a configuration option for selecting the Weibo search type 2025-07-05 22:14:31 +00:00
Lei Cao
eb03a4f68d Update playwright version to support Ubuntu 24.04 2025-07-05 21:17:52 +00:00
程序员阿江(Relakkes)
3cb0e2f91f Merge pull request #637 from Mirza-Samad-Ahmed-Baig/fix/bilibili-creator-videos
refactor(bilibili): process creator videos in batches
2025-07-05 00:13:49 +08:00
mirza-samad-ahmed-baig
7edf3bcc15 refactor(bilibili): process creator videos in batches 2025-07-04 21:04:10 +05:00
程序员阿江(Relakkes)
66a68fbb13 docs: update multi language badges size 2025-07-04 14:15:03 +08:00
程序员阿江(Relakkes)
8dcc540797 Merge pull request #635 from Root-FTW/main
🌐 Add multilingual documentation support (English & Spanish)
2025-07-04 13:54:31 +08:00
Root-FTW
4a110abebb feat: Add language navigation links to all README files
- Add prominent language selection section at the top of each README
- Include flag emojis and clear language indicators (🇨🇳 中文, 🇺🇸 English, 🇪🇸 Español)
- Format as horizontal table for easy scanning and navigation
- Show current language with arrow indicator (← Current/当前/Actual)
- Use relative links that work on both GitHub and local repositories
- Improve discoverability of multilingual documentation
- Consistent navigation across all three language versions
2025-07-03 17:14:41 -07:00
Root-FTW
3b7726365c feat: Add localized README files in English and Spanish
- Add README_en.md: Complete English translation of project documentation
- Add README_es.md: Complete Spanish translation of project documentation
- Maintain exact same structure, formatting, and technical accuracy as original
- Preserve all markdown formatting, links, code examples, and legal disclaimers
- Keep original Chinese README.md unchanged
- Support for English and Spanish-speaking developers while maintaining educational focus
2025-07-03 17:09:08 -07:00
程序员阿江(Relakkes)
848df2b491 feat: other platforms support the CDP mode 2025-07-03 17:13:32 +08:00
程序员阿江(Relakkes)
c892c3324c chore: update uv lock file use aliyun pypi 2025-07-03 16:07:07 +08:00
程序员阿江(Relakkes)
452aafeec8 chore: update uv depends source 2025-07-03 15:27:39 +08:00
程序员阿江(Relakkes)
89359aa259 docs: update README.md 2025-07-02 01:20:44 +08:00
程序员阿江(Relakkes)
0514758cff docs: update uv installation documentation 2025-06-29 00:07:13 +08:00
程序员阿江(Relakkes)
e83b2422d9 feat: support Playwright connecting to a local Chrome browser via the CDP protocol
docs: add documentation on managing Python dependencies with uv
2025-06-25 23:22:39 +08:00
程序员阿江(Relakkes)
fbc9788d54 docs: update README.md 2025-06-24 17:44:43 +08:00
Relakkes
71210e3f38 chore: delete unnecessary file 2025-06-20 16:26:02 +08:00
Relakkes
fc45b22963 fix: vitepress path problem 2025-06-20 16:10:44 +08:00
Relakkes
22ec34dca3 docs: add list of sponsors 2025-06-20 16:04:04 +08:00
Relakkes
fd33813f8f feat: add like_count field to bilibili for issue #623 2025-06-20 15:50:38 +08:00
Relakkes
31bcdb191f docs: update README.md 2025-06-16 13:58:09 +08:00
Relakkes
d55d8b1efa feat: Douyin supports obtaining video links and cover images. for issue #620 2025-06-14 23:59:08 +08:00
Relakkes
ed1dc7916a docs: update README.md 2025-06-08 15:56:02 +08:00
程序员阿江(Relakkes)
6323e2d45b Merge pull request #616 from chimeElm/main
Fix CRAWLER_MAX_NOTES_COUNT not taking effect when crawling Xiaohongshu creator posts
2025-06-07 14:43:37 +08:00
chimeElm
26a845581e Update client.py
Fix CRAWLER_MAX_NOTES_COUNT not taking effect when crawling Xiaohongshu creator posts
2025-06-07 02:41:09 +08:00
Relakkes
23c8f8f87b docs: add english license 2025-06-01 23:20:11 +08:00
Relakkes
1e7b950d3e Revert "chore: remove sponsor"
This reverts commit 242c06c345.
2025-05-26 22:35:18 +08:00
Relakkes
242c06c345 chore: remove sponsor 2025-05-25 11:54:38 +08:00
程序员阿江(Relakkes)
ff41faeb00 Merge pull request #608 from Bowenwin/bili_expand
Bili_function_expand
2025-05-22 23:14:58 +08:00
Bowenwin
66843f216a finish_all_for_expand_bili 2025-05-22 22:26:30 +08:00
Bowenwin
59619fff0a finish_all 2025-05-22 22:06:06 +08:00
Bowenwin
44e3d370ff fix_words 2025-05-22 20:31:48 +08:00
程序员阿江(Relakkes)
7ed6621933 Merge pull request #603 from Bowenwin/fix_words
Fix words
2025-05-19 23:16:12 +08:00
Bowenwin
703a6e84cb fix_words 2025-05-19 20:07:20 +08:00
Bowenwin
144b8bec6a fix_words 2025-05-19 20:04:00 +08:00
Bowenwin
a356358c21 get_fans_and_get_followings 2025-05-19 19:57:36 +08:00
Relakkes
654260cbce docs: update README.md 2025-05-13 18:42:58 +08:00
Relakkes
79a9824f6a fix: modify dy schema 2025-04-30 16:47:13 +08:00
Relakkes
67d31bf42a fix: dy update fp params 2025-04-30 13:26:22 +08:00
程序员阿江(Relakkes)
2a41b684ad Merge pull request #590 from 2513502304/main
Enhancement for issue #589
2025-04-20 14:14:55 +08:00
翟持江
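af5a393a7a Update core.py: remove the try-catch statement added by another contributor. That try-catch broke the code's final logic, so it could only crawl the current day's data and never advanced to the next day. (The original logic relies on the existing try-catch catching an exception to move on to the next day; do not add further exception handling or a finally clause to this statement!) 2025-04-19 04:34:24 +08:00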
翟持江
b675547aab Update __init__.py: add extra fields to Bilibili video info, uploader (UP) info, and comment info 2025-04-19 02:29:22 +08:00
翟持江
ec97001451 Update tables.sql 2025-04-19 02:22:22 +08:00