Commit Graph

30 Commits

Author SHA1 Message Date
程序员阿江(Relakkes)
157ddfb21b i18n: translate all Chinese comments, docstrings, and logger messages to English
Comprehensive translation of Chinese text to English across the entire codebase:

- api/: FastAPI server documentation and logger messages
- cache/: Cache abstraction layer comments and docstrings
- database/: Database models and MongoDB store documentation
- media_platform/: All platform crawlers (Bilibili, Douyin, Kuaishou, Tieba, Weibo, Xiaohongshu, Zhihu)
- model/: Data model documentation
- proxy/: Proxy pool and provider documentation
- store/: Data storage layer comments
- tools/: Utility functions and browser automation
- test/: Test file documentation

Preserved: Chinese disclaimer header (lines 10-18) for legal compliance

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-26 23:27:19 +08:00
程序员阿江(Relakkes)
6eef02d08c feat: IP proxy expiration check 2025-11-25 12:39:10 +08:00
程序员阿江(Relakkes)
ff8c92daad chore: add copyright to every file 2025-11-18 12:24:02 +08:00
LePao1
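3954c40e69 feat(bilibili): add a video quality parameter; the quality of downloaded videos can be changed via BILI_QN;
Add a video quality setting to BilibiliClient and improve error handling. Fix download requests that were 302-redirected to a CDN: the old code did not follow redirects and only accepted "OK", so downloads failed. Low-quality links and links that redirect to a CDN now download correctly.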
2025-09-24 12:27:16 +08:00
未来可欺
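6a10d0d11c The original HTTPStatusError cannot catch exception types such as ConnectError and ReadError. This commit changes the caught exception type to HTTPError, the base class for request exceptions in the httpx module, so that any exception raised in the httpx.request method is caught (for example, a banned IP or a server refusing the connection). This correctly implements the logic that an interruption while crawling media does not interrupt text crawling 2025-08-06 11:24:51 +08:00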
未来可欺
81f2dbe4ab Add exception handling for media resource servers; see issue #691 2025-08-05 13:11:00 +08:00
未来可欺
0b81240aed Upgrade httpx to 0.28.1 and change the keyword argument proxies to proxy 2025-07-31 22:48:02 +08:00
未来可欺
93a1c27fff Fix several runtime bugs found by testing search mode, and set a longer timeout for platforms that support media crawling 2025-07-30 21:19:56 +08:00
gaoxiaobei
9fb396c7d1 fix(media_platform): handle edge cases and improve error handling for Bilibili client and crawler
- BilibiliClient:
  - Improve wbi_img_urls handling for better compatibility
  - Add error handling for missing or invalid 'is_end' and 'next' in comment cursor

- BilibiliCrawler:
  - Fix daily limit logic for keyword-based searches
  - Improve logging and break conditions for max notes count limits
  - Ensure proper tracking of total notes crawled for each keyword
2025-07-17 06:40:56 +08:00
gaoxiaobei
e91ec750bb feat: Enhance Bilibili crawler with retry logic and robustness
This commit introduces several improvements to enhance the stability and functionality of the Bilibili crawler.

- **Add Retry Logic:** Implement a retry mechanism with exponential backoff when fetching video comments. This makes the crawler more resilient to transient network issues or API errors.
- **Improve Error Handling:** Add a `try...except` block to handle potential `JSONDecodeError` in the Bilibili client, preventing crashes when the API returns an invalid response.
- **Ensure Clean Shutdown:** Refactor `main.py` to use a `try...finally` block, guaranteeing that the crawler and database connections are properly closed on exit, error, or `KeyboardInterrupt`.
- **Update Default Config:** Adjust default configuration values to increase concurrency, enable word cloud generation by default, and refine the Bilibili search mode for more practical usage.
2025-07-13 10:42:15 +08:00
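For readers unfamiliar with the retry mechanism that e91ec750bb describes, the following is a minimal sketch of retrying an async call with exponential backoff. It is not the code from that commit: `retry_with_backoff`, `flaky_fetch`, `demo`, and all parameter names and defaults are hypothetical, chosen only for illustration, and the jitter is an addition the commit message does not mention.

```python
import asyncio
import random
from typing import Awaitable, Callable, TypeVar

T = TypeVar("T")


async def retry_with_backoff(
    operation: Callable[[], Awaitable[T]],
    max_attempts: int = 3,
    base_delay: float = 1.0,
    max_delay: float = 30.0,
) -> T:
    """Await operation(), retrying failed attempts with exponential backoff.

    Hypothetical helper for illustration; not taken from the repository.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return await operation()
        except Exception:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error to the caller
            # The delay doubles on each attempt (base, 2*base, 4*base, ...),
            # capped at max_delay.
            delay = min(base_delay * 2 ** (attempt - 1), max_delay)
            # A little random jitter keeps concurrent tasks from retrying in lockstep.
            await asyncio.sleep(delay + random.uniform(0, delay * 0.1))
    raise RuntimeError("max_attempts must be at least 1")


async def demo() -> str:
    """Simulate a comment fetch that fails twice before succeeding."""
    calls = {"count": 0}

    async def flaky_fetch() -> str:
        calls["count"] += 1
        if calls["count"] < 3:
            raise ConnectionError("transient failure")
        return "comments page"

    return await retry_with_backoff(flaky_fetch, max_attempts=5, base_delay=0.01)
```

Running `asyncio.run(demo())` exercises two failed attempts followed by a success. A real crawler would probably catch only network or API errors rather than every `Exception`, so that programming errors are not retried.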
Bowenwin
66843f216a finish_all_for_expand_bili 2025-05-22 22:26:30 +08:00
Bowenwin
44e3d370ff fix_words 2025-05-22 20:31:48 +08:00
Bowenwin
a356358c21 get_fans_and_get_followings 2025-05-19 19:57:36 +08:00
翟持江
d2ecd3b11d Update client.py: fix the incorrect request parameters in the `post_data` of `search_video_by_keyword`
Changed `pubtime_begin` to `pubtime_begin_s` and `pubtime_end` to `pubtime_end_s`. Tested.
2025-01-15 18:21:03 +08:00
unknown
7e53c4acfc All_platform_comments_restrict 2024-10-23 16:32:02 +08:00
Relakkes
9fe3e47b0f chore: add a learning-purposes-only notice to the code, strictly prohibiting illegal, commercial, and improper use 2024-10-20 00:43:25 +08:00
Relakkes
aa0f920369 feat: add publish-date filtering to the Bilibili search API 2024-10-17 15:11:25 +08:00
helloteemo
d686d17f9b feat: support bilibili video download 2024-07-15 19:40:17 +08:00
nelzomal
111e08602c feat: support bilibili creator 2024-06-12 16:48:19 +08:00
Nan Zhou
0cad36e17b support bilibili level two comment 2024-05-26 14:10:57 +08:00
Relakkes
87eb8aa6a7 fix: #230 2024-04-13 20:18:04 +08:00
Relakkes
e950e0d6e3 feat: add abstract api client to all platform 2024-03-30 21:27:25 +08:00
Relakkes
aba9f14f50 refactor: standardize log output
feat: crawl Bilibili by specified video ID (bvid)
2023-12-23 01:04:08 +08:00
Relakkes
97d7a0c38b feat: Bilibili comment done 2023-12-09 21:10:01 +08:00
Relakkes
c530bd4219 feat: cache proxy IPs in redis 2023-12-06 23:49:56 +08:00
Relakkes
f71d086464 fix: type annotation issue in the Bilibili get_wbi_keys function 2023-12-05 23:32:35 +08:00
Relakkes
a6e877de42 fix: Bilibili search Field naming bug
refactor: replace the ping interface with pong everywhere
2023-12-05 22:54:47 +08:00
Relakkes
8f04943105 feat: Bilibili comment API 2023-12-04 23:16:02 +08:00
Relakkes
a90b411e68 feat: implement keyword search for the Bilibili crawler 2023-12-03 23:19:02 +08:00
Relakkes
5aeee93fc5 feat: implement request signing for the Bilibili crawler 2023-12-03 00:30:10 +08:00