MediaCrawler

mirror of https://github.com/NanmiCoder/MediaCrawler.git synced 2026-02-26 18:20:47 +08:00

Author	SHA1	Message	Date
程序员阿江(Relakkes)	157ddfb21b	i18n: translate all Chinese comments, docstrings, and logger messages to English Comprehensive translation of Chinese text to English across the entire codebase: - api/: FastAPI server documentation and logger messages - cache/: Cache abstraction layer comments and docstrings - database/: Database models and MongoDB store documentation - media_platform/: All platform crawlers (Bilibili, Douyin, Kuaishou, Tieba, Weibo, Xiaohongshu, Zhihu) - model/: Data model documentation - proxy/: Proxy pool and provider documentation - store/: Data storage layer comments - tools/: Utility functions and browser automation - test/: Test file documentation Preserved: Chinese disclaimer header (lines 10-18) for legal compliance 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2025-12-26 23:27:19 +08:00
程序员阿江(Relakkes)	6eef02d08c	feat: ip proxy expired check	2025-11-25 12:39:10 +08:00
程序员阿江(Relakkes)	ff8c92daad	chore: add copyright to every file	2025-11-18 12:24:02 +08:00
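LePao1	3954c40e69	feat(bilibili): add a video quality parameter; the quality of downloaded videos can be changed via `BILI_QN`. Add video quality configuration to BilibiliClient and improve error handling. Fix download requests that were 302-redirected to a CDN: the old code did not follow redirects and accepted only "OK", which caused failures. Low-quality and CDN-redirected links now download correctly.	2025-09-24 12:27:16 +08:00
未来可欺	6a10d0d11c	The original HTTPStatusError could not catch exception types such as ConnectError and ReadError. This commit changes the caught exception type to HTTPError, the base class for httpx request exceptions, so that any exception raised in httpx.request (for example, a banned IP or a server refusing the connection) is caught, and an interrupted media crawl no longer interrupts the text crawl.	2025-08-06 11:24:51 +08:00
未来可欺	81f2dbe4ab	Add exception handling for media resource servers; see issue #691	2025-08-05 13:11:00 +08:00
未来可欺	0b81240aed	Upgrade httpx to 0.28.1 and change the keyword argument proxies to proxy	2025-07-31 22:48:02 +08:00
未来可欺	93a1c27fff	Test search mode, fix some runtime bugs, and set a longer timeout for platforms that can crawl media	2025-07-30 21:19:56 +08:00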
gaoxiaobei	9fb396c7d1	fix(media_platform): handle edge cases and improve error handling for Bilibili client and crawler - BilibiliClient: - Improve wbi_img_urls handling for better compatibility - Add error handling for missing or invalid 'is_end' and 'next' in comment cursor - BilibiliCrawler: - Fix daily limit logic for keyword-based searches - Improve logging and break conditions for max notes count limits - Ensure proper tracking of total notes crawled for each keyword	2025-07-17 06:40:56 +08:00
gaoxiaobei	e91ec750bb	feat: Enhance Bilibili crawler with retry logic and robustness This commit introduces several improvements to enhance the stability and functionality of the Bilibili crawler. - Add Retry Logic: Implement a retry mechanism with exponential backoff when fetching video comments. This makes the crawler more resilient to transient network issues or API errors. - Improve Error Handling: Add a `try...except` block to handle potential `JSONDecodeError` in the Bilibili client, preventing crashes when the API returns an invalid response. - Ensure Clean Shutdown: Refactor `main.py` to use a `try...finally` block, guaranteeing that the crawler and database connections are properly closed on exit, error, or `KeyboardInterrupt`. - Update Default Config: Adjust default configuration values to increase concurrency, enable word cloud generation by default, and refine the Bilibili search mode for more practical usage.	2025-07-13 10:42:15 +08:00
Bowenwin	66843f216a	finish_all_for_expand_bili	2025-05-22 22:26:30 +08:00
Bowenwin	44e3d370ff	fix_words	2025-05-22 20:31:48 +08:00
Bowenwin	a356358c21	get_fans_and_get_followings	2025-05-19 19:57:36 +08:00
翟持江	d2ecd3b11d	Update client.py: fix the incorrect request parameters in `post_data` of `search_video_by_keyword`; `pubtime_begin` is changed to `pubtime_begin_s` and `pubtime_end` to `pubtime_end_s`. Tested.	2025-01-15 18:21:03 +08:00
unknown	7e53c4acfc	All_platform_comments_restrict	2024-10-23 16:32:02 +08:00
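Relakkes	9fe3e47b0f	chore: add a code-learning statement strictly prohibiting illegal, commercial, and improper use	2024-10-20 00:43:25 +08:00
Relakkes	aa0f920369	feat: add publish-date filtering to the Bilibili search API	2024-10-17 15:11:25 +08:00
helloteemo	d686d17f9b	feat: support Bilibili video download	2024-07-15 19:40:17 +08:00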
nelzomal	111e08602c	feat: support bilibili creator	2024-06-12 16:48:19 +08:00
Nan Zhou	0cad36e17b	support bilibili level two comment	2024-05-26 14:10:57 +08:00
Relakkes	87eb8aa6a7	fix: #230	2024-04-13 20:18:04 +08:00
Relakkes	e950e0d6e3	feat: add abstract api client to all platform	2024-03-30 21:27:25 +08:00
Relakkes	aba9f14f50	refactor: standardize log output feat: crawl specified Bilibili videos by ID (bvid)	2023-12-23 01:04:08 +08:00
Relakkes	97d7a0c38b	feat: Bilibili comment done	2023-12-09 21:10:01 +08:00
Relakkes	c530bd4219	feat: cache proxy IPs in Redis	2023-12-06 23:49:56 +08:00
Relakkes	f71d086464	fix: type annotation issue in the Bilibili get_wbi_keys function	2023-12-05 23:32:35 +08:00
Relakkes	a6e877de42	fix: Bilibili search Field naming bug refactor: rename the ping endpoint to pong everywhere	2023-12-05 22:54:47 +08:00
Relakkes	8f04943105	feat: Bilibili comment API	2023-12-04 23:16:02 +08:00
Relakkes	a90b411e68	feat: implement keyword search for the Bilibili crawler	2023-12-03 23:19:02 +08:00
Relakkes	5aeee93fc5	feat: implement request signing for the Bilibili crawler	2023-12-03 00:30:10 +08:00

30 Commits