Commit Graph

256 Commits

Author SHA1 Message Date
ravenling
95c3293b97 fix: 修复zhihu评论爬取分页问题 2026-02-28 15:57:55 +08:00
程序员阿江(Relakkes)
d614ccf247 docs: translate comments and metadata to English
Update Chinese comments, variable descriptions, and metadata across
multiple configuration and core files to English. This improves
codebase accessibility for international developers. Additionally,
removed the sponsorship section from README files.
2026-02-12 05:30:11 +08:00
ouzhuowei
e54463ac78 处理子评论获取失败导致整个流程中断问题
Co-Authored-By: ouzhuowei <190020754@qq.com>
2026-02-10 17:53:30 +08:00
程序员阿江(Relakkes)
c309871485 refactor(xhs): improve login state check logic 2026-02-03 20:49:46 +08:00
程序员阿江(Relakkes)
6625663bde feat: #823 2026-02-03 20:40:15 +08:00
wanzirong
f7d27ab43a feat: 添加命令行参数支持
- 添加 --max_comments_per_post 参数用于控制每个帖子爬取的评论数量
- 添加 --xhs_sort_type 参数用于控制小红书排序方式
- 修复小红书 core.py 中 CRAWLER_MAX_COMMENTS_COUNT_SINGLENOTES 的导入方式
  从直接导入改为通过 config 模块访问,使命令行参数能正确生效
2026-01-21 16:23:47 +08:00
WangXX
1f89713b90 修复拼写错误 2026-01-18 22:22:31 +08:00
程序员阿江(Relakkes)
4de2a325a9 feat: ks comment api upgrade to v2 2026-01-09 21:09:39 +08:00
bear
a59b385615 fix the login status error after scan the QR code 2026-01-09 14:11:47 +08:00
Caelan_Windows
c56b8c4c5d fix(douyin): fetch comments concurrently after each page instead of waiting for all pages
- Moved batch_get_note_comments call inside the pagination loop
- Comments are now fetched immediately after each page of videos is processed
- This allows real-time observation of comment crawling progress
- Improves data availability by not waiting for all video data to be collected first
2026-01-03 01:47:24 +08:00
程序员阿江(Relakkes)
157ddfb21b i18n: translate all Chinese comments, docstrings, and logger messages to English
Comprehensive translation of Chinese text to English across the entire codebase:

- api/: FastAPI server documentation and logger messages
- cache/: Cache abstraction layer comments and docstrings
- database/: Database models and MongoDB store documentation
- media_platform/: All platform crawlers (Bilibili, Douyin, Kuaishou, Tieba, Weibo, Xiaohongshu, Zhihu)
- model/: Data model documentation
- proxy/: Proxy pool and provider documentation
- store/: Data storage layer comments
- tools/: Utility functions and browser automation
- test/: Test file documentation

Preserved: Chinese disclaimer header (lines 10-18) for legal compliance

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-26 23:27:19 +08:00
程序员阿江(Relakkes)
55d8c7783f feat: webo full context support 2025-12-26 19:22:24 +08:00
程序员阿江(Relakkes)
ff1b681311 fix: weibo get note image fixed 2025-12-26 00:47:20 +08:00
MEI
ff9a1624f1 fix: params参数以及路径问题 2025-12-03 10:31:32 +08:00
程序员阿江(Relakkes)
f989ce0788 feat: xhs sign playwright version 2025-11-27 10:53:08 +08:00
程序员阿江(Relakkes)
6eef02d08c feat: ip proxy expired check 2025-11-25 12:39:10 +08:00
程序员阿江(Relakkes)
ff8c92daad chore: add copyright to every file 2025-11-18 12:24:02 +08:00
程序员阿江(Relakkes)
5288bddb42 refactor: weibo search #771 2025-11-17 17:24:47 +08:00
程序员阿江(Relakkes)
6dcfd7e0a5 refactor: weibo login 2025-11-17 17:11:35 +08:00
程序员阿江(Relakkes)
a1c5e07df8 fix: xhs sub comment bugfix #769 2025-11-17 11:47:33 +08:00
程序员阿江(Relakkes)
b6caa7a85e refactor: add xhs creator params 2025-11-10 21:10:03 +08:00
程序员阿江(Relakkes)
1e3637f238 refactor: update xhs note detail 2025-11-10 18:13:51 +08:00
程序员阿江(Relakkes)
b5dab6d1e8 refactor: 使用 xhshow 替代 playwright 签名方案
感谢 @Cloxl/xhshow 开源项目

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-10 18:12:45 +08:00
程序员阿江(Relakkes)
60cbb3e37d fix: weibo container error #568 2025-11-06 19:43:09 +08:00
程序员阿江-Relakkes
05a1782746 Merge pull request #764 from yangtao210/main
新增存储到mongoDB
2025-11-06 06:10:49 -05:00
yt210
ef6948b305 新增存储到mongoDB 2025-11-06 10:40:30 +08:00
程序员阿江(Relakkes)
0074e975dd fix: dy search 2025-11-04 00:14:16 +08:00
程序员阿江(Relakkes)
3f5925e326 feat: update xhs sign 2025-10-27 19:06:07 +08:00
程序员阿江(Relakkes)
ed6e0bfb5f refactor: tieba 改为浏览器获取数据 2025-10-19 17:09:55 +08:00
程序员阿江(Relakkes)
03e384bbe2 refactor: cdp模式下移除stealth注入 2025-10-19 15:32:03 +08:00
程序员阿江(Relakkes)
ae7955787c feat: kuaishou support url link 2025-10-18 07:40:10 +08:00
程序员阿江(Relakkes)
a9dd08680f feat: xhs support creator url link 2025-10-18 07:20:09 +08:00
程序员阿江(Relakkes)
cae707cb2a feat: douyin support url link 2025-10-18 07:00:21 +08:00
程序员阿江(Relakkes)
906c259cc7 feat: bilibili support url link 2025-10-18 06:30:20 +08:00
程序员阿江(Relakkes)
2cf143cc7c fix: #730 2025-09-26 18:10:30 +08:00
LePao1
3954c40e69 feat(bilibili):增加视频清晰度参数,可以通过BILI_QN更改下载的视频清晰度;
在 BilibiliClient 中添加视频质量配置并改进错误处理,修复下载请求被 302 重定向到 CDN,旧代码未跟随重定向且只接受 “OK” ,导致失败,现在即便是低清晰度/CDN 跳转的链接也能正常下载。
2025-09-24 12:27:16 +08:00
刘小龙
c87df59996 log client modify 2025-09-09 15:27:46 +08:00
程序员阿江(Relakkes)
2bce3593f7 feat: support time deplay for all platform 2025-09-02 16:43:09 +08:00
程序员阿江(Relakkes)
eb799e1fa7 refactor: xhs extractor 2025-09-02 14:50:32 +08:00
未来可欺
6a10d0d11c 原始的HTTPStatusError不能捕获像ConnectError、ReadError这些异常类型,本次提交修改了捕获异常的类型为httpx模块请求异常的基类:HTTPError,以便捕获在httpx.request方法中引发的任何异常(例如ip被封,服务器拒接连接),正确处理爬取媒体被中断时并不会导致爬取文本的中断逻辑 2025-08-06 11:24:51 +08:00
未来可欺
81f2dbe4ab 添加了对媒体资源服务器的异常处理,参见 issue #691 2025-08-05 13:11:00 +08:00
程序员阿江(Relakkes)
b9d30bbabb fix: #693 2025-08-01 15:55:21 +08:00
未来可欺
a6fd9ebdbc 简单更改了抖音保存图片与视频的命名方式,一个视频 id 仅对应一个短视频,返回一个 video_download_url,因此不需要使用数字方式进行命名 2025-07-31 23:11:45 +08:00
未来可欺
0b81240aed 升级 httpx 版本至 0.28.1,并修改关键字参数 proxies 至 proxy 2025-07-31 22:48:02 +08:00
未来可欺
9d90e9fc6d fix issue #689,目前来看,应该是 httpx 库的问题,因为无论是使用同步还是异步版本,构不构造 httpx.***Client 对象来发起请求,返回的响应都是为空,response.content = b'',response.text = ’‘,但换成 requests 库就能正常获取数据了 2025-07-31 22:01:48 +08:00
未来可欺
93a1c27fff 通过测试search模式,修复部分运行时的bug,并对能够爬取媒体的平台设置了较长的超时时间 2025-07-30 21:19:56 +08:00
未来可欺
173bc08a9d 添加了抖音存储视频以及图片的逻辑,并将config.py中ENABLE_GET_IMAGES参数更名为ENABLE_GET_MEIDAS,在此基础上略微修改存储逻辑 2025-07-30 18:24:08 +08:00
korruz
07a6e387ea refactor: move format_proxy_info to utils and update crawler classes to use it 2025-07-29 14:16:24 +08:00
程序员阿江(Relakkes)
fc06c783f5 fix: fixed xhs req headers 2025-07-23 13:28:58 +08:00
程序员阿江(Relakkes)
a4d9aaa34a refactor: xhs update 2025-07-21 21:26:16 +08:00