ravenling
95c3293b97
fix: fix zhihu comment crawling pagination
2026-02-28 15:57:55 +08:00
程序员阿江(Relakkes)
d614ccf247
docs: translate comments and metadata to English
Update Chinese comments, variable descriptions, and metadata across
multiple configuration and core files to English. This improves
codebase accessibility for international developers. Additionally,
remove the sponsorship section from the README files.
2026-02-12 05:30:11 +08:00
ouzhuowei
e54463ac78
Handle sub-comment fetch failures that interrupted the entire crawling flow
Co-Authored-By: ouzhuowei <190020754@qq.com>
2026-02-10 17:53:30 +08:00
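A minimal sketch of one way to keep a single failed sub-comment request from aborting the whole crawl, assuming one async fetch call per root comment. The names `fetch_all_sub_comments` and `fake_fetch` are hypothetical and do not come from the project; the actual change in e54463ac78 may be structured differently.

```python
import asyncio
import logging
from typing import Awaitable, Callable, Dict, List

logger = logging.getLogger(__name__)


async def fetch_all_sub_comments(
    comment_ids: List[str],
    fetch_sub_comments: Callable[[str], Awaitable[List[dict]]],
) -> Dict[str, List[dict]]:
    """Fetch sub-comments for each root comment (hypothetical helper).

    A failure for one root comment is logged and recorded as an empty list,
    so the remaining comments are still processed.
    """
    results: Dict[str, List[dict]] = {}
    for comment_id in comment_ids:
        try:
            results[comment_id] = await fetch_sub_comments(comment_id)
        except Exception as exc:  # isolate the failure to this one comment
            logger.warning("sub-comment fetch failed for %s: %r", comment_id, exc)
            results[comment_id] = []
    return results


# Usage: a fake fetcher that fails for one of three comments.
async def fake_fetch(comment_id: str) -> List[dict]:
    if comment_id == "c2":
        raise RuntimeError("simulated API error")
    return [{"id": f"{comment_id}-reply"}]


result = asyncio.run(fetch_all_sub_comments(["c1", "c2", "c3"], fake_fetch))
print(result)
```

Catching a broad `Exception` is a trade-off: it keeps the crawl alive, but unexpected bugs only surface in the log.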
程序员阿江(Relakkes)
c309871485
refactor(xhs): improve login state check logic
2026-02-03 20:49:46 +08:00
程序员阿江(Relakkes)
6625663bde
feat: #823
2026-02-03 20:40:15 +08:00
wanzirong
f7d27ab43a
feat: add command-line argument support
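- Add the --max_comments_per_post argument to control how many comments are crawled per post
- Add the --xhs_sort_type argument to control the Xiaohongshu sort order
- Fix how CRAWLER_MAX_COMMENTS_COUNT_SINGLENOTES is imported in the Xiaohongshu core.py:
  switch from a direct import to access through the config module so that command-line arguments take effect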
2026-01-21 16:23:47 +08:00
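The config-module fix in f7d27ab43a relies on standard Python import semantics: `from config import NAME` binds the importing module's own name to the value at import time, so a later runtime override of `config.NAME` (for example, from a parsed CLI flag) is not seen, whereas `config.NAME` is looked up on every access. The self-contained sketch below demonstrates this; the stand-in `config` module, its default of 10, and the mapping from `--max_comments_per_post` onto `CRAWLER_MAX_COMMENTS_COUNT_SINGLENOTES` are assumptions for illustration, not the project's actual wiring.

```python
import argparse
import sys
import types

# Hypothetical stand-in for the project's config module and default value.
config = types.ModuleType("config")
config.CRAWLER_MAX_COMMENTS_COUNT_SINGLENOTES = 10
sys.modules["config"] = config

# Direct import: binds a local name to the value as it is right now.
from config import CRAWLER_MAX_COMMENTS_COUNT_SINGLENOTES


def parse_cli(argv):
    parser = argparse.ArgumentParser()
    parser.add_argument("--max_comments_per_post", type=int)
    args = parser.parse_args(argv)
    if args.max_comments_per_post is not None:
        # Override the attribute on the module object at runtime (assumed mapping).
        config.CRAWLER_MAX_COMMENTS_COUNT_SINGLENOTES = args.max_comments_per_post
    return args


parse_cli(["--max_comments_per_post", "50"])

stale = CRAWLER_MAX_COMMENTS_COUNT_SINGLENOTES         # still the import-time value
live = config.CRAWLER_MAX_COMMENTS_COUNT_SINGLENOTES   # reflects the CLI override
print(stale, live)
```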
WangXX
1f89713b90
Fix spelling errors
2026-01-18 22:22:31 +08:00
程序员阿江(Relakkes)
4de2a325a9
feat: ks comment api upgrade to v2
2026-01-09 21:09:39 +08:00
bear
a59b385615
fix the login status error after scanning the QR code
2026-01-09 14:11:47 +08:00
Caelan_Windows
c56b8c4c5d
fix(douyin): fetch comments concurrently after each page instead of waiting for all pages
- Moved batch_get_note_comments call inside the pagination loop
- Comments are now fetched immediately after each page of videos is processed
- This allows real-time observation of comment crawling progress
- Improves data availability by not waiting for all video data to be collected first
2026-01-03 01:47:24 +08:00
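The loop change described in c56b8c4c5d can be sketched as follows, assuming one async call per search-result page and one batch comment call per page. `crawl_search`, `search_page`, and `batch_get_comments` are hypothetical stand-ins; in the project the comment call is `batch_get_note_comments`, whose exact signature is not shown here.

```python
import asyncio
from typing import Awaitable, Callable, List


async def crawl_search(
    pages: int,
    search_page: Callable[[int], Awaitable[List[str]]],
    batch_get_comments: Callable[[List[str]], Awaitable[None]],
) -> List[str]:
    """Crawl search pages, fetching comments right after each page."""
    all_ids: List[str] = []
    for page in range(1, pages + 1):
        video_ids = await search_page(page)
        all_ids.extend(video_ids)
        # Fetch this page's comments now instead of after the loop finishes,
        # so comment data becomes available as each page is processed.
        await batch_get_comments(video_ids)
    return all_ids


# Usage: record the call order to show that pages and comment batches interleave.
events: list = []


async def fake_search(page: int) -> List[str]:
    events.append(("page", page))
    return [f"v{page}a", f"v{page}b"]


async def fake_comments(video_ids: List[str]) -> None:
    events.append(("comments", tuple(video_ids)))


ids = asyncio.run(crawl_search(2, fake_search, fake_comments))
print(events)
```

The sketch only shows where the call moves; inside the real batch call, individual videos may still be fetched concurrently.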
程序员阿江(Relakkes)
157ddfb21b
i18n: translate all Chinese comments, docstrings, and logger messages to English
Comprehensive translation of Chinese text to English across the entire codebase:
- api/: FastAPI server documentation and logger messages
- cache/: Cache abstraction layer comments and docstrings
- database/: Database models and MongoDB store documentation
- media_platform/: All platform crawlers (Bilibili, Douyin, Kuaishou, Tieba, Weibo, Xiaohongshu, Zhihu)
- model/: Data model documentation
- proxy/: Proxy pool and provider documentation
- store/: Data storage layer comments
- tools/: Utility functions and browser automation
- test/: Test file documentation
Preserved: Chinese disclaimer header (lines 10-18) for legal compliance
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-26 23:27:19 +08:00
程序员阿江(Relakkes)
55d8c7783f
feat: weibo full context support
2025-12-26 19:22:24 +08:00
程序员阿江(Relakkes)
ff1b681311
fix: fix weibo note image fetching
2025-12-26 00:47:20 +08:00
MEI
ff9a1624f1
fix: params argument and path issues
2025-12-03 10:31:32 +08:00
程序员阿江(Relakkes)
f989ce0788
feat: xhs sign playwright version
2025-11-27 10:53:08 +08:00
程序员阿江(Relakkes)
6eef02d08c
feat: IP proxy expiry check
2025-11-25 12:39:10 +08:00
程序员阿江(Relakkes)
ff8c92daad
chore: add copyright to every file
2025-11-18 12:24:02 +08:00
程序员阿江(Relakkes)
5288bddb42
refactor: weibo search #771
2025-11-17 17:24:47 +08:00
程序员阿江(Relakkes)
6dcfd7e0a5
refactor: weibo login
2025-11-17 17:11:35 +08:00
程序员阿江(Relakkes)
a1c5e07df8
fix: xhs sub comment bugfix #769
2025-11-17 11:47:33 +08:00
程序员阿江(Relakkes)
b6caa7a85e
refactor: add xhs creator params
2025-11-10 21:10:03 +08:00
程序员阿江(Relakkes)
1e3637f238
refactor: update xhs note detail
2025-11-10 18:13:51 +08:00
程序员阿江(Relakkes)
b5dab6d1e8
refactor: replace the playwright signing approach with xhshow
Thanks to the @Cloxl/xhshow open-source project
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-10 18:12:45 +08:00
程序员阿江(Relakkes)
60cbb3e37d
fix: weibo container error #568
2025-11-06 19:43:09 +08:00
程序员阿江-Relakkes
05a1782746
Merge pull request #764 from yangtao210/main
Add MongoDB storage
2025-11-06 06:10:49 -05:00
yt210
ef6948b305
Add MongoDB storage
2025-11-06 10:40:30 +08:00
程序员阿江(Relakkes)
0074e975dd
fix: dy search
2025-11-04 00:14:16 +08:00
程序员阿江(Relakkes)
3f5925e326
feat: update xhs sign
2025-10-27 19:06:07 +08:00
程序员阿江(Relakkes)
ed6e0bfb5f
refactor: tieba now fetches data through the browser
2025-10-19 17:09:55 +08:00
程序员阿江(Relakkes)
03e384bbe2
refactor: remove stealth injection in CDP mode
2025-10-19 15:32:03 +08:00
程序员阿江(Relakkes)
ae7955787c
feat: kuaishou support url link
2025-10-18 07:40:10 +08:00
程序员阿江(Relakkes)
a9dd08680f
feat: xhs support creator url link
2025-10-18 07:20:09 +08:00
程序员阿江(Relakkes)
cae707cb2a
feat: douyin support url link
2025-10-18 07:00:21 +08:00
程序员阿江(Relakkes)
906c259cc7
feat: bilibili support url link
2025-10-18 06:30:20 +08:00
程序员阿江(Relakkes)
2cf143cc7c
fix: #730
2025-09-26 18:10:30 +08:00
LePao1
3954c40e69
feat(bilibili): add a video quality parameter; the quality of downloaded videos can be changed via BILI_QN
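Add video quality configuration to BilibiliClient and improve error handling. This fixes download requests that were 302-redirected to a CDN: the old code did not follow redirects and accepted only "OK", so those downloads failed. Low-quality links and CDN-redirected links now download correctly.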
2025-09-24 12:27:16 +08:00
刘小龙
c87df59996
log client modify
2025-09-09 15:27:46 +08:00
程序员阿江(Relakkes)
2bce3593f7
feat: support time delay for all platforms
2025-09-02 16:43:09 +08:00
程序员阿江(Relakkes)
eb799e1fa7
refactor: xhs extractor
2025-09-02 14:50:32 +08:00
未来可欺
6a10d0d11c
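The original handler caught only HTTPStatusError, which cannot catch exception types such as ConnectError and ReadError. This commit changes the caught exception type to HTTPError, the base class of httpx request exceptions, so that any exception raised by httpx.request (for example, when the IP is banned or the server refuses the connection) is caught, and an interrupted media crawl no longer interrupts the text crawl.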
2025-08-06 11:24:51 +08:00
未来可欺
81f2dbe4ab
Add exception handling for media resource servers; see issue #691
2025-08-05 13:11:00 +08:00
程序员阿江(Relakkes)
b9d30bbabb
fix: #693
2025-08-01 15:55:21 +08:00
未来可欺
a6fd9ebdbc
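Slightly change how saved Douyin images and videos are named: one video ID corresponds to exactly one short video and returns a single video_download_url, so numeric naming is unnecessary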
2025-07-31 23:11:45 +08:00
未来可欺
0b81240aed
Upgrade httpx to 0.28.1 and rename the keyword argument proxies to proxy
2025-07-31 22:48:02 +08:00
未来可欺
9d90e9fc6d
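fix issue #689. So far this appears to be a problem in the httpx library: whether the sync or async version is used, and whether or not an httpx.***Client object is constructed to send the request, the response comes back empty (response.content = b'', response.text = ''), whereas switching to the requests library retrieves the data normally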
2025-07-31 22:01:48 +08:00
未来可欺
93a1c27fff
Fix some runtime bugs found while testing search mode, and set longer timeouts for platforms that can crawl media
2025-07-30 21:19:56 +08:00
未来可欺
173bc08a9d
Add logic for storing Douyin videos and images, rename the ENABLE_GET_IMAGES parameter in config.py to ENABLE_GET_MEIDAS, and slightly adjust the storage logic accordingly
2025-07-30 18:24:08 +08:00
korruz
07a6e387ea
refactor: move format_proxy_info to utils and update crawler classes to use it
2025-07-29 14:16:24 +08:00
程序员阿江(Relakkes)
fc06c783f5
fix: fixed xhs req headers
2025-07-23 13:28:58 +08:00
程序员阿江(Relakkes)
a4d9aaa34a
refactor: xhs update
2025-07-21 21:26:16 +08:00