wanzirong
f7d27ab43a
feat: add command-line argument support
...
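- Add the --max_comments_per_post parameter to control how many comments are crawled per post
- Add the --xhs_sort_type parameter to control the Xiaohongshu sort order
- Fix how CRAWLER_MAX_COMMENTS_COUNT_SINGLENOTES is imported in the Xiaohongshu core.py:
access it through the config module instead of importing it directly, so that command-line arguments take effect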
2026-01-21 16:23:47 +08:00
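The import fix in commit f7d27ab43a rests on a general Python behavior: `from module import NAME` copies the value once at import time, so a later override of `module.NAME` is not seen by the importing code. The sketch below illustrates this under stated assumptions: the `config` object is a hypothetical stand-in built with `types.SimpleNamespace`, and the helper function names are invented for illustration, not taken from the project.

```python
import argparse
import types

# Hypothetical stand-in for the project's config module.
config = types.SimpleNamespace(CRAWLER_MAX_COMMENTS_COUNT_SINGLENOTES=10)

# Equivalent of `from config import CRAWLER_MAX_COMMENTS_COUNT_SINGLENOTES`:
# the value is bound to a separate name once, at import time.
IMPORTED_LIMIT = config.CRAWLER_MAX_COMMENTS_COUNT_SINGLENOTES


def limit_via_direct_import() -> int:
    """Reads the copy made at import time; later overrides are invisible."""
    return IMPORTED_LIMIT


def limit_via_config_module() -> int:
    """Looks the attribute up on every call, so overrides are visible."""
    return config.CRAWLER_MAX_COMMENTS_COUNT_SINGLENOTES


def apply_cli_args(argv: list) -> None:
    """Parse the flag named in the commit and override the config value."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--max_comments_per_post", type=int, default=None)
    args = parser.parse_args(argv)
    if args.max_comments_per_post is not None:
        config.CRAWLER_MAX_COMMENTS_COUNT_SINGLENOTES = args.max_comments_per_post


apply_cli_args(["--max_comments_per_post", "50"])
print(limit_via_direct_import())  # 10: the stale import-time copy
print(limit_via_config_module())  # 50: the command-line override
```

Reading the attribute through the module at call time is what lets a value set after argument parsing reach the crawler.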
WangXX
1f89713b90
Fix spelling errors
2026-01-18 22:22:31 +08:00
程序员阿江(Relakkes)
4de2a325a9
feat: ks comment api upgrade to v2
2026-01-09 21:09:39 +08:00
bear
a59b385615
fix the login status error after scanning the QR code
2026-01-09 14:11:47 +08:00
Caelan_Windows
c56b8c4c5d
fix(douyin): fetch comments concurrently after each page instead of waiting for all pages
...
- Moved batch_get_note_comments call inside the pagination loop
- Comments are now fetched immediately after each page of videos is processed
- This allows real-time observation of comment crawling progress
- Improves data availability by not waiting for all video data to be collected first
2026-01-03 01:47:24 +08:00
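The change described in commit c56b8c4c5d can be sketched as follows: comments for each page of videos are fetched inside the pagination loop, concurrently within the page, instead of after all pages have been collected. This is a minimal sketch, not the project's implementation. Only the name `batch_get_note_comments` comes from the commit message; its signature here, `fetch_video_page`, the `events` list, and the fake data are all assumptions made for illustration.

```python
import asyncio


async def fetch_video_page(page: int) -> list:
    """Hypothetical stand-in: return video IDs for one page, empty when done."""
    await asyncio.sleep(0)
    if page > 2:
        return []
    return [f"video-{page}-{i}" for i in range(2)]


async def batch_get_note_comments(video_ids: list, events: list) -> None:
    """Fetch comments for all videos of one page concurrently (simulated)."""

    async def fetch_one(video_id: str) -> None:
        await asyncio.sleep(0)  # placeholder for the real comment request
        events.append(f"comments:{video_id}")

    await asyncio.gather(*(fetch_one(v) for v in video_ids))


async def crawl(events: list) -> list:
    page = 1
    while True:
        video_ids = await fetch_video_page(page)
        if not video_ids:
            break
        events.append(f"page:{page}")
        # Inside the loop: comments are fetched right after each page.
        await batch_get_note_comments(video_ids, events)
        page += 1
    return events


events = asyncio.run(crawl([]))
print(events)
```

With this ordering, comment data for the first page is available before the second page of videos is requested, which matches the progress-visibility goal stated in the commit.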
程序员阿江(Relakkes)
157ddfb21b
i18n: translate all Chinese comments, docstrings, and logger messages to English
...
Comprehensive translation of Chinese text to English across the entire codebase:
- api/: FastAPI server documentation and logger messages
- cache/: Cache abstraction layer comments and docstrings
- database/: Database models and MongoDB store documentation
- media_platform/: All platform crawlers (Bilibili, Douyin, Kuaishou, Tieba, Weibo, Xiaohongshu, Zhihu)
- model/: Data model documentation
- proxy/: Proxy pool and provider documentation
- store/: Data storage layer comments
- tools/: Utility functions and browser automation
- test/: Test file documentation
Preserved: Chinese disclaimer header (lines 10-18) for legal compliance
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-26 23:27:19 +08:00
程序员阿江(Relakkes)
55d8c7783f
feat: weibo full context support
2025-12-26 19:22:24 +08:00
程序员阿江(Relakkes)
ff1b681311
fix: weibo get note image fixed
2025-12-26 00:47:20 +08:00
MEI
ff9a1624f1
fix: params argument and path issues
2025-12-03 10:31:32 +08:00
程序员阿江(Relakkes)
f989ce0788
feat: xhs sign playwright version
2025-11-27 10:53:08 +08:00
程序员阿江(Relakkes)
6eef02d08c
feat: ip proxy expired check
2025-11-25 12:39:10 +08:00
程序员阿江(Relakkes)
ff8c92daad
chore: add copyright to every file
2025-11-18 12:24:02 +08:00
程序员阿江(Relakkes)
5288bddb42
refactor: weibo search #771
2025-11-17 17:24:47 +08:00
程序员阿江(Relakkes)
6dcfd7e0a5
refactor: weibo login
2025-11-17 17:11:35 +08:00
程序员阿江(Relakkes)
a1c5e07df8
fix: xhs sub comment bugfix #769
2025-11-17 11:47:33 +08:00
程序员阿江(Relakkes)
b6caa7a85e
refactor: add xhs creator params
2025-11-10 21:10:03 +08:00
程序员阿江(Relakkes)
1e3637f238
refactor: update xhs note detail
2025-11-10 18:13:51 +08:00
程序员阿江(Relakkes)
b5dab6d1e8
refactor: use xhshow in place of the playwright signing scheme
...
Thanks to the open-source project @Cloxl/xhshow
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-10 18:12:45 +08:00
程序员阿江(Relakkes)
60cbb3e37d
fix: weibo container error #568
2025-11-06 19:43:09 +08:00
程序员阿江-Relakkes
05a1782746
Merge pull request #764 from yangtao210/main
...
Add storage to MongoDB
2025-11-06 06:10:49 -05:00
yt210
ef6948b305
Add storage to MongoDB
2025-11-06 10:40:30 +08:00
程序员阿江(Relakkes)
0074e975dd
fix: dy search
2025-11-04 00:14:16 +08:00
程序员阿江(Relakkes)
3f5925e326
feat: update xhs sign
2025-10-27 19:06:07 +08:00
程序员阿江(Relakkes)
ed6e0bfb5f
refactor: tieba now fetches data through the browser
2025-10-19 17:09:55 +08:00
程序员阿江(Relakkes)
03e384bbe2
refactor: remove stealth injection in CDP mode
2025-10-19 15:32:03 +08:00
程序员阿江(Relakkes)
ae7955787c
feat: kuaishou support url link
2025-10-18 07:40:10 +08:00
程序员阿江(Relakkes)
a9dd08680f
feat: xhs support creator url link
2025-10-18 07:20:09 +08:00
程序员阿江(Relakkes)
cae707cb2a
feat: douyin support url link
2025-10-18 07:00:21 +08:00
程序员阿江(Relakkes)
906c259cc7
feat: bilibili support url link
2025-10-18 06:30:20 +08:00
程序员阿江(Relakkes)
2cf143cc7c
fix : #730
2025-09-26 18:10:30 +08:00
LePao1
3954c40e69
feat(bilibili): add a video quality parameter; the quality of downloaded videos can be changed via BILI_QN
...
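Add a video quality setting to BilibiliClient and improve error handling. Fix download failures that occurred when a request was 302-redirected to a CDN: the old code did not follow redirects and accepted only "OK". Downloads now succeed even for low-quality links and links that redirect to a CDN.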
2025-09-24 12:27:16 +08:00
刘小龙
c87df59996
log client modify
2025-09-09 15:27:46 +08:00
程序员阿江(Relakkes)
2bce3593f7
feat: support time delay for all platforms
2025-09-02 16:43:09 +08:00
程序员阿江(Relakkes)
eb799e1fa7
refactor: xhs extractor
2025-09-02 14:50:32 +08:00
未来可欺
6a10d0d11c
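The original HTTPStatusError cannot catch exception types such as ConnectError and ReadError. This commit changes the caught exception type to HTTPError, the base class of httpx request exceptions, so that any exception raised in httpx.request is caught (for example, a banned IP or a server refusing the connection), and an interrupted media download no longer interrupts text crawling.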
2025-08-06 11:24:51 +08:00
未来可欺
81f2dbe4ab
Add exception handling for media resource servers; see issue #691
2025-08-05 13:11:00 +08:00
程序员阿江(Relakkes)
b9d30bbabb
fix : #693
2025-08-01 15:55:21 +08:00
未来可欺
a6fd9ebdbc
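Slightly change how saved Douyin images and videos are named: one video ID corresponds to exactly one short video and returns a single video_download_url, so numeric naming is unnecessary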
2025-07-31 23:11:45 +08:00
未来可欺
0b81240aed
Upgrade httpx to 0.28.1 and change the keyword argument proxies to proxy
2025-07-31 22:48:02 +08:00
未来可欺
9d90e9fc6d
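fix issue #689. For now this looks like a problem in the httpx library: whether the sync or async version is used, and whether or not an httpx.***Client object is constructed to send the request, the response comes back empty (response.content = b'', response.text = ''), whereas switching to the requests library fetches the data normally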
2025-07-31 22:01:48 +08:00
未来可欺
93a1c27fff
Fix several runtime bugs found by testing search mode, and set a longer timeout for platforms that can crawl media
2025-07-30 21:19:56 +08:00
未来可欺
173bc08a9d
Add logic for storing Douyin videos and images, rename the ENABLE_GET_IMAGES parameter in config.py to ENABLE_GET_MEIDAS, and adjust the storage logic slightly on that basis
2025-07-30 18:24:08 +08:00
korruz
07a6e387ea
refactor: move format_proxy_info to utils and update crawler classes to use it
2025-07-29 14:16:24 +08:00
程序员阿江(Relakkes)
fc06c783f5
fix: fixed xhs req headers
2025-07-23 13:28:58 +08:00
程序员阿江(Relakkes)
a4d9aaa34a
refactor: xhs update
2025-07-21 21:26:16 +08:00
程序员阿江(Relakkes)
13b00f7a36
refactor: config update
2025-07-18 23:26:52 +08:00
gaoxiaobei
8105b053ed
Merge remote-tracking branch 'origin/dev' into devdev
2025-07-18 17:37:29 +08:00
gaoxiaobei
7176956e51
Merge branch 'NanmiCoder:main' into dev
2025-07-18 17:32:04 +08:00
gaoxiaobei
b913db64bb
refactor(config): move platform-specific configs to separate files
...
- Remove platform-specific configurations from base_config.py
- Create separate config files for each platform in their respective directories
- Update import statements in core files to use new platform-specific config modules
- Clean up unused and deprecated configuration options
2025-07-18 17:27:37 +08:00
chenfangliang
aa54dad9a5
feat: fix missing geolocation in Douyin second-level comments
2025-07-18 10:48:43 +08:00