Commit Graph

83 Commits

Author SHA1 Message Date
WangXX
1f89713b90 Fix spelling errors 2026-01-18 22:22:31 +08:00
Caelan_Windows
c56b8c4c5d fix(douyin): fetch comments concurrently after each page instead of waiting for all pages
- Moved batch_get_note_comments call inside the pagination loop
- Comments are now fetched immediately after each page of videos is processed
- This allows real-time observation of comment crawling progress
- Improves data availability by not waiting for all video data to be collected first
2026-01-03 01:47:24 +08:00
程序员阿江(Relakkes)
157ddfb21b i18n: translate all Chinese comments, docstrings, and logger messages to English
Comprehensive translation of Chinese text to English across the entire codebase:

- api/: FastAPI server documentation and logger messages
- cache/: Cache abstraction layer comments and docstrings
- database/: Database models and MongoDB store documentation
- media_platform/: All platform crawlers (Bilibili, Douyin, Kuaishou, Tieba, Weibo, Xiaohongshu, Zhihu)
- model/: Data model documentation
- proxy/: Proxy pool and provider documentation
- store/: Data storage layer comments
- tools/: Utility functions and browser automation
- test/: Test file documentation

Preserved: Chinese disclaimer header (lines 10-18) for legal compliance

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-26 23:27:19 +08:00
程序员阿江(Relakkes)
6eef02d08c feat: ip proxy expired check 2025-11-25 12:39:10 +08:00
程序员阿江(Relakkes)
ff8c92daad chore: add copyright to every file 2025-11-18 12:24:02 +08:00
程序员阿江(Relakkes)
0074e975dd fix: dy search 2025-11-04 00:14:16 +08:00
程序员阿江(Relakkes)
03e384bbe2 refactor: remove stealth injection in CDP mode 2025-10-19 15:32:03 +08:00
程序员阿江(Relakkes)
cae707cb2a feat: douyin support url link 2025-10-18 07:00:21 +08:00
程序员阿江(Relakkes)
2bce3593f7 feat: support time delay for all platforms 2025-09-02 16:43:09 +08:00
未来可欺
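6a10d0d11c The original HTTPStatusError cannot catch exception types such as ConnectError and ReadError. This commit changes the caught exception type to HTTPError, the base class of httpx request exceptions, so that any exception raised in the httpx.request method (for example, a banned IP or a server refusing the connection) is caught, and an interruption while crawling media no longer interrupts text crawling 2025-08-06 11:24:51 +08:00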
未来可欺
81f2dbe4ab Added exception handling for the media resource server; see issue #691 2025-08-05 13:11:00 +08:00
未来可欺
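a6fd9ebdbc Slightly changed how saved Douyin images and videos are named: one video id corresponds to exactly one short video and returns a single video_download_url, so numeric naming is not needed 2025-07-31 23:11:45 +08:00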
未来可欺
0b81240aed Upgrade httpx to version 0.28.1 and change the keyword argument proxies to proxy 2025-07-31 22:48:02 +08:00
未来可欺
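9d90e9fc6d fix issue #689: for now this appears to be an httpx problem, because whether the sync or async version is used, and whether or not an httpx.***Client object is constructed to send the request, the response comes back empty (response.content = b'', response.text = ''), while switching to the requests library retrieves the data normally 2025-07-31 22:01:48 +08:00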
未来可欺
93a1c27fff Fixed some runtime bugs found by testing search mode, and set a longer timeout for platforms that can crawl media 2025-07-30 21:19:56 +08:00
未来可欺
173bc08a9d Added logic for storing Douyin videos and images, renamed the ENABLE_GET_IMAGES parameter in config.py to ENABLE_GET_MEIDAS, and slightly adjusted the storage logic accordingly 2025-07-30 18:24:08 +08:00
korruz
07a6e387ea refactor: move format_proxy_info to utils and update crawler classes to use it 2025-07-29 14:16:24 +08:00
程序员阿江(Relakkes)
13b00f7a36 refactor: config update 2025-07-18 23:26:52 +08:00
gaoxiaobei
8105b053ed Merge remote-tracking branch 'origin/dev' into devdev 2025-07-18 17:37:29 +08:00
gaoxiaobei
b913db64bb refactor(config): move platform-specific configs to separate files
- Remove platform-specific configurations from base_config.py
- Create separate config files for each platform in their respective directories
- Update import statements in core files to use new platform-specific config modules
- Clean up unused and deprecated configuration options
2025-07-18 17:27:37 +08:00
chenfangliang
aa54dad9a5 feat: fix missing geolocation in Douyin second-level comments 2025-07-18 10:48:43 +08:00
程序员阿江(Relakkes)
e83b2422d9 feat: support connecting Playwright to a local Chrome browser over the CDP protocol
docs: add documentation on managing Python dependencies with uv
2025-06-25 23:22:39 +08:00
Relakkes
67d31bf42a fix: dy update fp params 2025-04-30 13:26:22 +08:00
Relakkes
660fd18a95 fix: dy login fix 2025-04-08 20:58:04 +08:00
crpa33
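2c4af2337e Skip to the next keyword when the Douyin search page is empty
Skip on an empty page even if the expected page count has not been reached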
2025-03-27 23:32:21 +08:00
Relakkes
ef4eba121c fix: handle Windows encoding compatibility issue 2024-10-28 15:26:49 +08:00
unknown
7e53c4acfc All_platform_comments_restrict 2024-10-23 16:32:02 +08:00
Relakkes
9fe3e47b0f chore: add a learning-purposes-only code notice that strictly prohibits illegal, commercial, and improper use 2024-10-20 00:43:25 +08:00
Relakkes
7b5b099636 feat: update douyin abogus params 2024-09-27 14:58:10 +08:00
Relakkes
c70bd9e071 feat: add search keyword source channel 2024-08-23 08:29:24 +08:00
Relakkes
04cbe549af fix: fix Douyin keyword search bug 2024-08-20 03:09:42 +08:00
Relakkes
548271e537 fix: fix double encoding of Chinese search keywords on Douyin 2024-07-16 01:33:58 +08:00
Relakkes
f8096e3d58 feat: update Douyin abogus parameter 2024-07-14 03:20:05 +08:00
Relakkes
d3eeccbaac feat: logger record current search page 2024-06-24 22:24:51 +08:00
Relakkes Yang
a0e5a29af8 fix: weibo bug 2024-06-17 00:25:48 +08:00
522109452
6080c22a3d feat: add Douyin publish time setting to base_config
fix: Douyin sort type enum values
fix: Douyin offset calculation issue
2024-06-14 14:13:39 +08:00
xueyueben
576c8e8d9f fix: fix Douyin publish-time filter and sorting not taking effect 2024-06-13 11:46:25 +08:00
nelzomal
eace7d1750 improve base config reading command line arg logic 2024-06-09 18:51:36 +08:00
程序员阿江-Relakkes
c8dbc0bf3d Merge pull request #278 from ZuWard/main
Douyin second-level comments
2024-06-07 13:04:18 +08:00
Relakkes
4bba1447f8 feat: cache impl done 2024-06-02 19:57:13 +08:00
ZuWard
0ba68809a5 Douyin second-level comments 2024-05-29 06:35:37 +08:00
Relakkes
478db4cc4b feat: Douyin specified-creator crawling done 2024-05-28 01:07:19 +08:00
Relakkes
df1e4a7b02 refactor: Douyin login state check no longer raises a warning, since it could mislead users 2024-05-27 22:44:35 +08:00
Relakkes
764bafc626 feat: update Douyin login state detection logic 2024-05-23 22:15:14 +08:00
Relakkes
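e64df93edd feat: since xhs and dy now detect Playwright QR-code login, a slider or phone verification will very likely appear; increase the login state check time to 5 min to leave enough time to pass the verification manually. 2024-05-15 23:23:30 +08:00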
Relakkes
5681dd6925 fix: #237 2024-04-17 23:32:17 +08:00
Relakkes
87eb8aa6a7 fix: #230 2024-04-13 20:18:04 +08:00
Tianci-King
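1115b0d90c feat(core): add the start_page parameter to control the crawler's starting page number; perf(argparse): add program arguments for the starting page number and keywords to the command-line parser 2024-04-12 00:52:47 +08:00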
chunpat
6422500e32 Remove duplicate QR code display 2024-04-05 21:24:06 +08:00
leantli
68a60faa7f chore: simplify the check logic 2024-04-04 00:11:22 +08:00