Commit Graph

25 Commits

Author SHA1 Message Date
程序员阿江(Relakkes)
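0c5f281212 fix: avoid request failures caused by overly long cross-domain Cookies when reusing the browser
Connecting to an existing Chrome instance brings the cookies of the entire browser context into the platform client.
Apart from xhs, most platforms still read the full set of cookies directly, which makes request headers too long and amplifies cross-domain pollution.
This change scopes cookie reading on every platform to the platform's own domain and adds basic regression tests.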

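Constraint: must keep reusing the platform login state from the user's real browser
Rejected: fixing only xhs | other platforms would still send overly long Cookies when connected to an existing browser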
Confidence: high
Scope-risk: moderate
Reversibility: clean
Directive: when adding a platform or changing the update_cookies and create client flows, read cookies only for the platform domain
Tested: uv run pytest test/test_utils.py; python3 -m compileall tools/crawler_util.py media_platform/douyin/core.py media_platform/douyin/client.py media_platform/kuaishou/core.py media_platform/kuaishou/client.py media_platform/bilibili/core.py media_platform/bilibili/client.py media_platform/zhihu/core.py media_platform/zhihu/client.py media_platform/tieba/core.py media_platform/tieba/client.py media_platform/xhs/core.py media_platform/xhs/client.py media_platform/weibo/core.py media_platform/weibo/client.py test/test_utils.py
Not-tested: end-to-end crawling on each platform over a real CDP browser connection
2026-04-21 13:49:37 +08:00
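The domain scoping described in the commit above can be sketched in plain Python. This is a minimal illustration, not the project's code. `domain_matches` and `cookies_for_platform` are hypothetical helpers. The input is assumed to be a list of Playwright-style cookie dicts with `name`, `value`, and `domain` keys. Matching follows the RFC 6265 rule: the host equals the cookie domain, or the host is a subdomain of it.

```python
from typing import Dict, List, Tuple


def domain_matches(cookie_domain: str, host: str) -> bool:
    """RFC 6265-style match: exact host, or a subdomain of the cookie's domain."""
    d = cookie_domain.lstrip(".").lower()
    h = host.lower()
    return h == d or h.endswith("." + d)


def cookies_for_platform(cookies: List[Dict], host: str) -> Tuple[str, Dict[str, str]]:
    """Hypothetical helper: keep only cookies that would be sent to `host`.

    Returns a Cookie header string and a name -> value dict.
    """
    scoped = [c for c in cookies if domain_matches(c.get("domain", ""), host)]
    cookie_dict = {c["name"]: c["value"] for c in scoped}
    cookie_str = "; ".join(f"{k}={v}" for k, v in cookie_dict.items())
    return cookie_str, cookie_dict


# A reused browser context holds cookies from many unrelated sites.
browser_cookies = [
    {"name": "sessionid", "value": "abc", "domain": ".douyin.com"},
    {"name": "SID", "value": "xyz", "domain": ".google.com"},
    {"name": "a1", "value": "123", "domain": ".xiaohongshu.com"},
]
cookie_str, cookie_dict = cookies_for_platform(browser_cookies, "www.douyin.com")
print(cookie_str)  # sessionid=abc
```

Playwright's `BrowserContext.cookies()` also accepts a list of URLs. Passing that list lets the browser do this filtering before the cookies reach Python.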
Wei Liu
125e02a4b9 fix: make SSL verification opt-in via config, extend fix to all platforms
- Add DISABLE_SSL_VERIFY = False to base_config.py (default: verification on)
- Add tools/httpx_util.py with make_async_client() factory that reads the config
- Replace all httpx.AsyncClient() call sites across all platforms (bilibili,
  weibo, zhihu, xhs, douyin, kuaishou) and crawler_util with make_async_client()
- Extends SSL fix to previously missed platforms: xhs, douyin, kuaishou

Users running behind an intercepting proxy can set DISABLE_SSL_VERIFY = True
in config/base_config.py. All other users retain certificate verification.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-18 12:31:49 +13:00
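A sketch of the opt-in pattern follows. It assumes the flag is read from a config object. In this example, `config` is a stand-in for `config/base_config.py`. The commit names `make_async_client()` but not its signature, so the body below and the helper `async_client_kwargs` are assumptions. `verify` is the real `httpx.AsyncClient` parameter that controls certificate verification.

```python
from types import SimpleNamespace
from typing import Any, Dict

# Stand-in for config/base_config.py (assumption); verification stays on by default.
config = SimpleNamespace(DISABLE_SSL_VERIFY=False)


def async_client_kwargs(**kwargs: Any) -> Dict[str, Any]:
    """Hypothetical helper: add the config-driven `verify` flag unless the caller set one."""
    kwargs.setdefault("verify", not config.DISABLE_SSL_VERIFY)
    return kwargs


def make_async_client(**kwargs: Any):
    """Sketch of the factory that replaces direct httpx.AsyncClient() call sites."""
    import httpx  # imported lazily so the kwargs logic can be used without httpx installed

    return httpx.AsyncClient(**async_client_kwargs(**kwargs))
```

Because `setdefault` is used, an explicit `verify=...` from a call site still takes precedence over the global flag. With the default `DISABLE_SSL_VERIFY = False`, every client keeps certificate verification on.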
程序员阿江(Relakkes)
4de2a325a9 feat: ks comment api upgrade to v2 2026-01-09 21:09:39 +08:00
程序员阿江(Relakkes)
157ddfb21b i18n: translate all Chinese comments, docstrings, and logger messages to English
Comprehensive translation of Chinese text to English across the entire codebase:

- api/: FastAPI server documentation and logger messages
- cache/: Cache abstraction layer comments and docstrings
- database/: Database models and MongoDB store documentation
- media_platform/: All platform crawlers (Bilibili, Douyin, Kuaishou, Tieba, Weibo, Xiaohongshu, Zhihu)
- model/: Data model documentation
- proxy/: Proxy pool and provider documentation
- store/: Data storage layer comments
- tools/: Utility functions and browser automation
- test/: Test file documentation

Preserved: Chinese disclaimer header (lines 10-18) for legal compliance

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-26 23:27:19 +08:00
程序员阿江(Relakkes)
6eef02d08c feat: ip proxy expired check 2025-11-25 12:39:10 +08:00
程序员阿江(Relakkes)
ff8c92daad chore: add copyright to every file 2025-11-18 12:24:02 +08:00
未来可欺
0b81240aed Upgrade httpx to 0.28.1 and rename the keyword argument proxies to proxy 2025-07-31 22:48:02 +08:00
Relakkes
061d1c15e2 feat: kuaishou search params update 2025-03-11 23:42:34 +08:00
unknown
7e53c4acfc All_platform_comments_restrict 2024-10-23 16:32:02 +08:00
Relakkes
9fe3e47b0f chore: add a code-learning disclaimer that strictly prohibits illegal, commercial, and improper use 2024-10-20 00:43:25 +08:00
HIRO
1d224999af fix second-level comment crawling bug 2024-06-13 15:57:09 +08:00
HIRO
a001556ba7 Kuaishou: crawl specified creator homepages and second-level comments 2024-06-13 14:49:07 +08:00
Relakkes
87eb8aa6a7 fix: #230 2024-04-13 20:18:04 +08:00
Relakkes
e950e0d6e3 feat: add abstract api client to all platform 2024-03-30 21:27:25 +08:00
Relakkes
e940a41033 refactor: remove the logic for capping comment counts and filtering specific keywords in comments 2024-01-17 23:02:05 +08:00
Relakkes
894dabcf63 refactor: restructure data storage and separate the implementations for different storage types 2024-01-14 22:06:31 +08:00
Relakkes
aba9f14f50 refactor: standardize log output
feat: crawl Bilibili by specified video ID (bvid)
2023-12-23 01:04:08 +08:00
peanutsplash
f17a85305e Add features (Bilibili, Kuaishou, Xiaohongshu): a cap on the maximum number of comments crawled per video/post, and comment keyword filtering 2023-12-13 23:53:12 +08:00
Relakkes
a6e877de42 fix: fix Bilibili search Field naming bug
refactor: rename the ping interface to pong everywhere
2023-12-05 22:54:47 +08:00
Relakkes
62534d7ee2 fix: remove redundant code from the Kuaishou client 2023-11-26 22:11:06 +08:00
Relakkes
dfb1788141 feat: Kuaishou video comment crawling done; saving data to DB and CSV done 2023-11-26 21:43:39 +08:00
Relakkes
bdf36ccb09 feat: Kuaishou keyword search with CSV storage complete 2023-11-26 01:05:52 +08:00
Relakkes
512192a93e feat: search API debugging complete 2023-11-25 00:02:33 +08:00
Relakkes
f08b2ceb76 feat: 1. command-line support for Kuaishou 2. Kuaishou playwright code done 2023-11-24 00:04:33 +08:00
Relakkes
95ca606938 feat: create Kuaishou file directory 2023-11-23 23:13:54 +08:00