Commit Graph

58 Commits

Author SHA1 Message Date
程序员阿江(Relakkes)
d614ccf247 docs: translate comments and metadata to English
Update Chinese comments, variable descriptions, and metadata across
multiple configuration and core files to English. This improves
codebase accessibility for international developers. Additionally,
removed the sponsorship section from README files.
2026-02-12 05:30:11 +08:00
Caelan_Windows
c56b8c4c5d fix(douyin): fetch comments concurrently after each page instead of waiting for all pages
- Moved batch_get_note_comments call inside the pagination loop
- Comments are now fetched immediately after each page of videos is processed
- This allows real-time observation of comment crawling progress
- Improves data availability by not waiting for all video data to be collected first
2026-01-03 01:47:24 +08:00
程序员阿江(Relakkes)
6eef02d08c feat: IP proxy expiration check 2025-11-25 12:39:10 +08:00
程序员阿江(Relakkes)
ff8c92daad chore: add copyright to every file 2025-11-18 12:24:02 +08:00
程序员阿江(Relakkes)
03e384bbe2 refactor: remove stealth injection in CDP mode 2025-10-19 15:32:03 +08:00
程序员阿江(Relakkes)
cae707cb2a feat: douyin supports URL links 2025-10-18 07:00:21 +08:00
程序员阿江(Relakkes)
2bce3593f7 feat: support time delay for all platforms 2025-09-02 16:43:09 +08:00
未来可欺
81f2dbe4ab Added exception handling for the media resource server, see issue #691 2025-08-05 13:11:00 +08:00
未来可欺
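a6fd9ebdbc Simplified the naming scheme for saved Douyin images and videos: each video ID maps to exactly one short video and returns a single video_download_url, so numeric naming is not needed 2025-07-31 23:11:45 +08:00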
未来可欺
0b81240aed Upgraded httpx to 0.28.1 and changed the keyword argument proxies to proxy 2025-07-31 22:48:02 +08:00
未来可欺
93a1c27fff Tested search mode, fixed several runtime bugs, and set longer timeouts for platforms that can crawl media 2025-07-30 21:19:56 +08:00
未来可欺
173bc08a9d Added logic for saving Douyin videos and images, renamed the ENABLE_GET_IMAGES parameter in config.py to ENABLE_GET_MEIDAS, and slightly adjusted the storage logic on that basis 2025-07-30 18:24:08 +08:00
korruz
07a6e387ea refactor: move format_proxy_info to utils and update crawler classes to use it 2025-07-29 14:16:24 +08:00
程序员阿江(Relakkes)
13b00f7a36 refactor: config update 2025-07-18 23:26:52 +08:00
gaoxiaobei
b913db64bb refactor(config): move platform-specific configs to separate files
- Remove platform-specific configurations from base_config.py
- Create separate config files for each platform in their respective directories
- Update import statements in core files to use new platform-specific config modules
- Clean up unused and deprecated configuration options
2025-07-18 17:27:37 +08:00
程序员阿江(Relakkes)
e83b2422d9 feat: support connecting Playwright to a local Chrome browser via the CDP protocol
docs: add documentation on using uv to manage Python dependencies
2025-06-25 23:22:39 +08:00
crpa33
2c4af2337e douyin: skip to the next keyword when a search page is empty
Skip even if the expected number of pages has not been reached yet
2025-03-27 23:32:21 +08:00
unknown
7e53c4acfc All_platform_comments_restrict 2024-10-23 16:32:02 +08:00
Relakkes
9fe3e47b0f chore: add a learning-only code disclaimer that strictly prohibits illegal, commercial, or improper use 2024-10-20 00:43:25 +08:00
Relakkes
c70bd9e071 feat: add source channels for search keywords 2024-08-23 08:29:24 +08:00
Relakkes
04cbe549af fix: fix Douyin keyword search bug 2024-08-20 03:09:42 +08:00
Relakkes
f8096e3d58 feat: update Douyin abogus parameter 2024-07-14 03:20:05 +08:00
Relakkes
d3eeccbaac feat: logger record current search page 2024-06-24 22:24:51 +08:00
522109452
6080c22a3d feat: add Douyin publish time setting to base_config
fix: Douyin sort type enum values
fix: Douyin offset calculation issue
2024-06-14 14:13:39 +08:00
nelzomal
eace7d1750 improve base config logic for reading command-line args 2024-06-09 18:51:36 +08:00
ZuWard
0ba68809a5 Douyin second-level (reply) comments 2024-05-29 06:35:37 +08:00
Relakkes
478db4cc4b feat: Douyin specified-creator crawling done 2024-05-28 01:07:19 +08:00
Relakkes
5681dd6925 fix: #237 2024-04-17 23:32:17 +08:00
Tianci-King
1115b0d90c feat(core): add start_page parameter to control the crawler's starting page number; perf(argparse): add start page number and keyword arguments to the command-line parser 2024-04-12 00:52:47 +08:00
leantli
68a60faa7f chore: simplify conditional check 2024-04-04 00:11:22 +08:00
leantli
133f978477 fix: nothing was crawled when the maximum number of videos/posts was set to a low value 2024-04-03 12:18:23 +08:00
Relakkes
e950e0d6e3 feat: add abstract api client to all platform 2024-03-30 21:27:25 +08:00
Relakkes
59cd9f67a0 feat: add option to enable or disable comment crawling 2024-03-16 11:52:42 +08:00
Relakkes
149b6bcdc8 fix: bug in Douyin keyword search when the keyword is in Chinese 2024-03-03 19:36:36 +08:00
Relakkes
384c8f9f7e fix: issue #140 2024-02-26 23:47:02 +08:00
Relakkes
e940a41033 refactor: remove logic for limiting comment count and filtering specific keywords in comments 2024-01-17 23:02:05 +08:00
Relakkes
894dabcf63 refactor: restructure data storage and separate the implementations for different storage types 2024-01-14 22:06:31 +08:00
Relakkes
e31aebbdfb fix: fix proxy bug 2024-01-13 15:50:02 +08:00
Relakkes
aba9f14f50 refactor: standardize log output
feat: crawl specified Bilibili video IDs (bvid)
2023-12-23 01:04:08 +08:00
peanutsplash
f17a85305e Added feature (Bilibili, Kuaishou, Xiaohongshu): maximum number of comments fetched per video/post, and comment keyword filtering 2023-12-13 23:53:12 +08:00
Relakkes
97d7a0c38b feat: Bilibili comment done 2023-12-09 21:10:01 +08:00
Relakkes
1cec23f73d feat: proxy IP feature done 2023-12-08 00:10:04 +08:00
Relakkes
a6e877de42 fix: Bilibili search field naming bug
refactor: rename the ping interface to pong everywhere
2023-12-05 22:54:47 +08:00
peanutsplash
ab1a10bac1 Added feature: maximum number of comments fetched per Douyin video, and Douyin comment keyword filtering 2023-12-05 11:21:47 +08:00
Relakkes
986179b9c9 feat: add the latest IP proxy implementation 2023-12-02 16:14:36 +08:00
Relakkes
81bc8b51e2 feat: Douyin supports crawling a specified list of videos 2023-11-18 22:07:30 +08:00
Relakkes
700946b28a feat: add specified-post crawling for Xiaohongshu
fix: fix several exception bugs in the program
refactor: optimize some code logic
2023-11-18 13:38:11 +08:00
Relakkes
9177c38521 feat: support saving data to CSV 2023-08-16 19:49:41 +08:00
Relakkes
c1a3f06c7a fix: issue #32 2023-08-16 13:58:44 +08:00
Relakkes
4ff2cf8661 refactor: optimize code 2023-07-29 15:35:40 +08:00