ouzhuowei
7484156f02
Add a configurable data save path; if none is specified, data is saved to the data folder by default
Co-Authored-By: ouzhuowei <190020754@qq.com>
2026-02-03 11:24:22 +08:00
Doiiars
70a6ca55bb
feat(database): add PostgreSQL support and fix Windows subprocess encoding
2026-01-09 00:41:59 +08:00
程序员阿江(Relakkes)
157ddfb21b
i18n: translate all Chinese comments, docstrings, and logger messages to English
Comprehensive translation of Chinese text to English across the entire codebase:
- api/: FastAPI server documentation and logger messages
- cache/: Cache abstraction layer comments and docstrings
- database/: Database models and MongoDB store documentation
- media_platform/: All platform crawlers (Bilibili, Douyin, Kuaishou, Tieba, Weibo, Xiaohongshu, Zhihu)
- model/: Data model documentation
- proxy/: Proxy pool and provider documentation
- store/: Data storage layer comments
- tools/: Utility functions and browser automation
- test/: Test file documentation
Preserved: Chinese disclaimer header (lines 10-18) for legal compliance
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-26 23:27:19 +08:00
程序员阿江(Relakkes)
8a0fd49b96
refactor: extract the application runner and improve exit cleanup
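- Add tools/app_runner.py to unify signal handling, cancellation, and cleanup-timeout logic
- Slim main.py down to the business entry point and the resource-cleanup implementation
- CDPBrowserManager no longer overrides existing SIGINT/SIGTERM handlers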
2025-12-15 18:06:57 +08:00
程序员阿江(Relakkes)
ff8c92daad
chore: add copyright to every file
2025-11-18 12:24:02 +08:00
程序员阿江(Relakkes)
6dcfd7e0a5
refactor: weibo login
2025-11-17 17:11:35 +08:00
程序员阿江(Relakkes)
e89a6d5781
feat: cdp browser cleanup after crawler done
2025-11-17 12:21:53 +08:00
程序员阿江(Relakkes)
889fa01466
fix: repair the bili word cloud
2025-11-02 13:25:31 +08:00
程序员阿江(Relakkes)
ed6e0bfb5f
refactor: tieba now fetches data through the browser
2025-10-19 17:09:55 +08:00
程序员阿江-Relakkes
3237073a0e
Improve BrowserLauncher cleanup handling
2025-09-26 16:52:38 +08:00
persist-1
926ea9dc42
fix: fix path-format issues caused by improper path-separator joining
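- In 'async_file_writer.py', change the return value of '_get_file_path' from string concatenation to joining the path directly with forward slashes, so that path separators are consistent
- Change the file-save time suffix to use 'get_current_date', so that file contents are split by day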
2025-09-11 00:35:02 +08:00
persist-1
0965bd6c96
fix: use get_current_time() instead of get_current_date() to avoid filename collisions between files saved on the same date
2025-09-06 04:43:56 +08:00
persist-1
be306c6f54
refactor(database): rework the database storage implementation, replacing raw SQL operations with the SQLAlchemy ORM
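- Remove the old async_db.py and async_sqlite_db.py implementations
- Add SQLAlchemy ORM models and database session management
- Consolidate each platform's storage implementation into _store_impl.py files
- Add support for database initialization
- Update .gitignore and the pyproject.toml dependency configuration
- Improve file storage paths and naming conventions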
2025-09-06 04:10:20 +08:00
程序员阿江(Relakkes)
12450759d8
fix: httpx proxy format error
feat: add an IP proxy provider
2025-08-01 01:05:11 +08:00
GokoRuri
87caf07495
fix: #685
2025-07-30 21:14:37 +08:00
程序员阿江(Relakkes)
8ab1b7ee4c
fix: fixed circular import issue
2025-07-30 14:47:11 +08:00
korruz
07a6e387ea
refactor: move format_proxy_info to utils and update crawler classes to use it
2025-07-29 14:16:24 +08:00
程序员阿江(Relakkes)
13b00f7a36
refactor: config update
2025-07-18 23:26:52 +08:00
程序员阿江(Relakkes)
848df2b491
feat: other platforms support CDP mode
2025-07-03 17:13:32 +08:00
程序员阿江(Relakkes)
e83b2422d9
feat: support connecting Playwright to a local Chrome browser via the CDP protocol
docs: add documentation on managing Python dependencies with uv
2025-06-25 23:22:39 +08:00
Relakkes
03e393949a
fix: update handling of the xhs post-detail issue
2024-10-20 00:59:08 +08:00
Relakkes
9fe3e47b0f
chore: add a code-for-learning disclaimer; illegal, commercial, and improper use is strictly prohibited
2024-10-20 00:43:25 +08:00
Relakkes
b7e57da0d2
feat: Zhihu support (keywords, comments)
2024-09-08 00:00:04 +08:00
Relakkes
01ea4cd543
fix: resolve the IpInfoModel circular import dependency
2024-08-12 00:14:50 +08:00
Relakkes
ec47c230a9
feat: set jieba logging to the warning level
2024-08-09 18:12:15 +08:00
Relakkes
62ac454639
Merge branch 'main' into feature/baidu_tieba_20240805
2024-08-08 14:21:59 +08:00
Relakkes
3f42368c02
feat: Baidu Tieba done
2024-08-08 14:19:32 +08:00
Relakkes
1208682a9a
fix: strip HTML tag content from comments
2024-08-07 02:39:50 +08:00
AuYeung
fc16ab7c5d
Filter out whitespace characters
2024-08-06 15:24:23 +08:00
Relakkes
d347cf5a2c
feat: post search & remove the login code in favor of an IP proxy
2024-08-06 03:37:55 +08:00
Relakkes Yang
a0e5a29af8
fix: weibo bug
2024-06-17 00:25:48 +08:00
程序员阿江-Relakkes
131e68334d
Merge branch 'main' into main
2024-06-12 21:53:41 +08:00
Rosyrain
7048f040c9
Complete the word cloud generation function and add it to the storage logic
2024-06-12 15:33:39 +08:00
nelzomal
985ea93caf
add a few command-line args
2024-06-12 10:53:03 +08:00
Relakkes
4bba1447f8
feat: cache impl done
2024-06-02 19:57:13 +08:00
Henry He
0a95b7d30b
feat: configure logging to print the file name and line number
2024-04-27 12:11:42 +08:00
Relakkes
569202af78
feat: update user-agent list
2024-03-17 01:03:56 +08:00
Relakkes
38d6f10bf0
feat: Weibo QR code login done
2023-12-30 18:54:21 +08:00
Relakkes
eee81622ac
feat: Weibo supports comments & specified posts
2023-12-25 00:02:11 +08:00
Relakkes
c5b64fdbf5
feat: Weibo crawler post search complete
2023-12-24 17:57:48 +08:00
Relakkes
5aeee93fc5
feat: implement request signing for the Bilibili crawler
2023-12-03 00:30:10 +08:00
Relakkes
986179b9c9
feat: add the latest IP proxy implementation
2023-12-02 16:14:36 +08:00
Relakkes
700946b28a
feat: Xiaohongshu adds crawling of specified posts
fix: fix some exception-related bugs in the program
refactor: optimize some code logic
2023-11-18 13:38:11 +08:00
Relakkes
4ff2cf8661
refactor: optimize code
2023-07-29 15:35:40 +08:00
Nanmi
745e59c875
feat: improve type annotations and add mypy type checking
2023-07-16 17:57:18 +08:00
Relakkes
2398a17e21
refactor: optimize parts of the Douyin Crawler code
fix: fix a logging initialization error
2023-07-15 21:30:12 +08:00
Relakkes
57437719bf
feat: implement three Douyin login methods & simulated dragging for the Douyin slider captcha
2023-07-01 23:10:47 +08:00
Relakkes
66442b0ff8
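fix: the .py file that receives SMS notifications must be placed in the project root directory; otherwise some other packages cannot be imported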
2023-06-28 10:16:20 +08:00
Relakkes
b8093a2c0f
refactor: optimize some code
feat: add an IP proxy account pool
2023-06-27 23:38:30 +08:00