49 Commits

Author SHA1 Message Date
ouzhuowei
7484156f02 Add a configurable data save path; data is saved to the data folder by default when no path is specified
Co-Authored-By: ouzhuowei <190020754@qq.com>
2026-02-03 11:24:22 +08:00
Doiiars
70a6ca55bb feat(database): add PostgreSQL support and fix Windows subprocess encoding 2026-01-09 00:41:59 +08:00
程序员阿江(Relakkes)
157ddfb21b i18n: translate all Chinese comments, docstrings, and logger messages to English
Comprehensive translation of Chinese text to English across the entire codebase:

- api/: FastAPI server documentation and logger messages
- cache/: Cache abstraction layer comments and docstrings
- database/: Database models and MongoDB store documentation
- media_platform/: All platform crawlers (Bilibili, Douyin, Kuaishou, Tieba, Weibo, Xiaohongshu, Zhihu)
- model/: Data model documentation
- proxy/: Proxy pool and provider documentation
- store/: Data storage layer comments
- tools/: Utility functions and browser automation
- test/: Test file documentation

Preserved: Chinese disclaimer header (lines 10-18) for legal compliance

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-26 23:27:19 +08:00
程序员阿江(Relakkes)
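8a0fd49b96 refactor: extract the application runner and improve exit cleanup
- Add tools/app_runner.py to unify signal, cancellation, and cleanup-timeout logic
- Slim main.py down to the business entry point and the resource cleanup implementation
- CDPBrowserManager no longer overrides existing SIGINT/SIGTERM handlers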
2025-12-15 18:06:57 +08:00
程序员阿江(Relakkes)
ff8c92daad chore: add copyright to every file 2025-11-18 12:24:02 +08:00
程序员阿江(Relakkes)
6dcfd7e0a5 refactor: weibo login 2025-11-17 17:11:35 +08:00
程序员阿江(Relakkes)
e89a6d5781 feat: cdp browser cleanup after crawler done 2025-11-17 12:21:53 +08:00
程序员阿江(Relakkes)
889fa01466 fix: repair bili word cloud image 2025-11-02 13:25:31 +08:00
程序员阿江(Relakkes)
ed6e0bfb5f refactor: tieba now fetches data through the browser 2025-10-19 17:09:55 +08:00
程序员阿江-Relakkes
3237073a0e Improve BrowserLauncher cleanup handling 2025-09-26 16:52:38 +08:00
persist-1
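926ea9dc42 fix: fix malformed paths caused by improper path separator joining
- Change the return value of '_get_file_path' in 'async_file_writer.py' from string concatenation to joining the path directly with forward slashes, so that path separators stay consistent
- Change the file save-time suffix to use 'get_current_date', so that file contents are split by day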
2025-09-11 00:35:02 +08:00
persist-1
0965bd6c96 fix: use get_current_time() instead of get_current_date() to avoid file name collisions on the same date 2025-09-06 04:43:56 +08:00
persist-1
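be306c6f54 refactor(database): refactor the database storage implementation to use SQLAlchemy ORM instead of raw SQL operations
- Remove the old async_db.py and async_sqlite_db.py implementations
- Add SQLAlchemy ORM models and database session management
- Consolidate each platform's storage implementation into a _store_impl.py file
- Add support for database initialization
- Update .gitignore and the pyproject.toml dependency configuration
- Improve file storage paths and naming conventions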
2025-09-06 04:10:20 +08:00
程序员阿江(Relakkes)
12450759d8 fix: httpx proxy format error
feat: add an IP proxy provider
2025-08-01 01:05:11 +08:00
GokoRuri
87caf07495 fix: #685 2025-07-30 21:14:37 +08:00
程序员阿江(Relakkes)
8ab1b7ee4c fix: fixed circular import issue 2025-07-30 14:47:11 +08:00
korruz
07a6e387ea refactor: move format_proxy_info to utils and update crawler classes to use it 2025-07-29 14:16:24 +08:00
程序员阿江(Relakkes)
13b00f7a36 refactor: config update 2025-07-18 23:26:52 +08:00
程序员阿江(Relakkes)
848df2b491 feat: other platforms support the cdp mode 2025-07-03 17:13:32 +08:00
程序员阿江(Relakkes)
e83b2422d9 feat: support connecting playwright to a local chrome browser via the cdp protocol
docs: add documentation for managing python dependencies with uv
2025-06-25 23:22:39 +08:00
Relakkes
03e393949a fix: update for the xhs post detail issue 2024-10-20 00:59:08 +08:00
Relakkes
9fe3e47b0f chore: add a notice that the code is for learning purposes, strictly prohibiting illegal, commercial, and improper use 2024-10-20 00:43:25 +08:00
Relakkes
b7e57da0d2 feat: Zhihu support (keywords, comments) 2024-09-08 00:00:04 +08:00
Relakkes
01ea4cd543 fix: resolve the IpInfoModel circular import dependency issue 2024-08-12 00:14:50 +08:00
Relakkes
ec47c230a9 feat: set the jieba log level to warning 2024-08-09 18:12:15 +08:00
Relakkes
62ac454639 Merge branch 'main' into feature/baidu_tieba_20240805 2024-08-08 14:21:59 +08:00
Relakkes
3f42368c02 feat: Baidu Tieba done 2024-08-08 14:19:32 +08:00
Relakkes
1208682a9a fix: strip HTML tag content from comments 2024-08-07 02:39:50 +08:00
AuYeung
fc16ab7c5d Filter whitespace characters 2024-08-06 15:24:23 +08:00
Relakkes
d347cf5a2c feat: post search & remove login code and use IP proxy 2024-08-06 03:37:55 +08:00
Relakkes Yang
a0e5a29af8 fix: weibo bug 2024-06-17 00:25:48 +08:00
程序员阿江-Relakkes
131e68334d Merge branch 'main' into main 2024-06-12 21:53:41 +08:00
Rosyrain
7048f040c9 Complete the word cloud generation function and add it to the storage logic 2024-06-12 15:33:39 +08:00
nelzomal
985ea93caf add few arg cmds 2024-06-12 10:53:03 +08:00
Relakkes
4bba1447f8 feat: cache impl done 2024-06-02 19:57:13 +08:00
Henry He
0a95b7d30b feat: configure logging to print the file name and line number 2024-04-27 12:11:42 +08:00
Relakkes
569202af78 feat: update user-agent list 2024-03-17 01:03:56 +08:00
Relakkes
38d6f10bf0 feat: Weibo QR code login done 2023-12-30 18:54:21 +08:00
Relakkes
eee81622ac feat: Weibo supports comments & specified posts 2023-12-25 00:02:11 +08:00
Relakkes
c5b64fdbf5 feat: Weibo crawler post search completed 2023-12-24 17:57:48 +08:00
Relakkes
5aeee93fc5 feat: implement signing for the Bilibili crawler 2023-12-03 00:30:10 +08:00
Relakkes
986179b9c9 feat: add the latest IP proxy implementation 2023-12-02 16:14:36 +08:00
Relakkes
700946b28a feat: add crawling of specified posts for Xiaohongshu
fix: fix several exception bugs in the program
refactor: optimize some code logic
2023-11-18 13:38:11 +08:00
Relakkes
4ff2cf8661 refactor: optimize code 2023-07-29 15:35:40 +08:00
Nanmi
745e59c875 feat: improve type annotations and add mypy type checking 2023-07-16 17:57:18 +08:00
Relakkes
2398a17e21 refactor: optimize parts of the Douyin Crawler code
fix: fix the log initialization error
2023-07-15 21:30:12 +08:00
Relakkes
57437719bf feat: implement three Douyin login methods & simulated Douyin slider dragging 2023-07-01 23:10:47 +08:00
Relakkes
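66442b0ff8 fix: the py file that receives SMS notifications should be placed in the project root directory, otherwise some other packages cannot be imported 2023-06-28 10:16:20 +08:00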
Relakkes
b8093a2c0f refactor: optimize some code
feat: add an IP proxy account pool
2023-06-27 23:38:30 +08:00