Doiiars
70a6ca55bb
feat(database): add PostgreSQL support and fix Windows subprocess encoding
2026-01-09 00:41:59 +08:00
程序员阿江(Relakkes)
c895f53e22
fix: #803
2026-01-05 22:29:34 +08:00
程序员阿江(Relakkes)
157ddfb21b
i18n: translate all Chinese comments, docstrings, and logger messages to English
...
Comprehensive translation of Chinese text to English across the entire codebase:
- api/: FastAPI server documentation and logger messages
- cache/: Cache abstraction layer comments and docstrings
- database/: Database models and MongoDB store documentation
- media_platform/: All platform crawlers (Bilibili, Douyin, Kuaishou, Tieba, Weibo, Xiaohongshu, Zhihu)
- model/: Data model documentation
- proxy/: Proxy pool and provider documentation
- store/: Data storage layer comments
- tools/: Utility functions and browser automation
- test/: Test file documentation
Preserved: Chinese disclaimer header (lines 10-18) for legal compliance
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-26 23:27:19 +08:00
程序员阿江(Relakkes)
6e858c1a00
feat: excel store with other platform
2025-11-28 15:12:36 +08:00
hsparks.codes
46ef86ddef
feat: Add Excel export functionality and unit tests
...
Features:
- Excel export with formatted multi-sheet workbooks (Contents, Comments, Creators)
- Professional styling: blue headers, auto-width columns, borders, text wrapping
- Smart export: empty sheets automatically removed
- Support for all platforms (xhs, dy, ks, bili, wb, tieba, zhihu)
Testing:
- Added pytest framework with asyncio support
- Unit tests for Excel store functionality
- Unit tests for store factory pattern
- Shared fixtures for test data
- Test coverage for edge cases
Documentation:
- Comprehensive Excel export guide (docs/excel_export_guide.md)
- Updated README.md and README_en.md with Excel examples
- Updated config comments to include excel option
Dependencies:
- Added openpyxl>=3.1.2 for Excel support
- Added pytest>=7.4.0 and pytest-asyncio>=0.21.0 for testing
This contribution adds immediate value for users who need data analysis
capabilities and establishes a testing foundation for future development.
2025-11-28 04:44:12 +01:00
程序员阿江(Relakkes)
ff8c92daad
chore: add copyright to every file
2025-11-18 12:24:02 +08:00
yt210
b61ec54a72
Optimize MongoDB config retrieval logic and move the storage base class.
2025-11-07 17:42:28 +08:00
yt210
ef6948b305
Add storage to MongoDB
2025-11-06 10:40:30 +08:00
persist-1
0d0af57a01
fix(store): fix bug where improper use of 'crawler_type_var' produced incorrect CSV/JSON output file names
2025-09-10 23:47:05 +08:00
persist-1
684a16ed9a
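fix(database): fix model field types to support a wider range of data formats
...
Fix the xhs comment storage method, switching from batch processing to per-comment processing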
2025-09-07 04:10:49 +08:00
persist-1
e92c6130e1
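fix(store): fix the AsyncFileWriter import in store implementations
...
Refactor the Xiaohongshu store implementation, replacing the store_comments method with store_comment, which handles a single comment
Add the AsyncFileWriter utility class import for multiple platforms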
2025-09-06 04:41:37 +08:00
persist-1
be306c6f54
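refactor(database): refactor database storage to use SQLAlchemy ORM instead of raw SQL operations
...
- Remove the old async_db.py and async_sqlite_db.py implementations
- Add SQLAlchemy ORM models and database session management
- Consolidate each platform's store implementation into its _store_impl.py file
- Add database initialization support
- Update .gitignore and pyproject.toml dependency configuration
- Improve file storage paths and naming conventions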
2025-09-06 04:10:20 +08:00
未来可欺
a7cc18ec7d
Update some documentation
2025-07-30 18:58:10 +08:00
未来可欺
ecddfbe02c
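Rename the .py files in the store folder that end in _video or _image so they end in _media.py, so that no platform has only a standalone _video.py or _image.py implementation. All future code for storing videos or images goes in this file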
2025-07-30 18:32:08 +08:00
未来可欺
173bc08a9d
Add logic for storing Douyin videos and images, rename the ENABLE_GET_IMAGES parameter in config.py to ENABLE_GET_MEIDAS, and slightly adjust the storage logic accordingly
2025-07-30 18:24:08 +08:00
买定不离手
1298022410
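refactor: update each platform's store module initialization to support SQLite
...
- Update store/bilibili/__init__.py to import the SQLite store implementation classes and related modules
- Update store/douyin/__init__.py to integrate the SQLite data storage interface for the Douyin platform
- Update store/kuaishou/__init__.py to add the import declaration for the Kuaishou SQLite store module
- Update store/tieba/__init__.py to bring in the Tieba SQLite database operations module
- Update store/weibo/__init__.py to integrate the Weibo SQLite store module
- Update store/xhs/__init__.py to import the Xiaohongshu SQLite data storage implementation
- Update store/zhihu/__init__.py to integrate the Zhihu SQLite database store module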
2025-07-14 03:51:08 +08:00
买定不离手
6f274d476b
feat: add SQLite store implementation files for each platform
- Add store/bilibili/bilibili_store_impl.py: Bilibili SQLite store implementation
- Add store/douyin/douyin_store_impl.py: Douyin SQLite store implementation
- Add store/kuaishou/kuaishou_store_impl.py: Kuaishou SQLite store implementation
- Add store/tieba/tieba_store_impl.py: Tieba SQLite store implementation
- Add store/weibo/weibo_store_impl.py: Weibo SQLite store implementation
- Add store/xhs/xhs_store_impl.py: Xiaohongshu SQLite store implementation
- Add store/zhihu/zhihu_store_impl.py: Zhihu SQLite store implementation
2025-07-14 03:36:36 +08:00
买定不离手
fb938f38aa
feat: update each platform's store SQL files to support SQLite
- Update store/kuaishou/kuaishou_store_sql.py: SQLite adaptation for Kuaishou
- Update store/tieba/tieba_store_sql.py: SQLite adaptation for Tieba
- Update store/weibo/weibo_store_sql.py: SQLite adaptation for Weibo
- Update store/xhs/xhs_store_sql.py: SQLite adaptation for Xiaohongshu
- Update store/zhihu/zhihu_store_sql.py: SQLite adaptation for Zhihu
2025-07-14 03:36:20 +08:00
Bowenwin
703a6e84cb
fix_words
2025-05-19 20:07:20 +08:00
HuiLong
d929ad16ae
fix xhs get gender
2024-12-28 20:24:37 +08:00
Relakkes
79bf9fc05d
chore: add xhs field comment for issue #526
2024-12-26 18:28:23 +08:00
liudongkai
33e7ef016d
feat: xhs adds a random wait interval in non-proxy mode and stores the xsec_token field in db storage mode
2024-12-05 21:10:31 +08:00
Relakkes
9fe3e47b0f
chore: add a learning-purposes-only code disclaimer strictly prohibiting illegal, commercial, and improper use
2024-10-20 00:43:25 +08:00
tooyang
0c1adb75fe
fix: improve the JSON output format with indentation support
2024-09-04 11:24:12 +08:00
Relakkes
65699aa1cb
feat: xhs supports fetching comment like counts
2024-08-24 06:07:33 +08:00
Relakkes
c70bd9e071
feat: add the source channel for search keywords
2024-08-23 08:29:24 +08:00
Relakkes
7229d29123
feat: xhs update
2024-08-04 14:54:03 +08:00
Relakkes
f8096e3d58
feat: update the Douyin abogus parameter
2024-07-14 03:20:05 +08:00
helloteemo
6545a15ff3
feature: support downloading Xiaohongshu images and videos
2024-07-11 22:56:30 +08:00
helloteemo
e71690a985
fix: resolve the Xiaohongshu image watermark issue
2024-07-11 17:39:48 +08:00
you@company-pc
409c0ab36d
Fix failure to collect the IP location of Xiaohongshu creators
2024-06-26 11:56:41 +08:00
Relakkes Yang
a0e5a29af8
fix: weibo bug
2024-06-17 00:25:48 +08:00
Rosyrain
7048f040c9
Implement the word cloud generation function and add it to the storage logic
2024-06-12 15:33:39 +08:00
leantli
43acde240b
fix: catch and handle ValueError, and fix typos
2024-05-08 22:26:35 +08:00
KEXNA
9f8ffe1840
Update weibo_store_impl.py
...
Update bilibili_store_impl.py
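Add id
Update bilibili_store_impl.py
Add id to fix different queries on the same day being written to the same file
Update douyin_store_impl.py
Add id to fix different queries on the same day being written to the same file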
Update kuaishou_store_impl.py
Update weibo_store_impl.py
Update xhs_store_impl.py
Update weibo_store_impl.py
Update kuaishou_store_impl.py
Update bilibili_store_impl.py
Update douyin_store_impl.py
Update kuaishou_store_impl.py
Update xhs_store_impl.py
2024-04-30 21:42:06 +08:00
leantli
6cabece01a
chore: remove redundant line breaks
2024-04-12 18:18:01 +08:00
leantli
ad01dfba95
feat: lightweight support for crawling Xiaohongshu second-level comments
2024-04-12 17:32:20 +08:00
leantli
81a9946afd
feat: support crawling Xiaohongshu second-level comments
2024-04-11 17:16:13 +08:00
Relakkes
d392747fe7
fix: remove all ORM content
2024-04-06 23:51:03 +08:00
Relakkes
0c8484c334
feat: complete the db data storage refactor
2024-04-06 22:11:10 +08:00
Relakkes
96309dcfee
fix: improve data fetching for the Xiaohongshu creator feature
2024-03-17 14:50:10 +08:00
Relakkes
41fee4ff4f
feat: Xiaohongshu supports fetching image links in comments #145
2024-03-07 22:30:44 +08:00
jayeeliu@gmail.com
61ba8c5cc7
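feat: Xiaohongshu supports collecting notes and comments by blogger ID; when type=search, the sort order for fetching notes is configurable; Xiaohongshu notes gain video URL and tag fields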
2024-03-02 01:49:42 +08:00
Jian Chang
79c0f3bd68
Fix duplicate insertion of Xiaohongshu comments
2024-01-25 13:01:04 +08:00
Relakkes
e0f9a487e4
refactor: code optimization
2024-01-16 00:40:07 +08:00
Relakkes
4dfa0d3fbf
feat: support saving data in JSON format
2024-01-14 22:40:01 +08:00
Relakkes
894dabcf63
refactor: restructure data storage, separating the storage implementations by type
2024-01-14 22:06:31 +08:00