Commit Graph

97 Commits

Author SHA1 Message Date
程序员阿江(Relakkes)
d614ccf247 docs: translate comments and metadata to English
Update Chinese comments, variable descriptions, and metadata across
multiple configuration and core files to English. This improves
codebase accessibility for international developers. Additionally,
removed the sponsorship section from README files.
2026-02-12 05:30:11 +08:00
程序员阿江-Relakkes
4ad065ce9a Merge pull request #825 from ouzhuowei/add_save_data_path
Add a configurable data save path; data is saved to the data folder by default when no path is specified
2026-02-04 18:03:22 +08:00
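The default-path behavior described in this merge can be sketched roughly as follows. This is only an illustration: the function name `resolve_save_dir`, its parameters, and the per-platform subfolder layout are assumptions, not the project's actual code.

```python
from pathlib import Path


def resolve_save_dir(save_data_path: str = "", platform: str = "xhs") -> Path:
    """Return the directory where crawled data would be written.

    Hypothetical helper: an empty save_data_path falls back to the
    local "data" folder, mirroring the default described in the commit.
    """
    base = Path(save_data_path) if save_data_path else Path("data")
    # Assumed layout: one subfolder per platform under the base directory.
    return base / platform


default_dir = resolve_save_dir()
custom_dir = resolve_save_dir("/tmp/crawl_output", platform="dy")
```

Keeping the fallback in one helper means every store backend resolves the same base directory.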
ouzhuowei
2a0d1fd69f Adapt media storage file paths for each platform
Co-Authored-By: ouzhuowei <190020754@qq.com>
2026-02-04 09:48:39 +08:00
程序员阿江(Relakkes)
fb42ab5b60 fix: #826 2026-02-03 20:35:33 +08:00
程序员阿江(Relakkes)
4de2a325a9 feat: ks comment api upgrade to v2 2026-01-09 21:09:39 +08:00
Doiiars
70a6ca55bb feat(database): add PostgreSQL support and fix Windows subprocess encoding 2026-01-09 00:41:59 +08:00
程序员阿江(Relakkes)
c895f53e22 fix: #803 2026-01-05 22:29:34 +08:00
程序员阿江(Relakkes)
157ddfb21b i18n: translate all Chinese comments, docstrings, and logger messages to English
Comprehensive translation of Chinese text to English across the entire codebase:

- api/: FastAPI server documentation and logger messages
- cache/: Cache abstraction layer comments and docstrings
- database/: Database models and MongoDB store documentation
- media_platform/: All platform crawlers (Bilibili, Douyin, Kuaishou, Tieba, Weibo, Xiaohongshu, Zhihu)
- model/: Data model documentation
- proxy/: Proxy pool and provider documentation
- store/: Data storage layer comments
- tools/: Utility functions and browser automation
- test/: Test file documentation

Preserved: Chinese disclaimer header (lines 10-18) for legal compliance

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-26 23:27:19 +08:00
程序员阿江(Relakkes)
6e858c1a00 feat: excel store with other platform 2025-11-28 15:12:36 +08:00
hsparks.codes
46ef86ddef feat: Add Excel export functionality and unit tests
Features:
- Excel export with formatted multi-sheet workbooks (Contents, Comments, Creators)
- Professional styling: blue headers, auto-width columns, borders, text wrapping
- Smart export: empty sheets automatically removed
- Support for all platforms (xhs, dy, ks, bili, wb, tieba, zhihu)

Testing:
- Added pytest framework with asyncio support
- Unit tests for Excel store functionality
- Unit tests for store factory pattern
- Shared fixtures for test data
- Test coverage for edge cases

Documentation:
- Comprehensive Excel export guide (docs/excel_export_guide.md)
- Updated README.md and README_en.md with Excel examples
- Updated config comments to include excel option

Dependencies:
- Added openpyxl>=3.1.2 for Excel support
- Added pytest>=7.4.0 and pytest-asyncio>=0.21.0 for testing

This contribution adds immediate value for users who need data analysis
capabilities and establishes a testing foundation for future development.
2025-11-28 04:44:12 +01:00
程序员阿江(Relakkes)
ff8c92daad chore: add copyright to every file 2025-11-18 12:24:02 +08:00
yangtao210
58eb89f073 Merge branch 'NanmiCoder:main' into main 2025-11-07 17:44:09 +08:00
yt210
b61ec54a72 Optimize the MongoDB config retrieval logic and relocate the storage base class. 2025-11-07 17:42:28 +08:00
程序员阿江-Relakkes
05a1782746 Merge pull request #764 from yangtao210/main
Add storage to MongoDB
2025-11-06 06:10:49 -05:00
yt210
ef6948b305 Add storage to MongoDB 2025-11-06 10:40:30 +08:00
程序员阿江(Relakkes)
889fa01466 fix: repair the bili word cloud 2025-11-02 13:25:31 +08:00
persist-1
0d0af57a01 fix(store): fix a bug where improper use of 'crawler_type_var' produced incorrect csv/json output file names 2025-09-10 23:47:05 +08:00
persist-1
40de0e47e5 fix(store): replace the async for loop with an async with statement to fix zhihu database session management 2025-09-08 00:29:04 +08:00
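As a rough illustration of the pattern this fix names, the sketch below manages a session with `async with` so that cleanup always runs. The `get_session` factory, the dict standing in for a session, and the `events` list are all hypothetical; the project's real session code is not shown in this log.

```python
import asyncio
from contextlib import asynccontextmanager

events = []  # records the open/save/close order for inspection


@asynccontextmanager
async def get_session():
    """Hypothetical session factory standing in for a real database session."""
    events.append("open")
    try:
        yield {"name": "session"}
    finally:
        # Runs even if the body raises, so the session is always released.
        events.append("close")


async def save(item: str) -> None:
    # `async with` enters and exits the context manager exactly once.
    async with get_session() as session:
        events.append(f"save {item} via {session['name']}")


asyncio.run(save("answer-1"))
```

An object produced by `asynccontextmanager` is not an async iterator, so iterating it with `async for` raises a `TypeError`; `async with` is the matching construct.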
persist-1
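684a16ed9a fix(database): fix model field types to support a wider range of data formats;
fix the xhs comment storage method, switching from batch processing to single-item processing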
2025-09-07 04:10:49 +08:00
persist-1
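e92c6130e1 fix(store): fix the AsyncFileWriter import in the store implementations
Refactor the Xiaohongshu store implementation, replacing the store_comments method with store_comment, which handles a single comment
Add the AsyncFileWriter utility class import for multiple platforms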
2025-09-06 04:41:37 +08:00
persist-1
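be306c6f54 refactor(database): rework the database storage implementation to use the SQLAlchemy ORM instead of raw SQL operations
- Remove the old async_db.py and async_sqlite_db.py implementations
- Add SQLAlchemy ORM models and database session management
- Consolidate each platform's store implementation into a _store_impl.py file
- Add support for database initialization
- Update .gitignore and the pyproject.toml dependency configuration
- Improve file storage paths and naming conventions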
2025-09-06 04:10:20 +08:00
Czs-HF
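48da268bc5 fix: add formatted output to Douyin JSON storage
- Add the indent=4 argument in the DouyinJsonStoreImplement.save_data_to_json method
- Make the Douyin JSON output format consistent with Xiaohongshu's for better readability
- Fix JSON files having all content on a single line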
2025-08-16 12:52:37 +08:00
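The effect of the `indent=4` argument can be seen with the standard-library `json` module. This is a minimal sketch: `dump_records` and the sample record are hypothetical and are not the project's `save_data_to_json` method.

```python
import json


def dump_records(records: list, indent: int = 4) -> str:
    """Serialize records as indented JSON (hypothetical helper).

    ensure_ascii=False keeps non-ASCII text readable in the output.
    """
    return json.dumps(records, ensure_ascii=False, indent=indent)


records = [{"aweme_id": "1", "title": "示例"}]

compact = json.dumps(records, ensure_ascii=False)  # everything on one line
pretty = dump_records(records)                     # one key per line, 4-space indent
```

Both strings parse back to the same data; only the whitespace differs.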
未来可欺
a7cc18ec7d Update some documentation 2025-07-30 18:58:10 +08:00
未来可欺
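ecddfbe02c Rename the .py files in the store folder whose names end in _video or _image so that they all end in _media.py, avoiding cases where a platform implements only a _video.py or only an _image.py file. All future code for storing videos or images goes in this file 2025-07-30 18:32:08 +08:00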
未来可欺
173bc08a9d Add logic for storing Douyin videos and images, rename the ENABLE_GET_IMAGES parameter in config.py to ENABLE_GET_MEIDAS, and slightly adjust the storage logic accordingly 2025-07-30 18:24:08 +08:00
翟持江
e6db6be1ca Update __init__.py: add logic for extracting images from Douyin notes 2025-07-30 10:45:38 +08:00
persist-1
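19df1734f1 chore: add Chinese display support for the --help option and a music_download_url field to the douyin_aweme table
- Add Chinese display support for command-line arguments to improve the user experience
- Add a music_download_url field to the douyin_aweme table to store the download link for a video's music
- Update the related database schema files (tables.sql, sqlite_tables.sql)
- Implement music download URL extraction and integrate it into the data storage flow
2025-07-24 22:39:53 +08:00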
买定不离手
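3365095c62 fix: improve SQLite SQL statement adaptation for the Bilibili and Douyin platforms
- Update store/bilibili/bilibili_store_sql.py to optimize the Bilibili platform's SQLite SQL statements and query logic
- Update store/douyin/douyin_store_sql.py to fix SQL statement compatibility issues in the Douyin platform's SQLite data storage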
2025-07-14 03:51:19 +08:00
买定不离手
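1298022410 refactor: update each platform's store module initialization to support SQLite
- Update store/bilibili/__init__.py to import the SQLite store implementation classes and related modules
- Update store/douyin/__init__.py to integrate the Douyin platform's SQLite data storage interface
- Update store/kuaishou/__init__.py to add import declarations for the Kuaishou platform's SQLite store module
- Update store/tieba/__init__.py to bring in the Tieba platform's SQLite database operations module
- Update store/weibo/__init__.py to integrate the Weibo platform's SQLite storage module
- Update store/xhs/__init__.py to import the Xiaohongshu platform's SQLite data storage implementation
- Update store/zhihu/__init__.py to integrate the Zhihu platform's SQLite database storage module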
2025-07-14 03:51:08 +08:00
买定不离手
6f274d476b feat: add SQLite store implementation files for each platform
- Add store/bilibili/bilibili_store_impl.py: Bilibili SQLite store implementation
- Add store/douyin/douyin_store_impl.py: Douyin SQLite store implementation
- Add store/kuaishou/kuaishou_store_impl.py: Kuaishou SQLite store implementation
- Add store/tieba/tieba_store_impl.py: Tieba SQLite store implementation
- Add store/weibo/weibo_store_impl.py: Weibo SQLite store implementation
- Add store/xhs/xhs_store_impl.py: Xiaohongshu SQLite store implementation
- Add store/zhihu/zhihu_store_impl.py: Zhihu SQLite store implementation
2025-07-14 03:36:36 +08:00
买定不离手
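fb938f38aa feat: update each platform's store SQL files to support SQLite
- Update store/kuaishou/kuaishou_store_sql.py: SQLite adaptation for the Kuaishou platform
- Update store/tieba/tieba_store_sql.py: SQLite adaptation for the Tieba platform
- Update store/weibo/weibo_store_sql.py: SQLite adaptation for the Weibo platform
- Update store/xhs/xhs_store_sql.py: SQLite adaptation for the Xiaohongshu platform
- Update store/zhihu/zhihu_store_sql.py: SQLite adaptation for the Zhihu platform
2025-07-14 03:36:20 +08:00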
Relakkes
fd33813f8f feat: add like_count field to bilibili for issue #623 2025-06-20 15:50:38 +08:00
Relakkes
d55d8b1efa feat: Douyin supports obtaining video links and cover images. for issue #620 2025-06-14 23:59:08 +08:00
Bowenwin
66843f216a finish_all_for_expand_bili 2025-05-22 22:26:30 +08:00
Bowenwin
59619fff0a finish_all 2025-05-22 22:06:06 +08:00
Bowenwin
44e3d370ff fix_words 2025-05-22 20:31:48 +08:00
Bowenwin
a356358c21 get_fans_and_get_followings 2025-05-19 19:57:36 +08:00
翟持江
b675547aab Update __init__.py: add extra fields to bilibili video info, uploader info, and comment info 2025-04-19 02:29:22 +08:00
Relakkes
30d0e733d5 feat: douyin adds comment images 2025-01-15 14:50:05 +08:00
HuiLong
d929ad16ae fix xhs get gender 2024-12-28 20:24:37 +08:00
Relakkes
79bf9fc05d chore: add xhs field comment for issue #526 2024-12-26 18:28:23 +08:00
liudongkai
33e7ef016d feat: xhs adds a random wait interval in non-proxy mode and stores the xsec_token field in db storage mode 2024-12-05 21:10:31 +08:00
Relakkes
8ab4c67443 feat: Douyin supports comment like counts #495 2024-11-16 00:37:48 +08:00
Relakkes
9fe3e47b0f chore: add a notice that the code is for learning purposes, strictly prohibiting illegal, commercial, and improper use 2024-10-20 00:43:25 +08:00
Relakkes
da8f1c62b8 feat: Zhihu supports crawling creator homepage data (answers, articles, videos) 2024-10-16 21:02:27 +08:00
Relakkes
b7e57da0d2 feat: Zhihu support (keywords, comments) 2024-09-08 00:00:04 +08:00
tooyang
0c1adb75fe fix: improve the saved JSON content format with indentation support 2024-09-04 11:24:12 +08:00
Relakkes Yang
acb29add28 feat: Baidu Tieba supports crawling posts from creator homepages 2024-08-24 11:03:23 +08:00
Relakkes
8adb593ba6 temp commit 2024-08-24 09:12:03 +08:00
Relakkes
65699aa1cb feat: xhs supports fetching comment like counts 2024-08-24 06:07:33 +08:00