Commit Graph

491 Commits

Author SHA1 Message Date
Relakkes
fd33813f8f feat: add like_count field to bilibi for issue #623 2025-06-20 15:50:38 +08:00
Relakkes
31bcdb191f docs: update README.md 2025-06-16 13:58:09 +08:00
Relakkes
d55d8b1efa feat: Douyin supports obtaining video links and cover images. for issue #620 2025-06-14 23:59:08 +08:00
Relakkes
ed1dc7916a docs: update README.md 2025-06-08 15:56:02 +08:00
程序员阿江(Relakkes)
6323e2d45b Merge pull request #616 from chimeElm/main
Fix CRAWLER_MAX_NOTES_COUNT having no effect when crawling posts by Xiaohongshu authors
2025-06-07 14:43:37 +08:00
chimeElm
26a845581e Update client.py
Fix CRAWLER_MAX_NOTES_COUNT having no effect when crawling posts by Xiaohongshu authors
2025-06-07 02:41:09 +08:00
Relakkes
23c8f8f87b docs: add english license 2025-06-01 23:20:11 +08:00
Relakkes
1e7b950d3e Revert "chore: remove sponor"
This reverts commit 242c06c345.
2025-05-26 22:35:18 +08:00
Relakkes
242c06c345 chore: remove sponor 2025-05-25 11:54:38 +08:00
程序员阿江(Relakkes)
ff41faeb00 Merge pull request #608 from Bowenwin/bili_expand
Bili_function_expand
2025-05-22 23:14:58 +08:00
Bowenwin
66843f216a finish_all_for_expand_bili 2025-05-22 22:26:30 +08:00
Bowenwin
59619fff0a finish_all 2025-05-22 22:06:06 +08:00
Bowenwin
44e3d370ff fix_words 2025-05-22 20:31:48 +08:00
程序员阿江(Relakkes)
7ed6621933 Merge pull request #603 from Bowenwin/fix_words
Fix words
2025-05-19 23:16:12 +08:00
Bowenwin
703a6e84cb fix_words 2025-05-19 20:07:20 +08:00
Bowenwin
144b8bec6a fix_words 2025-05-19 20:04:00 +08:00
Bowenwin
a356358c21 get_fans_and_get_followings 2025-05-19 19:57:36 +08:00
Relakkes
654260cbce docs: update README.md 2025-05-13 18:42:58 +08:00
Relakkes
79a9824f6a fix: modify dy schema 2025-04-30 16:47:13 +08:00
Relakkes
67d31bf42a fix: dy update fp params 2025-04-30 13:26:22 +08:00
程序员阿江(Relakkes)
2a41b684ad Merge pull request #590 from 2513502304/main
Enhancement for issue #589
2025-04-20 14:14:55 +08:00
翟持江
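af5a393a7a Update core.py: removed the try-catch statement added by another contributor. That try-catch statement affected the final logic of the code and broke it, so that only a single day's data could be crawled and the crawler could not advance to the next day. (The original logic relies on try-catch catching an exception in order to move to the next day; do not add further exception handling or a finally clause to this statement!) 2025-04-19 04:34:24 +08:00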
翟持江
b675547aab Update __init__.py: add extra fields to the bilibili video info, uploader (UP) info, and comment info 2025-04-19 02:29:22 +08:00
翟持江
ec97001451 Update tables.sql 2025-04-19 02:22:22 +08:00
翟持江
9935a07279 Add files via upload 2025-04-19 02:18:52 +08:00
Relakkes
cb2ae6cbab docs: add sponsor 2025-04-16 16:49:49 +08:00
Relakkes
0d715a9f32 fix: bili qrcode login fix 2025-04-08 21:11:40 +08:00
Relakkes
660fd18a95 fix: dy login fix 2025-04-08 20:58:04 +08:00
程序员阿江(Relakkes)
afbd4ec1bf Merge pull request #572 from crpa33/main
Prevent the process from terminating when data is unexpectedly None
2025-04-02 13:34:41 +08:00
crpa33
274d64aefc Handle the case where xhs comment info is unexpectedly empty
An error would interrupt my run, and I had no way around it
2025-04-02 11:59:27 +08:00
crpa33
a39b571d27 Log output: handle errors when building the task list for the video search page 2025-04-02 11:57:28 +08:00
crpa33
413d91a520 Log output: author is banned or has an error 2025-04-02 11:52:36 +08:00
crpa33
eaf14721f8 Log output: comprehension error caused by NoneType 2025-04-02 11:48:36 +08:00
crpa33
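2c4af2337e douyin: skip to the next keyword when the search page is empty
Skip on an empty page even if the expected page count has not been reached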
2025-03-27 23:32:21 +08:00
crpa33
3c72fc48b0 Guard against the case where author is None but was not detected 2025-03-27 23:22:47 +08:00
crpa33
6b6e2b8ba0 Fix comprehension error caused by NoneType 2025-03-27 23:18:01 +08:00
Relakkes
dfddfa7fdc docs: update README.md 2025-03-23 20:35:11 +08:00
Relakkes
daaea7155b feat: add uv tool to manage project 2025-03-23 18:13:13 +08:00
Relakkes
8030d2a02f docs: removed sponsor 2025-03-13 15:07:54 +08:00
Relakkes
061d1c15e2 feat: kuaishou search params update 2025-03-11 23:42:34 +08:00
Relakkes
f2cf864c27 fix: zhihu article url error #564 2025-03-03 18:18:41 +08:00
Relakkes
b43d6b7b91 chore: update config 2025-02-12 10:58:48 +08:00
Relakkes
66a7ab1db8 refactor: bibi default to get without time data 2025-02-12 10:58:15 +08:00
Relakkes
678ce1bfac fix: bilibili bugfix 2025-02-10 17:13:37 +08:00
Relakkes
457205efd8 docs: add sponsor 2025-02-08 15:28:18 +08:00
程序员阿江(Relakkes)
38f2b36bf5 Merge pull request #542 from 2513502304/main
Update core.py: for crawl tasks of type `detail` and `creator`, add the same `bilibili_store.update_up_info` call that `search` tasks use to store uploader (UP) info
2025-01-20 19:30:10 +08:00
翟持江
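0364b23b5b Update core.py: for crawl tasks of type `detail` and `creator`, add the same `bilibili_store.update_up_info` call that `search` tasks use to store uploader (UP) info
As in the `search` function, after `get_video_info_task` is called, both the `bilibili_video` and `bilibili_up_info` information are obtained.
Previously, `get_specified_videos` in the `detail` task saved only the specified `bilibili_video` information and did not save `bilibili_up_info`. `get_creator_videos` in the `creator` task also calls `get_specified_videos` to fetch all videos of a given creator, so it likewise did not save `bilibili_up_info`.
So adding the single line `await bilibili_store.update_up_info(video_detail)` to `get_specified_videos` is enough to produce the same number of data files as the `search` task, without missing the corresponding uploader's profile information.
Tested:
- Previously only the `search` task produced `*_creator.csv`, `*_contents.csv`, and `*_comments.csv`; the `detail` and `creator` tasks lacked the `*_creator.csv` file.
- After this commit, all three modes produce the same number of data files.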
2025-01-19 19:55:18 +08:00
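The change described in commit 0364b23b5b can be illustrated with a minimal sketch. This is not the project's actual code: `InMemoryStore`, the simplified `get_specified_videos` signature, the `update_bilibili_video` method name, and the `bvid` / `up_name` dictionary keys are all assumptions made for illustration. Only the `await ...update_up_info(video_detail)` call is taken from the commit message.

```python
import asyncio


class InMemoryStore:
    """Hypothetical stand-in for bilibili_store; it only records what was saved."""

    def __init__(self):
        self.videos = []
        self.up_infos = []

    async def update_bilibili_video(self, video_detail):
        # Assumed method name: records the video part of the detail payload.
        self.videos.append(video_detail["bvid"])

    async def update_up_info(self, video_detail):
        # Named in the commit message: records the uploader part of the payload.
        self.up_infos.append(video_detail["up_name"])


async def get_specified_videos(store, video_details):
    """Simplified sketch: save each video and its uploader info."""
    for video_detail in video_details:
        await store.update_bilibili_video(video_detail)
        # The one added line: without it, no uploader record is written.
        await store.update_up_info(video_detail)


def run_demo():
    store = InMemoryStore()
    details = [{"bvid": "BV_example_1", "up_name": "uploader_a"}]
    asyncio.run(get_specified_videos(store, details))
    return store
```

Calling `run_demo()` returns a store whose `videos` and `up_infos` lists each hold one entry, mirroring the commit's claim that `detail` and `creator` tasks then write uploader data alongside video data.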
程序员阿江(Relakkes)
4b63ea68ec Merge pull request #538 from 2513502304/main
feat: bilibli support date range filter
2025-01-17 19:43:57 +08:00
翟持江
2d93ec5a82 Update core.py: fix incorrect indentation 2025-01-15 18:33:12 +08:00
翟持江
8741952cb5 Update requirements.txt: add the pandas module; datetime is part of the Python standard library and does not need to be added 2025-01-15 18:27:40 +08:00