mirror of
https://github.com/NanmiCoder/MediaCrawler.git
synced 2026-03-02 04:00:45 +08:00
Refactors the Bilibili keyword search functionality to provide more flexible crawling strategies and corrects a flaw in how crawl limits were applied.

Previously, the `ALL_DAY` boolean flag offered a rigid choice for time-based searching and contained a logical issue where `CRAWLER_MAX_NOTES_COUNT` was incorrectly applied on a per-day basis instead of as an overall total.

This commit introduces the `BILI_SEARCH_MODE` configuration option with three distinct modes:

- `normal`: The default search behavior without time constraints.
- `all_in_time_range`: Maximizes data collection within a specified date range, replicating the original intent of `ALL_DAY=True`.
- `daily_limit_in_time_range`: A new mode that strictly enforces both the daily `MAX_NOTES_PER_DAY` and the total `CRAWLER_MAX_NOTES_COUNT` limits across the entire date range.

This change resolves the limit logic bug and gives users more precise control over the crawling process.

Changes include:

- Modified `config/base_config.py` to replace `ALL_DAY` with `BILI_SEARCH_MODE`.
- Refactored `media_platform/bilibili/core.py` to implement the new search mode logic.
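
The difference between the two time-range modes can be illustrated with a minimal sketch. This is not the code in `media_platform/bilibili/core.py`: the function names `date_range` and `kept_per_day` and the `available` mapping (a stand-in for the number of search results found on each day) are hypothetical, and only the mode strings and the two limit names come from this commit message.

```python
from datetime import date, timedelta
from typing import Dict, List


def date_range(start: date, end: date) -> List[date]:
    """Hypothetical helper: every day from start to end, inclusive."""
    return [start + timedelta(days=i) for i in range((end - start).days + 1)]


def kept_per_day(
    available: Dict[date, int],  # assumed: results found per day
    start: date,
    end: date,
    mode: str,                   # value of BILI_SEARCH_MODE
    max_total: int,              # stands in for CRAWLER_MAX_NOTES_COUNT
    max_per_day: int,            # stands in for MAX_NOTES_PER_DAY
) -> Dict[date, int]:
    """Return how many items would be kept on each day of the range."""
    kept: Dict[date, int] = {}
    total = 0
    for day in date_range(start, end):
        found = available.get(day, 0)
        if mode == "all_in_time_range":
            # Collect everything found in the range; no caps applied.
            take = found
        elif mode == "daily_limit_in_time_range":
            # The total cap is tracked across the whole range,
            # not reset for each day.
            remaining = max_total - total
            if remaining <= 0:
                break
            take = min(found, max_per_day, remaining)
        else:
            raise ValueError(f"not a time-range mode: {mode!r}")
        kept[day] = take
        total += take
    return kept


start, end = date(2025, 1, 1), date(2025, 1, 3)
available = {day: 5 for day in date_range(start, end)}

limited = kept_per_day(available, start, end,
                       "daily_limit_in_time_range", max_total=7, max_per_day=3)
everything = kept_per_day(available, start, end,
                          "all_in_time_range", max_total=7, max_per_day=3)
```

With five results available on each of three days, a total cap of 7, and a daily cap of 3, the `daily_limit_in_time_range` branch keeps 3, 3, and 1 items, while the `all_in_time_range` branch keeps all 15. The `normal` mode is left out of the sketch because it does not iterate over a date range.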