This commit introduces several improvements to enhance the stability and functionality of the Bilibili crawler.
- **Add Retry Logic:** Implement a retry mechanism with exponential backoff when fetching video comments. This makes the crawler more resilient to transient network issues or API errors.
- **Improve Error Handling:** Add a `try...except` block to handle potential `JSONDecodeError` in the Bilibili client, preventing crashes when the API returns an invalid response.
- **Ensure Clean Shutdown:** Refactor `main.py` to use a `try...finally` block, guaranteeing that the crawler and database connections are properly closed on exit, error, or `KeyboardInterrupt`.
- **Update Default Config:** Adjust default configuration values to increase concurrency, enable word cloud generation by default, and refine the Bilibili search mode for more practical usage.
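
The retry behavior described above can be sketched roughly as follows. This is a minimal illustration, not the code in this commit: the helper name `fetch_with_retry`, its parameters, the injectable `sleep`, and the choice of retried exception types are all assumptions made for the example.

```python
import asyncio
import json
from typing import Awaitable, Callable, TypeVar

T = TypeVar("T")


async def fetch_with_retry(
    fetch: Callable[[], Awaitable[T]],
    max_retries: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], Awaitable[None]] = asyncio.sleep,
) -> T:
    """Await `fetch()`, retrying failed attempts with exponentially growing delays.

    Hypothetical helper: the names and defaults are illustrative only.
    """
    for attempt in range(max_retries + 1):
        try:
            return await fetch()
        except (ConnectionError, TimeoutError, json.JSONDecodeError):
            if attempt == max_retries:
                raise  # out of retries: surface the last error to the caller
            # The delay doubles on every attempt: base, 2*base, 4*base, ...
            await sleep(base_delay * 2 ** attempt)
    # Only reachable when max_retries is negative, so no attempt was made.
    raise ValueError("max_retries must be >= 0")
```

Treating `json.JSONDecodeError` as retryable is a design choice for this sketch; a client could equally catch it, log the invalid response, and skip the item instead of retrying.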
Refactors the Bilibili keyword search functionality to provide more flexible crawling strategies and corrects a flaw in how crawl limits were applied.
Previously, the `ALL_DAY` boolean flag offered a rigid choice for time-based searching and contained a logical issue where `CRAWLER_MAX_NOTES_COUNT` was incorrectly applied on a per-day basis instead of as an overall total.
This commit introduces the `BILI_SEARCH_MODE` configuration option with three distinct modes:
- `normal`: The default search behavior without time constraints.
- `all_in_time_range`: Maximizes data collection within a specified date range, replicating the original intent of `ALL_DAY=True`.
- `daily_limit_in_time_range`: A new mode that strictly enforces both the daily `MAX_NOTES_PER_DAY` and the total `CRAWLER_MAX_NOTES_COUNT` limits across the entire date range.
This change resolves the limit logic bug and gives users more precise control over the crawling process.
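
As a rough illustration of the corrected limit logic in `daily_limit_in_time_range` mode, the sketch below caps each day at the daily limit while also tracking a running total across the whole range. The function name `crawl_with_daily_limit` and the `fetch_day` callback are hypothetical and do not reflect the actual code in `media_platform/bilibili/core.py`.

```python
from datetime import date, timedelta
from typing import Callable


def crawl_with_daily_limit(
    start: date,
    end: date,
    fetch_day: Callable[[date, int], int],
    total_limit: int,
    daily_limit: int,
) -> int:
    """Crawl day by day, enforcing both a per-day cap and an overall cap.

    `fetch_day(day, quota)` is an assumed callback that crawls at most `quota`
    notes for `day` and returns how many it actually collected.
    """
    collected = 0
    day = start
    while day <= end and collected < total_limit:
        # Never ask for more than the daily cap or what is left of the total.
        quota = min(daily_limit, total_limit - collected)
        collected += fetch_day(day, quota)
        day += timedelta(days=1)
    return collected
```

With `total_limit=120` and `daily_limit=50`, this requests 50, 50, and then 20 notes before stopping, instead of restarting the total count each day as the old `ALL_DAY` logic did.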
Changes include:
- Modified `config/base_config.py` to replace `ALL_DAY` with `BILI_SEARCH_MODE`.
- Refactored `media_platform/bilibili/core.py` to implement the new search mode logic.
- Add a prominent language selection section at the top of each README
- Include flag emojis and clear language indicators (🇨🇳 中文, 🇺🇸 English, 🇪🇸 Español)
- Format as horizontal table for easy scanning and navigation
- Show current language with arrow indicator (← Current/当前/Actual)
- Use relative links that work on both GitHub and local repositories
- Improve discoverability of multilingual documentation
- Keep navigation consistent across all three language versions
- Add README_en.md: Complete English translation of project documentation
- Add README_es.md: Complete Spanish translation of project documentation
- Maintain the same structure, formatting, and technical accuracy as the original
- Preserve all markdown formatting, links, code examples, and legal disclaimers
- Keep the original Chinese README.md unchanged
- Support English- and Spanish-speaking developers while maintaining the project's educational focus