The PR adds API limit overrides and static proxy support, but review found two problems: the default proxy provider was switched to static, which points at an invalid placeholder URL, and the new API fields accepted unbounded values. This commit keeps the existing proxy default intact, requires the static proxy to be selected explicitly via config or CLI, validates API limit ranges, and adds focused regression coverage for both paths.
Constraint: PR branch must stay compatible with the contributor's branch and must not add dependencies
Rejected: Keep static as the default provider | breaks existing --enable_ip_proxy defaults with an invalid placeholder URL
Rejected: Accept arbitrary integer limits | lets API callers request negative or excessive crawl sizes
Confidence: high
Scope-risk: narrow
Directive: Do not change proxy provider defaults when adding new providers; new providers should be opt-in and covered by provider-specific tests
Tested: uv run pytest tests/test_api_limits.py tests/test_static_proxy_provider.py
Tested: uv run pytest tests
Tested: uv run pytest test/test_utils.py
Tested: uv run python -m compileall api cmd_arg config proxy tests
Tested: git diff --cached --check
Not-tested: Live crawler run against external platforms or real proxy vendor endpoints
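A minimal sketch of the two behaviors described above: range-checking the new API limit overrides, and keeping the static provider opt-in so the pre-existing default survives. All names here (`validate_limit`, `resolve_proxy_provider`, the bounds, and the provider strings) are hypothetical stand-ins, not the project's real API fields or config keys. It uses only the standard library, in line with the no-new-dependencies constraint.

```python
from typing import Optional

# Hypothetical bounds; the real limits live in the project's API layer.
MIN_LIMIT = 1
MAX_LIMIT = 200

# Hypothetical provider names: the pre-existing default stays the default.
DEFAULT_PROVIDER = "existing_default"
STATIC_PROVIDER = "static"


def validate_limit(name: str, value: Optional[int]) -> Optional[int]:
    """Return value if it is None or within range; otherwise raise ValueError."""
    if value is None:
        # No override requested: the configured default applies elsewhere.
        return None
    # bool is a subclass of int, so reject it explicitly.
    if isinstance(value, bool) or not isinstance(value, int):
        raise ValueError(f"{name} must be an integer")
    if not MIN_LIMIT <= value <= MAX_LIMIT:
        raise ValueError(
            f"{name} must be between {MIN_LIMIT} and {MAX_LIMIT}, got {value}"
        )
    return value


def resolve_proxy_provider(cli_value: Optional[str], config_value: Optional[str]) -> str:
    """CLI wins over config; with neither set, keep the pre-existing default.

    The static provider is only used when someone asks for it explicitly.
    """
    if cli_value:
        return cli_value
    if config_value:
        return config_value
    return DEFAULT_PROVIDER
```

With this shape, enabling the proxy without naming a provider never silently falls through to the static placeholder, and a negative or oversized crawl size is rejected before any crawling starts.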
Tieba search, detail, comments, creator, and forum-list pages now rely on the current signed PC JSON APIs instead of brittle HTML selectors. The CLI also maps Tieba detail and creator arguments into the platform-specific config so command-line runs exercise the intended mode.
Constraint: Tieba PC pages no longer expose stable HTML structures for search, creator, and forum-list extraction
Constraint: Current PC APIs require browser cookies, tbs, and the web client signing convention
Rejected: Keep expanding HTML selectors | search and creator pages returned large documents with empty parsed results after the redesign
Confidence: high
Scope-risk: moderate
Directive: Do not replace these API paths with page HTML parsing without re-verifying the current Tieba network requests
Tested: uv run pytest tests/test_tieba_client_pagination.py tests/test_cmd_arg_tieba.py tests/test_tieba_extractor.py -q
Tested: uv run python -m py_compile cmd_arg/arg.py media_platform/tieba/help.py media_platform/tieba/client.py media_platform/tieba/core.py tests/test_cmd_arg_tieba.py tests/test_tieba_client_pagination.py tests/test_tieba_extractor.py
Tested: uv run main.py --platform tieba --type search --keywords 编程兼职 --get_comment false
Tested: uv run main.py --platform tieba --type detail --specified_id 9835114923 --get_comment true --max_comments_count_singlenotes 3
Tested: uv run main.py --platform tieba --type creator --creator_id https://tieba.baidu.com/home/main?id=tb.1.6ad0cd4a.7ZcjVYWa7UpHttCld2OppA --get_comment false
Not-tested: Second-level Tieba comment API migration; this path still uses the existing /p/comment HTML parser
Not-tested: Full pytest suite has one pre-existing unrelated XHS Excel store assertion failure
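The commit relies on "the web client signing convention" without spelling it out. As a hedged illustration only: publicly available Tieba client code signs a request by sorting parameters by key, concatenating them as `key=value` with no separator, appending a secret salt, and taking the MD5 hex digest. Whether the current PC JSON APIs use exactly this scheme, which salt they use, and whether the digest is uppercased are all assumptions to re-verify against live network requests. The `sign_params` and `with_signature` helpers, the `sign` field name, and the salt value are hypothetical.

```python
import hashlib
from typing import Mapping


def sign_params(params: Mapping[str, object], salt: str) -> str:
    """Hypothetical Tieba-style request signature.

    Assumed scheme: sort by key, join as key=value with no separator,
    append the salt, MD5 the UTF-8 bytes, uppercase the hex digest.
    """
    payload = "".join(f"{key}={params[key]}" for key in sorted(params))
    digest = hashlib.md5((payload + salt).encode("utf-8")).hexdigest()
    return digest.upper()


def with_signature(params: Mapping[str, object], salt: str) -> dict:
    """Return a copy of params with an assumed 'sign' field appended.

    The signature is computed before 'sign' is added, so it never signs itself.
    """
    signed = dict(params)
    signed["sign"] = sign_params(params, salt)
    return signed


# Example: tbs would come from the logged-in browser session (value is a placeholder).
example = with_signature({"kw": "python", "pn": 1, "tbs": "TBS_PLACEHOLDER"}, salt="HYPOTHETICAL_SALT")
```

Sorting by key makes the signature independent of dict insertion order, which is why the client can build parameters in any order before signing.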
Migrate the remaining httpx.AsyncClient call sites in the proxy/ package to
make_async_client(), completing DISABLE_SSL_VERIFY coverage across all
outbound HTTP requests in the project.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
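One way to check the "all outbound HTTP requests" claim is a small audit that flags any direct `httpx.AsyncClient(` call outside the factory module. This is a standard-library sketch that assumes the factory lives in tools/httpx_util.py, as stated elsewhere in this log; `find_raw_clients` is a hypothetical helper. It is a plain text search, so it also matches commented-out code and misses aliased imports such as `from httpx import AsyncClient`.

```python
import re
from pathlib import Path
from typing import List, Tuple

# Matches direct constructions like "httpx.AsyncClient(" or "httpx.AsyncClient (".
RAW_CLIENT = re.compile(r"\bhttpx\.AsyncClient\s*\(")


def find_raw_clients(
    root: Path, allowed: Tuple[str, ...] = ("tools/httpx_util.py",)
) -> List[str]:
    """Return 'path:line' entries for direct httpx.AsyncClient(...) calls outside allowed files."""
    hits: List[str] = []
    for path in sorted(root.rglob("*.py")):
        rel_parts = path.relative_to(root).parts
        # Skip hidden directories such as .venv or .git.
        if any(part.startswith(".") for part in rel_parts):
            continue
        rel = path.relative_to(root).as_posix()
        if rel in allowed:
            continue  # the factory itself is expected to construct the client
        text = path.read_text(encoding="utf-8")
        for lineno, line in enumerate(text.splitlines(), start=1):
            if RAW_CLIENT.search(line):
                hits.append(f"{rel}:{lineno}")
    return hits
```

Running `find_raw_clients(Path("."))` from the repository root should return an empty list once every call site goes through the factory.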
- Add DISABLE_SSL_VERIFY = False to base_config.py (default: verification on)
- Add tools/httpx_util.py with make_async_client() factory that reads the config
- Replace all httpx.AsyncClient() call sites across all platforms (bilibili,
weibo, zhihu, xhs, douyin, kuaishou) and crawler_util with make_async_client()
- Extend the SSL fix to previously missed platforms: xhs, douyin, kuaishou
Users running behind an intercepting proxy can set DISABLE_SSL_VERIFY = True
in config/base_config.py. All other users retain certificate verification.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
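A minimal sketch of the factory described above, assuming it only needs to translate the config flag into httpx's `verify` argument. The module-level `DISABLE_SSL_VERIFY` stands in for the value in config/base_config.py; the `disable_ssl_verify` override parameter and the `client_kwargs` helper are hypothetical additions that make the behavior testable. In this sketch the config flag always wins over a caller-supplied `verify`, so no call site can quietly bypass the setting.

```python
from typing import Any, Dict, Optional

# Stand-in for DISABLE_SSL_VERIFY in config/base_config.py (default: verification on).
DISABLE_SSL_VERIFY = False


def client_kwargs(disable_ssl_verify: Optional[bool] = None, **kwargs: Any) -> Dict[str, Any]:
    """Build httpx.AsyncClient keyword arguments with the SSL setting applied."""
    if disable_ssl_verify is None:
        disable_ssl_verify = DISABLE_SSL_VERIFY
    # Other options (timeout, headers, ...) pass through unchanged,
    # but verify is always derived from the config flag.
    merged = dict(kwargs)
    merged["verify"] = not disable_ssl_verify
    return merged


def make_async_client(disable_ssl_verify: Optional[bool] = None, **kwargs: Any):
    """Return an httpx.AsyncClient whose certificate verification follows the config flag."""
    import httpx  # imported lazily so client_kwargs() works without httpx installed

    return httpx.AsyncClient(**client_kwargs(disable_ssl_verify, **kwargs))
```

Call sites then change from `httpx.AsyncClient(timeout=10)` to `make_async_client(timeout=10)`, and existing `async with` usage stays the same.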
Add verify=False to all httpx.AsyncClient calls in the bilibili,
weibo, and zhihu clients and in crawler_util. This fixes SSL certificate
validation errors when running behind a corporate proxy or VPN.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Translate Chinese comments, variable descriptions, and metadata in
multiple configuration and core files into English. This makes the
codebase more accessible to international developers. Also remove
the sponsorship section from the README files.