From 26c511e35f2ca765f9bd81f4e5c6c4b051a143a3 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?=E7=A8=8B=E5=BA=8F=E5=91=98=E9=98=BF=E6=B1=9F=28Relakkes?=
 =?UTF-8?q?=29?= <relakkes@gmail.com>
Date: Thu, 18 Dec 2025 13:16:32 +0800
Subject: [PATCH] docs: add project architecture documentation with Mermaid
 diagrams
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

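Add project architecture documentation, including:
- System architecture overview diagram
- Data flow diagram
- Crawler base class hierarchy and lifecycle diagrams
- Storage layer architecture diagram
- Proxy, login, and cache system diagrams
- Module dependency diagram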

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
---
 docs/index.md        |   4 +
 docs/项目架构文档.md | 883 +++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 887 insertions(+)
 create mode 100644 docs/项目架构文档.md
diff --git a/docs/index.md b/docs/index.md
index dc0c837..332d0c5 100644
--- a/docs/index.md
+++ b/docs/index.md
@@ -1,5 +1,9 @@
 # MediaCrawler使用方法
 
+## Project Documentation
+
+- [Project Architecture Documentation](项目架构文档.md) - system architecture, module design, and data flow (with Mermaid diagrams)
+
 ## 推荐：使用 uv 管理依赖
 
 ### 1. 前置依赖
diff --git a/docs/项目架构文档.md b/docs/项目架构文档.md
new file mode 100644
index 0000000..661fac8
--- /dev/null
+++ b/docs/项目架构文档.md
@@ -0,0 +1,883 @@
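+# MediaCrawler Project Architecture Documentation
+
+## 1. Project Overview
+
+### 1.1 Introduction
+
+MediaCrawler is a multi-platform social media crawler framework built on Python asynchronous programming. It crawls content, comments, and creator information from mainstream social media platforms.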
+
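+### 1.2 Supported Platforms
+
+| Platform | Code | Main features |
+|------|------|---------|
+| Xiaohongshu | `xhs` | Note search, details, creators |
+| Douyin | `dy` | Video search, details, creators |
+| Kuaishou | `ks` | Video search, details, creators |
+| Bilibili | `bili` | Video search, details, uploaders |
+| Weibo | `wb` | Post search, details, bloggers |
+| Baidu Tieba | `tieba` | Post search, details |
+| Zhihu | `zhihu` | Q&A search, details, answerers |
+
+### 1.3 Core Features
+
+- **Multi-platform support**: a unified crawler interface covering seven mainstream platforms
+- **Multiple login methods**: QR code, phone number, and cookie
+- **Multiple storage backends**: CSV, JSON, SQLite, MySQL, MongoDB, and Excel
+- **Anti-crawling countermeasures**: CDP mode, proxy IP pool, and request signing
+- **Asynchronous concurrency**: an asyncio-based architecture for concurrent crawling
+- **Word cloud generation**: automatically generates word cloud images from comments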
+
+---
+
+## 2. System Architecture Overview
+
+### 2.1 High-Level Architecture Diagram
+
+```mermaid
+flowchart TB
+    subgraph Entry["Entry layer"]
+        main["main.py<br/>Program entry point"]
+        cmdarg["cmd_arg<br/>Command-line arguments"]
+        config["config<br/>Configuration management"]
+    end
+
+    subgraph Core["Core crawler layer"]
+        factory["CrawlerFactory<br/>Crawler factory"]
+        base["AbstractCrawler<br/>Crawler base class"]
+
+        subgraph Platforms["Platform implementations"]
+            xhs["XiaoHongShuCrawler"]
+            dy["DouYinCrawler"]
+            ks["KuaishouCrawler"]
+            bili["BilibiliCrawler"]
+            wb["WeiboCrawler"]
+            tieba["TieBaCrawler"]
+            zhihu["ZhihuCrawler"]
+        end
+    end
+
+    subgraph Client["API client layer"]
+        absClient["AbstractApiClient<br/>Client base class"]
+        xhsClient["XiaoHongShuClient"]
+        dyClient["DouYinClient"]
+        ksClient["KuaiShouClient"]
+        biliClient["BilibiliClient"]
+        wbClient["WeiboClient"]
+        tiebaClient["BaiduTieBaClient"]
+        zhihuClient["ZhiHuClient"]
+    end
+
+    subgraph Storage["Data storage layer"]
+        storeFactory["StoreFactory<br/>Store factory"]
+        csv["CSV store"]
+        json["JSON store"]
+        sqlite["SQLite store"]
+        mysql["MySQL store"]
+        mongodb["MongoDB store"]
+        excel["Excel store"]
+    end
+
+    subgraph Infra["Infrastructure layer"]
+        browser["Browser management<br/>Playwright/CDP"]
+        proxy["Proxy IP pool"]
+        cache["Cache system"]
+        login["Login management"]
+    end
+
+    main --> factory
+    cmdarg --> main
+    config --> main
+    factory --> base
+    base --> Platforms
+    Platforms --> Client
+    Client --> Storage
+    Client --> Infra
+    Storage --> storeFactory
+    storeFactory --> csv & json & sqlite & mysql & mongodb & excel
+```
+
+### 2.2 Data Flow Diagram
+
+```mermaid
+flowchart LR
+    subgraph Input["Input"]
+        keywords["Keywords/IDs"]
+        config["Configuration parameters"]
+    end
+
+    subgraph Process["Processing"]
+        browser["Launch browser"]
+        login["Log in"]
+        search["Search/crawl"]
+        parse["Parse data"]
+        comment["Fetch comments"]
+    end
+
+    subgraph Output["Output"]
+        content["Content data"]
+        comments["Comment data"]
+        creator["Creator data"]
+        media["Media files"]
+    end
+
+    subgraph Storage["Storage"]
+        file["File storage<br/>CSV/JSON/Excel"]
+        db["Database<br/>SQLite/MySQL"]
+        nosql["NoSQL<br/>MongoDB"]
+    end
+
+    keywords --> browser
+    config --> browser
+    browser --> login
+    login --> search
+    search --> parse
+    parse --> comment
+    parse --> content
+    comment --> comments
+    parse --> creator
+    parse --> media
+    content & comments & creator --> file & db & nosql
+    media --> file
+```
+
+---
+
+## 3. Directory Structure
+
+```
+MediaCrawler/
+├── main.py                 # Program entry point
+├── var.py                  # Global context variables
+├── pyproject.toml          # Project configuration
+│
+├── base/                   # Abstract base classes
+│   └── base_crawler.py     # Base classes for crawler, login, store, and client
+│
+├── config/                 # Configuration management
+│   ├── base_config.py      # Core configuration
+│   ├── db_config.py        # Database configuration
+│   └── {platform}_config.py # Platform-specific configuration
+│
+├── media_platform/         # Platform crawler implementations
+│   ├── xhs/                # Xiaohongshu
+│   ├── douyin/             # Douyin
+│   ├── kuaishou/           # Kuaishou
+│   ├── bilibili/           # Bilibili
+│   ├── weibo/              # Weibo
+│   ├── tieba/              # Baidu Tieba
+│   └── zhihu/              # Zhihu
+│
+├── store/                  # Data storage
+│   ├── excel_store_base.py # Excel store base class
+│   └── {platform}/         # Per-platform store implementations
+│
+├── database/               # Database layer
+│   ├── models.py           # ORM model definitions
+│   ├── db_session.py       # Database session management
+│   └── mongodb_store_base.py # MongoDB base class
+│
+├── proxy/                  # Proxy management
+│   ├── proxy_ip_pool.py    # IP pool management
+│   ├── proxy_mixin.py      # Proxy refresh mixin
+│   └── providers/          # Proxy providers
+│
+├── cache/                  # Cache system
+│   ├── abs_cache.py        # Abstract cache class
+│   ├── local_cache.py      # Local cache
+│   └── redis_cache.py      # Redis cache
+│
+├── tools/                  # Utility modules
+│   ├── app_runner.py       # Application run management
+│   ├── browser_launcher.py # Browser launching
+│   ├── cdp_browser.py      # CDP browser management
+│   ├── crawler_util.py     # Crawler utilities
+│   └── async_file_writer.py # Asynchronous file writing
+│
+├── model/                  # Data models
+│   └── m_{platform}.py     # Pydantic models
+│
+├── libs/                   # JavaScript libraries
+│   └── stealth.min.js      # Anti-detection script
+│
+└── cmd_arg/                # Command-line arguments
+    └── arg.py              # Argument definitions
+```
+
+---
+
+## 4. Core Modules in Detail
+
+### 4.1 Crawler Base Class Hierarchy
+
+```mermaid
+classDiagram
+    class AbstractCrawler {
+        <<abstract>>
+        +start()* Start the crawler
+        +search()* Search
+        +launch_browser() Launch the browser
+        +launch_browser_with_cdp() Launch in CDP mode
+    }
+
+    class AbstractLogin {
+        <<abstract>>
+        +begin()* Begin login
+        +login_by_qrcode()* QR code login
+        +login_by_mobile()* Phone number login
+        +login_by_cookies()* Cookie login
+    }
+
+    class AbstractStore {
+        <<abstract>>
+        +store_content()* Store content
+        +store_comment()* Store comments
+        +store_creator()* Store creators
+        +store_image()* Store images
+        +store_video()* Store videos
+    }
+
+    class AbstractApiClient {
+        <<abstract>>
+        +request()* HTTP request
+        +update_cookies()* Update cookies
+    }
+
+    class ProxyRefreshMixin {
+        +init_proxy_pool() Initialize the proxy pool
+        +_refresh_proxy_if_expired() Refresh expired proxies
+    }
+
+    class XiaoHongShuCrawler {
+        +xhs_client: XiaoHongShuClient
+        +start()
+        +search()
+        +get_specified_notes()
+        +get_creators_and_notes()
+    }
+
+    class XiaoHongShuClient {
+        +playwright_page: Page
+        +cookie_dict: Dict
+        +request()
+        +pong() Check login status
+        +get_note_by_keyword()
+        +get_note_by_id()
+    }
+
+    AbstractCrawler <|-- XiaoHongShuCrawler
+    AbstractApiClient <|-- XiaoHongShuClient
+    ProxyRefreshMixin <|-- XiaoHongShuClient
+```
+
+### 4.2 Crawler Lifecycle
+
+```mermaid
+sequenceDiagram
+    participant Main as main.py
+    participant Factory as CrawlerFactory
+    participant Crawler as XiaoHongShuCrawler
+    participant Browser as Playwright/CDP
+    participant Login as XiaoHongShuLogin
+    participant Client as XiaoHongShuClient
+    participant Store as StoreFactory
+
+    Main->>Factory: create_crawler("xhs")
+    Factory-->>Main: crawler instance
+
+    Main->>Crawler: start()
+
+    alt IP proxy enabled
+        Crawler->>Crawler: create_ip_pool()
+    end
+
+    alt CDP mode
+        Crawler->>Browser: launch_browser_with_cdp()
+    else Standard mode
+        Crawler->>Browser: launch_browser()
+    end
+    Browser-->>Crawler: browser_context
+
+    Crawler->>Crawler: create_xhs_client()
+    Crawler->>Client: pong() check login status
+
+    alt Not logged in
+        Crawler->>Login: begin()
+        Login->>Login: login_by_qrcode/mobile/cookies
+        Login-->>Crawler: Login succeeded
+    end
+
+    alt search mode
+        Crawler->>Client: get_note_by_keyword()
+        Client-->>Crawler: Search results
+        loop Fetch details
+            Crawler->>Client: get_note_by_id()
+            Client-->>Crawler: Note details
+        end
+    else detail mode
+        Crawler->>Client: get_note_by_id()
+    else creator mode
+        Crawler->>Client: get_creator_info()
+    end
+
+    Crawler->>Store: store_content/comment/creator
+    Store-->>Crawler: Storage complete
+
+    Main->>Crawler: cleanup()
+    Crawler->>Browser: close()
+```
+
+### 4.3 Platform Crawler Implementation Structure
+
+Each platform directory contains the following core files:
+
+```
+media_platform/{platform}/
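+├── __init__.py         # Module exports
+├── core.py             # Main crawler implementation class
+├── client.py           # API client
+├── login.py            # Login implementation
+├── field.py            # Field/enum definitions
+├── exception.py        # Exception definitions
+├── help.py             # Helper functions
+└── {special_impl}.py   # Platform-specific logic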
+```
+
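+### 4.4 Three Crawler Modes
+
+| Mode | Config value | Description | Use case |
+|------|--------|---------|---------|
+| Search mode | `search` | Searches content by keyword | Bulk collection of content on a given topic |
+| Detail mode | `detail` | Fetches details for specified IDs | Precise retrieval of known content |
+| Creator mode | `creator` | Fetches all content from a creator | Tracking specific bloggers/uploaders |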
+
+---
+
+## 5. Data Storage Layer
+
+### 5.1 Storage Architecture Diagram
+
+```mermaid
+classDiagram
+    class AbstractStore {
+        <<abstract>>
+        +store_content()*
+        +store_comment()*
+        +store_creator()*
+    }
+
+    class StoreFactory {
+        +STORES: Dict
+        +create_store() AbstractStore
+    }
+
+    class CsvStoreImplement {
+        +async_file_writer: AsyncFileWriter
+        +store_content()
+        +store_comment()
+    }
+
+    class JsonStoreImplement {
+        +async_file_writer: AsyncFileWriter
+        +store_content()
+        +store_comment()
+    }
+
+    class DbStoreImplement {
+        +session: AsyncSession
+        +store_content()
+        +store_comment()
+    }
+
+    class SqliteStoreImplement {
+        +session: AsyncSession
+        +store_content()
+        +store_comment()
+    }
+
+    class MongoStoreImplement {
+        +mongo_base: MongoDBStoreBase
+        +store_content()
+        +store_comment()
+    }
+
+    class ExcelStoreImplement {
+        +excel_base: ExcelStoreBase
+        +store_content()
+        +store_comment()
+    }
+
+    AbstractStore <|-- CsvStoreImplement
+    AbstractStore <|-- JsonStoreImplement
+    AbstractStore <|-- DbStoreImplement
+    AbstractStore <|-- SqliteStoreImplement
+    AbstractStore <|-- MongoStoreImplement
+    AbstractStore <|-- ExcelStoreImplement
+    StoreFactory --> AbstractStore
+```
+
+### 5.2 Store Factory Pattern
+
+```python
+# Using Douyin as an example
+class DouyinStoreFactory:
+    STORES = {
+        "csv": DouyinCsvStoreImplement,
+        "db": DouyinDbStoreImplement,
+        "json": DouyinJsonStoreImplement,
+        "sqlite": DouyinSqliteStoreImplement,
+        "mongodb": DouyinMongoStoreImplement,
+        "excel": DouyinExcelStoreImplement,
+    }
+
+    @staticmethod
+    def create_store() -> AbstractStore:
+        store_class = DouyinStoreFactory.STORES.get(config.SAVE_DATA_OPTION)
+        return store_class()
+```
+
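+### 5.3 Storage Backend Comparison
+
+| Storage backend | Config value | Advantages | Use cases |
+|---------|--------|-----|---------|
+| CSV | `csv` | Simple, universal | Small datasets, quick inspection |
+| JSON | `json` | Complete structure, easy to parse | API integration, data exchange |
+| SQLite | `sqlite` | Lightweight, no server required | Local development, small projects |
+| MySQL | `db` | Good performance, supports concurrency | Production, large-scale data |
+| MongoDB | `mongodb` | Flexible, easy to extend | Unstructured data, rapid iteration |
+| Excel | `excel` | Visual, easy to share | Reports, data analysis |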
+
+---
+
+## 6. Infrastructure Layer
+
+### 6.1 Proxy System Architecture
+
+```mermaid
+flowchart TB
+    subgraph Config["Configuration"]
+        enable["ENABLE_IP_PROXY"]
+        provider["IP_PROXY_PROVIDER"]
+        count["IP_PROXY_POOL_COUNT"]
+    end
+
+    subgraph Pool["Proxy pool management"]
+        pool["ProxyIpPool"]
+        load["load_proxies()"]
+        validate["_is_valid_proxy()"]
+        get["get_proxy()"]
+        refresh["get_or_refresh_proxy()"]
+    end
+
+    subgraph Providers["Proxy providers"]
+        kuaidl["Kuaidaili<br/>KuaiDaiLiProxy"]
+        wandou["WanDou HTTP<br/>WanDouHttpProxy"]
+        jishu["JiShu HTTP<br/>JiShuHttpProxy"]
+    end
+
+    subgraph Client["API client"]
+        mixin["ProxyRefreshMixin"]
+        request["request()"]
+    end
+
+    enable --> pool
+    provider --> Providers
+    count --> load
+    pool --> load
+    load --> validate
+    validate --> Providers
+    pool --> get
+    pool --> refresh
+    mixin --> refresh
+    mixin --> Client
+    request --> mixin
+```
+
+### 6.2 Login Flow
+
+```mermaid
+flowchart TB
+    Start([Start login]) --> CheckType{Login type?}
+
+    CheckType -->|qrcode| QR[Show QR code]
+    QR --> WaitScan[Wait for scan]
+    WaitScan --> CheckQR{Scan succeeded?}
+    CheckQR -->|Yes| SaveCookie[Save cookie]
+    CheckQR -->|No| WaitScan
+
+    CheckType -->|phone| Phone[Enter phone number]
+    Phone --> SendCode[Send verification code]
+    SendCode --> Slider{Slider required?}
+    Slider -->|Yes| DoSlider[Slider verification]
+    DoSlider --> InputCode[Enter verification code]
+    Slider -->|No| InputCode
+    InputCode --> Verify[Verify login]
+    Verify --> SaveCookie
+
+    CheckType -->|cookie| LoadCookie[Load saved cookie]
+    LoadCookie --> VerifyCookie{Cookie valid?}
+    VerifyCookie -->|Yes| SaveCookie
+    VerifyCookie -->|No| Fail[Login failed]
+
+    SaveCookie --> UpdateContext[Update browser context]
+    UpdateContext --> End([Login complete])
+```
+
+### 6.3 Browser Management
+
+```mermaid
+flowchart LR
+    subgraph Mode["Launch mode"]
+        standard["Standard mode<br/>Playwright"]
+        cdp["CDP mode<br/>Chrome DevTools"]
+    end
+
+    subgraph Standard["Standard mode flow"]
+        launch["chromium.launch()"]
+        context["new_context()"]
+        stealth["Inject stealth.min.js"]
+    end
+
+    subgraph CDP["CDP mode flow"]
+        detect["Detect browser path"]
+        start["Start browser process"]
+        connect["connect_over_cdp()"]
+        cdpContext["Get existing context"]
+    end
+
+    subgraph Features["Features"]
+        f1["Persistent user data"]
+        f2["Inherited extensions and settings"]
+        f3["Stronger anti-detection"]
+    end
+
+    standard --> Standard
+    cdp --> CDP
+    CDP --> Features
+```
+
+### 6.4 Cache System
+
+```mermaid
+classDiagram
+    class AbstractCache {
+        <<abstract>>
+        +get(key)* Get a cached value
+        +set(key, value, expire_time)* Set a cached value
+        +keys(pattern)* List matching keys
+    }
+
+    class ExpiringLocalCache {
+        -_cache: Dict
+        -_expire_times: Dict
+        +get(key)
+        +set(key, value, expire_time)
+        +keys(pattern)
+        -_is_expired(key)
+    }
+
+    class RedisCache {
+        -_client: Redis
+        +get(key)
+        +set(key, value, expire_time)
+        +keys(pattern)
+    }
+
+    class CacheFactory {
+        +create_cache(type) AbstractCache
+    }
+
+    AbstractCache <|-- ExpiringLocalCache
+    AbstractCache <|-- RedisCache
+    CacheFactory --> AbstractCache
+```
+
+---
+
+## 7. Data Models
+
+### 7.1 ORM Model Relationships
+
+```mermaid
+erDiagram
+    DouyinAweme {
+        int id PK
+        string aweme_id UK
+        string aweme_type
+        string title
+        string desc
+        int create_time
+        int liked_count
+        int collected_count
+        int comment_count
+        int share_count
+        string user_id FK
+        datetime add_ts
+        datetime last_modify_ts
+    }
+
+    DouyinAwemeComment {
+        int id PK
+        string comment_id UK
+        string aweme_id FK
+        string content
+        int create_time
+        int sub_comment_count
+        string user_id
+        datetime add_ts
+        datetime last_modify_ts
+    }
+
+    DyCreator {
+        int id PK
+        string user_id UK
+        string nickname
+        string avatar
+        string desc
+        int follower_count
+        int total_favorited
+        datetime add_ts
+        datetime last_modify_ts
+    }
+
+    DouyinAweme ||--o{ DouyinAwemeComment : "has"
+    DyCreator ||--o{ DouyinAweme : "creates"
+```
+
+### 7.2 Database Tables by Platform
+
+| Platform | Content table | Comment table | Creator table |
+|------|--------|--------|---------|
+| Douyin | DouyinAweme | DouyinAwemeComment | DyCreator |
+| Xiaohongshu | XHSNote | XHSNoteComment | XHSCreator |
+| Kuaishou | KuaishouVideo | KuaishouVideoComment | KsCreator |
+| Bilibili | BilibiliVideo | BilibiliVideoComment | BilibiliUpInfo |
+| Weibo | WeiboNote | WeiboNoteComment | WeiboCreator |
+| Baidu Tieba | TiebaNote | TiebaNoteComment | - |
+| Zhihu | ZhihuContent | ZhihuContentComment | ZhihuCreator |
+
+---
+
+## 8. Configuration System
+
+### 8.1 Core Configuration Options
+
+```python
+# config/base_config.py
+
+# Platform selection
+PLATFORM = "xhs"  # xhs, dy, ks, bili, wb, tieba, zhihu
+
+# Login settings
+LOGIN_TYPE = "qrcode"  # qrcode, phone, cookie
+SAVE_LOGIN_STATE = True
+
+# Crawler settings
+CRAWLER_TYPE = "search"  # search, detail, creator
+KEYWORDS = "编程副业,编程兼职"  # comma-separated search keywords
+CRAWLER_MAX_NOTES_COUNT = 15
+MAX_CONCURRENCY_NUM = 1
+
+# Comment settings
+ENABLE_GET_COMMENTS = True
+ENABLE_GET_SUB_COMMENTS = False
+CRAWLER_MAX_COMMENTS_COUNT_SINGLENOTES = 10
+
+# Browser settings
+HEADLESS = False
+ENABLE_CDP_MODE = True
+CDP_DEBUG_PORT = 9222
+
+# Proxy settings
+ENABLE_IP_PROXY = False
+IP_PROXY_PROVIDER = "kuaidaili"
+IP_PROXY_POOL_COUNT = 2
+
+# Storage settings
+SAVE_DATA_OPTION = "json"  # csv, db, json, sqlite, mongodb, excel
+```
+
+### 8.2 Database Configuration
+
+```python
+# config/db_config.py
+
+# MySQL
+MYSQL_DB_HOST = "localhost"
+MYSQL_DB_PORT = 3306
+MYSQL_DB_NAME = "media_crawler"
+
+# Redis
+REDIS_DB_HOST = "127.0.0.1"
+REDIS_DB_PORT = 6379
+
+# MongoDB
+MONGODB_HOST = "localhost"
+MONGODB_PORT = 27017
+
+# SQLite
+SQLITE_DB_PATH = "database/sqlite_tables.db"
+```
+
+---
+
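+## 9. Utility Modules
+
+### 9.1 Utility Overview
+
+| Module | File | Main functions |
+|------|------|---------|
+| App runner | `app_runner.py` | Signal handling, graceful shutdown, cleanup management |
+| Browser launcher | `browser_launcher.py` | Browser path detection, browser process startup |
+| CDP management | `cdp_browser.py` | CDP connection, browser context management |
+| Crawler utilities | `crawler_util.py` | QR code recognition, CAPTCHA handling, User-Agent |
+| File writer | `async_file_writer.py` | Asynchronous CSV/JSON writing, word cloud generation |
+| Slider verification | `slider_util.py` | Solving slider CAPTCHAs |
+| Time utilities | `time_util.py` | Timestamp conversion, date handling |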
+
+### 9.2 Application Run Management
+
+```mermaid
+flowchart TB
+    Start([Program start]) --> Run["run(app_main, app_cleanup)"]
+    Run --> Main["Run app_main()"]
+    Main --> Running{Running}
+
+    Running -->|Completed normally| Cleanup1["Run app_cleanup()"]
+    Running -->|SIGINT/SIGTERM| Signal["Catch signal"]
+
+    Signal --> First{First signal?}
+    First -->|Yes| Cleanup2["Start cleanup"]
+    First -->|No| Force["Force exit"]
+
+    Cleanup1 & Cleanup2 --> Cancel["Cancel remaining tasks"]
+    Cancel --> Wait["Wait for tasks to finish<br/>(15-second timeout)"]
+    Wait --> End([Program exit])
+    Force --> End
+```
+
+---
+
+## 10. Module Dependencies
+
+```mermaid
+flowchart TB
+    subgraph Entry["Entry layer"]
+        main["main.py"]
+        config["config/"]
+        cmdarg["cmd_arg/"]
+    end
+
+    subgraph Core["Core layer"]
+        base["base/base_crawler.py"]
+        platforms["media_platform/*/"]
+    end
+
+    subgraph Client["Client layer"]
+        client["*/client.py"]
+        login["*/login.py"]
+    end
+
+    subgraph Storage["Storage layer"]
+        store["store/"]
+        database["database/"]
+    end
+
+    subgraph Infra["Infrastructure"]
+        proxy["proxy/"]
+        cache["cache/"]
+        tools["tools/"]
+    end
+
+    subgraph External["External dependencies"]
+        playwright["Playwright"]
+        httpx["httpx"]
+        sqlalchemy["SQLAlchemy"]
+        motor["Motor/MongoDB"]
+    end
+
+    main --> config
+    main --> cmdarg
+    main --> Core
+
+    Core --> base
+    platforms --> base
+    platforms --> Client
+
+    client --> proxy
+    client --> httpx
+    login --> tools
+
+    platforms --> Storage
+    Storage --> sqlalchemy
+    Storage --> motor
+
+    client --> playwright
+    tools --> playwright
+
+    proxy --> cache
+```
+
+---
+
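+## 11. Extension Guide
+
+### 11.1 Adding a New Platform
+
+1. Create a new directory under `media_platform/`
+2. Implement the following core files:
+   - `core.py` - inherits from `AbstractCrawler`
+   - `client.py` - inherits from `AbstractApiClient` and `ProxyRefreshMixin`
+   - `login.py` - inherits from `AbstractLogin`
+   - `field.py` - defines the platform enums
+3. Create the corresponding storage directory under `store/`
+4. Register the crawler in `CrawlerFactory.CRAWLERS` in `main.py`
+
+### 11.2 Adding a New Storage Backend
+
+1. Create a new store implementation class under `store/`
+2. Inherit from the `AbstractStore` base class
+3. Implement the `store_content`, `store_comment`, and `store_creator` methods
+4. Register it in each platform's `StoreFactory.STORES`
+
+### 11.3 Adding a New Proxy Provider
+
+1. Create a new proxy class under `proxy/providers/`
+2. Inherit from the `BaseProxy` base class
+3. Implement the `get_proxy()` method
+4. Register the provider in the configuration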
+
+---
+
+## 12. Quick Reference
+
+### 12.1 Common Commands
+
+```bash
+# Start the crawler
+python main.py
+
+# Specify the platform
+python main.py --platform xhs
+
+# Specify the login method
+python main.py --lt qrcode
+
+# Specify the crawler type
+python main.py --type search
+```
+
+### 12.2 Key File Paths
+
+| Purpose | File path |
+|------|---------|
+| Program entry point | `main.py` |
+| Core configuration | `config/base_config.py` |
+| Database configuration | `config/db_config.py` |
+| Crawler base classes | `base/base_crawler.py` |
+| ORM models | `database/models.py` |
+| Proxy pool | `proxy/proxy_ip_pool.py` |
+| CDP browser | `tools/cdp_browser.py` |
+
+---
+
+*Document generated: 2025-12-18*