mirror of
https://github.com/NanmiCoder/MediaCrawler.git
synced 2026-03-09 15:40:48 +08:00
Merge pull request #783 from hsparks-codes/feature/excel-export-and-tests
feat: Add Excel export functionality and unit tests
@@ -212,6 +212,10 @@ python main.py --help
Supports multiple data storage methods:
- **CSV files**: Supports saving to CSV (under the `data/` directory)
- **JSON files**: Supports saving to JSON (under the `data/` directory)
- **Excel files**: Supports saving to formatted Excel files (under the `data/` directory) ✨ New feature
  - Multi-sheet support (contents, comments, creators)
  - Professional formatting (styled headers, auto-width columns, borders)
  - Easy to analyze and share
- **Database storage**
  - Use the `--init_db` parameter for database initialization (no other optional arguments are needed when using `--init_db`)
- **SQLite database**: Lightweight database, no server required, suitable for personal use (recommended)
@@ -224,6 +228,9 @@ python main.py --help

### Usage examples:

```shell
# Use Excel to store data (recommended for data analysis) ✨ New feature
uv run main.py --platform xhs --lt qrcode --type search --save_data_option excel

# Initialize the SQLite database (no other optional arguments are needed when using '--init_db')
uv run main.py --init_db sqlite

# Use SQLite to store data (recommended for personal users)
@@ -209,6 +209,10 @@ python main.py --help
Supports multiple data storage methods:
- **CSV Files**: Supports saving to CSV (under `data/` directory)
- **JSON Files**: Supports saving to JSON (under `data/` directory)
- **Excel Files**: Supports saving to formatted Excel files (under `data/` directory) ✨ New Feature
  - Multi-sheet support (Contents, Comments, Creators)
  - Professional formatting (styled headers, auto-width columns, borders)
  - Easy to analyze and share
- **Database Storage**
  - Use the `--init_db` parameter for database initialization (when using `--init_db`, no other optional arguments are needed)
- **SQLite Database**: Lightweight database, no server required, suitable for personal use (recommended)
@@ -221,6 +225,9 @@ Supports multiple data storage methods:

### Usage Examples:

```shell
# Use Excel to store data (recommended for data analysis) ✨ New Feature
uv run main.py --platform xhs --lt qrcode --type search --save_data_option excel

# Initialize SQLite database (when using '--init_db', no other optional arguments are needed)
uv run main.py --init_db sqlite

# Use SQLite to store data (recommended for personal users)
@@ -70,8 +70,8 @@ BROWSER_LAUNCH_TIMEOUT = 60
# Set to False to keep the browser running, which is convenient for debugging
AUTO_CLOSE_BROWSER = True

-# Data save type option. Four types are supported: csv, db, json, sqlite. Saving to the DB is recommended because it deduplicates records.
-SAVE_DATA_OPTION = "json"  # csv or db or json or sqlite
+# Data save type option. Five types are supported: csv, db, json, sqlite, excel. Saving to the DB is recommended because it deduplicates records.
+SAVE_DATA_OPTION = "json"  # csv or db or json or sqlite or excel

# User browser cache file configuration
USER_DATA_DIR = "%s_user_data_dir"  # %s will be replaced by platform name
244
docs/excel_export_guide.md
Normal file
@@ -0,0 +1,244 @@
# Excel Export Guide

## Overview

MediaCrawler now supports exporting crawled data to formatted Excel files (.xlsx) with professional styling and multiple sheets for contents, comments, and creators.

## Features

- **Multi-sheet workbooks**: Separate sheets for Contents, Comments, and Creators
- **Professional formatting**:
  - Styled headers with blue background and white text
  - Auto-adjusted column widths
  - Cell borders and text wrapping
  - Clean, readable layout
- **Smart export**: Empty sheets are automatically removed
- **Organized storage**: Files saved to `data/{platform}/` directory with timestamps
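The timestamped path scheme described above can be sketched in a few lines of standard-library Python. `build_export_path` is a hypothetical helper for illustration, not a MediaCrawler function; it mirrors the documented `data/{platform}/{platform}_{crawler_type}_{timestamp}.xlsx` layout.

```python
from datetime import datetime
from pathlib import Path

def build_export_path(platform: str, crawler_type: str, now: datetime) -> Path:
    """Build data/{platform}/{platform}_{crawler_type}_{timestamp}.xlsx (hypothetical helper)."""
    timestamp = now.strftime("%Y%m%d_%H%M%S")
    return Path("data") / platform / f"{platform}_{crawler_type}_{timestamp}.xlsx"

path = build_export_path("xhs", "search", datetime(2025, 1, 28, 14, 30, 25))
print(path.as_posix())
# → data/xhs/xhs_search_20250128_143025.xlsx
```

Passing the timestamp in explicitly (instead of calling `datetime.now()` inside) keeps the helper deterministic and easy to test.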
## Installation

Excel export requires the `openpyxl` library:

```bash
# Using uv (recommended)
uv sync

# Or using pip
pip install openpyxl
```

## Usage

### Basic Usage

1. **Configure Excel export** in `config/base_config.py`:

```python
SAVE_DATA_OPTION = "excel"  # Change from json/csv/db to excel
```

2. **Run the crawler**:

```bash
# Xiaohongshu example
uv run main.py --platform xhs --lt qrcode --type search

# Douyin example
uv run main.py --platform dy --lt qrcode --type search

# Bilibili example
uv run main.py --platform bili --lt qrcode --type search
```

3. **Find your Excel file** in the `data/{platform}/` directory:
   - Filename format: `{platform}_{crawler_type}_{timestamp}.xlsx`
   - Example: `xhs_search_20250128_143025.xlsx`

### Command Line Examples

```bash
# Search by keywords and export to Excel
uv run main.py --platform xhs --lt qrcode --type search --save_data_option excel

# Crawl specific posts and export to Excel
uv run main.py --platform xhs --lt qrcode --type detail --save_data_option excel

# Crawl a creator profile and export to Excel
uv run main.py --platform xhs --lt qrcode --type creator --save_data_option excel
```

## Excel File Structure

### Contents Sheet
Contains post/video information:
- `note_id`: Unique post identifier
- `title`: Post title
- `desc`: Post description
- `user_id`: Author user ID
- `nickname`: Author nickname
- `liked_count`: Number of likes
- `comment_count`: Number of comments
- `share_count`: Number of shares
- `ip_location`: IP location
- `image_list`: Comma-separated image URLs
- `tag_list`: Comma-separated tags
- `note_url`: Direct link to post
- And more platform-specific fields...

### Comments Sheet
Contains comment information:
- `comment_id`: Unique comment identifier
- `note_id`: Associated post ID
- `content`: Comment text
- `user_id`: Commenter user ID
- `nickname`: Commenter nickname
- `like_count`: Comment likes
- `create_time`: Comment timestamp
- `ip_location`: Commenter location
- `sub_comment_count`: Number of replies
- And more...

### Creators Sheet
Contains creator/author information:
- `user_id`: Unique user identifier
- `nickname`: Display name
- `gender`: Gender
- `avatar`: Profile picture URL
- `desc`: Bio/description
- `fans`: Follower count
- `follows`: Following count
- `interaction`: Total interactions
- And more...
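Each sheet is a flat table: column headers are taken from the keys of the stored items, and non-scalar values are flattened to strings before being written to a cell. A minimal sketch of that flattening, with `flatten_row` as a hypothetical stand-in for the exporter's cell handling:

```python
from typing import Any, Dict, List

def flatten_row(item: Dict[str, Any], headers: List[str]) -> List[Any]:
    """Flatten one item into a row: lists/dicts become strings, None becomes "" (sketch)."""
    row = []
    for header in headers:
        value = item.get(header, "")
        if isinstance(value, (list, dict)):
            value = str(value)      # e.g. a list of image URLs becomes one cell
        elif value is None:
            value = ""
        row.append(value)
    return row

headers = ["note_id", "title", "image_list", "ip_location"]
item = {"note_id": "123", "title": "Test", "image_list": ["a.jpg", "b.jpg"], "ip_location": None}
print(flatten_row(item, headers))
# → ['123', 'Test', "['a.jpg', 'b.jpg']", '']
```

Driving column order from an explicit `headers` list means every row lines up under the same columns even if individual items are missing fields.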
## Advantages Over Other Formats

### vs CSV
- ✅ Multiple sheets in one file
- ✅ Professional formatting
- ✅ Better handling of special characters
- ✅ Auto-adjusted column widths
- ✅ No encoding issues

### vs JSON
- ✅ Human-readable tabular format
- ✅ Easy to open in Excel/Google Sheets
- ✅ Better for data analysis
- ✅ Easier to share with non-technical users

### vs Database
- ✅ No database setup required
- ✅ Portable single-file format
- ✅ Easy to share and archive
- ✅ Works offline

## Tips & Best Practices

1. **Large datasets**: For very large crawls (>10,000 rows), consider using database storage instead for better performance

2. **Data analysis**: Excel files work well with:
   - Microsoft Excel
   - Google Sheets
   - LibreOffice Calc
   - Python pandas: `pd.read_excel('file.xlsx')`

3. **Combining data**: You can merge multiple Excel files using:

```python
import pandas as pd

df1 = pd.read_excel('file1.xlsx', sheet_name='Contents')
df2 = pd.read_excel('file2.xlsx', sheet_name='Contents')
combined = pd.concat([df1, df2])
combined.to_excel('combined.xlsx', index=False)
```

4. **File size**: Excel files are typically 2-3x larger than CSV but smaller than JSON

## Troubleshooting

### "openpyxl not installed" error

```bash
# Install openpyxl
uv add openpyxl
# or
pip install openpyxl
```

### Excel file not created

Check that:
1. `SAVE_DATA_OPTION = "excel"` is set in the config
2. The crawler successfully collected data
3. There are no errors in the console output
4. The `data/{platform}/` directory exists
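If you are unsure whether anything was written, a quick standard-library check can locate the newest export. `find_latest_export` is a hypothetical helper for debugging, not part of MediaCrawler; it only assumes the documented `data/{platform}/` layout.

```python
import tempfile
from pathlib import Path
from typing import Optional

def find_latest_export(platform: str, data_root: str = "data") -> Optional[Path]:
    """Return the most recently modified .xlsx export for a platform, or None (sketch)."""
    platform_dir = Path(data_root) / platform
    if not platform_dir.is_dir():
        return None
    exports = sorted(platform_dir.glob("*.xlsx"), key=lambda p: p.stat().st_mtime)
    return exports[-1] if exports else None

# Demo against a throwaway directory; the real crawler writes under data/{platform}/.
root = tempfile.mkdtemp()
missing = find_latest_export("xhs", root)   # directory does not exist yet → None
export_dir = Path(root) / "xhs"
export_dir.mkdir()
newest = export_dir / "xhs_search_20250128_143025.xlsx"
newest.touch()
found = find_latest_export("xhs", root)
print(missing, found.name)
# → None xhs_search_20250128_143025.xlsx
```

Run it from the project root so the relative `data/` path resolves the same way the crawler's does.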
### Empty Excel file

This happens when:
- No data was crawled (check keywords/IDs)
- Login failed (check login status)
- The platform blocked requests (check IP/rate limits)

## Example Output

After running a successful crawl, you'll see:

```
[ExcelStoreBase] Initialized Excel export to: data/xhs/xhs_search_20250128_143025.xlsx
[ExcelStoreBase] Stored content to Excel: 7123456789
[ExcelStoreBase] Stored comment to Excel: comment_123
...
[Main] Excel file saved successfully
```

Your Excel file will have:
- Professional blue headers
- Clean borders
- Wrapped text for long content
- Auto-sized columns
- Separate, organized sheets

## Advanced Usage

### Programmatic Access

```python
from store.excel_store_base import ExcelStoreBase

# Create store
store = ExcelStoreBase(platform="xhs", crawler_type="search")

# Store data
await store.store_content({
    "note_id": "123",
    "title": "Test Post",
    "liked_count": 100
})

# Save to file
store.flush()
```

### Custom Formatting

You can extend `ExcelStoreBase` to customize formatting:

```python
from store.excel_store_base import ExcelStoreBase

class CustomExcelStore(ExcelStoreBase):
    def _apply_header_style(self, sheet, row_num=1):
        # Custom header styling
        super()._apply_header_style(sheet, row_num)
        # Add your customizations here
```

## Support

For issues or questions:
- Check the [FAQ](常见问题.md)
- Open an issue on GitHub
- Join the WeChat discussion group

---

**Note**: Excel export is designed for learning and research purposes. Please respect platform terms of service and rate limits.
12
main.py
@@ -84,6 +84,18 @@ async def main():
    crawler = CrawlerFactory.create_crawler(platform=config.PLATFORM)
    await crawler.start()

    # Flush Excel data if using Excel export
    if config.SAVE_DATA_OPTION == "excel":
        try:
            # Get the store instance and flush data
            from store.xhs import XhsStoreFactory
            store = XhsStoreFactory.create_store()
            if hasattr(store, 'flush'):
                store.flush()
            print("[Main] Excel file saved successfully")
        except Exception as e:
            print(f"Error flushing Excel data: {e}")

    # Generate wordcloud after crawling is complete
    # Only for JSON save mode
    if config.SAVE_DATA_OPTION == "json" and config.ENABLE_GET_WORDCLOUD:
@@ -35,6 +35,9 @@ dependencies = [
    "wordcloud==1.9.3",
    "xhshow>=0.1.3",
    "pre-commit>=3.5.0",
    "openpyxl>=3.1.2",
    "pytest>=7.4.0",
    "pytest-asyncio>=0.21.0",
]

[[tool.uv.index]]
@@ -25,4 +25,7 @@ alembic>=1.16.5
asyncmy>=0.2.10
sqlalchemy>=2.0.43
motor>=3.3.0
xhshow>=0.1.3
openpyxl>=3.1.2
pytest>=7.4.0
pytest-asyncio>=0.21.0
263
store/excel_store_base.py
Normal file
@@ -0,0 +1,263 @@
# -*- coding: utf-8 -*-
# Copyright (c) 2025 relakkes@gmail.com
#
# This file is part of MediaCrawler project.
# Repository: https://github.com/NanmiCoder/MediaCrawler
# GitHub: https://github.com/NanmiCoder
# Licensed under NON-COMMERCIAL LEARNING LICENSE 1.1
#
# Disclaimer: This code is for learning and research purposes only. Users must follow these principles:
# 1. It must not be used for any commercial purpose.
# 2. Usage must comply with the target platform's terms of service and robots.txt rules.
# 3. Do not perform large-scale crawling or disrupt the platform's operations.
# 4. Control the request rate reasonably and avoid placing unnecessary load on the target platform.
# 5. It must not be used for any illegal or improper purpose.
#
# See the LICENSE file in the project root for the full license terms.
# Using this code means you agree to follow the principles above and all terms in the LICENSE.

"""
Excel Store Base Implementation
Provides Excel export functionality for crawled data with formatted sheets
"""

import os
from datetime import datetime
from typing import Dict, List, Any
from pathlib import Path

try:
    import openpyxl
    from openpyxl.styles import Font, PatternFill, Alignment, Border, Side
    from openpyxl.utils import get_column_letter
    EXCEL_AVAILABLE = True
except ImportError:
    EXCEL_AVAILABLE = False

from base.base_crawler import AbstractStore
from tools import utils


class ExcelStoreBase(AbstractStore):
    """
    Base class for Excel storage implementation
    Provides formatted Excel export with multiple sheets for contents, comments, and creators
    """

    def __init__(self, platform: str, crawler_type: str = "search"):
        """
        Initialize Excel store

        Args:
            platform: Platform name (xhs, dy, ks, etc.)
            crawler_type: Type of crawler (search, detail, creator)
        """
        if not EXCEL_AVAILABLE:
            raise ImportError(
                "openpyxl is required for Excel export. "
                "Install it with: pip install openpyxl"
            )

        super().__init__()
        self.platform = platform
        self.crawler_type = crawler_type

        # Create data directory
        self.data_dir = Path("data") / platform
        self.data_dir.mkdir(parents=True, exist_ok=True)

        # Initialize workbook
        self.workbook = openpyxl.Workbook()
        self.workbook.remove(self.workbook.active)  # Remove default sheet

        # Create sheets
        self.contents_sheet = self.workbook.create_sheet("Contents")
        self.comments_sheet = self.workbook.create_sheet("Comments")
        self.creators_sheet = self.workbook.create_sheet("Creators")

        # Track if headers are written
        self.contents_headers_written = False
        self.comments_headers_written = False
        self.creators_headers_written = False

        # Generate filename
        timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
        self.filename = self.data_dir / f"{platform}_{crawler_type}_{timestamp}.xlsx"

        utils.logger.info(f"[ExcelStoreBase] Initialized Excel export to: {self.filename}")

    def _apply_header_style(self, sheet, row_num: int = 1):
        """
        Apply formatting to the header row

        Args:
            sheet: Worksheet object
            row_num: Row number for headers (default: 1)
        """
        header_fill = PatternFill(start_color="366092", end_color="366092", fill_type="solid")
        header_font = Font(bold=True, color="FFFFFF", size=11)
        header_alignment = Alignment(horizontal="center", vertical="center", wrap_text=True)
        border = Border(
            left=Side(style='thin'),
            right=Side(style='thin'),
            top=Side(style='thin'),
            bottom=Side(style='thin')
        )

        for cell in sheet[row_num]:
            cell.fill = header_fill
            cell.font = header_font
            cell.alignment = header_alignment
            cell.border = border

    def _auto_adjust_column_width(self, sheet):
        """
        Auto-adjust column widths based on content

        Args:
            sheet: Worksheet object
        """
        for column in sheet.columns:
            max_length = 0
            column_letter = get_column_letter(column[0].column)

            for cell in column:
                try:
                    if cell.value:
                        max_length = max(max_length, len(str(cell.value)))
                except Exception:
                    pass

            # Set width with min/max constraints
            adjusted_width = min(max(max_length + 2, 10), 50)
            sheet.column_dimensions[column_letter].width = adjusted_width

    def _write_headers(self, sheet, headers: List[str]):
        """
        Write headers to a sheet

        Args:
            sheet: Worksheet object
            headers: List of header names
        """
        for col_num, header in enumerate(headers, 1):
            sheet.cell(row=1, column=col_num, value=header)

        self._apply_header_style(sheet)

    def _write_row(self, sheet, data: Dict[str, Any], headers: List[str]):
        """
        Write a data row to a sheet

        Args:
            sheet: Worksheet object
            data: Data dictionary
            headers: List of header names (defines column order)
        """
        row_num = sheet.max_row + 1

        for col_num, header in enumerate(headers, 1):
            value = data.get(header, "")

            # Handle different data types
            if isinstance(value, (list, dict)):
                value = str(value)
            elif value is None:
                value = ""

            cell = sheet.cell(row=row_num, column=col_num, value=value)

            # Apply basic formatting
            cell.alignment = Alignment(vertical="top", wrap_text=True)
            cell.border = Border(
                left=Side(style='thin'),
                right=Side(style='thin'),
                top=Side(style='thin'),
                bottom=Side(style='thin')
            )

    async def store_content(self, content_item: Dict):
        """
        Store content data to Excel

        Args:
            content_item: Content data dictionary
        """
        # Define headers (customize based on platform)
        headers = list(content_item.keys())

        # Write headers if first time
        if not self.contents_headers_written:
            self._write_headers(self.contents_sheet, headers)
            self.contents_headers_written = True

        # Write data row
        self._write_row(self.contents_sheet, content_item, headers)

        utils.logger.info(f"[ExcelStoreBase] Stored content to Excel: {content_item.get('note_id', 'N/A')}")

    async def store_comment(self, comment_item: Dict):
        """
        Store comment data to Excel

        Args:
            comment_item: Comment data dictionary
        """
        # Define headers
        headers = list(comment_item.keys())

        # Write headers if first time
        if not self.comments_headers_written:
            self._write_headers(self.comments_sheet, headers)
            self.comments_headers_written = True

        # Write data row
        self._write_row(self.comments_sheet, comment_item, headers)

        utils.logger.info(f"[ExcelStoreBase] Stored comment to Excel: {comment_item.get('comment_id', 'N/A')}")

    async def store_creator(self, creator_item: Dict):
        """
        Store creator data to Excel

        Args:
            creator_item: Creator data dictionary
        """
        # Define headers
        headers = list(creator_item.keys())

        # Write headers if first time
        if not self.creators_headers_written:
            self._write_headers(self.creators_sheet, headers)
            self.creators_headers_written = True

        # Write data row
        self._write_row(self.creators_sheet, creator_item, headers)

        utils.logger.info(f"[ExcelStoreBase] Stored creator to Excel: {creator_item.get('user_id', 'N/A')}")

    def flush(self):
        """
        Save the workbook to file
        """
        try:
            # Auto-adjust column widths
            self._auto_adjust_column_width(self.contents_sheet)
            self._auto_adjust_column_width(self.comments_sheet)
            self._auto_adjust_column_width(self.creators_sheet)

            # Remove empty sheets
            if self.contents_sheet.max_row == 1:
                self.workbook.remove(self.contents_sheet)
            if self.comments_sheet.max_row == 1:
                self.workbook.remove(self.comments_sheet)
            if self.creators_sheet.max_row == 1:
                self.workbook.remove(self.creators_sheet)

            # Save workbook
            self.workbook.save(self.filename)
            utils.logger.info(f"[ExcelStoreBase] Excel file saved successfully: {self.filename}")

        except Exception as e:
            utils.logger.error(f"[ExcelStoreBase] Error saving Excel file: {e}")
            raise
@@ -37,13 +37,14 @@ class XhsStoreFactory:
        "json": XhsJsonStoreImplement,
        "sqlite": XhsSqliteStoreImplement,
        "mongodb": XhsMongoStoreImplement,
        "excel": XhsExcelStoreImplement,
    }

    @staticmethod
    def create_store() -> AbstractStore:
        store_class = XhsStoreFactory.STORES.get(config.SAVE_DATA_OPTION)
        if not store_class:
-            raise ValueError("[XhsStoreFactory.create_store] Invalid save option, only csv, db, json, sqlite, or mongodb are supported ...")
+            raise ValueError("[XhsStoreFactory.create_store] Invalid save option, only csv, db, json, sqlite, mongodb, or excel are supported ...")
        return store_class()
@@ -37,6 +37,7 @@ from tools.time_util import get_current_timestamp
from var import crawler_type_var
from database.mongodb_store_base import MongoDBStoreBase
from tools import utils
from store.excel_store_base import ExcelStoreBase


class XhsCsvStoreImplement(AbstractStore):
    def __init__(self, **kwargs):
@@ -336,3 +337,11 @@ class XhsMongoStoreImplement(AbstractStore):
            data=creator_item
        )
        utils.logger.info(f"[XhsMongoStoreImplement.store_creator] Saved creator {user_id} to MongoDB")


class XhsExcelStoreImplement(ExcelStoreBase):
    """Xiaohongshu Excel store implementation"""

    def __init__(self, **kwargs):
        super().__init__(platform="xhs", crawler_type=crawler_type_var.get())
        utils.logger.info("[XhsExcelStoreImplement] Excel store initialized")
2
tests/__init__.py
Normal file
@@ -0,0 +1,2 @@
# -*- coding: utf-8 -*-
# MediaCrawler Test Suite
81
tests/conftest.py
Normal file
@@ -0,0 +1,81 @@
# -*- coding: utf-8 -*-
"""
Pytest configuration and shared fixtures
"""

import pytest
import sys
from pathlib import Path

# Add project root to Python path
project_root = Path(__file__).parent.parent
sys.path.insert(0, str(project_root))


@pytest.fixture(scope="session")
def project_root_path():
    """Return project root path"""
    return project_root


@pytest.fixture
def sample_xhs_note():
    """Sample Xiaohongshu note data for testing"""
    return {
        "note_id": "test_note_123",
        "type": "normal",
        "title": "测试标题 Test Title",
        "desc": "这是一个测试描述 This is a test description",
        "video_url": "",
        "time": 1700000000,
        "last_update_time": 1700000000,
        "user_id": "user_123",
        "nickname": "测试用户",
        "avatar": "https://example.com/avatar.jpg",
        "liked_count": 100,
        "collected_count": 50,
        "comment_count": 25,
        "share_count": 10,
        "ip_location": "上海",
        "image_list": "https://example.com/img1.jpg,https://example.com/img2.jpg",
        "tag_list": "测试,编程,Python",
        "note_url": "https://www.xiaohongshu.com/explore/test_note_123",
        "source_keyword": "测试关键词",
        "xsec_token": "test_token_123"
    }


@pytest.fixture
def sample_xhs_comment():
    """Sample Xiaohongshu comment data for testing"""
    return {
        "comment_id": "comment_123",
        "create_time": 1700000000,
        "ip_location": "北京",
        "note_id": "test_note_123",
        "content": "这是一条测试评论 This is a test comment",
        "user_id": "user_456",
        "nickname": "评论用户",
        "avatar": "https://example.com/avatar2.jpg",
        "sub_comment_count": 5,
        "pictures": "",
        "parent_comment_id": 0,
        "like_count": 15
    }


@pytest.fixture
def sample_xhs_creator():
    """Sample Xiaohongshu creator data for testing"""
    return {
        "user_id": "creator_123",
        "nickname": "创作者名称",
        "gender": "女",
        "avatar": "https://example.com/creator_avatar.jpg",
        "desc": "这是创作者简介",
        "ip_location": "广州",
        "follows": 500,
        "fans": 10000,
        "interaction": 50000,
        "tag_list": '{"profession": "设计师", "interest": "摄影"}'
    }
171
tests/test_excel_store.py
Normal file
@@ -0,0 +1,171 @@
# -*- coding: utf-8 -*-
"""
Unit tests for Excel export functionality
"""

import pytest
import asyncio
import os
from pathlib import Path
import tempfile
import shutil

try:
    import openpyxl
    EXCEL_AVAILABLE = True
except ImportError:
    EXCEL_AVAILABLE = False

from store.excel_store_base import ExcelStoreBase


@pytest.mark.skipif(not EXCEL_AVAILABLE, reason="openpyxl not installed")
class TestExcelStoreBase:
    """Test cases for ExcelStoreBase"""

    @pytest.fixture
    def temp_dir(self):
        """Create temporary directory for test files"""
        temp_path = tempfile.mkdtemp()
        yield temp_path
        # Cleanup
        shutil.rmtree(temp_path, ignore_errors=True)

    @pytest.fixture
    def excel_store(self, temp_dir, monkeypatch):
        """Create ExcelStoreBase instance for testing"""
        # Change into the temp directory so data/ is created there
        monkeypatch.chdir(temp_dir)
        store = ExcelStoreBase(platform="test", crawler_type="search")
        yield store
        # Cleanup is handled by temp_dir fixture

    def test_initialization(self, excel_store):
        """Test Excel store initialization"""
        assert excel_store.platform == "test"
        assert excel_store.crawler_type == "search"
        assert excel_store.workbook is not None
        assert excel_store.contents_sheet is not None
        assert excel_store.comments_sheet is not None
        assert excel_store.creators_sheet is not None

    @pytest.mark.asyncio
    async def test_store_content(self, excel_store):
        """Test storing content data"""
        content_item = {
            "note_id": "test123",
            "title": "Test Title",
            "desc": "Test Description",
            "user_id": "user456",
            "nickname": "TestUser",
            "liked_count": 100,
            "comment_count": 50
        }

        await excel_store.store_content(content_item)

        # Verify data was written
        assert excel_store.contents_sheet.max_row == 2  # Header + 1 data row
        assert excel_store.contents_headers_written is True

    @pytest.mark.asyncio
    async def test_store_comment(self, excel_store):
        """Test storing comment data"""
        comment_item = {
            "comment_id": "comment123",
            "note_id": "note456",
            "content": "Great post!",
            "user_id": "user789",
            "nickname": "Commenter",
            "like_count": 10
        }

        await excel_store.store_comment(comment_item)

        # Verify data was written
        assert excel_store.comments_sheet.max_row == 2  # Header + 1 data row
        assert excel_store.comments_headers_written is True

    @pytest.mark.asyncio
    async def test_store_creator(self, excel_store):
        """Test storing creator data"""
        creator_item = {
            "user_id": "creator123",
            "nickname": "Creator Name",
            "fans": 10000,
            "follows": 500,
            "interaction": 50000
        }

        await excel_store.store_creator(creator_item)

        # Verify data was written
        assert excel_store.creators_sheet.max_row == 2  # Header + 1 data row
        assert excel_store.creators_headers_written is True

    @pytest.mark.asyncio
    async def test_multiple_items(self, excel_store):
        """Test storing multiple items"""
        # Store multiple content items
        for i in range(5):
            await excel_store.store_content({
                "note_id": f"note{i}",
                "title": f"Title {i}",
                "liked_count": i * 10
            })

        # Verify all items were stored
        assert excel_store.contents_sheet.max_row == 6  # Header + 5 data rows

    def test_flush(self, excel_store):
        """Test flushing data to file"""
        # Add some test data
        asyncio.run(excel_store.store_content({
            "note_id": "test",
            "title": "Test"
        }))

        # Flush to file
        excel_store.flush()

        # Verify file was created
        assert excel_store.filename.exists()

        # Verify file can be opened
        wb = openpyxl.load_workbook(excel_store.filename)
        assert "Contents" in wb.sheetnames
        wb.close()

    def test_header_formatting(self, excel_store):
        """Test header row formatting"""
        asyncio.run(excel_store.store_content({"note_id": "test", "title": "Test"}))

        # Check header formatting
        header_cell = excel_store.contents_sheet.cell(row=1, column=1)
        assert header_cell.font.bold is True
        # RGB color may have different prefix (00 or FF), check the actual color part
        assert header_cell.fill.start_color.rgb[-6:] == "366092"

    def test_empty_sheets_removed(self, excel_store):
        """Test that empty sheets are removed on flush"""
        # Only add content, leave comments and creators empty
        asyncio.run(excel_store.store_content({"note_id": "test"}))

        excel_store.flush()

        # Reload workbook
        wb = openpyxl.load_workbook(excel_store.filename)

        # Only Contents sheet should exist
        assert "Contents" in wb.sheetnames
        assert "Comments" not in wb.sheetnames
        assert "Creators" not in wb.sheetnames
        wb.close()


@pytest.mark.skipif(not EXCEL_AVAILABLE, reason="openpyxl not installed")
def test_excel_import_availability():
    """Test that openpyxl is available"""
    assert EXCEL_AVAILABLE is True
    import openpyxl
    assert openpyxl is not None
75
tests/test_store_factory.py
Normal file
@@ -0,0 +1,75 @@
# -*- coding: utf-8 -*-
"""
Unit tests for Store Factory functionality
"""

import pytest
from unittest.mock import patch, MagicMock

from store.xhs import XhsStoreFactory
from store.xhs._store_impl import (
    XhsCsvStoreImplement,
    XhsJsonStoreImplement,
    XhsDbStoreImplement,
    XhsSqliteStoreImplement,
    XhsMongoStoreImplement,
    XhsExcelStoreImplement
)


class TestXhsStoreFactory:
    """Test cases for XhsStoreFactory"""

    @patch('config.SAVE_DATA_OPTION', 'csv')
    def test_create_csv_store(self):
        """Test creating a CSV store"""
        store = XhsStoreFactory.create_store()
        assert isinstance(store, XhsCsvStoreImplement)

    @patch('config.SAVE_DATA_OPTION', 'json')
    def test_create_json_store(self):
        """Test creating a JSON store"""
        store = XhsStoreFactory.create_store()
        assert isinstance(store, XhsJsonStoreImplement)

    @patch('config.SAVE_DATA_OPTION', 'db')
    def test_create_db_store(self):
        """Test creating a database store"""
        store = XhsStoreFactory.create_store()
        assert isinstance(store, XhsDbStoreImplement)

    @patch('config.SAVE_DATA_OPTION', 'sqlite')
    def test_create_sqlite_store(self):
        """Test creating a SQLite store"""
        store = XhsStoreFactory.create_store()
        assert isinstance(store, XhsSqliteStoreImplement)

    @patch('config.SAVE_DATA_OPTION', 'mongodb')
    def test_create_mongodb_store(self):
        """Test creating a MongoDB store"""
        store = XhsStoreFactory.create_store()
        assert isinstance(store, XhsMongoStoreImplement)

    @patch('config.SAVE_DATA_OPTION', 'excel')
    def test_create_excel_store(self):
        """Test creating an Excel store"""
        # ContextVar cannot be mocked, so we test with the actual value
        store = XhsStoreFactory.create_store()
        assert isinstance(store, XhsExcelStoreImplement)

    @patch('config.SAVE_DATA_OPTION', 'invalid')
    def test_invalid_store_option(self):
        """Test that an invalid store option raises ValueError"""
        with pytest.raises(ValueError) as exc_info:
            XhsStoreFactory.create_store()

        assert "Invalid save option" in str(exc_info.value)

    def test_all_stores_registered(self):
        """Test that all store types are registered"""
        expected_stores = ['csv', 'json', 'db', 'sqlite', 'mongodb', 'excel']

        for store_type in expected_stores:
            assert store_type in XhsStoreFactory.STORES

        assert len(XhsStoreFactory.STORES) == len(expected_stores)