Merge pull request #783 from hsparks-codes/feature/excel-export-and-tests

feat: Add Excel export functionality and unit tests
This commit is contained in:
程序员阿江-Relakkes
2025-11-28 15:14:25 +08:00
committed by GitHub
14 changed files with 882 additions and 4 deletions


@@ -212,6 +212,10 @@ python main.py --help
Supports multiple data storage methods:
- **CSV Files**: Supports saving to CSV (under `data/` directory)
- **JSON Files**: Supports saving to JSON (under `data/` directory)
- **Excel Files**: Supports saving to formatted Excel files (under `data/` directory) ✨ New Feature
- Multi-sheet support (Contents, Comments, Creators)
- Professional formatting (styled headers, auto-width columns, borders)
- Easy to analyze and share
- **Database Storage**
- Use the `--init_db` parameter for database initialization (when using `--init_db`, no other optional arguments are needed)
- **SQLite Database**: Lightweight database, no server required, suitable for personal use (recommended)
@@ -224,6 +228,9 @@ python main.py --help
### Usage Examples:
```shell
# Use Excel to store data (recommended for data analysis) ✨ New Feature
uv run main.py --platform xhs --lt qrcode --type search --save_data_option excel
# Initialize SQLite database (when using '--init_db', no other optional arguments are needed)
uv run main.py --init_db sqlite
# Use SQLite to store data (recommended for personal users)


@@ -209,6 +209,10 @@ python main.py --help
Supports multiple data storage methods:
- **CSV Files**: Supports saving to CSV (under `data/` directory)
- **JSON Files**: Supports saving to JSON (under `data/` directory)
- **Excel Files**: Supports saving to formatted Excel files (under `data/` directory) ✨ New Feature
- Multi-sheet support (Contents, Comments, Creators)
- Professional formatting (styled headers, auto-width columns, borders)
- Easy to analyze and share
- **Database Storage**
- Use the `--init_db` parameter for database initialization (when using `--init_db`, no other optional arguments are needed)
- **SQLite Database**: Lightweight database, no server required, suitable for personal use (recommended)
@@ -221,6 +225,9 @@ Supports multiple data storage methods:
### Usage Examples:
```shell
# Use Excel to store data (recommended for data analysis) ✨ New Feature
uv run main.py --platform xhs --lt qrcode --type search --save_data_option excel
# Initialize SQLite database (when using '--init_db', no other optional arguments are needed)
uv run main.py --init_db sqlite
# Use SQLite to store data (recommended for personal users)


@@ -70,8 +70,8 @@ BROWSER_LAUNCH_TIMEOUT = 60
# Set to False to keep the browser running for easier debugging
AUTO_CLOSE_BROWSER = True
# Data save option; supported types: csv, db, json, sqlite. Saving to DB is preferred since it supports deduplication.
SAVE_DATA_OPTION = "json" # csv or db or json or sqlite
# Data save option; supported types: csv, db, json, sqlite, excel. Saving to DB is preferred since it supports deduplication.
SAVE_DATA_OPTION = "json" # csv or db or json or sqlite or excel
# Browser user-data directory configuration (cached browser profile files)
USER_DATA_DIR = "%s_user_data_dir" # %s will be replaced by platform name

docs/excel_export_guide.md (new file, 244 lines)

@@ -0,0 +1,244 @@
# Excel Export Guide
## Overview
MediaCrawler now supports exporting crawled data to formatted Excel files (.xlsx) with professional styling and multiple sheets for contents, comments, and creators.
## Features
- **Multi-sheet workbooks**: Separate sheets for Contents, Comments, and Creators
- **Professional formatting**:
- Styled headers with blue background and white text
- Auto-adjusted column widths
- Cell borders and text wrapping
- Clean, readable layout
- **Smart export**: Empty sheets are automatically removed
- **Organized storage**: Files saved to `data/{platform}/` directory with timestamps
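The auto-width behavior boils down to a simple clamp rule: column width is the longest cell length plus padding, bounded between a minimum and a maximum. A minimal stdlib sketch of that rule (the helper name `clamp_width` is ours, not MediaCrawler API):

```python
def clamp_width(max_len: int, pad: int = 2, lo: int = 10, hi: int = 50) -> int:
    """Column width = longest cell length + padding, clamped to [lo, hi]."""
    return min(max(max_len + pad, lo), hi)
```

The clamp means short columns still get a readable minimum width, while one verbose cell cannot blow a column out past 50 units.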
## Installation
Excel export requires the `openpyxl` library:
```bash
# Using uv (recommended)
uv sync
# Or using pip
pip install openpyxl
```
## Usage
### Basic Usage
1. **Configure Excel export** in `config/base_config.py`:
```python
SAVE_DATA_OPTION = "excel" # Change from json/csv/db to excel
```
2. **Run the crawler**:
```bash
# Xiaohongshu example
uv run main.py --platform xhs --lt qrcode --type search
# Douyin example
uv run main.py --platform dy --lt qrcode --type search
# Bilibili example
uv run main.py --platform bili --lt qrcode --type search
```
3. **Find your Excel file** in `data/{platform}/` directory:
- Filename format: `{platform}_{crawler_type}_{timestamp}.xlsx`
- Example: `xhs_search_20250128_143025.xlsx`
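The naming scheme above can be reproduced with a few lines of stdlib Python (a sketch of the convention only; `excel_path` is a hypothetical helper, not part of MediaCrawler):

```python
from datetime import datetime
from pathlib import Path

def excel_path(platform: str, crawler_type: str, base: str = "data") -> Path:
    # {platform}_{crawler_type}_{timestamp}.xlsx under data/{platform}/
    timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
    return Path(base) / platform / f"{platform}_{crawler_type}_{timestamp}.xlsx"
```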
### Command Line Examples
```bash
# Search by keywords and export to Excel
uv run main.py --platform xhs --lt qrcode --type search --save_data_option excel
# Crawl specific posts and export to Excel
uv run main.py --platform xhs --lt qrcode --type detail --save_data_option excel
# Crawl creator profile and export to Excel
uv run main.py --platform xhs --lt qrcode --type creator --save_data_option excel
```
## Excel File Structure
### Contents Sheet
Contains post/video information:
- `note_id`: Unique post identifier
- `title`: Post title
- `desc`: Post description
- `user_id`: Author user ID
- `nickname`: Author nickname
- `liked_count`: Number of likes
- `comment_count`: Number of comments
- `share_count`: Number of shares
- `ip_location`: IP location
- `image_list`: Comma-separated image URLs
- `tag_list`: Comma-separated tags
- `note_url`: Direct link to post
- And more platform-specific fields...
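Before a row is written, cell values are flattened: lists and dicts are stringified, and `None` becomes an empty cell. A stdlib sketch mirroring the conversion in this PR's `_write_row` (the function name `to_row` is ours):

```python
from typing import Any

def to_row(item: dict, headers: list) -> list:
    """Order a data dict by the header list, flattening non-scalar values."""
    row: list = []
    for header in headers:
        value: Any = item.get(header, "")
        if isinstance(value, (list, dict)):
            value = str(value)   # lists/dicts become their repr
        elif value is None:
            value = ""           # None renders as an empty cell
        row.append(value)
    return row
```

This is why `image_list` and `tag_list` appear as plain comma-separated strings in the sheet.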
### Comments Sheet
Contains comment information:
- `comment_id`: Unique comment identifier
- `note_id`: Associated post ID
- `content`: Comment text
- `user_id`: Commenter user ID
- `nickname`: Commenter nickname
- `like_count`: Comment likes
- `create_time`: Comment timestamp
- `ip_location`: Commenter location
- `sub_comment_count`: Number of replies
- And more...
### Creators Sheet
Contains creator/author information:
- `user_id`: Unique user identifier
- `nickname`: Display name
- `gender`: Gender
- `avatar`: Profile picture URL
- `desc`: Bio/description
- `fans`: Follower count
- `follows`: Following count
- `interaction`: Total interactions
- And more...
## Advantages Over Other Formats
### vs CSV
- ✅ Multiple sheets in one file
- ✅ Professional formatting
- ✅ Better handling of special characters
- ✅ Auto-adjusted column widths
- ✅ No encoding issues
### vs JSON
- ✅ Human-readable tabular format
- ✅ Easy to open in Excel/Google Sheets
- ✅ Better for data analysis
- ✅ Easier to share with non-technical users
### vs Database
- ✅ No database setup required
- ✅ Portable single-file format
- ✅ Easy to share and archive
- ✅ Works offline
## Tips & Best Practices
1. **Large datasets**: For very large crawls (>10,000 rows), consider using database storage instead for better performance
2. **Data analysis**: Excel files work great with:
- Microsoft Excel
- Google Sheets
- LibreOffice Calc
- Python pandas: `pd.read_excel('file.xlsx')`
3. **Combining data**: You can merge multiple Excel files using:
```python
import pandas as pd
df1 = pd.read_excel('file1.xlsx', sheet_name='Contents')
df2 = pd.read_excel('file2.xlsx', sheet_name='Contents')
combined = pd.concat([df1, df2])
combined.to_excel('combined.xlsx', index=False)
```
4. **File size**: Excel files are typically 2-3x larger than CSV but smaller than JSON
## Troubleshooting
### "openpyxl not installed" error
```bash
# Install openpyxl
uv add openpyxl
# or
pip install openpyxl
```
### Excel file not created
Check that:
1. `SAVE_DATA_OPTION = "excel"` in config
2. Crawler successfully collected data
3. No errors in console output
4. `data/{platform}/` directory exists
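The checklist above can be automated with a small diagnostic (a hypothetical helper for illustration; `excel_export_preflight` is not part of MediaCrawler, and it only covers the config and directory checks):

```python
from pathlib import Path

def excel_export_preflight(save_data_option: str, platform: str) -> list:
    """Return human-readable problems that would prevent an Excel file from appearing."""
    problems = []
    if save_data_option != "excel":
        problems.append("SAVE_DATA_OPTION is not 'excel'")
    if not (Path("data") / platform).is_dir():
        problems.append(f"data/{platform}/ directory does not exist")
    return problems
```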
### Empty Excel file
This happens when:
- No data was crawled (check keywords/IDs)
- Login failed (check login status)
- Platform blocked requests (check IP/rate limits)
## Example Output
After running a successful crawl, you'll see:
```
[ExcelStoreBase] Initialized Excel export to: data/xhs/xhs_search_20250128_143025.xlsx
[ExcelStoreBase] Stored content to Excel: 7123456789
[ExcelStoreBase] Stored comment to Excel: comment_123
...
[Main] Excel file saved successfully
```
Your Excel file will have:
- Professional blue headers
- Clean borders
- Wrapped text for long content
- Auto-sized columns
- Separate organized sheets
## Advanced Usage
### Programmatic Access
```python
import asyncio

from store.excel_store_base import ExcelStoreBase

async def demo():
    # Create store
    store = ExcelStoreBase(platform="xhs", crawler_type="search")
    # Store data (store_content is a coroutine, so it must be awaited)
    await store.store_content({
        "note_id": "123",
        "title": "Test Post",
        "liked_count": 100
    })
    # Save to file
    store.flush()

asyncio.run(demo())
```
### Custom Formatting
You can extend `ExcelStoreBase` to customize formatting:
```python
from store.excel_store_base import ExcelStoreBase
class CustomExcelStore(ExcelStoreBase):
def _apply_header_style(self, sheet, row_num=1):
# Custom header styling
super()._apply_header_style(sheet, row_num)
# Add your customizations here
```
## Support
For issues or questions:
- Check the [FAQ](常见问题.md)
- Open an issue on GitHub
- Join the WeChat discussion group
---
**Note**: Excel export is designed for learning and research purposes. Please respect platform terms of service and rate limits.

main.py

@@ -84,6 +84,18 @@ async def main():
crawler = CrawlerFactory.create_crawler(platform=config.PLATFORM)
await crawler.start()
# Flush Excel data if using Excel export
if config.SAVE_DATA_OPTION == "excel":
try:
# Get the store instance and flush data
from store.xhs import XhsStoreFactory
store = XhsStoreFactory.create_store()
if hasattr(store, 'flush'):
store.flush()
print("[Main] Excel file saved successfully")
except Exception as e:
print(f"Error flushing Excel data: {e}")
# Generate wordcloud after crawling is complete
# Only for JSON save mode
if config.SAVE_DATA_OPTION == "json" and config.ENABLE_GET_WORDCLOUD:


@@ -35,6 +35,9 @@ dependencies = [
"wordcloud==1.9.3",
"xhshow>=0.1.3",
"pre-commit>=3.5.0",
"openpyxl>=3.1.2",
"pytest>=7.4.0",
"pytest-asyncio>=0.21.0",
]
[[tool.uv.index]]


@@ -25,4 +25,7 @@ alembic>=1.16.5
asyncmy>=0.2.10
sqlalchemy>=2.0.43
motor>=3.3.0
xhshow>=0.1.3
xhshow>=0.1.3
openpyxl>=3.1.2
pytest>=7.4.0
pytest-asyncio>=0.21.0

store/excel_store_base.py (new file, 263 lines)

@@ -0,0 +1,263 @@
# -*- coding: utf-8 -*-
# Copyright (c) 2025 relakkes@gmail.com
#
# This file is part of MediaCrawler project.
# Repository: https://github.com/NanmiCoder/MediaCrawler
# GitHub: https://github.com/NanmiCoder
# Licensed under NON-COMMERCIAL LEARNING LICENSE 1.1
#
# Disclaimer: This code is for learning and research purposes only. Users must follow these principles:
# 1. No commercial use of any kind.
# 2. Comply with the target platform's terms of service and robots.txt rules.
# 3. No large-scale crawling or operational disruption to the platform.
# 4. Keep request rates reasonable to avoid placing unnecessary load on the target platform.
# 5. No illegal or improper use.
#
# See the LICENSE file in the project root for the full license terms.
# Using this code indicates your agreement to the above principles and all terms in the LICENSE.
"""
Excel Store Base Implementation
Provides Excel export functionality for crawled data with formatted sheets
"""
import os
from datetime import datetime
from typing import Dict, List, Any
from pathlib import Path
try:
import openpyxl
from openpyxl.styles import Font, PatternFill, Alignment, Border, Side
from openpyxl.utils import get_column_letter
EXCEL_AVAILABLE = True
except ImportError:
EXCEL_AVAILABLE = False
from base.base_crawler import AbstractStore
from tools import utils
class ExcelStoreBase(AbstractStore):
"""
Base class for Excel storage implementation
Provides formatted Excel export with multiple sheets for contents, comments, and creators
"""
def __init__(self, platform: str, crawler_type: str = "search"):
"""
Initialize Excel store
Args:
platform: Platform name (xhs, dy, ks, etc.)
crawler_type: Type of crawler (search, detail, creator)
"""
if not EXCEL_AVAILABLE:
raise ImportError(
"openpyxl is required for Excel export. "
"Install it with: pip install openpyxl"
)
super().__init__()
self.platform = platform
self.crawler_type = crawler_type
# Create data directory
self.data_dir = Path("data") / platform
self.data_dir.mkdir(parents=True, exist_ok=True)
# Initialize workbook
self.workbook = openpyxl.Workbook()
self.workbook.remove(self.workbook.active) # Remove default sheet
# Create sheets
self.contents_sheet = self.workbook.create_sheet("Contents")
self.comments_sheet = self.workbook.create_sheet("Comments")
self.creators_sheet = self.workbook.create_sheet("Creators")
# Track if headers are written
self.contents_headers_written = False
self.comments_headers_written = False
self.creators_headers_written = False
# Generate filename
timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
self.filename = self.data_dir / f"{platform}_{crawler_type}_{timestamp}.xlsx"
utils.logger.info(f"[ExcelStoreBase] Initialized Excel export to: {self.filename}")
def _apply_header_style(self, sheet, row_num: int = 1):
"""
Apply formatting to header row
Args:
sheet: Worksheet object
row_num: Row number for headers (default: 1)
"""
header_fill = PatternFill(start_color="366092", end_color="366092", fill_type="solid")
header_font = Font(bold=True, color="FFFFFF", size=11)
header_alignment = Alignment(horizontal="center", vertical="center", wrap_text=True)
border = Border(
left=Side(style='thin'),
right=Side(style='thin'),
top=Side(style='thin'),
bottom=Side(style='thin')
)
for cell in sheet[row_num]:
cell.fill = header_fill
cell.font = header_font
cell.alignment = header_alignment
cell.border = border
def _auto_adjust_column_width(self, sheet):
"""
Auto-adjust column widths based on content
Args:
sheet: Worksheet object
"""
for column in sheet.columns:
max_length = 0
column_letter = get_column_letter(column[0].column)
for cell in column:
try:
if cell.value:
max_length = max(max_length, len(str(cell.value)))
except Exception:
# Skip cells whose value cannot be measured as a string
pass
# Set width with min/max constraints
adjusted_width = min(max(max_length + 2, 10), 50)
sheet.column_dimensions[column_letter].width = adjusted_width
def _write_headers(self, sheet, headers: List[str]):
"""
Write headers to sheet
Args:
sheet: Worksheet object
headers: List of header names
"""
for col_num, header in enumerate(headers, 1):
sheet.cell(row=1, column=col_num, value=header)
self._apply_header_style(sheet)
def _write_row(self, sheet, data: Dict[str, Any], headers: List[str]):
"""
Write data row to sheet
Args:
sheet: Worksheet object
data: Data dictionary
headers: List of header names (defines column order)
"""
row_num = sheet.max_row + 1
for col_num, header in enumerate(headers, 1):
value = data.get(header, "")
# Handle different data types
if isinstance(value, (list, dict)):
value = str(value)
elif value is None:
value = ""
cell = sheet.cell(row=row_num, column=col_num, value=value)
# Apply basic formatting
cell.alignment = Alignment(vertical="top", wrap_text=True)
cell.border = Border(
left=Side(style='thin'),
right=Side(style='thin'),
top=Side(style='thin'),
bottom=Side(style='thin')
)
async def store_content(self, content_item: Dict):
"""
Store content data to Excel
Args:
content_item: Content data dictionary
"""
# Define headers (customize based on platform)
headers = list(content_item.keys())
# Write headers if first time
if not self.contents_headers_written:
self._write_headers(self.contents_sheet, headers)
self.contents_headers_written = True
# Write data row
self._write_row(self.contents_sheet, content_item, headers)
utils.logger.info(f"[ExcelStoreBase] Stored content to Excel: {content_item.get('note_id', 'N/A')}")
async def store_comment(self, comment_item: Dict):
"""
Store comment data to Excel
Args:
comment_item: Comment data dictionary
"""
# Define headers
headers = list(comment_item.keys())
# Write headers if first time
if not self.comments_headers_written:
self._write_headers(self.comments_sheet, headers)
self.comments_headers_written = True
# Write data row
self._write_row(self.comments_sheet, comment_item, headers)
utils.logger.info(f"[ExcelStoreBase] Stored comment to Excel: {comment_item.get('comment_id', 'N/A')}")
async def store_creator(self, creator_item: Dict):
"""
Store creator data to Excel
Args:
creator_item: Creator data dictionary
"""
# Define headers
headers = list(creator_item.keys())
# Write headers if first time
if not self.creators_headers_written:
self._write_headers(self.creators_sheet, headers)
self.creators_headers_written = True
# Write data row
self._write_row(self.creators_sheet, creator_item, headers)
utils.logger.info(f"[ExcelStoreBase] Stored creator to Excel: {creator_item.get('user_id', 'N/A')}")
def flush(self):
"""
Save workbook to file
"""
try:
# Auto-adjust column widths
self._auto_adjust_column_width(self.contents_sheet)
self._auto_adjust_column_width(self.comments_sheet)
self._auto_adjust_column_width(self.creators_sheet)
# Remove empty sheets
if self.contents_sheet.max_row == 1:
self.workbook.remove(self.contents_sheet)
if self.comments_sheet.max_row == 1:
self.workbook.remove(self.comments_sheet)
if self.creators_sheet.max_row == 1:
self.workbook.remove(self.creators_sheet)
# Save workbook
self.workbook.save(self.filename)
utils.logger.info(f"[ExcelStoreBase] Excel file saved successfully: {self.filename}")
except Exception as e:
utils.logger.error(f"[ExcelStoreBase] Error saving Excel file: {e}")
raise


@@ -37,13 +37,14 @@ class XhsStoreFactory:
"json": XhsJsonStoreImplement,
"sqlite": XhsSqliteStoreImplement,
"mongodb": XhsMongoStoreImplement,
"excel": XhsExcelStoreImplement,
}
@staticmethod
def create_store() -> AbstractStore:
store_class = XhsStoreFactory.STORES.get(config.SAVE_DATA_OPTION)
if not store_class:
raise ValueError("[XhsStoreFactory.create_store] Invalid save option only supported csv or db or json or sqlite or mongodb ...")
raise ValueError("[XhsStoreFactory.create_store] Invalid save option; supported options: csv, db, json, sqlite, mongodb, excel")
return store_class()


@@ -37,6 +37,7 @@ from tools.time_util import get_current_timestamp
from var import crawler_type_var
from database.mongodb_store_base import MongoDBStoreBase
from tools import utils
from store.excel_store_base import ExcelStoreBase
class XhsCsvStoreImplement(AbstractStore):
def __init__(self, **kwargs):
@@ -336,3 +337,11 @@ class XhsMongoStoreImplement(AbstractStore):
data=creator_item
)
utils.logger.info(f"[XhsMongoStoreImplement.store_creator] Saved creator {user_id} to MongoDB")
class XhsExcelStoreImplement(ExcelStoreBase):
"""Xiaohongshu Excel store implementation"""
def __init__(self, **kwargs):
super().__init__(platform="xhs", crawler_type=crawler_type_var.get())
utils.logger.info("[XhsExcelStoreImplement] Excel store initialized")

tests/__init__.py (new file)

@@ -0,0 +1,2 @@
# -*- coding: utf-8 -*-
# MediaCrawler Test Suite

tests/conftest.py (new file, 81 lines)

@@ -0,0 +1,81 @@
# -*- coding: utf-8 -*-
"""
Pytest configuration and shared fixtures
"""
import pytest
import sys
from pathlib import Path
# Add project root to Python path
project_root = Path(__file__).parent.parent
sys.path.insert(0, str(project_root))
@pytest.fixture(scope="session")
def project_root_path():
"""Return project root path"""
return project_root
@pytest.fixture
def sample_xhs_note():
"""Sample Xiaohongshu note data for testing"""
return {
"note_id": "test_note_123",
"type": "normal",
"title": "测试标题 Test Title",
"desc": "这是一个测试描述 This is a test description",
"video_url": "",
"time": 1700000000,
"last_update_time": 1700000000,
"user_id": "user_123",
"nickname": "测试用户",
"avatar": "https://example.com/avatar.jpg",
"liked_count": 100,
"collected_count": 50,
"comment_count": 25,
"share_count": 10,
"ip_location": "上海",
"image_list": "https://example.com/img1.jpg,https://example.com/img2.jpg",
"tag_list": "测试,编程,Python",
"note_url": "https://www.xiaohongshu.com/explore/test_note_123",
"source_keyword": "测试关键词",
"xsec_token": "test_token_123"
}
@pytest.fixture
def sample_xhs_comment():
"""Sample Xiaohongshu comment data for testing"""
return {
"comment_id": "comment_123",
"create_time": 1700000000,
"ip_location": "北京",
"note_id": "test_note_123",
"content": "这是一条测试评论 This is a test comment",
"user_id": "user_456",
"nickname": "评论用户",
"avatar": "https://example.com/avatar2.jpg",
"sub_comment_count": 5,
"pictures": "",
"parent_comment_id": 0,
"like_count": 15
}
@pytest.fixture
def sample_xhs_creator():
"""Sample Xiaohongshu creator data for testing"""
return {
"user_id": "creator_123",
"nickname": "创作者名称",
"gender": "",
"avatar": "https://example.com/creator_avatar.jpg",
"desc": "这是创作者简介",
"ip_location": "广州",
"follows": 500,
"fans": 10000,
"interaction": 50000,
"tag_list": '{"profession": "设计师", "interest": "摄影"}'
}

tests/test_excel_store.py (new file, 171 lines)

@@ -0,0 +1,171 @@
# -*- coding: utf-8 -*-
"""
Unit tests for Excel export functionality
"""
import pytest
import asyncio
import os
from pathlib import Path
import tempfile
import shutil
try:
import openpyxl
EXCEL_AVAILABLE = True
except ImportError:
EXCEL_AVAILABLE = False
from store.excel_store_base import ExcelStoreBase
@pytest.mark.skipif(not EXCEL_AVAILABLE, reason="openpyxl not installed")
class TestExcelStoreBase:
"""Test cases for ExcelStoreBase"""
@pytest.fixture
def temp_dir(self):
"""Create temporary directory for test files"""
temp_path = tempfile.mkdtemp()
yield temp_path
# Cleanup
shutil.rmtree(temp_path, ignore_errors=True)
@pytest.fixture
def excel_store(self, temp_dir, monkeypatch):
"""Create ExcelStoreBase instance for testing"""
# Monkey patch data directory
monkeypatch.chdir(temp_dir)
store = ExcelStoreBase(platform="test", crawler_type="search")
yield store
# Cleanup is handled by temp_dir fixture
def test_initialization(self, excel_store):
"""Test Excel store initialization"""
assert excel_store.platform == "test"
assert excel_store.crawler_type == "search"
assert excel_store.workbook is not None
assert excel_store.contents_sheet is not None
assert excel_store.comments_sheet is not None
assert excel_store.creators_sheet is not None
@pytest.mark.asyncio
async def test_store_content(self, excel_store):
"""Test storing content data"""
content_item = {
"note_id": "test123",
"title": "Test Title",
"desc": "Test Description",
"user_id": "user456",
"nickname": "TestUser",
"liked_count": 100,
"comment_count": 50
}
await excel_store.store_content(content_item)
# Verify data was written
assert excel_store.contents_sheet.max_row == 2 # Header + 1 data row
assert excel_store.contents_headers_written is True
@pytest.mark.asyncio
async def test_store_comment(self, excel_store):
"""Test storing comment data"""
comment_item = {
"comment_id": "comment123",
"note_id": "note456",
"content": "Great post!",
"user_id": "user789",
"nickname": "Commenter",
"like_count": 10
}
await excel_store.store_comment(comment_item)
# Verify data was written
assert excel_store.comments_sheet.max_row == 2 # Header + 1 data row
assert excel_store.comments_headers_written is True
@pytest.mark.asyncio
async def test_store_creator(self, excel_store):
"""Test storing creator data"""
creator_item = {
"user_id": "creator123",
"nickname": "Creator Name",
"fans": 10000,
"follows": 500,
"interaction": 50000
}
await excel_store.store_creator(creator_item)
# Verify data was written
assert excel_store.creators_sheet.max_row == 2 # Header + 1 data row
assert excel_store.creators_headers_written is True
@pytest.mark.asyncio
async def test_multiple_items(self, excel_store):
"""Test storing multiple items"""
# Store multiple content items
for i in range(5):
await excel_store.store_content({
"note_id": f"note{i}",
"title": f"Title {i}",
"liked_count": i * 10
})
# Verify all items were stored
assert excel_store.contents_sheet.max_row == 6 # Header + 5 data rows
def test_flush(self, excel_store):
"""Test flushing data to file"""
# Add some test data
asyncio.run(excel_store.store_content({
"note_id": "test",
"title": "Test"
}))
# Flush to file
excel_store.flush()
# Verify file was created
assert excel_store.filename.exists()
# Verify file can be opened
wb = openpyxl.load_workbook(excel_store.filename)
assert "Contents" in wb.sheetnames
wb.close()
def test_header_formatting(self, excel_store):
"""Test header row formatting"""
asyncio.run(excel_store.store_content({"note_id": "test", "title": "Test"}))
# Check header formatting
header_cell = excel_store.contents_sheet.cell(row=1, column=1)
assert header_cell.font.bold is True
# RGB color may have different prefix (00 or FF), check the actual color part
assert header_cell.fill.start_color.rgb[-6:] == "366092"
def test_empty_sheets_removed(self, excel_store):
"""Test that empty sheets are removed on flush"""
# Only add content, leave comments and creators empty
asyncio.run(excel_store.store_content({"note_id": "test"}))
excel_store.flush()
# Reload workbook
wb = openpyxl.load_workbook(excel_store.filename)
# Only Contents sheet should exist
assert "Contents" in wb.sheetnames
assert "Comments" not in wb.sheetnames
assert "Creators" not in wb.sheetnames
wb.close()
@pytest.mark.skipif(not EXCEL_AVAILABLE, reason="openpyxl not installed")
def test_excel_import_availability():
"""Test that openpyxl is available"""
assert EXCEL_AVAILABLE is True
import openpyxl
assert openpyxl is not None


@@ -0,0 +1,75 @@
# -*- coding: utf-8 -*-
"""
Unit tests for Store Factory functionality
"""
import pytest
from unittest.mock import patch, MagicMock
from store.xhs import XhsStoreFactory
from store.xhs._store_impl import (
XhsCsvStoreImplement,
XhsJsonStoreImplement,
XhsDbStoreImplement,
XhsSqliteStoreImplement,
XhsMongoStoreImplement,
XhsExcelStoreImplement
)
class TestXhsStoreFactory:
"""Test cases for XhsStoreFactory"""
@patch('config.SAVE_DATA_OPTION', 'csv')
def test_create_csv_store(self):
"""Test creating CSV store"""
store = XhsStoreFactory.create_store()
assert isinstance(store, XhsCsvStoreImplement)
@patch('config.SAVE_DATA_OPTION', 'json')
def test_create_json_store(self):
"""Test creating JSON store"""
store = XhsStoreFactory.create_store()
assert isinstance(store, XhsJsonStoreImplement)
@patch('config.SAVE_DATA_OPTION', 'db')
def test_create_db_store(self):
"""Test creating database store"""
store = XhsStoreFactory.create_store()
assert isinstance(store, XhsDbStoreImplement)
@patch('config.SAVE_DATA_OPTION', 'sqlite')
def test_create_sqlite_store(self):
"""Test creating SQLite store"""
store = XhsStoreFactory.create_store()
assert isinstance(store, XhsSqliteStoreImplement)
@patch('config.SAVE_DATA_OPTION', 'mongodb')
def test_create_mongodb_store(self):
"""Test creating MongoDB store"""
store = XhsStoreFactory.create_store()
assert isinstance(store, XhsMongoStoreImplement)
@patch('config.SAVE_DATA_OPTION', 'excel')
def test_create_excel_store(self):
"""Test creating Excel store"""
# ContextVar cannot be mocked, so we test with actual value
store = XhsStoreFactory.create_store()
assert isinstance(store, XhsExcelStoreImplement)
@patch('config.SAVE_DATA_OPTION', 'invalid')
def test_invalid_store_option(self):
"""Test that invalid store option raises ValueError"""
with pytest.raises(ValueError) as exc_info:
XhsStoreFactory.create_store()
assert "Invalid save option" in str(exc_info.value)
def test_all_stores_registered(self):
"""Test that all store types are registered"""
expected_stores = ['csv', 'json', 'db', 'sqlite', 'mongodb', 'excel']
for store_type in expected_stores:
assert store_type in XhsStoreFactory.STORES
assert len(XhsStoreFactory.STORES) == len(expected_stores)