Project Specification: Object Detection Finetuning with Torchvision (Penn-Fudan Pedestrian Dataset)
1. Project Overview:
This project fine-tunes a Mask R-CNN model with PyTorch and Torchvision for pedestrian detection on the Penn-Fudan Pedestrian dataset. Emphasis is placed on code quality, maintainability, and reproducibility, using modern development tools such as uv, ruff, and pre-commit hooks. The project follows a modular structure and uses Python dictionaries for configuration.

2. Requirements:
Object Detection Task: Pedestrian detection using Mask R-CNN.
Dataset: Penn-Fudan Pedestrian dataset.
Python Version: 3.10.
Dependencies: torch>=2.0 (or the latest release compatible with torchvision>=0.16, with CUDA support as needed), torchvision>=0.16, ruff, numpy, Pillow.
Development Environment: Local machine with NVIDIA drivers.
Package Management: uv.
Code Quality: ruff for linting and formatting (PEP 8 compliance, error checking, import sorting).
Testing: Unit tests using unittest.
Version Control: Git with a feature branch workflow.
Pre-commit Hooks: ruff and unittest execution.
Configuration: Python dictionaries (mmcv style).
Logging: File-based logging.
Command-Line Interface: Simple CLI using argparse.

3. Architecture and Design:
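As one possible shape for the mmcv-style dictionary configuration, the sketch below shows a config file's contents and a small loader. The file name `maskrcnn_pennfudan.py`, every key and value, the `outputs/` root, and the `load_config` helper are hypothetical; only the use of plain Python dictionaries and the naming of the output directory after the config file come from this specification.

```python
import runpy
from pathlib import Path

# Hypothetical contents of configs/maskrcnn_pennfudan.py.
# All keys and values are illustrative, not prescribed by the spec.
EXAMPLE_CONFIG_SOURCE = '''
config = dict(
    data=dict(root="data/penn_fudan_root", batch_size=2, num_workers=2),
    model=dict(type="maskrcnn", num_classes=2),
    train=dict(epochs=10, lr=0.005, momentum=0.9, weight_decay=0.0005, seed=42),
)
'''


def load_config(config_path):
    """Execute a Python config file and return its `config` dictionary."""
    path = Path(config_path)
    namespace = runpy.run_path(str(path))  # runs the file, returns its globals
    if "config" not in namespace:
        raise KeyError(f"{path} does not define a top-level `config` dict")
    config = dict(namespace["config"])
    # Name the output directory after the config file (assumed root: outputs/).
    config.setdefault("output_dir", str(Path("outputs") / path.stem))
    return config
```

Because the config is ordinary Python, values can be computed or shared between files, at the cost of executing arbitrary code when a config is loaded.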
Modular Structure:
  configs/: Configuration files.
  data/: Dataset storage.
  models/: Model definitions.
  utils/: Data loading and preprocessing utilities.
  train.py: Training script.
  test.py: Testing script.
  tests/: Unit tests.
  pyproject.toml: Dependency management.
  .pre-commit-config.yaml: Pre-commit configuration.
  README.md: Project documentation.
Configuration Files: Python dictionaries for configurable parameters (e.g., training hyperparameters, model settings). Output directories are named after the config file.
Model: Mask R-CNN (provided by Torchvision).
Data Loading: Custom data loading utilities in utils/data_utils.py to handle the Penn-Fudan dataset.

4. Data Handling:
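A core preprocessing step for utils/data_utils.py could look like the sketch below. It assumes the usual Penn-Fudan mask encoding, in which each mask image stores 0 for background and a distinct positive integer for each pedestrian. The function name `mask_to_target` is hypothetical, and the output keys mirror what Torchvision detection models expect as training targets; converting the arrays to tensors and pairing them with the image would be left to a dataset class.

```python
import numpy as np


def mask_to_target(mask: np.ndarray) -> dict:
    """Split an instance-id mask (H, W) into per-pedestrian masks and boxes."""
    instance_ids = np.unique(mask)
    instance_ids = instance_ids[instance_ids != 0]  # 0 is background

    # One boolean mask per instance, shape (N, H, W).
    masks = mask[None, :, :] == instance_ids[:, None, None]

    boxes = []
    for instance_mask in masks:
        ys, xs = np.nonzero(instance_mask)
        # Box format: [xmin, ymin, xmax, ymax] in pixel coordinates.
        boxes.append([xs.min(), ys.min(), xs.max(), ys.max()])

    return {
        "boxes": np.asarray(boxes, dtype=np.float32).reshape(-1, 4),
        "labels": np.ones(len(instance_ids), dtype=np.int64),  # 1 = pedestrian
        "masks": masks.astype(np.uint8),
    }
```

A mask with no pedestrians yields arrays with a leading dimension of zero rather than raising, which keeps the calling code free of special cases.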
Dataset Download: wget is used to download the Penn-Fudan dataset during setup.
Data Storage: Dataset stored in data/penn_fudan_root/.
Data Preprocessing: Implemented in utils/data_utils.py to prepare the dataset for model training.

5. Error Handling:
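File-based logging combined with exception handling around a pipeline stage could be sketched as follows. The helper names `setup_logger` and `run_stage`, the log file name `run.log`, and the log format are assumptions made for illustration; only the standard library `logging` module is used.

```python
import logging
from pathlib import Path


def setup_logger(output_dir, name="pennfudan"):
    """Return a logger that writes to <output_dir>/run.log."""
    log_dir = Path(output_dir)
    log_dir.mkdir(parents=True, exist_ok=True)
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    if not logger.handlers:  # avoid duplicate handlers on repeated calls
        handler = logging.FileHandler(log_dir / "run.log", encoding="utf-8")
        handler.setFormatter(
            logging.Formatter("%(asctime)s %(levelname)s %(name)s: %(message)s")
        )
        logger.addHandler(handler)
    return logger


def run_stage(logger, stage_name, func, *args, **kwargs):
    """Run one stage (e.g. data loading), logging success or the full traceback."""
    try:
        result = func(*args, **kwargs)
    except Exception:
        # logger.exception records the message at ERROR level plus the traceback.
        logger.exception("Stage %s failed", stage_name)
        raise
    logger.info("Stage %s finished", stage_name)
    return result
```

Re-raising after logging keeps failures visible to the caller, so the script can still exit with a non-zero status.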
Logging: Comprehensive logging of training and testing progress, errors, and warnings to files.
Exception Handling: Implement try-except blocks to handle potential errors during data loading, model training, and testing.
Input Validation: Validate input parameters from the command-line interface and configuration files.

6. Testing Plan:
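To illustrate how a small component can be validated and unit-tested with unittest, the sketch below checks a configuration dictionary and tests that check. The `validate_config` function, the `train`, `epochs`, and `lr` keys, and the test class are hypothetical; in the project layout the test class would live in a file matching tests/test_*.py.

```python
import unittest


def validate_config(config):
    """Raise if the (hypothetical) training section of a config is unusable."""
    if not isinstance(config, dict):
        raise TypeError("config must be a dict")
    train = config.get("train")
    if not isinstance(train, dict):
        raise ValueError("config['train'] must be a dict")
    epochs = train.get("epochs")
    # bool is a subclass of int, so exclude it explicitly.
    if isinstance(epochs, bool) or not isinstance(epochs, int) or epochs < 1:
        raise ValueError("train.epochs must be a positive integer")
    lr = train.get("lr")
    if isinstance(lr, bool) or not isinstance(lr, (int, float)) or lr <= 0:
        raise ValueError("train.lr must be a positive number")
    return config


class TestValidateConfig(unittest.TestCase):
    def test_accepts_valid_config(self):
        config = {"train": {"epochs": 10, "lr": 0.005}}
        self.assertIs(validate_config(config), config)

    def test_rejects_zero_epochs(self):
        with self.assertRaises(ValueError):
            validate_config({"train": {"epochs": 0, "lr": 0.005}})

    def test_rejects_negative_learning_rate(self):
        with self.assertRaises(ValueError):
            validate_config({"train": {"epochs": 10, "lr": -1.0}})
```

Tests laid out this way can be run with `python -m unittest discover -s tests`.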
Unit Tests: Focus on testing individual components (e.g., data loading functions, model utilities) using the unittest framework. Tests are located in tests/test_*.py. Ensure adequate code coverage for critical components.
Pre-commit Hooks: Run unit tests automatically before each commit, and allow the commit only if all tests pass.

7. Development Tools:
uv: For managing the Python virtual environment and dependencies.
ruff: For linting, formatting, and code quality checks.
pre-commit: For managing pre-commit hooks (ruff, unittest).
Git: For version control (feature branch workflow).

8. Command-Line Interface (CLI):
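A minimal argparse sketch for a script that takes a configuration file could look like this. Treating the config path as a positional argument, requiring a `.py` suffix, and the helper names `build_parser` and `parse_args` are assumptions; the specification only states that each script accepts a configuration file as an argument.

```python
import argparse
from pathlib import Path


def build_parser(description):
    """Create the parser shared by train.py and test.py (assumed layout)."""
    parser = argparse.ArgumentParser(description=description)
    parser.add_argument(
        "config",
        type=Path,
        help="path to a Python configuration file inside configs/",
    )
    return parser


def parse_args(argv=None, description="Fine-tune Mask R-CNN on Penn-Fudan"):
    """Parse and validate command-line arguments; argv=None reads sys.argv."""
    parser = build_parser(description)
    args = parser.parse_args(argv)
    # parser.error prints a usage message and exits with status 2.
    if args.config.suffix != ".py":
        parser.error(f"config must be a .py file: {args.config}")
    if not args.config.is_file():
        parser.error(f"config file not found: {args.config}")
    return args
```

Validating the path inside the parser keeps error messages in the standard argparse usage format instead of surfacing later as a traceback.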
train.py: Accepts a configuration file as an argument and handles the training process.
test.py: Accepts a configuration file as an argument and handles model evaluation.
argparse: Used for parsing command-line arguments.

9. Implementation Details:
pyproject.toml: Specify all project dependencies with appropriate versions. Configure uv settings.
.pre-commit-config.yaml: Configure ruff and unittest hooks. Ensure hooks are correctly set up and executed.
configs/: Store all configuration files as Python dictionaries.
README.md: Provide project documentation, including setup instructions, usage examples, and development guidelines.

10. Next Steps for Developer:
Set up the project structure.
Create the pyproject.toml and .pre-commit-config.yaml files.
Implement data downloading and loading scripts.
Implement the Mask R-CNN model setup and training logic.
Implement the test script and unit tests.
Implement the command-line interface.
Add logging and error handling.
Document the project in README.md.