torchvision-vibecoding-project/project-spec.md

Project Specification: Object Detection Finetuning with Torchvision (Penn-Fudan Pedestrian Dataset)

1. Project Overview:

This project aims to fine-tune a Mask R-CNN model using PyTorch and Torchvision for pedestrian detection on the Penn-Fudan Pedestrian dataset.
Emphasis is placed on code quality, maintainability, and reproducibility, incorporating modern development tools like uv, ruff, and pre-commit hooks.
The project will follow a modular structure and utilize Python dictionaries for configuration.

2. Requirements:

Object Detection Task: Pedestrian detection using Mask R-CNN.
Dataset: Penn-Fudan Pedestrian dataset.
Python Version: 3.10.
Dependencies:
torch>=2.1 (torchvision 0.16 is built against torch 2.1; choose the CUDA build that matches the local driver)
torchvision>=0.16
ruff (development only)
pre-commit (development only)
numpy
Pillow
Development Environment: Local machine with NVIDIA drivers.
Package Management: uv.
Code Quality: ruff for linting and formatting (PEP 8 compliance, error checking, import sorting).
Testing: Unit tests using unittest.
Version Control: Git with a feature branch workflow.
Pre-commit Hooks: ruff and unittest execution.
Configuration: Python dictionaries defined in .py files (MMCV style).
Logging: File-based logging.
Command-Line Interface: Simple CLI using argparse.

3. Architecture and Design:

Modular Structure:
configs/: Configuration files.
data/: Dataset storage.
models/: Model definitions.
utils/: Data loading and preprocessing utilities.
train.py: Training script.
test.py: Model evaluation script (distinct from the unit tests in tests/).
tests/: Unit tests.
pyproject.toml: Dependency management.
.pre-commit-config.yaml: Pre-commit configuration.
README.md: Project documentation.
Configuration Files:
Python dictionaries for configurable parameters (e.g., training hyperparameters, model settings).
Output directories will be named based on the config file name.
Model: Mask R-CNN (provided by Torchvision).
Data Loading: Custom data loading utilities in utils/data_utils.py to handle the Penn-Fudan dataset.
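
For the Model item above, a minimal sketch of the head replacement that fine-tuning usually requires, following the pattern of the Torchvision detection fine-tuning tutorial. The function name build_maskrcnn, the choice of the ResNet-50 FPN variant, the default of two classes (background and pedestrian), and the 256-channel mask head are assumptions for illustration; the spec only states that the model is the Mask R-CNN provided by Torchvision.

```python
def build_maskrcnn(num_classes=2, pretrained=True, hidden_layer=256):
    """Return a torchvision Mask R-CNN whose heads predict `num_classes` classes.

    `num_classes` counts the background class, so pedestrian detection uses 2.
    All parameter names and defaults here are illustrative assumptions.
    """
    # Imported lazily so that modules which only reference this function
    # (for example CLI argument parsing) do not pay the torch import cost.
    from torchvision.models.detection import maskrcnn_resnet50_fpn
    from torchvision.models.detection.faster_rcnn import FastRCNNPredictor
    from torchvision.models.detection.mask_rcnn import MaskRCNNPredictor

    if pretrained:
        # Downloads COCO-pretrained weights on first use.
        model = maskrcnn_resnet50_fpn(weights="DEFAULT")
    else:
        # No downloads at all: random init for both detector and backbone.
        model = maskrcnn_resnet50_fpn(weights=None, weights_backbone=None)

    # Swap the box classification/regression head for one sized to our classes.
    in_features = model.roi_heads.box_predictor.cls_score.in_features
    model.roi_heads.box_predictor = FastRCNNPredictor(in_features, num_classes)

    # Swap the mask head the same way.
    in_channels = model.roi_heads.mask_predictor.conv5_mask.in_channels
    model.roi_heads.mask_predictor = MaskRCNNPredictor(
        in_channels, hidden_layer, num_classes
    )
    return model
```

Passing pretrained=False is mainly useful in unit tests, where downloading weights would make the test slow and network dependent.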

4. Data Handling:

Dataset Download: Use wget to download the Penn-Fudan dataset during setup.
Data Storage: Dataset stored in data/penn_fudan_root/.
Data Preprocessing: Implemented in utils/data_utils.py to prepare the dataset for model training.
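
As one possible starting point for utils/data_utils.py, the sketch below pairs each image with its mask before any tensors are built. It relies on the layout of the extracted Penn-Fudan archive, where images live in PNGImages/ and masks in PedMasks/ with a _mask suffix. The function name list_image_mask_pairs is hypothetical, and whether the dataset root is data/penn_fudan_root/ itself or a PennFudanPed/ folder inside it depends on how the archive is unpacked.

```python
from pathlib import Path


def list_image_mask_pairs(root):
    """Return sorted (image_path, mask_path) tuples for a Penn-Fudan root.

    `root` is assumed to be the directory that directly contains
    PNGImages/ and PedMasks/.
    """
    root = Path(root)
    image_dir = root / "PNGImages"
    mask_dir = root / "PedMasks"
    for directory in (image_dir, mask_dir):
        if not directory.is_dir():
            raise FileNotFoundError(f"expected dataset directory: {directory}")

    pairs = []
    for image_path in sorted(image_dir.glob("*.png")):
        # FudanPed00001.png pairs with FudanPed00001_mask.png.
        mask_path = mask_dir / f"{image_path.stem}_mask.png"
        if not mask_path.is_file():
            raise FileNotFoundError(f"no mask found for {image_path.name}")
        pairs.append((image_path, mask_path))
    return pairs
```

Failing early on a missing mask keeps a half-downloaded dataset from surfacing later as an obscure error inside the training loop.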

5. Error Handling:

Logging: Comprehensive logging of training and testing progress, errors, and warnings to files.
Exception Handling: Implement try-except blocks to handle potential errors during data loading, model training, and testing.
Input Validation: Validate input parameters from the command-line interface and configuration files.
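
The three items above could fit together roughly as follows. This is a sketch under stated assumptions: the config keys in REQUIRED_KEYS, the logger name, and the helper names setup_file_logger, validate_config, and run_step are all hypothetical, since the spec does not define a config schema or a logging layout.

```python
import logging
import os

# Hypothetical required keys; the spec does not fix a config schema.
REQUIRED_KEYS = ("data_root", "num_epochs", "lr")


def setup_file_logger(log_path, name="penn_fudan"):
    """Return a logger that appends INFO-and-above records to `log_path`."""
    log_path = os.path.abspath(log_path)
    os.makedirs(os.path.dirname(log_path), exist_ok=True)
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    # Attach the file handler only once, even if setup is called repeatedly.
    already_attached = any(
        getattr(handler, "baseFilename", None) == log_path
        for handler in logger.handlers
    )
    if not already_attached:
        handler = logging.FileHandler(log_path, encoding="utf-8")
        handler.setFormatter(
            logging.Formatter("%(asctime)s %(levelname)s %(name)s: %(message)s")
        )
        logger.addHandler(handler)
    return logger


def validate_config(cfg):
    """Raise ValueError with a specific message if `cfg` is unusable."""
    missing = [key for key in REQUIRED_KEYS if key not in cfg]
    if missing:
        raise ValueError(f"missing config keys: {missing}")
    if not isinstance(cfg["num_epochs"], int) or cfg["num_epochs"] < 1:
        raise ValueError("num_epochs must be a positive integer")
    if not isinstance(cfg["lr"], (int, float)) or cfg["lr"] <= 0:
        raise ValueError("lr must be a positive number")


def run_step(logger, step, *args, **kwargs):
    """Run one pipeline step; log any failure with its traceback, then re-raise."""
    try:
        return step(*args, **kwargs)
    except Exception:
        logger.exception("step %s failed", getattr(step, "__name__", step))
        raise
```

Re-raising after logging keeps the failure visible to the caller, so the script still exits with an error while the log file retains the traceback.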

6. Testing Plan:

Unit Tests:
Focus on testing individual components (e.g., data loading functions, model utilities).
Use unittest framework.
Tests located in tests/test_*.py.
Ensure adequate code coverage for critical components.
Pre-commit Hooks:
Run unit tests automatically before each commit.
Ensure all tests pass before allowing commits.
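
To illustrate the shape a file such as tests/test_data_utils.py might take, the sketch below tests a small hypothetical helper, instance_ids, which is not part of the spec. It relies on the Penn-Fudan mask convention that pixel value 0 is background and each pedestrian has its own positive id. In a real project the helper would be imported from utils/ rather than defined next to the test.

```python
import unittest


def instance_ids(mask_values):
    """Return the sorted pedestrian ids found in an iterable of mask pixel values.

    Hypothetical helper: 0 is treated as background and dropped.
    """
    return sorted({int(value) for value in mask_values if int(value) != 0})


class InstanceIdsTest(unittest.TestCase):
    def test_background_is_ignored(self):
        self.assertEqual(instance_ids([0, 0, 0]), [])

    def test_ids_are_unique_and_sorted(self):
        self.assertEqual(instance_ids([2, 0, 1, 2, 1]), [1, 2])

    def test_non_numeric_value_raises(self):
        with self.assertRaises(ValueError):
            instance_ids(["not-a-pixel"])
```

A pre-commit hook or a developer can run every such file with python -m unittest discover -s tests -p "test_*.py".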

7. Development Tools:

uv: For managing the Python virtual environment and dependencies.
ruff: For linting, formatting, and code quality checks.
pre-commit: For managing pre-commit hooks (ruff, unittest).
Git: For version control (feature branch workflow).

8. Command-Line Interface (CLI):

train.py:
Accept configuration file as an argument.
Handle training process.
test.py:
Accept configuration file as an argument.
Handle model evaluation.
argparse: Used for parsing command-line arguments.
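
A parser shared by train.py and test.py could look like the sketch below. Taking the config as a positional argument and rejecting paths that are not existing .py files are assumptions; the spec only says that each script accepts a configuration file. The names existing_py_file and build_parser are hypothetical.

```python
import argparse
from pathlib import Path


def existing_py_file(value):
    """argparse type: accept only a path to an existing .py file."""
    path = Path(value)
    if path.suffix != ".py" or not path.is_file():
        raise argparse.ArgumentTypeError(f"not an existing .py config file: {value}")
    return path


def build_parser(prog, description):
    """Build the parser used by both train.py and test.py."""
    parser = argparse.ArgumentParser(prog=prog, description=description)
    parser.add_argument(
        "config",
        type=existing_py_file,
        help="path to a Python config file, for example one under configs/",
    )
    return parser
```

In train.py this would be used as args = build_parser("train.py", "Fine-tune Mask R-CNN").parse_args(), after which args.config is a pathlib.Path. Validating the path in the type callback means a wrong path produces a normal argparse usage error instead of a traceback.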

9. Implementation Details:

pyproject.toml:
Specify all project dependencies with appropriate versions.
Configure uv settings.
.pre-commit-config.yaml:
Configure ruff and unittest hooks.
Ensure hooks are correctly set up and executed.
configs/:
Store all configuration files as Python dictionaries.
README.md:
Provide project documentation, including setup instructions, usage examples, and development guidelines.
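
For the configs/ item, one way to read a Python config file and derive the output directory from its name is sketched below. It loosely mirrors the MMCV convention of treating a file's top-level variables as the configuration, but it is not MMCV's loader. The example keys, the outputs/ base directory, and the names load_config and output_dir_for are assumptions for illustration.

```python
import runpy
import types
from pathlib import Path

# Contents of a hypothetical configs/penn_fudan_maskrcnn.py.
EXAMPLE_CONFIG = """\
model = dict(num_classes=2, hidden_layer=256)
optimizer = dict(lr=0.005, momentum=0.9, weight_decay=0.0005)
num_epochs = 10
"""


def load_config(config_path):
    """Execute a config file and return its public top-level variables as a dict.

    The file is run as ordinary Python, so only trusted configs should be loaded.
    """
    namespace = runpy.run_path(str(config_path))
    return {
        name: value
        for name, value in namespace.items()
        if not name.startswith("_") and not isinstance(value, types.ModuleType)
    }


def output_dir_for(config_path, base="outputs"):
    """Name the output directory after the config file, without its extension."""
    return Path(base) / Path(config_path).stem
```

With this reading, a run started from configs/penn_fudan_maskrcnn.py would write its logs and checkpoints under outputs/penn_fudan_maskrcnn.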

10. Next Steps for Developer:

Set up the project structure.
Create pyproject.toml and .pre-commit-config.yaml files.
Implement data downloading and loading scripts.
Implement the Mask R-CNN model setup and training logic.
Implement the test script and unit tests.
Implement the command-line interface.
Add logging and error handling.
Document the project in README.md.