Project Specification: Object Detection Finetuning with Torchvision (Penn-Fudan Pedestrian Dataset)

1. Project Overview:

This project aims to fine-tune a Mask R-CNN model using PyTorch and Torchvision for pedestrian detection on the Penn-Fudan Pedestrian dataset.
Emphasis is placed on code quality, maintainability, and reproducibility, incorporating modern development tools like uv, ruff, and pre-commit hooks.
The project will follow a modular structure and utilize Python dictionaries for configuration.
2. Requirements:

Object Detection Task: Pedestrian detection using Mask R-CNN.
Dataset: Penn-Fudan Pedestrian dataset.
Python Version: 3.10.
Dependencies:
  torch>=2.1 (torchvision 0.16 is built against torch 2.1; CUDA support as needed)
  torchvision>=0.16
  ruff
  numpy
  Pillow
Development Environment: Local machine with NVIDIA drivers.
Package Management: uv.
Code Quality: ruff for linting and formatting (PEP 8 compliance, error checking, import sorting).
Testing: Unit tests using unittest.
Version Control: Git with a feature branch workflow.
Pre-commit Hooks: ruff and unittest execution.
Configuration: Python dictionaries (mmcv style).
Logging: File-based logging.
Command-Line Interface: Simple CLI using argparse.
3. Architecture and Design:

Modular Structure:
  configs/: Configuration files.
  data/: Dataset storage.
  models/: Model definitions.
  utils/: Data loading and preprocessing utilities.
  train.py: Training script.
  test.py: Model evaluation script.
  tests/: Unit tests.
  pyproject.toml: Dependency management.
  .pre-commit-config.yaml: Pre-commit configuration.
  README.md: Project documentation.
Configuration Files:
  Python dictionaries for configurable parameters (e.g., training hyperparameters, model settings).
  Output directories will be named based on the config file name.
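
The two configuration rules above could be sketched as follows. This assumes each config file defines a top-level dictionary named `config`; that name, the `load_config` helper, and the `outputs/` root directory are illustrative assumptions, not part of the specification.

```python
import runpy
from pathlib import Path


def load_config(config_path):
    """Execute a Python config file and return (config dict, output directory)."""
    path = Path(config_path)
    # run_path executes the file and returns its module-level names as a dict.
    namespace = runpy.run_path(str(path))
    config = namespace.get("config")
    if not isinstance(config, dict):
        raise ValueError(f"{path} must define a dictionary named 'config'")
    # Assumed convention: outputs/<config file name without extension>/
    output_dir = Path("outputs") / path.stem
    return config, output_dir
```

For example, a hypothetical `configs/base_config.py` containing `config = dict(num_classes=2, lr=0.005, num_epochs=10)` would load as a plain dictionary and map to `outputs/base_config`.
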
Model: Mask R-CNN (provided by Torchvision).
Data Loading: Custom data loading utilities in utils/data_utils.py to handle the Penn-Fudan dataset.
4. Data Handling:

Dataset Download: Use wget to download the Penn-Fudan dataset during setup.
Data Storage: Dataset stored in data/penn_fudan_root/.
Data Preprocessing: Implemented in utils/data_utils.py to prepare the dataset for model training.
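
As a sketch of the preprocessing step: Penn-Fudan mask images label each pedestrian with a distinct integer and the background with 0, so one mask can be split into per-instance binary masks and bounding boxes. The `mask_to_targets` name and the returned dictionary layout are assumptions; the actual contents of utils/data_utils.py are not defined by this specification.

```python
import numpy as np


def mask_to_targets(mask):
    """Split an instance-ID mask into per-pedestrian binary masks and boxes.

    `mask` is a 2-D integer array: 0 is background, every other value is one instance.
    """
    instance_ids = np.unique(mask)
    instance_ids = instance_ids[instance_ids != 0]  # drop the background ID

    # Shape (num_instances, H, W): one binary mask per pedestrian.
    masks = (mask[None, :, :] == instance_ids[:, None, None]).astype(np.uint8)

    boxes = []
    for binary in masks:
        ys, xs = np.nonzero(binary)
        # Box format [x_min, y_min, x_max, y_max] in inclusive pixel indices.
        boxes.append([xs.min(), ys.min(), xs.max(), ys.max()])
    boxes = np.asarray(boxes, dtype=np.float32).reshape(-1, 4)

    labels = np.ones(len(instance_ids), dtype=np.int64)  # single class: pedestrian
    return {"boxes": boxes, "labels": labels, "masks": masks}
```

In a dataset class these arrays would typically be converted to tensors before being passed to the model. Boxes with zero width or height should be filtered out first, because Torchvision's detection models reject them during training.
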
5. Error Handling:

Logging: Comprehensive logging of training and testing progress, errors, and warnings to files.
Exception Handling: Implement try-except blocks to handle potential errors during data loading, model training, and testing.
Input Validation: Validate input parameters from the command-line interface and configuration files.
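
The logging and exception-handling points above might be combined as in the following sketch, which uses only the standard `logging` module. The helper names, the logger name `penn_fudan`, and the record format are illustrative assumptions.

```python
import logging
from pathlib import Path


def setup_file_logger(log_path, name="penn_fudan"):
    """Return a logger that appends timestamped records to `log_path`."""
    log_path = Path(log_path)
    log_path.parent.mkdir(parents=True, exist_ok=True)

    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    # Avoid stacking duplicate handlers if this is called more than once.
    if not logger.handlers:
        handler = logging.FileHandler(log_path, encoding="utf-8")
        handler.setFormatter(
            logging.Formatter("%(asctime)s %(levelname)s %(name)s: %(message)s")
        )
        logger.addHandler(handler)
    return logger


def run_logged(logger, step_name, func, *args, **kwargs):
    """Run `func`, log any exception with its traceback, then re-raise it."""
    try:
        return func(*args, **kwargs)
    except Exception:
        logger.exception("%s failed", step_name)
        raise
```

Re-raising after logging keeps failures visible to the caller while still leaving a traceback in the log file.
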
6. Testing Plan:

Unit Tests:
  Focus on testing individual components (e.g., data loading functions, model utilities).
  Use the unittest framework.
  Tests located in tests/test_*.py.
  Ensure adequate code coverage for critical components.
Pre-commit Hooks:
  Run unit tests automatically before each commit.
  Ensure all tests pass before allowing commits.
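
A unit test in the style described above might look like the sketch below. `box_area` is a hypothetical stand-in for a project utility, defined inline only to keep the example self-contained; in the real layout it would be imported from the package under test.

```python
import unittest


def box_area(box):
    """Stand-in utility: area of a box given as [x_min, y_min, x_max, y_max]."""
    x_min, y_min, x_max, y_max = box
    if x_max < x_min or y_max < y_min:
        raise ValueError(f"invalid box: {box}")
    return (x_max - x_min) * (y_max - y_min)


class TestBoxArea(unittest.TestCase):
    def test_area_of_valid_box(self):
        self.assertEqual(box_area([0, 0, 4, 3]), 12)

    def test_invalid_box_raises(self):
        with self.assertRaises(ValueError):
            box_area([5, 0, 1, 3])
```

Saved as a file under tests/, this would be picked up by `python -m unittest discover -s tests`, whose default file pattern is `test*.py`. The same command is one candidate for the pre-commit test hook.
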
7. Development Tools:

uv: For managing the Python virtual environment and dependencies.
ruff: For linting, formatting, and code quality checks.
pre-commit: For managing pre-commit hooks (ruff, unittest).
Git: For version control (feature branch workflow).
8. Command-Line Interface (CLI):

train.py:
  Accept a configuration file as an argument.
  Handle the training process.
test.py:
  Accept a configuration file as an argument.
  Handle model evaluation.
argparse: Used for parsing command-line arguments.
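
A minimal sketch of the shared argument parsing, with basic input validation on the config path. Whether the config is a positional argument or a `--config` flag is not specified; the positional form and the helper names here are assumptions.

```python
import argparse
from pathlib import Path


def existing_python_file(value):
    """argparse type: accept only paths to existing .py config files."""
    path = Path(value)
    if path.suffix != ".py" or not path.is_file():
        raise argparse.ArgumentTypeError(
            f"config file not found or not a .py file: {value}"
        )
    return path


def parse_args(argv=None):
    """Parse the command line; `argv=None` falls back to sys.argv[1:]."""
    parser = argparse.ArgumentParser(
        description="Fine-tune Mask R-CNN on the Penn-Fudan dataset."
    )
    parser.add_argument(
        "config",
        type=existing_python_file,
        help="path to a Python config file in configs/",
    )
    return parser.parse_args(argv)
```

With this in place, a call such as `python train.py configs/base_config.py` (hypothetical config name) exits with a usage error when the file is missing, before any training code runs.
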
9. Implementation Details:

pyproject.toml:
  Specify all project dependencies with appropriate versions.
  Configure uv settings.
.pre-commit-config.yaml:
  Configure ruff and unittest hooks.
  Ensure hooks are correctly set up and executed.
configs/:
  Store all configuration files as Python dictionaries.
README.md:
  Provide project documentation, including setup instructions, usage examples, and development guidelines.
10. Next Steps for Developer:

Set up the project structure.
Create pyproject.toml and .pre-commit-config.yaml files.
Implement data downloading and loading scripts.
Implement the Mask R-CNN model setup and training logic.
Implement the test script and unit tests.
Implement the command-line interface.
Add logging and error handling.
Document the project in README.md.