torchvision-vibecoding-project/todo.md
2025-04-12 09:35:18 +01:00


Project To-Do List

This list outlines the steps required to complete the Torchvision fine-tuning project, derived from prompt_plan.md.

Phase 1: Foundation & Setup

  • Set up project structure (directories: configs, data, models, utils, tests, scripts).
  • Initialize Git repository.
  • Create .gitignore file (ignore data, outputs, logs, .venv, caches, *.pth).
  • Initialize pyproject.toml using uv init, set Python 3.10.
  • Add core dependencies (torch, torchvision, ruff, numpy, Pillow, pytest) using uv add.
  • Create .pre-commit-config.yaml and configure ruff hooks (format, lint, import sort).
  • Create __init__.py files in necessary directories.
  • Create empty placeholder files (train.py, test.py, configs/base_config.py, utils/data_utils.py, models/detection.py, tests/conftest.py).
  • Create basic README.md.
  • Install pre-commit hooks (pre-commit install).
  • Create scripts/download_data.sh script.
    • Check if data exists.
    • Create data/ directory.
    • Use wget to download PennFudanPed dataset.
    • Use unzip to extract data.
    • Remove zip file after extraction.
    • Add informative print messages.
    • Make script executable (chmod +x).
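
The download-script steps above could be sketched roughly as follows. This is a sketch, not the project's actual script; the URL is the one commonly referenced by the torchvision object-detection tutorial, so verify it before relying on it.

```shell
#!/usr/bin/env bash
# scripts/download_data.sh -- fetch and unpack the PennFudanPed dataset (sketch).
set -euo pipefail

DATA_DIR="data"
DATASET_DIR="${DATA_DIR}/PennFudanPed"
# URL as commonly used in the torchvision object-detection tutorial
URL="https://www.cis.upenn.edu/~jshi/ped_html/PennFudanPed.zip"

if [ -d "${DATASET_DIR}" ]; then
    echo "Dataset already present at ${DATASET_DIR}; nothing to do."
    exit 0
fi

mkdir -p "${DATA_DIR}"
echo "Downloading PennFudanPed..."
wget -q "${URL}" -O "${DATA_DIR}/PennFudanPed.zip"
echo "Extracting..."
unzip -q "${DATA_DIR}/PennFudanPed.zip" -d "${DATA_DIR}"
rm "${DATA_DIR}/PennFudanPed.zip"
echo "Done: dataset extracted to ${DATASET_DIR}"
```

Run once with `chmod +x scripts/download_data.sh && ./scripts/download_data.sh`; re-running is a no-op thanks to the existence check.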
  • Ensure .gitignore ignores data/.
  • Implement base configuration in configs/base_config.py (base_config dictionary).
  • Implement specific experiment configuration in configs/pennfudan_maskrcnn_config.py (config dictionary, importing/updating base config).
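
The base/override config pattern might look like the sketch below. All key names and values here are illustrative placeholders, not the project's actual settings.

```python
# configs/base_config.py -- shared defaults (illustrative keys/values)
base_config = {
    "seed": 42,
    "num_epochs": 10,
    "batch_size": 2,
    "lr": 0.005,
    "momentum": 0.9,
    "weight_decay": 0.0005,
    "checkpoint_freq": 5,
}

# configs/pennfudan_maskrcnn_config.py -- experiment-specific overrides
config = dict(base_config)  # copy the base, then layer experiment settings on top
config.update({
    "experiment_name": "pennfudan_maskrcnn",
    "num_classes": 2,  # background + pedestrian
    "data_root": "data/PennFudanPed",
})
```

Copying with `dict(base_config)` before `update` keeps the base defaults untouched so other experiment configs can reuse them.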

Phase 2: Data Handling & Model

  • Implement PennFudanDataset class in utils/data_utils.py.
    • __init__: Load image and mask paths.
    • __getitem__: Load image/mask, parse masks, generate targets (boxes, labels, masks, image_id, area, iscrowd), apply transforms.
    • __len__: Return dataset size.
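
The mask-parsing part of __getitem__ can be sketched with plain NumPy, assuming PennFudan-style masks where each pixel holds an instance ID (0 = background). The helper name parse_instance_masks is hypothetical.

```python
import numpy as np

def parse_instance_masks(mask):
    """Split an instance-ID mask of shape (H, W) into per-instance binary
    masks and tight bounding boxes, as the dataset's __getitem__ would."""
    obj_ids = np.unique(mask)
    obj_ids = obj_ids[obj_ids != 0]  # drop the background ID
    # broadcast: (N, H, W) boolean mask per instance
    masks = mask[None, :, :] == obj_ids[:, None, None]
    boxes = []
    for m in masks:
        ys, xs = np.nonzero(m)
        boxes.append([xs.min(), ys.min(), xs.max(), ys.max()])  # xyxy
    return masks, np.asarray(boxes, dtype=np.float32)
```

__getitem__ would wrap these arrays (plus labels, image_id, area, iscrowd) into the target dict that torchvision detection models expect.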
  • Implement get_transform(train) function in utils/data_utils.py (using torchvision.transforms.v2).
  • Implement collate_fn(batch) function in utils/data_utils.py.
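
The custom collate function is short: detection samples are (image, target) pairs with variable numbers of objects, so the default stacking collate fails and the batch is kept as tuples instead.

```python
def collate_fn(batch):
    """Transpose a list of (image, target) pairs into
    (images_tuple, targets_tuple) without stacking."""
    return tuple(zip(*batch))
```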
  • Implement get_maskrcnn_model(num_classes, ...) function in models/detection.py.
    • Load pre-trained Mask R-CNN (maskrcnn_resnet50_fpn_v2).
    • Replace box predictor head (FastRCNNPredictor).
    • Replace mask predictor head (MaskRCNNPredictor).

Phase 3: Training Script & Core Logic

  • Set up basic train.py structure.
    • Add imports.
    • Implement argparse for --config argument.
    • Implement dynamic config loading (importlib).
    • Set random seeds.
    • Determine compute device (cuda or cpu).
    • Create output directory structure (outputs/<config_name>/checkpoints).
    • Instantiate PennFudanDataset (train).
    • Instantiate DataLoader (train) using collate_fn.
    • Instantiate model using get_maskrcnn_model.
    • Move model to device.
    • Add if __name__ == "__main__": guard.
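
The argparse and dynamic-config-loading pieces might look like this sketch, assuming (as in Phase 1) that each config module exposes a top-level `config` dict:

```python
import argparse
import importlib

def parse_args(argv=None):
    parser = argparse.ArgumentParser(description="Train a detection model.")
    parser.add_argument(
        "--config", required=True,
        help="config module path, e.g. configs.pennfudan_maskrcnn_config",
    )
    return parser.parse_args(argv)

def load_config(module_path):
    """Import the config module by dotted path and return its `config` dict."""
    return importlib.import_module(module_path).config
```

Passing a module path (rather than a file path) keeps loading trivial: `python train.py --config configs.pennfudan_maskrcnn_config`.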
  • Implement minimal training step in train.py.
    • Instantiate optimizer (torch.optim.SGD).
    • Set model.train().
    • Fetch one batch.
    • Move data to device.
    • Perform forward pass (loss_dict = model(...)).
    • Calculate total loss (sum(...)).
    • Perform backward pass (optimizer.zero_grad(), loss.backward(), optimizer.step()).
    • Print/log loss for the single step (and temporarily exit).
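
The single-step pattern can be sketched as below. Torchvision detection models in train mode return a dict of losses; a tiny stand-in module is used here so the pattern runs without the real model, and the loss names are illustrative.

```python
import torch

class ToyDetector(torch.nn.Module):
    """Stand-in for Mask R-CNN: returns a loss dict in training mode."""
    def __init__(self):
        super().__init__()
        self.fc = torch.nn.Linear(4, 2)

    def forward(self, images, targets):
        out = self.fc(images)
        return {
            "loss_classifier": torch.nn.functional.cross_entropy(out, targets),
            "loss_box_reg": out.pow(2).mean(),
        }

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = ToyDetector().to(device)
model.train()
optimizer = torch.optim.SGD(model.parameters(), lr=0.005, momentum=0.9)

images = torch.randn(8, 4, device=device)          # one fetched batch
targets = torch.randint(0, 2, (8,), device=device)

loss_dict = model(images, targets)  # forward pass -> dict of losses
loss = sum(loss_dict.values())      # total loss
optimizer.zero_grad()
loss.backward()
optimizer.step()
print(f"step loss: {loss.item():.4f}")
```

With the real model the call is `model(images, targets)` on lists of images and target dicts; the surrounding optimizer/backward logic is identical.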
  • Implement logging setup in utils/log_utils.py (setup_logging function).
    • Configure logging.basicConfig for file and console output.
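
A minimal setup_logging sketch with the dual file/console handlers described above (the format string is a placeholder):

```python
import logging
import sys
from pathlib import Path

def setup_logging(log_file):
    """Send log records to both a file and stdout."""
    Path(log_file).parent.mkdir(parents=True, exist_ok=True)
    logging.basicConfig(
        level=logging.INFO,
        format="%(asctime)s %(levelname)s %(message)s",
        handlers=[
            logging.FileHandler(log_file),
            logging.StreamHandler(sys.stdout),
        ],
        force=True,  # replace any handlers a library may have installed
    )
```

`force=True` (Python 3.8+) makes the call idempotent, which matters when train.py is re-run or imported in tests.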
  • Integrate logging into train.py.
    • Call setup_logging.
    • Replace print with logging.info.
    • Log config, device, and training progress/losses.
  • Implement full training loop in train.py.
    • Remove single-step exit.
    • Add LR scheduler (torch.optim.lr_scheduler.StepLR).
    • Add epoch loop.
    • Add batch loop, integrating the single training step logic.
    • Log loss periodically within the batch loop.
    • Step the LR scheduler at the end of each epoch.
    • Log total training time.
  • Implement checkpointing in train.py.
    • Define checkpoint directory.
    • Implement logic to find and load the latest checkpoint (resume training).
    • Save checkpoints periodically (based on frequency or final epoch).
      • Include epoch, model state, optimizer state, scheduler state, config.
    • Log checkpoint loading/saving.
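
The save/resume logic sketched (file naming and key names are illustrative choices, not the project's actual format):

```python
from pathlib import Path

import torch

def save_checkpoint(ckpt_dir, epoch, model, optimizer, scheduler, config):
    """Persist everything needed to resume training."""
    ckpt_dir = Path(ckpt_dir)
    ckpt_dir.mkdir(parents=True, exist_ok=True)
    torch.save(
        {
            "epoch": epoch,
            "model_state": model.state_dict(),
            "optimizer_state": optimizer.state_dict(),
            "scheduler_state": scheduler.state_dict(),
            "config": config,
        },
        ckpt_dir / f"checkpoint_epoch_{epoch}.pth",
    )

def load_latest_checkpoint(ckpt_dir):
    """Return the most recent checkpoint dict, or None if none exist."""
    paths = sorted(
        Path(ckpt_dir).glob("checkpoint_epoch_*.pth"),
        key=lambda p: int(p.stem.split("_")[-1]),
    )
    return torch.load(paths[-1], map_location="cpu") if paths else None
```

Sorting by the parsed epoch number (not lexicographically) is what makes "find the latest" correct once epochs pass 9.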

Phase 4: Evaluation & Testing

  • Add evaluation dependencies (pycocotools - optional initially).
  • Create utils/eval_utils.py and implement evaluate function.
    • Set model.eval().
    • Use torch.no_grad().
    • Loop through validation/test dataloader.
    • Perform forward pass.
    • Calculate/aggregate metrics (start with average loss, potentially add mAP later).
    • Log evaluation metrics and time.
    • Return metrics.
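
The evaluate skeleton, sketched with a generic (inputs, targets, loss_fn) setup rather than the detection API (note that torchvision detection models return losses only in train mode, so the real implementation will need to account for that):

```python
import time

import torch

def evaluate(model, data_loader, device, loss_fn):
    """Average a loss over a dataloader under torch.no_grad()."""
    model.eval()
    total, count = 0.0, 0
    start = time.time()
    with torch.no_grad():
        for inputs, targets in data_loader:
            inputs, targets = inputs.to(device), targets.to(device)
            total += loss_fn(model(inputs), targets).item() * inputs.size(0)
            count += inputs.size(0)
    avg = total / max(count, 1)
    print(f"eval: avg_loss={avg:.4f} time={time.time() - start:.1f}s")
    return {"avg_loss": avg}
```

Returning a metrics dict (rather than a bare float) leaves room to add mAP later without changing the call sites in train.py and test.py.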
  • Integrate evaluation into train.py.
    • Create validation Dataset and DataLoader (using torch.utils.data.Subset).
    • Call evaluate at the end of each epoch.
    • Log validation metrics.
    • (Later) Implement logic to save the best model based on validation metric.
  • Implement test.py script.
    • Reuse argument parsing, config loading, device setup, dataset/dataloader (test split), model creation from train.py.
    • Add --checkpoint argument.
    • Load model weights from the specified checkpoint.
    • Call evaluate function using the test dataloader.
    • Log/print final evaluation results.
    • Setup logging for testing (e.g., test.log).
  • Create unit tests in tests/ using pytest.
    • tests/test_config.py: Test config loading.
    • tests/test_model.py: Test model creation and head configuration.
    • tests/test_data_utils.py: Test dataset instantiation, length, and item format (requires data).
    • (Optional) Use fixtures in tests/conftest.py if needed.
  • Add pytest execution to .pre-commit-config.yaml.
  • Test pre-commit hooks (pre-commit run --all-files).

Phase 5: Refinement & Documentation

  • Refine error handling in train.py and test.py (try...except).
  • Add configuration validation checks.
  • Improve evaluation metrics (e.g., implement mAP in evaluate function).
  • Add more data augmentations to get_transform(train=True).
  • Expand README.md significantly.
    • Goals
    • Detailed Setup
    • Configuration explanation
    • Training instructions (including resuming)
    • Testing instructions
    • Project Structure overview
    • Dependencies list
    • (Optional) Results section
  • Perform final code quality checks (ruff format ., ruff check . --fix).
  • Ensure all pre-commit hooks pass.