# Project To-Do List

This list outlines the steps required to complete the Torchvision Finetuning project, derived from `prompt_plan.md`.

## Phase 1: Foundation & Setup

- [x] Initialize project structure (`configs`, `data`, `models`, `utils`, `tests`, `scripts`)
- [x] Initialize git repository
- [x] Configure `.gitignore`
- [x] Set up `pyproject.toml` with `uv`
- [x] Add dependencies (`torch`, `torchvision` with CUDA 12.4, `ruff`, `numpy`, `Pillow`, `pytest`, `pre-commit`)
- [x] Configure `pre-commit` with `ruff` (formatting, linting)
- [x] Create empty `__init__.py` files
- [x] Create placeholder files (`train.py`, `test.py`, `configs/base_config.py`, etc.)
- [x] Create basic `README.md`
- [x] Install pre-commit hooks
- [x] Verify PyTorch GPU integration (`scripts/check_gpu.py`)
- [x] Create data download script (`scripts/download_data.sh`)
- [x] Implement configuration system (`configs/base_config.py`, `configs/pennfudan_maskrcnn_config.py`)
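
One possible shape for the configuration system is sketched below, assuming dataclass-based configs plus a loader that imports a config file by path with `importlib` and expects it to expose a module-level `config` object. The two class names mirror the config files listed above, but every field name and default is hypothetical (the optimizer and scheduler values follow the torchvision object detection finetuning tutorial) and may not match the project's actual `configs/base_config.py`.

```python
import importlib.util
import sys
from dataclasses import dataclass
from pathlib import Path
from typing import Any


@dataclass
class BaseConfig:
    # Hypothetical shared defaults; the real configs/base_config.py may differ.
    seed: int = 42
    output_dir: str = "outputs"
    num_epochs: int = 10
    batch_size: int = 2
    num_workers: int = 4
    # SGD + StepLR values as used in the torchvision finetuning tutorial.
    lr: float = 0.005
    momentum: float = 0.9
    weight_decay: float = 0.0005
    lr_step_size: int = 3
    lr_gamma: float = 0.1


@dataclass
class PennFudanMaskRCNNConfig(BaseConfig):
    # Dataset/model-specific fields (hypothetical names).
    data_root: str = "data/PennFudanPed"
    num_classes: int = 2  # background + pedestrian


def load_config(path: str) -> Any:
    """Import a Python config file by path and return its module-level `config`."""
    file_path = Path(path)
    module_name = file_path.stem
    spec = importlib.util.spec_from_file_location(module_name, str(file_path))
    if spec is None or spec.loader is None:
        raise ImportError(f"Cannot load config module from {file_path}")
    module = importlib.util.module_from_spec(spec)
    # Register before executing, as in the importlib "import a source file" recipe.
    sys.modules[module_name] = module
    spec.loader.exec_module(module)
    return module.config
```

Under these assumptions, `train.py` could call `load_config(args.config)` with `--config configs/pennfudan_maskrcnn_config.py`, provided that file ends with a line such as `config = PennFudanMaskRCNNConfig()`.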

## Phase 2: Data Handling & Model

- [x] Implement `PennFudanDataset` class in `utils/data_utils.py`.
  - [x] `__init__`: Load image and mask paths.
  - [x] `__getitem__`: Load image/mask, parse masks, generate targets (boxes, labels, masks, image_id, area, iscrowd), apply transforms.
  - [x] `__len__`: Return dataset size.
- [x] Implement `get_transform(train)` function in `utils/data_utils.py` (using `torchvision.transforms.v2`).
- [x] Implement `collate_fn(batch)` function in `utils/data_utils.py`.
- [x] Implement `get_maskrcnn_model(num_classes, ...)` function in `models/detection.py`.
  - [x] Load pre-trained Mask R-CNN (`maskrcnn_resnet50_fpn_v2`).
  - [x] Replace box predictor head (`FastRCNNPredictor`).
  - [x] Replace mask predictor head (`MaskRCNNPredictor`).
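
The target-building step in `__getitem__` and the `collate_fn` can be illustrated without tensors. This pure-Python sketch assumes the PennFudan mask convention that pixel value 0 is background and each pedestrian has its own integer ID. In the real dataset class the boxes would more likely come from `torchvision.ops.masks_to_boxes` applied to stacked per-instance binary masks, which uses the same `(xmin, ymin, xmax, ymax)` layout where `xmax`/`ymax` are the largest pixel indices the object covers. The helper name `mask_to_target` is hypothetical, and the per-instance `masks` and `image_id` entries are left out.

```python
from typing import Dict, List, Sequence, Tuple

Box = Tuple[int, int, int, int]  # (xmin, ymin, xmax, ymax) in pixel indices


def mask_to_target(mask: Sequence[Sequence[int]]) -> Dict[str, list]:
    """Build box/label/area/iscrowd entries from an instance-ID mask.

    Assumes 0 is background and every other value is one object instance.
    """
    extents: Dict[int, List[int]] = {}
    for y, row in enumerate(mask):
        for x, obj_id in enumerate(row):
            if obj_id == 0:
                continue  # background pixel
            if obj_id not in extents:
                extents[obj_id] = [x, y, x, y]
            else:
                ext = extents[obj_id]
                ext[0], ext[1] = min(ext[0], x), min(ext[1], y)
                ext[2], ext[3] = max(ext[2], x), max(ext[3], y)
    # Order instances by ID so boxes line up with a sorted list of object IDs.
    boxes: List[Box] = [tuple(extents[obj_id]) for obj_id in sorted(extents)]
    return {
        "boxes": boxes,
        "labels": [1] * len(boxes),  # single foreground class (pedestrian)
        "area": [(x2 - x1) * (y2 - y1) for x1, y1, x2, y2 in boxes],
        "iscrowd": [0] * len(boxes),  # no crowd annotations
    }


def collate_fn(batch):
    """Turn [(image, target), ...] into ((image, ...), (target, ...))."""
    return tuple(zip(*batch))
```

Zipping instead of stacking lets one batch hold images of different sizes and targets with different numbers of boxes, which the default `DataLoader` collation would try to merge into stacked tensors.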

## Phase 3: Training Script & Core Logic

- [x] Set up basic `train.py` structure.
  - [x] Add imports.
  - [x] Implement `argparse` for `--config` argument.
  - [x] Implement dynamic config loading (`importlib`).
  - [x] Set random seeds.
  - [x] Determine compute device (`cuda` or `cpu`).
  - [x] Create output directory structure (`outputs/<config_name>/checkpoints`).
  - [x] Instantiate `PennFudanDataset` (train).
  - [x] Instantiate `DataLoader` (train) using `collate_fn`.
  - [x] Instantiate model using `get_maskrcnn_model`.
  - [x] Move model to device.
  - [x] Add `if __name__ == "__main__":` guard.
- [x] Implement minimal training step in `train.py`.
  - [x] Instantiate optimizer (`torch.optim.SGD`).
  - [x] Set `model.train()`.
  - [x] Fetch one batch.
  - [x] Move data to device.
  - [x] Perform forward pass (`loss_dict = model(...)`).
  - [x] Calculate total loss (`sum(...)`).
  - [x] Perform backward pass and optimizer step (`optimizer.zero_grad()`, `loss.backward()`, `optimizer.step()`).
  - [x] Print/log loss for the single step (and temporarily exit).
- [x] Implement logging setup in `utils/log_utils.py` (`setup_logging` function).
  - [x] Configure `logging.basicConfig` for file and console output.
- [x] Integrate logging into `train.py`.
  - [x] Call `setup_logging`.
  - [x] Replace `print` with `logging.info`.
  - [x] Log config, device, and training progress/losses.
- [x] Implement full training loop in `train.py`.
  - [x] Remove single-step exit.
  - [x] Add LR scheduler (`torch.optim.lr_scheduler.StepLR`).
  - [x] Add epoch loop.
  - [x] Add batch loop, integrating the single training step logic.
  - [x] Log loss periodically within the batch loop.
  - [x] Step the LR scheduler at the end of each epoch.
  - [x] Log total training time.
- [x] Implement checkpointing in `train.py`.
  - [x] Define checkpoint directory.
  - [x] Implement logic to find and load the latest checkpoint (resume training).
  - [x] Save checkpoints periodically (based on frequency or final epoch).
    - [x] Include epoch, model state, optimizer state, scheduler state, config.
  - [x] Log checkpoint loading/saving.
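
The file-handling half of the checkpoint and resume logic can be sketched with the standard library alone. This assumes a hypothetical naming scheme `checkpoint_epoch_<N>.pth` and 0-based epoch counting; the `torch.load`/`load_state_dict`/`torch.save` wiring appears only as comments, and the dictionary keys there are placeholders rather than the project's actual keys.

```python
import re
from pathlib import Path
from typing import Optional

# Hypothetical naming scheme; adjust to whatever train.py actually writes.
CKPT_RE = re.compile(r"checkpoint_epoch_(\d+)\.pth")


def checkpoint_path(ckpt_dir: Path, epoch: int) -> Path:
    """Path for the checkpoint saved after (0-based) `epoch`."""
    return ckpt_dir / f"checkpoint_epoch_{epoch}.pth"


def find_latest_checkpoint(ckpt_dir: Path) -> Optional[Path]:
    """Return the checkpoint with the highest epoch number, or None if there is none."""
    if not ckpt_dir.is_dir():
        return None
    latest_epoch, latest_path = -1, None
    for path in ckpt_dir.iterdir():
        match = CKPT_RE.fullmatch(path.name)
        if match is None:
            continue  # ignore unrelated files
        # Compare parsed integers: a plain string sort would rank epoch_9 as the latest.
        epoch = int(match.group(1))
        if epoch > latest_epoch:
            latest_epoch, latest_path = epoch, path
    return latest_path


def should_save(epoch: int, num_epochs: int, save_freq: int) -> bool:
    """True every `save_freq` epochs and always after the final epoch (0-based)."""
    return (epoch + 1) % save_freq == 0 or epoch + 1 == num_epochs


# Rough wiring in train.py (torch calls as comments; dict keys are placeholders):
#   start_epoch = 0
#   latest = find_latest_checkpoint(ckpt_dir)
#   if latest is not None:
#       ckpt = torch.load(latest, map_location=device)
#       model.load_state_dict(ckpt["model_state"])
#       optimizer.load_state_dict(ckpt["optimizer_state"])
#       scheduler.load_state_dict(ckpt["scheduler_state"])
#       start_epoch = ckpt["epoch"] + 1
#   for epoch in range(start_epoch, num_epochs):
#       ...train one epoch, then scheduler.step()...
#       if should_save(epoch, num_epochs, save_freq):
#           torch.save({"epoch": epoch, "model_state": model.state_dict(), ...},
#                      checkpoint_path(ckpt_dir, epoch))
```

Because the checkpoint also stores the config, note that `torch.load` defaults to `weights_only=True` from PyTorch 2.6 onward; loading a trusted checkpoint that contains an arbitrary Python config object may then require `weights_only=False`.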

## Phase 4: Evaluation & Testing

- [ ] Add evaluation dependencies (`pycocotools`, optional initially).
- [x] Create `utils/eval_utils.py` and implement `evaluate` function.
  - [x] Set `model.eval()`.
  - [x] Use `torch.no_grad()`.
  - [x] Loop through validation/test dataloader.
  - [x] Perform forward pass.
  - [x] Calculate/aggregate metrics (start with average loss, potentially add mAP later).
  - [x] Log evaluation metrics and time.
  - [x] Return metrics.
- [x] Integrate evaluation into `train.py`.
  - [x] Create validation `Dataset` and `DataLoader` (using `torch.utils.data.Subset`).
  - [x] Call `evaluate` at the end of each epoch.
  - [x] Log validation metrics.
  - [ ] (Later) Implement logic to save the *best* model based on a validation metric.
- [ ] Implement `test.py` script.
  - [ ] Reuse argument parsing, config loading, device setup, dataset/dataloader (test split), and model creation from `train.py`.
  - [ ] Add `--checkpoint` argument.
  - [ ] Load model weights from the specified checkpoint.
  - [ ] Call `evaluate` function using the test dataloader.
  - [ ] Log/print final evaluation results.
  - [ ] Set up logging for testing (e.g., `test.log`).
- [ ] Create unit tests in `tests/` using `pytest`.
  - [ ] `tests/test_config.py`: Test config loading.
  - [ ] `tests/test_model.py`: Test model creation and head configuration.
  - [ ] `tests/test_data_utils.py`: Test dataset instantiation, length, and item format (requires data).
  - [ ] (Optional) Use fixtures in `tests/conftest.py` if needed.
- [ ] Add `pytest` execution to `.pre-commit-config.yaml`.
- [ ] Test pre-commit hooks (`pre-commit run --all-files`).
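
If `evaluate` reports an average loss, keep in mind that torchvision detection models return the loss dict only in training mode; after `model.eval()`, `model(images)` returns a list of per-image prediction dicts instead. One common workaround is to compute validation losses in `train()` mode inside `torch.no_grad()`, bearing in mind that this also switches any dropout or BatchNorm layers to their training behaviour. Either way, the aggregation step might look like the hypothetical helper below, which assumes each batch's losses have already been converted to plain floats (for example with `.item()`) and adds their sum under an assumed key `loss_total`.

```python
from collections import defaultdict
from typing import Dict, Iterable, Mapping


def average_loss_dicts(loss_dicts: Iterable[Mapping[str, float]]) -> Dict[str, float]:
    """Average each loss component over batches and add their sum as "loss_total".

    Assumes every batch reports the same loss names, e.g. the Mask R-CNN keys
    loss_classifier, loss_box_reg, loss_mask, loss_objectness, loss_rpn_box_reg.
    """
    totals: Dict[str, float] = defaultdict(float)
    num_batches = 0
    for loss_dict in loss_dicts:
        num_batches += 1
        for name, value in loss_dict.items():
            totals[name] += float(value)
    if num_batches == 0:
        return {}  # nothing to average (e.g. an empty validation split)
    averages = {name: total / num_batches for name, total in totals.items()}
    # The sum is computed before the new key is inserted.
    averages["loss_total"] = sum(averages.values())
    return averages
```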

## Phase 5: Refinement & Documentation

- [ ] Refine error handling in `train.py` and `test.py` (`try...except`).
- [ ] Add configuration validation checks.
- [ ] Improve evaluation metrics (e.g., implement mAP in the `evaluate` function).
- [ ] Add more data augmentations to `get_transform(train=True)`.
- [ ] Expand `README.md` significantly.
  - [ ] Goals
  - [ ] Detailed setup
  - [ ] Configuration explanation
  - [ ] Training instructions (including resuming)
  - [ ] Testing instructions
  - [ ] Project structure overview
  - [ ] Dependencies list
  - [ ] (Optional) Results section
- [ ] Perform final code quality checks (`ruff format .`, `ruff check . --fix`).
- [ ] Ensure all pre-commit hooks pass.
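
For the mAP item, `pycocotools` (listed as an optional dependency in Phase 4) is the usual route; COCO-style mAP averages AP over IoU thresholds from 0.50 to 0.95 and uses 101-point interpolation. To show only the core mechanics, the sketch below computes box IoU and a single-class, single-image, all-point interpolated AP at one IoU threshold, using a simplified greedy matching of score-sorted predictions to still-unmatched ground-truth boxes. The function names and the `(score, box)` input format are hypothetical; boxes are `(x1, y1, x2, y2)` with area `(x2 - x1) * (y2 - y1)`, the same convention as `torchvision.ops.box_iou`.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def box_iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def average_precision(
    preds: Sequence[Tuple[float, Box]],
    gts: Sequence[Box],
    iou_thresh: float = 0.5,
) -> float:
    """All-point interpolated AP for one class in one image (simplified matching)."""
    if not gts:
        return 0.0  # AP is undefined without ground truth; treated as 0 here
    matched = [False] * len(gts)
    tp_flags: List[int] = []
    # Visit predictions from most to least confident.
    for _score, box in sorted(preds, key=lambda p: p[0], reverse=True):
        best_iou, best_j = 0.0, -1
        for j, gt in enumerate(gts):
            if not matched[j]:
                iou = box_iou(box, gt)
                if iou > best_iou:
                    best_iou, best_j = iou, j
        if best_j >= 0 and best_iou >= iou_thresh:
            matched[best_j] = True  # each ground-truth box is matched at most once
            tp_flags.append(1)
        else:
            tp_flags.append(0)  # false positive
    # Precision and recall after each ranked prediction.
    precisions: List[float] = []
    recalls: List[float] = []
    tp_cum = 0
    for rank, tp in enumerate(tp_flags, start=1):
        tp_cum += tp
        precisions.append(tp_cum / rank)
        recalls.append(tp_cum / len(gts))
    # Interpolate: precision at each rank becomes the max precision at any later rank.
    for i in range(len(precisions) - 2, -1, -1):
        precisions[i] = max(precisions[i], precisions[i + 1])
    # Area under the interpolated precision-recall curve.
    ap, prev_recall = 0.0, 0.0
    for p, r in zip(precisions, recalls):
        ap += (r - prev_recall) * p
        prev_recall = r
    return ap
```

A full evaluation would pool predictions across all images before sorting by score, and for Mask R-CNN the same procedure can be repeated with mask IoU to obtain segmentation AP.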