Create eval loop and use full train dataset

Craig
2025-04-12 10:55:10 +01:00
parent e9b97ac2b5
commit 0f3a96ca81
3 changed files with 266 additions and 50 deletions

todo.md

@@ -60,36 +60,36 @@ This list outlines the steps required to complete the Torchvision Finetuning pro
- [x] Call `setup_logging`.
- [x] Replace `print` with `logging.info`.
- [x] Log config, device, and training progress/losses.
- [ ] Implement full training loop in `train.py`.
- [ ] Remove single-step exit.
- [ ] Add LR scheduler (`torch.optim.lr_scheduler.StepLR`).
- [ ] Add epoch loop.
- [ ] Add batch loop, integrating the single training step logic.
- [ ] Log loss periodically within the batch loop.
- [ ] Step the LR scheduler at the end of each epoch.
- [ ] Log total training time.
- [ ] Implement checkpointing in `train.py`.
- [ ] Define checkpoint directory.
- [ ] Implement logic to find and load the latest checkpoint (resume training).
- [ ] Save checkpoints periodically (based on frequency or final epoch).
- [ ] Include epoch, model state, optimizer state, scheduler state, config.
- [ ] Log checkpoint loading/saving.
- [x] Implement full training loop in `train.py`.
- [x] Remove single-step exit.
- [x] Add LR scheduler (`torch.optim.lr_scheduler.StepLR`).
- [x] Add epoch loop.
- [x] Add batch loop, integrating the single training step logic.
- [x] Log loss periodically within the batch loop.
- [x] Step the LR scheduler at the end of each epoch.
- [x] Log total training time.
- [x] Implement checkpointing in `train.py`.
- [x] Define checkpoint directory.
- [x] Implement logic to find and load the latest checkpoint (resume training).
- [x] Save checkpoints periodically (based on frequency or final epoch).
- [x] Include epoch, model state, optimizer state, scheduler state, config.
- [x] Log checkpoint loading/saving.
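The training-loop and checkpointing items ticked off above can be sketched as follows. This is a minimal illustration, not the project's actual `train.py`: the model, loss, optimizer hyperparameters, and names like `ckpt_dir` and `save_freq` are assumptions, and a plain MSE loss stands in for the detection losses a torchvision model would return.

```python
import logging
import os
import time

import torch
from torch import nn, optim


def train(model, loader, num_epochs=2, ckpt_dir="checkpoints",
          save_freq=1, config=None):
    optimizer = optim.SGD(model.parameters(), lr=0.01)
    # StepLR decays the learning rate by `gamma` every `step_size` epochs.
    scheduler = optim.lr_scheduler.StepLR(optimizer, step_size=3, gamma=0.1)
    os.makedirs(ckpt_dir, exist_ok=True)

    # Resume from the latest checkpoint, if any
    # (zero-padded filenames sort correctly).
    start_epoch = 0
    ckpts = sorted(f for f in os.listdir(ckpt_dir) if f.endswith(".pt"))
    if ckpts:
        state = torch.load(os.path.join(ckpt_dir, ckpts[-1]),
                           map_location="cpu")
        model.load_state_dict(state["model"])
        optimizer.load_state_dict(state["optimizer"])
        scheduler.load_state_dict(state["scheduler"])
        start_epoch = state["epoch"] + 1
        logging.info("Resumed from %s (epoch %d)", ckpts[-1], state["epoch"])

    start = time.time()
    for epoch in range(start_epoch, num_epochs):      # epoch loop
        model.train()
        for step, (x, y) in enumerate(loader):        # batch loop
            loss = nn.functional.mse_loss(model(x), y)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
            if step % 10 == 0:                        # periodic loss logging
                logging.info("epoch %d step %d loss %.4f",
                             epoch, step, loss.item())
        scheduler.step()                              # step LR once per epoch

        # Save periodically and always on the final epoch.
        if (epoch + 1) % save_freq == 0 or epoch == num_epochs - 1:
            path = os.path.join(ckpt_dir, f"ckpt_{epoch:04d}.pt")
            torch.save({"epoch": epoch,
                        "model": model.state_dict(),
                        "optimizer": optimizer.state_dict(),
                        "scheduler": scheduler.state_dict(),
                        "config": config}, path)
            logging.info("Saved checkpoint %s", path)

    elapsed = time.time() - start
    logging.info("Training took %.1fs", elapsed)      # total training time
    return elapsed
```

Because every checkpoint stores the epoch plus model, optimizer, and scheduler state, re-running `train` with a larger `num_epochs` simply resumes where the last run stopped.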
## Phase 4: Evaluation & Testing
- [ ] Add evaluation dependencies (`pycocotools` - optional initially).
- [ ] Create `utils/eval_utils.py` and implement `evaluate` function.
- [ ] Set `model.eval()`.
- [ ] Use `torch.no_grad()`.
- [ ] Loop through validation/test dataloader.
- [ ] Perform forward pass.
- [ ] Calculate/aggregate metrics (start with average loss, potentially add mAP later).
- [ ] Log evaluation metrics and time.
- [ ] Return metrics.
- [ ] Integrate evaluation into `train.py`.
- [ ] Create validation `Dataset` and `DataLoader` (using `torch.utils.data.Subset`).
- [ ] Call `evaluate` at the end of each epoch.
- [ ] Log validation metrics.
- [x] Create `utils/eval_utils.py` and implement `evaluate` function.
- [x] Set `model.eval()`.
- [x] Use `torch.no_grad()`.
- [x] Loop through validation/test dataloader.
- [x] Perform forward pass.
- [x] Calculate/aggregate metrics (start with average loss, potentially add mAP later).
- [x] Log evaluation metrics and time.
- [x] Return metrics.
- [x] Integrate evaluation into `train.py`.
- [x] Create validation `Dataset` and `DataLoader` (using `torch.utils.data.Subset`).
- [x] Call `evaluate` at the end of each epoch.
- [x] Log validation metrics.
- [ ] (Later) Implement logic to save the *best* model based on validation metric.
- [ ] Implement `test.py` script.
- [ ] Reuse argument parsing, config loading, device setup, dataset/dataloader (test split), model creation from `train.py`.
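The `evaluate` function completed in Phase 4 might look roughly like the sketch below. It is a generic version with an explicit `loss_fn` that reports an average loss; note that torchvision detection models return predictions rather than losses once `model.eval()` is set, so the real `eval_utils.py` may handle that differently.

```python
import logging
import time

import torch
from torch import nn
from torch.utils.data import DataLoader, Subset, TensorDataset


@torch.no_grad()                  # disable gradient tracking for evaluation
def evaluate(model, loader, loss_fn):
    model.eval()                  # eval mode: dropout off, BatchNorm frozen
    start = time.time()
    total_loss, batches = 0.0, 0
    for x, y in loader:           # loop over the validation/test dataloader
        total_loss += loss_fn(model(x), y).item()   # forward pass
        batches += 1
    metrics = {"avg_loss": total_loss / max(batches, 1)}
    logging.info("eval %s in %.2fs", metrics, time.time() - start)
    return metrics
```

A validation split can be carved out of the full training dataset with `torch.utils.data.Subset(dataset, indices)`, wrapped in its own `DataLoader`, and passed to `evaluate` at the end of each epoch.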