torchvision-vibecoding-project/todo.md
2025-04-12 09:35:18 +01:00


Project To-Do List

This list outlines the steps required to complete the Torchvision fine-tuning project, derived from prompt_plan.md.

Phase 1: Foundation & Setup

  • Set up project structure (directories: configs, data, models, utils, tests, scripts).
  • Initialize Git repository.
  • Create .gitignore file (ignore data, outputs, logs, .venv, caches, *.pth).
  • Initialize pyproject.toml using uv init, set Python 3.10.
  • Add core dependencies (torch, torchvision, ruff, numpy, Pillow, pytest) using uv add.
  • Create .pre-commit-config.yaml and configure ruff hooks (format, lint, import sort).
  • Create __init__.py files in necessary directories.
  • Create empty placeholder files (train.py, test.py, configs/base_config.py, utils/data_utils.py, models/detection.py, tests/conftest.py).
  • Create basic README.md.
  • Install pre-commit hooks (pre-commit install).
  • Create scripts/download_data.sh script.
    • Check if data exists.
    • Create data/ directory.
    • Use wget to download PennFudanPed dataset.
    • Use unzip to extract data.
    • Remove zip file after extraction.
    • Add informative print messages.
    • Make script executable (chmod +x).
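
The download-script steps above could be sketched roughly as follows. This is a sketch, not the project's actual script; the URL is the one commonly referenced by the torchvision object-detection tutorial, so verify it before relying on it.

```shell
#!/usr/bin/env bash
# scripts/download_data.sh -- fetch and unpack the PennFudanPed dataset (sketch).
set -euo pipefail

DATA_DIR="data"
DATASET_DIR="${DATA_DIR}/PennFudanPed"
# URL as commonly used in the torchvision object-detection tutorial
URL="https://www.cis.upenn.edu/~jshi/ped_html/PennFudanPed.zip"

if [ -d "${DATASET_DIR}" ]; then
    echo "Dataset already present at ${DATASET_DIR}; nothing to do."
    exit 0
fi

mkdir -p "${DATA_DIR}"
echo "Downloading PennFudanPed..."
wget -q "${URL}" -O "${DATA_DIR}/PennFudanPed.zip"
echo "Extracting..."
unzip -q "${DATA_DIR}/PennFudanPed.zip" -d "${DATA_DIR}"
rm "${DATA_DIR}/PennFudanPed.zip"
echo "Done: dataset extracted to ${DATASET_DIR}"
```

Run once with `chmod +x scripts/download_data.sh && ./scripts/download_data.sh`; re-running is a no-op thanks to the existence check.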
  • Ensure .gitignore ignores data/.
  • Implement base configuration in configs/base_config.py (base_config dictionary).
  • Implement specific experiment configuration in configs/pennfudan_maskrcnn_config.py (config dictionary, importing/updating base config).
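
The base/override config pattern might look like the sketch below. All key names and values here are illustrative placeholders, not the project's actual settings.

```python
# configs/base_config.py -- shared defaults (illustrative keys/values)
base_config = {
    "seed": 42,
    "num_epochs": 10,
    "batch_size": 2,
    "lr": 0.005,
    "momentum": 0.9,
    "weight_decay": 0.0005,
    "checkpoint_freq": 5,
}

# configs/pennfudan_maskrcnn_config.py -- experiment-specific overrides
config = dict(base_config)  # copy the base, then layer experiment settings on top
config.update({
    "experiment_name": "pennfudan_maskrcnn",
    "num_classes": 2,  # background + pedestrian
    "data_root": "data/PennFudanPed",
})
```

Copying with `dict(base_config)` before `update` keeps the base defaults untouched so other experiment configs can reuse them.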

Phase 2: Data Handling & Model

  • Implement PennFudanDataset class in utils/data_utils.py.
    • __init__: Load image and mask paths.
    • __getitem__: Load image/mask, parse masks, generate targets (boxes, labels, masks, image_id, area, iscrowd), apply transforms.
    • __len__: Return dataset size.
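
The mask-parsing part of __getitem__ can be sketched with plain NumPy, assuming PennFudan-style masks where each pixel holds an instance ID (0 = background). The helper name parse_instance_masks is hypothetical.

```python
import numpy as np

def parse_instance_masks(mask):
    """Split an instance-ID mask of shape (H, W) into per-instance binary
    masks and tight bounding boxes, as the dataset's __getitem__ would."""
    obj_ids = np.unique(mask)
    obj_ids = obj_ids[obj_ids != 0]  # drop the background ID
    # broadcast: (N, H, W) boolean mask per instance
    masks = mask[None, :, :] == obj_ids[:, None, None]
    boxes = []
    for m in masks:
        ys, xs = np.nonzero(m)
        boxes.append([xs.min(), ys.min(), xs.max(), ys.max()])  # xyxy
    return masks, np.asarray(boxes, dtype=np.float32)
```

__getitem__ would wrap these arrays (plus labels, image_id, area, iscrowd) into the target dict that torchvision detection models expect.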
  • Implement get_transform(train) function in utils/data_utils.py (using torchvision.transforms.v2).
  • Implement collate_fn(batch) function in utils/data_utils.py.
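
The custom collate function is short: detection samples are (image, target) pairs with variable numbers of objects, so the default stacking collate fails and the batch is kept as tuples instead.

```python
def collate_fn(batch):
    """Transpose a list of (image, target) pairs into
    (images_tuple, targets_tuple) without stacking."""
    return tuple(zip(*batch))
```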
  • Implement get_maskrcnn_model(num_classes, ...) function in models/detection.py.
    • Load pre-trained Mask R-CNN (maskrcnn_resnet50_fpn_v2).
    • Replace box predictor head (FastRCNNPredictor).
    • Replace mask predictor head (MaskRCNNPredictor).

Phase 3: Training Script & Core Logic

  • Set up basic train.py structure.
    • Add imports.
    • Implement argparse for --config argument.
    • Implement dynamic config loading (importlib).
    • Set random seeds.
    • Determine compute device (cuda or cpu).
    • Create output directory structure (outputs/<config_name>/checkpoints).
    • Instantiate PennFudanDataset (train).
    • Instantiate DataLoader (train) using collate_fn.
    • Instantiate model using get_maskrcnn_model.
    • Move model to device.
    • Add if __name__ == "__main__": guard.
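
The argparse and dynamic-config-loading pieces might look like this sketch, assuming (as in Phase 1) that each config module exposes a top-level `config` dict:

```python
import argparse
import importlib

def parse_args(argv=None):
    parser = argparse.ArgumentParser(description="Train a detection model.")
    parser.add_argument(
        "--config", required=True,
        help="config module path, e.g. configs.pennfudan_maskrcnn_config",
    )
    return parser.parse_args(argv)

def load_config(module_path):
    """Import the config module by dotted path and return its `config` dict."""
    return importlib.import_module(module_path).config
```

Passing a module path (rather than a file path) keeps loading trivial: `python train.py --config configs.pennfudan_maskrcnn_config`.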
  • Implement minimal training step in train.py.
    • Instantiate optimizer (torch.optim.SGD).
    • Set model.train().
    • Fetch one batch.
    • Move data to device.
    • Perform forward pass (loss_dict = model(...)).
    • Calculate total loss (sum(...)).
    • Perform backward pass (optimizer.zero_grad(), loss.backward(), optimizer.step()).
    • Print/log loss for the single step (and temporarily exit).
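
The single-step pattern can be sketched as below. Torchvision detection models in train mode return a dict of losses; a tiny stand-in module is used here so the pattern runs without the real model, and the loss names are illustrative.

```python
import torch

class ToyDetector(torch.nn.Module):
    """Stand-in for Mask R-CNN: returns a loss dict in training mode."""
    def __init__(self):
        super().__init__()
        self.fc = torch.nn.Linear(4, 2)

    def forward(self, images, targets):
        out = self.fc(images)
        return {
            "loss_classifier": torch.nn.functional.cross_entropy(out, targets),
            "loss_box_reg": out.pow(2).mean(),
        }

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = ToyDetector().to(device)
model.train()
optimizer = torch.optim.SGD(model.parameters(), lr=0.005, momentum=0.9)

images = torch.randn(8, 4, device=device)          # one fetched batch
targets = torch.randint(0, 2, (8,), device=device)

loss_dict = model(images, targets)  # forward pass -> dict of losses
loss = sum(loss_dict.values())      # total loss
optimizer.zero_grad()
loss.backward()
optimizer.step()
print(f"step loss: {loss.item():.4f}")
```

With the real model the call is `model(images, targets)` on lists of images and target dicts; the surrounding optimizer/backward logic is identical.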
  • Implement logging setup in utils/log_utils.py (setup_logging function).
    • Configure logging.basicConfig for file and console output.
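
A minimal setup_logging sketch with the dual file/console handlers described above (the format string is a placeholder):

```python
import logging
import sys
from pathlib import Path

def setup_logging(log_file):
    """Send log records to both a file and stdout."""
    Path(log_file).parent.mkdir(parents=True, exist_ok=True)
    logging.basicConfig(
        level=logging.INFO,
        format="%(asctime)s %(levelname)s %(message)s",
        handlers=[
            logging.FileHandler(log_file),
            logging.StreamHandler(sys.stdout),
        ],
        force=True,  # replace any handlers a library may have installed
    )
```

`force=True` (Python 3.8+) makes the call idempotent, which matters when train.py is re-run or imported in tests.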
  • Integrate logging into train.py.
    • Call setup_logging.
    • Replace print with logging.info.
    • Log config, device, and training progress/losses.
  • Implement full training loop in train.py.
    • Remove single-step exit.
    • Add LR scheduler (torch.optim.lr_scheduler.StepLR).
    • Add epoch loop.
    • Add batch loop, integrating the single training step logic.
    • Log loss periodically within the batch loop.
    • Step the LR scheduler at the end of each epoch.
    • Log total training time.
  • Implement checkpointing in train.py.
    • Define checkpoint directory.
    • Implement logic to find and load the latest checkpoint (resume training).
    • Save checkpoints periodically (based on frequency or final epoch).
      • Include epoch, model state, optimizer state, scheduler state, config.
    • Log checkpoint loading/saving.
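
The save/resume logic sketched (file naming and key names are illustrative choices, not the project's actual format):

```python
from pathlib import Path

import torch

def save_checkpoint(ckpt_dir, epoch, model, optimizer, scheduler, config):
    """Persist everything needed to resume training."""
    ckpt_dir = Path(ckpt_dir)
    ckpt_dir.mkdir(parents=True, exist_ok=True)
    torch.save(
        {
            "epoch": epoch,
            "model_state": model.state_dict(),
            "optimizer_state": optimizer.state_dict(),
            "scheduler_state": scheduler.state_dict(),
            "config": config,
        },
        ckpt_dir / f"checkpoint_epoch_{epoch}.pth",
    )

def load_latest_checkpoint(ckpt_dir):
    """Return the most recent checkpoint dict, or None if none exist."""
    paths = sorted(
        Path(ckpt_dir).glob("checkpoint_epoch_*.pth"),
        key=lambda p: int(p.stem.split("_")[-1]),
    )
    return torch.load(paths[-1], map_location="cpu") if paths else None
```

Sorting by the parsed epoch number (not lexicographically) is what makes "find the latest" correct once epochs pass 9.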

Phase 4: Evaluation & Testing

  • Add evaluation dependencies (pycocotools - optional initially).
  • Create utils/eval_utils.py and implement evaluate function.
    • Set model.eval().
    • Use torch.no_grad().
    • Loop through validation/test dataloader.
    • Perform forward pass.
    • Calculate/aggregate metrics (start with average loss, potentially add mAP later).
    • Log evaluation metrics and time.
    • Return metrics.
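
The evaluate skeleton, sketched with a generic (inputs, targets, loss_fn) setup rather than the detection API (note that torchvision detection models return losses only in train mode, so the real implementation will need to account for that):

```python
import time

import torch

def evaluate(model, data_loader, device, loss_fn):
    """Average a loss over a dataloader under torch.no_grad()."""
    model.eval()
    total, count = 0.0, 0
    start = time.time()
    with torch.no_grad():
        for inputs, targets in data_loader:
            inputs, targets = inputs.to(device), targets.to(device)
            total += loss_fn(model(inputs), targets).item() * inputs.size(0)
            count += inputs.size(0)
    avg = total / max(count, 1)
    print(f"eval: avg_loss={avg:.4f} time={time.time() - start:.1f}s")
    return {"avg_loss": avg}
```

Returning a metrics dict (rather than a bare float) leaves room to add mAP later without changing the call sites in train.py and test.py.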
  • Integrate evaluation into train.py.
    • Create validation Dataset and DataLoader (using torch.utils.data.Subset).
    • Call evaluate at the end of each epoch.
    • Log validation metrics.
    • (Later) Implement logic to save the best model based on validation metric.
  • Implement test.py script.
    • Reuse argument parsing, config loading, device setup, dataset/dataloader (test split), model creation from train.py.
    • Add --checkpoint argument.
    • Load model weights from the specified checkpoint.
    • Call evaluate function using the test dataloader.
    • Log/print final evaluation results.
    • Setup logging for testing (e.g., test.log).
  • Create unit tests in tests/ using pytest.
    • tests/test_config.py: Test config loading.
    • tests/test_model.py: Test model creation and head configuration.
    • tests/test_data_utils.py: Test dataset instantiation, length, and item format (requires data).
    • (Optional) Use fixtures in tests/conftest.py if needed.
  • Add pytest execution to .pre-commit-config.yaml.
  • Test pre-commit hooks (pre-commit run --all-files).

Phase 5: Refinement & Documentation

  • Refine error handling in train.py and test.py (try...except).
  • Add configuration validation checks.
  • Improve evaluation metrics (e.g., implement mAP in evaluate function).
  • Add more data augmentations to get_transform(train=True).
  • Expand README.md significantly.
    • Goals
    • Detailed Setup
    • Configuration explanation
    • Training instructions (including resuming)
    • Testing instructions
    • Project Structure overview
    • Dependencies list
    • (Optional) Results section
  • Perform final code quality checks (ruff format ., ruff check . --fix).
  • Ensure all pre-commit hooks pass.