Created project spec and plan using LLMs
102
project-spec.md
Normal file
@@ -0,0 +1,102 @@
Project Specification: Object Detection Finetuning with Torchvision (Penn-Fudan Pedestrian Dataset)

1. Project Overview:

This project aims to fine-tune a Mask R-CNN model using PyTorch and Torchvision for pedestrian detection on the Penn-Fudan Pedestrian dataset.
Emphasis is placed on code quality, maintainability, and reproducibility, incorporating modern development tools like uv, ruff, and pre-commit hooks.
The project will follow a modular structure and utilize Python dictionaries for configuration.
2. Requirements:

Object Detection Task: Pedestrian detection using Mask R-CNN.
Dataset: Penn-Fudan Pedestrian dataset.
Python Version: 3.10.
Dependencies:
    torch>=2.0 (or latest compatible with torchvision >= 0.16, CUDA support as needed)
    torchvision>=0.16
    ruff
    numpy
    Pillow
Development Environment: Local machine with NVIDIA drivers.
Package Management: uv.
Code Quality: ruff for linting and formatting (PEP 8 compliance, error checking, import sorting).
Testing: Unit tests using unittest.
Version Control: Git with a feature branch workflow.
Pre-commit Hooks: ruff and unittest execution.
Configuration: Python dictionaries (mmcv style).
Logging: File-based logging.
Command-Line Interface: Simple CLI using argparse.
3. Architecture and Design:

Modular Structure:
    configs/: Configuration files.
    data/: Dataset storage.
    models/: Model definitions.
    utils/: Data loading and preprocessing utilities.
    train.py: Training script.
    test.py: Testing script.
    tests/: Unit tests.
    pyproject.toml: Dependency management.
    .pre-commit-config.yaml: Pre-commit configuration.
    README.md: Project documentation.
Configuration Files:
    Python dictionaries for configurable parameters (e.g., training hyperparameters, model settings).
    Output directories will be named based on the config file name.
Model: Mask R-CNN (provided by Torchvision).
Data Loading: Custom data loading utilities in utils/data_utils.py to handle the Penn-Fudan dataset.
4. Data Handling:

Dataset Download: wget to download the Penn-Fudan dataset during setup.
Data Storage: Dataset stored in data/penn_fudan_root/.
Data Preprocessing: Implemented in utils/data_utils.py to prepare the dataset for model training.
5. Error Handling:

Logging: Comprehensive logging of training and testing progress, errors, and warnings to files.
Exception Handling: Implement try-except blocks to handle potential errors during data loading, model training, and testing.
Input Validation: Validate input parameters from the command-line interface and configuration files.
6. Testing Plan:

Unit Tests:
    Focus on testing individual components (e.g., data loading functions, model utilities).
    Use unittest framework.
    Tests located in tests/test_*.py.
    Ensure adequate code coverage for critical components.
Pre-commit Hooks:
    Run unit tests automatically before each commit.
    Ensure all tests pass before allowing commits.
7. Development Tools:

uv: For managing the Python virtual environment and dependencies.
ruff: For linting, formatting, and code quality checks.
pre-commit: For managing pre-commit hooks (ruff, unittest).
Git: For version control (feature branch workflow).
8. Command-Line Interface (CLI):

train.py:
    Accept configuration file as an argument.
    Handle training process.
test.py:
    Accept configuration file as an argument.
    Handle model evaluation.
argparse: Used for parsing command-line arguments.
9. Implementation Details:

pyproject.toml:
    Specify all project dependencies with appropriate versions.
    Configure uv settings.
.pre-commit-config.yaml:
    Configure ruff and unittest hooks.
    Ensure hooks are correctly set up and executed.
configs/:
    Store all configuration files as Python dictionaries.
README.md:
    Provide project documentation, including setup instructions, usage examples, and development guidelines.
10. Next Steps for Developer:

Set up the project structure.
Create pyproject.toml and .pre-commit-config.yaml files.
Implement data downloading and loading scripts.
Implement the Mask R-CNN model setup and training logic.
Implement the test script and unit tests.
Implement the command-line interface.
Add logging and error handling.
Document the project in README.md.
406
prompt_plan.md
Normal file
@@ -0,0 +1,406 @@
# LLM Prompts for Torchvision Finetuning Project

## Prompt 1: Project Foundation Setup

```text
Based on the project specification (`project-spec.md`), set up the initial project structure and tooling.

1. Create the following directory structure:
    ```
    torchvision-tutorial/
    ├── configs/
    ├── data/
    ├── models/
    ├── utils/
    ├── tests/
    ├── scripts/
    ├── .git/ # (Initialize git)
    ├── .gitignore
    ├── pyproject.toml
    ├── .pre-commit-config.yaml
    ├── README.md
    ├── train.py # Empty file
    └── test.py # Empty file
    ```
2. Initialize a git repository in the `torchvision-tutorial` directory.
3. Create a `.gitignore` file suitable for a Python project, ignoring directories like `data/`, `outputs/`, `logs/`, virtual environment folders (`.venv`), cache files (`__pycache__/`, `.pytest_cache/`, `.ruff_cache/`), and model checkpoints (`*.pth`).
4. Use `uv init` (or manually create) `pyproject.toml`. Specify Python 3.10. Add the following dependencies using `uv add`: `torch>=2.0`, `torchvision>=0.16`, `ruff`, `numpy`, `Pillow`, `pytest`.
5. Create `.pre-commit-config.yaml` (the leading dot is required; pre-commit looks for this exact file name by default). Configure `ruff` for formatting (`ruff format`) and linting (`ruff check --select I --fix` for import sorting, and `ruff check --fix` for general linting).
6. Create empty `__init__.py` files in `configs/`, `models/`, `utils/`, and `tests/`.
7. Create empty placeholder files: `train.py`, `test.py`, `configs/base_config.py`, `utils/data_utils.py`, `models/detection.py`, `tests/conftest.py`.
8. Create a basic `README.md` with the project title and a brief description based on `project-spec.md`.
9. Install pre-commit hooks (`pre-commit install`).
```

## Prompt 2: Data Acquisition Script

```text
Create a shell script `scripts/download_data.sh` that performs the following:
1. Checks if the target directory `data/PennFudanPed` already exists. If it does, print a message and exit.
2. Creates the `data/` directory if it doesn't exist.
3. Uses `wget` to download the Penn-Fudan dataset zip file (`PennFudanPed.zip`) from the specified URL (`https://www.cis.upenn.edu/~jshi/ped_html/PennFudanPed.zip`) into the `data/` directory.
4. Uses `unzip` to extract the contents of `PennFudanPed.zip` into the `data/` directory.
5. Removes the downloaded `PennFudanPed.zip` file after successful extraction.
6. Prints informative messages during download and extraction.
7. Make the script executable (`chmod +x`).
Ensure the `.gitignore` file correctly ignores the `data/` directory.
```

## Prompt 3: Configuration System

```text
Implement the configuration system using Python dictionaries:

1. In `configs/base_config.py`, define a Python dictionary named `base_config` containing placeholders or default values for common parameters:
    * `data_root`: Path to the dataset root (e.g., `'data/PennFudanPed'`)
    * `output_dir`: Base directory for outputs (e.g., `'outputs'`)
    * `device`: Compute device (e.g., `'cuda'`)
    * `num_classes`: Number of classes including background (e.g., `2` for Penn-Fudan)
    * `batch_size`: Training batch size (e.g., `2`)
    * `num_epochs`: Number of training epochs (e.g., `10`)
    * `lr`: Learning rate (e.g., `0.005`)
    * `momentum`: Optimizer momentum (e.g., `0.9`)
    * `weight_decay`: Optimizer weight decay (e.g., `0.0005`)
    * `lr_step_size`: Learning rate scheduler step size (e.g., `3`)
    * `lr_gamma`: Learning rate scheduler gamma (e.g., `0.1`)
    * `seed`: Random seed (e.g., `42`)
    * `log_freq`: Logging frequency during training (e.g., `10`)
    * `checkpoint_freq`: Checkpoint saving frequency (e.g., `1`)

2. In `configs/pennfudan_maskrcnn_config.py`, create a dictionary named `config`.
    * Import the `base_config` from `configs.base_config`.
    * Create the `config` dictionary, potentially starting with a copy of `base_config` (`config = base_config.copy()`).
    * Update specific values as needed for this experiment (e.g., ensure `data_root`, `num_classes` are correct for Penn-Fudan). Add a `config_name` key, e.g., `'pennfudan_maskrcnn_v1'`. This name will be used for naming output folders.
```
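
The two config modules described in Prompt 3 could look roughly like the sketch below. Both dictionaries are shown in one file so the example runs on its own; in the project they would live in `configs/base_config.py` and `configs/pennfudan_maskrcnn_config.py`. Using `copy.deepcopy` instead of `.copy()` is an assumption made here, not something the prompt requires.

```python
import copy

# Hypothetical contents of configs/base_config.py (defaults taken from the prompt).
base_config = {
    "data_root": "data/PennFudanPed",
    "output_dir": "outputs",
    "device": "cuda",
    "num_classes": 2,  # background + pedestrian
    "batch_size": 2,
    "num_epochs": 10,
    "lr": 0.005,
    "momentum": 0.9,
    "weight_decay": 0.0005,
    "lr_step_size": 3,
    "lr_gamma": 0.1,
    "seed": 42,
    "log_freq": 10,
    "checkpoint_freq": 1,
}

# Hypothetical contents of configs/pennfudan_maskrcnn_config.py.
# Assumption: deepcopy is used so that nested dicts or lists added to the base
# config later are not shared between experiments. A shallow .copy() is enough
# while every value is a scalar or string.
config = copy.deepcopy(base_config)
config.update({"config_name": "pennfudan_maskrcnn_v1"})
```

Because the experiment config is a copy, adding `config_name` leaves `base_config` untouched, so several experiment files can derive from the same defaults.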

## Prompt 4: Core Data Loading (Torch Dataset)

```text
Implement the core dataset loading logic in `utils/data_utils.py`:

1. Import necessary libraries: `os`, `torch`, `PIL.Image`, `numpy`, `torch.utils.data`.
2. Create a class `PennFudanDataset(torch.utils.data.Dataset)`:
    * In `__init__(self, root, transforms)`:
        * Store `root` and `transforms`.
        * Load all image file paths from `root/PNGImages` and sort them.
        * Load all mask file paths from `root/PedMasks` and sort them. Ensure alignment between images and masks.
    * In `__getitem__(self, idx)`:
        * Load the image using `PIL.Image.open` from the path at index `idx` and convert it to RGB.
        * Load the corresponding mask using `PIL.Image.open`.
        * Convert the PIL mask image to a numpy array.
        * Identify unique object instances in the mask (0 is background). Each unique non-zero value corresponds to a distinct pedestrian instance.
        * Generate binary masks for each instance.
        * From the binary masks, calculate bounding boxes (`[xmin, ymin, xmax, ymax]`) for each instance. Exclude instances with zero area.
        * Create a `target` dictionary containing:
            * `boxes`: A `torch.float32` tensor of shape `(N, 4)` where N is the number of instances.
            * `labels`: A `torch.int64` tensor of shape `(N,)` where all labels are `1` (for pedestrian).
            * `masks`: A `torch.uint8` tensor of shape `(N, H, W)`.
            * `image_id`: A `torch.int64` tensor containing `idx`.
            * `area`: A `torch.float32` tensor containing the area of each bounding box.
            * `iscrowd`: A `torch.uint8` tensor of shape `(N,)` where all values are `0`.
        * Apply the `transforms` to the image and target if `transforms` is not None. When using `torchvision.transforms.v2`, wrap `boxes` in `tv_tensors.BoundingBoxes` and `masks` in `tv_tensors.Mask`; plain tensors inside the target are passed through unchanged, so geometric transforms such as flips would not be applied to them.
        * Return the transformed image and target.
    * In `__len__(self)`:
        * Return the total number of images.

*(Hint: Refer to the Torchvision Object Detection Finetuning Tutorial for guidance on parsing masks and structuring the target dictionary: https://pytorch.org/tutorials/intermediate/torchvision_tutorial.html)*
```
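
The mask-parsing steps in Prompt 4 (unique instance IDs, binary masks, bounding boxes, zero-area filtering) can be sketched with NumPy alone. The helper name `mask_to_instances` and the synthetic mask are illustrative assumptions; a real `__getitem__` would read the mask from `PedMasks` and convert the results to tensors. Box corners here are inclusive pixel coordinates, which is one possible convention.

```python
import numpy as np


def mask_to_instances(mask):
    """Split an instance-ID mask into binary masks and [xmin, ymin, xmax, ymax] boxes.

    Hypothetical helper: pixel value 0 is background, every other value is one instance.
    """
    obj_ids = np.unique(mask)
    obj_ids = obj_ids[obj_ids != 0]  # drop the background ID

    masks, boxes = [], []
    for obj_id in obj_ids:
        binary = mask == obj_id
        ys, xs = np.nonzero(binary)
        xmin, xmax = xs.min(), xs.max()
        ymin, ymax = ys.min(), ys.max()
        if xmax <= xmin or ymax <= ymin:
            continue  # zero-area box: skip the instance
        masks.append(binary.astype(np.uint8))
        boxes.append([xmin, ymin, xmax, ymax])

    if masks:
        masks = np.stack(masks)
    else:
        masks = np.zeros((0, *mask.shape), dtype=np.uint8)
    boxes = np.asarray(boxes, dtype=np.float32).reshape(-1, 4)
    return masks, boxes


# Synthetic 6x8 mask with two real instances and one single-pixel instance.
demo_mask = np.zeros((6, 8), dtype=np.uint8)
demo_mask[1:4, 2:5] = 1
demo_mask[2:6, 6:8] = 2
demo_mask[0, 0] = 3  # one pixel wide and high, so it is filtered out

demo_masks, demo_boxes = mask_to_instances(demo_mask)
demo_areas = (demo_boxes[:, 2] - demo_boxes[:, 0]) * (demo_boxes[:, 3] - demo_boxes[:, 1])
```

The resulting arrays map directly onto the `masks`, `boxes`, and `area` entries of the target dictionary once converted with `torch.as_tensor`.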

## Prompt 5: Data Utilities (Transforms and Collate)

```text
Add utility functions to `utils/data_utils.py`:

1. Import `torchvision.transforms.v2 as T`.
2. Create a function `get_transform(train)`:
    * Initialize a list of transforms.
    * Always include `T.ToImage()` and `T.ToDtype(torch.float32, scale=True)`.
    * If `train` is `True`, add data augmentation transforms like `T.RandomHorizontalFlip(p=0.5)`. (Keep it simple for now, just horizontal flip).
    * Return `T.Compose(transforms)`.
3. Create a function `collate_fn(batch)`:
    * This function takes a list of tuples (image, target) and batches them correctly.
    * It should return a tuple `(images, targets)`, where each element is itself a tuple with one entry per sample. Use `tuple(zip(*batch))` for this.
```
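
A minimal sketch of the `collate_fn` from Prompt 5 follows. Strings and small dicts stand in for image tensors and target dictionaries so the example runs without PyTorch; the stand-in values are assumptions for illustration only.

```python
def collate_fn(batch):
    """Regroup [(image, target), ...] into ((image, ...), (target, ...))."""
    return tuple(zip(*batch))


# Stand-ins for (image tensor, target dict) pairs produced by the dataset.
demo_batch = [
    ("img0", {"labels": [1]}),
    ("img1", {"labels": [1, 1]}),
]
images, targets = collate_fn(demo_batch)
```

The default `DataLoader` collation tries to stack samples into one tensor, which fails when images differ in size or contain different numbers of instances. Keeping the samples as separate entries avoids that.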

## Prompt 6: Model Definition

```text
Implement the model loading function in `models/detection.py`:

1. Import `torch`, `torchvision`, `torchvision.models.detection`, `FastRCNNPredictor` from `torchvision.models.detection.faster_rcnn`, `MaskRCNNPredictor` from `torchvision.models.detection.mask_rcnn`, and the weight enums `MaskRCNN_ResNet50_FPN_V2_Weights` (from `torchvision.models.detection`) and `ResNet50_Weights` (from `torchvision.models`).
2. Create a function `get_maskrcnn_model(num_classes, pretrained=True, pretrained_backbone=True)`:
    * Load a pre-trained Mask R-CNN model. Use `torchvision.models.detection.maskrcnn_resnet50_fpn_v2` with `weights=MaskRCNN_ResNet50_FPN_V2_Weights.DEFAULT` if `pretrained` is True, otherwise `weights=None`. Set `weights_backbone=ResNet50_Weights.DEFAULT` if `pretrained_backbone` is true and `pretrained` is false, otherwise `weights_backbone=None`.
    * Get the number of input features for the classifier (`in_features_box = model.roi_heads.box_predictor.cls_score.in_features`).
    * Replace the bounding box predictor head (`model.roi_heads.box_predictor`) with a new `FastRCNNPredictor` instance having `in_features_box` and `num_classes`.
    * Get the number of input channels for the mask classifier (`in_features_mask = model.roi_heads.mask_predictor.conv5_mask.in_channels`).
    * Replace the mask predictor head (`model.roi_heads.mask_predictor`) with a new `MaskRCNNPredictor` instance having `in_features_mask`, a hidden layer dimension of 256, and `num_classes`.
    * Return the modified model.
```

## Prompt 7: Basic Training Script Structure (`train.py`)

```text
Set up the basic structure for `train.py`:

1. Import `torch`, `argparse`, `importlib.util`, `os`, `random`, `numpy`. (Import `importlib.util` explicitly; a bare `import importlib` does not guarantee that the `util` submodule is loaded.)
2. Import necessary components from `utils.data_utils` (`PennFudanDataset`, `get_transform`, `collate_fn`) and `models.detection` (`get_maskrcnn_model`).
3. Define a `main()` function.
4. Inside `main()`:
    * Use `argparse` to create a parser that accepts one required argument: `--config`, the path to the configuration Python file (e.g., `configs/pennfudan_maskrcnn_config.py`).
    * Parse the arguments.
    * Load the configuration dictionary dynamically from the specified file path using `importlib`. For example:
        ```python
        spec = importlib.util.spec_from_file_location("config_module", args.config)
        config_module = importlib.util.module_from_spec(spec)
        spec.loader.exec_module(config_module)
        config = config_module.config
        ```
    * Set random seeds for reproducibility using `random.seed`, `np.random.seed`, `torch.manual_seed` based on `config['seed']`. If using CUDA, also set `torch.cuda.manual_seed_all`.
    * Determine the device (`torch.device(config['device'] if torch.cuda.is_available() else 'cpu')`). Print the device being used.
    * Create the output directory structure based on `config['output_dir']` and `config['config_name']`. E.g., `output_path = os.path.join(config['output_dir'], config['config_name'])`. Create this directory and subdirectories like `checkpoints` if they don't exist (`os.makedirs(..., exist_ok=True)`).
    * Instantiate the `PennFudanDataset` for training (`dataset_train`) using `config['data_root']` and `get_transform(train=True)`.
    * Instantiate the `DataLoader` for training (`data_loader_train`) using `dataset_train`, `config['batch_size']`, `shuffle=True`, `num_workers=4` (or appropriate number), and `collate_fn=collate_fn`.
    * Instantiate the model using `get_maskrcnn_model(num_classes=config['num_classes'])`.
    * Move the model to the determined `device`.
5. Add the standard Python entry point guard (`if __name__ == "__main__":`) to call `main()`.
```
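
The argument parsing and dynamic config loading from Prompt 7 can be tried in isolation. The sketch below is one possible arrangement: the helper names `load_config` and `parse_args`, the extra error checks, and the temporary demo config file are all assumptions added for illustration.

```python
import argparse
import importlib.util
import os
import tempfile


def load_config(path):
    """Load the `config` dict from a Python file (hypothetical helper)."""
    spec = importlib.util.spec_from_file_location("config_module", path)
    if spec is None or spec.loader is None:
        raise ImportError(f"Cannot load a Python module from {path!r}")
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    config = getattr(module, "config", None)
    if not isinstance(config, dict):
        raise TypeError(f"{path!r} must define a dict named 'config'")
    return config


def parse_args(argv=None):
    """Parse the single required --config argument."""
    parser = argparse.ArgumentParser(description="Train Mask R-CNN on Penn-Fudan")
    parser.add_argument("--config", required=True, help="Path to a config .py file")
    return parser.parse_args(argv)


# Demo: write a throwaway config file and load it the way train.py would.
with tempfile.TemporaryDirectory() as tmp:
    demo_config_path = os.path.join(tmp, "demo_config.py")
    with open(demo_config_path, "w") as f:
        f.write("config = {'output_dir': 'outputs', 'config_name': 'demo'}\n")
    args = parse_args(["--config", demo_config_path])
    config = load_config(args.config)

output_path = os.path.join(config["output_dir"], config["config_name"])
```

Passing `argv` explicitly keeps the parser testable; calling `parse_args()` with no argument falls back to `sys.argv` as usual.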

## Prompt 8: Minimal Training Step (`train.py`)

```text
Extend `train.py` within the `main()` function to perform a single training step:

1. After moving the model to the device, instantiate the optimizer. Use `torch.optim.SGD` with parameters from the config (`lr`, `momentum`, `weight_decay`). Pass `model.parameters()` to the optimizer.
2. Set the model to training mode: `model.train()`.
3. Fetch *one* batch from `data_loader_train`. Use `next(iter(data_loader_train))`.
4. Move images and targets to the `device`. Remember images is a list of tensors and targets is a list of dicts. Iterate through them.
    ```python
    images = list(image.to(device) for image in images)
    targets = [{k: v.to(device) for k, v in t.items()} for t in targets]
    ```
5. Perform the forward pass: `loss_dict = model(images, targets)`. Note that in training mode, Mask R-CNN returns a dictionary of losses.
6. Calculate the total loss: `losses = sum(loss for loss in loss_dict.values())`.
7. Perform the backward pass:
    * Zero gradients: `optimizer.zero_grad()`.
    * Backpropagate: `losses.backward()`.
    * Update weights: `optimizer.step()`.
8. Print the `loss_dict` and the total `losses` tensor for this single step.
9. **(Important)** For now, after this single step, you can add `print("Single training step completed.")` and `return` or `sys.exit()` (which requires `import sys`) within `main()` to prevent further execution until the full loop is implemented.
```

## Prompt 9: Logging Integration

```text
Integrate file and console logging:

1. Create a new file `utils/log_utils.py`.
2. Import `logging`, `os`, and `sys`.
3. Define a function `setup_logging(log_dir, config_name)`:
    * Create the `log_dir` if it doesn't exist.
    * Define the log file path (e.g., `os.path.join(log_dir, f"{config_name}_train.log")`).
    * Configure the root logger using `logging.basicConfig`:
        * Set `level=logging.INFO`.
        * Set `format='%(asctime)s [%(levelname)s] %(message)s'`.
        * Set `datefmt='%Y-%m-%d %H:%M:%S'`.
        * Provide handlers:
            * A `logging.FileHandler` writing to the log file path.
            * A `logging.StreamHandler` writing to `sys.stdout`.
4. In `train.py`:
    * Import `logging` and `setup_logging` from `utils.log_utils`.
    * Immediately after creating the `output_path`, call `setup_logging(output_path, config['config_name'])`.
    * Replace `print` statements used for informational output (like device used, starting training) with `logging.info()`.
    * Log the loaded configuration dictionary at the beginning of `main()`.
    * Log the losses calculated in the single training step using `logging.info(f"Step Loss Dict: {loss_dict}")` and `logging.info(f"Step Total Loss: {losses.item()}")`.
```
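
A sketch of the `setup_logging` function described in Prompt 9, using only the standard library. Two details go beyond the prompt and are assumptions: `force=True` (so a second call replaces earlier handlers instead of being ignored) and returning the log file path for convenience.

```python
import logging
import os
import sys
import tempfile


def setup_logging(log_dir, config_name):
    """Configure the root logger to write to a file and to stdout."""
    os.makedirs(log_dir, exist_ok=True)
    log_path = os.path.join(log_dir, f"{config_name}_train.log")
    logging.basicConfig(
        level=logging.INFO,
        format="%(asctime)s [%(levelname)s] %(message)s",
        datefmt="%Y-%m-%d %H:%M:%S",
        handlers=[
            logging.FileHandler(log_path),
            logging.StreamHandler(sys.stdout),
        ],
        force=True,  # assumption: replace handlers left over from an earlier call
    )
    return log_path  # assumption: returned so callers can report where logs go


# Demo: log one message into a temporary directory.
demo_log_dir = tempfile.mkdtemp()
demo_log_path = setup_logging(demo_log_dir, "demo")
logging.info("Using device: %s", "cpu")
```

Without `force=True`, `logging.basicConfig` does nothing if the root logger already has handlers, which can silently drop the file handler when another library configures logging first.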

## Prompt 10: Full Training Loop (`train.py`)

```text
Implement the full training loop in `train.py`, replacing the single-step logic:

1. Import `time`.
2. **(Remove the `return` or `sys.exit()` added in Prompt 8).**
3. After creating the optimizer, create a learning rate scheduler (optional but good practice). Use `torch.optim.lr_scheduler.StepLR` with parameters from the config (`lr_step_size`, `lr_gamma`).
4. Add outer loop for epochs: `for epoch in range(config['num_epochs']):`
    * Log the start of the epoch: `logging.info(f"--- Epoch {epoch+1}/{config['num_epochs']} ---")`.
    * Set model to train mode: `model.train()`.
    * Initialize variables to track epoch loss or metrics if needed.
    * Add inner loop for batches: `for i, (images, targets) in enumerate(data_loader_train):`
        * Move data to device.
        * Perform forward pass: `loss_dict = model(images, targets)`.
        * Calculate total loss: `losses = sum(loss for loss in loss_dict.values())`.
        * Perform backward pass (zero grad, backward, step).
        * Log batch loss periodically (e.g., every `config['log_freq']` iterations):
            ```python
            if (i + 1) % config['log_freq'] == 0:
                loss_str = f"Epoch {epoch+1}, Iter {i+1}/{len(data_loader_train)}, Loss: {losses.item():.4f}"
                # Optional: Add individual losses from loss_dict to the log string
                logging.info(loss_str)
            ```
    * After the inner loop (end of epoch), step the learning rate scheduler: `lr_scheduler.step()`.
5. Record the start time before the epoch loop (e.g., `start_time = time.time()`) and log the total training time after the epoch loop finishes.
```

## Prompt 11: Checkpointing (`train.py`)

```text
Add model checkpointing capabilities to `train.py`:

1. Define the checkpoints directory: `checkpoint_dir = os.path.join(output_path, 'checkpoints')`. Create it if it doesn't exist.
2. **(Optional but Recommended: Resume Training Logic)** Before the epoch loop:
    * Check if any checkpoints exist in `checkpoint_dir`. Find the latest one (e.g., based on epoch number in the filename; compare the numbers as integers, not as strings).
    * If a checkpoint is found:
        * Log that training is resuming from the checkpoint.
        * Load the checkpoint using `torch.load()`.
        * Restore the model weights with `model.load_state_dict(checkpoint['model_state_dict'])`.
        * Restore the optimizer with `optimizer.load_state_dict(checkpoint['optimizer_state_dict'])`.
        * Set `start_epoch` to the `epoch` value stored in the checkpoint. The checkpoint below stores `epoch + 1`, i.e. the number of completed epochs, so adding 1 again would skip an epoch.
        * Restore the scheduler with `lr_scheduler.load_state_dict(checkpoint['scheduler_state_dict'])`.
        * Handle potential device mismatches if loading a checkpoint saved on a different device (use `map_location`).
    * If no checkpoint is found, initialize `start_epoch = 0`.
    * Modify the epoch loop to start from `start_epoch`: `for epoch in range(start_epoch, config['num_epochs']):`
3. Inside the epoch loop, after the training batches are processed (e.g., at the end of the epoch):
    * Check if the current epoch number satisfies the checkpoint frequency (e.g., `(epoch + 1) % config['checkpoint_freq'] == 0` or if it's the last epoch).
    * If it does, construct the checkpoint filename (e.g., `checkpoint_epoch_{epoch+1}.pth`).
    * Create a dictionary containing the state:
        ```python
        checkpoint = {
            'epoch': epoch + 1,
            'model_state_dict': model.state_dict(),
            'optimizer_state_dict': optimizer.state_dict(),
            'scheduler_state_dict': lr_scheduler.state_dict(),
            'config': config  # Optional: save config used for this checkpoint
        }
        ```
    * Save the checkpoint dictionary using `torch.save()` to the `checkpoint_dir`.
    * Log that a checkpoint has been saved.
```
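
The checkpoint bookkeeping in Prompt 11 (finding the latest checkpoint and deciding when to save) does not depend on PyTorch and can be sketched on its own. The helper names `find_latest_checkpoint` and `should_save` are hypothetical; the file name pattern follows the `checkpoint_epoch_{epoch+1}.pth` example from the prompt.

```python
import re
import tempfile
from pathlib import Path

CHECKPOINT_PATTERN = re.compile(r"checkpoint_epoch_(\d+)\.pth$")


def find_latest_checkpoint(checkpoint_dir):
    """Return (path, epoch) for the highest-numbered checkpoint, or (None, 0)."""
    best_path, best_epoch = None, 0
    for path in Path(checkpoint_dir).glob("checkpoint_epoch_*.pth"):
        match = CHECKPOINT_PATTERN.search(path.name)
        # Compare epochs as integers so that epoch 10 sorts after epoch 9.
        if match and int(match.group(1)) > best_epoch:
            best_path, best_epoch = path, int(match.group(1))
    return best_path, best_epoch


def should_save(epoch, num_epochs, checkpoint_freq):
    """Decide whether to checkpoint after the zero-based epoch index `epoch`."""
    return (epoch + 1) % checkpoint_freq == 0 or epoch + 1 == num_epochs


# Demo: empty files stand in for real checkpoints.
with tempfile.TemporaryDirectory() as tmp:
    for n in (2, 9, 10):
        (Path(tmp) / f"checkpoint_epoch_{n}.pth").touch()
    latest_path, latest_epoch = find_latest_checkpoint(tmp)

# The file name records the number of completed epochs, so the zero-based
# loop can resume directly at that index.
start_epoch = latest_epoch
```

Sorting file names as strings would place `checkpoint_epoch_9.pth` after `checkpoint_epoch_10.pth`, which is why the epoch number is parsed and compared numerically.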

## Prompt 12: Evaluation Integration (Setup and Basic Function)

```text
Prepare for and implement a basic evaluation function:

1. Add `pycocotools` (or `pycocotools-windows`) to `pyproject.toml` using `uv add` if planning full mAP later, otherwise skip for now. Add `torchvision`'s `references/detection/` utilities if needed (might require separate download/copying, let's avoid this complexity initially if possible).
2. Create `utils/eval_utils.py`. Import `torch`, `logging`, `time`.
3. Define a function `evaluate(model, data_loader, device)`:
    * Set model to evaluation mode: `model.eval()`.
    * Initialize placeholder storage for results (e.g., list for losses).
    * Log the start of evaluation.
    * Start a timer.
    * Use `with torch.no_grad():` context manager.
    * Loop through the `data_loader`:
        * Move images and targets to the `device`.
        * Perform the forward pass. Note that Torchvision detection models return a loss dictionary only in training mode; in eval mode they return predictions (a list of dicts with 'boxes', 'labels', 'scores', 'masks') even when `targets` are passed.
        * To obtain a validation loss: temporarily switch to `model.train()` for the forward pass (still inside `torch.no_grad()`), compute `loss_dict = model(images, targets)` and `loss = sum(l for l in loss_dict.values())`, store `loss.item()`, and restore `model.eval()` afterwards. Training mode may also update batch-norm running statistics, so treat this loss as a rough signal.
        * To obtain predictions: call `outputs = model(images)` in eval mode and store them. (This path is more complex for metrics).
    * **(Simplification for now):** Calculate the average loss across all evaluation batches.
    * Log the average evaluation loss and the time taken.
    * Return the average loss (or a dictionary of metrics if implementing more later).
4. In `train.py`:
    * Import the `evaluate` function.
    * After instantiating `dataset_train` and `data_loader_train`, also create `dataset_val` and `data_loader_val`. Use `get_transform(train=False)` for the validation dataset. You might need to modify `PennFudanDataset` or create a split if the dataset doesn't have predefined splits (Penn-Fudan doesn't, so maybe use a subset for validation - e.g., first/last N samples, requires modifying `PennFudanDataset` or using `torch.utils.data.Subset`). Let's use a simple Subset approach for now:
        ```python
        # In train.py, after creating dataset_train
        indices = torch.randperm(len(dataset_train)).tolist()
        val_split = int(0.1 * len(dataset_train))  # 10% validation
        dataset_train = torch.utils.data.Subset(dataset_train, indices[:-val_split])
        # Re-instantiate the dataset so the validation subset uses eval transforms
        dataset_val = torch.utils.data.Subset(
            PennFudanDataset(config['data_root'], get_transform(train=False)),
            indices[-val_split:],
        )

        data_loader_val = torch.utils.data.DataLoader(
            dataset_val, batch_size=config['batch_size'], shuffle=False,
            num_workers=4, collate_fn=collate_fn
        )
        # Adjust train loader if using Subset on original dataset_train instance
        data_loader_train = torch.utils.data.DataLoader(
            dataset_train, batch_size=config['batch_size'], shuffle=True,  # Should shuffle subset indices
            num_workers=4, collate_fn=collate_fn
        )

        ```
    * Inside the epoch loop, after the training batches and `lr_scheduler.step()`, call `val_metrics = evaluate(model, data_loader_val, device)`.
    * Log the returned validation metrics.
    * **(Best Checkpoint Logic - Add Later):** Keep track of the best validation metric seen so far. If the current epoch's metric is better, save a special 'best_model.pth' checkpoint, overwriting the previous best. Also, modify the periodic checkpoint saving to potentially include the validation metric in the filename.
```
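
The train/validation split in Prompt 12 relies on a random permutation of indices, so `train.py` and `test.py` only see the same split if the permutation is reproducible. The sketch below shows one way to get deterministic index lists with the standard library. The helper `split_indices` is hypothetical, and it uses `random.Random(seed)` rather than `torch.randperm`, so it will not produce the same permutation as the snippet in the prompt.

```python
import random


def split_indices(num_samples, val_fraction=0.1, seed=42):
    """Return (train_indices, val_indices) from a seeded shuffle."""
    if not 0.0 < val_fraction < 1.0:
        raise ValueError("val_fraction must be between 0 and 1")
    indices = list(range(num_samples))
    random.Random(seed).shuffle(indices)
    # Keep at least one validation sample: with a size of 0, the slice
    # indices[:-0] would be empty and the training set would vanish.
    num_val = max(1, int(val_fraction * num_samples))
    return indices[:-num_val], indices[-num_val:]


# Penn-Fudan contains 170 images.
train_idx, val_idx = split_indices(170)
```

The two lists can be passed to `torch.utils.data.Subset` exactly as in the prompt, and calling `split_indices` with the same seed in `test.py` reproduces the validation subset.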

## Prompt 13: Testing Script (`test.py`)

```text
Implement the `test.py` script for evaluating a trained model:

1. Copy the argument parsing, config loading, device setup, dataset/dataloader creation (use `get_transform(train=False)`), and model instantiation logic from `train.py` into a `main()` function in `test.py`.
2. Add a required argument `--checkpoint` to `argparse` to specify the path to the `.pth` checkpoint file to load.
3. Create a test dataset/dataloader (`dataset_test`, `data_loader_test`). Penn-Fudan doesn't have a standard test split, so reuse the validation split logic (or a dedicated test split if created earlier) for demonstration. Set the same random seed as in `train.py` before creating the split so that the same samples are held out. Ensure `shuffle=False`.
4. Load the specified checkpoint using `torch.load(args.checkpoint, map_location=device)`.
5. Load the `model_state_dict` from the checkpoint into the model. Handle potential `module.` prefix if the model was saved using `DataParallel`.
6. Import and call the `evaluate` function from `utils.eval_utils` using the `model`, `data_loader_test`, and `device`.
7. Log or print the evaluation results returned by the `evaluate` function.
8. Ensure logging is set up similarly to `train.py` (maybe log to `test.log`).
```

## Prompt 14: Unit Tests (`tests/`)

```text
Create basic unit tests using `pytest`:

1. In `tests/conftest.py` (optional): Define fixtures if needed, e.g., a fixture to provide a temporary directory or a minimal config dictionary.
2. Create `tests/test_config.py`:
    * Write a test function `test_load_config()` that attempts to load the `configs/pennfudan_maskrcnn_config.py` file and asserts that the loaded object is a dictionary and contains expected keys (e.g., `data_root`, `num_classes`).
3. Create `tests/test_model.py`:
    * Import `get_maskrcnn_model` from `models.detection`.
    * Write a test function `test_model_creation()`:
        * Call `get_maskrcnn_model(num_classes=2, pretrained=False)` (also pass `pretrained_backbone=False` to avoid downloading backbone weights during tests).
        * Assert that the returned object is an instance of `torchvision.models.detection.MaskRCNN`.
        * Check `model.roi_heads.box_predictor.cls_score.out_features` and `model.roi_heads.mask_predictor.mask_fcn_logits.out_channels` to ensure they match the requested `num_classes`.
4. Create `tests/test_data_utils.py`:
    * Import `PennFudanDataset`, `get_transform` from `utils.data_utils`.
    * **(Challenge):** Testing dataset loading often requires actual data or a mock structure. For now, write a test function `test_dataset_instantiation()` that:
        * Instantiates `PennFudanDataset` pointing to the *actual* downloaded data path (this makes the test dependent on `download_data.sh` being run first). Use `get_transform(train=False)`.
        * Asserts that `len(dataset)` returns a positive number (e.g., 170 for Penn-Fudan).
        * Gets the first item using `dataset[0]`.
        * Asserts that the returned item is a tuple of (Tensor, dict).
        * Asserts that the target dictionary contains the required keys (`boxes`, `labels`, `masks`, etc.) and that they have plausible shapes/types (e.g., `target['boxes']` has dtype `torch.float32`, `target['labels']` has dtype `torch.int64`).
```

## Prompt 15: Pre-commit Integration for Tests

```text
Update `.pre-commit-config.yaml` to run `pytest`:

1. Add a new repo section for `pytest`:
    ```yaml
    - repo: local
      hooks:
        - id: pytest
          name: pytest
          entry: pytest -v  # Add flags as needed, e.g., -x to stop on first failure
          language: system
          types: [python]
          pass_filenames: false  # pytest discovers files itself
          # Optional: Specify files/directories if needed
          # files: ^tests/
    ```
2. Ensure `pytest` is installed in the environment where pre-commit runs (it should be via `uv`).
3. Run `pre-commit run --all-files` to test the new hook.
```
|
||||
|
||||
## Prompt 16: Refinement and Documentation
|
||||
|
||||
```text
|
||||
Perform final refinements and update documentation:
|
||||
|
||||
1. **Error Handling:** Review `train.py` and `test.py`. Add `try...except` blocks around critical sections like data loading, model forward/backward passes, and checkpoint loading/saving. Log errors appropriately.
|
||||
2. **Config Validation:** Add checks at the beginning of `train.py`/`test.py` to validate essential config values (e.g., check if paths exist, types are correct).
|
||||
3. **Evaluation Metric:** If only average loss was implemented in `evaluate`, attempt to integrate a proper metric like mAP using `torchvision.ops.box_iou` and potentially adapting logic from Torchvision's evaluation scripts or `pycocotools`. Update the `evaluate` function return value and logging. Update the "best model" saving logic in `train.py` to use this metric.
|
||||
4. **Data Augmentation:** Add more relevant data augmentations to `get_transform(train=True)` in `utils/data_utils.py` (e.g., color jitter, resizing/cropping strategies suitable for object detection). Ensure transforms handle bounding boxes/masks correctly (use `torchvision.transforms.v2` which generally does).
|
||||
5. **README.md:** Significantly expand `README.md`:
|
||||
* Include project goals.
|
||||
* Detailed **Setup** instructions (clone repo, install `uv`, run `uv sync`, run `scripts/download_data.sh`, install pre-commit hooks).
|
||||
* **Configuration:** Explain the config files.
|
||||
* **Training:** How to run `train.py` with a config file. Mention output directories and checkpoints. Explain how to resume training.
|
||||
* **Testing:** How to run `test.py` with a config and checkpoint file.
|
||||
* **Project Structure:** Briefly describe the purpose of each directory.
|
||||
* **Dependencies:** List main dependencies.
|
||||
* **(Optional) Results:** Mention expected performance or show sample outputs.
|
||||
6. **Code Quality:** Run `ruff format .` and `ruff check . --fix` one last time to ensure code style and quality. Run `pre-commit run --all-files` to ensure all hooks pass.
|
||||
```
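The config validation step in Prompt 16 could start from something like the sketch below. It is only an illustration: the key names in `REQUIRED_KEYS` and the `validate_config` helper are hypothetical and are not taken from the project's actual config files.

```python
import os
from typing import Any

# Hypothetical schema: key -> expected type. These key names are assumptions,
# not the project's real config keys.
REQUIRED_KEYS: dict[str, type] = {
    "data_root": str,
    "output_dir": str,
    "num_classes": int,
    "batch_size": int,
    "lr": float,
}


def validate_config(config: dict[str, Any], check_paths: bool = True) -> list[str]:
    """Return a list of human-readable problems; an empty list means valid."""
    problems: list[str] = []
    for key, expected in REQUIRED_KEYS.items():
        if key not in config:
            problems.append(f"missing key: {key}")
            continue
        value = config[key]
        # bool is a subclass of int, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, expected):
            problems.append(
                f"{key}: expected {expected.__name__}, got {type(value).__name__}"
            )
    data_root = config.get("data_root")
    if check_paths and isinstance(data_root, str) and not os.path.isdir(data_root):
        problems.append(f"data_root does not exist: {data_root}")
    return problems
```

Collecting every problem before failing lets `train.py` or `test.py` log the full list once and exit, rather than stopping at the first bad key.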
|
||||
133
todo.md
Normal file
@@ -0,0 +1,133 @@
|
||||
# Project To-Do List
|
||||
|
||||
This list outlines the steps required to complete the Torchvision Finetuning project, derived from `prompt_plan.md`.
|
||||
|
||||
## Phase 1: Foundation & Setup
|
||||
|
||||
- [ ] Set up project structure (directories: `configs`, `data`, `models`, `utils`, `tests`, `scripts`).
|
||||
- [ ] Initialize Git repository.
|
||||
- [ ] Create `.gitignore` file (ignore `data`, `outputs`, `logs`, `.venv`, caches, `*.pth`).
|
||||
- [ ] Initialize `pyproject.toml` using `uv init`, set Python 3.10.
|
||||
- [ ] Add core dependencies (`torch`, `torchvision`, `ruff`, `numpy`, `Pillow`, `pytest`) using `uv add`.
|
||||
- [ ] Create `.pre-commit-config.yaml` (pre-commit looks for this dot-prefixed name by default) and configure `ruff` hooks (format, lint, import sort).
|
||||
- [ ] Create `__init__.py` files in necessary directories.
|
||||
- [ ] Create empty placeholder files (`train.py`, `test.py`, `configs/base_config.py`, `utils/data_utils.py`, `models/detection.py`, `tests/conftest.py`).
|
||||
- [ ] Create basic `README.md`.
|
||||
- [ ] Install pre-commit hooks (`pre-commit install`).
|
||||
- [ ] Create `scripts/download_data.sh` script.
|
||||
- [ ] Check if data exists.
|
||||
- [ ] Create `data/` directory.
|
||||
- [ ] Use `wget` to download PennFudanPed dataset.
|
||||
- [ ] Use `unzip` to extract data.
|
||||
- [ ] Remove zip file after extraction.
|
||||
- [ ] Add informative print messages.
|
||||
- [ ] Make script executable (`chmod +x`).
|
||||
- [ ] Ensure `.gitignore` ignores `data/`.
|
||||
- [ ] Implement base configuration in `configs/base_config.py` (`base_config` dictionary).
|
||||
- [ ] Implement specific experiment configuration in `configs/pennfudan_maskrcnn_config.py` (`config` dictionary, importing/updating base config).
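
One possible way to have the experiment config import and update the base config is a small recursive merge, sketched below. The `merge_config` helper and the example keys are assumptions for illustration; the real keys belong in `configs/base_config.py`.

```python
import copy
from typing import Any


def merge_config(base: dict[str, Any], overrides: dict[str, Any]) -> dict[str, Any]:
    """Return a new dict: `base` with `overrides` applied, merging nested dicts."""
    merged = copy.deepcopy(base)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Merge nested sections key by key instead of replacing them whole.
            merged[key] = merge_config(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


# Illustrative values only (hypothetical keys).
base_config = {"seed": 42, "optimizer": {"lr": 0.005, "momentum": 0.9}}
config = merge_config(base_config, {"num_classes": 2, "optimizer": {"lr": 0.01}})
```

Deep-copying keeps `base_config` unchanged, so several experiment configs can derive from it without affecting each other.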
|
||||
|
||||
## Phase 2: Data Handling & Model
|
||||
|
||||
- [ ] Implement `PennFudanDataset` class in `utils/data_utils.py`.
|
||||
- [ ] `__init__`: Load image and mask paths.
|
||||
- [ ] `__getitem__`: Load image/mask, parse masks, generate targets (boxes, labels, masks, image_id, area, iscrowd), apply transforms.
|
||||
- [ ] `__len__`: Return dataset size.
|
||||
- [ ] Implement `get_transform(train)` function in `utils/data_utils.py` (using `torchvision.transforms.v2`).
|
||||
- [ ] Implement `collate_fn(batch)` function in `utils/data_utils.py`.
|
||||
- [ ] Implement `get_maskrcnn_model(num_classes, ...)` function in `models/detection.py`.
|
||||
- [ ] Load pre-trained Mask R-CNN (`maskrcnn_resnet50_fpn_v2`).
|
||||
- [ ] Replace box predictor head (`FastRCNNPredictor`).
|
||||
- [ ] Replace mask predictor head (`MaskRCNNPredictor`).
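
For the `collate_fn(batch)` item, a minimal sketch follows, assuming each dataset item is an `(image, target)` pair as described for `__getitem__` above.

```python
from typing import Any, Sequence


def collate_fn(
    batch: Sequence[tuple[Any, Any]],
) -> tuple[tuple[Any, ...], tuple[Any, ...]]:
    """Group a batch of (image, target) pairs into (images, targets) tuples.

    Detection images differ in size and targets differ in box count, so the
    samples are kept as tuples instead of being stacked into one tensor.
    """
    images, targets = zip(*batch)
    return images, targets
```

Torchvision's detection models accept a list of image tensors and a list of target dicts, which is why this function does no stacking.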
|
||||
|
||||
## Phase 3: Training Script & Core Logic
|
||||
|
||||
- [ ] Set up basic `train.py` structure.
|
||||
- [ ] Add imports.
|
||||
- [ ] Implement `argparse` for `--config` argument.
|
||||
- [ ] Implement dynamic config loading (`importlib`).
|
||||
- [ ] Set random seeds.
|
||||
- [ ] Determine compute device (`cuda` or `cpu`).
|
||||
- [ ] Create output directory structure (`outputs/<config_name>/checkpoints`).
|
||||
- [ ] Instantiate `PennFudanDataset` (train).
|
||||
- [ ] Instantiate `DataLoader` (train) using `collate_fn`.
|
||||
- [ ] Instantiate model using `get_maskrcnn_model`.
|
||||
- [ ] Move model to device.
|
||||
- [ ] Add `if __name__ == "__main__":` guard.
|
||||
- [ ] Implement minimal training step in `train.py`.
|
||||
- [ ] Instantiate optimizer (`torch.optim.SGD`).
|
||||
- [ ] Set `model.train()`.
|
||||
- [ ] Fetch one batch.
|
||||
- [ ] Move data to device.
|
||||
- [ ] Perform forward pass (`loss_dict = model(...)`).
|
||||
- [ ] Calculate total loss (`sum(...)`).
|
||||
- [ ] Perform backward pass (`optimizer.zero_grad()`, `loss.backward()`, `optimizer.step()`).
|
||||
- [ ] Print/log loss for the single step (and temporarily exit).
|
||||
- [ ] Implement logging setup in `utils/log_utils.py` (`setup_logging` function).
|
||||
- [ ] Configure `logging.basicConfig` for file and console output.
|
||||
- [ ] Integrate logging into `train.py`.
|
||||
- [ ] Call `setup_logging`.
|
||||
- [ ] Replace `print` with `logging.info`.
|
||||
- [ ] Log config, device, and training progress/losses.
|
||||
- [ ] Implement full training loop in `train.py`.
|
||||
- [ ] Remove single-step exit.
|
||||
- [ ] Add LR scheduler (`torch.optim.lr_scheduler.StepLR`).
|
||||
- [ ] Add epoch loop.
|
||||
- [ ] Add batch loop, integrating the single training step logic.
|
||||
- [ ] Log loss periodically within the batch loop.
|
||||
- [ ] Step the LR scheduler at the end of each epoch.
|
||||
- [ ] Log total training time.
|
||||
- [ ] Implement checkpointing in `train.py`.
|
||||
- [ ] Define checkpoint directory.
|
||||
- [ ] Implement logic to find and load the latest checkpoint (resume training).
|
||||
- [ ] Save checkpoints periodically (based on frequency or final epoch).
|
||||
- [ ] Include epoch, model state, optimizer state, scheduler state, config.
|
||||
- [ ] Log checkpoint loading/saving.
|
||||
|
||||
## Phase 4: Evaluation & Testing
|
||||
|
||||
- [ ] Add evaluation dependencies (`pycocotools` - optional initially).
|
||||
- [ ] Create `utils/eval_utils.py` and implement `evaluate` function.
|
||||
- [ ] Set `model.eval()`.
|
||||
- [ ] Use `torch.no_grad()`.
|
||||
- [ ] Loop through validation/test dataloader.
|
||||
- [ ] Perform forward pass.
|
||||
- [ ] Calculate/aggregate metrics (start with average loss, potentially add mAP later).
|
||||
- [ ] Log evaluation metrics and time.
|
||||
- [ ] Return metrics.
|
||||
- [ ] Integrate evaluation into `train.py`.
|
||||
- [ ] Create validation `Dataset` and `DataLoader` (using `torch.utils.data.Subset`).
|
||||
- [ ] Call `evaluate` at the end of each epoch.
|
||||
- [ ] Log validation metrics.
|
||||
- [ ] (Later) Implement logic to save the *best* model based on validation metric.
|
||||
- [ ] Implement `test.py` script.
|
||||
- [ ] Reuse argument parsing, config loading, device setup, dataset/dataloader (test split), model creation from `train.py`.
|
||||
- [ ] Add `--checkpoint` argument.
|
||||
- [ ] Load model weights from the specified checkpoint.
|
||||
- [ ] Call `evaluate` function using the test dataloader.
|
||||
- [ ] Log/print final evaluation results.
|
||||
- [ ] Setup logging for testing (e.g., `test.log`).
|
||||
- [ ] Create unit tests in `tests/` using `pytest`.
|
||||
- [ ] `tests/test_config.py`: Test config loading.
|
||||
- [ ] `tests/test_model.py`: Test model creation and head configuration.
|
||||
- [ ] `tests/test_data_utils.py`: Test dataset instantiation, length, and item format (requires data).
|
||||
- [ ] (Optional) Use fixtures in `tests/conftest.py` if needed.
|
||||
- [ ] Add `pytest` execution to `.pre-commit-config.yaml`.
|
||||
- [ ] Test pre-commit hooks (`pre-commit run --all-files`).
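
For the validation split built with `torch.utils.data.Subset`, the index lists could be produced by a reproducible shuffle like the sketch below. The `split_indices` helper and its default seed are hypothetical.

```python
import random


def split_indices(
    num_samples: int, val_size: int, seed: int = 42
) -> tuple[list[int], list[int]]:
    """Shuffle indices reproducibly and hold out the last `val_size` for validation."""
    if not 0 < val_size < num_samples:
        raise ValueError("val_size must be between 1 and num_samples - 1")
    indices = list(range(num_samples))
    # A local RNG keeps the split stable without touching global random state.
    random.Random(seed).shuffle(indices)
    return indices[:-val_size], indices[-val_size:]
```

The two lists can then be passed to `Subset(dataset, train_idx)` and `Subset(dataset, val_idx)`. Since the train and validation datasets use different transforms, each subset would wrap its own `PennFudanDataset` instance.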
|
||||
|
||||
## Phase 5: Refinement & Documentation
|
||||
|
||||
- [ ] Refine error handling in `train.py` and `test.py` (`try...except`).
|
||||
- [ ] Add configuration validation checks.
|
||||
- [ ] Improve evaluation metrics (e.g., implement mAP in `evaluate` function).
|
||||
- [ ] Add more data augmentations to `get_transform(train=True)`.
|
||||
- [ ] Expand `README.md` significantly.
|
||||
- [ ] Goals
|
||||
- [ ] Detailed Setup
|
||||
- [ ] Configuration explanation
|
||||
- [ ] Training instructions (including resuming)
|
||||
- [ ] Testing instructions
|
||||
- [ ] Project Structure overview
|
||||
- [ ] Dependencies list
|
||||
- [ ] (Optional) Results section
|
||||
- [ ] Perform final code quality checks (`ruff format .`, `ruff check . --fix`).
|
||||
- [ ] Ensure all pre-commit hooks pass.
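
As background for the mAP item, the building block is intersection over union between a predicted and a ground-truth box. The pure-Python sketch below illustrates it for a single pair, assuming boxes in `(x1, y1, x2, y2)` format. `box_iou_single` is a hypothetical helper; `torchvision.ops.box_iou` computes the same quantity for batches of boxes.

```python
from typing import Sequence


def box_iou_single(box_a: Sequence[float], box_b: Sequence[float]) -> float:
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Overlap width/height are clamped at zero for disjoint boxes.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0
```

A detection usually counts as a true positive when its IoU with an unmatched ground-truth box exceeds a threshold such as 0.5. mAP then aggregates precision over recall levels and thresholds, which is where `pycocotools` is useful.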
|
||||