# LLM Prompts for Torchvision Finetuning Project

## Prompt 1: Project Foundation Setup

```text
Based on the project specification (`project-spec.md`), set up the initial project structure and tooling.

1. Create the following directory structure within the current directory:
    ```
    ├── configs/
    ├── data/
    ├── models/
    ├── utils/
    ├── tests/
    ├── scripts/
    ├── .gitignore               # (Created in step 3)
    ├── pyproject.toml           # (Created in step 4)
    ├── .pre-commit-config.yaml  # (Created in step 5)
    ├── README.md                # (Created in step 8)
    ├── train.py                 # (Created in step 7)
    └── test.py                  # (Created in step 7)
    ```
2. Initialize a git repository in the `torchvision-tutorial` directory.
3. Create a `.gitignore` file suitable for a Python project, ignoring directories like `data/`, `outputs/`, `logs/`, virtual environment folders (`.venv`), cache files (`__pycache__/`, `.pytest_cache/`, `.ruff_cache/`), and model checkpoints (`*.pth`).
4. Use `uv init` (or manually create) `pyproject.toml`. Specify Python 3.10. Add the following dependencies using `uv add`: `torch>=2.0`, `torchvision>=0.16`, `ruff`, `numpy`, `Pillow`, `pytest`.
5. Create `.pre-commit-config.yaml` (pre-commit looks for this dot-prefixed file name by default). Configure `ruff` for formatting (`ruff format`) and linting (`ruff check --select I --fix` for import sorting, and `ruff check --fix` for general linting).
6. Create empty `__init__.py` files in `configs/`, `models/`, `utils/`, and `tests/`.
7. Create empty placeholder files: `train.py`, `test.py`, `configs/base_config.py`, `utils/data_utils.py`, `models/detection.py`, `tests/conftest.py`.
8. Create a basic `README.md` with the project title "Torchvision Vibecoding Project" and a brief description based on `project-spec.md`.
9. Install pre-commit hooks (`pre-commit install`).
```

## Prompt 2: Data Acquisition Script

```text
Create a shell script `scripts/download_data.sh` that performs the following:

1. Checks if the target directory `data/PennFudanPed` already exists. If it does, prints a message and exits.
2. Creates the `data/` directory if it doesn't exist.
3. Uses `wget` to download the Penn-Fudan dataset zip file (`PennFudanPed.zip`) from the specified URL (`https://www.cis.upenn.edu/~jshi/ped_html/PennFudanPed.zip`) into the `data/` directory.
4. Uses `unzip` to extract the contents of `PennFudanPed.zip` into the `data/` directory.
5. Removes the downloaded `PennFudanPed.zip` file after successful extraction.
6. Prints informative messages during download and extraction.

After writing the script, make it executable (`chmod +x scripts/download_data.sh`).

Ensure the `.gitignore` file correctly ignores the `data/` directory.
```

## Prompt 3: Configuration System

```text
Implement the configuration system using Python dictionaries:

1. In `configs/base_config.py`, define a Python dictionary named `base_config` containing placeholders or default values for common parameters:
   * `data_root`: Path to the dataset root (e.g., `'data/PennFudanPed'`)
   * `output_dir`: Base directory for outputs (e.g., `'outputs'`)
   * `device`: Compute device (e.g., `'cuda'`)
   * `num_classes`: Number of classes including background (e.g., `2` for Penn-Fudan)
   * `batch_size`: Training batch size (e.g., `2`)
   * `num_epochs`: Number of training epochs (e.g., `10`)
   * `lr`: Learning rate (e.g., `0.005`)
   * `momentum`: Optimizer momentum (e.g., `0.9`)
   * `weight_decay`: Optimizer weight decay (e.g., `0.0005`)
   * `lr_step_size`: Learning rate scheduler step size (e.g., `3`)
   * `lr_gamma`: Learning rate scheduler gamma (e.g., `0.1`)
   * `seed`: Random seed (e.g., `42`)
   * `log_freq`: Logging frequency during training, in iterations (e.g., `10`)
   * `checkpoint_freq`: Checkpoint saving frequency, in epochs (e.g., `1`)
2. In `configs/pennfudan_maskrcnn_config.py`, create a dictionary named `config`.
   * Import the `base_config` from `configs.base_config`.
   * Create the `config` dictionary, potentially starting with a copy of `base_config` (`config = base_config.copy()`).
   * Update specific values as needed for this experiment (e.g., ensure `data_root` and `num_classes` are correct for Penn-Fudan). Add a `config_name` key, e.g., `'pennfudan_maskrcnn_v1'`. This name will be used for naming output folders.
```

## Prompt 4: Core Data Loading (Torch Dataset)

```text
Implement the core dataset loading logic in `utils/data_utils.py`:

1. Import necessary libraries: `os`, `torch`, `PIL.Image`, `numpy`, `torch.utils.data`, and `torchvision.tv_tensors`.
2. Create a class `PennFudanDataset(torch.utils.data.Dataset)`:
   * In `__init__(self, root, transforms)`:
      * Store `root` and `transforms`.
      * Load all image file paths from `root/PNGImages` and sort them.
      * Load all mask file paths from `root/PedMasks` and sort them. Ensure alignment between images and masks.
   * In `__getitem__(self, idx)`:
      * Load the image using `PIL.Image.open` from the path at index `idx`.
      * Load the corresponding mask using `PIL.Image.open`.
      * Convert the PIL mask image to a numpy array.
      * Identify unique object instances in the mask (0 is background). Each unique non-zero value corresponds to a distinct pedestrian instance.
      * Generate binary masks for each instance.
      * From the binary masks, calculate bounding boxes (`[xmin, ymin, xmax, ymax]`) for each instance. Exclude instances with zero area.
      * Create a `target` dictionary containing:
         * `boxes`: A `torch.float32` tensor of shape `(N, 4)` where N is the number of instances, wrapped in `tv_tensors.BoundingBoxes` (`format="XYXY"`, `canvas_size=(H, W)`).
         * `labels`: A `torch.int64` tensor of shape `(N,)` where all labels are `1` (for pedestrian).
         * `masks`: A `torch.uint8` tensor of shape `(N, H, W)`, wrapped in `tv_tensors.Mask`.
         * `image_id`: A `torch.int64` tensor containing `idx`.
         * `area`: A `torch.float32` tensor containing the area of each bounding box.
         * `iscrowd`: A `torch.uint8` tensor of shape `(N,)` where all values are `0`.
      * The `tv_tensors` wrappers let `torchvision.transforms.v2` transforms (such as horizontal flip) update boxes and masks together with the image. Plain tensors in the target are generally passed through unchanged.
      * Apply the `transforms` to the image and target if `transforms` is not None.
      * Return the transformed image and target.
   * In `__len__(self)`:
      * Return the total number of images.
```

## Prompt 5: Data Utilities (Transforms and Collate)

```text
Add utility functions to `utils/data_utils.py`:

1. Import `torchvision.transforms.v2 as T`.
2. Create a function `get_transform(train)`:
   * Initialize a list of transforms.
   * Always include `T.ToImage()` and `T.ToDtype(torch.float32, scale=True)`.
   * If `train` is `True`, add data augmentation transforms like `T.RandomHorizontalFlip(p=0.5)`. (Keep it simple for now, just horizontal flip).
   * Return `T.Compose(transforms)`.
3. Create a function `collate_fn(batch)`:
   * This function takes a list of `(image, target)` tuples and regroups them without stacking, because images in a batch may differ in size.
   * It should return `tuple(zip(*batch))`, i.e. a pair `(images, targets)` in which each element is a tuple with one entry per sample.
```

## Prompt 6: Model Definition

```text
Implement the model loading function in `models/detection.py`:

1. Import `torch`, `torchvision`, `torchvision.models.detection`, `FastRCNNPredictor` from `torchvision.models.detection.faster_rcnn`, and `MaskRCNNPredictor` from `torchvision.models.detection.mask_rcnn`.
2. Create a function `get_maskrcnn_model(num_classes, pretrained=True, pretrained_backbone=True)`:
   * Load a pre-trained Mask R-CNN model. Use `torchvision.models.detection.maskrcnn_resnet50_fpn_v2` with `weights=MaskRCNN_ResNet50_FPN_V2_Weights.DEFAULT` if `pretrained` is True, otherwise `weights=None`. Set `weights_backbone=ResNet50_Weights.DEFAULT` if `pretrained_backbone` is True and `pretrained` is False, otherwise `weights_backbone=None`.
   * Get the number of input features for the box classifier (`in_features_box = model.roi_heads.box_predictor.cls_score.in_features`).
   * Replace the bounding box predictor head (`model.roi_heads.box_predictor`) with a new `FastRCNNPredictor` instance having `in_features_box` and `num_classes`.
   * Get the number of input channels for the mask predictor (`in_features_mask = model.roi_heads.mask_predictor.conv5_mask.in_channels`).
   * Replace the mask predictor head (`model.roi_heads.mask_predictor`) with a new `MaskRCNNPredictor` instance having `in_features_mask`, a hidden layer size of 256 (the usual default), and `num_classes`.
   * Return the modified model.
```

## Prompt 7: Basic Training Script Structure (`train.py`)

```text
Set up the basic structure for `train.py`:

1. Import `torch`, `argparse`, `importlib.util`, `os`, `random`, and `numpy` (as `np`).
2. Import necessary components from `utils.data_utils` (`PennFudanDataset`, `get_transform`, `collate_fn`) and `models.detection` (`get_maskrcnn_model`).
3. Define a `main()` function.
4. Inside `main()`:
   * Use `argparse` to create a parser that accepts one required argument: `--config`, the path to the configuration Python file (e.g., `configs/pennfudan_maskrcnn_config.py`).
   * Parse the arguments.
   * Load the configuration dictionary dynamically from the specified file path using `importlib`. For example:
      ```python
      spec = importlib.util.spec_from_file_location("config_module", args.config)
      config_module = importlib.util.module_from_spec(spec)
      spec.loader.exec_module(config_module)
      config = config_module.config
      ```
   * Set random seeds for reproducibility using `random.seed`, `np.random.seed`, `torch.manual_seed` based on `config['seed']`. If using CUDA, also set `torch.cuda.manual_seed_all`.
   * Determine the device (`torch.device(config['device'] if torch.cuda.is_available() else 'cpu')`). Print the device being used.
   * Create the output directory structure based on `config['output_dir']` and `config['config_name']`. E.g., `output_path = os.path.join(config['output_dir'], config['config_name'])`. Create this directory and subdirectories like `checkpoints` if they don't exist (`os.makedirs(..., exist_ok=True)`).
   * Instantiate the `PennFudanDataset` for training (`dataset_train`) using `config['data_root']` and `get_transform(train=True)`.
   * Instantiate the `DataLoader` for training (`data_loader_train`) using `dataset_train`, `config['batch_size']`, `shuffle=True`, `num_workers=4` (or appropriate number), and `collate_fn=collate_fn`.
   * Instantiate the model using `get_maskrcnn_model(num_classes=config['num_classes'])`.
   * Move the model to the determined `device`.
5. Add the standard Python entry point guard (`if __name__ == "__main__":`) to call `main()`.
```

## Prompt 8: Minimal Training Step (`train.py`)

```text
Extend `train.py` within the `main()` function to perform a single training step:

1. After moving the model to the device, instantiate the optimizer. Use `torch.optim.SGD` with parameters from the config (`lr`, `momentum`, `weight_decay`). Pass `model.parameters()` to the optimizer.
2. Set the model to training mode: `model.train()`.
3. Fetch *one* batch from `data_loader_train`. Use `next(iter(data_loader_train))`.
4. Move images and targets to the `device`. Remember that `images` is a sequence of tensors and `targets` is a sequence of dicts. Iterate through them.
    ```python
    images = list(image.to(device) for image in images)
    targets = [{k: v.to(device) for k, v in t.items()} for t in targets]
    ```
5. Perform the forward pass: `loss_dict = model(images, targets)`. Note that in training mode, Mask R-CNN returns a dictionary of losses.
6. Calculate the total loss: `losses = sum(loss for loss in loss_dict.values())`.
7. Perform the backward pass:
   * Zero gradients: `optimizer.zero_grad()`.
   * Backpropagate: `losses.backward()`.
   * Update weights: `optimizer.step()`.
8. Print the `loss_dict` and the total `losses` tensor for this single step.
9. **(Important)** For now, after this single step, you can add `print("Single training step completed.")` and `return` or `sys.exit()` (which needs `import sys`) within `main()` to prevent further execution until the full loop is implemented.
```

## Prompt 9: Logging Integration

```text
Integrate file and console logging:

1. Create a new file `utils/log_utils.py`.
2. Import `logging`, `os`, and `sys`.
3. Define a function `setup_logging(log_dir, config_name)`:
   * Create the `log_dir` if it doesn't exist.
   * Define the log file path (e.g., `os.path.join(log_dir, f"{config_name}_train.log")`).
   * Configure the root logger using `logging.basicConfig`:
      * Set `level=logging.INFO`.
      * Set `format='%(asctime)s [%(levelname)s] %(message)s'`.
      * Set `datefmt='%Y-%m-%d %H:%M:%S'`.
      * Provide handlers:
         * A `logging.FileHandler` writing to the log file path.
         * A `logging.StreamHandler` writing to `sys.stdout`.
4. In `train.py`:
   * Import `logging` and `setup_logging` from `utils.log_utils`.
   * Immediately after creating the `output_path`, call `setup_logging(output_path, config['config_name'])`.
   * Replace `print` statements used for informational output (like device used, starting training) with `logging.info()`.
   * Log the loaded configuration dictionary as the first message once logging is set up (messages sent at INFO level before `setup_logging` runs would not reach the log file).
   * Log the losses calculated in the single training step using `logging.info(f"Step Loss Dict: {loss_dict}")` and `logging.info(f"Step Total Loss: {losses.item()}")`.
```

## Prompt 10: Full Training Loop (`train.py`)

```text
Implement the full training loop in `train.py`, replacing the single-step logic:

1. Import `time`.
2. **(Remove the `return` or `sys.exit()` added in Prompt 8).**
3. After creating the optimizer, create a learning rate scheduler (optional but good practice). Use `torch.optim.lr_scheduler.StepLR` with parameters from the config (`lr_step_size`, `lr_gamma`).
4. Add outer loop for epochs: `for epoch in range(config['num_epochs']):`
   * Log the start of the epoch: `logging.info(f"--- Epoch {epoch+1}/{config['num_epochs']} ---")`.
   * Set model to train mode: `model.train()`.
   * Initialize variables to track epoch loss or metrics if needed.
   * Add inner loop for batches: `for i, (images, targets) in enumerate(data_loader_train):`
      * Move data to device.
      * Perform forward pass: `loss_dict = model(images, targets)`.
      * Calculate total loss: `losses = sum(loss for loss in loss_dict.values())`.
      * Perform backward pass (zero grad, backward, step).
      * Log batch loss periodically (e.g., every `config['log_freq']` iterations):
         ```python
         if (i + 1) % config['log_freq'] == 0:
             loss_str = f"Epoch {epoch+1}, Iter {i+1}/{len(data_loader_train)}, Loss: {losses.item():.4f}"
             # Optional: Add individual losses from loss_dict to the log string
             logging.info(loss_str)
         ```
   * After the inner loop (end of epoch), step the learning rate scheduler: `lr_scheduler.step()`.
5. Log the total training time after the epoch loop finishes.
```

## Prompt 11: Checkpointing (`train.py`)

```text
Add model checkpointing capabilities to `train.py`:

1. Define the checkpoints directory: `checkpoint_dir = os.path.join(output_path, 'checkpoints')`. Create it if it doesn't exist.
2. **(Optional but Recommended: Resume Training Logic)** Before the epoch loop:
   * Check if any checkpoints exist in `checkpoint_dir`. Find the latest one (e.g., based on the epoch number in the filename, compared numerically so that epoch 10 sorts after epoch 2).
   * If a checkpoint is found:
      * Log that training is resuming from the checkpoint.
      * Load the checkpoint using `torch.load()`. Pass `map_location=device` to handle checkpoints saved on a different device.
      * Restore the model state with `model.load_state_dict(...)`.
      * Restore the optimizer state with `optimizer.load_state_dict(...)`.
      * Restore the scheduler state with `lr_scheduler.load_state_dict(...)`.
      * Set `start_epoch` to the stored `'epoch'` value. The checkpoint stores `epoch + 1`, the 1-based number of the last completed epoch, which is also the 0-based index of the next epoch to run, so no further `+ 1` is needed.
   * If no checkpoint is found, initialize `start_epoch = 0`.
   * Modify the epoch loop to start from `start_epoch`: `for epoch in range(start_epoch, config['num_epochs']):`
3. Inside the epoch loop, after the training batches are processed (e.g., at the end of the epoch):
   * Check if the current epoch number satisfies the checkpoint frequency (e.g., `(epoch + 1) % config['checkpoint_freq'] == 0` or if it's the last epoch).
   * If it does, construct the checkpoint filename (e.g., `checkpoint_epoch_{epoch+1}.pth`).
   * Create a dictionary containing the state:
      ```python
      checkpoint = {
          'epoch': epoch + 1,
          'model_state_dict': model.state_dict(),
          'optimizer_state_dict': optimizer.state_dict(),
          'scheduler_state_dict': lr_scheduler.state_dict(),
          'config': config,  # Optional: save config used for this checkpoint
      }
      ```
   * Save the checkpoint dictionary using `torch.save()` to the `checkpoint_dir`.
   * Log that a checkpoint has been saved.
```

## Prompt 12: Evaluation Integration (Setup and Basic Function)

```text
Prepare for and implement a basic evaluation function:

1. Add `pycocotools` (or `pycocotools-windows`) to `pyproject.toml` using `uv add` if planning full mAP later, otherwise skip for now. Add `torchvision`'s `references/detection/` utilities if needed (might require separate download/copying, let's avoid this complexity initially if possible).
2. Create `utils/eval_utils.py`. Import `torch`, `logging`, `time`.
3. Define a function `evaluate(model, data_loader, device)`:
   * Set model to evaluation mode: `model.eval()`.
   * Initialize placeholder storage for results (e.g., list for losses).
   * Log the start of evaluation.
   * Start a timer.
   * Use `with torch.no_grad():` context manager.
   * Loop through the `data_loader`:
      * Move images and targets to the `device`.
      * Perform forward pass: `outputs = model(images, targets)`. Note that Torchvision detection models return the loss dictionary only in training mode. In eval mode they return predictions, even when targets are passed. To obtain validation losses, one workaround is to run this forward pass with the model temporarily in training mode while staying inside `torch.no_grad()`. Be aware that BatchNorm layers that are not frozen may still update their running statistics in that case.
      * If it returns losses (a dict): Calculate `loss = sum(l for l in outputs.values())`. Store `loss.item()`.
      * If it returns predictions (list of dicts with 'boxes', 'labels', 'scores', 'masks'): Store these predictions. (This path is more complex for metrics).
   * **(Simplification for now):** Calculate the average loss across all evaluation batches for which losses were collected.
   * Log the average evaluation loss and the time taken.
   * Return the average loss (or a dictionary of metrics if implementing more later).
4. In `train.py`:
   * Import the `evaluate` function.
   * After instantiating `dataset_train` and `data_loader_train`, also create `dataset_val` and `data_loader_val`. Use `get_transform(train=False)` for the validation dataset. You might need to modify `PennFudanDataset` or create a split if the dataset doesn't have predefined splits (Penn-Fudan doesn't, so maybe use a subset for validation - e.g., first/last N samples, requires modifying `PennFudanDataset` or using `torch.utils.data.Subset`). Let's use a simple Subset approach for now:
      ```python
      # In train.py, after creating dataset_train
      indices = torch.randperm(len(dataset_train)).tolist()
      val_split = int(0.1 * len(dataset_train))  # 10% validation
      dataset_train = torch.utils.data.Subset(dataset_train, indices[:-val_split])
      # Re-instance dataset for val transforms
      dataset_val = torch.utils.data.Subset(
          PennFudanDataset(config['data_root'], get_transform(train=False)),
          indices[-val_split:],
      )

      data_loader_val = torch.utils.data.DataLoader(
          dataset_val,
          batch_size=config['batch_size'],
          shuffle=False,
          num_workers=4,
          collate_fn=collate_fn,
      )

      # Adjust train loader if using Subset on original dataset_train instance
      data_loader_train = torch.utils.data.DataLoader(
          dataset_train,
          batch_size=config['batch_size'],
          shuffle=True,  # Should shuffle subset indices
          num_workers=4,
          collate_fn=collate_fn,
      )
      ```
   * Inside the epoch loop, after the training batches and `lr_scheduler.step()`, call `val_metrics = evaluate(model, data_loader_val, device)`.
   * Log the returned validation metrics.
   * **(Best Checkpoint Logic - Add Later):** Keep track of the best validation metric seen so far. If the current epoch's metric is better, save a special 'best_model.pth' checkpoint, overwriting the previous best. Also, modify the periodic checkpoint saving to potentially include the validation metric in the filename.
```

## Prompt 13: Testing Script (`test.py`)

```text
Implement the `test.py` script for evaluating a trained model:

1. Copy the argument parsing, config loading, device setup, dataset/dataloader creation (use `get_transform(train=False)`), and model instantiation logic from `train.py` into a `main()` function in `test.py`.
2. Add a required argument `--checkpoint` to `argparse` to specify the path to the `.pth` checkpoint file to load.
3. Create a test dataset/dataloader (`dataset_test`, `data_loader_test`). Penn-Fudan doesn't have a standard test split, so reuse the validation split logic (or a dedicated test split if created earlier) for demonstration. Set the same random seed as in `train.py` before computing the split so that the same samples are held out. Ensure `shuffle=False`.
4. Load the specified checkpoint using `torch.load(args.checkpoint, map_location=device)`.
5. Load the `model_state_dict` from the checkpoint into the model. Handle potential `module.` prefix if the model was saved using `DataParallel`.
6. Import and call the `evaluate` function from `utils.eval_utils` using the `model`, `data_loader_test`, and `device`.
7. Log or print the evaluation results returned by the `evaluate` function.
8. Ensure logging is set up similarly to `train.py` (maybe log to `test.log`).
```

## Prompt 14: Unit Tests (`tests/`)

```text
Create basic unit tests using `pytest`:

1. In `tests/conftest.py` (optional): Define fixtures if needed, e.g., a fixture to provide a temporary directory or a minimal config dictionary.
2. Create `tests/test_config.py`:
   * Write a test function `test_load_config()` that attempts to load the `configs/pennfudan_maskrcnn_config.py` file and asserts that the loaded object is a dictionary and contains expected keys (e.g., `data_root`, `num_classes`).
3. Create `tests/test_model.py`:
   * Import `get_maskrcnn_model` from `models.detection`.
   * Write a test function `test_model_creation()`:
      * Call `get_maskrcnn_model(num_classes=2, pretrained=False)`. Consider also passing `pretrained_backbone=False` so the test does not download backbone weights.
      * Assert that the returned object is an instance of `torchvision.models.detection.MaskRCNN`.
      * Check `model.roi_heads.box_predictor.cls_score.out_features` and `model.roi_heads.mask_predictor.mask_fcn_logits.out_channels` to ensure they match the requested `num_classes`.
4. Create `tests/test_data_utils.py`:
   * Import `PennFudanDataset`, `get_transform` from `utils.data_utils`.
   * **(Challenge):** Testing dataset loading often requires actual data or a mock structure. For now, write a test function `test_dataset_instantiation()` that:
      * Instantiates `PennFudanDataset` pointing to the *actual* downloaded data path (this makes the test dependent on `download_data.sh` being run first). Use `get_transform(train=False)`.
      * Asserts that `len(dataset)` returns a positive number (e.g., 170 for Penn-Fudan).
      * Gets the first item using `dataset[0]`.
      * Asserts that the returned item is a tuple of (Tensor, dict).
      * Asserts that the target dictionary contains the required keys (`boxes`, `labels`, `masks`, etc.) and that they have plausible shapes/types (e.g., `target['boxes']` has dtype `torch.float32` and shape `(N, 4)`, `target['labels']` has dtype `torch.int64`).
```

## Prompt 15: Pre-commit Integration for Tests

```text
Update `.pre-commit-config.yaml` to run `pytest`:

1. Add a new repo section for `pytest`:
    ```yaml
    - repo: local
      hooks:
        - id: pytest
          name: pytest
          entry: pytest -v  # Add flags as needed, e.g., -x to stop on first failure
          language: system
          types: [python]
          pass_filenames: false  # pytest discovers files itself
          # Optional: Specify files/directories if needed
          # files: ^tests/
    ```
2. Ensure `pytest` is installed in the environment where pre-commit runs (it should be via `uv`).
3. Run `pre-commit run --all-files` to test the new hook.
```

## Prompt 16: Refinement and Documentation

```text
Perform final refinements and update documentation:

1. **Error Handling:** Review `train.py` and `test.py`. Add `try...except` blocks around critical sections like data loading, model forward/backward passes, and checkpoint loading/saving. Log errors appropriately.
2. **Config Validation:** Add checks at the beginning of `train.py`/`test.py` to validate essential config values (e.g., check if paths exist, types are correct).
3. **Evaluation Metric:** If only average loss was implemented in `evaluate`, attempt to integrate a proper metric like mAP using `torchvision.ops.box_iou` and potentially adapting logic from Torchvision's evaluation scripts or `pycocotools`. Update the `evaluate` function return value and logging. Update the "best model" saving logic in `train.py` to use this metric.
4. **Data Augmentation:** Add more relevant data augmentations to `get_transform(train=True)` in `utils/data_utils.py` (e.g., color jitter, resizing/cropping strategies suitable for object detection). Ensure transforms handle bounding boxes/masks correctly (use `torchvision.transforms.v2` which generally does).
5. **README.md:** Significantly expand `README.md`:
   * Include project goals.
   * Detailed **Setup** instructions (clone repo, install `uv`, run `uv sync`, run `scripts/download_data.sh`, install pre-commit hooks).
   * **Configuration:** Explain the config files.
   * **Training:** How to run `train.py` with a config file. Mention output directories and checkpoints. Explain how to resume training.
   * **Testing:** How to run `test.py` with a config and checkpoint file.
   * **Project Structure:** Briefly describe the purpose of each directory.
   * **Dependencies:** List main dependencies.
   * **(Optional) Results:** Mention expected performance or show sample outputs.
6. **Code Quality:** Run `ruff format .` and `ruff check . --fix` one last time to ensure code style and quality. Run `pre-commit run --all-files` to ensure all hooks pass.
```
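
## Appendix: Reference Sketch for Prompt 11 (Latest Checkpoint Lookup)

Prompt 11 asks for the latest checkpoint to be found "based on epoch number in the filename". When reviewing the generated code, it may help to compare it against a minimal sketch like the one below. It assumes the `checkpoint_epoch_{N}.pth` naming from Prompt 11; the helper name `find_latest_checkpoint` is hypothetical and not part of the prompts. The point it illustrates is that epoch numbers need to be compared as integers, since a plain string sort would rank `checkpoint_epoch_10.pth` before `checkpoint_epoch_2.pth`.

```python
import re
from pathlib import Path
from typing import Optional

# Assumed filename pattern, taken from the example in Prompt 11.
_CKPT_PATTERN = re.compile(r"^checkpoint_epoch_(\d+)\.pth$")


def find_latest_checkpoint(checkpoint_dir: str) -> Optional[Path]:
    """Return the checkpoint file with the highest epoch number, or None.

    Hypothetical helper: the name and signature are illustrative only.
    """
    directory = Path(checkpoint_dir)
    if not directory.is_dir():
        return None

    best_epoch = -1
    best_path: Optional[Path] = None
    for path in directory.iterdir():
        match = _CKPT_PATTERN.match(path.name)
        if match is None:
            continue  # Skip files such as best_model.pth or unrelated logs
        epoch = int(match.group(1))  # Compare numerically, so 10 > 2
        if epoch > best_epoch:
            best_epoch, best_path = epoch, path
    return best_path
```

In `train.py`, the returned path (if not `None`) would then be passed to `torch.load(path, map_location=device)` before the epoch loop, as described in Prompt 11.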