
This project was a test run using Cursor and "vibe coding" to create a full object detection project. I wrote almost none of the code myself, and the result mostly works. The technology is genuinely impressive, but it feels better suited to projects that can be developed in a more test-driven way. I'll update this later with other lessons learned along the way.

I stopped the project here because the agent got trapped in a doom loop, unable to fix a bug in the eval code, and I wanted this to be an investigation into how far I could get with very low intervention.

Torchvision Vibecoding Project

An object detection project that trains Mask R-CNN to detect pedestrians in the Penn-Fudan dataset. It demonstrates model training, evaluation, and visualization with PyTorch and Torchvision.

Table of Contents

  • Prerequisites
  • Project Setup
  • Project Structure
  • Data Preparation
  • Configuration
  • Training
  • Evaluation
  • Visualization
  • Testing
  • Debugging
  • Code Quality

Prerequisites

  • Python 3.10+
  • uv for package management
  • CUDA-compatible GPU (optional but recommended)

Project Setup

  1. Clone the repository:
git clone https://github.com/yourusername/torchvision-vibecoding-project.git
cd torchvision-vibecoding-project
  2. Set up the environment with uv (the repository already contains a pyproject.toml, so uv sync is all that's needed):
uv sync
  3. Install development dependencies:
uv add ruff pytest matplotlib pre-commit
  4. Set up pre-commit hooks:
pre-commit install

Project Structure

├── configs/                 # Configuration files
│   ├── base_config.py       # Base configuration with defaults
│   ├── debug_config.py      # Configuration for quick debugging
│   └── pennfudan_maskrcnn_config.py  # Configuration for Penn-Fudan dataset
├── data/                    # Dataset directory (not tracked by git)
│   └── PennFudanPed/        # Penn-Fudan pedestrian dataset
├── models/                  # Model definitions
│   └── detection.py         # Mask R-CNN model definition
├── outputs/                 # Training outputs (not tracked by git)
│   └── <config_name>/       # Named by configuration
│       ├── checkpoints/     # Model checkpoints
│       └── *.log            # Log files
├── scripts/                 # Utility scripts
│   ├── download_data.sh     # Script to download dataset
│   ├── test_model.py        # Script for quick model testing
│   └── visualize_predictions.py  # Script for prediction visualization
├── tests/                   # Unit tests
│   ├── conftest.py          # Test fixtures
│   ├── test_data_utils.py   # Tests for data utilities
│   ├── test_model.py        # Tests for model functionality
│   └── test_visualization.py  # Tests for visualization
├── utils/                   # Utility modules
│   ├── common.py            # Common functionality
│   ├── data_utils.py        # Dataset handling
│   ├── eval_utils.py        # Evaluation functions
│   └── log_utils.py         # Logging utilities
├── train.py                 # Training script
├── test.py                  # Evaluation script
├── pyproject.toml           # Project dependencies and configuration
├── .pre-commit-config.yaml  # Pre-commit configuration
└── README.md                # This file
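
The Mask R-CNN definition in models/detection.py isn't reproduced here, but a minimal sketch of the standard Torchvision recipe it most likely follows (COCO-pretrained backbone, box and mask heads swapped out for the dataset's classes) looks like this; the get_model name is illustrative:

import torchvision
from torchvision.models.detection.faster_rcnn import FastRCNNPredictor
from torchvision.models.detection.mask_rcnn import MaskRCNNPredictor


def get_model(num_classes: int):
    # Start from a Mask R-CNN pre-trained on COCO.
    model = torchvision.models.detection.maskrcnn_resnet50_fpn(weights="DEFAULT")
    # Replace the box predictor head for our number of classes.
    in_features = model.roi_heads.box_predictor.cls_score.in_features
    model.roi_heads.box_predictor = FastRCNNPredictor(in_features, num_classes)
    # Replace the mask predictor head as well.
    in_channels = model.roi_heads.mask_predictor.conv5_mask.in_channels
    model.roi_heads.mask_predictor = MaskRCNNPredictor(in_channels, 256, num_classes)
    return model


# Penn-Fudan has two classes: background and pedestrian.
model = get_model(num_classes=2)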

Data Preparation

Download the Penn-Fudan pedestrian dataset:

./scripts/download_data.sh

This will download and extract the dataset to the data/PennFudanPed directory.
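
Dataset handling lives in utils/data_utils.py. The actual implementation may differ, but a condensed sketch of the usual approach to Penn-Fudan, whose PNGImages/ and PedMasks/ folders pair each photo with an instance mask (one integer id per pedestrian), looks like:

import os
import torch
import torchvision
from torchvision.io import read_image


class PennFudanDataset(torch.utils.data.Dataset):
    def __init__(self, root):
        self.root = root
        self.imgs = sorted(os.listdir(os.path.join(root, "PNGImages")))
        self.masks = sorted(os.listdir(os.path.join(root, "PedMasks")))

    def __len__(self):
        return len(self.imgs)

    def __getitem__(self, idx):
        img = read_image(os.path.join(self.root, "PNGImages", self.imgs[idx]))
        mask = read_image(os.path.join(self.root, "PedMasks", self.masks[idx]))[0]
        obj_ids = torch.unique(mask)[1:]          # id 0 is the background
        masks = mask == obj_ids[:, None, None]    # one boolean mask per pedestrian
        target = {
            "boxes": torchvision.ops.masks_to_boxes(masks),
            "labels": torch.ones(len(obj_ids), dtype=torch.int64),  # single class
            "masks": masks.to(torch.uint8),
        }
        return img.float() / 255.0, target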

Configuration

The project uses Python dictionaries for configuration:

  • configs/base_config.py: Default configuration values
  • configs/pennfudan_maskrcnn_config.py: Configuration for training on Penn-Fudan
  • configs/debug_config.py: Configuration for quick testing (CPU, minimal training)

Key configuration parameters:

  • data_root: Path to dataset
  • output_dir: Directory for outputs
  • device: Computing device ('cuda' or 'cpu')
  • batch_size: Batch size for training
  • num_epochs: Number of training epochs
  • lr, momentum, weight_decay: Optimizer parameters
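
As an illustration, a config dictionary with these keys might look like the sketch below. The paths and epoch count are inferred from elsewhere in this README; the optimizer values are typical SGD settings for this model, not necessarily the project's actual ones:

# Illustrative sketch only; see configs/pennfudan_maskrcnn_config.py for real values.
config = {
    "data_root": "data/PennFudanPed",
    "output_dir": "outputs/pennfudan_maskrcnn_v1",
    "device": "cuda",        # or "cpu"
    "batch_size": 2,         # assumed; not specified in this README
    "num_epochs": 10,        # matches checkpoint_epoch_10.pth used below
    "lr": 0.005,             # typical SGD settings; assumed
    "momentum": 0.9,
    "weight_decay": 0.0005,
}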

Training

Run the training script with a configuration file:

python train.py --config configs/pennfudan_maskrcnn_config.py

For quick debugging on CPU:

python train.py --config configs/debug_config.py

To resume training from the latest checkpoint:

python train.py --config configs/pennfudan_maskrcnn_config.py --resume

Training outputs (logs, checkpoints) are saved to outputs/<config_name>/.
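
Internally, train.py presumably follows the standard Torchvision detection loop: in training mode the model takes images and targets together and returns a dictionary of losses to sum and backpropagate. A minimal sketch, assuming a DataLoader whose collate_fn yields lists of images and target dicts:

import torch


def train_one_epoch(model, optimizer, data_loader, device):
    model.train()
    for images, targets in data_loader:
        images = [img.to(device) for img in images]
        targets = [{k: v.to(device) for k, v in t.items()} for t in targets]
        loss_dict = model(images, targets)   # e.g. loss_classifier, loss_mask, ...
        loss = sum(loss_dict.values())       # summing all losses is the Torchvision convention
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

The checkpoints written to outputs/<config_name>/checkpoints/ are presumably what --resume picks up.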

Evaluation

Evaluate a trained model:

python test.py --config configs/pennfudan_maskrcnn_config.py --checkpoint outputs/pennfudan_maskrcnn_v1/checkpoints/checkpoint_epoch_10.pth

This runs the model on the test dataset and reports metrics.
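
In eval mode, Torchvision detection models take only images and return one prediction dict per image; the metric computation in utils/eval_utils.py (where the unresolved bug mentioned above lives) is omitted here. A minimal inference sketch:

import torch


@torch.no_grad()
def predict(model, data_loader, device):
    model.eval()
    predictions = []
    for images, _ in data_loader:
        images = [img.to(device) for img in images]
        outputs = model(images)   # dicts with "boxes", "labels", "scores", "masks"
        predictions.extend({k: v.cpu() for k, v in out.items()} for out in outputs)
    return predictions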

Visualization

Visualize model predictions on images:

python scripts/visualize_predictions.py --config configs/pennfudan_maskrcnn_config.py --checkpoint outputs/pennfudan_maskrcnn_v1/checkpoints/checkpoint_epoch_10.pth --index 0 --output prediction.png

Parameters:

  • --config: Configuration file path
  • --checkpoint: Model checkpoint path
  • --index: Image index in dataset (default: 0)
  • --threshold: Detection confidence threshold (default: 0.5)
  • --output: Output image path (optional, displays interactively if not specified)
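
A sketch of the core drawing step, assuming Torchvision's draw_bounding_boxes utility and the same confidence semantics as --threshold (the draw_predictions name is illustrative; the image must be a uint8 CHW tensor):

import torch
from torchvision.utils import draw_bounding_boxes
from torchvision.transforms.functional import to_pil_image


def draw_predictions(image_uint8, prediction, threshold=0.5):
    keep = prediction["scores"] >= threshold   # drop low-confidence detections
    labels = [f"pedestrian {s:.2f}" for s in prediction["scores"][keep].tolist()]
    drawn = draw_bounding_boxes(image_uint8, prediction["boxes"][keep],
                                labels=labels, colors="red")
    return to_pil_image(drawn)   # save with .save(path) or show interactively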

Testing

Run all tests:

python -m pytest

Run a specific test file:

python -m pytest tests/test_data_utils.py

Run tests with verbose output:

python -m pytest -v
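
For illustration, a smoke test along the lines of tests/test_model.py might look like this (the models.detection import path and get_model factory are assumptions, matching the sketch earlier in this README):

import torch
from models.detection import get_model   # hypothetical import path


def test_forward_pass():
    model = get_model(num_classes=2)      # downloads pretrained weights on first run
    model.eval()
    with torch.no_grad():
        outputs = model([torch.rand(3, 128, 128)])
    # Eval-mode detection models return one prediction dict per input image.
    assert set(outputs[0]) >= {"boxes", "labels", "scores", "masks"}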

Debugging

For quick model testing without full training:

python scripts/test_model.py

This verifies:

  • Model creation
  • Forward pass
  • Backward pass
  • Dataset loading

For training with minimal resources:

python train.py --config configs/debug_config.py

This uses:

  • CPU computation
  • Minimal epochs (1)
  • Small batch size (1)
  • No multiprocessing
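
Put together, a debug_config.py matching this description might look like the following sketch; the base_config import and key names are assumptions:

# Illustrative sketch of configs/debug_config.py; names are assumed.
from configs.base_config import config as base_config

config = {
    **base_config,
    "device": "cpu",       # CPU computation
    "num_epochs": 1,       # minimal epochs
    "batch_size": 1,       # small batch size
    "num_workers": 0,      # no DataLoader multiprocessing
}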

Code Quality

Format code:

ruff format .

Run linter:

ruff check .

Fix auto-fixable issues:

ruff check --fix .

Run pre-commit checks:

pre-commit run --all-files