This project was a test run using Cursor and "vibe coding" to create a full object detection project. I wrote almost no lines of code to get to this point, and it kind of works. The technology is definitely impressive, but it really feels more suited to things that can be developed in a more test-driven way. I'll update this later with other things I've learned along the way.

# Torchvision Vibecoding Project

A PyTorch-based object detection project that fine-tunes a torchvision Mask R-CNN model to detect pedestrians in the Penn-Fudan dataset, built with the help of "vibe coding" AI. It demonstrates model training, evaluation, and visualization with PyTorch and Torchvision.

## Table of Contents

- [Prerequisites](#prerequisites)
- [Project Setup](#project-setup)
- [Project Structure](#project-structure)
- [Data Preparation](#data-preparation)
- [Configuration](#configuration)
- [Training](#training)
- [Evaluation](#evaluation)
- [Visualization](#visualization)
- [Testing](#testing)
- [Debugging](#debugging)
- [Code Quality](#code-quality)

## Prerequisites

- Python 3.10+
- [uv](https://github.com/astral-sh/uv) for package management
- CUDA-compatible GPU (optional but recommended)

## Project Setup

1. Clone the repository:

```bash
git clone https://github.com/yourusername/torchvision-vibecoding-project.git
cd torchvision-vibecoding-project
```

2. Set up the environment with uv (the repository already contains a `pyproject.toml`, so `uv sync` creates the virtual environment and installs dependencies):

```bash
uv sync
```

3. Install development dependencies:

```bash
uv add ruff pytest matplotlib
```

4. Set up pre-commit hooks:

```bash
pre-commit install
```

## Project Structure

```
├── configs/                          # Configuration files
│   ├── base_config.py                # Base configuration with defaults
│   ├── debug_config.py               # Configuration for quick debugging
│   └── pennfudan_maskrcnn_config.py  # Configuration for Penn-Fudan dataset
├── data/                             # Dataset directory (not tracked by git)
│   └── PennFudanPed/                 # Penn-Fudan pedestrian dataset
├── models/                           # Model definitions
│   └── detection.py                  # Mask R-CNN model definition
├── outputs/                          # Training outputs (not tracked by git)
│   └── <config_name>/                # Named by configuration
│       ├── checkpoints/              # Model checkpoints
│       └── *.log                     # Log files
├── scripts/                          # Utility scripts
│   ├── download_data.sh              # Script to download dataset
│   ├── test_model.py                 # Script for quick model testing
│   └── visualize_predictions.py      # Script for prediction visualization
├── tests/                            # Unit tests
│   ├── conftest.py                   # Test fixtures
│   ├── test_data_utils.py            # Tests for data utilities
│   ├── test_model.py                 # Tests for model functionality
│   └── test_visualization.py         # Tests for visualization
├── utils/                            # Utility modules
│   ├── common.py                     # Common functionality
│   ├── data_utils.py                 # Dataset handling
│   ├── eval_utils.py                 # Evaluation functions
│   └── log_utils.py                  # Logging utilities
├── train.py                          # Training script
├── test.py                           # Evaluation script
├── pyproject.toml                    # Project dependencies and configuration
├── .pre-commit-config.yaml           # Pre-commit configuration
└── README.md                         # This file
```

## Data Preparation

Download the Penn-Fudan pedestrian dataset:

```bash
./scripts/download_data.sh
```

This will download and extract the dataset to the `data/PennFudanPed` directory.

## Configuration

The project uses Python dictionaries for configuration:

- `configs/base_config.py`: Default configuration values
- `configs/pennfudan_maskrcnn_config.py`: Configuration for training on Penn-Fudan
- `configs/debug_config.py`: Configuration for quick testing (CPU, minimal training)

Key configuration parameters:

- `data_root`: Path to dataset
- `output_dir`: Directory for outputs
- `device`: Computing device ('cuda' or 'cpu')
- `batch_size`: Batch size for training
- `num_epochs`: Number of training epochs
- `lr`, `momentum`, `weight_decay`: Optimizer parameters
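
To make the parameter list concrete, here is a sketch of what such a config module might look like. The keys mirror the list above, but the values (and the assumption that each module exposes a top-level `config` dict) are illustrative, not taken from `configs/base_config.py`:

```python
# Hypothetical sketch of a config module like configs/base_config.py.
# Keys mirror the parameters listed above; values are illustrative.
config = {
    "data_root": "data/PennFudanPed",  # path to dataset
    "output_dir": "outputs",           # directory for outputs
    "device": "cuda",                  # computing device ('cuda' or 'cpu')
    "batch_size": 2,                   # batch size for training
    "num_epochs": 10,                  # number of training epochs
    "lr": 0.005,                       # optimizer parameters
    "momentum": 0.9,
    "weight_decay": 0.0005,
}
```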

## Training

Run the training script with a configuration file:

```bash
python train.py --config configs/pennfudan_maskrcnn_config.py
```
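
The README doesn't show how `train.py` consumes the `.py` path, but a common pattern for a `--config` flag like this is to import the file as a module. A sketch of that pattern (not necessarily what `train.py` actually does; the `config` attribute name is an assumption):

```python
# Sketch: one common way a `--config path/to/config.py` flag is handled.
# Illustrative pattern only, not code from this repository.
import importlib.util


def load_config(path):
    """Import a Python file by path and return its top-level `config` dict."""
    spec = importlib.util.spec_from_file_location("config_module", path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return module.config  # assumes the file defines `config`
```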

For quick debugging on CPU:

```bash
python train.py --config configs/debug_config.py
```

To resume training from the latest checkpoint:

```bash
python train.py --config configs/pennfudan_maskrcnn_config.py --resume
```

Training outputs (logs, checkpoints) are saved to `outputs/<config_name>/`.
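
For illustration, here is one way `--resume` might locate the newest checkpoint. The `checkpoint_epoch_N.pth` naming follows the example paths used elsewhere in this README; the helper itself is a sketch, not the repository's implementation:

```python
# Sketch: find the checkpoint with the highest epoch number under
# outputs/<config_name>/checkpoints/. Numeric comparison matters here:
# a plain string sort would rank epoch_9 after epoch_10.
import re
from pathlib import Path


def latest_checkpoint(checkpoint_dir):
    """Return the Path of the highest-epoch checkpoint, or None if absent."""
    pattern = re.compile(r"checkpoint_epoch_(\d+)\.pth$")
    best, best_epoch = None, -1
    for path in Path(checkpoint_dir).glob("checkpoint_epoch_*.pth"):
        match = pattern.search(path.name)
        if match and int(match.group(1)) > best_epoch:
            best_epoch = int(match.group(1))
            best = path
    return best
```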

## Evaluation

Evaluate a trained model:

```bash
python test.py --config configs/pennfudan_maskrcnn_config.py --checkpoint outputs/pennfudan_maskrcnn_v1/checkpoints/checkpoint_epoch_10.pth
```

This runs the model on the test dataset and reports metrics.
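
The README doesn't name the specific metrics, but detection evaluation is typically built on box intersection-over-union (IoU), the quantity underlying COCO-style mAP. A minimal self-contained sketch, not code from `test.py`:

```python
# Sketch: IoU between two axis-aligned boxes in (x1, y1, x2, y2) format.
def box_iou(a, b):
    ax1, ay1, ax2, ay2 = a
    bx1, by1, bx2, by2 = b
    # overlap rectangle (empty if the boxes are disjoint)
    ix1, iy1 = max(ax1, bx1), max(ay1, by1)
    ix2, iy2 = min(ax2, bx2), min(ay2, by2)
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0
```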

## Visualization

Visualize model predictions on images:

```bash
python scripts/visualize_predictions.py --config configs/pennfudan_maskrcnn_config.py --checkpoint outputs/pennfudan_maskrcnn_v1/checkpoints/checkpoint_epoch_10.pth --index 0 --output prediction.png
```

Parameters:

- `--config`: Configuration file path
- `--checkpoint`: Model checkpoint path
- `--index`: Image index in dataset (default: 0)
- `--threshold`: Detection confidence threshold (default: 0.5)
- `--output`: Output image path (optional, displays interactively if not specified)
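
To illustrate what `--threshold` does: torchvision detection models return per-image dicts with `"boxes"`, `"labels"`, and `"scores"`, and thresholding keeps only confident detections. A sketch using plain lists in place of tensors (not the script's actual code):

```python
# Sketch: filter a prediction dict so only detections with
# score >= threshold survive, keeping boxes/labels/scores aligned.
def filter_by_score(prediction, threshold=0.5):
    keep = [i for i, s in enumerate(prediction["scores"]) if s >= threshold]
    return {key: [values[i] for i in keep] for key, values in prediction.items()}
```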

## Testing

Run all tests:

```bash
python -m pytest
```

Run a specific test file:

```bash
python -m pytest tests/test_data_utils.py
```

Run tests with verbose output:

```bash
python -m pytest -v
```

## Debugging

For quick model testing without full training:

```bash
python scripts/test_model.py
```

This verifies:

- Model creation
- Forward pass
- Backward pass
- Dataset loading

For training with minimal resources:

```bash
python train.py --config configs/debug_config.py
```

This uses:

- CPU computation
- Minimal epochs (1)
- Small batch size (1)
- No multiprocessing
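
The settings above can be expressed as a dict merge over the base configuration, which is a common way a file like `configs/debug_config.py` overrides `configs/base_config.py`. The base values below are illustrative assumptions, not taken from the repository:

```python
# Sketch: a debug config overriding a (hypothetical) base config.
base_config = {"device": "cuda", "num_epochs": 10, "batch_size": 2, "num_workers": 4}

debug_overrides = {
    "device": "cpu",    # CPU computation
    "num_epochs": 1,    # minimal epochs
    "batch_size": 1,    # small batch size
    "num_workers": 0,   # no DataLoader multiprocessing
}

# Later keys win, so every override replaces its base value.
debug_config = {**base_config, **debug_overrides}
```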

## Code Quality

Format code:

```bash
ruff format .
```

Run the linter:

```bash
ruff check .
```

Fix auto-fixable issues:

```bash
ruff check --fix .
```

Run pre-commit checks:

```bash
pre-commit run --all-files
```