This project implements an image style transfer algorithm using a pre-trained VGG19 model. The goal is to generate a target image that combines the content of one image with the style of another. The main steps are:
- Loading and transforming images
- Extracting features from images using a pre-trained VGG19 model
- Calculating the Gram matrix for style representations
- Optimizing the target image to minimize content and style loss
The code is organized into several functions:
This function loads an image from a given path, resizes it if necessary, and transforms it into a PyTorch tensor.
Parameters:

- `img_path`: Path of the image to load.
- `max_size`: Maximum size for the largest dimension of the image.
- `shape`: Target shape for the image (optional).
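A minimal sketch of such a loader, assuming the function is named `load_image` (the name is illustrative) and that images are normalized with the ImageNet statistics expected by the pre-trained VGG19:

```python
from PIL import Image
from torchvision import transforms

def load_image(img_path, max_size=400, shape=None):
    """Load an image as a normalized (1, 3, H, W) tensor, capping its largest dimension."""
    image = Image.open(img_path).convert('RGB')

    # Scale down so the largest dimension does not exceed max_size.
    scale = min(max_size / max(image.size), 1.0)
    size = (round(image.height * scale), round(image.width * scale))
    if shape is not None:
        size = shape  # an explicit target shape overrides the computed size

    transform = transforms.Compose([
        transforms.Resize(size),
        transforms.ToTensor(),
        # ImageNet statistics, matching the pre-trained VGG19 weights.
        transforms.Normalize((0.485, 0.456, 0.406), (0.229, 0.224, 0.225)),
    ])
    return transform(image).unsqueeze(0)  # add a batch dimension
```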
This function converts a normalized image tensor into a NumPy image for display.
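A sketch of this conversion, assuming the same ImageNet normalization is being undone (the name `im_convert` is illustrative):

```python
import numpy as np

def im_convert(tensor):
    """Undo normalization and convert a (1, 3, H, W) tensor to an H x W x 3 array."""
    image = tensor.detach().cpu().squeeze(0)  # drop the batch dimension
    image = image.permute(1, 2, 0).numpy()    # CHW -> HWC for Matplotlib
    image = image * np.array((0.229, 0.224, 0.225)) + np.array((0.485, 0.456, 0.406))
    return image.clip(0, 1)                   # keep values in a displayable range
```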
Utility function to display an image using Matplotlib.
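A minimal version of such a helper, reusing the conversion above (the name `imshow` is assumed):

```python
import matplotlib.pyplot as plt

def imshow(tensor, title=None):
    """Display a normalized image tensor with Matplotlib."""
    plt.imshow(im_convert(tensor))
    if title is not None:
        plt.title(title)
    plt.axis('off')
    plt.show()
```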
This function passes an image through the VGG19 model and retrieves features for a specified set of layers.
Parameters:

- `image`: The image to process.
- `model`: The model used (VGG19).
- `layers`: The layers from which to extract features (optional).
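A sketch of this extraction, assuming `model` is VGG19's `features` sequential stack and the usual Gatys et al. layer choice (`conv1_1` through `conv5_1` for style, `conv4_2` for content); the default layer indices are an assumption:

```python
def get_features(image, model, layers=None):
    """Run the image through the VGG19 feature stack, recording selected activations."""
    if layers is None:
        # Indices of VGG19's conv layers as used by Gatys et al.
        layers = {'0': 'conv1_1', '5': 'conv2_1', '10': 'conv3_1',
                  '19': 'conv4_1', '21': 'conv4_2', '28': 'conv5_1'}
    features = {}
    x = image
    for name, layer in model.named_children():
        x = layer(x)  # forward through one layer at a time
        if name in layers:
            features[layers[name]] = x
    return features
```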
This function calculates the Gram matrix of a given tensor. The Gram matrix is used to represent the style of an image.
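A minimal sketch: flatten each channel of the feature map into a row, then take inner products between channels, which is the standard construction of the Gram matrix:

```python
import torch

def gram_matrix(tensor):
    """Channel-by-channel inner products of a (1, d, h, w) feature map."""
    _, d, h, w = tensor.size()
    tensor = tensor.view(d, h * w)       # one row per channel
    return torch.mm(tensor, tensor.t())  # (d, d) correlations between channels
```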
In the main block, the VGG19 model is loaded and the content and style images are prepared. Then the content and style features are extracted, and a target image is optimized with an Adam optimizer; a sketch of this loop follows the loss definitions below.
The code minimizes two types of losses:
- Content Loss: compares the feature representations of the content image and the generated image. Let $P$ be the feature map of the content image and $F$ the feature map of the generated image, both extracted from layer $l$ of the network. The content loss is:

  $$\mathcal{L}_{\text{content}} = \frac{1}{2} \sum_{i,j} \left( F^{l}_{ij} - P^{l}_{ij} \right)^{2}$$
- Style Loss: measures the differences between the style representations of the style image and the generated image, using the Gram matrices $G$ and $A$ of the generated image and the style image, respectively, extracted at layer $l$. The Gram matrix $G^{l}$ is computed by multiplying the flattened feature map $F^{l}$ by its transpose, $G^{l} = F^{l} (F^{l})^{\top}$. The style loss contribution of layer $l$ is:

  $$E_{l} = \frac{1}{4 N_{l}^{2} M_{l}^{2}} \sum_{i,j} \left( G^{l}_{ij} - A^{l}_{ij} \right)^{2}$$

  where $N_{l}$ is the number of feature maps at layer $l$ and $M_{l}$ their spatial size; the total style loss is the weighted sum $\mathcal{L}_{\text{style}} = \sum_{l} w_{l} E_{l}$.
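A minimal sketch of the optimization loop, combining the helpers sketched above. The learning rate, per-layer style weights, content/style trade-off, and iteration count are hypothetical placeholders; the image file names follow the usage instructions below:

```python
import torch
import torch.optim as optim
from torchvision import models

# Load VGG19's convolutional stack and freeze its weights.
vgg = models.vgg19(pretrained=True).features.eval()
for param in vgg.parameters():
    param.requires_grad_(False)

content = load_image('data/i.jpg')
style = load_image('data/s.jpg', shape=content.shape[-2:])

content_features = get_features(content, vgg)
style_features = get_features(style, vgg)
style_grams = {layer: gram_matrix(style_features[layer]) for layer in style_features}

# Start the target image from a copy of the content image.
target = content.clone().requires_grad_(True)
optimizer = optim.Adam([target], lr=0.003)

# Hypothetical per-layer style weights and content/style trade-off.
style_weights = {'conv1_1': 1.0, 'conv2_1': 0.75, 'conv3_1': 0.2,
                 'conv4_1': 0.2, 'conv5_1': 0.2}
content_weight, style_weight = 1, 1e6

for step in range(2000):
    target_features = get_features(target, vgg)

    # Content loss: squared distance between feature maps at conv4_2.
    content_loss = torch.mean(
        (target_features['conv4_2'] - content_features['conv4_2']) ** 2)

    # Style loss: weighted Gram-matrix differences, accumulated over layers.
    style_loss = 0
    for layer, weight in style_weights.items():
        target_feature = target_features[layer]
        _, d, h, w = target_feature.shape
        target_gram = gram_matrix(target_feature)
        layer_loss = weight * torch.mean((target_gram - style_grams[layer]) ** 2)
        style_loss += layer_loss / (d * h * w)  # normalize by layer size

    total_loss = content_weight * content_loss + style_weight * style_loss
    optimizer.zero_grad()
    total_loss.backward()
    optimizer.step()
```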
- Place the content and style images in the `data/` folder with the names `i.jpg` and `s.jpg`, respectively.
- Run the Python script to generate a target image that combines the content of the content image with the style of the style image. The results will be saved in the `data/` folder with the prefix `output`.
- Make sure the necessary libraries (PyTorch, torchvision, PIL, matplotlib, numpy, tqdm, tensorboard) are installed before running the script.
- You can install the environment using the provided `environment.yml` file:
```bash
conda env create -f environment.yml
conda activate ml
```
The following images show the content, style, and target images generated by the algorithm:
| Content Image | Style Image | First Epoch Result | Last Epoch Result |
|---|---|---|---|