A Siamese network is a neural network that contains two or more identical subnetworks, which share the same weights. Its objective is to measure the similarity, or compare the relationship, between two comparable inputs. Unlike classification tasks, which use cross-entropy as the loss function, Siamese networks usually use a contrastive loss or a triplet loss.
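To make "identical subnetworks" concrete, here is a minimal NumPy sketch, assuming a toy branch made of a single linear layer plus ReLU (the hypothetical `embed` function) instead of the real CNN. The key point is that both inputs go through the *same* parameters `W` and `b`, so the two branches are literally one function applied twice.

```python
import numpy as np

def embed(x: np.ndarray, W: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Hypothetical branch: one linear layer + ReLU, standing in for the real CNN."""
    return np.maximum(x @ W + b, 0.0)

def siamese_forward(x1: np.ndarray, x2: np.ndarray, W: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Embed both inputs with the SAME weights and return the Euclidean distance per pair."""
    g1 = embed(x1, W, b)
    g2 = embed(x2, W, b)
    return np.linalg.norm(g1 - g2, axis=1)

rng = np.random.default_rng(0)
W = rng.normal(size=(4, 2))   # shared weights: 4 input features -> 2 output features
b = np.zeros(2)               # shared bias
x = rng.normal(size=(3, 4))   # a batch of 3 inputs

# Identical inputs through shared weights always give distance 0.
print(siamese_forward(x, x, W, b))  # [0. 0. 0.]
```

Because the weights are shared, the distance is symmetric in its two inputs and is zero for identical inputs, which is what makes it usable as a learned similarity measure.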
Siamese networks have many applications; this repository uses a Siamese network for dimensionality reduction and image retrieval.
This project follows Hadsell et al. 2006 [1]: it computes the Euclidean distance between the outputs of the shared network and optimizes the contrastive loss (see the paper for more details). The contrastive loss is defined as follows:
$$L(W, Y, X_1, X_2) = Y \cdot \tfrac{1}{2}(D_W)^2 + (1 - Y) \cdot \tfrac{1}{2}\{\max(0, m - D_W)\}^2$$
$D_W = \lVert G_W(X_1) - G_W(X_2) \rVert_2$ is the distance between the outputs of the network $G_W$ for the input $X_1$ and the input $X_2$.
The similarity term is $Y \cdot \tfrac{1}{2}(D_W)^2$. It is active when the label $Y$ equals 1 and inactive when $Y$ equals 0. Its goal is to minimize the distance between similar pairs.
The dissimilarity term is $(1 - Y) \cdot \tfrac{1}{2}\{\max(0, m - D_W)\}^2$. It is active when the label $Y$ equals 0 and inactive when $Y$ equals 1. Its goal is to penalize dissimilar pairs whose distance is smaller than the margin $m$. Note that this label convention is the reverse of the one used in [1], where $Y = 0$ marks similar pairs.
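The loss can be sketched in NumPy as follows, using the convention described here (label 1 = similar pair, label 0 = dissimilar pair). This is a minimal illustration, not the repository's TensorFlow code; averaging over the batch and the default `margin=1.0` are assumptions, and deriving the pair label from class ids (as one would for MNIST digits) is shown only as an example.

```python
import numpy as np

def contrastive_loss(g1: np.ndarray, g2: np.ndarray, y: np.ndarray, margin: float = 1.0) -> float:
    """Contrastive loss: y = 1 for similar pairs, y = 0 for dissimilar pairs.

    L = y * 0.5 * D^2 + (1 - y) * 0.5 * max(0, margin - D)^2, averaged over the batch (assumption).
    """
    d = np.linalg.norm(g1 - g2, axis=1)                              # D_W for each pair
    similar = y * 0.5 * d ** 2                                       # pulls similar pairs together
    dissimilar = (1 - y) * 0.5 * np.maximum(0.0, margin - d) ** 2    # penalizes dissimilar pairs closer than the margin
    return float(np.mean(similar + dissimilar))

# Example: pair labels derived from (hypothetical) digit classes of the left/right images.
c_left = np.array([7, 2, 1])
c_right = np.array([7, 9, 4])
y = (c_left == c_right).astype(float)            # [1., 0., 0.]

g_left = np.zeros((3, 2))
g_right = np.array([[3.0, 4.0], [0.3, 0.4], [3.0, 4.0]])   # distances 5.0, 0.5, 5.0
loss = contrastive_loss(g_left, g_right, y, margin=1.0)
print(round(loss, 4))  # 4.2083
```

The first pair is similar but far apart, so it dominates the loss; the second is dissimilar but inside the margin, so it gets a small penalty; the third is dissimilar and already beyond the margin, so it contributes nothing.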
[1] "Dimensionality Reduction by Learning an Invariant Mapping" http://yann.lecun.com/exdb/publis/pdf/hadsell-chopra-lecun-06.pdf
The inputs of the model are image_left, image_right and the label $Y$. Our model uses 5 convolutional layers, each followed by a pooling layer. We do not use fully connected layers; the network is fully convolutional because convolution is faster on GPU (especially with cuDNN). See http://cs231n.github.io/convolutional-networks/#convert for more information on converting FC layers to Conv layers.
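A quick shape calculation shows how five conv + pool stages can replace a fully connected layer on MNIST. This sketch assumes 28×28 inputs, stride-1 convolutions with 'SAME' padding (which keep the spatial size), and 2×2 stride-2 pooling with 'SAME' padding after each convolution; these layer settings are assumptions for illustration, not read from the repository's code.

```python
import math

def same_pool_size(n: int, stride: int = 2) -> int:
    """Output size of a pooling layer with 'SAME' padding: ceil(n / stride)."""
    return math.ceil(n / stride)

def spatial_sizes(input_size: int = 28, num_stages: int = 5) -> list:
    """Spatial size after each conv (size-preserving) + pool stage."""
    sizes = [input_size]
    for _ in range(num_stages):
        sizes.append(same_pool_size(sizes[-1]))
    return sizes

print(spatial_sizes())  # [28, 14, 7, 4, 2, 1]
```

Under these assumptions the feature map shrinks to 1×1 after the fifth stage, so a final convolution with 2 output channels yields a 2-dimensional feature vector, doing the job a fully connected layer would otherwise do.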
Train the model
git clone https://github.com/ardiya/siamesenetwork-tensorflow
cd siamesenetwork-tensorflow
python train.py
TensorBoard Visualization (after training)
tensorboard --logdir=train.log
- Update the API to 1.0
- Clean up the old code
The images below show the final result on the MNIST test set. Using only 2 features, the input images can easily be separated.
The GIF below shows an animation of the training process until it converges.
Image retrieval uses the trained model to extract features and returns the most similar images according to cosine similarity.
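The retrieval step can be sketched as a top-k search by cosine similarity over precomputed embeddings. This is a minimal NumPy illustration, assuming the gallery embeddings have already been extracted with the trained network; the `retrieve` function name and the toy 2-D vectors are hypothetical.

```python
import numpy as np

def retrieve(query: np.ndarray, gallery: np.ndarray, k: int = 3):
    """Return indices and scores of the k gallery embeddings most similar to `query` (cosine similarity).

    Assumes no embedding is the zero vector (it would cause a division by zero).
    """
    q = query / np.linalg.norm(query)                                # unit-length query
    g = gallery / np.linalg.norm(gallery, axis=1, keepdims=True)     # unit-length rows
    sims = g @ q                                                     # cosine similarity per gallery item
    order = np.argsort(-sims)[:k]                                    # highest similarity first
    return order, sims[order]

# Toy 2-D embeddings, matching the 2 output features used on MNIST.
gallery = np.array([[1.0, 0.0], [0.0, 1.0], [0.7, 0.7], [-1.0, 0.0]])
idx, scores = retrieve(np.array([2.0, 0.1]), gallery, k=2)
print(idx)  # [0 2]
```

Cosine similarity ignores the length of the embeddings and compares only their direction; for large galleries, normalizing the gallery once and reusing it avoids repeating that work per query.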