A toy example of how better connections between sub-networks can improve performance several-fold!
Most of the experiments focus on making sure that the classifier network is properly connected to the other parts of the network.
These are the experiments, with comments on each:
This network feeds the classifier network's output to the location network. The intuition is that the location network needs some information about the expected class, and that information is already encoded by the classifier.
This was a wrong implementation!!!
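For illustration, here is a minimal PyTorch sketch of the idea: the classifier's current class probabilities are concatenated with the recurrent hidden state before predicting the next location. All names and sizes are hypothetical; this is not the repository's (reportedly wrong) implementation, just a sketch of the wiring being described.

```python
import torch
import torch.nn as nn

class LocationNetWithClassInput(nn.Module):
    """Location network that also sees the classifier's current prediction.

    The next glimpse location likely depends on which class the network
    currently expects, so the class probabilities are concatenated with the
    LSTM hidden state. (Hypothetical names, illustrative only.)
    """
    def __init__(self, hidden=256, num_classes=10):
        super().__init__()
        self.fc = nn.Linear(hidden + num_classes, 2)

    def forward(self, h, class_probs):
        x = torch.cat([h, class_probs], dim=1)
        return torch.tanh(self.fc(x))  # mean of the next location in [-1, 1]^2
```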
GOAL: to understand whether the lower part of the glimpse net has any real effect.
RESULT: It does improve convergence... but we don't know why!
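For reference, a minimal sketch of a two-pathway glimpse network with a switch to ablate one pathway, assuming "lower part" refers to the location ("where") pathway; module and parameter names are hypothetical, not the repository's actual code.

```python
import torch
import torch.nn as nn

class GlimpseNet(nn.Module):
    """Two-pathway glimpse network (hypothetical names).

    The 'what' pathway encodes the retina patch, the 'where' pathway encodes
    the glimpse location; use_where_path=False is one way to ablate the
    lower (location) part and test its effect on convergence.
    """
    def __init__(self, patch_dim, loc_dim=2, hidden=128, out=256, use_where_path=True):
        super().__init__()
        self.use_where_path = use_where_path
        self.fc_patch = nn.Linear(patch_dim, hidden)  # 'what' pathway
        self.fc_loc = nn.Linear(loc_dim, hidden)      # 'where' pathway
        self.fc_out = nn.Linear(hidden, out)

    def forward(self, patch, loc):
        h = torch.relu(self.fc_patch(patch))
        if self.use_where_path:
            h = h + torch.relu(self.fc_loc(loc))      # fuse the two pathways
        return torch.relu(self.fc_out(h))
```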
GOAL: remove the uncertainty in the initial glimpse location.
RESULT: It does improve convergence, as the network can learn a better exploration strategy.
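A minimal sketch of what removing the initial-point uncertainty could look like, assuming the first glimpse is simply pinned to the image centre instead of being sampled at random; the helper below is hypothetical.

```python
import torch

def initial_location(batch_size, fixed=True, device="cpu"):
    """Pick the first glimpse location in [-1, 1]^2.

    fixed=True removes the initial-point uncertainty by always starting from
    the image centre; fixed=False is the usual random start.
    (Hypothetical helper, not the repository's actual code.)
    """
    if fixed:
        return torch.zeros(batch_size, 2, device=device)              # centre of the image
    return torch.empty(batch_size, 2, device=device).uniform_(-1, 1)  # random start
```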
GOAL: just like the previous one; maybe by removing the noise in the glimpse locations, the network will learn more reliable strategies.
RESULT: It does not help the network and slows convergence. My guess is that a bit of randomness helps the network discover unexpected exploration paths.
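For illustration, a hedged sketch of the stochastic versus noise-free location step, assuming the usual Gaussian-around-the-mean sampling; the helper name and the standard deviation are assumptions.

```python
import torch

def next_location(loc_mean, std=0.17, stochastic=True):
    """Turn the location network's mean output into the next glimpse location.

    stochastic=True is the usual sampling around the mean (exploration noise);
    stochastic=False is the noise-free variant tested here, which turned out
    to slow convergence. (Hypothetical helper.)
    """
    if stochastic:
        loc = loc_mean + std * torch.randn_like(loc_mean)  # exploration noise
    else:
        loc = loc_mean                                      # deterministic policy
    return torch.clamp(loc, -1.0, 1.0)
```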
GOAL: to write a custom softmax to use in the fast- and stable-convergence loss experiments.
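One possible reading of "custom softmax" is a numerically stable implementation that subtracts the row maximum before exponentiating; the sketch below is only an assumption about what was meant.

```python
import torch

def stable_softmax(logits, dim=-1):
    """Numerically stable softmax.

    A plain exp(x)/sum(exp(x)) can overflow for large logits; shifting by the
    per-row max leaves the result unchanged but keeps the exponentials bounded.
    (A sketch, assumed interpretation of the 'custom softmax'.)
    """
    shifted = logits - logits.max(dim=dim, keepdim=True).values
    exp = torch.exp(shifted)
    return exp / exp.sum(dim=dim, keepdim=True)
```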
GOAL: The LSTM network should also have some knowledge of the current expectation, so we feed it the classifier's output together with the glimpse network's output.
RESULT: It works and speeds up the convergence!
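A minimal sketch of this wiring in PyTorch, concatenating the previous class prediction with the glimpse feature as the LSTM input; module names and sizes are hypothetical.

```python
import torch
import torch.nn as nn

class CoreWithClassFeedback(nn.Module):
    """LSTM core that sees both the glimpse feature and the previous prediction.

    Concatenating the classifier's (softmaxed) output with the glimpse feature
    gives the recurrent core explicit access to the current expectation.
    (Hypothetical module names and sizes.)
    """
    def __init__(self, glimpse_dim=256, num_classes=10, hidden=256):
        super().__init__()
        self.lstm = nn.LSTMCell(glimpse_dim + num_classes, hidden)
        self.classifier = nn.Linear(hidden, num_classes)

    def forward(self, glimpse_feat, prev_class_probs, state):
        x = torch.cat([glimpse_feat, prev_class_probs], dim=1)
        h, c = self.lstm(x, state)
        class_probs = torch.softmax(self.classifier(h), dim=1)
        return class_probs, (h, c)
```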
GOAL: Rather than always starting from the same point, we let the network take a heavily scaled-down glimpse of the whole image to select the starting point.
RESULT: It works and speeds up the convergence!
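A hedged sketch of how the coarse whole-image glimpse could pick the starting point, assuming single-channel inputs and a small fully connected head; all names and sizes are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class StartLocationNet(nn.Module):
    """Pick the first glimpse location from a coarse view of the whole image.

    The image is downsampled to a small grid, flattened, and mapped to a
    location in [-1, 1]^2; this replaces the fixed/random starting point.
    (Hypothetical module, sized for single-channel MNIST-style inputs.)
    """
    def __init__(self, coarse=8, hidden=64):
        super().__init__()
        self.coarse = coarse
        self.fc1 = nn.Linear(coarse * coarse, hidden)
        self.fc2 = nn.Linear(hidden, 2)

    def forward(self, images):                        # images: (B, 1, H, W)
        small = F.adaptive_avg_pool2d(images, self.coarse)
        h = torch.relu(self.fc1(small.flatten(1)))
        return torch.tanh(self.fc2(h))                # first location in [-1, 1]^2
```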
GOAL: we implement the fast- and stable-convergence losses.
RESULT: It does work, although it does not seem to particularly improve convergence.
The proposed loss function combines a fast-convergence term and a stable-convergence term (the equation and the definitions of its terms are not reproduced here).
GOAL: we implement the fast- and stable-convergence losses.
RESULT: It does not work. Although fast convergence might be working, it does not seem to particularly improve convergence. At the same time, stable convergence seems to slow down convergence instead.
The proposed loss function again combines a fast-convergence term and a stable-convergence term (equation not reproduced here; see the sketch below for one possible reading).
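Since the original equations are not reproduced in this README, the following is only one plausible reading of the two terms: per-glimpse cross-entropy for "fast" (reward being correct as early as possible) and a penalty on prediction changes between consecutive glimpses for "stable". It is a sketch under those assumptions, not the repository's actual loss.

```python
import torch
import torch.nn.functional as F

def fast_stable_loss(step_logits, targets, fast_w=1.0, stable_w=1.0):
    """Assumed reading of the fast/stable convergence terms (hypothetical).

    step_logits: list of (B, C) logit tensors, one per glimpse.
    targets:     (B,) integer class labels.
    """
    # 'fast': classification loss applied at every glimpse, not just the last one.
    fast = torch.stack([F.cross_entropy(l, targets) for l in step_logits]).mean()
    # 'stable': penalise changes in the predicted distribution between steps.
    probs = [F.softmax(l, dim=1) for l in step_logits]
    stable = torch.stack(
        [(p2 - p1).abs().sum(dim=1).mean() for p1, p2 in zip(probs[:-1], probs[1:])]
    ).mean()
    return fast_w * fast + stable_w * stable
```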
It is worth noting that stable convergence seems to naturally go down even in a network where it is not set as a loss. Thus we assume that either the network is optimal when doing something funny, like waiting until the end to provide a result, or there is something wrong with this loss function.
It would be interesting to better explore this interaction!