Activation Functions & Gradient Analysis

This work is one of three repositories that collectively constitute the coursework on Neural Networks for the Data Science & Information Technologies master's course Μ124 - Machine Learning at the National and Kapodistrian University of Athens (NKUA), Fall 2022 semester. The other two repositories cover the remaining parts of the coursework.

This repository presents an analysis of various activation functions in the context of Multi-Layer Perceptrons (MLPs) trained on the MNIST dataset of handwritten digits. It studies the behavior of these functions' gradients and their impact on the learning process and model performance.

Overview

The following tasks have been performed in the included notebook:

  • Task A: The backpropagation equations were derived for specific activation functions: ReLU, hyperbolic tangent, and sigmoid, along with the range of gradients for each. The activation functions and their gradients are visualized below, followed by a sketch of the derivative formulas.

ReLU:

(figure: ReLU and its gradient)

Tanh:

(figure: tanh and its gradient)

Sigmoid:

(figure: sigmoid and its gradient)
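For reference, a minimal NumPy sketch (not the notebook's code) of the three derivatives, with the gradient ranges that follow from the standard formulas, would look like this:

```python
import numpy as np

# Sketch of the three activations' derivatives; the ranges in the comments
# follow from the standard derivative formulas.

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def d_sigmoid(x):
    s = sigmoid(x)
    return s * (1.0 - s)           # gradient in (0, 0.25], peak at x = 0

def d_tanh(x):
    return 1.0 - np.tanh(x) ** 2   # gradient in (0, 1], peak at x = 0

def d_relu(x):
    return (x > 0).astype(float)   # gradient is 0 for x < 0 and 1 for x > 0

x = np.linspace(-6.0, 6.0, 501)
print(d_sigmoid(x).max(), d_tanh(x).max(), d_relu(x).max())
```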


  • Task B: The MNIST dataset was used to train a fully-connected neural network to recognize handwritten digits. A comparison study was carried out by varying the number of layers (5, 20, and 40) and the activation function (ReLU, hyperbolic tangent, sigmoid) used in the model. The corresponding test scores for each model were reported, along with observations. Below we show the test accuracy per epoch for all activation functions and layer depths, illustrating the vanishing gradient problem; a sketch of the training setup follows the figures.

Shallow network (5 layers):

Medium-depth network (20 layers):

Deep network (40 layers):
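A minimal Keras sketch of this setup (the layer width of 64, the Adam optimizer, and the epoch count are illustrative assumptions, not necessarily the notebook's configuration) might look like this:

```python
import tensorflow as tf

def build_mlp(n_hidden_layers, activation, width=64):
    """Fully connected MNIST classifier with configurable depth and activation."""
    layers = [tf.keras.Input(shape=(28, 28)), tf.keras.layers.Flatten()]
    layers += [tf.keras.layers.Dense(width, activation=activation)
               for _ in range(n_hidden_layers)]
    layers += [tf.keras.layers.Dense(10, activation="softmax")]
    model = tf.keras.Sequential(layers)
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model

(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0

for depth in (5, 20, 40):
    for act in ("relu", "tanh", "sigmoid"):
        model = build_mlp(depth, act)
        model.fit(x_train, y_train, epochs=5, batch_size=128, verbose=0)
        _, acc = model.evaluate(x_test, y_test, verbose=0)
        print(f"{depth:2d} layers, {act:>7s}: test accuracy = {acc:.3f}")
```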

  • Task C: For each model trained in Task B, the maximum gradient value of each layer was computed for a given mini-batch. A plot of maximum gradient against layer depth was created and analyzed; a sketch of the per-layer measurement follows the figure below.

Layer depth vs. maximum gradient:
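Assuming the Keras model from the sketch above, the per-layer maximum gradient for one mini-batch could be measured roughly like this (the notebook's exact implementation may differ):

```python
import numpy as np
import tensorflow as tf

def max_abs_gradient_per_layer(model, x_batch, y_batch):
    """Maximum |gradient| of each trainable variable for a single mini-batch."""
    loss_fn = tf.keras.losses.SparseCategoricalCrossentropy()
    with tf.GradientTape() as tape:
        loss = loss_fn(y_batch, model(x_batch, training=True))
    grads = tape.gradient(loss, model.trainable_variables)
    return {var.name: float(np.max(np.abs(g.numpy())))
            for var, g in zip(model.trainable_variables, grads)}

# Example usage with the (hypothetical) build_mlp from the Task B sketch:
# print(max_abs_gradient_per_layer(build_mlp(20, "sigmoid"),
#                                  x_train[:128], y_train[:128]))
```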

  • Task D: A model was trained using the architecture from Task B, but with the LeCun activation function. The learning curves of models using the LeCun and hyperbolic tangent activations were compared. In addition, the backpropagation equations and the gradient range for the LeCun activation were derived. Finally, the gradients for different depth choices were plotted for an untrained model using the LeCun and hyperbolic tangent activations. A sketch of the LeCun activation follows the figure below.

LeCun activation vs. tanh:
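The LeCun activation here presumably refers to the scaled hyperbolic tangent from LeCun et al.'s "Efficient BackProp", f(x) = 1.7159 tanh(2x/3). A minimal Keras definition, under that assumption, would be:

```python
import tensorflow as tf

def lecun_tanh(x):
    # Scaled hyperbolic tangent: f(x) = 1.7159 * tanh(2x / 3).
    # Its derivative peaks at about 1.14 at x = 0 (slightly above tanh's peak
    # of 1), which is what helps against vanishing gradients in deeper stacks.
    return 1.7159 * tf.math.tanh(2.0 * x / 3.0)

# Reusing the hypothetical build_mlp from the Task B sketch:
# model = build_mlp(n_hidden_layers=40, activation=lecun_tanh)
```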
