Machine learning models to automatically predict credit card fraud
The dataset for this project was obtained from Kaggle.
[https://www.kaggle.com/mlg-ulb/creditcardfraud]
- Python 3.6
- Pandas and Numpy
- Keras 2.1.6
- TensorFlow 1.8
- Sklearn Library 0.19.1
- Python's Matplotlib (2.2.2) for Visualization
1. Checking the target class (0 - No Fraud, 1 - Fraud).
2. Checking Time of Transaction vs Amount of Transaction for each class (0 - No Fraud, 1 - Fraud)
3. Amount per Transaction for each class (0 - No Fraud, 1 - Fraud)
4. Correlation Matrix between different features of the dataset
3. Describing the dataset
- Feature Elimination
- Data Normalization
- Balancing the skewed dataset using:
3.1 Undersampling technique
3.2 Oversampling using the SMOTE technique
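The normalization and balancing steps above can be sketched roughly as follows. This is a minimal NumPy-only illustration on synthetic data, not the project's actual code: the helper names `standardize`, `random_undersample` and `smote_like_oversample` are hypothetical, and the oversampler is a simplified SMOTE-style interpolation written out by hand rather than a library implementation.

```python
import numpy as np

def standardize(column):
    """Scale a 1-D array to zero mean and unit variance (e.g. a raw amount column)."""
    column = np.asarray(column, dtype=float)
    return (column - column.mean()) / column.std()

def random_undersample(X, y, seed=0):
    """Keep every fraud row (y == 1) plus an equally sized random subset of non-fraud rows."""
    rng = np.random.default_rng(seed)
    fraud_idx = np.flatnonzero(y == 1)
    normal_idx = np.flatnonzero(y == 0)
    kept_normal = rng.choice(normal_idx, size=len(fraud_idx), replace=False)
    keep = np.concatenate([fraud_idx, kept_normal])
    rng.shuffle(keep)
    return X[keep], y[keep]

def smote_like_oversample(X, y, k=5, seed=0):
    """Add synthetic fraud rows until both classes have the same size.

    Each synthetic row lies on the segment between a fraud row and one of
    its k nearest fraud neighbours (the core idea behind SMOTE).
    """
    rng = np.random.default_rng(seed)
    minority = X[y == 1]
    n_new = int((y == 0).sum() - (y == 1).sum())
    # Pairwise distances inside the minority class only (O(n^2) memory).
    dists = np.linalg.norm(minority[:, None, :] - minority[None, :, :], axis=2)
    np.fill_diagonal(dists, np.inf)  # a row is not its own neighbour
    k = min(k, len(minority) - 1)
    neighbours = np.argsort(dists, axis=1)[:, :k]
    base = rng.integers(0, len(minority), size=n_new)
    picked = neighbours[base, rng.integers(0, k, size=n_new)]
    gap = rng.random((n_new, 1))  # interpolation weight in [0, 1)
    synthetic = minority[base] + gap * (minority[picked] - minority[base])
    X_out = np.vstack([X, synthetic])
    y_out = np.concatenate([y, np.ones(n_new, dtype=y.dtype)])
    return X_out, y_out

# Synthetic stand-in for the real data: 1000 rows, 2% labelled as fraud.
rng = np.random.default_rng(42)
X = rng.normal(size=(1000, 4))
y = np.zeros(1000, dtype=int)
y[:20] = 1

X[:, 3] = standardize(X[:, 3])               # normalize one feature column
X_under, y_under = random_undersample(X, y)  # 20 fraud + 20 non-fraud rows
X_over, y_over = smote_like_oversample(X, y) # 980 fraud + 980 non-fraud rows
```

Undersampling discards most non-fraud rows, while oversampling keeps them all and grows the fraud class. For real use, a maintained implementation such as the SMOTE class in the imbalanced-learn package is usually the safer choice.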
Used Sklearn, TensorFlow and Keras libraries to build the following models in Spyder IDE.
- Autoencoder Artificial Neural Networks
- Random Forest
- Logistic Regression
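As a rough illustration of the two scikit-learn classifiers in this list, the sketch below fits both on a synthetic, imbalanced dataset. The data, the hyperparameters (`n_estimators=100`, `max_iter=1000`) and the 70/30 split are assumptions made for the example, not the settings used in this project.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the fraud data: roughly 5% positive (fraud) rows.
X, y = make_classification(
    n_samples=2000, n_features=10, weights=[0.95, 0.05], random_state=0
)

# Stratify so that train and test keep the same fraud ratio.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

models = {
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "logistic_regression": LogisticRegression(max_iter=1000),
}

fraud_scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    # Column 1 of predict_proba is the estimated probability of class 1 (fraud).
    fraud_scores[name] = model.predict_proba(X_test)[:, 1]
```

Keeping the predicted probabilities rather than hard labels makes it possible to draw ROC and precision-recall curves later.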
- 5-fold Cross Validation
- Confusion Matrix
- Precision-Recall Curves
- Cohen's Kappa Statistic
- AUC - ROC curves
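The evaluation techniques listed above could be computed with scikit-learn along these lines. This is a sketch on synthetic data with a logistic regression model; the 0.5 decision threshold and the dataset parameters are assumptions for illustration only.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (
    auc,
    cohen_kappa_score,
    confusion_matrix,
    precision_recall_curve,
    roc_auc_score,
)
from sklearn.model_selection import StratifiedKFold, cross_val_predict

# Synthetic imbalanced data standing in for the fraud dataset.
X, y = make_classification(
    n_samples=2000, n_features=10, weights=[0.95, 0.05], random_state=0
)

# 5-fold cross validation: every row is scored by a model that never saw it.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
model = LogisticRegression(max_iter=1000)
scores = cross_val_predict(model, X, y, cv=cv, method="predict_proba")[:, 1]
labels = (scores >= 0.5).astype(int)  # assumed decision threshold

cm = confusion_matrix(y, labels)        # rows: true class, columns: predicted class
kappa = cohen_kappa_score(y, labels)    # agreement corrected for chance
roc_auc = roc_auc_score(y, scores)      # area under the ROC curve

precision, recall, _ = precision_recall_curve(y, scores)
pr_auc = auc(recall, precision)         # area under the precision-recall curve

print(cm)
print(f"kappa={kappa:.3f}  roc_auc={roc_auc:.3f}  pr_auc={pr_auc:.3f}")
```

On heavily skewed data, plain accuracy is dominated by the majority class, which is why the precision-recall curve and Cohen's kappa are useful alongside ROC AUC.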
1.1 Model Loss for Autoencoders - Plot of Loss vs Epoch for training and testing data
Epoch = 88, Batch_size = 32
1.2 Error Distribution of Autoencoders for each class (0 - No Fraud, 1 - Fraud).
Panels: reconstruction error without fraud; reconstruction error with fraud
1.3 AUC - ROC and Confusion Matrix for Autoencoder Neural Nets
1.4 Precision-Recall Curves for Autoencoder Neural Nets
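The reconstruction-error idea behind item 1.2 can be sketched as follows: a model trained only on non-fraud rows reconstructs those rows well and fraud rows poorly, so a large error suggests fraud. To keep the example runnable with NumPy alone, a linear stand-in (PCA via SVD) replaces the Keras autoencoder. The data is synthetic, and the helper names and the 99th-percentile threshold are hypothetical choices, not values from this project.

```python
import numpy as np

def fit_linear_codec(X_normal, n_components=2):
    """Stand-in for a trained autoencoder: PCA fitted on non-fraud rows only."""
    mean = X_normal.mean(axis=0)
    _, _, vt = np.linalg.svd(X_normal - mean, full_matrices=False)
    components = vt[:n_components]

    def reconstruct(X):
        codes = (X - mean) @ components.T  # "encode" to a low-dimensional code
        return codes @ components + mean   # "decode" back to feature space

    return reconstruct

def reconstruction_error(X, X_hat):
    """Mean squared error per row between the input and its reconstruction."""
    return np.mean((X - X_hat) ** 2, axis=1)

rng = np.random.default_rng(0)

# Synthetic non-fraud rows: 5 features lying close to a 2-D plane.
latent = rng.normal(size=(500, 2))
mixing = rng.normal(size=(2, 5))
X_normal = latent @ mixing + 0.05 * rng.normal(size=(500, 5))

# Synthetic fraud rows: scattered off that plane.
X_fraud = rng.normal(scale=3.0, size=(20, 5))

reconstruct = fit_linear_codec(X_normal)
err_normal = reconstruction_error(X_normal, reconstruct(X_normal))
err_fraud = reconstruction_error(X_fraud, reconstruct(X_fraud))

# Hypothetical cut-off: flag anything above the 99th percentile of normal error.
threshold = np.percentile(err_normal, 99)
flagged = err_fraud > threshold
```

Plotting histograms of `err_normal` and `err_fraud` gives the kind of per-class error distribution described in 1.2. Moving the threshold trades precision against recall.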
2.1 AUC - ROC for Random Forest
2.2 Precision-Recall Curves for Random Forest
2.3 Confusion Matrix for Random Forest
3.1 AUC - ROC for Logistic Regression
3.2 Precision-Recall Curves for Logistic Regression
3.3 Confusion Matrix for Logistic Regression