Predict user scores (1-5) from their reviews
- Used a Python environment (Anaconda)
pip install libraries
- scikit-learn
- pandas
- NumPy
- os (standard library, no install needed)
- spacy
- nltk
- tensorflow, keras
- seaborn
- re (standard library, no install needed)
Also need to download the required nltk packages
Data is available at https://www.kaggle.com/snap/amazon-fine-food-reviews. Put the data in a folder named "archive" in the same directory as the code.
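As a rough sketch of the expected layout, the snippet below loads the reviews with pandas. The file name `Reviews.csv` and the column names `Score` and `Text` are assumptions about the Kaggle download and should be checked against the actual file; `load_reviews` is a hypothetical helper, not a function from this repository.

```python
from pathlib import Path

import pandas as pd


def load_reviews(data_dir="archive", filename="Reviews.csv"):
    """Load the score and review text columns from the downloaded CSV.

    `filename` and the column names are assumptions about the dataset.
    """
    path = Path(data_dir) / filename
    if not path.exists():
        raise FileNotFoundError(f"Expected the Kaggle data at {path}")
    # Read only the columns needed for score prediction.
    df = pd.read_csv(path, usecols=["Score", "Text"])
    # Drop rows with missing text so later preprocessing does not fail.
    return df.dropna(subset=["Text"]).reset_index(drop=True)
```

Calling `load_reviews()` with no arguments looks for `archive/Reviews.csv` relative to the current working directory.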
Notebook :
- Amazon_Food_Review.ipynb : Main notebook.
Python files :
- textpreprocesser.py : Class with functions for preprocessing the text data.
- textpreprocesser_tester.py : Unit tests for textpreprocesser.py
- modelsk.py : Class to build machine learning pipelines using sklearn.
- modelsk_tester.py : Unit tests for modelsk.py
- modelkeras.py : Class to build neural network models using keras.
- keras_plotters.py : Functions to plot learning curves for keras models.
- Loading & exploring the data
- Preprocessing the data
  - Converting text to lower case
  - Removing HTML tags
  - Removing punctuation
  - Removing stop words
  - Lemmatization
- Creating dictionary
  - Creating frequency dictionary
  - Getting the top n frequent words
- Removing reviews longer than k words
- Building sklearn models
  - Naive Bayes
  - Support Vector Machine
  - Logistic Regression
- Building keras models
  - RNN
- Finally, tried to analyze helpful reviews (only briefly, since there was not much time left).
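The preprocessing and dictionary steps listed above can be sketched with the standard library alone. This is a minimal illustration, not the code in textpreprocesser.py: the function names, the tiny stop-word list, and the choice to count review length in tokens after preprocessing are all assumptions, and lemmatization is left out because it needs nltk or spacy models.

```python
import re
import string
from collections import Counter

# Tiny illustrative stop-word list (assumption); the project relies on nltk/spacy.
STOP_WORDS = {"a", "an", "and", "the", "is", "it", "this", "of", "to", "i"}

TAG_RE = re.compile(r"<[^>]+>")
# Map every punctuation character to a space so words stay separated.
PUNCT_TABLE = str.maketrans(string.punctuation, " " * len(string.punctuation))


def preprocess(text):
    """Lower-case, strip HTML tags, punctuation and stop words; return tokens."""
    text = text.lower()
    text = TAG_RE.sub(" ", text)
    text = text.translate(PUNCT_TABLE)
    return [tok for tok in text.split() if tok not in STOP_WORDS]


def build_frequency_dict(token_lists):
    """Count how often each token occurs across all reviews."""
    return Counter(tok for tokens in token_lists for tok in tokens)


def top_n_words(freq, n):
    """Return the n most frequent words, most frequent first."""
    return [word for word, _ in freq.most_common(n)]


def drop_long_reviews(token_lists, k):
    """Keep only reviews with at most k tokens."""
    return [tokens for tokens in token_lists if len(tokens) <= k]


reviews = [
    "This is a <b>great</b> product!",
    "Great taste, great price.<br />Would buy again.",
]
tokens = [preprocess(r) for r in reviews]
freq = build_frequency_dict(tokens)
print(tokens[0])                     # ['great', 'product']
print(top_n_words(freq, 1))          # ['great']
print(drop_long_reviews(tokens, 5))  # [['great', 'product']]
```

Replacing punctuation with spaces rather than deleting it avoids gluing words together in text such as "price.Would".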
Class labels are zero-based: label 0 refers to score 1, 1 -> 2, ..., 4 -> 5.
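The mapping between scores and labels is a shift by one. A common reason for such a shift is that integer class labels for classifiers are usually expected to start at 0, though the notebook may have had other reasons. The helper names below are hypothetical.

```python
def score_to_label(score):
    """Map a review score in 1..5 to a zero-based class label in 0..4."""
    if score not in range(1, 6):
        raise ValueError(f"score must be between 1 and 5, got {score}")
    return score - 1


def label_to_score(label):
    """Map a zero-based class label in 0..4 back to a score in 1..5."""
    if label not in range(0, 5):
        raise ValueError(f"label must be between 0 and 4, got {label}")
    return label + 1


labels = [score_to_label(s) for s in [1, 2, 3, 4, 5]]
print(labels)                              # [0, 1, 2, 3, 4]
print([label_to_score(l) for l in labels]) # [1, 2, 3, 4, 5]
```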
- Since the data is imbalanced, we can try resampling it (downsampling the 5s or upsampling scores 1-4)
- Training multiple models, then ensembling them
- Hyperparameter tuning
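As an illustration of the resampling idea, the sketch below randomly downsamples every class to the size of the smallest one. It was not part of the project; `downsample` is a hypothetical helper, and balancing to the smallest class is only one of several reasonable choices.

```python
import random
from collections import Counter, defaultdict


def downsample(samples, labels, seed=0):
    """Randomly reduce every class to the size of the smallest class.

    Returns a shuffled list of (sample, label) pairs.
    """
    by_label = defaultdict(list)
    for sample, label in zip(samples, labels):
        by_label[label].append(sample)
    target = min(len(items) for items in by_label.values())
    rng = random.Random(seed)  # fixed seed keeps the result reproducible
    balanced = []
    for label, items in by_label.items():
        for sample in rng.sample(items, target):
            balanced.append((sample, label))
    rng.shuffle(balanced)
    return balanced


# Toy data: seven reviews with score 5 and three with score 1.
samples = [f"review {i}" for i in range(10)]
labels = [5] * 7 + [1] * 3
balanced = downsample(samples, labels)
print(Counter(label for _, label in balanced))
```

Downsampling discards data from the majority class, so with a heavily skewed dataset it may be worth comparing it against upsampling the minority classes.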