Description: This program predicts whether a given comment contains derogatory or abusive content. Two machine learning algorithms, support vector machines and multinomial naive Bayes, are used for this purpose.
Prerequisites: The program is written in Python 3.6. The following libraries are required to run this program:
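For orientation, here is a minimal sketch of how the two classifiers named above could be trained with scikit-learn. It is not taken from abusive_content_detection.py: the TF-IDF features, the use of LinearSVC as the support vector machine, and the toy comments and labels are all assumptions made for illustration.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Toy data invented for illustration; 1 = abusive, 0 = not abusive.
comments = [
    "you are an idiot",
    "what a stupid loser",
    "shut up you moron",
    "thanks for the helpful answer",
    "have a great day",
    "I really enjoyed this article",
]
labels = [1, 1, 1, 0, 0, 0]

# Assumption: both models share the same TF-IDF bag-of-words features.
models = {
    "svm": make_pipeline(TfidfVectorizer(), LinearSVC(random_state=0)),
    "naive_bayes": make_pipeline(TfidfVectorizer(), MultinomialNB()),
}

predictions = {}
for name, model in models.items():
    model.fit(comments, labels)
    # Predict a label (0 or 1) for two unseen comments.
    predictions[name] = model.predict(["you stupid idiot", "great article, thanks"])
    print(name, predictions[name])
```

Wrapping the vectorizer and the classifier in one pipeline means raw comment strings can be passed straight to fit and predict.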
- sklearn
- numpy
- pandas
- os
- re
- csv
Installation:
- Pip can be installed by using the following command: sudo easy_install pip
- Scikit-learn can be installed by using the following command: sudo pip install -U numpy scipy scikit-learn
- Pandas can be installed by using the following command: pip install pandas
- Other third-party libraries can be installed with pip in the same way. os, re and csv are part of the Python standard library and need no installation.
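After installation, a quick check like the following can confirm that every prerequisite is importable. This is a small sketch, not part of the program; missing_modules is a hypothetical helper name. Note that scikit-learn is imported under the name sklearn.

```python
import importlib.util


def missing_modules(names):
    """Return the module names from `names` that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Import names of the libraries listed under Prerequisites.
required = ["sklearn", "numpy", "pandas", "os", "re", "csv"]
print("Missing:", missing_modules(required) or "none")
```

Any name reported as missing can then be installed with pip before running the program.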
Instructions to run:
- The program requires the dataset, which can be found here, to be present in the data directory.
- The files needed are train.csv, impermium_verification_labels.csv and test_with_solutions.csv. (Note: remove the column Usage from test_with_solutions.csv.)
- To increase the number of training instances, we have merged the files train.csv and impermium_verification_labels.csv. Alternatively, train.csv can be used on its own.
- Create two directories in the folder where the Python code is located, named data and cleaned_data/data, and store the files from the given link in the data directory.
- Run the program by using the following command: python abusive_content_detection.py
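The preparation steps above can also be scripted. The sketch below uses only the standard os and csv modules and makes several assumptions: it reads the two directory names as data and cleaned_data/data, it assumes the files are UTF-8 encoded, and when merging it keeps only the columns the input files share. The helper names and the output file name merged_train.csv are hypothetical, and the program itself may perform the merge differently.

```python
import csv
import os


def make_directories(base_dir="."):
    """Create data and cleaned_data/data under base_dir (assumed layout)."""
    for name in ("data", os.path.join("cleaned_data", "data")):
        os.makedirs(os.path.join(base_dir, name), exist_ok=True)


def drop_column(src_path, dst_path, column):
    """Copy a CSV file, leaving out one named column (for example Usage)."""
    with open(src_path, newline="", encoding="utf-8") as src:
        reader = csv.DictReader(src)
        kept = [name for name in reader.fieldnames if name != column]
        rows = list(reader)  # read fully so src and dst may be the same file
    with open(dst_path, "w", newline="", encoding="utf-8") as dst:
        writer = csv.DictWriter(dst, fieldnames=kept, extrasaction="ignore")
        writer.writeheader()
        writer.writerows(rows)


def merge_csv(paths, dst_path):
    """Concatenate CSV files, keeping only the columns they all share."""
    tables = []
    for path in paths:
        with open(path, newline="", encoding="utf-8") as handle:
            reader = csv.DictReader(handle)
            tables.append((reader.fieldnames, list(reader)))
    first_header = tables[0][0]
    shared = [c for c in first_header if all(c in header for header, _ in tables)]
    with open(dst_path, "w", newline="", encoding="utf-8") as dst:
        writer = csv.DictWriter(dst, fieldnames=shared, extrasaction="ignore")
        writer.writeheader()
        for _, rows in tables:
            writer.writerows(rows)
    return shared


# Example usage once the dataset files are in data/ (paths are assumptions):
# make_directories()
# drop_column("data/test_with_solutions.csv", "data/test_with_solutions.csv", "Usage")
# merge_csv(["data/train.csv", "data/impermium_verification_labels.csv"],
#           "data/merged_train.csv")
```

Keeping only the shared columns avoids hard-coding column names, at the cost of silently dropping any column that appears in just one file.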
Authors: Prajakta Gaydhani (pag3862), Virtee Parekh (vvp2639), Vaibhav Nagda (vjn4006).