SVM for Spam Classification

The training and test sets are generated by transforming emails into binary feature vectors, which the SVM training algorithm uses to fit a model. The trained model is then stored in 'model.mat' and can later be used to predict whether an email is spam by running 'prediction.m'.

This project is part of the Stanford Machine Learning course on Coursera.

File structure

/spamTrain.mat and /spamTest.mat

spamTrain.mat contains 4000 training examples of spam and non-spam email, while spamTest.mat contains 1000 test examples. Each original email was processed using the processEmail and emailFeatures functions and converted into a binary feature vector.
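
As a rough illustration of that conversion, the sketch below builds a binary feature vector from a list of vocabulary indices. The variable names and sample indices are hypothetical, and the repository's emailFeatures function may differ in detail.

```octave
% Minimal sketch (assumption: the email is already reduced to vocabulary indices).
n = 1899;                          % vocabulary size, as in vocab.txt
word_indices = [86 916 794 86];    % hypothetical indices; repeats are allowed

x = zeros(n, 1);                   % one entry per vocabulary word
x(word_indices) = 1;               % 1 if the word occurs in the email, else 0
```

Because the vector is binary, a word that occurs several times in an email still contributes a single 1.
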

/vocab.txt

The vocabulary list was selected by choosing all words that occur at least 100 times in the spam corpus, resulting in a list of 1899 words.
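
To show how such a vocabulary might be used, the sketch below maps processed words to their positions in the list and drops words that are not in it. The three-word vocabulary and the token list are made up for the example; how the repository performs this lookup is not shown here.

```octave
% Hypothetical three-word vocabulary; the real list in vocab.txt has 1899 words.
vocab = {'click', 'free', 'money'};
tokens = {'free', 'hello', 'money', 'free'};   % hypothetical processed words

[in_vocab, idx] = ismember(tokens, vocab);     % idx is 0 for unknown words
word_indices = idx(in_vocab);                  % keep only words in the vocabulary
```
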

/training.m

It trains an SVM with a linear kernel for spam classification and writes 'model' into model.mat after training.
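
For intuition about the linear kernel, here is a minimal sketch that computes it as the inner product of two feature vectors. The anonymous function and the sample vectors are illustrative assumptions, not the contents of the repository's linearKernel.m.

```octave
% Linear kernel as an inner product (illustrative, not the repository's code).
linear_kernel = @(x1, x2) x1(:)' * x2(:);

x1 = [1; 0; 1; 1];              % hypothetical binary feature vectors
x2 = [1; 1; 0; 1];
k = linear_kernel(x1, x2);      % for binary vectors: number of shared words
```
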

/prediction.m

It reads an email (without headers) from 'input.txt' and predicts whether it is spam or not.
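
The decision step of a linear SVM can be sketched as follows. The weights, bias, and feature vector are invented for the example, and the threshold at zero is the usual convention for a linear SVM; the repository's svmPredict.m may be organized differently.

```octave
% Hypothetical linear model over a three-word vocabulary (assumed values).
w = [1.5; -2.0; 0.5];   % one weight per vocabulary word
b = -0.25;              % bias term
x = [1; 0; 1];          % binary feature vector of the email to classify

score = w' * x + b;     % decision value of the linear model
is_spam = score >= 0;   % classify as spam when the score is non-negative
```
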

/processEmail.m

It implements the following email preprocessing and normalization steps:

Lower-casing, stripping HTML, normalizing URLs, normalizing email addresses, normalizing numbers, normalizing dollar signs, word stemming, and removal of non-words.
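
Most of these steps can be approximated with regular expressions. The sketch below is one possible version, not the repository's processEmail.m: the sample email, the placeholder tokens ('httpaddr', 'emailaddr', 'number', 'dollar'), and the exact patterns are assumptions, and word stemming is left out because it relies on porterStemmer.m.

```octave
% Hypothetical input; the placeholder tokens used below are assumptions.
email = 'Visit <b>http://example.com</b> or mail bob@example.com, only $100!';

s = lower(email);                                        % lower-casing
s = regexprep(s, '<[^<>]+>', ' ');                       % strip HTML tags
s = regexprep(s, '(http|https)://[^\s]*', 'httpaddr');   % normalize URLs
s = regexprep(s, '[^\s]+@[^\s]+', 'emailaddr');          % normalize email addresses
s = regexprep(s, '[0-9]+', 'number');                    % normalize numbers
s = regexprep(s, '[$]+', 'dollar');                      % normalize dollar signs
tokens = regexp(s, '[a-z0-9]+', 'match');                % keep alphanumeric words only
```

The order matters in this sketch: HTML is stripped before URLs are matched so that a closing tag is not absorbed into the URL, and URLs and email addresses are replaced before the digits they may contain.
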

Tutorial

1. Start Octave and move to the folder that contains all the source files.

2. Type

training

in Octave to train an SVM.

3. Copy the email (without headers) you want to test into 'input.txt'.

4. Type

prediction

in Octave to predict whether the email is spam or not.

