Language Classifier

DSC 140A: Probabilistic Modeling and Machine Learning at UC San Diego

A simple least squares classifier that predicts whether a given word is Spanish or French based on a few bigram features.
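As a rough illustration of the technique (not the code in this repository), a least squares classifier can be sketched by encoding the two languages as +1 and -1, solving for a weight vector with ordinary least squares, and predicting by the sign of the output. The function names `fit_least_squares` and `predict`, the label encoding, and the toy feature matrix below are all assumptions made for this example.

```python
import numpy as np

def fit_least_squares(X, y):
    """Return weights w minimizing ||X_aug @ w - y||^2, where X_aug is X with a bias column."""
    X_aug = np.column_stack([np.ones(len(X)), X])
    # lstsq returns the minimum-norm solution, so it also copes with redundant features.
    w, *_ = np.linalg.lstsq(X_aug, y, rcond=None)
    return w

def predict(w, X):
    """Predict +1 or -1 from the sign of the linear output."""
    X_aug = np.column_stack([np.ones(len(X)), X])
    return np.where(X_aug @ w >= 0, 1, -1)

# Toy data (assumption): two binary features, labels +1 = Spanish, -1 = French.
X_train = np.array([[1, 0], [1, 0], [0, 1], [0, 1]], dtype=float)
y_train = np.array([1, 1, -1, -1])

w = fit_least_squares(X_train, y_train)
preds = predict(w, X_train)
accuracy = float(np.mean(preds == y_train))
```

In practice the rows of `X_train` would be feature vectors computed from words, and accuracy would be measured on held-out data as well as on the training set.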

I wrote a function that generates every two-letter sequence in the alphabet to use as features, and I also manually added some common French and Spanish sequences and prefixes. The model achieved an accuracy of at least 75% on the training data. It also performed well on unseen data, achieving an accuracy of 84.12% on the leaderboard.
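The feature construction described above might look roughly like the following sketch. It is not the repository's implementation: the helper names `all_bigrams` and `featurize`, the example lists `EXTRA_SEQUENCES` and `EXTRA_PREFIXES`, and the choice of binary presence features are assumptions for illustration.

```python
from itertools import product
from string import ascii_lowercase

def all_bigrams():
    """Generate every two-letter sequence over a-z (26 * 26 = 676 strings)."""
    return ["".join(pair) for pair in product(ascii_lowercase, repeat=2)]

# Hypothetical hand-picked features; the README does not list the actual ones.
EXTRA_SEQUENCES = ["eau", "que", "ción"]
EXTRA_PREFIXES = ["des", "re"]

def featurize(word):
    """Map a word to a binary vector: 1.0 if the feature is present, else 0.0."""
    word = word.lower()
    substring_features = all_bigrams() + EXTRA_SEQUENCES
    vector = [1.0 if seq in word else 0.0 for seq in substring_features]
    # Prefix features only fire when the word starts with the prefix.
    vector += [1.0 if word.startswith(p) else 0.0 for p in EXTRA_PREFIXES]
    return vector

vec = featurize("bateau")
num_features = len(vec)
```

Each word becomes one fixed-length row of the feature matrix, so a list of words can be stacked into a matrix and passed to a linear classifier.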

Acknowledgements: Professor Justin Eldridge, UC San Diego.


Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Language Classifier

DSC 140A: Probabilistic Modeling and Machine Learning at UC San Diego

About

Releases

Packages

Languages

malmahasnah/languageclassifier

Folders and files

Latest commit

History

Repository files navigation

Language Classifier

DSC 140A: Probabilistic Modeling and Machine Learning at UC San Diego

About

Topics

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages