deepthivenkat/Language-detection-using-LSTM

The model performs very well at predicting the next character of a word. However, since English and French share most of their alphabet and many words, such as ‘declaration’, occur in both languages, using this model for language prediction may not be the best idea. A simple logistic regression model performs much better; a minimal sketch of such a baseline follows.
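
The baseline itself is not included in this repository, so the following is only a sketch of what a character-level logistic regression baseline could look like, assuming scikit-learn; the tiny `texts`/`labels` pairs are placeholders for the actual English/French training data.

```python
# Hypothetical character n-gram + logistic regression baseline (not the
# repository's actual code). `texts` and `labels` stand in for the real
# English/French training corpus.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

texts = ["the declaration of independence", "la déclaration des droits"]
labels = ["en", "fr"]

# Character 1- to 3-grams capture spelling patterns that still differ
# between the two languages even when whole words are shared.
baseline = make_pipeline(
    CountVectorizer(analyzer="char", ngram_range=(1, 3)),
    LogisticRegression(),
)
baseline.fit(texts, labels)
print(baseline.predict(["les droits de l'homme"]))
```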

The other approaches I have explored include:

  1. Hyperparameter optimization
  2. Using a simple LSTM and a stacked LSTM
  3. Classification using language labels, although in the real world language labels may not always be available
  4. Early stopping using a validation split
  5. Calculating the distance between a language profile and a document profile by generating n-grams for different languages (see the n-gram profile sketch after these lists)

Five ways to improve the model:
  1. Limiting the batch size to a number less than 1000
  2. Running it for an optimal number of epochs to avoid overfitting. This can be achieved by enabling early stopping on the validation loss and setting the patience variable; with early stopping, the model stops training as soon as it stops learning (see the early-stopping sketch below).
  3. Adding multiple LSTM stacks (see the stacked-LSTM sketch below).
  4. Enabling stateful = True in the LSTM. This lets the recurrent model reuse its internal states: the states at the end of one batch serve as the initial states for the next batch (see the stateful-LSTM sketch below).
  5. Using categorical_crossentropy as the loss and rmsprop as the optimizer, together with softmax activation, works best; this configuration gave the best ROC AUC score and accuracy.
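
To make the stacking and compilation points concrete, here is a minimal Keras sketch of a stacked character-level LSTM using the loss, optimizer, and activation named above. The layer width, sequence length, and vocabulary size are assumed values, not the repository's actual settings.

```python
# Minimal sketch of a stacked character-level LSTM (Keras).
# maxlen (sequence length) and vocab_size are assumed values.
from keras.models import Sequential
from keras.layers import LSTM, Dense

maxlen, vocab_size = 40, 60

model = Sequential()
# The first LSTM returns the full sequence so a second LSTM can stack on top.
model.add(LSTM(128, return_sequences=True, input_shape=(maxlen, vocab_size)))
model.add(LSTM(128))
# Softmax over the character vocabulary for next-character prediction.
model.add(Dense(vocab_size, activation="softmax"))
# The combination reported above as working best.
model.compile(loss="categorical_crossentropy", optimizer="rmsprop")
```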
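
Early stopping on the validation loss with a patience setting could be wired up as below, reusing `model` from the previous sketch; `X_train`/`y_train` are hypothetical one-hot-encoded character arrays, and the concrete numbers are assumptions.

```python
# Minimal early-stopping sketch: stop when validation loss stops improving.
# X_train / y_train are hypothetical placeholders; patience=3 is assumed.
from keras.callbacks import EarlyStopping

early_stop = EarlyStopping(monitor="val_loss", patience=3)
model.fit(
    X_train, y_train,
    validation_split=0.2,   # hold out 20% of the data as a validation set
    batch_size=512,         # kept below 1000, per the first improvement
    epochs=100,             # an upper bound; training stops earlier
    callbacks=[early_stop],
)
```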
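
For stateful = True, a sketch of how the state carry-over works: a stateful Keras LSTM needs a fixed batch_input_shape, and its states are typically reset manually at epoch boundaries so each epoch starts fresh. All sizes here are assumptions.

```python
# Minimal stateful-LSTM sketch: the internal states at the end of one
# batch are carried over as the initial states of the next batch.
from keras.models import Sequential
from keras.layers import LSTM, Dense

batch_size, maxlen, vocab_size = 32, 40, 60  # assumed values

stateful_model = Sequential()
# A stateful LSTM requires a fixed batch size via batch_input_shape.
stateful_model.add(LSTM(128, stateful=True,
                        batch_input_shape=(batch_size, maxlen, vocab_size)))
stateful_model.add(Dense(vocab_size, activation="softmax"))
stateful_model.compile(loss="categorical_crossentropy", optimizer="rmsprop")

# With shuffle=False the carried-over state stays aligned with the data;
# states are reset manually between epochs (X_train / y_train hypothetical):
# for epoch in range(n_epochs):
#     stateful_model.fit(X_train, y_train, batch_size=batch_size,
#                        epochs=1, shuffle=False)
#     stateful_model.reset_states()
```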
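
Finally, the n-gram profile distance from the first list can be sketched without any neural network. This follows the classic rank-order (“out-of-place”) distance between frequency-ranked n-gram profiles; the one-line corpora are placeholders for real language profiles built from much more text.

```python
# Minimal n-gram profile sketch: rank character n-grams by frequency per
# language, then score a document by how far its n-gram ranks are from
# each language profile (the "out-of-place" distance).
from collections import Counter

def ngrams(text, n=3):
    return [text[i:i + n] for i in range(len(text) - n + 1)]

def profile(text, n=3, top=300):
    # Map each of the `top` most frequent n-grams to its frequency rank.
    common = Counter(ngrams(text.lower(), n)).most_common(top)
    return {g: rank for rank, (g, _) in enumerate(common)}

def distance(doc_profile, lang_profile, max_rank=300):
    # n-grams missing from the language profile incur the maximum penalty.
    return sum(abs(rank - lang_profile.get(g, max_rank))
               for g, rank in doc_profile.items())

# Placeholder corpora standing in for real English/French profiles.
profiles = {"en": profile("the quick brown fox jumps over the lazy dog"),
            "fr": profile("le renard brun rapide saute par-dessus le chien")}

doc = profile("the lazy dog")
print(min(profiles, key=lambda lang: distance(doc, profiles[lang])))
```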
