@Chetan-Yeola According to the presentation page of this other library, langdetect performs poorly on texts about as long as Twitter messages ("For very short text snippets such as Twitter messages, they do not provide adequate results."). So anything under 280 characters might give poor results, assuming the page does not exaggerate the problem. However, the page is a bit vague, and the threshold (if there is one) might be higher than 280 characters. It probably also depends on the language: I guess some languages are much easier to detect than others (e.g. compare detecting Hebrew, which uses a rare alphabet, with detecting Spanish, which is very similar to other Romance languages).
But you could test this automatically with a large sample of short texts taken from the various language editions of Wikipedia, to see whether the error rate is acceptable for your requirements. That page does not mention the classification error rate they observed when making this statement, so if your own error-rate requirements are very liberal, it may be worth taking the time to test.
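A rough sketch of what such a test could look like, in case it helps. The helper names (`error_rate_by_length`, `make_langdetect_fn`) and the truncation lengths are my own assumptions, not anything from langdetect; the only langdetect calls used are `detect` and `DetectorFactory.seed`. You would have to supply the labelled samples yourself (e.g. paragraphs pulled from Wikipedia, paired with the language code you expect).

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, Sequence, Tuple


def error_rate_by_length(
    samples: Iterable[Tuple[str, str]],
    detect_fn: Callable[[str], str],
    lengths: Sequence[int] = (20, 50, 100, 280),  # arbitrary cut-offs, adjust freely
) -> Dict[int, float]:
    """Return {length: fraction of misdetected snippets} per truncation length.

    `samples` is an iterable of (expected_language_code, text) pairs.
    """
    errors = defaultdict(int)
    totals = defaultdict(int)
    for expected_lang, text in samples:
        for n in lengths:
            if len(text) < n:
                continue  # sample too short to be truncated to n characters
            snippet = text[:n]
            try:
                predicted = detect_fn(snippet)
            except Exception:
                predicted = None  # a failed detection counts as an error
            totals[n] += 1
            if predicted != expected_lang:
                errors[n] += 1
    # Lengths for which no sample was long enough are left out of the result.
    return {n: errors[n] / totals[n] for n in lengths if totals[n]}


def make_langdetect_fn() -> Callable[[str], str]:
    """Hypothetical helper: return langdetect's detect() with a fixed seed."""
    from langdetect import DetectorFactory, detect  # imported lazily

    DetectorFactory.seed = 0  # langdetect is non-deterministic without a seed
    return detect
```

Usage would be something like `error_rate_by_length(my_samples, make_langdetect_fn())`, where `my_samples` is your own list of `(language_code, text)` pairs. The detector is passed in as a function so the same harness can be run against another library for comparison. Truncating by character count can cut a word in half, which may make the results slightly pessimistic.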
What is considered a 'short text' in langdetect, and is there a specific minimum text length threshold for reliable language detection?