Group conversation comments classification #15

Open
phpduke opened this issue Jun 9, 2019 · 3 comments

Comments

@phpduke
Contributor

phpduke commented Jun 9, 2019

Is your feature request related to a problem? Please describe.
Yes. In group chats, one person typically posts something like 'I got into university X', and 50 replies follow: 'wow', 'nice', 'congrats', 'you did it', and so on. All of these have the same status as a regular message, so a user who comes online 3 hours later has to scroll up through 50 such replies to find the main message. As a result, people in active groups miss many important messages.

Describe the solution you'd like
Ideally, such replies should be identified as comments and shown collapsed by default under the main post. In the scenario above, the main message would be "I got into university X", with only a "50 comments made" link below it; clicking the link should expand them all.

Describe alternatives you've considered
Providing an explicit comment option. However, that depends on users and would require educating them to use the chat app more responsibly, which is not always practical. It is better if we can do this using NLP.

Additional context
One challenge with the NLP approach is that supporting multiple languages requires a separately trained model per language, which does not scale. This is an open problem for us at the moment, and it would be amazing if someone could come up with a solution that scales across multiple low-resource languages.

@Mishrasubha
Contributor

Mishrasubha commented Mar 18, 2020

Hey, I am applying in RGSoC this year and would like to get assigned to this issue.

@phpduke
Contributor Author

phpduke commented Mar 20, 2020

> Hey, I am applying in RGSoC this year and would like to get assigned to this issue.

Sure @Mishrasubha. Can you please post the ideas you have in mind for solving this problem in the Slack group?

@Mishrasubha
Contributor

A. 'I got into university X' followed by 50 messages like 'wow', 'nice', 'congrats', 'you did it'

  1. This task can be modeled as a classification task: predict whether a response is relevant to a query or statement.
  2. If multiple responses are relevant to the same query or statement, all of them are coalesced under it as a single group.
  3. This can be done in different ways; the key is how a message and its responses are represented.
    a. A network of LSTMs with shared weights encodes the message and the response from word embeddings.
    b. The dot product of the two final hidden states measures the relevance of a response conditioned on the original message.
    c. The final output layer is a sigmoid, trained with mean squared error as the loss function.
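
To make steps 3b and 3c concrete, here is a minimal sketch of just the scoring head and the grouping from step 2, in plain Python. It assumes the final hidden states are already available; the shared-weight LSTM encoder is not implemented, and the vectors, the `collapse` helper, and the 0.5 threshold are made up for illustration.

```python
import math


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def relevance(h_message, h_response):
    """Sigmoid of the dot product of the two final hidden states (steps 3b, 3c)."""
    return sigmoid(dot(h_message, h_response))


def mse_loss(predictions, labels):
    """Mean squared error between predicted relevance and 0/1 labels."""
    return sum((p - y) ** 2 for p, y in zip(predictions, labels)) / len(labels)


def collapse(h_main, responses, threshold=0.5):
    """Split (text, hidden_state) pairs into comments on the main message
    and standalone messages. The threshold is an assumed hyperparameter."""
    comments, standalone = [], []
    for text, h in responses:
        if relevance(h_main, h) >= threshold:
            comments.append(text)
        else:
            standalone.append(text)
    return comments, standalone


# Made-up hidden states standing in for the LSTM encoder output.
h_main = [1.0, 0.5]  # "I got into university X"
responses = [
    ("congrats", [2.0, 1.0]),
    ("wow", [1.0, 1.0]),
    ("anyone free for lunch?", [-2.0, -1.0]),
]

comments, standalone = collapse(h_main, responses)
print(f"{len(comments)} comments made")
```

In a real model the threshold and the encoder weights would be learned or tuned on labeled message/response pairs; here they only show how a score turns into a collapsed thread.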

B. For multiple languages

  1. The first task is identifying the language of each word.
  2. We will consider approaches used for language identification in code-mixed settings (where words from different languages are written in Roman script).
  3. Language identification is a classification task in which each class denotes a language.
    a. For each word, we use an embedding per character, and an LSTM/BiLSTM learns the word's representation.
    b. The final layer is a softmax with as many output nodes as there are languages.
  4. Once the language is identified, the words belonging to a specific language form the response for that language.
  5. For every language, we can have a model for task A to find the relevance of the responses.
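
Steps 3b and 4 can be sketched as follows, again in plain Python. The sketch assumes the character-level LSTM/BiLSTM has already produced one logit per language for each word; the label set `LANGUAGES`, the example words, and the logit values are hypothetical.

```python
import math
from collections import defaultdict

LANGUAGES = ["en", "hi"]  # hypothetical label set, one softmax node per language


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict_language(logits):
    """Pick the language whose softmax probability is highest (step 3b)."""
    probs = softmax(logits)
    return LANGUAGES[probs.index(max(probs))]


def split_by_language(words, logits_per_word):
    """Group the words of one message by predicted language (step 4)."""
    groups = defaultdict(list)
    for word, logits in zip(words, logits_per_word):
        groups[predict_language(logits)].append(word)
    return dict(groups)


# Made-up logits standing in for the output of the character-level model.
words = ["congrats", "bhai", "bahut", "nice"]
logits = [[2.0, -1.0], [-0.5, 1.5], [-1.0, 2.0], [1.0, 0.0]]

groups = split_by_language(words, logits)
print(groups)
```

Each per-language word group could then be passed to the corresponding task A model, as step 5 proposes.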
