
Output layer question #178

Open
bkj opened this issue Jun 8, 2018 · 6 comments


bkj commented Jun 8, 2018

The output layer in these networks is often a bottleneck, because you have to do a (batch_size, hidden_dim) by (hidden_dim, num_classes) dense matrix multiplication. It doesn't seem like you'd get a speedup there just by avoiding storing/multiplying by zeros -- are you doing any tricks to reduce the cost of that operation?

Thanks
~ Ben
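
For concreteness, here is a rough back-of-the-envelope sketch of the cost being described. The sizes below are made up for illustration; they are not taken from this repo:

```python
# Rough cost sketch with hypothetical sizes (not taken from DSSTNE).
batch_size  = 256
hidden_dim  = 1024
num_classes = 500_000   # e.g. an extreme-classification / recommendation setting

# A dense (batch_size, hidden_dim) x (hidden_dim, num_classes) GEMM costs
# roughly 2 * B * H * C FLOPs, no matter how sparse the inputs were upstream.
output_flops = 2 * batch_size * hidden_dim * num_classes
hidden_flops = 2 * batch_size * hidden_dim * hidden_dim   # same-width hidden layer, for comparison

print(f"output layer: ~{output_flops / 1e9:.1f} GFLOPs per batch")
print(f"hidden layer: ~{hidden_flops / 1e9:.1f} GFLOPs per batch")
```

Even with very sparse inputs, the output projection still touches every class, which is why it tends to dominate as num_classes grows.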

scottlegrand (Contributor) commented

We have some ideas here based on approximate kNN methods. Stay tuned.


bkj commented Jun 12, 2018

Interesting -- are you thinking just for inference or for both inference and training?


scottlegrand commented Jun 12, 2018 via email


bkj commented Jun 12, 2018

Relevant code and paper, if you haven't seen it:
https://github.com/rdspring1/LSH_DeepLearning
I don't think they did it on GPUs or on big models, but it may be interesting.
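
For anyone finding this later, here is a minimal sketch of the idea in that paper as I understand it: hash the output-layer weight rows with signed random projections, then only score the classes whose bucket matches the current hidden activation. This is toy illustration code, not the rdspring1 implementation and not anything from this repo:

```python
import numpy as np

# Toy signed-random-projection LSH sketch for sampling output classes,
# in the spirit of Spring & Shrivastava. All sizes are hypothetical.
rng = np.random.default_rng(0)
hidden_dim, num_classes, num_bits = 128, 50_000, 12

W = rng.standard_normal((num_classes, hidden_dim)).astype(np.float32)    # output weights
planes = rng.standard_normal((num_bits, hidden_dim)).astype(np.float32)  # hash hyperplanes

def hash_codes(x):
    """Sign of random projections, packed into an integer bucket id."""
    bits = (x @ planes.T) > 0
    return bits @ (1 << np.arange(num_bits))

# Pre-hash every class's weight vector into a bucket -> class-ids table.
buckets = {}
for cls, code in enumerate(hash_codes(W)):
    buckets.setdefault(int(code), []).append(cls)

def approx_output(h):
    """Score only the classes colliding with the hidden activation h."""
    candidates = buckets.get(int(hash_codes(h[None, :])[0]), [])
    if not candidates:
        return {}
    logits = W[candidates] @ h   # small GEMV instead of a full (num_classes, hidden_dim) product
    return dict(zip(candidates, logits))

h = rng.standard_normal(hidden_dim).astype(np.float32)
print(f"scored {len(approx_output(h))} of {num_classes} classes")
```

A real implementation would use several independent hash tables (and, during training, always include the true class among the candidates); a single table like this can return very few or even zero candidates.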


scottlegrand commented Jun 12, 2018 via email


bkj commented Nov 21, 2018

Any updates on this? I'm trying to find an example of a library that uses approximate kNN methods to speed up the output layer.
