
[WIP] make autobatching matrix multiplies more flexible #566

Open · wants to merge 1 commit into master

Conversation

@yoavg (Contributor) commented May 25, 2017

This introduces two batchable versions of matmul: one in which the first argument is part of the signature, and a second in which it is not. The first version is triggered for cases where the first argument is shared with >2 matmul nodes.
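The idea above can be sketched as follows. This is a hedged illustration of the two signature variants, not DyNet's actual implementation; the `Node` class and function names are hypothetical.

```python
# Illustrative sketch (not DyNet's real API) of the two matmul
# signature variants: one that folds the first argument's identity
# into the autobatching signature, and one that does not.
from dataclasses import dataclass
from typing import Tuple

@dataclass(frozen=True)
class Node:
    op: str
    dims: Tuple[int, ...]
    args: Tuple["Node", ...] = ()

def matmul_sig(node: Node, first_arg_uses: int):
    """Build an autobatching signature for a matmul node.

    If the first argument is shared by more than two matmul nodes,
    its identity becomes part of the signature, so only matmuls with
    that exact argument batch together and the shared matrix can be
    reused directly. Otherwise only the shapes matter.
    """
    if first_arg_uses > 2:
        return ("matmul", id(node.args[0]), node.dims)
    return ("matmul", node.dims)
```

Two matmuls sharing the same first argument then produce identical signatures under the first variant, so the autobatcher can execute them as one multiply against a concatenation of the remaining arguments.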

@neubig (Contributor) commented May 26, 2017

Hmm, I really don't like the special handling of matrix multiplies here.
Even if we do introduce something like this I think we should try to be more general.

@yoavg (Contributor, Author) commented May 26, 2017

We can easily count how many different nodes a given node is an argument of, instead of how many matmuls. But why is that better? Where is this relevant besides matmul (and maybe affine transform in the future)?

@neubig (Contributor) commented May 26, 2017

Ones that are highly relevant in addition to matmul are affine transform, conv2d, tensor contraction, etc. (I'm probably forgetting several). All of these will be much faster if you share parameters vs. iterating over them.

Everything else with an arity over 2 is somewhat relevant. Sharing parameters will reduce the need for a memory copy at least.

Basically my main priority is either that all nodes be treated the same, or alternatively that we have a couple of equivalence classes like "benefit largely from grouping" or "don't benefit much from grouping", so when we implement a new node we can specify which one it belongs to.

@neubig (Contributor) commented May 26, 2017

I thought about this a little more. What about keeping a reference count for each node, then providing this reference count to the autobatch_profile() function? For operations where grouping is beneficial, the profile can then decide to group together arguments based on their reference count. For example, we may want to group together all arguments that have a reference count equal to the maximum of all the incoming arguments. This will work for matmul of course (the parameter matrix should have more references than others), and will work for AffineTransform as well, because the weight matrix and bias vector should both have the same reference count most of the time. How does this sound? If it seems reasonable I can try it.

@yoavg (Contributor, Author) commented May 26, 2017

I don't fully understand the details of your proposal, but it sounds like extending the count from "first arg of a matmul" to "arg of anything", plus a cleverer way of setting the threshold. Sure, sounds good to me, go for it.

Another option I thought about (which is a little less automatic) is to add a "shared" flag to nodes. Param nodes would have it on by default; for others it could be turned on. The autobatch_sig and autobatch_concat methods would look at this flag for the args and act accordingly. But if you can get your proposal to work, let's go for it.
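The explicit-flag alternative could look roughly like this; a hedged sketch with illustrative names only (the `shared` field and `autobatch_sig` here are hypothetical, not the library's real signatures):

```python
# Sketch of the less-automatic alternative: an explicit "shared" flag
# on nodes, on by default for parameters, consulted when building the
# autobatching signature.
from dataclasses import dataclass
from typing import Tuple

@dataclass(frozen=True)
class Node:
    op: str
    args: Tuple["Node", ...] = ()
    shared: bool = False  # parameter nodes would default this to True

def autobatch_sig(node):
    # Flagged-shared arguments become part of the signature; the rest
    # are what autobatch_concat would stack into a batch.
    return (node.op, tuple(id(a) for a in node.args if a.shared))
```

This trades the automation of reference counting for predictability: whatever is flagged shared is always folded into the signature, with no dependence on graph-wide usage counts.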

@neubig (Contributor) commented May 26, 2017

Cool, I'll give this a shot.

@neubig neubig changed the title introduce a version of matmul which does not depend on first arg [WIP] make autobatching matrix multiplies more flexible Sep 22, 2017