
Adds nn.Normalize #341

Merged
merged 1 commit into torch:master on Sep 14, 2015
Conversation

@fmassa (Contributor) commented on Aug 1, 2015

Generalizes #260 from @karpathy to accept arbitrary L_p norms.
A few remarks:

  • Maybe LpNormalize or Normalize would be a better name?
  • It only accepts 1D/2D inputs. @nicholas-leonard proposed adding a dim parameter to make it equivalent to torch.norm. Is it worth it, given that we have SpatialBatchNormalization?
  • updateGradInput is quite memory-consuming for large d.

cc: @bamos
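
A minimal usage sketch of the module as described above, assuming nn.Normalize(p) takes the norm order p and normalizes each row of a 2D input:

require 'nn'

local m = nn.Normalize(2)        -- L2-normalize each input vector
local x = torch.randn(8, 10)     -- a batch of 8 vectors of dimension 10
local y = m:forward(x)
print(y:norm(2, 2))              -- each row of the output now has unit L2 norm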

@szagoruyko (Member) commented:

Definitely the name has to be Normalize, not Norm.
Is it hard to extend to other dimensions? SpatialBatchNormalization is different.

@fmassa (Contributor, Author) commented on Aug 2, 2015

@szagoruyko Maybe it's not too difficult to extend it to work for other dimensions, by viewing the input as 2D with the last dimension being the normalized dimension, but one also has to take care of batched/non-batched inputs. Maybe we could add a setNumInputDims function, as in nn.View?
I'll spend some more time trying to make it more generic.

Is everybody ok with renaming it to Normalize?
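
A rough illustration of the reshape idea above, with hypothetical names (not code from this PR):

-- Flatten all leading dimensions into one, normalize the 2D view,
-- then restore the original shape. Handling setNumInputDims and
-- batched vs. non-batched inputs is the part that needs care.
local function normalizeLastDim(module, input)
   local d = input:size(input:dim())
   local n = input:nElement() / d
   local flat = input:contiguous():view(n, d)   -- view as (n, d)
   local out = module:forward(flat)
   return out:viewAs(input)
end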

@karpathy (Contributor) commented on Aug 2, 2015

Why is it obvious that the name should be Normalize and not Norm? As far as I can tell, almost all operations from torch are ported to nn layers without name changes. E.g., analogous to the max operation there is nn.Max, so why is it obvious that the norm operation should be nn.Normalize? Shouldn't we have nn.Maximize then? My first instinct would be to stick with the current naming for consistency.

@soumith (Member) commented on Aug 2, 2015

torch.norm returns the p-norm. Isn't this module also normalizing the input by the returned norm?

@karpathy (Contributor) commented on Aug 2, 2015

Good point. I'd almost suggest that there should be both an nn.Norm that does exactly what norm does, and an nn.Normalize that also does a div right afterwards. But perhaps that gets a bit too hairy then :)
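
In plain Torch calls, the distinction being discussed is roughly:

local x = torch.randn(10)
local n = x:norm(2)   -- what an nn.Norm layer would output: the scalar p-norm
local y = x / n       -- what nn.Normalize outputs: the input divided by that norm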

@soumith (Member) commented on Aug 2, 2015

That's actually a good idea :)

@soumith changed the title from "Adds Norm" to "Adds Normalize" on Sep 4, 2015
@soumith changed the title from "Adds Normalize" to "Adds nn.Normalize" on Sep 4, 2015
@soumith (Member) commented on Sep 4, 2015

@fmassa looks like it's good to go, isn't it?

@fmassa (Contributor, Author) commented on Sep 4, 2015

It's good. I just need to squash the commits.
I didn't have the time to add a dimension parameter though; it would complicate the logic a bit because of the setNumInputDims function. But it could be added later if needed.

@soumith (Member) commented on Sep 11, 2015

squash, good to go.

@fmassa (Contributor, Author) commented on Sep 12, 2015

@soumith I just squashed the commits.

soumith added a commit that referenced this pull request on Sep 14, 2015
@soumith merged commit 5fa1a8b into torch:master on Sep 14, 2015
@soumith (Member) commented on Sep 14, 2015

Thanks a lot, Francisco, for the excellent PR.

@fmassa deleted the norm branch on September 14, 2015 at 06:22
@ffmpbgrnn commented:

A lot of memory is needed in the backprop of this module. One reason might be creating the eyeExpand matrix and later doing the multiplication. In my case, with a batch size of 64 and an input dimension of 4800, two Normalize layers run out of memory on a 4 GB GPU. Any ideas for implementing a more space-efficient Normalize layer?
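
A back-of-the-envelope check, assuming the backward materializes a batchSize x d x d buffer (which is what the eyeExpand name suggests):

local batchSize, d = 64, 4800
local elements = batchSize * d * d   -- about 1.47e9 floats
print(elements * 4 / 2^30)           -- about 5.5 GB of float storage, already past a 4 GB card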

@fmassa (Contributor, Author) commented on Oct 3, 2015

@ffmpbgrnn here is a version of Normalize which uses much less memory (it no longer depends on the batch size). It should be slower on GPU. The tests pass, so it should be fine. Use it with fastMode(false).
fmassa@015ba9c
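
For intuition about why the backward does not need the large buffer, the p = 2, single-vector case has a closed-form gradient; an illustrative sketch (not the code in fmassa@015ba9c):

-- y = x / ||x||_2  =>  dL/dx = (gradOutput - y * <gradOutput, y>) / ||x||_2
local function l2NormalizeBackward(x, gradOutput)
   local norm = x:norm(2)
   local y = x / norm
   return (gradOutput - y * torch.dot(gradOutput, y)) / norm
end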

@ffmpbgrnn commented:

@fmassa thanks for your work. I will give it a try!

@fmassa (Contributor, Author) commented on Oct 5, 2015

@ffmpbgrnn so, does this patch work fine for you? Is it much slower than the previous version?
Maybe we could push this simplified version to master (taking out the faster mode to keep things simple)?
cc @soumith

@ffmpbgrnn commented:

@fmassa, sorry for my late response. Yes, it passes the test. But when I test fastMode(false) on the GPU, it's even faster than fastMode(true).

require 'nn'
require 'cutorch'
require 'cunn'

-- time 100 forward/backward passes of nn.Normalize(2) on a 64 x 2400 CUDA batch
local module = nn.Normalize(2):cuda()
module:fastMode(false)
local input = torch.rand(64, 2400):cuda()
local t = torch.Timer()
for i = 1, 100 do
    module:forward(input)
    module:backward(input, input)
    print(i)   -- progress
end
print(t:time().real / 100)   -- average seconds per iteration

fastMode(false): 0.09 s per iteration
fastMode(true): 0.12 s per iteration
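
One caveat when reading these numbers: cutorch launches kernels asynchronously, so synchronizing before reading the timer gives a more faithful per-iteration figure. A minimal variant of the same loop, assuming module and input are defined as above:

cutorch.synchronize()
local t = torch.Timer()
for i = 1, 100 do
    module:forward(input)
    module:backward(input, input)
end
cutorch.synchronize()        -- wait for queued kernels before stopping the clock
print(t:time().real / 100)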
