added reset() function to sequential model #1908
Conversation
Ideally this pull request fixes #609.
Hmm, but how is this different from just re-calling `compile()` if you are recompiling anyway?
For me, compiling didn't reset the weights; I'm not entirely sure why that's the case. I needed both: calling `layer.build()` for each layer (which does the actual weight randomizing) and `model.compile()`. Otherwise it was still reusing the old weights. And to add to that: after rebuilding each layer, compiling would fail, claiming that the layers were ill-specified. So after resetting the layer parameters I additionally had to restore the model structure (layer hyperparameters) from the config. The new `model = model.reset()` does all that in one go. But if there is any easier / cleaner way of doing that, please completely disregard my changes and create a better pull request. My solution feels a bit hacky.
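In outline, the reset described above amounts to something like the following sketch. This assumes the Keras 0.x API of the time and the `reset` flag this pull request adds to `model_from_config` (see the diff below); the loss and optimizer here are placeholders for whatever the original model was compiled with:

```python
from keras.models import model_from_config

def reset(model):
    # Rebuild the model from its own config. With reset=True, model_from_config
    # calls layer.build() on every layer, re-randomizing its weights.
    new_model = model_from_config(model.get_config(), reset=True)
    # Recompile, since the rebuilt model has no compiled train/predict functions.
    new_model.compile(loss='mse', optimizer='sgd')  # placeholder loss/optimizer
    return new_model
```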
```diff
@@ -181,6 +181,9 @@ def model_from_config(config, custom_objects={}):
     elif model_name == 'Sequential':
         model.__class__ = Sequential
     model.name = model_name
+    if reset:
+        for layer in model.layers:
+            layer.build()
```
This will break if `layer` is a container.
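One hedged way to handle the container problem (a sketch; it assumes containers expose their children via a `layers` attribute, which may not hold for every container type):

```python
def rebuild(layer):
    # Containers hold other layers; recurse into them instead of
    # calling build() on the container itself.
    if hasattr(layer, 'layers'):
        for inner in layer.layers:
            rebuild(inner)
    elif hasattr(layer, 'build'):
        layer.build()  # leaf layer: re-randomize its weights
```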
You shouldn't need to re-compile, because not having to recompile is maybe the single most important aspect of a reset function. What a reset function should do is actually the equivalent of `set_weights` with freshly re-initialized weights.
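A minimal sketch of that idea, assuming the initial weights are captured once, right after `compile()` and before any training (`X_train`, `y_train`, and the epoch count are placeholders):

```python
initial_weights = model.get_weights()  # snapshot of the freshly initialized weights

model.fit(X_train, y_train, nb_epoch=10)  # ...train...

model.set_weights(initial_weights)  # back to the initial state, no recompile needed
```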
@fchollet I tried the following: at the end of `model.Sequential.compile()` I added a line to store the weights of each layer in a list (`self.stored_weights`). Then:

```python
def reset(self, use_stored=False):
    if use_stored:
        print("using soft reset")
        for i in range(len(self.stored_weights)):
            self.layers[i].set_weights(self.stored_weights[i])
```

Now this works, as in: it resets the weights (tested with a few examples) and it doesn't need to recompile. BUT: if I try to actually randomize the weights again, which is, to my understanding, done in each layer's `build()` function, it doesn't work. To verify this, I added a 'hard reset' function that goes through each layer I marked as resettable (like Dense, but not Activation) and calls its build function. I looked at the weights before and after `layer.build()` and they are completely different: the weights are randomized again and, in the case of Dense, the biases are set to zero. To see what I mean:

```python
def reset(self, use_stored=False):
    if use_stored:
        print("using soft reset")  # this works
        for i in range(len(self.stored_weights)):
            self.layers[i].set_weights(self.stored_weights[i])
    else:
        print("using hard reset")  # this doesn't work
        for i in range(len(self.layers)):
            print(self.layers[i].get_weights()[0])  # these are the old weights
            self.layers[i].build()
            print(self.layers[i].get_weights()[0])  # entirely different from the old weights above
            self.layers[i].set_weights(self.layers[i].get_weights())  # tried with and without this line... no difference
            # get_weights() gives different weights after build()...
            # how does this not work while the soft reset above does?
```
....ahhh, I think I get it. The Keras backend isn't using the actual objects, but pointers to the specific objects that were used in the function. Therefore the soft reset works, because both the old weights and the stored weights point to the same variable, which was used in the loss function. But in the case of the hard reset I am creating an entirely new variable in memory that no function is using. I'll try that out.
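A tiny Theano illustration of that pointer behavior (a sketch, assuming Theano is installed; the variable names are made up):

```python
import numpy as np
import theano

w = theano.shared(np.ones(3), name='w')  # this object is baked into the graph
f = theano.function([], w * 2)

w.set_value(np.zeros(3))            # in-place update of the SAME shared variable
print(f())                          # [0. 0. 0.] -- the compiled function sees it

w = theano.shared(np.full(3, 5.0))  # rebinding the name creates a NEW variable
print(f())                          # still [0. 0. 0.] -- f keeps using the old one
```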
okay, yes, that was the issue. I solved it for the Dense layer (as an example) like this (not taking into account initial weights, if there were any):

```python
def reinit_weights(self):
    input_dim = self.input_shape[1]
    new_weights = self.init((input_dim, self.output_dim), name='{}_W'.format(self.name))
    self.trainable_weights[0].set_value(new_weights.get_value())
```

This will write the new weights into the same shared Theano variable that the loss function is using. Only problem: now I have to write this for each layer, because each layer has a different init. I'll do this in the next few days and then submit a new pull request.
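A hedged sketch of how that per-layer work might be generalized in the same Keras 0.x style (`reinit_all` is a made-up name, and the sketch assumes every trainable weight can be rebuilt from `layer.init` plus its current shape, which does not hold for, e.g., zero-initialized biases):

```python
def reinit_all(model):
    for layer in model.layers:
        if not hasattr(layer, 'init'):
            continue  # layers like Activation have nothing to reset
        for w in layer.trainable_weights:
            shape = w.get_value().shape
            # write fresh values into the SAME shared variable the graph uses
            w.set_value(layer.init(shape).get_value())
```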
@Kielland After @fchollet's input I created another pull request at #2079.
@fgolemo, thanks for that. Based on your comments above, I gave it a shot at solving this in my model code. It didn't succeed, however, so I've asked the community. Feel free to add any insights :-) Thanks!
@fgolemo, just to follow up. I found a practical solution by @jkleint here:
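For reference, a commonly cited fix in this family (a sketch, not necessarily @jkleint's exact code; it assumes Keras 2 on the TensorFlow 1.x backend) re-runs each variable's initializer op in the backend session:

```python
from keras import backend as K

def reset_weights(model):
    session = K.get_session()         # TF1-style session behind the Keras backend
    for layer in model.layers:
        if hasattr(layer, 'kernel'):  # e.g. Dense / Conv layers
            layer.kernel.initializer.run(session=session)
        if getattr(layer, 'bias', None) is not None:
            layer.bias.initializer.run(session=session)
```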
Why is this closed? There is no reset function for Graph models. Is the solution to place the model-building code within the for loop, to ensure weights and optimizer states get re-initialized?
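A sketch of that rebuild-in-the-loop pattern (`build_model` is a hypothetical helper, and `splits`, `X`, `y` are placeholder cross-validation data):

```python
from keras.models import Sequential
from keras.layers import Dense

def build_model():
    model = Sequential()
    model.add(Dense(64, activation='relu', input_dim=20))
    model.add(Dense(1))
    model.compile(loss='mse', optimizer='adam')
    return model

for train_idx, val_idx in splits:  # placeholder CV splits
    model = build_model()          # fresh weights AND fresh optimizer state
    model.fit(X[train_idx], y[train_idx],
              validation_data=(X[val_idx], y[val_idx]))
```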
There is a great solution posted here. You can use |
Once the model is created and trained, it can be reset like this:

@fchollet: if this isn't a clean way of doing it, please suggest a different solution.