I encountered some problems using the masking layer. Instead of skipping the padded timesteps, the network computes gradients over them and produces NaN values. More specifically, I padded the sequences with the value -1.0 using the pad_sequences function implemented in Keras, and then trained the model using the train_on_batch method.
Have you run into this kind of problem before?
Could this note from the Keras documentation be the reason? "If any downstream layer does not support masking yet receives such an input mask, an exception will be raised."
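For reference, here is a minimal sketch of the setup described above, assuming the standard Keras `Masking`/`LSTM` API; the data, layer sizes, and sequence lengths are placeholders, not taken from my actual code:

```python
# Minimal sketch: pad variable-length sequences with -1.0, mask that value,
# and train with train_on_batch. Data and layer sizes are placeholders.
import numpy as np
from keras.models import Sequential
from keras.layers import Masking, LSTM, Dense
from keras.preprocessing.sequence import pad_sequences

# Two variable-length sequences of scalar features.
sequences = [[0.3, 0.7, 0.1], [0.5, 0.2, 0.9, 0.4, 0.8]]
x = pad_sequences(sequences, maxlen=5, dtype='float32',
                  padding='post', value=-1.0)
x = x[..., np.newaxis]          # shape (2, 5, 1): one feature per timestep
y = np.array([[0.0], [1.0]])

model = Sequential([
    Masking(mask_value=-1.0, input_shape=(5, 1)),  # should skip padded steps
    LSTM(16),
    Dense(1),
])
model.compile(optimizer='adam', loss='mse')

loss = model.train_on_batch(x, y)
print(loss)   # this is where I observe NaN
```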
Hi, thanks for the comment! Do you have a reproducible example? I've never used pad_sequences myself.
In any case (when it's working), the mask layer multiplies the loss by a 0/1 mask, provided every layer above it propagates the mask. So if any of the outputs is NaN, the end result is still NaN after summation.
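To illustrate that point (plain NumPy, not the library's actual loss code): multiplying by a 0/1 mask does not remove a NaN that is already present, because `NaN * 0` is still NaN under IEEE float rules, so the summed loss ends up NaN.

```python
import numpy as np

per_timestep_loss = np.array([0.5, 0.3, np.nan])  # NaN came from a padded step
mask = np.array([1.0, 1.0, 0.0])                  # 0 marks the padded step

print(per_timestep_loss * mask)           # [0.5 0.3 nan] -- NaN survives masking
print(np.sum(per_timestep_loss * mask))   # nan
```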