Fix Weibull loglikelihood issues #59
base: master
Conversation
  loglikelihoods = u * \
-     K.log(K.exp(hazard1 - hazard0) - (1.0 - epsilon)) - hazard1
+     K.log((1.0 + epsilon) - K.exp(hazard0 - hazard1)) - hazard0
  return loglikelihoods
In the discrete case it’s equivalent. It’s just a matter of where we put epsilon:

K.log((1.0 + epsilon) - K.exp(hazard0 - hazard1)) - hazard0
= K.log((1.0 + epsilon) - K.exp(hazard0) / K.exp(hazard1)) - hazard0
= K.log(K.exp(hazard0) * [(1.0 + epsilon) / K.exp(hazard0) - 1 / K.exp(hazard1)]) - hazard0
= K.log((1.0 + epsilon) * K.exp(-hazard0) - K.exp(-hazard1))
= K.log(K.exp(-hazard1) * [(1.0 + epsilon) * K.exp(hazard1 - hazard0) - 1])
= K.log((1.0 + epsilon) * K.exp(hazard1 - hazard0) - 1) - hazard1
≈ K.log(K.exp(hazard1 - hazard0) - 1) - hazard1
Which form is right is just a matter of style. My reason is that I like to emphasize that, for positive distributions, the discretized log-likelihood can always be written in the form u*log[exp(Λ(y+1) - Λ(y)) - 1] - Λ(y+1), which is the form I chose because empirically it seemed the most numerically stable/efficient and automatic differentiation seemed to unfold it efficiently. I may be wrong about this, however, so please provide counterexamples if you have them.
See proposition 2.26.
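For reference, a minimal sketch of that discrete form in Keras-backend notation. The function name, the epsilon default, and the docstring are illustrative assumptions, not the repo's exact code:

```python
from keras import backend as K

def loglik_discrete_sketch(y, u, a, b, epsilon=1e-10):
    """Sketch of the discrete Weibull log-likelihood in the
    u*log[exp(Lambda(y+1) - Lambda(y)) - 1] - Lambda(y+1) form."""
    # Cumulative hazard Lambda(t) = (t / a)^b at the interval endpoints.
    hazard0 = K.pow((y + epsilon) / a, b)   # Lambda(y)
    hazard1 = K.pow((y + 1.0) / a, b)       # Lambda(y + 1)
    # epsilon keeps the argument of the log strictly positive.
    loglikelihoods = u * K.log(K.exp(hazard1 - hazard0) - (1.0 - epsilon)) - hazard1
    return loglikelihoods
```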
Yeah, the only true difference was in the case of u = 0.
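To spell that out with made-up numbers (an illustrative sketch, not part of the PR): with u = 0 the log terms vanish, so the first form reduces to -hazard1 = -Λ(y+1) while the second reduces to -hazard0 = -Λ(y).

```python
import numpy as np

# Hypothetical values, only to show the censored (u = 0) case.
a, b, y, eps = 2.0, 1.5, 3.0, 1e-10
hazard0 = ((y + eps) / a) ** b    # Lambda(y)
hazard1 = ((y + 1.0) / a) ** b    # Lambda(y + 1)
u = 0.0

form_old = u * np.log(np.exp(hazard1 - hazard0) - (1.0 - eps)) - hazard1
form_new = u * np.log((1.0 + eps) - np.exp(hazard0 - hazard1)) - hazard0

print(form_old, form_new)   # -Lambda(y+1) vs -Lambda(y): these differ
```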
- loglikelihoods = u * (K.log(b) + b * K.log(ya)) - K.pow(ya, b)
+ loglikelihoods = u * (K.log(b) - K.log(a) + (b-1) * K.log(ya)) - K.pow(ya, b)
Considering the only terms that differ:

b * K.log(ya)

vs

- K.log(a) + (b-1) * K.log(ya)

where the latter can be written

- K.log(a) + b * K.log(ya) - 1 * K.log(ya)
= [b * K.log(ya)] - K.log(a) - 1 * K.log(ya)
= [b * K.log(ya)] - K.log(a) - K.log(y + eps) + K.log(a)
= [b * K.log(ya)] - K.log(y + eps)
With regard to the parameters this is proportional to b * K.log(ya), since K.log(y + eps) has zero gradient with respect to a and b and is therefore unnecessary to compute. The standard in almost every statistical/ML package I've seen is, for computational reasons, to implement loss functions using only terms proportional to, rather than equal to, the log-likelihood.
The upside is a very marginal computational benefit. The downside is that it can be confusing if one expects that exp(-loss) = "probability of seeing this data with these parameters". Log-likelihoods are hence rarely interpretable or directly comparable across distributions and implementations, except up to proportionality.
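A quick numerical check of the zero-gradient claim, sketched in TensorFlow eager mode (my own illustration with arbitrary numbers, not code from this PR): the two expressions differ only by -u * log(y + eps), so their gradients with respect to a and b should coincide.

```python
import tensorflow as tf

# Hypothetical values, purely to compare gradients of the two forms.
y, u, eps = tf.constant(3.0), tf.constant(1.0), 1e-10
a = tf.Variable(2.0)
b = tf.Variable(1.5)

with tf.GradientTape(persistent=True) as tape:
    ya = (y + eps) / a
    # Current form: proportional to the log-likelihood.
    ll_prop = u * (tf.math.log(b) + b * tf.math.log(ya)) - tf.pow(ya, b)
    # Proposed form: the full Weibull log-density.
    ll_full = u * (tf.math.log(b) - tf.math.log(a)
                   + (b - 1.0) * tf.math.log(ya)) - tf.pow(ya, b)

# The difference -u * log(y + eps) does not depend on a or b,
# so these two gradient pairs should match.
print(tape.gradient(ll_prop, [a, b]))
print(tape.gradient(ll_full, [a, b]))
```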
I'm leaning towards the upsides not being worth the downsides, and I would be open to a loss function that is equal, rather than proportional, to the log-likelihood. I'm a bit worried about touching these equations too much; they are battle tested in numerous fights against NaN and have proven themselves very stable and performant. A form like
loglikelihoods = u * (K.log(b) + b * K.log(ya) - K.log(y + eps)) - K.pow(ya, b)
seems like a pretty safe and cheap alternative, however.
What do you think?
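For concreteness, a sketch of what that alternative could look like as a complete function (the function name, signature, and epsilon default are assumptions added for illustration):

```python
from keras import backend as K

def loglik_continuous_equal_sketch(y, u, a, b, epsilon=1e-10):
    ya = (y + epsilon) / a
    # Equal to (not merely proportional to) the Weibull log-likelihood:
    # the extra -K.log(y + epsilon) term restores the constant that the
    # proportional form drops, at the cost of one extra log per sample.
    loglikelihoods = u * (K.log(b) + b * K.log(ya) - K.log(y + epsilon)) \
        - K.pow(ya, b)
    return loglikelihoods
```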
Now that I come to think of it once again, you're right! There is no need to add a redundant constant term to the log-likelihood in the continuous case.
This is a fix for both the continuous and discrete Weibull log-likelihood issues #58 and #56.