Fix Weibull loglikelihood issues #59

Open · wants to merge 1 commit into base: master
4 changes: 2 additions & 2 deletions python/wtte/wtte.py
@@ -166,13 +166,13 @@ def loglik_discrete(y, u, a, b, epsilon=K.epsilon()):
     hazard1 = K.pow((y + 1.0) / a, b)

     loglikelihoods = u * \
-        K.log(K.exp(hazard1 - hazard0) - (1.0 - epsilon)) - hazard1
+        K.log((1.0 + epsilon) - K.exp(hazard0 - hazard1)) - hazard0
     return loglikelihoods
Owner (ragulpr) commented on lines 168 to 170:

In the discrete case it’s equivalent. It’s just a matter of where we put epsilon:

 K.log((1.0 + epsilon) - K.exp(hazard0 - hazard1)) - hazard0
=K.log((1.0 + epsilon) - K.exp(hazard0)/K.exp(hazard1)) - hazard0
=K.log(K.exp(hazard0) * [(1.0 + epsilon)/K.exp(hazard0) - 1/K.exp(hazard1)]) - hazard0
=K.log((1.0 + epsilon)*K.exp(-hazard0) - K.exp(-hazard1))
=K.log(K.exp(-hazard1) * [(1.0 + epsilon)*K.exp(hazard1 - hazard0) - 1])
=K.log((1.0 + epsilon)*K.exp(hazard1 - hazard0) - 1) - hazard1
~K.log((1.0)*K.exp(hazard1 - hazard0) - 1) - hazard1      (dropping epsilon)
=K.log(K.exp(hazard1 - hazard0) - 1) - hazard1

Which form is right is just a matter of style. My reason is that I like to emphasize that for positive distributions the discretized log-likelihood can always be written in the form u*log[exp(Λ(y+1)-Λ(y))-1]-Λ(y+1), a form I chose because empirically it seemed the most numerically stable/efficient and because automatic differentiation seemed to unfold it efficiently. I may be wrong about this, however, so please provide me with counterexamples if you have them.
See proposition 2.26.
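
As a quick sanity check of the rearrangement above, here is a minimal sketch using plain numpy in place of the Keras backend; the values of y, a and b are illustrative, and hazard0/hazard1 mirror the definitions in the surrounding loglik_discrete code:

import numpy as np

eps = 1e-7  # stand-in for K.epsilon()
y, a, b = 3.0, 2.0, 1.5

hazard0 = ((y + eps) / a) ** b   # Lambda(y), as defined earlier in loglik_discrete
hazard1 = ((y + 1.0) / a) ** b   # Lambda(y+1)

# The PR's form, two of the intermediate rewrites, and the original form:
form_pr   = np.log((1.0 + eps) - np.exp(hazard0 - hazard1)) - hazard0
form_mid  = np.log((1.0 + eps) * np.exp(-hazard0) - np.exp(-hazard1))
form_last = np.log((1.0 + eps) * np.exp(hazard1 - hazard0) - 1.0) - hazard1
form_orig = np.log(np.exp(hazard1 - hazard0) - (1.0 - eps)) - hazard1

print(form_pr, form_mid, form_last, form_orig)  # all agree up to O(epsilon)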

Author replied:

Yeah, the only true difference was in the case u = 0.
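
To make that concrete, a small numpy sketch (same illustrative y, a, b as above): for u = 1 the two expressions agree up to epsilon, while for u = 0 the log term vanishes and the censored contribution changes from -hazard1 (old) to -hazard0 (this PR):

import numpy as np

eps = 1e-7
y, a, b = 3.0, 2.0, 1.5
hazard0 = ((y + eps) / a) ** b
hazard1 = ((y + 1.0) / a) ** b

for u in (1.0, 0.0):
    old = u * np.log(np.exp(hazard1 - hazard0) - (1.0 - eps)) - hazard1
    new = u * np.log((1.0 + eps) - np.exp(hazard0 - hazard1)) - hazard0
    print(u, old, new)
# u=1.0: old and new agree up to O(epsilon)
# u=0.0: old = -hazard1, new = -hazard0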



 def loglik_continuous(y, u, a, b, epsilon=K.epsilon()):
     ya = (y + epsilon) / a
-    loglikelihoods = u * (K.log(b) + b * K.log(ya)) - K.pow(ya, b)
+    loglikelihoods = u * (K.log(b) - K.log(a) + (b - 1) * K.log(ya)) - K.pow(ya, b)
ragulpr (Owner) commented on Oct 13, 2019:

Considering the only terms that differ:

b * K.log(ya)
vs
- K.log(a) + (b-1) * K.log(ya)

Where the latter can be written

- K.log(a)+b* K.log(ya)-1* K.log(ya)
=[b* K.log(ya)]- K.log(a)-1* K.log(ya)
=[b* K.log(ya)]- K.log(a)-K.log(y+eps)+K.log(a)
=[b* K.log(ya)]- K.log(y+eps)

With regard to the parameters, this is proportional to

∝ b * K.log(ya)

since K.log(y + eps) has zero gradient with respect to a and b and is therefore unnecessary to compute. The standard in almost every statistical/ML package I've seen is, for computational reasons, to implement loss functions using only the terms that are proportional to, rather than equal to, the log-likelihood.

The upside is a very marginal computational benefit. The downside is that it can be confusing if one expects exp(-loss) = "probability of seeing this data with these parameters". Log-likelihoods are hence rarely interpretable or directly comparable across distributions and implementations, except up to proportionality.

I'm leaning towards the upside not being worth the downside, and I would be open to a loss function that is equal, rather than proportional, to the log-likelihood. I'm a bit worried about touching these equations too much, though; they are battle-tested in numerous fights against NaN and have proven very stable and performant. A form like

     loglikelihoods = u * (K.log(b) + b * K.log(ya) - K.log(y + eps)) - K.pow(ya, b)

seems like a pretty safe and cheap alternative, however.

What do you think?
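
For reference, a small numpy sketch of the proportionality argument above (values are illustrative; eps mirrors K.epsilon()): the PR's expression equals the original minus u * K.log(y + eps), a term that does not depend on a or b, so the gradients with respect to the parameters coincide:

import numpy as np

eps = 1e-7
y, u = 3.0, 1.0

def loglik_original(a, b):
    ya = (y + eps) / a
    return u * (np.log(b) + b * np.log(ya)) - ya ** b

def loglik_pr(a, b):
    ya = (y + eps) / a
    return u * (np.log(b) - np.log(a) + (b - 1) * np.log(ya)) - ya ** b

a, b = 2.0, 1.5
# Constant offset, independent of a and b:
print(loglik_pr(a, b) - loglik_original(a, b), -u * np.log(y + eps))

# Finite-difference gradients w.r.t. a (and likewise b) therefore agree:
h = 1e-5
print((loglik_original(a + h, b) - loglik_original(a - h, b)) / (2 * h),
      (loglik_pr(a + h, b) - loglik_pr(a - h, b)) / (2 * h))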

Author replied:

Now that I come to think of it once again, you're right! There is no need to add a redundant constant term to the log-likelihood in the continuous case.

return loglikelihoods

