Unable to find a model above threshold 0.5. Returning None. what is a recommended score_threshold #87

Open
zoezhang106 opened this issue Sep 12, 2024 · 0 comments

Hi,

Thanks for the amazing package.

I tried the advanced survival analysis demo code on the homepage.

There seem to be two small bugs:
1. The line
eval_time_horizons = np.linspace(T.min(), T.max(), 5)[1:-1]
should be:
eval_time_horizons = list(np.linspace(T.min(), T.max(), 5)[1:-1])
Otherwise, you will get an error message saying:
eval_time_horizons is not a list.

2. I get an error message saying:

[2024-09-12T12:32:21.809928+0200][25888][CRITICAL] Unable to find a model above threshold 0.5. Returning None
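For the first point, here is a minimal sketch of the type difference, using made-up durations in place of the GBSG `duration` column. It only illustrates that slicing the result of `np.linspace` still gives a NumPy array and that wrapping it in `list(...)` gives a list; how autoprognosis validates the argument is not shown here.

```python
import numpy as np

# Hypothetical durations standing in for T = df["duration"].
T = np.array([1.0, 20.0, 40.0, 60.0, 81.0])

# Slicing a linspace result still yields a NumPy array, not a list.
raw_horizons = np.linspace(T.min(), T.max(), 5)[1:-1]

# Wrapping it in list(...) gives the list type suggested in point 1.
eval_time_horizons = list(raw_horizons)

print(type(raw_horizons).__name__, isinstance(raw_horizons, list))
print(type(eval_time_horizons).__name__, isinstance(eval_time_horizons, list))
```

`raw_horizons.tolist()` would also work and returns plain Python floats instead of NumPy scalars.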

Inside RiskEstimationStudy, I saw that the score is computed as shown below, and that the default threshold is 0.65.
score = metrics["raw"]["c_index"][0] - metrics["raw"]["brier_score"][0]

From my understanding, 0.5 is already a bit low. Setting a lower cutoff such as 0.2 makes it work, but I am curious what the lowest acceptable cutoff for this score is.

The complete code from the homepage is shown below:
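To make the question concrete, here is a small sketch of how that score behaves against different cutoffs. The metric values are invented for illustration and are not results from the GBSG run; the dictionary shape mirrors the `metrics["raw"][...][0]` access quoted above, and the strict `>` comparison is my assumption about how the threshold is applied.

```python
def study_score(metrics):
    """Score as quoted above: c-index minus Brier score."""
    return metrics["raw"]["c_index"][0] - metrics["raw"]["brier_score"][0]


# Hypothetical metric values (first element assumed to be the point estimate).
example_metrics = {
    "raw": {
        "c_index": (0.68, 0.01),
        "brier_score": (0.21, 0.01),
    }
}

score = study_score(example_metrics)
print(f"score = {score:.2f}")

# Compare against the default, the value used in the demo, and a low cutoff.
passes = {}
for threshold in (0.65, 0.5, 0.2):
    passes[threshold] = score > threshold  # assumed comparison, not confirmed
    print(f"threshold {threshold}: passes = {passes[threshold]}")
```

Because the Brier score is subtracted, a model with a reasonable c-index can still land well below 0.5. As a rough reference point only, a non-informative model that predicts 0.5 for everyone would have a c-index of about 0.5 and an (uncensored) Brier score of 0.25, so a score of about 0.25.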

# stdlib
import os
from pathlib import Path

# third party
import numpy as np
from pycox import datasets

# autoprognosis absolute
from autoprognosis.studies.risk_estimation import RiskEstimationStudy
from autoprognosis.utils.serialization import load_model_from_file
from autoprognosis.utils.tester import evaluate_survival_estimator

df = datasets.gbsg.read_df()
df = df[df["duration"] > 0]

X = df.drop(columns = ["duration"])
T = df["duration"]
Y = df["event"]

eval_time_horizons = np.linspace(T.min(), T.max(), 5)[1:-1]

workspace = Path("workspace")
study_name = "example_risks"

study = RiskEstimationStudy(
    study_name=study_name,
    dataset=df,
    target="event",
    time_to_event="duration",
    time_horizons=eval_time_horizons,
    num_iter=10,
    num_study_iter=1,
    timeout=10,
    risk_estimators=["cox_ph", "survival_xgboost"],
    score_threshold=0.5,
    workspace=workspace,
)

study.run()

output = workspace / study_name / "model.p"

model = load_model_from_file(output)
# <model> contains the optimal architecture, but the model is not trained yet. You need to call fit() to use it.
# This way, we can further benchmark the selected model on the training set.

metrics = evaluate_survival_estimator(model, X, T, Y, eval_time_horizons)

print(f"Model {model.name()} score: {metrics['str']}")

# Train the model
model.fit(X, T, Y)

# Predict using the model
model.predict(X, eval_time_horizons)

DrShushen self-assigned this on Sep 12, 2024