Skip to content

Commit

Permalink
Document extending the simulation functions (#328)
Browse files Browse the repository at this point in the history
  • Loading branch information
gowerc authored May 22, 2024
1 parent 65f56a0 commit 9e80436
Show file tree
Hide file tree
Showing 2 changed files with 188 additions and 0 deletions.
2 changes: 2 additions & 0 deletions inst/WORDLIST
Original file line number Diff line number Diff line change
Expand Up @@ -138,3 +138,5 @@ SimSurvival
geq
LLoQ
groupwise
du
int
186 changes: 186 additions & 0 deletions vignettes/extending-jmpost.Rmd
Original file line number Diff line number Diff line change
Expand Up @@ -429,6 +429,192 @@ object@fun(
)
```

## Custom Simulation Functions

To assist with testing and debugging the joint models fitted via `jmpost` the `SimJointData`
constructor function is provided to generate joint data from known parameters.

### Custom Survival Simulation Functions

Before describing how to implement custom survival functions it is important to understand how
the survival simulation framework works more generally in `jmpost`.
To simulate event times the simulation function takes advantage of the following details:

$$
\begin{align}
S(t) &\sim U(0, 1) \\
S(t) &= exp\left(-\int_0^t h(u)\ du\right)
\end{align}
$$

That is, it first samples a survival probability $p$ from a uniform distribution and then calculates
the required event time $t$ to produce that survival probability. The current implementation
approximates the integral by sequentially summing up the hazard after a given step size and
declaring an event once the sampled probability value has been exceeded. This gives rise to two
key parameters that need to be defined by the user:

- `time_max`: The maximum time to simulate up to (events occurring after this time are censored)
- `time_step`: How much of a gap to leave between time points to calculate the hazard at

Note that there is currently an outstanding development item to convert this to use numerical
integration to remove the need for these parameters
(see [issue #329](https://github.com/Genentech/jmpost/issues/329)).


Custom survival distribution simulations are implemented as classes that inherit from `SimSurvival`
providing key parameter values and in particular provide a log-hazard function that will be
used as described above in combination with the covariate and link contributions.
The following is rough example of how the Weibull distribution has been implemented:

```R
SimSurvivalWeibullPH <- function(
lambda,
gamma,
time_max = 2000,
time_step = 1,
lambda_censor = 1 / 3000,
beta_cont = 0.2,
beta_cat = c("A" = 0, "B" = -0.4, "C" = 0.2)
) {
SimSurvival(
time_max = time_max,
time_step = time_step,
lambda_censor = lambda_censor,
beta_cont = beta_cont,
beta_cat = beta_cat,
loghazard = function(time) {
log(lambda) + log(gamma) + (gamma - 1) * log(time)
},
name = "SimSurvivalWeibullPH"
)
}
```
That is, the function is a essentially a constructor function for a `SimSurvival` object. This object
then has the following slots defined:

- `time_max`: The maximum time to simulate up to (as explained mentioned above)
- `time_step`: How much of a gap to leave between time points to calculate the hazard at (as explained mentioned above)
- `beta_cont`: The $\beta$ coefficient for the continuous covariate (sampled from a standard normal distribution for each subject)
- `beta_cat`: The $\beta$ coefficients for the categorical covariates (evenly sampled from `names(beta_cat)` for each subject)
- `loghazard`: The log-hazard function of the baseline survival distribution
- `name`: The name of the simulation function; only used for printing purposes


### Custom Longitudinal Simulation Functions

Custom longitudinal simulation functions are slightly more involved. Essentially the user needs to
define a new class which inherits from `SimLongitudinal` and then implement the
`sampleSubjects` and `sampleObservations` methods for the new class. The object itself should
contain all the required parameters for the model as well as a `times` slot which is a vector of
timepoints for observations to be generated at.

The `sampleSubjects` method is responsible for sampling the subject specific parameters e.g.
individual parameters for a random effects model. The `sampleObservations` method is responsible
for calculating the tumour size at each provided time point. The following is a rough example of how
the `SimLongitudinalGSF` class is implemented:

```R
# Declare the new class
.SimLongitudinalGSF <- setClass(
"SimLongitudinalGSF",
contains = "SimLongitudinal",
slots = c(
sigma = "numeric",
mu_s = "numeric",
mu_g = "numeric",
mu_b = "numeric",
a_phi = "numeric",
b_phi = "numeric",
omega_b = "numeric",
omega_s = "numeric",
omega_g = "numeric",
link_dsld = "numeric",
link_ttg = "numeric",
link_identity = "numeric"
)
)

# Define constructor function with sensible default values
SimLongitudinalGSF <- function(
times = c(-100, -50, 0, 50, 100, 150, 250, 350, 450, 550) / 365,
sigma = 0.01,
mu_s = c(0.6, 0.4),
mu_g = c(0.25, 0.35),
mu_b = 60,
a_phi = c(4, 6),
b_phi = c(4, 6),
omega_b = 0.2,
omega_s = 0.2,
omega_g = 0.2,
link_dsld = 0,
link_ttg = 0,
link_identity = 0
) {
.SimLongitudinalGSF(
times = times,
sigma = sigma,
mu_s = mu_s,
mu_g = mu_g,
mu_b = mu_b,
a_phi = a_phi,
b_phi = b_phi,
omega_b = omega_b,
omega_s = omega_s,
omega_g = omega_g,
link_dsld = link_dsld,
link_ttg = link_ttg,
link_identity = link_identity
)
}

sampleSubjects.SimLongitudinalGSF <- function(object, subjects_df) {
res <- subjects_df |>
dplyr::mutate(study_idx = as.numeric(.data$study)) |>
dplyr::mutate(arm_idx = as.numeric(.data$arm)) |>
dplyr::mutate(psi_b = stats::rlnorm(dplyr::n(), log(object@mu_b[.data$study_idx]), object@omega_b)) |>
dplyr::mutate(psi_s = stats::rlnorm(dplyr::n(), log(object@mu_s[.data$arm_idx]), object@omega_s)) |>
dplyr::mutate(psi_g = stats::rlnorm(dplyr::n(), log(object@mu_g[.data$arm_idx]), object@omega_g)) |>
dplyr::mutate(psi_phi = stats::rbeta(dplyr::n(), object@a_phi[.data$arm_idx], object@b_phi[.data$arm_idx]))
res[, c("pt", "arm", "study", "psi_b", "psi_s", "psi_g", "psi_phi")]
}

sampleObservations.SimLongitudinalGSF <- function(object, times_df) {
times_df |>
dplyr::mutate(mu_sld = gsf_sld(.data$time, .data$psi_b, .data$psi_s, .data$psi_g, .data$psi_phi)) |>
dplyr::mutate(dsld = gsf_dsld(.data$time, .data$psi_b, .data$psi_s, .data$psi_g, .data$psi_phi)) |>
dplyr::mutate(ttg = gsf_ttg(.data$time, .data$psi_b, .data$psi_s, .data$psi_g, .data$psi_phi)) |>
dplyr::mutate(sld = stats::rnorm(dplyr::n(), .data$mu_sld, .data$mu_sld * object@sigma)) |>
dplyr::mutate(
log_haz_link =
(object@link_dsld * .data$dsld) +
(object@link_ttg * .data$ttg) +
(object@link_identity * .data$mu_sld)
)
}
```


The `subjects_df` argument to the `sampleSubjects` method is a `data.frame` with the following columns:

- `pt`: The subject identifier
- `arm`: The treatment arm that the subject belongs to
- `study`: The study that the subject belongs to

Of note is that this dataset contains one row per subject. The return value must be a `data.frame` with the
same number of rows as the input dataset as well as the `pt`, `arm` and `study` columns. The remaining
columns are the subject specific parameters and can have any arbitrary name.

The `times_df` argument to the `sampleObservations` method is the same `data.frame` that was
generated in the `sampleSubjects` method but duplicated once per required timepoint with an
additional `time` column that contains said timepoint. The return value must be a `data.frame` with
the same number of rows as the input dataset as well as the original columns
`pt`, `arm`, `study` and `time`.
In addition to the original columns the function must also define the following new columns:

- `sld`: The tumour size at the given timepoint
- `log_haz_link`: The contribution to the hazard function at that timepoint (set this to 0 if not
defining a link function)

## Formatting Stan Files

Under the hood this library works by merging multiple Stan programs together into a single
Expand Down

0 comments on commit 9e80436

Please sign in to comment.