 Jun 2020

michaelbarrowman.co.uk

We report our work in line with the TRIPOD guidelines for development and validation of clinical prediction models [13], [14].
move to Methods

To develop a CPM
Be more specific: an MSCPM that allows all the transitions of interest to be modelled holistically, etc.

flawed
Should say what the concrete implications of this are: e.g. risk probabilities may be misinterpreted when the model is used, etc.

Discussion
@Mark would be good to have something on clinical implication here too

nonuntreated
you mean treated? A little hard to follow here.

three models
I think we've now agreed to just report one (with the others in a supplement).

such predictions
Need to say why such predictions are clinically useful - @Mark.

Introduction
this is commonly background + objectives in an abstract. So should also include the objective (to develop the MSCPM)



the top and bottom 5% of all simulation estimates will be omitted
this seems an unusual thing to do to me. Have you seen this before? Wouldn't it mean that standard errors, coverage etc. are underestimated? I'd suggest instead using all data to calculate things like standard error, but show 5-95% intervals on the plots?
Looking at the Results, and at your code, I think this removal of the top and bottom 5% could have quite unpredictable implications on bias, SE and coverage.
Please can we see the results without the extremes dropped?
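To illustrate the concern numerically, here is a minimal sketch (hypothetical numbers, Python rather than the simulation's own code) showing that dropping the top and bottom 5% of estimates before computing the empirical SE shrinks it:

```python
import random
import statistics

random.seed(1)

# 2000 hypothetical simulation estimates of a parameter whose true SE is 1
estimates = [random.gauss(0.0, 1.0) for _ in range(2000)]

# Empirical SE from all estimates
full_se = statistics.stdev(estimates)

# Empirical SE after omitting the top and bottom 5% of estimates
s = sorted(estimates)
k = len(s) // 20                          # 5% of 2000 = 100 from each tail
trimmed_se = statistics.stdev(s[k:len(s) - k])

print(f"full SE = {full_se:.3f}, trimmed SE = {trimmed_se:.3f}")
```

For a normal sampling distribution the 5-95% trimmed SD is roughly 0.79 of the true SD, so bias, SE and coverage computed from trimmed replicates will be systematically distorted; computing them from all replicates and trimming only what is shown on the plots avoids this.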

s performance measures
in terms of estimating our target parameter theta that measures the calibration in the large.

Expected outco
Recall again here: these come from the 'true' models that match the DGMs. Also important to mention that this use of 'truth' doesn't (in my opinion) bias results in our favour, since all methods are able to use this 'true' expected event proportion.
However, you could argue that the 'true' IPCW is favouring our approach. You could mention this as a limitation in the Discussion: that for finite samples one of the simpler approaches might outperform IPCW because it doesn't need to estimate the weights.

To combat this, we rescale the values produced to be with this range and perform the regression as normal.
we really need to get guidance from Paul Lambert on this: have you managed to get hold of him?

This is a minor issue and can be dealt with by most software packages
Presumably this is no problem at all as long as there is a weighting option (as there is in glm())
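Agreed; the weighted fit reduces to a one-parameter Newton solve. A minimal pure-Python sketch (hypothetical function name, not the paper's code; in R this would be glm(..., weights = w, offset = ...)):

```python
import math

def ipcw_calibration_in_the_large(y, p_hat, w, iters=50):
    """Fit the intercept alpha in logit P(Y=1) = alpha + logit(p_hat),
    i.e. a weighted logistic regression with the predicted linear
    predictor as an offset; alpha = 0 indicates calibration-in-the-large.
    y: 0/1 outcomes, p_hat: predicted risks in (0,1), w: (IPC) weights."""
    off = [math.log(p / (1.0 - p)) for p in p_hat]
    alpha = 0.0
    for _ in range(iters):                # Newton-Raphson in one parameter
        mu = [1.0 / (1.0 + math.exp(-(alpha + o))) for o in off]
        grad = sum(wi * (yi - mi) for wi, yi, mi in zip(w, y, mu))
        hess = sum(wi * mi * (1.0 - mi) for wi, mi in zip(w, mu))
        alpha += grad / hess
    return alpha

# With equal weights and outcomes matching the predictions on average,
# the intercept is zero:
alpha0 = ipcw_calibration_in_the_large([1, 0, 1, 0], [0.5] * 4, [1.0] * 4)

# Upweighting the observed events pulls the intercept upwards:
alpha1 = ipcw_calibration_in_the_large([1, 0, 1, 0], [0.5] * 4,
                                       [3.0, 1.0, 3.0, 1.0])
```

The weights enter only through the score and information sums, which is why any routine with a weighting option handles this without modification.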

LU
worth noting that this is a very bad idea and nobody (hopefully!) would do this in practice.

the value of the KM curve at the current time is taken to be the average Observed number of events
suggest cutting this (it's not an accurate description of what the KM estimate is, so I suggest just say we use the KM estimate as the observed proportion of events)
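For reference, "use the KM estimate as the observed proportion of events" is just 1 - S_KM(t); a minimal sketch (hypothetical helper, not the paper's code):

```python
def km_observed_proportion(times, events, t):
    """Observed event proportion by time t, taken as 1 - S_KM(t),
    where S_KM is the Kaplan-Meier survival estimate.
    times: follow-up times; events: 1 = event, 0 = censored."""
    pairs = sorted(zip(times, events))
    at_risk = len(pairs)
    surv = 1.0
    i = 0
    while i < len(pairs) and pairs[i][0] <= t:
        u = pairs[i][0]
        j, d = i, 0
        while j < len(pairs) and pairs[j][0] == u:   # group tied times
            d += pairs[j][1]                          # events at time u
            j += 1
        surv *= 1.0 - d / at_risk                     # KM factor at u
        at_risk -= j - i                              # everyone leaving at u
        i = j
    return 1.0 - surv

# With no censoring this reduces to the plain proportion of events:
p_uncensored = km_observed_proportion([1, 2, 3, 4], [1, 1, 1, 1], 2.5)  # 0.5
# With censoring, risk is redistributed over those still at risk:
p_censored = km_observed_proportion([1, 2, 3], [1, 0, 1], 2.5)
```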

$F_P$ was chosen
Make it very clear that you are assuming these models are known, not attempting to estimate them, and explain why this is the right thing to do (i.e. this part of the process is not of interest).

a high number of patients was chosen to improve precision of our estimates
as here we are interested in bias rather than variability of the estimates

all the values
in a factorial design

This combines to give a simulated survival function, $S$, as $S(t \mid Z=z) = \exp\left(-\frac{e^{\beta Z}t^{\eta+1}}{\eta+1}\right)$, and a simulated censoring function, $S_c$, as $S_c(t \mid Z=z) = \exp\left(-e^{\gamma Z}t\right)$
Suggest cutting: giving the hazard functions is enough (corresponding survival functions are obvious to derive).
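Even if the survival functions are cut, the simulation recipe follows directly from the hazards by inverse-transform sampling; a minimal sketch (my own illustration, assuming the stated hazards $e^{\beta z}t^{\eta}$ for events and $e^{\gamma z}$ for censoring, with $\eta > -1$):

```python
import math
import random

def simulate_observation(z, beta, gamma, eta, rng):
    """Inverse-transform draw from
    S(t|z)   = exp(-e^{beta z} t^{eta+1} / (eta+1))  (event time) and
    S_c(t|z) = exp(-e^{gamma z} t)                   (censoring time);
    returns (observed time, event indicator)."""
    u1, u2 = rng.random(), rng.random()
    # Solve S(T|z) = u1 for the event time T
    t_event = ((eta + 1.0) * -math.log(u1)
               * math.exp(-beta * z)) ** (1.0 / (eta + 1.0))
    # Solve S_c(C|z) = u2 for the censoring time C
    t_cens = -math.log(u2) * math.exp(-gamma * z)
    return min(t_event, t_cens), int(t_event <= t_cens)

# Sanity check: with beta = gamma = eta = 0 both times are Exp(1),
# so roughly half of the observations should be events
rng = random.Random(0)
draws = [simulate_observation(0.0, 0.0, 0.0, 0.0, rng) for _ in range(4000)]
event_rate = sum(e for _, e in draws) / len(draws)
```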

Survival times were simulated with a baseline hazard $\lambda_0(t) = t^{\eta}$ (i.e. Weibull), and a proportional hazard of $e^{\beta Z}$.
Just write down the equation all together; it feels a bit strange to separate it out like this.

Patients
Observations

Each population was simulated with three parameters: $\beta$, $\gamma$ and $\eta$, which defined the proportional hazards coefficients for the survival and censoring distributions and the baseline hazard function, respectively.
Suggest moving this a bit later, since I'm immediately wondering what the models are; I'd present the models first, then say these will be the parameters that we vary.

an unweighted estimate
There are other ways that censoring is handled, e.g. the Pohar Perme approach, etc.

to
with

estimating the censoring distribution
model for censoring?

that they are exchangeable conditional on the measured covariates
to be precise, exchangeable conditional on the covariates in the model used to construct the IPCW, assuming that this model is correctly specified.

their probability
the inverse of the
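To make "the inverse of the probability" concrete, a minimal sketch (hypothetical helper, not the paper's implementation): estimate the censoring survival G by Kaplan-Meier with the roles of events and censorings swapped, then weight each subject by 1/G just before their observed time.

```python
def censoring_survival(times, events, t):
    """Kaplan-Meier estimate of the censoring survival G(t): the
    censorings (event == 0) play the role of the 'events'."""
    pairs = sorted(zip(times, events))
    at_risk = len(pairs)
    g = 1.0
    i = 0
    while i < len(pairs) and pairs[i][0] <= t:
        u = pairs[i][0]
        j, c = i, 0
        while j < len(pairs) and pairs[j][0] == u:   # group tied times
            c += 1 - pairs[j][1]                      # censorings at time u
            j += 1
        g *= 1.0 - c / at_risk
        at_risk -= j - i
        i = j
    return g

times, events = [1, 2, 3, 4], [1, 0, 1, 1]
# IPC weight for the subject with an event at time 3: 1 / G(3-)
eps = 1e-9
w3 = 1.0 / censoring_survival(times, events, 3 - eps)
```

Here the censoring at time 2 gives G(3-) = 2/3, so the event at time 3 receives weight 1.5, standing in for subjects with similar censoring probability who were lost.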

assumes
also assumes

overall
stick with either mean calibration or calibration in the large, for consistency.

overcome
addressed? (overcome a bit strong given what we go on to say...)

???
yes: QRISK only provides a point estimate of the baseline hazard at 10y, and even for that you have to go digging.

o
how and whether

More complicated
Replace with 'Other'.

nt
add comma

They need to be validated before they are used and
Repetition of the previous sentence.

Inverse Probability Weighting Adjustment of the Logistic Regression Calibration-in-the-Large
This title needs changing: it doesn't capture the contents at all. How about 'Using inverse probability of censoring weights to estimate calibration-in-the-large for time-to-event models'?

 May 2020

michaelbarrowman.co.uk

relaxes the assumption that patients who were censored are identical to those that remain at risk
and replaces with the assumption that they are identical/exchangeable conditional on the measured covariates.

new and improved
Looks good. For bias I wonder if absolute bias might be clearer for eyeballing?

Cox regression
change to time-to-event model

$\lambda_0(t)=t^{\eta}$
i.e. Weibull(?)

γ=0
in this case censoring is just random, so wouldn't we expect all methods to do well here?

0.1,−0.1)
0.2?

z)=1
express as a proportional hazards model instead. We can either estimate a proportional baseline Weibull hazard or use Cox; I assume it wouldn't matter which.

In
I know I wrote this(!), but this para is quite hard to follow. Suggest moving the points that address the first two of the three ways to where they are introduced, then introduce the third way (censoring) and say that is our focus.

In these papers a fractional polynomial approach to estimating the baseline survival function (and thus being able to share it efficiently) is also provided.
move above to where we introduce the challenge of sharing baseline hazard.

perfect
what is 'perfect'? Coverage should be 95%. Too high and too low are both bad

this coverage is reduced compared to the previous set of results shown (approximately 75% throughout)
but why? I would expect the coverage to be ok



models
I think this section can be shortened. Only need a brief 'dismissal' of the existing models; no need to labour the point.

Model Design
there seems to be a lot of information missing from the Methods section:
- what model was used
- missing data handling
- model selection
- etc.
Work through the TRIPOD guidelines.

out performing
the models with different number of states are modelling different outcomes, and probably answering slightly different clinical questions. So I think clinical considerations should primarily inform which of 2, 3, 4 state model is used. Then a question about whether all three need to be presented in this paper

multiple outcomes with a single model
not sure this quite captures. Multiple outcomes to me suggests multivariate prediction models (like Glen's MRC grant and https://arxiv.org/abs/2001.07624 ). I think here we still have a single outcome but complex, i.e. changes over time as captured by a multistate model.

no sample size calculations were performed prior to recruitment
but shouldn't sample size calculations be done to inform how many predictors to include in the models? https://www.bmj.com/content/368/bmj.m441.abstract

ESRD
paper is quite acronym heavy. Suggest cutting them down.

 Apr 2020

michaelbarrowman.co.uk

We did not assess the viability of these models as it was believed this assumption to make our results more understandable.
I think it is necessary to at least check the PH assumption. For example, there might be strong non-proportionality across gender: then it would be entirely reasonable to fit separate models by gender (e.g. as QRISK does).

This timelessness of the model means it can be applied to any patient at any time during their CKD journey
Paper needs to explain more how you can indeed apply at any point in the journey. Presume that you mean only before any state transition has occurred?



The values produced by PO will have to be artificially capped between 0 and 1,
is that what is recommended in the Lambert paper, or the Pohar Perme one?
