Multivariable Survival Analysis

Step 1: Importing the Data and Initial Setup

This block of code imports the required data and sets up some preliminary parameters for analysis.

use "https://github.com/muzaale/ikesa/raw/main/nhanes.dta", clear
di "obs: `c(N)', vars: `c(k)'"
global subgroup ridreth3
global subgroupvar: var lab ridreth3
cls
gen years = permth_exm / 12
stset years, fail(mortstat)

Step 2: Generating the Survival Graph

Here, we generate the survival graph for the various subgroups defined by the variable ridreth3.

#delimit ;
sts graph if inlist($subgroup,1,2,3,4,6,7),
    by($subgroup)
    fail
    ti("Mortality in NHANES III",pos(11))
    subti("by self report: ${subgroupvar}",pos(11))
    yti("%",orientation(horizontal))
    xti("Years")
    per(100)
    ylab(0(5)20,
        format(%3.0f)
        angle(360)
    )
    legend(on
        lab(1 "Mexican")
        lab(2 "Hispanic")
        lab(3 "White")
        lab(4 "Black")
        lab(5 "Asian")
        lab(6 "Other")
        ring(0)
        pos(11)
        col(1)
        order(3 4 1 2  5)
    )
    note("Source: RDC/NCHS/CDC/DHHS")  
;
#delimit cr

Step 3: Cox Proportional Hazard Model

Next, we run the Cox proportional hazards model using the defined subgroups and a set of confounding variables.

cd "~/dropbox/1f.ἡἔρις,κ/1.ontology/alpha"
global confounders ridageyr diq010 bmxbmi smq020 lbdscrsi lbxgh
stcox i.$subgroup $confounders if inlist(${subgroup}, 1, 2, 3, 4, 6, 7), basesurv(s0) //best when centered

Step 4: Matrices Definition

We define three matrices \(m\), \(b\), and \(V\), to store the table of results, coefficients, and variance-covariance matrix, respectively.

matrix define m = r(table)
matrix b = e(b)
matrix V = e(V)

Step 5: Scenario Vector (SV)

The Scenario Vector defines specific values for a given scenario. For instance, this can represent a 60-year-old black individual with certain medical conditions. Note that the first six positions in the vector are for the subgroups of race, and the last six are for the confounding variables.

//SV: black, 60yo, diabetic, BMI=36, h/o smoking, SCr=1.5, HbA1c=7.1
matrix SV = (0, 0, 0, 1, 0, 0, 60, 1, 36, 1, 1.5, 7.1)

Step 6: Calculating the Risk Score

The risk score (\(\rho\)) is calculated by multiplying the scenario vector with the transpose of the coefficients:

matrix risk_score = SV * b'

Step 7: Log Hazard Ratio

The log hazard ratio for the specified scenario is displayed.

//log HR for scenario vector above
matrix list risk_score 

Step 8: Hazard Ratio for Scenario

This calculates the hazard ratio (HR) by taking the exponential of the log hazard ratio:

//HR for scenario described compared with "base-case"
di exp(risk_score[1,1])

Step 9: Variance and Standard Error of Prediction

Here, we calculate the variance (\(\sigma^2\)) and standard error (SE) of the prediction for the scenario.

matrix var_prediction = SV * V * SV'
matrix se_prediction = sqrt(var_prediction[1,1])

Step 10: 10-Year Mortality for Scenario

Finally, the 10-year mortality for the specified scenario is plotted.

//10-year mortality for scenario 
gen f0 = (1 - s0) * 100
gen f1 = f0 * exp(risk_score[1,1])
drop if _t > 10
line f1 _t, sort connect(step step) ylab(0(5)20) xlab(0(2)10)
graph export nhanes_scenario.png, replace  

This series of steps takes the reader through the process of estimating the risk score and the associated 10-year mortality for a given scenario. It uses a semi-parametric model to create the underlying hazard function and uses specific scenario vectors to estimate the individualized risk for different populations or conditions.


Comment About Base-Case:

Interpretation would be more intuitive if centering were used. Without centering, the base-case is for a specific reference group (e.g., Mexican, 0 years old, non-diabetic, BMI=0, etc.) which is not very useful.

To adapt this portion of the code for another project or scenario, the analyst would need to redefine the scenario vector (SV) with the appropriate values for the new context. They would also need to ensure that the model’s coefficients and variance-covariance matrix are correctly extracted, and that the calculations are aligned with the new scenario’s specifications.