SurvBal_Vignette
Overview
SurvBal enables the selection of microbiome balances in
relation to censored survival or time-to-event outcomes, which are of
considerable interest in many biomedical studies. The most commonly used
survival models – the Cox proportional hazards and parametric survival
(including accelerated failure time) models are included in the package,
which are used in combination with step-wise selection procedures to
identify the optimal associated balance of microbiome, i.e., the ratio
of the geometric means of two groups of taxa’s relative abundances.
Installation
Please make sure you install the R package zCompositions
with version number less than or equal to 1.4.0.1 before installing
SurvBal:
require(remotes)
install_version("zCompositions", version = "1.4.0.1", repos = "http://cran.us.r-project.org")Then you can install SurvBal from GitHub by:
Please skip the recommended update for
zCompositions.
Getting Started
You can load the sample data in SurvBal using the
following code.
bacteria is a raw taxon count table with 63 samples and
139 taxa. gvhd is a survival object containing time and
censoring/event information for each sample.
Examples
Select balance based on the Cox proportional hazards model
We can select the balance of microbiome based on the Cox proportional
hazards Model by setting model = "coxph".
Here, we specify mult_repl = TRUE to use the geometric
Bayesian multiplicative replacement method to impute the inflated zeros.
If you would like to process the data by adding a small pseudo-count
(0.5) to all raw counts instead, set mult_repl = FALSE.
We specify selection_criterion = "min_decrement_pvalue"
so that the decrement of p-value along the forward selection path will
be calculated, the balance before the first decrement that is smaller
than selection_threshold (the default value is 0.15) will be selected.
If “min_pvalue” is used, the balance that has the smallest p-value along
the forward selection path will be selected. The default is
“min_decrement_pvalue”.
select_coxph = balance_selection(
Surv_obj = gvhd,
data = bacteria,
mult_repl = TRUE,
model = "coxph",
selection_criterion = "min_decrement_pvalue"
)We can check the omnibus p-value from the global community-level association test, MiRKAT-S, which examines whether there is an overall shift in the microbiome composition (presence-absence status and abundance, encoded by Bray-Curtis and Jaccard distances) regarding the survival outcome.
A p-value greater than 0.05 indicates that there is no significant
community-level association between the microbiome and survival outcome.
Though it is not obvious whether the community-level test truly informs
the validity of the balance – community-level testing is best used when
there are concerted differences among a large number of taxa while the
analysis of balances is about variable selection wherein the outcome is
related to imbalances of a few taxa, it is recommended to interpret the
final balance by SurvBal with caution, as the current
algorithm may still select a balance when no associations exist.
We can find the selection path in selection_path:
select_coxph$selection_path[,1:2]
#> taxon_id sign
#> Dysosmobacter 33 0
#> Pseudoflavonifractor capillosus/Clostridium phoceensis 38 1
#> Clostridiales Family XIII. Incertae Sedis 90 0
#> Anaerotignum lactatifermentans 95 1
#> Lachnospira eligens 127 0
#> Eisenbergiella massiliensis 124 1
#> Massilimicrobiota timonensis 81 0
#> Acidaminococcus intestini 43 1
#> Agathobaculum butyriciproducens 100 0
#> [Clostridium] leptum 26 0
#> Bifidobacterium catenulatum/pseudocatenulatum 55 0
#> Parasutterella excrementihominis 48 1
#> Anaerobutyricum soehngenii 129 0
#> Bacteroides xylanisolvens 67 0
#> Eubacterium 25 0
#> Anaerostipes 134 0
#> Alistipes putredinis 58 0
#> Eubacterium ventriosum 12 0
#> Bilophila wadsworthia 45 0We can print the final fitted Cox PH model for the survival outcome,
gvhd , with the selected balance:
select_coxph$survival_model
#> Call:
#> coxph(formula = Surv_obj ~ ., data = df)
#>
#> coef exp(coef) se(coef) z p
#> balance 1.3747 3.9537 0.1806 7.612 2.7e-14
#>
#> Likelihood ratio test=55.99 on 1 df, p=7.275e-14
#> n= 63, number of events= 47
summary(select_coxph$survival_model)
#> Call:
#> coxph(formula = Surv_obj ~ ., data = df)
#>
#> n= 63, number of events= 47
#>
#> coef exp(coef) se(coef) z Pr(>|z|)
#> balance 1.3747 3.9537 0.1806 7.612 2.7e-14 ***
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#> exp(coef) exp(-coef) lower .95 upper .95
#> balance 3.954 0.2529 2.775 5.633
#>
#> Concordance= 0.824 (se = 0.031 )
#> Likelihood ratio test= 55.99 on 1 df, p=7e-14
#> Wald test = 57.94 on 1 df, p=3e-14
#> Score (logrank) test = 68.07 on 1 df, p=<2e-16We can check which taxa are included in the numerator of the selected balance, as well as taxa in the denominator of the selected balance:
select_coxph$balance_name$numerator
#> [1] "Pseudoflavonifractor capillosus/Clostridium phoceensis"
#> [2] "Anaerotignum lactatifermentans"
#> [3] "Eisenbergiella massiliensis"
#> [4] "Acidaminococcus intestini"
#> [5] "Parasutterella excrementihominis"
select_coxph$balance_name$denominator
#> [1] "Dysosmobacter"
#> [2] "Clostridiales Family XIII. Incertae Sedis"
#> [3] "Lachnospira eligens"
#> [4] "Massilimicrobiota timonensis"
#> [5] "Agathobaculum butyriciproducens"
#> [6] "[Clostridium] leptum"
#> [7] "Bifidobacterium catenulatum/pseudocatenulatum"
#> [8] "Anaerobutyricum soehngenii"
#> [9] "Bacteroides xylanisolvens"
#> [10] "Eubacterium"
#> [11] "Anaerostipes"
#> [12] "Alistipes putredinis"
#> [13] "Eubacterium ventriosum"
#> [14] "Bilophila wadsworthia"And here is the value of the final selected balance of microbiome for each sample:
select_coxph$balance
#> M-2000 M-2001 M-2002 M-2003 M-2004 M-2005
#> 0.837486253 -1.406619798 -1.421036705 -0.204348362 -1.228283054 -0.005997298
#> M-2006 M-2007 M-2008 M-2009 M-2010 M-2013
#> 1.779247033 -0.806398850 0.357369533 3.752290061 0.551966669 0.292634946
#> M-2014 M-2018 M-2021 M-2024 M-2030 M-2031
#> -1.228931554 -1.239757883 1.684575012 -0.042165288 -0.679476501 -0.564948424
#> M-2035 M-2040 M-2041 M-2043 M-2045 M-2046
#> 0.393028222 -0.414389042 -0.884891857 0.352006472 0.044044368 0.850689152
#> M-2049 M-2050 M-2051 M-2054 M-2056 M-2059
#> 0.792721982 -0.434479547 0.104701085 -1.269892983 -0.326736863 -0.368146020
#> M-2062 M-2064 M-2066 M-2067 M-2068 M-2069
#> 0.715060166 -0.646442335 0.191908746 -1.174924149 1.220369196 0.676698660
#> M-2074 M-2077 M-2079 M-2082 M-2084 M-2085
#> -0.248523821 -0.221243932 -0.732856713 -0.402435682 -0.362810395 0.165101075
#> M-2087 M-2090 M-2091 M-2092 M-2094 M-2095
#> -0.073229491 0.044846394 1.492174624 -1.665548697 -1.025799358 -0.179534268
#> M-2096 M-2098 M-2099 M-2101 M-2108 M-2109
#> -0.818456580 2.142694112 -0.430141030 1.974103121 -0.609331350 0.921126209
#> M-2111 M-2115 M-2122 M-2123 M-2124 M-2125
#> 0.600699874 0.183049924 -0.749427856 0.550366098 -0.667610499 -1.546930652
#> M-2128 M-2168 M-2169
#> 0.587784622 -0.647886850 -0.233900172Finally, we can visualize the survival plot with respect to several
quantiles of the final selected balance of microbiome with
1-alpha confidence interval. The quantiles of the final
selected balance for plot are specified through the argument
quantile_plotted, of which the default value is
c(0.25, 0.75). The default value of alpha is
0.05.
From the plot, we can see that the 25th percentile (lower quartile) of the sample microbiome balance is -0.6735, and the 75th percentile (upper quartile) is 0.5512. The survival curves for the two corresponding values of the microbiome balance are shown in the plot.
Select balance based on the parametric survival model
We can also select the balance based on the parametric survival model
by setting model = "parametric" , and specify the
distribution via the argument dist.
select_weibull = balance_selection(
Surv_obj = gvhd,
data = bacteria,
mult_repl = FALSE,
model = "parametric",
dist = "weibull",
selection_criterion = "min_decrement_pvalue"
)Similarly, We can check the omnibus p-value from the global
community-level association test in global_p:
we can find the selection path in selection_path:
select_weibull$selection_path[,1:2]
#> taxon_id sign
#> [Ruminococcus] faecis 8 1
#> Dorea 102 0
#> Clostridiales Family XIII. Incertae Sedis 90 1
#> Anaerotignum lactatifermentans 95 0
#> Coprococcus 14 1
#> Acidaminococcus intestini 43 0
#> Neglecta 29 0
#> Anaerobutyricum soehngenii 129 1
#> Intestinimonas v2 36 0
#> Massilimicrobiota timonensis 81 1
#> Longicatena caecimuris 75 0
#> Anaerobutyricum hallii 131 0
#> Coprococcus comes 112 1
#> Dysosmobacter 33 1
#> Agathobaculum butyriciproducens 100 1We can print the final fitted parametric survival model for the
survival outcome, gvhd, with the selected balance:
select_weibull$survival_model
#> Call:
#> survreg(formula = Surv_obj ~ ., data = df, dist = dist)
#>
#> Coefficients:
#> (Intercept) balance
#> 5.769075 1.650078
#>
#> Scale= 1.199659
#>
#> Loglik(model)= -258.6 Loglik(intercept only)= -295.3
#> Chisq= 73.41 on 1 degrees of freedom, p= <2e-16
#> n= 63We can check which taxa are included in the numerator of the selected balance, as well as taxa in the denominator of the selected balance:
select_weibull$balance_name
#> $numerator
#> [1] "[Ruminococcus] faecis"
#> [2] "Clostridiales Family XIII. Incertae Sedis"
#> [3] "Coprococcus"
#> [4] "Anaerobutyricum soehngenii"
#> [5] "Massilimicrobiota timonensis"
#> [6] "Coprococcus comes"
#> [7] "Dysosmobacter"
#> [8] "Agathobaculum butyriciproducens"
#>
#> $denominator
#> [1] "Dorea" "Anaerotignum lactatifermentans"
#> [3] "Acidaminococcus intestini" "Neglecta"
#> [5] "Intestinimonas v2" "Longicatena caecimuris"
#> [7] "Anaerobutyricum hallii"And here is the value of the final selected balance of microbiome for each sample:
select_weibull$balance
#> M-2000 M-2001 M-2002 M-2003 M-2004 M-2005
#> 0.00000000 2.04057566 -0.01319373 -0.75699431 0.93550620 1.35663561
#> M-2006 M-2007 M-2008 M-2009 M-2010 M-2013
#> -0.47998351 0.65783627 -1.05440428 -3.55409033 -1.73662766 -0.70293243
#> M-2014 M-2018 M-2021 M-2024 M-2030 M-2031
#> 1.15979605 1.34995970 -0.86751417 -1.16936496 2.08546971 -0.62975327
#> M-2035 M-2040 M-2041 M-2043 M-2045 M-2046
#> -0.72669106 -1.60148802 1.16110388 0.25503500 -1.32013808 -0.93820266
#> M-2049 M-2050 M-2051 M-2054 M-2056 M-2059
#> -0.14414414 -0.17656269 -1.26209604 1.16819286 -1.25011071 0.41197961
#> M-2062 M-2064 M-2066 M-2067 M-2068 M-2069
#> -1.38818211 -0.04957122 -1.20049269 -0.59210268 -1.56215146 -1.02600843
#> M-2074 M-2077 M-2079 M-2082 M-2084 M-2085
#> -0.31388923 0.27465307 1.09060275 -1.27005905 0.11770846 -0.61724786
#> M-2087 M-2090 M-2091 M-2092 M-2094 M-2095
#> -0.06916086 -1.72846110 -1.79710092 1.03588867 1.24154467 -0.43826587
#> M-2096 M-2098 M-2099 M-2101 M-2108 M-2109
#> -0.22965446 -0.95661076 -0.17400633 -1.52564103 -0.40173157 -0.95808157
#> M-2111 M-2115 M-2122 M-2123 M-2124 M-2125
#> -1.87655712 0.05168391 0.11770846 -1.57673097 0.07865761 1.31621642
#> M-2128 M-2168 M-2169
#> -0.80668017 -1.71446159 0.27465307Finally, we can visualize the survival plot: