MAZE: Mediation Analysis for ZEro-inflated mediators

1. Introduction

Causal mediation analysis is a statistical technique for investigating and identifying relationships in a causal mechanism that involves one or more intermediate variables (i.e., mediators) between an independent variable and an outcome. In addition to providing a better understanding of the causal pathways in proposed theoretical mechanisms, mediation analyses can help to confirm and refine treatments when it is not possible or ethical to intervene on the independent variable.

However, challenges arise when mediation analyses are applied to datasets with an excessive number of zero-valued data points for the mediators, especially for count data or non-negative measurements. Standard mediation analysis approaches may not be valid because their distributional assumptions are violated. Moreover, the excess zero mediator values could contain both true and false zeros. A true zero means that the measurement is truly zero, while a false zero means that the measurement is positive but too small to be detected given the accuracy of the devices used. Therefore, there is an unmet need for mediation analysis approaches that account for the zero-inflated structure of these mediators.

To address these difficulties, we proposed a novel mediation analysis approach that estimates and tests direct and indirect effects for zero-inflated, non-negative mediators. The zero-inflated log-normal (ZILoN), zero-inflated negative binomial (ZINB), and zero-inflated Poisson (ZIP) distributions are considered as the possible distributions for these mediators.

The R package MAZE implements the proposed causal mediation analysis approach for zero-inflated mediators to estimate and test the natural indirect effect (NIE), natural direct effect (NDE), and controlled direct effect (CDE). Given the zero-inflated nature of the mediator, the mediation effect (i.e., NIE) can be decomposed into two components, NIE1 and NIE2.

2. Model

For simplicity, the subject index is suppressed, and confounders are not included in the equations, but they have been incorporated into MAZE.

For an independent variable X, a zero-inflated mediator M and a continuous outcome variable Y, the following regression equation is used to model the association between Y and (X, M):

$$Y_{xm1(m>0)} = \beta_0 + \beta_1 m + \beta_2 1(m>0) + \beta_3 x + \beta_4 x 1(m>0) + \beta_5 x m + \epsilon \tag{1}$$

where Yxm1(m > 0) is the potential outcome of Y when (X, M, 1(M > 0)) take the value of (x, m, 1(m > 0)), and 1(⋅) is an indicator function. Equation (1) is a regression model where β0, β1, β2, β3, β4, β5 are regression coefficients and ϵ is the random error following the normal distribution N(0, δ²). Notice that interactions between X and the two mediators M and 1(M > 0) can be accommodated by the product terms β4X1(M > 0) and β5XM in the model, which is an advantage of potential-outcomes mediation analysis approaches. Users can specify whether to include either one, both, or none of the two possible interactions using the argument XMint.
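As a rough illustration of the outcome model above, the sketch below simulates a small data set and fits the regression with base R's lm(), including only the X by 1(M > 0) interaction. The simulated coefficient values, the object names (dat, Mpos, fit) and the use of lm() are assumptions made for illustration only; MAZE fits the full mediation model itself, so this is not its estimation routine.

```r
set.seed(1)
n <- 500
X <- rnorm(n)
# mediator with excess zeros (an assumed ZINB-like data-generating process)
M <- ifelse(rbinom(n, 1, 0.3) == 1, 0, rnbinom(n, size = 5, mu = 4))
Mpos <- as.numeric(M > 0)   # the indicator 1(M > 0)
# outcome generated from the regression above, with no X-by-M interaction
Y <- 1 + 0.1 * M + 0.5 * Mpos + 0.2 * X + 0.7 * X * Mpos + rnorm(n, sd = 1)
dat <- data.frame(X, M, Mpos, Y)
# M, 1(M > 0), X and the X-by-1(M > 0) product term
fit <- lm(Y ~ M + Mpos + X + X:Mpos, data = dat)
coef(fit)
```

Adding `+ X:M` to the formula would correspond to also including the second interaction term.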

2.1 Zero-inflated mediators

2.1.1 Zero-inflated log-normal (ZILoN) mediators

For a ZILoN mediator, the two-part density function combines a point mass at zero with a log-normal density for positive values. Here ϕ(⋅) denotes the density function of the log-normal distribution indexed by the parameters μ and σ, which are the expected value and standard deviation, respectively, of the random variable after natural-log transformation.

The ZILoN mediator M depends on X through regression equations for its distribution parameters. Together with the outcome model, these equations form the full mediation model for a ZILoN mediator and a continuous outcome.
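The two-part structure of a ZILoN variable can be sketched in a few lines of R. The function names dzilon and rzilon and the argument name delta (the probability that M equals zero) are hypothetical and chosen for this illustration; they are not taken from the MAZE package.

```r
# Two-part ZILoN density: point mass `delta` at zero, log-normal density on m > 0
dzilon <- function(m, delta, mu, sigma) {
  ifelse(m == 0, delta, (1 - delta) * dlnorm(m, meanlog = mu, sdlog = sigma))
}
# Random draws from the same two-part distribution
rzilon <- function(n, delta, mu, sigma) {
  ifelse(rbinom(n, 1, delta) == 1, 0, rlnorm(n, meanlog = mu, sdlog = sigma))
}
# The continuous part should integrate to 1 - delta
pos_mass <- integrate(function(m) dzilon(m, 0.2, 1, 0.5), lower = 0, upper = Inf)$value
pos_mass
set.seed(1)
m <- rzilon(1000, delta = 0.2, mu = 1, sigma = 0.5)
mean(m == 0)   # sample proportion of zeros, which should be near delta
```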

2.1.2 Zero-inflated negative binomial (ZINB) mediators

The two-part density function for a ZINB mediator M is governed by the following parameters: the parameter vector (μ, r)ᵀ controls the number of zeros generated from the NB distribution, 0 < Δ* < 1 controls the number of excess zeros (i.e., zeros not generated from the NB distribution), r is the dispersion parameter, and μ is the expectation of the negative binomial distribution. The ZINB mediator M depends on X through regression equations for its distribution parameters. Together with the outcome model, these equations form the full mediation model for a ZINB mediator and a continuous outcome.
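To make the roles of μ, r and Δ* concrete, the following sketch simulates a ZINB mediator whose parameters depend on X. It assumes a log link for μ and a logit link for Δ*; the coefficient names (alpha0, alpha1, gamma0, gamma1) and their values are illustrative assumptions. Only true zeros are generated here, and the false-zero mechanism of Section 2.2 is not simulated.

```r
set.seed(1)
n <- 2000
X <- rnorm(n)
alpha0 <- 1.4;  alpha1 <- 0.1    # assumed log-link coefficients for the NB mean
gamma0 <- -2.8; gamma1 <- -0.35  # assumed logit-link coefficients for excess zeros
r <- 6                           # dispersion parameter
mu <- exp(alpha0 + alpha1 * X)
delta_star <- plogis(gamma0 + gamma1 * X)
# excess zeros occur with probability delta_star; otherwise draw from the NB
excess_zero <- rbinom(n, 1, delta_star)
M <- ifelse(excess_zero == 1, 0, rnbinom(n, size = r, mu = mu))
# probability of a zero from either source (excess zero or NB zero)
p_zero <- delta_star + (1 - delta_star) * dnbinom(0, size = r, mu = mu)
c(observed = mean(M == 0), expected = mean(p_zero))
```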

2.1.3 Zero-inflated Poisson (ZIP) mediators

The two-part density function for a ZIP mediator is governed by λ and Δ*, where λ > 0 is the mean of the Poisson distribution. λ controls the number of zeros generated by the Poisson distribution, while 0 < Δ* < 1 controls the number of excess zeros in addition to the zeros from the Poisson distribution. The ZIP mediator M depends on X through regression equations for its distribution parameters. Together with the outcome model, these equations form the full mediation model for a ZIP mediator and a continuous outcome.
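The interplay of λ and Δ* can be checked numerically. The sketch below writes the ZIP probability mass function in the usual textbook mixture form, which may be arranged differently from the two-part form used in the package; the function name dzip is hypothetical.

```r
# ZIP mass function in mixture form: an excess zero with probability delta_star,
# otherwise a draw from Poisson(lambda)
dzip <- function(m, lambda, delta_star) {
  delta_star * (m == 0) + (1 - delta_star) * dpois(m, lambda)
}
lambda <- 4
delta_star <- 0.05
probs <- dzip(0:100, lambda, delta_star)
total_prob <- sum(probs)                 # total mass over (effectively) the whole support
p_zero <- dzip(0, lambda, delta_star)    # excess zeros plus Poisson zeros
zip_mean <- sum(0:100 * probs)           # should equal (1 - delta_star) * lambda
c(total_prob = total_prob, p_zero = p_zero, zip_mean = zip_mean)
```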

2.2 Probability mechanism for observing false zeros

It is common to observe two types of zeros for M in a data set with excessive zeros: true zeros and false zeros. We use M to denote the true value of the mediator and M* to denote the observed value of M. When the observed value of the mediator is positive (i.e., M* > 0), we assume M* = M. However, when M* = 0, we do not know whether M is truly zero or M is positive but incorrectly observed as zero. We consider a probability mechanism for observing a zero in which the parameter η needs to be estimated and B > 0 is a known constant. The value of B can be chosen on the basis of the insight and judgement of professionals in the field from which the data arose.

2.3 Mediation effect and direct effect

The natural indirect effect (NIE), natural direct effect (NDE) and controlled direct effect (CDE) are derived for the proposed mediation model. The NIE is also called the mediation effect. The total effect of the independent variable X is equal to the sum of the NIE and the NDE. Let Mx denote the value of M if X takes the value of x, and let 1(Mx > 0) denote the value of 1(M > 0) if X takes the value of x. The average NIE, NDE and CDE if X changes from x1 to x2 are given by:

$$\mathrm{NIE} = E\left(Y_{x_2 M_{x_2} 1(M_{x_2}>0)} - Y_{x_2 M_{x_1} 1(M_{x_1}>0)}\right)$$

$$\mathrm{NDE} = E\left(Y_{x_2 M_{x_1} 1(M_{x_1}>0)} - Y_{x_1 M_{x_1} 1(M_{x_1}>0)}\right)$$

$$\mathrm{CDE} = E\left(Y_{x_2 m 1(m>0)} - Y_{x_1 m 1(m>0)}\right)$$

Based on the sequential order of the two mediators M and 1(M > 0), the NIE can be further decomposed as NIE = NIE1 + NIE2, where NIE1 is the mediation effect through M summing the two causal pathways X → M → Y and X → M → 1(M > 0) → Y, and NIE2 is the mediation effect through only 1(M > 0) on the causal pathway X → 1(M > 0) → Y.

2.3.1 Effects for ZILoN mediators

2.3.2 Effects for ZINB mediators

2.3.3 Effects for ZIP mediators

3. Installing the R package

The R package MAZE can be installed from its GitHub repository.

# requires the devtools package; if needed, run install.packages("devtools") first
devtools::install_github("https://github.com/meilinjiang/MAZE", build_vignettes = TRUE)

4. Main function MAZE()

The R function MAZE() implements the proposed mediation analysis approach for zero-inflated mediators and is used to estimate and test NIE, NIE1, NIE2, NDE, and CDE.

4.1 Input arguments

The input arguments to the function are

  • data: a data frame containing the variables: an independent variable X, a mediator M, an outcome Y, and confounder variables (if any). See the example dataset: data(zinb10)

  • distM: a vector of candidate distributions for the mediator. One or more of ‘zilonm’, ‘zinbm’, and ‘zipm’ for zero-inflated log-normal, negative binomial, and Poisson mediators, respectively. Default is c('zilonm', 'zinbm', 'zipm'), in which case all three distributions are fitted and the final mediation model is selected by the model selection criterion given in selection

  • K: a vector of candidate values for the number of components K in the zero-inflated mixture mediators. Default is K = 1 for zero-inflated (non-mixture) mediators

  • selection: model selection criterion when more than one model (combination of different values in distM and K) is fitted. Either ‘AIC’ or ‘BIC’. Default is ‘AIC’

  • X: name of the independent variable. Can be continuous or discrete

  • M: name of the mediator variable. Non-negative values

  • Y: name of the outcome variable. Continuous values

  • Z: name(s) of confounder variables (if any)

  • XMint: a logical vector of length 2 indicating whether to include the two exposure-mediator interaction terms between (i) X and 1(M > 0) and (ii) X and M. Default is c(TRUE, FALSE), which only includes the first

  • x1: the first value of independent variable of interest

  • x2: the second value of independent variable of interest

  • zval: the value(s) of the confounders to condition on when estimating effects

  • mval: the fixed value of the mediator to condition on when estimating the CDE

  • B: the upper bound value to be used in the probability mechanism of observing false zeros

  • seed: an optional seed number to control randomness for reproducibility. The default is 1

  • ncore: number of cores available for parallel computing
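Before calling MAZE(), it can be useful to confirm that the data satisfy the requirements listed above (a non-negative mediator and a numeric outcome) and to see how many zeros the mediator contains. The helper check_mediator_input below is a hypothetical pre-flight check written for this illustration; it is not part of the MAZE package, and the toy data frame is invented.

```r
# Hypothetical helper: basic sanity checks on the columns passed to MAZE()
check_mediator_input <- function(data, X, M, Y) {
  stopifnot(is.data.frame(data), all(c(X, M, Y) %in% names(data)))
  m <- data[[M]]
  stopifnot(is.numeric(m), all(m >= 0, na.rm = TRUE), is.numeric(data[[Y]]))
  c(n = nrow(data),
    prop_zero = mean(m == 0, na.rm = TRUE),              # share of zero mediator values
    all_integer = all(m == round(m), na.rm = TRUE))      # 1 if the mediator looks like a count
}
toy <- data.frame(X = c(0.1, -0.3, 1.2, 0.5),
                  Mobs = c(0, 3, 7, 0),
                  Y = c(1.0, 2.1, 3.4, 0.8))
res <- check_mediator_input(toy, X = "X", M = "Mobs", Y = "Y")
res
```

A mediator with non-integer positive values would suggest restricting distM to 'zilonm', since the negative binomial and Poisson options describe counts.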

4.2 Outputs

A list object containing

  • results_effects: a data frame for the results of estimated effects (NIE1, NIE2, NIE, NDE, and CDE). “_cond” for conditional effects at zval and “_avg” for average effects

  • results_parameters: a data frame for the results of model parameters

  • selected_model_name: a string giving the distribution of M and the number of components K selected in the final mediation model

  • BIC: a numeric value for the BIC of the final mediation model

  • AIC: a numeric value for the AIC of the final mediation model

  • models: a list with all fitted models

  • analysis2_out: a list with output from analysis2() function (used for internal check)
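The selection step can be illustrated with the AIC and BIC values printed for the three candidate models in the Section 5 example. The sketch assumes that the model with the smallest criterion value is chosen, which agrees with the selected model reported there, and that n = 100 observations enter the BIC; the vector names aic, bic and k are chosen for this illustration.

```r
# AIC and BIC values as printed in the Section 5 example output
aic <- c(zilonm_K1 = 756.3761, zinbm_K1 = 750.4574, zipm_K1 = 757.0581)
bic <- c(zilonm_K1 = 787.6382, zinbm_K1 = 781.7195, zipm_K1 = 785.715)
best_aic <- names(which.min(aic))   # candidate favoured when selection = 'AIC'
best_bic <- names(which.min(bic))   # candidate favoured when selection = 'BIC'
# Since BIC - AIC = k * (log(n) - 2), the implied number of parameters k is:
n <- 100
k <- (bic - aic) / (log(n) - 2)
list(best_aic = best_aic, best_bic = best_bic, k = round(k, 2))
```

The implied parameter counts come out as whole numbers, with the Poisson model having one parameter fewer than the other two, as expected because it has no dispersion or scale parameter.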

5. Example

The MAZE package contains an example dataset zinb10 that was generated using the proposed model with a zero-inflated negative binomial mediator (K = 1). It is a data frame with 100 observations and 3 variables: a continuous independent variable X, a continuous outcome Y, and a count mediator variable Mobs. The mediator variable contains 10% zero values, of which half are false zeros.

library(MAZE)
#> Loading required package: flexmix
#> Loading required package: lattice
#> Loading required package: numDeriv
#> Loading required package: pracma
#> 
#> Attaching package: 'pracma'
#> The following objects are masked from 'package:numDeriv':
#> 
#>     grad, hessian, jacobian
# load the example dataset "zinb10"
data(zinb10)
# call MAZE() to perform mediation analysis
maze_out <- MAZE(data = zinb10,
                 distM = c('zilonm', 'zinbm', 'zipm'),  K = 1,
                 selection = 'AIC',
                 X = 'X', M = 'Mobs', Y = 'Y', Z = NULL,
                 XMint = c(TRUE, FALSE),
                 x1 = 0, x2 = 1, zval = NULL, mval = 0,
                 B = 20, seed = 1)
#> $zilonm_K1
#> $zilonm_K1$init
#> (Intercept)        Mobs                       X                         
#>  1.78569404  0.12005503  0.00000000  0.85118891  0.00000000  0.95403184 
#>                                     (Intercept)           X             
#>  1.23291057  0.06162764 -0.44558991 -2.21212075 -0.35269450  0.01000000 
#> 
#> $zilonm_K1$est
#> (Intercept)        Mobs                       X                         
#>  1.30167999  0.12005557  0.48393825  0.15873694  0.69218518  0.93329877 
#>                                     (Intercept)           X             
#>  1.23217909  0.06211809 -0.44458704 -2.21739677 -0.35363937  2.49329787 
#> 
#> $zilonm_K1$countEM
#> [1] 33
#> 
#> $zilonm_K1$AIC
#> [1] 756.3761
#> 
#> $zilonm_K1$BIC
#> [1] 787.6382
#> 
#> $zilonm_K1$tau2
#> $zilonm_K1$tau2$tauG1
#>       [,1]
#>  [1,]    1
#>  [2,]    1
#>  [3,]    1
#>  [4,]    1
#>  [5,]    1
#>  [6,]    1
#>  [7,]    1
#>  [8,]    1
#>  [9,]    1
#> [10,]    1
#> [11,]    1
#> [12,]    1
#> [13,]    1
#> [14,]    1
#> [15,]    1
#> [16,]    1
#> [17,]    1
#> [18,]    1
#> [19,]    1
#> [20,]    1
#> [21,]    1
#> [22,]    1
#> [23,]    1
#> [24,]    1
#> [25,]    1
#> [26,]    1
#> [27,]    1
#> [28,]    1
#> [29,]    1
#> [30,]    1
#> [31,]    1
#> [32,]    1
#> [33,]    1
#> [34,]    1
#> [35,]    1
#> [36,]    1
#> [37,]    1
#> [38,]    1
#> [39,]    1
#> [40,]    1
#> [41,]    1
#> [42,]    1
#> [43,]    1
#> [44,]    1
#> [45,]    1
#> [46,]    1
#> [47,]    1
#> [48,]    1
#> [49,]    1
#> [50,]    1
#> [51,]    1
#> [52,]    1
#> [53,]    1
#> [54,]    1
#> [55,]    1
#> [56,]    1
#> [57,]    1
#> [58,]    1
#> [59,]    1
#> [60,]    1
#> [61,]    1
#> [62,]    1
#> [63,]    1
#> [64,]    1
#> [65,]    1
#> [66,]    1
#> [67,]    1
#> [68,]    1
#> [69,]    1
#> [70,]    1
#> [71,]    1
#> [72,]    1
#> [73,]    1
#> [74,]    1
#> [75,]    1
#> [76,]    1
#> [77,]    1
#> [78,]    1
#> [79,]    1
#> [80,]    1
#> [81,]    1
#> [82,]    1
#> [83,]    1
#> [84,]    1
#> [85,]    1
#> [86,]    1
#> [87,]    1
#> [88,]    1
#> [89,]    1
#> [90,]    1
#> 
#> $zilonm_K1$tau2$tauG2
#>            [,1]         [,2]
#>  [1,] 0.9972266 0.0027734091
#>  [2,] 0.9977148 0.0022851621
#>  [3,] 0.9952583 0.0047417365
#>  [4,] 0.9957374 0.0042625746
#>  [5,] 0.9980420 0.0019580465
#>  [6,] 0.9973702 0.0026298407
#>  [7,] 0.9968183 0.0031816703
#>  [8,] 0.9990487 0.0009512697
#>  [9,] 0.9969852 0.0030147807
#> [10,] 0.9969090 0.0030909859
#> 
#> 
#> 
#> $zinbm_K1
#> $zinbm_K1$init
#> (Intercept)        Mobs                       X                         
#>  1.78517270  0.12015440  0.00000000  0.85104549  0.00000000  0.95403971 
#> (Intercept)           X             (Intercept)           X             
#>  1.41269766  0.09257633  5.68623319 -2.21212075 -0.35269450  0.01000000 
#> 
#> $zinbm_K1$est
#> (Intercept)        Mobs                       X                         
#>   1.3032717   0.1200885   0.4823095   0.1533072   0.6979238   0.9333399 
#> (Intercept)           X             (Intercept)           X             
#>   1.3645683   0.1069317   6.7790122  -2.8245948  -0.3554505   2.9869217 
#> 
#> $zinbm_K1$countEM
#> [1] 85
#> 
#> $zinbm_K1$AIC
#> [1] 750.4574
#> 
#> $zinbm_K1$BIC
#> [1] 781.7195
#> 
#> $zinbm_K1$tau2
#> $zinbm_K1$tau2$tauG1
#>  [1] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
#> [39] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
#> [77] 1 1 1 1 1 1 1 1 1 1 1 1 1 1
#> 
#> $zinbm_K1$tau2$tauG2
#>            [,1]         [,2]
#>  [1,] 0.9998879 1.120824e-04
#>  [2,] 0.9999099 9.011999e-05
#>  [3,] 0.9997935 2.065316e-04
#>  [4,] 0.9998199 1.801039e-04
#>  [5,] 0.9999185 8.152604e-05
#>  [6,] 0.9998969 1.030726e-04
#>  [7,] 0.9998685 1.315202e-04
#>  [8,] 0.9999623 3.774995e-05
#>  [9,] 0.9998760 1.239638e-04
#> [10,] 0.9998716 1.283670e-04
#> 
#> 
#> 
#> $zipm_K1
#> $zipm_K1$init
#>      (Intercept)             Mobs                                 X 
#>       1.78512362       0.12019466       0.00000000       0.85116028 
#>                                   coef.(Intercept)           coef.X 
#>       0.00000000       0.95413982       1.41225480       0.09506801 
#>      (Intercept)                X                  
#>      -2.21212075      -0.35269450       0.01000000 
#> 
#> $zipm_K1$est
#>      (Intercept)             Mobs                                 X 
#>        1.3038644        0.1200770        0.4818120        0.1554076 
#>                                   coef.(Intercept)           coef.X 
#>        0.6957494        0.9334036        1.3930176        0.1025632 
#>      (Intercept)                X                  
#>       -2.4125566       -0.3399581        3.8115853 
#> 
#> $zipm_K1$countEM
#> [1] 31
#> 
#> $zipm_K1$AIC
#> [1] 757.0581
#> 
#> $zipm_K1$BIC
#> [1] 785.715
#> 
#> $zipm_K1$tau2
#> $zipm_K1$tau2$tauG1
#>       [,1]
#>  [1,]    1
#>  [2,]    1
#>  [3,]    1
#>  [4,]    1
#>  [5,]    1
#>  [6,]    1
#>  [7,]    1
#>  [8,]    1
#>  [9,]    1
#> [10,]    1
#> [11,]    1
#> [12,]    1
#> [13,]    1
#> [14,]    1
#> [15,]    1
#> [16,]    1
#> [17,]    1
#> [18,]    1
#> [19,]    1
#> [20,]    1
#> [21,]    1
#> [22,]    1
#> [23,]    1
#> [24,]    1
#> [25,]    1
#> [26,]    1
#> [27,]    1
#> [28,]    1
#> [29,]    1
#> [30,]    1
#> [31,]    1
#> [32,]    1
#> [33,]    1
#> [34,]    1
#> [35,]    1
#> [36,]    1
#> [37,]    1
#> [38,]    1
#> [39,]    1
#> [40,]    1
#> [41,]    1
#> [42,]    1
#> [43,]    1
#> [44,]    1
#> [45,]    1
#> [46,]    1
#> [47,]    1
#> [48,]    1
#> [49,]    1
#> [50,]    1
#> [51,]    1
#> [52,]    1
#> [53,]    1
#> [54,]    1
#> [55,]    1
#> [56,]    1
#> [57,]    1
#> [58,]    1
#> [59,]    1
#> [60,]    1
#> [61,]    1
#> [62,]    1
#> [63,]    1
#> [64,]    1
#> [65,]    1
#> [66,]    1
#> [67,]    1
#> [68,]    1
#> [69,]    1
#> [70,]    1
#> [71,]    1
#> [72,]    1
#> [73,]    1
#> [74,]    1
#> [75,]    1
#> [76,]    1
#> [77,]    1
#> [78,]    1
#> [79,]    1
#> [80,]    1
#> [81,]    1
#> [82,]    1
#> [83,]    1
#> [84,]    1
#> [85,]    1
#> [86,]    1
#> [87,]    1
#> [88,]    1
#> [89,]    1
#> [90,]    1
#> 
#> $zipm_K1$tau2$tauG2
#>            [,1]         [,2]
#>  [1,] 0.9999997 2.634686e-07
#>  [2,] 0.9999998 2.113970e-07
#>  [3,] 0.9999995 4.860296e-07
#>  [4,] 0.9999996 4.119145e-07
#>  [5,] 0.9999998 2.054828e-07
#>  [6,] 0.9999998 2.450092e-07
#>  [7,] 0.9999997 3.003230e-07
#>  [8,] 0.9999999 8.004548e-08
#>  [9,] 0.9999997 2.852396e-07
#> [10,] 0.9999997 2.730768e-07
#> 
#> 
#> 
#> [1] "selected_model_name: zinbm_K1"
## results of selected mediation model
maze_out$results_effects # indirect and direct effects. "_cond" for conditional effects and "_avg" for average effects 
#>             Estimate         SE    CI_lower   CI_upper       Pvalue
#> NIE1_cond 0.05848985 0.04004099 -0.01998904 0.13696874 1.440842e-01
#> NIE2_cond 0.03049883 0.02884348 -0.02603336 0.08703102 2.903333e-01
#> NIE_cond  0.08898868 0.05507323 -0.01895287 0.19693023 1.061322e-01
#> NDE_cond  0.78215096 0.10561449  0.57515037 0.98915155 1.303402e-13
#> CDE_cond  0.15330724 0.44685318 -0.72250891 1.02912338 7.315368e-01
#> NIE1_avg  0.05848985 0.04004099 -0.01998904 0.13696874 1.440842e-01
#> NIE2_avg  0.03049883 0.02884348 -0.02603336 0.08703102 2.903333e-01
#> NIE_avg   0.08898868 0.05507323 -0.01895287 0.19693023 1.061322e-01
#> NDE_avg   0.78215096 0.10561449  0.57515037 0.98915155 1.303402e-13
#> CDE_avg   0.15330724 0.44685318 -0.72250891 1.02912338 7.315368e-01
maze_out$selected_model_name # distribution of the mediator and number of components K in the selected mediation model
#> [1] "zinbm_K1"
maze_out$results_parameters # model parameters
#>            Initials   Estimate          SE     CI_lower   CI_upper       Pvalue
#> beta0    1.78517270  1.3032717  0.30847932   0.69866338  1.9078801 2.391245e-05
#> beta1    0.12015440  0.1200885  0.04070186   0.04031429  0.1998626 3.173197e-03
#> beta2    0.00000000  0.4823095  0.36486355  -0.23280989  1.1974290 1.862047e-01
#> beta3    0.85104549  0.1533072  0.44685318  -0.72250891  1.0291234 7.315368e-01
#> beta4    0.00000000  0.6979238  0.45876488  -0.20123887  1.5970864 1.281820e-01
#> delta    0.95403971  0.9333399  0.06599795   0.80398635  1.0626935 0.000000e+00
#> alpha10  1.41269766  1.3645683  0.07253340   1.22240549  1.5067312 0.000000e+00
#> alpha11  0.09257633  0.1069317  0.06954885  -0.02938149  0.2432450 1.241695e-01
#> r        5.68623319  6.7790122  3.36669264   0.18041588 13.3776085 4.405654e-02
#> gamma0  -2.21212075 -2.8245948  0.67815514  -4.15375450 -1.4954352 3.111943e-05
#> gamma1  -0.35269450 -0.3554505  0.51607623  -1.36694127  0.6560404 4.909770e-01
#> eta      0.01000000  2.9869217 14.53529043 -25.50172408 31.4755674 8.371858e-01
maze_out$BIC; maze_out$AIC # BIC and AIC of the selected mediation model
#> zinbm_K1 
#> 781.7195
#> zinbm_K1 
#> 750.4574
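The effect estimates reported above can be reproduced approximately by hand from the parameter estimates of the selected ZINB model. The sketch below assumes a log link for the NB mean μ and a logit link for Δ*, and uses closed-form expressions that follow from a linear outcome model with only the X by 1(M > 0) interaction. These links, formulas and helper names (mu_x, delta_x, EM, Ppos) are assumptions made for this illustration, not code taken from the package.

```r
# Parameter estimates copied from maze_out$results_parameters above
beta1 <- 0.1200885; beta2 <- 0.4823095; beta3 <- 0.1533072; beta4 <- 0.6979238
alpha10 <- 1.3645683; alpha11 <- 0.1069317; r <- 6.7790122
gamma0 <- -2.8245948; gamma1 <- -0.3554505
x1 <- 0; x2 <- 1; mval <- 0

# Assumed links: log for the NB mean, logit for the excess-zero probability
mu_x    <- function(x) exp(alpha10 + alpha11 * x)
delta_x <- function(x) plogis(gamma0 + gamma1 * x)
# Expected mediator value and probability of a positive mediator at X = x
EM   <- function(x) (1 - delta_x(x)) * mu_x(x)
Ppos <- function(x) (1 - delta_x(x)) * (1 - dnbinom(0, size = r, mu = mu_x(x)))

NIE1 <- beta1 * (EM(x2) - EM(x1))
NIE2 <- (beta2 + beta4 * x2) * (Ppos(x2) - Ppos(x1))
NDE  <- (beta3 + beta4 * Ppos(x1)) * (x2 - x1)
CDE  <- (beta3 + beta4 * (mval > 0)) * (x2 - x1)
round(c(NIE1 = NIE1, NIE2 = NIE2, NIE = NIE1 + NIE2, NDE = NDE, CDE = CDE), 4)
```

Under these assumptions the values should agree closely with the Estimate column of maze_out$results_effects; standard errors and confidence intervals are not reproduced here.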

Session Info

sessionInfo()
#> R version 4.4.2 (2024-10-31)
#> Platform: x86_64-pc-linux-gnu
#> Running under: Ubuntu 24.04.1 LTS
#> 
#> Matrix products: default
#> BLAS:   /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3 
#> LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.26.so;  LAPACK version 3.12.0
#> 
#> locale:
#>  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
#>  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=C              
#>  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
#>  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
#>  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
#> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
#> 
#> time zone: Etc/UTC
#> tzcode source: system (glibc)
#> 
#> attached base packages:
#> [1] stats     graphics  grDevices utils     datasets  methods   base     
#> 
#> other attached packages:
#> [1] MAZE_0.0.3          pracma_2.4.4        numDeriv_2016.8-1.1
#> [4] flexmix_2.3-19      lattice_0.22-6      rmarkdown_2.29     
#> 
#> loaded via a namespace (and not attached):
#>  [1] doParallel_1.0.17 cli_3.6.3         knitr_1.48        rlang_1.1.4      
#>  [5] xfun_0.49         jsonlite_1.8.9    buildtools_1.0.0  htmltools_0.5.8.1
#>  [9] maketools_1.3.1   nnet_7.3-19       sys_3.4.3         sass_0.4.9       
#> [13] stats4_4.4.2      modeltools_0.2-23 grid_4.4.2        evaluate_1.0.1   
#> [17] jquerylib_0.1.4   MASS_7.3-61       fastmap_1.2.0     foreach_1.5.2    
#> [21] yaml_2.3.10       lifecycle_1.0.4   compiler_4.4.2    codetools_0.2-20 
#> [25] Rcpp_1.0.13-1     digest_0.6.37     R6_2.5.1          parallel_4.4.2   
#> [29] bslib_0.8.0       tools_4.4.2       iterators_1.0.14  cachem_1.1.0

  1. University of Florida