Misidentification of Structural Associations (Series)

Estimating a causal effect is extremely difficult regardless of the causal framework used to guide the evaluation. As I see it, this is because complicated structural networks commonly exist around the causal relationships we try to estimate. Misidentifying this structural network will likely bias the estimated effects (i.e., the results of any model) away from the true causal effects. This, however, is only a vague guess about how estimates could vary when applied statisticians misidentify key characteristics of a structural network. I simply don’t know the direction or the magnitude of the bias that each misidentification could introduce. The Misidentification of Structural Associations Series was born from this lack of knowledge. The series is intended to illustrate how far model estimates fall from the true direct causal association between X and Y when the structural network is misidentified. Each week we will explore the direction and magnitude of the bias for a different misidentification of the structural network. All of the series entries, and the dates they were posted, are listed below (an asterisk, *, marks entries that have been updated since their initial publication date).

The first structural network that we will explore is illustrated in the figure below (R code). Briefly, the straight lines represent causal pathways and the curved lines represent covariances. EXk are independent variables only (EX = exogenous variable), LENk are both independent and dependent variables (LEN = lagged endogenous variable), and ENk are dependent variables only (EN = endogenous variable). We are interested in the direct causal effect of X1 on Y1, which was set equal to 1.00: a 1-point increase in X1 directly causes a 1-point increase in Y1. All other causal pathways are specified to vary randomly. By randomly varying the causal pathways between the other constructs, we can calculate the average difference (across 10,000 loops in R) between the estimated direct effect of X1 on Y1 when the structural network is misidentified and the true direct effect of X1 on Y1 (1.00). Importantly, confounders, colliders, mediators, and non-causal covariates all exist within this structural network.
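To make the procedure concrete, here is a minimal Python sketch of the same idea on a much smaller, hypothetical network (the series itself uses R and the larger network in Figure 1). The toy network has one confounder C, one mediator M, one collider K, and one extra cause of Y1, W. These names, the sample size, the seed, and the assumption that every other path coefficient is drawn from Uniform(0.1, 1.0) are illustrative choices, not the series' settings. Positive draws are used because, with coefficients symmetric around zero, positive and negative biases can cancel in the average. The direct effect of X1 on Y1 is fixed at 1.00, and the hypothetical `average_bias` function returns the mean difference between the OLS coefficient on X1 and 1.00 across the loops.

```python
import numpy as np

TRUE_DIRECT_EFFECT = 1.0  # X1 -> Y1, fixed at 1.00 as in the series


def simulate_network(rng, n_obs=500):
    """Draw one data set from a toy network (hypothetical, not Figure 1).

    C: confounder (C -> X1, C -> Y1)      M: mediator (X1 -> M -> Y1)
    K: collider   (X1 -> K <- Y1)         W: extra cause of Y1 only
    Every path except X1 -> Y1 gets a fresh random coefficient each call.
    """
    # Assumption: all other paths are positive, drawn from Uniform(0.1, 1.0).
    a, b, c, d, e, g, h = rng.uniform(0.1, 1.0, size=7)
    C = rng.normal(size=n_obs)
    W = rng.normal(size=n_obs)
    X1 = a * C + rng.normal(size=n_obs)
    M = b * X1 + rng.normal(size=n_obs)
    Y1 = TRUE_DIRECT_EFFECT * X1 + c * C + d * M + e * W + rng.normal(size=n_obs)
    K = g * X1 + h * Y1 + rng.normal(size=n_obs)
    return {"X1": X1, "Y1": Y1, "C": C, "M": M, "K": K, "W": W}


def x1_coefficient(data, covariates):
    """OLS coefficient on X1 from regressing Y1 on X1 plus `covariates`."""
    columns = [np.ones_like(data["X1"]), data["X1"]] + [data[v] for v in covariates]
    design = np.column_stack(columns)
    beta, *_ = np.linalg.lstsq(design, data["Y1"], rcond=None)
    return beta[1]  # index 0 is the intercept


def average_bias(covariates, n_reps=10_000, n_obs=500, seed=1):
    """Mean of (estimated X1 coefficient - true direct effect) over n_reps loops."""
    rng = np.random.default_rng(seed)
    diffs = np.empty(n_reps)
    for i in range(n_reps):
        data = simulate_network(rng, n_obs)
        diffs[i] = x1_coefficient(data, covariates) - TRUE_DIRECT_EFFECT
    return float(diffs.mean())


if __name__ == "__main__":
    specs = {
        "bivariate (Y1 on X1)": [],
        "all variables": ["C", "M", "K", "W"],
        "direct-effect model (C, M)": ["C", "M"],
    }
    for label, covs in specs.items():
        bias = average_bias(covs, n_reps=1_000)
        print(f"{label:28s} average bias = {bias:+.3f}")
```

Under these assumptions the bivariate model should drift upward (roughly +0.5, because it absorbs both the mediated path and the confounding path), adjusting for every variable should drift downward because it conditions on the collider K, and adjusting for C and M alone should land close to 0. These numbers describe only this toy network, not the results reported in the series entries.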

Figure 1: The true structural network existing around the causal association of X (the primary independent variable) on Y (the primary dependent variable).

Series Entries:

Misidentification 1: Bivariate regression of Y1 on X1 (1/8/2021).

Misidentification 2: Y1 regressed on all variables in the system (1/15/2021).

Misidentification 3: Y1 regressed on X1, EX1, & LEN7 (1/22/2021).

Misidentification 4: Y1 regressed on X1, EN7-EN12 regressed on X1, and Y1 covarying with EN7-EN12 (1/29/2021).*

Misidentification 5: Previous entry plus X1 regressed on EX31-EX33 (2/5/2021).

Misidentification 6: X1 regressed on EX31-EX33, Y1 on X1 and LEN1-LEN4 (2/12/2021).*

Misidentification 7: Previous entry plus LEN1-LEN4 regressed on X1 (2/19/2021).

Misidentification 8: The Right Estimate (2/26/2021).