Article: Silver, I. A., Wooldredge, J., Sullivan, C. J., & Nedelec, J. L. (2021). Longitudinal Propensity Score Matching: A Demonstration of Counterfactual Conditions Adjusted for Longitudinal Clustering. Journal of Quantitative Criminology, 37, 267–301. https://doi.org/10.1007/s10940-020-09455-9.
1. Background
Evaluations of programs, policies, or practices implemented by criminal legal system agencies often rely on quasi-experimental designs to assess whether the effort has led to substantive changes in departmental functioning or individuals’ lives. Accordingly, propensity score matching (PSM) and inverse probability weighting (IPW) have become widely used tools for generating inferences about the effectiveness of these programs, policies, or practices. These techniques balance differences between the cases that did and did not receive the intervention and create a counterfactual condition (what would have happened to treated cases had they not received treatment). Traditionally, these techniques rely on time-independent models to create the propensity score or inverse probability weight. Time-independent models, in the current context, are models that do not account for when a treatment case received the program, policy, or practice. Ignoring the timing of treatment implicitly assumes that:
- The probability of treatment exposure is constant across time
- The relationship between covariates and treatment does not depend on when the program was administered
To provide an example, if we evaluate the effects of Thinking for a Change (T4C) on misconduct using a time-independent model, we are assuming that: 1) the eligibility/exclusionary criteria have remained constant, 2) the waitlist/selection process for the program has remained constant, and 3) the rewards for participating in the program have remained constant. While these assumptions might be valid in the short term, the eligibility/exclusionary criteria, the waitlist/selection process, and the rewards for participation will likely change if the program is administered over a long period of time (2 or more years). Furthermore, individuals’ characteristics, opportunities, and motivational factors might differentially affect enrollment in T4C over time.
Longitudinal Propensity Score Matching (LPSM) and longitudinal IPW were proposed to address violations of these key assumptions. LPSM uses random-intercept and random-slope longitudinal models to estimate the probability of treatment at each time point, producing more accurate counterfactual matches when treatment opportunities vary across individuals and over time.
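As a rough sketch of the contrast, and not necessarily the authors' exact specification, a time-independent propensity model and a random-intercept, random-slope model might be written as follows. The notation is assumed for illustration: $T_i$ indicates whether person $i$ was ever treated, $T_{it}$ indicates treatment in period $t$, $\mathbf{x}_i$ is the covariate vector, and $u_{0i}$ and $u_{1i}$ are person-specific deviations in the intercept and the time slope.

```latex
\begin{align}
\text{Time-independent:}\quad
  & \operatorname{logit} \Pr(T_i = 1 \mid \mathbf{x}_i)
    = \beta_0 + \mathbf{x}_i^{\top}\boldsymbol{\beta} \\
\text{Longitudinal:}\quad
  & \operatorname{logit} \Pr(T_{it} = 1 \mid \mathbf{x}_i, u_{0i}, u_{1i})
    = (\beta_0 + u_{0i}) + (\beta_1 + u_{1i})\, t
      + \mathbf{x}_i^{\top}\boldsymbol{\beta} \\
  & (u_{0i}, u_{1i})^{\top} \sim \mathcal{N}(\mathbf{0}, \boldsymbol{\Sigma})
\end{align}
```

Under this sketch, the first model yields one propensity score per person, while the second yields a score for each person in each period, so the same person can be more or less likely to be treated at different times.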
2. Summary of Findings
The study’s simulation component illustrates how cross-sectional and longitudinal models diverge when the likelihood of a case participating in a program, policy, or practice changes over time. Analyses showed that time-independent models could produce biased estimates and, more importantly, were more likely to result in incorrect conclusions about the effectiveness of the program, policy, or practice. Time-dependent models generally produced less biased estimates and resulted in interpretations that aligned with the true effectiveness of the program, policy, or practice.
The second component of the study applies LPSM to real administrative data capturing prison programming in Ohio. Post-matching analyses indicated that LPSM yielded treatment effect estimates that were more stable and substantively consistent with theoretical expectations, whereas cross-sectional models produced attenuated results.
Collectively, the simulation and empirical demonstration illustrate that LPSM can be a valuable tool when evaluating the effects of a program, policy, or practice. However, consistent with best practice, various weighting and matching techniques should be implemented to triangulate the true effects of a program, policy, or practice.
3. Implications
The findings suggest that LPSM could serve as an additional analytical technique for scholars and analysts conducting an evaluation of a program, policy, or practice. Importantly, policymakers should be cautious when interpreting evaluation studies that rely solely on time-independent PSM or IPW for programs, policies, and practices that are administered to participants over an extended period of time (2 or more years). Although LPSM requires more careful design and computational effort than cross-sectional PSM, the statistical procedures could improve the accuracy of estimated treatment effects under certain conditions.
4. Data and Methods
The authors’ methodological framework incorporated both simulated and real-world longitudinal data. The simulations included sample sizes ranging from 1,000 to 50,000 individuals, each with 12 opportunities for treatment exposure. Exposure at each period was determined by a combination of five covariates and a set of time-specific random error terms, producing a realistic structure of clustering and diminishing predictability across periods. After estimating logistic and longitudinal models, the authors matched treated and untreated individuals using nearest neighbor matching with a caliper of 0.05. They then assessed balance and compared post-matching treatment effects to the true effect value of 1.00 established in the data-generating process.
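To make the simulated structure concrete, the following is a minimal sketch of a data-generating process with five covariates, 12 exposure opportunities, and time-specific error. It is not the authors' simulation code: the function name `simulate_exposure`, the coefficients, the intercept, and the rule that error spread grows with the period are all assumptions chosen only to illustrate diminishing predictability over time.

```python
import math
import random


def simulate_exposure(n_people=1000, n_periods=12, seed=0):
    """Simulate per-period treatment exposure for each person.

    All coefficients and error scales are illustrative assumptions,
    not the values used in the article.
    """
    rng = random.Random(seed)
    coefs = [0.5, -0.4, 0.3, -0.2, 0.1]  # assumed effects of five covariates
    people = []
    for _ in range(n_people):
        x = [rng.gauss(0.0, 1.0) for _ in coefs]
        signal = sum(b * v for b, v in zip(coefs, x))
        exposure = []
        for t in range(1, n_periods + 1):
            # Time-specific error whose spread grows with t, so the
            # covariates predict exposure less well in later periods.
            noise = rng.gauss(0.0, 0.25 * t)
            p = 1.0 / (1.0 + math.exp(-(-2.0 + signal + noise)))
            exposure.append(1 if rng.random() < p else 0)
        people.append({
            "x": x,
            "exposure": exposure,                # person-period outcome
            "ever_treated": int(any(exposure)),  # aggregated outcome
        })
    return people


people = simulate_exposure(n_people=1000)
share_ever = sum(p["ever_treated"] for p in people) / len(people)
```

The `exposure` list is what a longitudinal model would predict period by period, while `ever_treated` is the aggregated indicator a time-independent logistic model would predict.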
The empirical demonstration used administrative records from the Ohio Department of Rehabilitation and Correction, focusing on 63,899 individuals incarcerated between 2008 and 2012. Treatment was defined as participation in any reentry-approved or non-reentry program during any of the 12 three-month intervals within the first three years of incarceration. Covariates included demographic characteristics, cognitive ability categories, prior incarceration indicators, security level, risk scores, marital status, and sex offense status. The authors first estimated a cross-sectional logistic regression model predicting aggregated treatment exposure and then estimated a longitudinal random-intercept and random-slope model predicting treatment in each three-month block. After generating the two sets of propensity scores, they matched individuals and evaluated balance and treatment effects.
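The matching and balance steps can be sketched as follows. This is a hedged illustration rather than the authors' procedure: the brief reports a caliper of .05 for the simulations, and applying the same caliper here is an assumption, as are greedy 1:1 matching without replacement and the standardized mean difference as the balance diagnostic. The function names and case ids are hypothetical.

```python
import statistics


def match_nearest_neighbor(treated, controls, caliper=0.05):
    """Greedy 1:1 nearest-neighbor matching without replacement.

    `treated` and `controls` map a case id to its propensity score.
    Returns a list of (treated_id, control_id) pairs.
    """
    available = dict(controls)
    pairs = []
    for t_id, t_score in treated.items():
        if not available:
            break
        # Closest remaining control on the propensity score.
        c_id = min(available, key=lambda c: abs(available[c] - t_score))
        if abs(available[c_id] - t_score) <= caliper:
            pairs.append((t_id, c_id))
            del available[c_id]  # each control is used at most once
    return pairs


def standardized_mean_difference(treated_values, control_values):
    """Difference in means divided by the pooled standard deviation."""
    pooled_var = (statistics.variance(treated_values)
                  + statistics.variance(control_values)) / 2
    diff = statistics.mean(treated_values) - statistics.mean(control_values)
    return diff / pooled_var ** 0.5


treated = {"t1": 0.30, "t2": 0.62, "t3": 0.90}
controls = {"c1": 0.28, "c2": 0.60, "c3": 0.33, "c4": 0.70}
pairs = match_nearest_neighbor(treated, controls)
```

In this toy input, `t3` stays unmatched because no remaining control falls within the caliper. Running the same routine once on the cross-sectional scores and once on the longitudinal scores, then comparing covariate balance in each matched sample, mirrors the comparison described above.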
5. Conclusion
The article provides compelling evidence that longitudinal clustering meaningfully affects propensity score estimation and matching when treatment exposure occurs over multiple time periods. LPSM advances the methodological toolkit available to criminologists and policy analysts by offering a practical and robust approach to modeling time-varying treatment processes. The evaluation suggests that LPSM could outperform PSM when cases are exposed to a program, policy, or practice at distinct times over the course of an extended enrollment period, providing an enhanced ability to understand the effects of a treatment. PSM is widely used in criminal justice research, and programs with staggered or prolonged exposure windows are common. Adopting LPSM therefore provides an additional analytical tool to strengthen inferences about the effectiveness of a program, policy, or practice and, in turn, to improve the rigor of the evidence base supporting criminal legal system agencies.
Disclosure: This research brief was prepared by ChatGPT and reviewed/edited by Ian A. Silver.