Last year, John Nay (Vanderbilt University, School of Engineering) published the most comprehensive analysis of law-making forecasting to date. In “Predicting and Understanding Law-Making with Machine Learning,” he writes:
We compared five models across three performance measures and two data conditions on 68,863 bills over 14 years. We created a model with consistently high predictive performance that effectively integrates heterogeneous data. A model using only bill text outperforms a model using only bill context for newest data, while context-only outperforms text-only for oldest data. In all conditions text consistently adds predictive power after controlling for non-textual variables.
In addition to accurate predictions, we are able to improve our understanding of bill content by using a text model designed to explore differences across chamber and enactment status for important topics. Our textual analysis serves as an exploratory tool for investigating subtle distinctions across categories that were previously impossible to investigate at this scale. The same analysis can be applied to any words in the large legislative vocabulary. The global sensitivity analysis of the full model provides insights into the factors affecting predicted probabilities of enactment. For instance, when predicting bills as they are first introduced, the text of the bill and the proportion of the chamber in the bill sponsor’s party have similarly strong positive effects. The full text of the bill is by far the most important predictor when using the most up-to-date data. The oldest data model relies more on title predictions than the newest data model, which makes sense given that titles rarely change after bill introduction. Comparing effects across time conditions and across models not including text suggests that controlling for accurate estimates of the text probability is important for estimating the effects of non-textual variables.
Although the effect estimates are not causal and estimates on predictors correlated with each other may be biased, they represent our best estimates of predictive relationships within a model with the strongest predictive performance and are thus useful for understanding law-making. This methodology can be applied to analyze any predictive model by treating it as a “black-box” data-generating process, therefore predictive power of a model can be optimized and subsequent analysis can uncover interpretable relationships between predictors and output. Our work provides guidance on effectively combining text and context for prediction and analysis of complex systems with highly imbalanced outcomes that are related to textual data. Our system for determining the probability of enactment across the thousands of bills currently under consideration (predictgov.com/projects/congress) focuses effort on legislation that is likely to matter, allowing the public to identify policy signal amid political and procedural noise.
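The quoted passage describes analysing a fitted model as a black box: you change its inputs and observe how the predicted probability of enactment moves. The sketch below shows one simple, model-agnostic way to do this. It sweeps one feature at a time and averages predictions over the observed rows, an approach often called partial dependence. It illustrates the general idea under stated assumptions and is not Nay's actual procedure. Several pieces are hypothetical:

- the feature names `text_prob` and `sponsor_party_share`,
- the logistic `toy_enactment_model` and its coefficients,
- the randomly generated bills.

```python
import math
import random
from statistics import mean
from typing import Callable, Dict, List, Sequence

# A bill is represented as a mapping from (hypothetical) feature names to values.
Row = Dict[str, float]


def black_box_effects(
    predict: Callable[[Row], float],
    data: Sequence[Row],
    feature: str,
    grid: Sequence[float],
) -> List[float]:
    """For each grid value, set `feature` to that value in a copy of every row,
    leave the other features at their observed values, and average the model's
    predictions. The model is only called, never inspected, so any classifier
    wrapped as a row -> probability function fits here."""
    curve = []
    for value in grid:
        preds = [predict({**row, feature: value}) for row in data]
        curve.append(mean(preds))
    return curve


def effect_range(curve: Sequence[float]) -> float:
    """Crude importance score: spread of the average prediction across the grid."""
    return max(curve) - min(curve)


def toy_enactment_model(row: Row) -> float:
    """Hypothetical logistic model standing in for a fitted enactment classifier.
    `text_prob` plays the role of a separate text model's output probability."""
    z = -4.0 + 3.0 * row["text_prob"] + 2.5 * row["sponsor_party_share"]
    return 1.0 / (1.0 + math.exp(-z))


if __name__ == "__main__":
    rng = random.Random(0)
    # Synthetic bills, generated only to exercise the code (not real data).
    bills = [
        {"text_prob": rng.random(), "sponsor_party_share": rng.uniform(0.4, 0.6)}
        for _ in range(200)
    ]
    grid = [i / 10 for i in range(11)]
    for name in ("text_prob", "sponsor_party_share"):
        curve = black_box_effects(toy_enactment_model, bills, name, grid)
        print(name, round(effect_range(curve), 3))
```

`black_box_effects` only ever calls `predict`, so you could swap in a gradient-boosted model or a neural network without changing it. The `effect_range` summary is a deliberately crude ranking. It ignores interactions, and it inherits the bias the passage warns about when predictors are correlated with each other.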