Announcing a major update to merTools

merTools is an R package that is designed to make working with multilevel models from lme4, particularly large models with many random effects, fast and easy. With merTools you can generate prediction intervals that incorporate various components of uncertainty (fixed effect, random effect, and model uncertainty), you can get the expected rank of individual random effect levels (a combination of magnitude and precision of the estimate) and you can explore the substantive effect of variables in the model using a Shiny application interactively!

Recently, we've updated the package to significantly improve performance and accuracy. You can get it on CRAN now. 

Below are some updates from the NEWS.md. To learn more check out the package development on GitHub. You can also read previous a previous blog entry discussing the package and its uses.

merTools 0.3.0

  • Improve handling of formulas. If the original merMod has functions specified in the formula, the draw and wiggle functions will check for this and attempt to respect these variable transformations. Where this is not possible a warning will be issued. Most common transformations are respected as long as the the original variable is passed untransformed to the model.
  • Change the calculations of the residual variance. Previously residual variance was used to inflate both the variance around the fixed parameters and around the predicted values themselves. This was incorrect and resulted in overly conservative estimates. Now the residual variance is appropriately only used around the final predictions
  • New option for predictInterval that allows the user to return the full interval, the fixed component, the random component, or the fixed and each random component separately for each observation
  • Fixed a bug with slope+intercept random terms that caused a miscalculation of the random component
  • Add comparison to rstanarm to the Vignette
  • Make expectedRank output more tidy like and allow function to calculate expected rank for all terms at once
    • Note, this breaks the API by changing the names of the columns in the output of this function
  • Remove tests that test for timing to avoid issues with R-devel JIT compiler
  • Remove plyr and replace with dplyr
  • Fix issue #62 varList will now throw an error if == is used instead of =
  • Fix issue #54 predictInterval did not included random effects in calculations when newdata had more than 1000 rows and/or user specified parallel=TRUE. Note: fix was to disable the .paropts option for predictInterval ... user can still specify for temporary backward compatibility but this should be either removed or fixed in the permanent solution.
  • Fix issue #53 about problems with predictInterval when only specific levels of a grouping factor are in newdata with the colon specification of interactions
  • Fix issue #52 ICC wrong calculations ... we just needed to square the standard deviations that we pulled

On Early Warning Systems in Education

Recently the NPR program Marketplace did a story about the rise of the use of dropout early warning systems in public schools that you can read or listen to online. I was lucky enough to be interviewed for the piece because of the role I have played in creating the Wisconsin Dropout Early Warning System. Marketplace did a great job explaining the nuances of how these systems fit into the ways schools and districts work. I wanted to use this as an opportunity just write a few thoughts about early warning systems based on my work in this area. 

Not discussed in the story was the more wonky but important question of how these predictions are obtained. While much academic research discusses the merits of various models in terms of their ability to correctly identify students, there is not as much work done discussing the choice of which system to use in application. By its nature, the problem of identifying dropouts early presents a fundamental trade-off between simplicity and accuracy. When deploying an EWS to educators in the field, then, analysts should focus on not how accurate a model is, but if it is accurate enough to be useful and actionable. Unfortunately, most of the research literature on early warning systems focuses on the accuracy of a specific model and not the question of sufficient accuracy. 

Part of the reason for this focus is that each model tended to have its own definition of accuracy. A welcome and recent shift in the field to using ROC curves to measure the trade-off between false-positives and false-negatives now allows for these discussions of simple vs. complex to use a common and robust accuracy metric. (Hat tip to Alex Bowers for working to provide these metrics for dozens of published early warning indicators.) For example, a recent report by the Chicago Consortium on School Research (CCSR) demonstrates how simple indicators such as grade 8 GPA and attendance can be used to accurately project whether a student will be on-track in grade 9 or not. Using ROC curves, the CCSR can demonstrate on a common scale how accurate these indicators are relative to other more complex indicators and make a compelling case that in Chicago Public Schools these indicators are sufficiently accurate to merit use.  

However, in many cases these simple approaches will not be sufficiently accurate to merit use in decision making in schools. Many middle school indicators in the published literature have true dropout identification rates that are quite low, and false-positive rates that are quite high (Bowers, Sprott and Taff 2013). Furthermore, local conditions may mean that a linkage between GPA and dropout that holds in Chicago Public Schools is not nearly as predictive in another context. Additionally, though not empirically testable in most cases, many EWS indicator systems simply serve to provide a numeric account of information that is apparent to schools in other ways -- that is, the indicators selected identify only "obvious" cases of students at risk of dropping out. In this case the overhead of collecting data and conducting identification using the model does not generate a payoff of new actionable information with which to intervene. 

More complex models have begun to see use perhaps in part to respond to the challenge of providing value added beyond simple checklist indicators. Unlike checklist or indicator systems, machine learning approaches determine the risk factors empirically from historical data. Instead of asserting that an attendance rate above 95% is necessary to be on-track to graduate, a machine learning algorithm identifies the attendance rate cutoff that that best predicts successful graduation. Better still, the algorithm can do this while jointly considering several other factors simultaneously. This approach is the approach I have previously written about taking in Wisconsin, and has also been developed in Montgomery County Public Schools by Data Science for Social Good fellows

In fact, the machine learning model is much more flexible than a checklist approach. Once you have moved away from the desire to provide simple indicators that can be applied by users on the fly, and are willing to deliver analytics much like another piece of data, the sky is the limit. Perhaps the biggest advantage to users is that machine learning approaches allow analysts to help schools understand the degree of student risk. Instead of providing a simple yes or no indicator, these approaches can assign probabilities to student completion, allowing the school to use this information to decide on the appropriate level of response. 

This concept of degree is important because not all dropouts are simply the lowest performing students in their respective classes. While low performing students do represent a majority of dropouts in many schools, these students are often already identified and being served because of their low-performance. A true early warning system, then, should seek to identify both students who are already identified by schools and those students who are likely non-completers, but who may not already be receiving intervention services. To live up to their name, early warning systems should identify students earlier than after they have started showing acute signs of low performance or disengagement in school. This is where the most value can be delivered to schools. 

Despite the improvements possible with a machine learning approach, a lot of work remains to be done. One issue that was raised in the piece in the Marketplace story is understanding how schools put this information to work. An EWS alone will not improve outcomes for students -- it only enables schools more time to make changes. There has not been much research on how schools use information like an early warning system to make decisions about students. There needs to be more work done to understand how schools as organizations respond to analytics like early warning indicators. What are their misconceptions? How do they work together? What are the barriers to trusting these more complex calculations and the data that underlie them?

The drawback of the machine learning approach, as the authors of the CCSR report note, is that the results are not intuitive to school staff and this makes the resulting intervention strategy seem less clear. This trade-off strikes at the heart of the changing ways in which data analysis is assisting humans in making decisions. The lack of transparency in the approach must be balanced by an effort on the part of the analysts providing the prediction to communicate the results. Communication can make the results easier to interpret, can build trust in the underlying data, and build capacity within organizations to create the feedback loops necessary to sustain the system. Analysts must actively seek out feedback on the performance of the model, learn where users are struggling to understand it, and where users are finding it clash with their own observations. This is a critical piece in ensuring that the trade-off in complexity does not undermine the usefulness of the entire system. 

EWS work represents just the beginning for meaningful analytics to replace the deluge of data in K-12 schools. Schools don't need more data, they need actionable information that reduces the time not spent on instruction and student services. Analysts don't need more student data, they need meaningful feedback loops with educators who are tasked with interpreting these analyses and applying the interventions to drive real change. As more work is done to integrate machine learning and eased data collection into the school system, much more work must be done to understand the interface between school organizations, individual educators, and analytics. Analysts and educators must work together to continually refine what information schools and teachers need to be successful and how best to deliver that information in an easy to use fashion at the right time. 

Further Reading

Read about the machine learning approach applied in Montgomery County Public Schools

Learn about the ROC metric and how various early warning indicators have performed relative to one another in this paper by Bowers, Sprott, and Taff. 

Learn about the Wisconsin DEWS machine learning system and how it was developed

Read the comparison of many early warning indicators and their performance within Chicago Public Schools. 

Of Needles and Haystacks: Building an Accurate Statewide Dropout Early Warning System in Wisconsin

For the past two years I have been working on the Wisconsin Dropout Early Warning System, a predictive model of on time high school graduation for students in grades 6-9 in Wisconsin. The goal of this project is to help schools and educators have an early indication of the likely graduation of each of their students, early enough to allow time for individualized intervention. The result is that nearly 225,000 students receive an individualized prediction at the start and end of the school year. The workflow for the system is mapped out in the diagram below:

The system is moving into its second year of use this fall and I recently completed a research paper describing the predictive analytic approach taken within DEWS. The research paper is intended to serve as a description and guide of the decisions made in developing an automated prediction system using administrative data. The paper covers both the data preparation and model building process as well as a review of the results. A preview is shown below which demonstrates how the EWS models trained in Wisconsin compare to the accuracy reported in the research literature - represented by the points on the graph. The accuracy is measured using the ROC curve. The article is now available via figshare.

The colored lines represent different types of ensembled statistical models and their accuracy across various thresholds of their predicted probabilities. The points represent the accuracy of comparable models in the research literature using reported accuracy from a paper by Alex Bowers

Bowers, A.J., Sprott, R.*, Taff, S.A.* (2013) Do we Know Who Will Drop Out? A Review of the Predictors of Dropping out of High School: Precision, Sensitivity and Specificity. The 
High School Journal, 96(2), 77-100. doi:10.1353/hsj.2013.0000. This article serves as good background and grounds the benchmarking of the models built in Wisconsin and for others when benchmarking their own models. 

Article Abstract:

The state of Wisconsin has one of the highest four year graduation rates in the nation, but deep disparities among student subgroups remain. To address this the state has created the Wisconsin Dropout Early Warning System (DEWS), a predictive model of student dropout risk for students in grades six through nine. The Wisconsin DEWS is in use statewide and currently provides predictions on the likelihood of graduation for over 225,000 students. DEWS represents a novel statistical learning based approach to the challenge of assessing the risk of non-graduation for students and provides highly accurate predictions for students in the middle grades without expanding beyond mandated administrative data collections.

Similar dropout early warning systems are in place in many jurisdictions across the country. Prior research has shown that in many cases the indicators used by such systems do a poor job of balancing the trade off between correct classification of likely dropouts and false-alarm (Bowers et al., 2013). Building on this work, DEWS uses the receiver-operating characteristic (ROC) metric to identify the best possible set of statistical models for making predictions about individual students. 

This paper describes the DEWS approach and the software behind it, which leverages the open source statistical language R (R Core Team, 2013). As a result DEWS is a flexible series of software modules that can adapt to new data, new algorithms, and new outcome variables to not only predict dropout, but also impute key predictors as well. The design and implementation of each of these modules is described in detail as well as the open-source R package, EWStools, that serves as the core of DEWS (Knowles, 2014). 

Code:

The code that powers the EWS is an open source R extension of the caret package which is available on GitHub: EWStools on GitHub