Since 2016, the world has witnessed a resurgence in measles cases and deaths as more people choose not to vaccinate their children, a decision often influenced by misinformation spread online through social media platforms such as Facebook and YouTube. By identifying the families at greatest risk of skipping vaccination, computer models could help health officials and physicians reach parents while they are still undecided about vaccines.
“The reason why this could be useful is that, while it’s very hard to persuade someone once they’ve made up their mind, it might be easier if we know early enough and approach them in a friendly manner explaining why it’s important that their children be vaccinated,” says Tin Oreskovic, a data scientist at IBM’s Chief Analytics Office.
Families who choose not to get the MMR (measles, mumps, rubella) vaccine may expose their neighbors and communities to the risk of serious illness and death. In 2017, measles caused 110,000 deaths worldwide, most of them among children under the age of five, according to the World Health Organization (WHO). Before the measles vaccine became available in 1963, measles epidemics regularly swept the globe, killing approximately 2.6 million people each year.
It’s important to ensure that at least 95 percent of the population gains immunity through two vaccine doses (or, sometimes, through prior exposure to the virus). That 95 percent “herd immunity” threshold limits the spread of measles outbreaks and helps protect infants who are too young to be vaccinated, as well as people who cannot be immunized because of other diseases or conditions. But many countries have seen second-dose vaccination rates fall below the threshold, including 34 of the 53 countries in the WHO’s European region in 2017.
To help boost vaccination rates, Oreskovic initiated and coordinated a University of Chicago Data Science for Social Good project aimed at predicting the likelihood that Croatian children would be vaccinated by the end of their first-grade school year. Working with the Croatian Institute of Public Health, researchers from France, Portugal, and the United States trained machine learning algorithms on the electronic health records of 48,000 children who entered the first grade between 2011 and 2018.
After comparing the results of four machine learning models, the researchers settled on a LASSO logistic regression model that identified vaccine-hesitant families with 72 percent precision. The model pruned the large number of candidate data features down to the 25 most important ones, which made it more likely that its predictive power would hold up for groups of children beyond those in the training data. (Features that raised a child’s risk score included sitting, walking, and speaking at a later age than peers.)
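The article does not describe the team’s code, so the following is only a rough illustration of how an L1 (LASSO) penalty prunes features in a logistic regression. It fits the model by proximal gradient descent, where a soft-thresholding step sets small coefficients to exactly zero, and then computes precision: of the cases the model flags, the share that are true positives. The solver, the synthetic data, and all function names (`fit_lasso_logistic`, `make_demo_data`, and so on) are assumptions for this sketch, not the project’s pipeline; a real system would more likely rely on a library solver.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def soft_threshold(w, t):
    # Proximal operator of t*|w|: shrinks w toward zero and returns exactly
    # zero when |w| <= t. This is what lets LASSO drop features entirely.
    if w > t:
        return w - t
    if w < -t:
        return w + t
    return 0.0


def fit_lasso_logistic(X, y, lam, lr=0.1, iters=500):
    """L1-penalized logistic regression fit by proximal gradient descent (ISTA).
    Minimizes mean log-loss + lam * sum(|w_j|); the intercept is not penalized."""
    n, d = len(X), len(X[0])
    w = [0.0] * d
    b = 0.0
    for _ in range(iters):
        grad_w = [0.0] * d
        grad_b = 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            grad_b += err
            for j in range(d):
                grad_w[j] += err * xi[j]
        b -= lr * grad_b / n
        # Gradient step on the log-loss, then soft-threshold for the L1 term.
        w = [soft_threshold(w[j] - lr * grad_w[j] / n, lr * lam) for j in range(d)]
    return w, b


def precision(y_true, y_pred):
    # Share of predicted positives (flagged cases) that are actual positives.
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return tp / (tp + fp) if tp + fp else 0.0


def make_demo_data(n=200, seed=0):
    # Hypothetical synthetic data: only features 0 and 1 drive the label;
    # features 2-5 are pure noise that the L1 penalty should tend to prune.
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        xi = [rng.gauss(0.0, 1.0) for _ in range(6)]
        latent = 2.0 * xi[0] - 1.5 * xi[1] + rng.gauss(0.0, 1.0)
        X.append(xi)
        y.append(1 if latent > 0 else 0)
    return X, y


if __name__ == "__main__":
    X, y = make_demo_data()
    w, b = fit_lasso_logistic(X, y, lam=0.1)
    kept = [j for j, wj in enumerate(w) if wj != 0.0]
    preds = [1 if sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) >= 0.5 else 0
             for xi in X]
    print("nonzero features:", kept)
    print("precision:", round(precision(y, preds), 3))
```

Raising `lam` forces more coefficients to exactly zero. On this synthetic data the noise features usually end up at zero while the two informative ones survive, a small-scale version of pruning a large feature set down to the few that matter.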
Just as importantly, the team chose the LASSO model because it presented child risk scores in a way that humans could understand. Many machine learning models offer no such interpretability, but in this case it allowed both data scientists and health officials to understand and trust the model’s reasons for singling out certain families as being at higher risk of hesitating to vaccinate.
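One reason a sparse linear model is easy to read can be shown with a minimal sketch, assuming a fitted LASSO logistic model with a handful of nonzero weights: a child’s score is the logistic function of an intercept plus a sum of per-feature terms, so each term can be shown to an official as that feature’s contribution. The function `explain_risk`, the feature names, the weights, and the intercept are all hypothetical, loosely echoing the developmental-milestone features mentioned earlier, and are not values from the Croatian model.

```python
import math


def explain_risk(weights, intercept, child):
    """Score one child with a linear logistic model and list each feature's
    contribution (weight * value) to the log-odds, largest magnitude first.
    Features with zero weight (pruned by the L1 penalty) are left out."""
    contributions = {
        name: w * child.get(name, 0.0)
        for name, w in weights.items()
        if w != 0.0
    }
    logit = intercept + sum(contributions.values())
    score = 1.0 / (1.0 + math.exp(-logit))
    ranked = sorted(contributions.items(), key=lambda kv: abs(kv[1]), reverse=True)
    return score, ranked


# Hypothetical weights and inputs, for illustration only.
weights = {
    "walked_late": 0.8,
    "spoke_late": 0.5,
    "sat_late": 0.3,
    "protective_factor": -0.3,  # hypothetical feature that lowers risk
    "pruned_feature": 0.0,      # zeroed out by the L1 penalty
}
child = {"walked_late": 1, "spoke_late": 1, "sat_late": 0, "protective_factor": 2}

score, ranked = explain_risk(weights, intercept=-1.0, child=child)
print(f"risk score: {score:.3f}")
for name, contribution in ranked:
    print(f"  {name:>18}: {contribution:+.2f}")
```

Because every nonzero weight maps to a named feature, a reader can see which factors pushed a particular score up or down, something that is much harder to extract from models such as deep neural networks.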
The project also created a Web dashboard, the “Early Warning and Monitoring System,” that presents vaccination rates and child risk scores to public health officials and physicians at the national, county, and local health clinic levels. The follow-up project under consideration would likely involve a randomized controlled trial to test whether the child risk scores help officials and physicians intervene effectively with vaccine-hesitant families and raise vaccination rates. That trial would begin no sooner than the 2020–2021 school year.
Some important issues must be resolved before this type of predictive population analysis can be widely deployed. Any project that applies machine learning or related artificial intelligence techniques to personal health data has to take privacy and security concerns into account. In this case, Oreskovic’s team and Croatian public health officials took special precautions to ensure that the children’s electronic health records were always anonymized. The researchers accessed the records through the Croatian Institute of Public Health’s online server and never downloaded any of the anonymized data.
Another issue for data scientists to keep in mind, says Oreskovic, is whether machine learning and AI might surface data features that lead policymakers to act in biased ways toward certain communities. He cautioned against deploying models that highlight features such as ethnicity or religion, factors that were excluded from the data in the Croatian study.
The United States has seen recent measles outbreaks, caused mostly by unvaccinated U.S. residents who returned from international travel and spread the disease in communities with lower vaccination rates. Affected communities included some ultra-Orthodox Jewish communities in New York State’s Rockland County and in New York City. Most prominent ultra-Orthodox rabbis urge their congregations to get vaccinated, but the anti-vaccine movement has nonetheless gained some traction. Similar skepticism toward vaccines has spread among both liberal and conservative communities across the United States.
If a computer model did flag certain religious affiliations, officials might act on that information in ways that stigmatize entire religious groups. “The question is: Does the extra attention hurt the community, either through prejudice or too much policy intervention?” Oreskovic says.
This story was updated on 8 May 2019.