March 2020 – Crowdsourced Health

Internet advertising systems know a lot about us. If you want to know how much, head to Google’s web page on ad personalization (https://adssettings.google.com/u/0/authenticated). On a recent visit I found out that Google knows of my upcoming travel plans to Italy and Texas, a few of my hobbies, and several academic topics that I’m learning more about. Unsurprisingly, it wasn’t correct on everything (I’m not into extreme sports), but it knew a lot more than I would have imagined before I visited their web page.

Over the past few years, our research group and others have shown that interactions with search engines can be used to screen people for a variety of serious medical conditions, both mental and physical. These include depression, eating disorders, Parkinson’s disease, several types of solid tumor cancer, and diabetes. However, informing people about these inferences is challenging both technically and ethically.

This week we published a paper (https://dl.acm.org/doi/10.1145/3373720) showing how to leverage the information that Internet advertisers have about us to screen for three types of cancer. Our results suggest that it’s indeed possible to screen people for their likelihood of suffering from cancer before they are diagnosed by a doctor.

Here’s how it works: when an advertiser uses Bing or Google to advertise, they select keywords such that when a user searches for these keywords their ads are shown. A more sophisticated form of advertising happens when, in addition, advertisers tell Google or Bing whenever a user who saw the ads buys the product they were trying to sell. When advertisers do this, the advertising systems learn to predict who, among all the people who use the keywords, is likely to buy a product (technically this is known as conversion optimization). This learning is based on the information that advertising systems have about users, including their interests, locations and demographics.

What we did was to leverage this capability and use the advertising system to screen people for cancer. We achieved this by showing an ad to people who searched for information on self-diagnosis of lung, breast or colon cancers. The ad suggested help in understanding the severity of the symptoms that people were experiencing. People who chose to click the ads were directed to a website where, after explaining the experimental nature of the system and asking for their consent, they were given a clinical questionnaire about their demographics and symptoms. People who answered the questions were given one of two indications: either that they should urgently seek medical attention, because their symptoms were deemed serious, or that it was likely that their symptoms weren’t indicative of cancer but medical advice should be sought, though not urgently.

When the questionnaire indicated that a person was likely suffering from cancer, we informed the advertising system that the person “bought” our “product”. Within 3 weeks, the advertising systems learned to focus on those people who probably have cancer, such that approximately 1 in 10 people who completed the questionnaires were likely suffering from it, up from the baseline rate of under 1%. This rate was similar for all three types of cancer.
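The feedback loop described above can be illustrated with a toy sketch. It is not how Bing or Google actually allocate ads (those systems are proprietary and far more elaborate), and the audience segments, their positive rates, and the `explore` parameter are all hypothetical values chosen for illustration. The sketch only shows the general idea: if exposure is repeatedly shifted toward segments that report more "conversions", the overall share of positive outcomes among the people reached goes up.

```python
def reweight(shares, rates, explore=0.1):
    """One feedback round: shift exposure toward segments that convert more.

    shares: fraction of ad exposure given to each segment (sums to 1)
    rates:  hypothetical fraction of people in each segment who "convert"
    explore: fraction of exposure always spread evenly across segments
    """
    weighted = {seg: shares[seg] * rates[seg] for seg in shares}
    total = sum(weighted.values())
    n = len(shares)
    return {
        seg: (1 - explore) * weighted[seg] / total + explore / n
        for seg in shares
    }


def overall_rate(shares, rates):
    """Expected conversion rate across everyone reached."""
    return sum(shares[seg] * rates[seg] for seg in shares)


# Hypothetical segments and rates, not taken from the study.
rates = {"segment_a": 0.002, "segment_b": 0.01, "segment_c": 0.15}
shares = {seg: 1 / len(rates) for seg in rates}

history = [overall_rate(shares, rates)]
for _ in range(5):
    shares = reweight(shares, rates)
    history.append(overall_rate(shares, rates))

for round_number, rate in enumerate(history):
    print(f"round {round_number}: overall rate {rate:.3f}")
```

In this sketch the system never sees why a segment converts; it only sees the reported outcomes, which mirrors the point that no individual-level search data needs to be shared with the advertiser.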

The use of ads offers a method for interacting with people who might be suffering from as-yet-undiagnosed cancer. By providing ads with an offer of help and empowering people to choose whether or not they wish to receive this help, we overcome many of the ethical challenges associated with unsolicited diagnosis. Using the sophisticated capabilities of advertising systems, and the knowledge they hold about users, allows us to identify people with serious disease without needing access to sensitive individual-level search data.

Interestingly, the people who used the system most came from countries with high Internet use and lower life expectancy. The latter is a known proxy for the quality of the health system.

Many health organizations use internet advertising for awareness campaigns and for campaigns designed to encourage healthier behaviors. Our results lead us to suggest that health systems should leverage the information that advertising systems collect about people in order to improve population level screening programs.

Jeffrey Hammerbacher (at the time at Facebook) once commented that “The best minds of my generation are thinking about how to make people click ads. That sucks.” Let’s make use of the products of those great minds to improve outcomes for people with serious disease.

One of the most common ways to model the spread of an infectious disease in a population is through compartment models, so called because they divide the population into compartments, with each person residing in one compartment. Perhaps the most common variant is the Susceptible-Infected-Recovered (SIR) model, where people are in one of those 3 compartments. A simple set of 3 differential equations describes the movement of people between these compartments. Thus, for example, the number of infected people in the next time step is dependent on the number of currently susceptible individuals, the number of infected people they come into contact with, and the infection rate, minus the number of people who recover in a time step.
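The update rule described above can be sketched as a simple discrete-time simulation. This is a minimal textbook-style illustration, not code from the paper; the parameter names `beta` (infection rate) and `gamma` (recovery rate), the population size, and all numeric values are assumptions chosen for the example.

```python
def simulate_sir(s0, i0, r0, beta, gamma, steps):
    """Simulate a basic SIR model with one update per time step.

    beta:  infection rate per contact between a susceptible and an infected person
    gamma: fraction of infected people who recover in one time step
    Returns a list of (susceptible, infected, recovered) tuples.
    """
    n = s0 + i0 + r0
    s, i, r = float(s0), float(i0), float(r0)
    history = [(s, i, r)]
    for _ in range(steps):
        # New infections depend on susceptibles, infected, and the infection rate.
        new_infections = beta * s * i / n
        # Recoveries remove people from the infected compartment.
        new_recoveries = gamma * i
        s -= new_infections
        i += new_infections - new_recoveries
        r += new_recoveries
        history.append((s, i, r))
    return history


# Illustrative values only.
history = simulate_sir(s0=9990, i0=10, r0=0, beta=0.3, gamma=0.1, steps=200)
peak_step = max(range(len(history)), key=lambda t: history[t][1])
print(f"infections peak at step {peak_step} "
      f"with about {history[peak_step][1]:.0f} people infected")
```

Every person stays in exactly one compartment, so the three counts always add up to the total population.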

However, in most cases people don’t just belong to one compartment, because populations are not homogeneous. For example, it makes sense to divide the population not just into the SIR compartments but also according to the country they live in.

Today we publish an extended SIR model which can model heterogeneous populations, divided, for example, by area of residence, age group, etc. By fitting the model to Google Trends data for two common viruses, we reveal information about the complex spatial structure of disease spread.
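To give a feel for what dividing the population into groups means, here is a generic multi-group SIR sketch. It is not the model from the paper, and it does no fitting to data: the two groups, the transmission matrix `beta` (where `beta[j][k]` is the assumed rate at which infected people in group k infect susceptible people in group j), and the recovery rate `gamma` are all hypothetical.

```python
def step_groups(state, beta, gamma):
    """Advance a multi-group SIR model by one time step.

    state: list of (susceptible, infected, recovered) tuples, one per group
    beta:  matrix of within-group (diagonal) and between-group (off-diagonal) rates
    gamma: fraction of infected people who recover per time step
    """
    sizes = [s + i + r for s, i, r in state]
    new_state = []
    for j, (s, i, r) in enumerate(state):
        # Infection pressure on group j from the infected fraction of every group.
        force = sum(
            beta[j][k] * state[k][1] / sizes[k] for k in range(len(state))
        )
        new_infections = force * s
        new_recoveries = gamma * i
        new_state.append(
            (s - new_infections, i + new_infections - new_recoveries, r + new_recoveries)
        )
    return new_state


# Two hypothetical groups (for example, two states); only the first starts with infections.
state = [(9990.0, 10.0, 0.0), (5000.0, 0.0, 0.0)]
beta = [[0.30, 0.02],
        [0.02, 0.25]]
gamma = 0.1

for _ in range(300):
    state = step_groups(state, beta, gamma)

for group, (s, i, r) in enumerate(state):
    print(f"group {group}: {r:.0f} people recovered after 300 steps")
```

The off-diagonal entries of `beta` are what carry the disease from one group to another; in a setting like the one described here, such between-group rates are the quantities one would estimate from data rather than assume.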

The viruses we tested were Respiratory Syncytial Virus (RSV) and West Nile Virus (WNV). No COVID-19 data here. Sorry.

Although we make no prior assumptions on spatial structure, human movement patterns in the US explain 27%–30% of the estimated inter-state transmission rates. The transmission rates within states are correlated with known demographic indicators, such as population density and average age.

Our model also allows prediction of disease spread in subsequent seasons using the model parameters estimated for previous seasons and as few as 7 weeks of data from the current season.

The work was done mostly by our then intern, Dr. Inbar Seroussi.

The full paper:
https://www.sciencedirect.com/science/article/pii/S0378437120301692?via%3Dihub

This ad may save your life

Multi-season analysis reveals the spatial structure of disease spread