Should we expect a surge of COVID19 babies?

When the COVID19 pandemic began spreading, people started making predictions about what its short-term effects would be. There were predictions of a global recession, more home cooking, and even a rise in the divorce rate. One prediction was highly specific: there would be many “COVID babies.”

I’ve been trying to figure out if that prediction is true using search data. Here’s the trend of searches for pregnancy tests in the USA for the past 5 years, taken from Google Trends. It tells an interesting story.

As you can see, every year at roughly the end of March or early April there’s a spike in searches. I’ve marked them with triangles. There is also a wave of searches around July.

The spikes correspond to the week or two after spring break. You can guess why… The July surge might be related to planned spring babies or perhaps it’s summer love?

But what happened this year? Interestingly, there’s a drop in searches around the time the pandemic began. Perhaps people couldn’t go out to buy pregnancy tests, or perhaps they were under so much stress due to the pandemic that they didn’t care about those tests. Still, the spring break spike is there in all its glory. So is the July surge. In fact, it’s probably larger than in most years, seemingly compensating for the March dip.

Therefore, the bottom line is, there’s no abnormal spike in searches for pregnancy tests in the USA since the pandemic began. Does that mean there won’t be a surge of babies in another few months? I don’t know, but my guess is, probably not.

Black Lives Matter – Is it a party-political issue?

Black Lives Matter protests have been dominating my Twitter feed for the past several days. Google Trends data shows a similar trend:

(Interest in “Black Lives Matter” in the US over the past 30 days)

But is it the same experience across the US?

It turns out that interest is strongly related to political affiliation. Here is a scatter plot of Google Trends interest in “Black Lives Matter” at the US state level, compared to the percentage of the vote for Clinton in the 2016 presidential elections.

(50 states, excluding Washington DC)

The fit is pretty remarkable (55%), with states that have more Democrat votes showing more interest in the topic.
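For readers who want to try this kind of comparison themselves, here is a minimal sketch of how the quality of a straight-line fit can be measured. It assumes the 55% figure is an R2 value (the addendum below quotes R2 for the 2016 data, but that reading is my assumption). The function name and the numbers are made up for illustration; they are not the data behind the plot.

```python
def r_squared(x, y):
    """R2 (coefficient of determination) of a least-squares line fit of y on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Closed-form least-squares slope and intercept.
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # R2 = 1 - (unexplained variation) / (total variation).
    ss_res = sum((yi - (slope * xi + intercept)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


# Hypothetical values: percentage of the vote for Clinton and Trends interest.
clinton_share = [30.0, 35.0, 40.0, 45.0, 50.0, 55.0, 60.0]
trends_interest = [42.0, 55.0, 48.0, 66.0, 60.0, 81.0, 74.0]
print(round(r_squared(clinton_share, trends_interest), 2))
```

An R2 of 1 means the points lie exactly on a line, and an R2 of 0 means the line explains none of the variation between states.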


States that defy the trend are Utah, Idaho, and Wyoming (more interest in the topic than expected from their voting patterns) and, on the opposing side, Mississippi, South Dakota, and Florida (less interest than expected). Also, anecdotally, the “Related queries” shown by Google Trends in California and Oregon are related to donations to Black Lives Matter, whereas in Mississippi and Florida they are for merchandise.


I also tested interest in the terms “protests” and “looting” across the different states. The former behaves similarly to “Black Lives Matter”. The latter follows a similar trend too, breaking only for highly Democratic states, where there was far less interest in looting than expected from the overall trend.


Political scientists may want to theorize about whether this will change election results, but at least overall the pattern seems to suggest that this is (still?) an issue where interest depends on who you vote for.

Addendum (16 June 2020):

In July 2016 large-scale Black Lives Matter demonstrations were held in 88 cities. Google Trends data from that period (May – July 2016) shows no correlation (R2=0.00) with voting patterns.

A few thoughts about virtual conferences

I attended the first virtual edition of TheWebConference last week. It was planned as an in-person conference, and the organizers decided to make it virtual a few weeks before it began. I have to applaud the organizers, who managed a change that is extremely challenging on many levels.

I attended several sessions and I have to say that my experience was not wholly positive (through no fault of the organizers), partly due to objective reasons and partly because of things that we might learn to do differently.

Objective problems: A virtual conference offers the possibility for many more people to attend. On the other hand, it also means that there are significant time zone problems. Owing to the location of Taiwan, I’m guessing that people in time zones from India to Australia could comfortably attend the entire conference. People on the west coast of the Americas likely attended the morning sessions and those in Europe the afternoon sessions. I’m not sure what people on the east coast of the Americas did… I don’t see how we can overcome this problem, but time zone differences mean that the conference audience is spread over multiple sessions. All the sessions I attended had fewer participants than I would have expected at a physical conference.

Things we can do differently:

Questions and participation: For some reason, it felt as though people were less comfortable asking questions. This was true even at a virtual poster session which I participated in. Perhaps, just as we have a Session Chair, we should have a secret “session question asker”, who will ask the first questions to help others participate? (and no, the fact that the Chair asks a question didn’t prove to be a good solution)

Socializing: One of the main reasons I go to a conference is to have informal conversations with people. This didn’t happen in the virtual setting, and I don’t know how we can make it work. Perhaps hold virtual lunches?

Disconnect from other work: One advantage of going to a conference somewhere is that I (mostly) disconnect from other work and dedicate those few days to being immersed in the conference. Since I was home, it was much harder to disconnect. I felt as though the conference was the side show to my usual work. This is probably something I can learn to do…

Virtual conferences potentially have advantages over physical conferences, for example, in the fact that they open their (virtual) doors to more people, some of whom would not be able to travel to a physical conference. As more conferences move to a virtual setting, at least for the next few months, we should give more thought to how we maintain the benefits of the physical conference while also realizing the benefits of a virtual one.

This ad may save your life

Internet advertising systems know a lot about us. If you want to know how much, head to Google’s web page on ad personalization (https://adssettings.google.com/u/0/authenticated). On a recent visit I found out that Google knows of my upcoming travel plans to Italy and Texas, a few of my hobbies, and several academic topics that I’m learning more about. Unsurprisingly, it wasn’t correct on everything (I’m not into extreme sports), but it knew a lot more than I would have imagined before I visited their web page.

Over the past few years our research group and others have shown that interactions with search engines can be used to screen people for a variety of serious medical conditions, both mental and physical. These include depression, eating disorders, Parkinson’s disease, several types of solid tumor cancer, and diabetes. However, informing people about these inferences is challenging both technically and ethically.

This week we published a paper (https://dl.acm.org/doi/10.1145/3373720) showing how to leverage the information that Internet advertisers have about us to screen for 3 types of cancer. Our results suggest that it’s indeed possible to screen people for their likelihood of suffering from cancer before they are diagnosed by a doctor.

Here’s how it works: when an advertiser uses Bing or Google to advertise, they select keywords such that when a user searches for these keywords their ads are shown. A more sophisticated form of advertising happens when, in addition, advertisers tell Google or Bing whenever a user who saw the ads buys the product they were trying to sell. When advertisers do this, the advertising systems learn to predict who, among all the people who use the keywords, is likely to buy a product (technically this is known as conversion optimization). This learning is based on the information that advertising systems have about users, including their interests, locations and demographics.

What we did was to leverage this capability and use the advertising system to screen people for cancer. We achieved this by showing an ad to people who searched for information on self-diagnosis of lung, breast or colon cancers. The ad suggested help in understanding the severity of the symptoms that people were experiencing. People who chose to click the ads were directed to a website where, after explaining the experimental nature of the system and asking for their consent, they were given a clinical questionnaire about their demographics and symptoms. People who answered the questions were given one of two indications: either that they should urgently seek medical attention, because their symptoms were deemed serious, or that it was likely that their symptoms weren’t indicative of cancer but medical advice should be sought, though not urgently.

When the questionnaire indicated that a person was likely suffering from cancer, we informed the advertising system that the person “bought” our “product”. Within 3 weeks, the advertising systems learned to focus on those people who probably have cancer, such that approximately 1 in 10 people who completed the questionnaires were likely suffering from it, up from the baseline rate of under 1%. This rate was similar for all three types of cancer.

The use of ads offers a method for interacting with people who might be suffering from as-of-yet undiagnosed cancer. By providing ads with an offer of help and empowering people to select whether or not they wished to receive this help, we overcome many of the ethical challenges associated with unsolicited diagnosis. Using the sophisticated capabilities of advertising systems, and the knowledge they have about users, allows us to identify people with serious disease without needing access to sensitive individual-level search data.

Interestingly, the people who used the system most came from countries with high Internet use and lower life expectancy. The latter is a known proxy for the quality of the health system.

Many health organizations use internet advertising for awareness campaigns and for campaigns designed to encourage healthier behaviors. Our results lead us to suggest that health systems should leverage the information that advertising systems collect about people in order to improve population level screening programs.

Jeffrey Hammerbacher (at the time at Facebook) once commented that “The best minds of my generation are thinking about how to make people click ads. That sucks.” Let’s make use of the products of those great minds to improve outcomes for people with serious disease.

Multi-season analysis reveals the spatial structure of disease spread

One of the most common ways to model the spread of an infectious disease in a population is through compartment models, so called because they divide the population into compartments, with each person residing in one compartment. Perhaps the most common variant is the Susceptible-Infected-Recovered (SIR) model, where people are in one of those 3 compartments. A simple set of 3 differential equations describes the movement of people between these compartments. Thus, for example, the number of infected people in the next time step is dependent on the number of currently susceptible individuals, the number of infected people they come into contact with, and the infection rate, minus the number of people who recover in a time step.
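To make the description above concrete, here is a minimal sketch of the textbook SIR model, stepped forward in discrete time. The parameter values and the function name are made up for illustration; this is the standard model, not the extended one from our paper.

```python
def simulate_sir(s0, i0, r0, beta, gamma, steps, dt=1.0):
    """Step the basic SIR equations forward with a simple Euler scheme.

    beta  = infection rate (illustrative value, chosen by the caller)
    gamma = recovery rate  (illustrative value, chosen by the caller)
    """
    n = s0 + i0 + r0  # total population, constant throughout
    s, i, r = float(s0), float(i0), float(r0)
    history = [(s, i, r)]
    for _ in range(steps):
        # New infections depend on susceptibles and the infected they meet.
        new_infections = beta * s * i / n * dt
        # Some of the currently infected recover in each time step.
        new_recoveries = gamma * i * dt
        s -= new_infections
        i += new_infections - new_recoveries
        r += new_recoveries
        history.append((s, i, r))
    return history


# Hypothetical outbreak: 10 infected people in a population of 10,000.
history = simulate_sir(s0=9990, i0=10, r0=0, beta=0.3, gamma=0.1, steps=200)
peak_step, peak_state = max(enumerate(history), key=lambda item: item[1][1])
print(f"peak of about {peak_state[1]:.0f} infected at step {peak_step}")
```

The Euler step is the crudest possible way to solve the equations, but it mirrors the "next time step" description in the text.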

However, in most cases people don’t just belong to one compartment, because populations are not homogeneous. For example, it makes sense to divide the population not just into the SIR compartments but also according to the country they live in.

Today we publish an extended SIR model which can model heterogeneous populations, divided, for example, by area of residence, age group, etc. By fitting the model to Google Trends data for two common viruses, we reveal information about the complex spatial structure of disease spread.
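To give a feel for what splitting the population into groups means, here is a rough sketch of a generic multi-group SIR model. It is not the model from the paper: the transmission matrix, its values, and the function name are all assumptions made for illustration.

```python
def simulate_group_sir(s0, i0, beta, gamma, steps, dt=1.0):
    """Euler-stepped SIR over several groups coupled by a transmission matrix.

    beta[a][b] is a hypothetical rate at which infected people in group b
    infect susceptible people in group a. Nobody starts out recovered.
    """
    k = len(s0)
    n = [s0[a] + i0[a] for a in range(k)]  # size of each group, constant
    s = [float(v) for v in s0]
    i = [float(v) for v in i0]
    r = [0.0] * k
    history = [(s[:], i[:], r[:])]
    for _ in range(steps):
        # Each group is exposed to the infected fraction of every group.
        new_inf = [
            dt * s[a] * sum(beta[a][b] * i[b] / n[b] for b in range(k))
            for a in range(k)
        ]
        new_rec = [dt * gamma * i[a] for a in range(k)]
        for a in range(k):
            s[a] -= new_inf[a]
            i[a] += new_inf[a] - new_rec[a]
            r[a] += new_rec[a]
        history.append((s[:], i[:], r[:]))
    return history


# Two hypothetical states; the outbreak starts only in the first one.
beta = [[0.30, 0.02],
        [0.02, 0.25]]
history = simulate_group_sir(s0=[9990, 5000], i0=[10, 0], beta=beta,
                             gamma=0.1, steps=150)
print(f"recovered in the second state: {history[-1][2][1]:.0f}")
```

The off-diagonal entries of the matrix are what carry the disease from one group to another. Set them to zero and the second state never sees a case.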

The viruses we tested were Respiratory Syncytial Virus (RSV) and West Nile Virus (WNV). No COVID-19 data here. Sorry.

Although we make no prior assumptions on spatial structure, human movement patterns in the US explain 27%–30% of the estimated inter-state transmission rates. The transmission rates within states are correlated with known demographic indicators, such as population density and average age.

Our model also allows prediction of disease spread in subsequent seasons using the model parameters estimated for previous seasons and as few as 7 weeks of data from the current season.

The work was done mostly by our then intern, Dr. Inbar Seroussi.

The full paper:
https://www.sciencedirect.com/science/article/pii/S0378437120301692?via%3Dihub