Today, something a little different and not entirely related to health: Antisemitism. It’s not entirely divorced from health either, as the bones of my forefathers, scattered from Spain to Poland, will testify, but thankfully these days the physical aspects of antisemitism are on a somewhat less grandiose scale than in previous generations. I used Google Trends to see which Jewish conspiracies were searched in different countries. Unfortunately, it isn’t always easy to capture an entire topic with a single query, so I couldn’t encompass all the hate around this issue, but here is the volume of queries since 2004 for several antisemitic tropes:
The Protocols of the Elders of Zion
Zionist Occupation Government
Prevalence of queries for common Jewish conspiracy theories
It’s interesting to see that each conspiracy has its own fan base, though some countries (perhaps owing to Internet penetration and population size) are represented in the maps of more than one conspiracy. It’s also notable that there is little correlation between the size of the Jewish population in a country and the volume of antisemitic searches therein: Pakistan, Norway, etc., have tiny Jewish populations (if any) and are not neighbors of Israel, and yet too many people in those countries have a favorite Jewish conspiracy theory.
How prevalent are those “theories”? It’s difficult to say with confidence. Querying for something doesn’t mean that a person believes in it, only that they are interested in the topic. Google Trends data has a few other drawbacks. Nevertheless, anecdotally, Jewish conspiracy theories seem to have volume (worldwide) of the same order of magnitude as common anti-Muslim theories. However, there are around 100 times as many Muslims as there are Jews.
Is there a lesson here? I don’t know. Perhaps it’s just that antisemitism is too common and that it manifests itself in a variety of ways. Perhaps it’s another demonstration that online activity offers a window into all aspects of human behavior, even the less savory ones.
China has, once again, instructed Bing to turn off the autosuggest feature of the search engine. The reason given by China’s State Information Office is, to quote from TheRegister article, that “Bad use of algorithms affects the normal communication order, market order and social order, posing challenges to maintaining ideological security, social fairness and justice and the legitimate rights and interests of netizens.”
I don’t know the details of why the Chinese government asked to remove autosuggest, nor whether and why Bing complied, but it seems to me that there is a lesson here for search engine operators and for people interested in algorithmic fairness.
Search engines are perhaps the most widely used internet service. They’ve replaced libraries for many of the information acquisition tasks we perform. When Google started, its stated mission was “to organize the world’s information and make it universally accessible and useful.” This implies that the results it provides reflect the world’s information. Indeed, many writers (e.g., this one in The Atlantic) wrote about the idea that search engines are automatic and reflect the knowledge available in the world. More recently, Google’s CEO said in testimony to the US Congress that “We use a robust methodology to reflect what is being said about any given topic at any particular time. It is in our interest to make sure we reflect what’s happening out there in the best objective manner possible. I can commit to you and I can assure you, we do it without regards to political ideology. Our algorithms do it with no notion of political sentiment.”
Unfortunately, as anyone in this business knows, a lot of manual work goes into an automated search engine. That manual work is done by people who have opinions, as do their managers. These people’s opinions can affect the results, and there is currently quite a lot of evidence that results don’t reflect the world’s knowledge anymore. Instead, they reflect the world as some people would like it to be.
I could provide many examples that seem to have this bias, but let’s take one of my favorites: Consider the search results for the innocuous query “Renaissance Europe art” below. Before you do, think to yourself what art you think should be shown. Botticelli? The Mona Lisa? The Sistine Chapel?
Now click on the spoiler below to see a screenshot of the Bing results for this query.
Click to see the Bing results
Notice the preference for paintings of particular people?
It seems to me that governments have taken notice of the fact that “algorithmic results” are no longer algorithmic (and in fact, they probably never were entirely algorithmic). If results are human-generated, they say, why shouldn’t we, the representatives of the people, decide what the results should be? Why should workers at internet platforms who may have specific views of the world get to decide that these are the “right” views?
This is a logical argument, though the devil, as they say, is in the details. If a government takes a heavy hand and decides to censor views it doesn’t like, what happens to how people learn about the world? There are parallels between this problem and that of book banning at public libraries (see, for example, this overview), especially now that search engines have replaced libraries.
It is hard to say whether this situation could have been averted, and in any case this Pandora’s box may already be open. But I do wonder whether a little more modesty in changing algorithmic results would have kept us from arriving where we are today.
I work for Microsoft, which operates Bing. The views in this post (as indeed, the entire blog) are my own and not those of my employer. I do not have any inside information on which queries undergo manual editing.
The other day I was talking to a friend of mine, a senior medical doctor at a research hospital. We were discussing clinical trials and how the recent staff shortages in the US made it difficult to start new clinical trials there. He mentioned offhand that clinical trials have been difficult to do in Europe for a few years now because of GDPR.
The European Union’s General Data Protection Regulation, or GDPR, is a regulation on data protection and privacy. It provides people with rights related to their data including, for example, the right to ask companies for data they collect about an individual (Article 15). GDPR is implemented in countries of the European Union, members of the European Economic Area and other countries which chose to implement it. The latter group includes Andorra, Argentina, Canada (only commercial organizations), Faroe Islands, Guernsey, Israel, Isle of Man, Jersey, New Zealand, Switzerland, Uruguay, Japan, the United Kingdom and South Korea.
The flip side of GDPR is that, for both companies and other organizations, it’s much harder to collect and process data. This may be a good idea when we’re thinking about companies which use these data to sell us more stuff, but it may be that these regulations have a less than beneficial effect for medical science. I wanted to see if there’s evidence for the latter.
In recent years medical researchers have begun registering their clinical trials on the US government’s ClinicalTrials.gov website. This can help patients find relevant trials, improve recruitment, and also reduce the likelihood of cheating (see Ben Goldacre’s wonderful talk on this subject). I took these data and extracted from them the country where each clinical trial is held (some clinical trials are held in multiple countries and I accounted for those) and the date at which it was first registered.
The figure below shows the number of clinical trials registered each month between January 2010 and July 2021. I divided the countries where the trials were held into three groups: the United States, countries where GDPR was implemented, and all other countries of the world.
In the graph the light colors are the number of clinical trials per month prior to the implementation of GDPR and the dark colors are the same numbers after it. I’ve also fit linear regression curves to each of these. As one can see, the number of clinical trials up to May 2018, when GDPR was implemented, rises slowly. Interestingly, it rises more slowly in the US than in the other two groups.
After May 2018 the rise in the number of clinical trials in the US and in countries where GDPR was implemented abruptly stops and flattens. However, in countries which did not implement GDPR (and are not the US), the pace of growth rises dramatically and accounts for the expected growth in both this group of countries and most of what we would have expected in the previous two groups. It seems as though GDPR put a brake on clinical trials in countries where it was implemented, as well as in the United States.
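The before-and-after fits described above can be sketched roughly as follows. This is only an illustration: the function names are hypothetical, the monthly counts are made up, and the real analysis used registry data that is not reproduced here.

```python
def ols_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def fit_segments(months, counts, cutoff):
    """Fit one line to the months before `cutoff` and another from `cutoff` on.

    months: month indices (e.g. 0 = January 2010); counts: trials per month.
    Returns ((slope_before, intercept_before), (slope_after, intercept_after)).
    """
    before = [(m, c) for m, c in zip(months, counts) if m < cutoff]
    after = [(m, c) for m, c in zip(months, counts) if m >= cutoff]
    return tuple(ols_line(*zip(*segment)) for segment in (before, after))


# Made-up series: growth of 2 trials per month, then a flat line.
months = list(range(12))
counts = [100 + 2 * m for m in range(6)] + [110] * 6
(before_slope, _), (after_slope, _) = fit_segments(months, counts, cutoff=6)
```

Comparing the two slopes for each group of countries is what separates "rises slowly" from "flattens" or "rises dramatically".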
Which countries benefited from this move away from the US and GDPR-implementing countries? To test this, I computed, for each country, the fraction of its clinical trials in the registry that were registered after the implementation of GDPR. I only looked at countries which had at least 500 clinical trials in the data. The 5 countries which had the largest fraction of trials post-GDPR are Pakistan, Egypt, Turkey, Indonesia, and China. Unfortunately, these countries are not bastions of human rights. According to Freedom House they are judged either “Not free” or “Partly free”.
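A minimal sketch of that per-country computation is below. The input layout and the function name are assumptions, as is the choice to count a multi-country trial once for each of its countries; the post does not say exactly how such trials were handled. The cutoff uses 25 May 2018, the day GDPR took effect.

```python
from collections import Counter
from datetime import date

GDPR_START = date(2018, 5, 25)  # GDPR took effect on this day


def post_gdpr_fraction(trials, min_trials=500, cutoff=GDPR_START):
    """Fraction of each country's trials first registered on or after `cutoff`.

    trials: iterable of (countries, first_registered) pairs, where `countries`
    is a list of country names and `first_registered` is a datetime.date.
    Assumption: a trial held in several countries counts once per country.
    Countries with fewer than `min_trials` trials are left out.
    """
    total = Counter()
    post = Counter()
    for countries, registered in trials:
        for country in set(countries):
            total[country] += 1
            if registered >= cutoff:
                post[country] += 1
    return {c: post[c] / n for c, n in total.items() if n >= min_trials}
```

Sorting the resulting dictionary by value, largest first, would give a ranking like the one above.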
Thus, it seems that one of the negative aspects of GDPR was the movement of clinical trials from countries which implemented it to those which did not. Whether this is a price worth paying is a personal judgment. To me, it seems that GDPR must be changed so that studies which improve people’s lives can continue, even at a minimal cost to data privacy.
The current state of things reminds me of a story, possibly apocryphal, told to me by a lecturer during my graduate studies: A colleague of my lecturer who was a pain researcher from one of the industrialized countries took his sabbatical in Libya. This was, I think, in the late 1980s. My lecturer said that he asked the researcher, “why Libya?”. The reply was “it’s easier to do work there”…
Let’s not have GDPR cause medical research to move to countries which don’t take human rights seriously.
Caveat: I know there may be confounders that appeared at similar times. This isn’t a scientific paper, so take my explanations above with a grain of salt.
One of the more recent ideas was to repurpose an anti-parasitic medication, Ivermectin, to treat COVID19. This drug is licensed for use in both humans and livestock, leading to the derogatory “cow dewormer” moniker. The evidence for effectiveness of this drug came initially from lab studies, but doses were far greater than approved for human use.
Several randomized controlled trials followed, with the most recent meta-analysis finding an interesting outcome: Studies in some countries outside the US found the drug to be effective, while those conducted in the US did not. It may be that in countries where parasitic infections are common, treating these infections helps people defeat COVID19, but the drug doesn’t help those who don’t have such infections.
Nevertheless, some media channels and politicians recommended using the drug, and if you believe recent media stories, many people decided to use ivermectin rather than choose the more effective solution and vaccinate against COVID19. It seems that overdoses of the drug became more common.
However, I wanted to see how many people were really interested in ivermectin, compared to the vaccine.
As usual, I looked at Google Trends data (at the state level) for Ivermectin, Hydroxychloroquine, and the COVID19 vaccine. First, the volume of searches for ivermectin is negatively correlated with interest in the vaccine during 2021. However, there is no such relationship for hydroxychloroquine. In the graphs below the axes are search volumes.
Second, interest in ivermectin is small compared to interest in the COVID19 vaccine, even in the states where it had the highest search volume. Below are figures for the entire USA and for Oklahoma.
I tried to see if the voting results in each state for the 2016 and 2020 presidential elections, and for the current governor, could be predicted from the search volume for the vaccine or for ivermectin. The factor most predictive of the 2016 election results is interest in vaccination. The accuracy of the prediction is very high (Area Under the Receiver Operating Characteristic Curve, or AUC, of 0.91), meaning that more interest in the vaccine correlated with voting for a Democrat in the 2016 elections. Outcomes of the 2020 elections are much harder to predict using interest in the vaccine (AUC=0.64).
Interest in hydroxychloroquine doesn’t predict election results, but search volume for ivermectin, and even better the ratio of search volume for ivermectin to the volume for vaccine predicts the 2016 election results (AUC=0.86). Here higher ratios of ivermectin to vaccine searches predict a vote for Trump.
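For readers unfamiliar with the AUC figures quoted above, here is a small sketch of how such a number can be computed from a single score per state. The function name is hypothetical and the values are made up; they are not the Google Trends data.

```python
def auc(scores, labels):
    """Area under the ROC curve for one score per item.

    Equals the probability that a randomly chosen positive item has a higher
    score than a randomly chosen negative item, with ties counted as half.
    """
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Made-up example: ratio of ivermectin to vaccine search volume in four
# hypothetical states, and whether each state voted for Trump (1) or not (0).
ratios = [0.10, 0.40, 0.35, 0.80]
voted_trump = [0, 0, 1, 1]
example_auc = auc(ratios, voted_trump)  # 3 of the 4 positive/negative pairs are ordered correctly
```

An AUC of 0.5 means the score is no better than a coin flip, and 1.0 means it separates the two groups of states perfectly.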
What do all these findings show?
To me, the most interesting finding is that support for former President Trump is a strong predictor of interest in ivermectin over vaccines. This is somewhat similar to my previous blog post, and to a study, about Israeli politics.
As an aside, it seems to me that the ivermectin story was somewhat overblown by the media. Interest (as measured in search engine data) was actually much lower.
The rate of COVID19 vaccination is strongly correlated with party affiliation. Specifically, the Kaiser Family Foundation found that people in counties that voted for Biden during the last elections had significantly higher rates of COVID19 vaccination compared to those who voted for Trump.
I manually labelled the towns and cities as to whether they were predominantly Jewish or not. I also computed the percent of voters in each location who voted for the current coalition government.
Here are a few results. First, in predominantly Jewish towns and cities vaccination rates are strongly correlated with income, but even more strongly (and statistically significantly more so) with voting for the current government.
In predominantly non-Jewish cities the picture is more complicated. First, the correlation is much lower than the one we observed in Jewish cities. More interestingly, while income is still correlated with vaccination rates, voting for the current government is negatively correlated with vaccination rates.
A linear model of the data (with interactions) bears this out:
The model for Jewish towns reaches R2 of 0.67, which is extremely high. The statistically significant variables are vote for the coalition (positively correlated), Gini index (negatively correlated), and the interaction of income with the Gini index (positive) and with income (negative). Therefore, cities that voted for the government and had less inequality were more likely to vaccinate.
The model for non-Jewish towns reaches a lower R2 of 0.46. Here the statistically significant variables are vote for the coalition (negatively correlated) and the interaction of the Gini index with voting for the government (negatively correlated). This means that the most indicative variable for vaccination rate was not voting for the current government and, in cities that have more inequality and higher income this is even stronger.
My understanding from these results is that, in Israel as in the US, voting is correlated with vaccination rates. I don’t think, however, that one is causal of the other. Instead (at least in Israel) there is probably a third variable driving both. For example, the Arab party which joined the coalition is the Islamic party, whose voters tend to come from populations with lower income and that live in areas with less access to healthcare. In the Jewish population, one of the main blocs not part of the current government is the Ultra Orthodox, who are also less likely to vaccinate. They are also poorer than the general population.
The bottom line? Vaccination rates in Israel are correlated with political affiliation, but perhaps for different reasons than those in the US.
Note: The following is somewhat different from my usual blog posts because it doesn’t involve internet data. It’s my analysis of publicly available health data which I did to answer a question I had.
The current phase of the COVID19 pandemic is affected by several trends which are driving the pandemic in opposing directions. On the one hand, the vaccination rate is high in many developed countries. On the other, new strains such as the Delta strain are more infective and the vaccines are thought to be less effective against these strains (even though they are still highly effective!).
Here is a plot of four indicators (source) of the pandemic: Number of daily positive cases, hospital admissions, ICU admissions and deaths. They are smoothed using a 7-day moving average.
On average, hospital and ICU admissions are best correlated with daily cases when those are taken 7 days later (that is, it takes around a week until a case is hospitalized), and another 7 days until deaths occur.
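The smoothing and the search for the best lag can be sketched as follows. This is a rough illustration with hypothetical function names; it assumes a trailing 7-day window and a plain Pearson correlation, neither of which the post specifies.

```python
from math import sqrt


def moving_average(values, window=7):
    """Trailing moving average; the first window - 1 points are dropped."""
    return [sum(values[i - window + 1:i + 1]) / window
            for i in range(window - 1, len(values))]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def best_lag(cases, outcome, max_lag=21):
    """Lag (in days) at which `cases` correlate best with `outcome` that many days later."""
    def corr_at(lag):
        return pearson(cases[:len(cases) - lag], outcome[lag:])
    return max(range(max_lag + 1), key=corr_at)


smoothed = moving_average([1, 2, 3, 4, 5, 6, 7, 8])  # [4.0, 5.0]
```

Both series passed to `best_lag` need to be daily, equally long, and a good deal longer than `max_lag`.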
Therefore, I used the daily positive cases to predict both hospital admissions and deaths at the appropriate lag (7 and 14 days, respectively). In both cases I used a non-linear model (a second-order polynomial predicting the square root of the dependent variable) trained on data until the end of April 2021. The models had a good fit (R2=0.69 and 0.52, respectively).
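To make the model shape concrete, here is a sketch of a second-order polynomial fitted to a root-transformed outcome. It assumes the transform is a square root and solves ordinary least squares through the normal equations; the function names are hypothetical and this is not necessarily how the original fit was done.

```python
from math import sqrt


def solve(a, b):
    """Solve the linear system a * x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_sqrt_quadratic(cases, outcome):
    """Least-squares fit of sqrt(outcome) = c0 + c1 * cases + c2 * cases ** 2."""
    powers = [[1.0, x, x * x] for x in cases]
    targets = [sqrt(y) for y in outcome]
    xtx = [[sum(p[i] * p[j] for p in powers) for j in range(3)] for i in range(3)]
    xty = [sum(p[i] * t for p, t in zip(powers, targets)) for i in range(3)]
    return solve(xtx, xty)


def predict(coeffs, x):
    """Predicted outcome on the original scale (the fitted root, squared)."""
    c0, c1, c2 = coeffs
    return (c0 + c1 * x + c2 * x * x) ** 2
```

To apply a lag, each day's case count would be paired with the outcome 7 (or 14) days later before fitting, and only data up to the training cutoff would be passed in.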
Here are the actual and predicted hospitalizations, compared to case numbers:
As we can see, hospital admissions have been rising since mid-May, but not as fast as the prediction. We would expect around 170 people to be hospitalized at this point, but there are around 45. That’s around one quarter of the expected number.
A look at deaths is even more telling:
Deaths have risen very slightly: We would have expected almost 40 per day at this stage, but are seeing around 2 (that’s one twentieth of the expected!).
My takeaway from this is that we will see a rise in hospitalizations and in deaths, but it will be much smaller than in previous waves of COVID19, especially in terms of deaths. The vaccines are providing significant protection against the worst aspects of COVID19.
There are reports of a Respiratory Syncytial Virus (RSV) outbreak in Israel. RSV is a virus which causes a flu-like illness and is especially dangerous for children. What’s strange about this outbreak is that it’s happening in early summer, whereas previously RSV outbreaks always happened in winter.
I was wondering if this is something special to RSV and to Israel or perhaps something bigger?
Luckily, a few years ago we looked at the association of search engine query volume and the incidence of RSV and found that it was quite high. Therefore, I extracted Google Trends data (using the Google Trends Anchor Bank toolbox) for RSV from the US, United Kingdom and Canada and plotted it below:
However, starting from April 2021 there is a dramatic rise of RSV in the US and UK, but not in Canada. Thus, Israel is similar to the US and UK, but Canada seems an outlier.
Is there something special about RSV?
Here are the time series for several other seasonal viruses in the US:
Here we see similar correspondence, except for two outliers: First, common cold queries happened in the winter of 2021, but to a lesser extent. Second, RSV is rising, but so is norovirus, which started earlier and may already be on its way down.
Here is another virus, Rabies, compared to RSV. Rabies queries usually spike in summer, and in the summer of 2020 there was no spike. This year, however, they seem to be rising to normal levels. Note that it is unlikely that the search query volume for rabies represents rabies cases, as it does for RSV. Even though there is evidence for seasonality of rabies, in this case it probably reflects worry about rabies due to close contact with mammals.
What’s happening here? Perhaps opening up for social gatherings in Israel, the US, and the UK has enabled RSV and other viruses to spike. We are looking into whether there is supporting evidence for this hypothesis.
These findings raise the interesting question of why RSV (and other viruses) occur in winter? Is it because of the colder weather which causes people to congregate indoors and perhaps constricts our airways? Is it because there is some level of immunity in the population which slowly decays over the year until, early in winter, it is low enough for an epidemic to begin?
COVID19 may allow us to resolve this question.
(Special thanks to Prof. Lev Muchnik for interesting discussions on this topic)
“The President, Vice President and all civil Officers of the United States, shall be removed from Office on Impeachment for, and Conviction of, Treason, Bribery, or other high Crimes and Misdemeanors.” Article II, Section 4, US Constitution
Professor Anat Rafaeli and I teach a course at the Technion which is intended to teach students with a background in psychology the tools of Data Science and specifically how to answer research questions from the social sciences with internet data. As part of the course, students choose a research question and answer it during the semester, using the tools we teach them.
A couple of years ago, while explaining their research question, one of the students (whose name I don’t have anymore, but will be happy to add, if he’s reading this) showed us an intriguing chart: It displayed the search volume (from Google Trends) for the term “impeachment” for the period of a few months before and after President Trump was inaugurated. There were several spikes during that year from people searching for how to impeach the president. I didn’t find this particularly surprising given the news coming from the US.
What was surprising was a similar chart he showed us for the same period around President Obama’s inauguration. It showed similar spikes! I hadn’t heard of anyone who wanted to impeach Obama, so that spike was shocking to me.
Recently I repeated the exercise, this time adding data from a similar time period around President Biden’s inauguration (Technical note: For the first two presidents I used the “impeachment” topic, while for Biden I used the term “impeach Biden”, to exclude searches related to Trump’s impeachment trial). You can see the results in the figure below.
As you can see, each president has people searching for how to impeach them, first around the time that they are elected and then around inauguration. After these two events “impeachment” spikes every so often (as you can see in the Obama and Trump spikes at the end of May). More broadly, here’s Obama’s entire second term:
Who are these people, who are so eager to impeach their president?
We can try to answer this question by looking at the correlation between how each state voted in the recent Presidential elections and the percentage of people searching for impeachment of a president. The graphs below plot the percentage of people in each state who voted for Biden as president (i.e., roughly speaking, are Democrats) compared to the search volume for impeachment.
People who wanted to impeach Obama were mostly from Republican states, as shown by the negative correlation between search volume and the percentage of Democrats in the state. The opposite is true for people who searched for impeachment during Trump’s first months in office. With Biden the correlation is much weaker, but the data is skewed by a single point which, if removed, leaves a reasonable correlation again (R2=0.13): For some reason, the mostly Democratic voters in Vermont are those searching more often for Biden’s impeachment.
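The effect of a single state on the correlation can be checked with a leave-one-out comparison, sketched below. The function names are hypothetical and the numbers are made up; they are not the actual state-level data.

```python
def r_squared(xs, ys):
    """Squared Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy * sxy / (sxx * syy)


def drop(values, index):
    """Copy of `values` without the element at `index`."""
    return values[:index] + values[index + 1:]


def most_influential_point(xs, ys):
    """Index of the point whose removal changes R2 the most."""
    full = r_squared(xs, ys)
    return max(range(len(xs)),
               key=lambda i: abs(r_squared(drop(xs, i), drop(ys, i)) - full))


# Made-up data: five points on a line plus one far-off point.
pct_democrat = [1, 2, 3, 4, 5, 6]
search_volume = [2, 4, 6, 8, 10, -20]
outlier = most_influential_point(pct_democrat, search_volume)
```

Removing a point because it is inconvenient is risky, so it is worth reporting R2 both with and without it, as done above for Vermont.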
What’s the bottom line? If you don’t know anyone who thinks your favorite president should be impeached, you just don’t know the right people.
Our newest paper suggests an intriguing possibility: We may be able to predict a stroke event by observing people’s activity on a search engine.
What does the evidence show?
We started with a group of anonymous Bing users who, at some point, indicated in their queries that they had undergone a stroke. We filtered these users to those who were active pretty much every day, then they were inactive for between one and several days, and indicated their stroke just after that inactivity period. We hypothesize that the inactivity was due to their stroke which happened just after they disappeared. We then tried to separate these users from other users, some who were of similar ages and others who indicated having other medical conditions.
To separate the users we represented them through a variety of attributes such as the time of day of queries, the time since their previous session, etc., but more importantly, attributes which were previously linked to cognitive decline such as the complexity of queries, the deepest link that was clicked, and more.
We found that it was quite easy to separate these populations of users. Of course, it may be that there were other things that were different between these populations even though we took care to select them in the same way that we chose the stroke population. However, we did find that people with cardiovascular diseases were harder to differentiate from the stroke population than people with other conditions.
We also applied our model to data that was collected a year later. Here we didn’t have many people who indicated a stroke, so we used a weaker label, which was the number of times each user was interested in stroke. This is an indicator that was used in the past to find people who are suffering from cancer. The model successfully found those people who are interested in stroke, just by looking at the metadata of their queries, through attributes such as those described above.
Predicting when a stroke will occur
It seems that it’s possible to differentiate populations of users who will undergo stroke from others. Can we also localize the stroke in time? That is, can we predict if a stroke will occur within the next few days?
The results here are not as strong, but they do indicate the possibility of localizing the stroke event. According to our findings, in the 3-4 months prior to the stroke event something begins to change and people’s attributes begin to be more similar to those of people who will undergo a stroke. This could be because of microstrokes or other cardiovascular events.
The summary of our findings is the intriguing possibility that stroke causes cognitive changes some time before a stroke happens, and that these changes can be identified through people’s interactions with search engines. If this is true, the upshot could be dramatic: We may be able to prevent stroke by analyzing people’s queries and, if they indicate a possible event in future, have doctors prescribe simple medications such as aspirin. As our medical partners (Prof. Stern and Dr. Shaklai) said, there’s a lot to do before stroke and not a lot after it.
However, all our data is derived from queries of people who indicated their health conditions. We don’t have their medical records. Therefore, we’re now trying to set up a clinical trial which will collect both query data and medical records from people and validate our hypothesis.
As I write this blog post (May 18th, 2021), more and more people have been vaccinated against COVID-19. In the US, 47% of the population have received the first dose of the vaccine and another 37% are fully vaccinated (https://usafacts.org/visualizations/covid-vaccine-tracker-states/). In Israel around 70% of the population are fully vaccinated.
I thought to try and find out what people were most looking for now that COVID-19 will not be a risk for them. I started with Google’s autocomplete:
As you can see, some people want to know how to deal with the immediate aftermath of the vaccine. They ask about Tylenol and other pain medications, but also how soon they can eat, drink, or smoke. Many people ask about things they could do before COVID-19 but not during the pandemic. These include travel and exercise (presumably at the gym).
A fun exercise is to look at these needs across U.S. states and across different countries of the world. To do this, I queried Google Trends for the volume of queries for each of these needs (e.g., “after covid vaccine can I smoke?”) during the past 3 months and also for the volume of queries beginning “after covid vaccine”. The latter served as a baseline. I calculated the ratio between these two volume indicators for each state. On a technical note, Google only gives a normalized score for each of the volumes, so we can’t treat this as excess searches per se. Also, if the volume of queries is too low Google does not provide a number and these are missing data for us.
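The ratio described above, including the handling of missing values, can be sketched like this. The function name and dictionary layout are assumptions, and the numbers in the usage example are invented rather than taken from Google Trends.

```python
def need_ratios(need_volume, baseline_volume):
    """Per-region ratio of a specific need's volume to the baseline volume.

    Both arguments map region name -> normalized volume, with None where no
    number is available. Regions with a missing value, or a zero baseline,
    are skipped and stay missing in the result.
    """
    ratios = {}
    for region, base in baseline_volume.items():
        need = need_volume.get(region)
        if need is None or not base:
            continue
        ratios[region] = need / base
    return ratios


# Invented numbers for illustration only.
smoke = {"Texas": 30, "Ohio": None, "Maine": 10}
baseline = {"Texas": 60, "Ohio": 50, "Maine": 100, "Utah": 40}
ratios = need_ratios(smoke, baseline)
```

Because each volume is normalized separately, the ratios are only comparable between regions for the same pair of queries, not as absolute search counts.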
Interestingly, the correlation between query volume for “after covid vaccine” and the percentage of fully vaccinated people in each state is quite high at 0.80, and only slightly lower (0.78) with the percentage of people who received at least one shot. Therefore, this does seem like interest by people who are getting their vaccines.
Here are maps for these ratios, first for the immediate interests and then for the longer-term ones:
Gray countries are those for which there were too few queries. Colors represent how much more volume there was for the query in the title compared to the query “after covid vaccine” (scale is on the right of each image).
Side effects seem to worry everyone, but the least likely to be worried are people from South Dakota, Maine, Montana and Nevada. Californians and New Jerseyites really want their Tylenol. Once they stop worrying about their vaccines, many Texans would like a smoke and a drink (but drinking is also a favorite in California and Ohio).
As for the longer-term wants:
Gray countries are those for which there were too few queries. Colors represent how much more volume there was for the query in the title compared to the query “after covid vaccine” (scale is on the right of each image).
Californians (of course?) want to go back to the gym. Travel is yearned for in Georgia, New York and Washington.
Worldwide the data is much sparser. This is probably because I’m looking at queries in English. Nevertheless, here are some findings of note: Folks in the Philippines would like to go back to the gym. Alcohol is sought by people in Mexico, UK, India and (presumably expats) in the UAE. This is also true, albeit to a lesser extent, in Canada and Australia. Travel features high on the list for UAE, Canada and Australia.
What does all this mean? Probably not much beyond the obvious, but it’s still fun to see it in the data.