Uncategorized – Crowdsourced Health

June 23, 2025

The slow death of search engines

Search engines are the bread and butter of the internet. Over 90% of internet users turn to them during their activity online. Large Language Models (LLMs) are increasingly taking over the roles traditionally performed by search engines. Indeed, when giving talks to general audiences over the past couple of years, I have increasingly been asked: to what extent are LLMs replacing traditional search?

First, it’s worth noting that there is no clear division between LLMs and search engines, because search engines increasingly provide LLM outputs in their results. However, putting that aside, let’s try to estimate what the current status is.

One piece of evidence was provided by Apple’s senior vice president of services who said that there was a decline in the use of Google for the first time in 22 years. Where is that traffic going? Undoubtedly to LLMs.

A more quantitative answer might be obtained by looking at people’s searches (and yes, I do realize the irony of this idea). This is a reasonable tool because the volume of searches is a good proxy for a service’s popularity.

Using Google Trends, we can look at the (worldwide) search volume for Google versus that of ChatGPT (shown in the figure below). We find that Google’s search volume is slowly going down, while that of ChatGPT is increasing. I don’t think we can infer from this that ChatGPT will soon overtake Google in popularity (because Google is accessed in more ways than ChatGPT is), but this gives a feel for how things are changing.

Figure 1: Google Trends search volume for ChatGPT (blue) and Google (orange) since September 2023. Linear trend lines are shown for each of the two series.

This switch to LLMs isn’t uniform across countries. Figure 2 shows the ratio of search volume for ChatGPT divided by the search volume for Google over the past year. As the figure shows, the most drastic changes are in Latin America and South Asia. Interestingly, what we’d consider “industrialized” countries are not the top countries in this graph.

Figure 2: Preference for ChatGPT over Google according to relative search volume

Of course, ChatGPT isn’t the only LLM out there, but it’s the most well-known, which is why I looked at it. However, when I tried other LLMs I found significant regional variations depending on the LLM used. For example, according to Google Trends, DeepSeek is popular in China, and less so outside it.

Why should we care? I think there are a few reasons:

First, LLMs are starving the web for new content. LLMs need data to learn. If data on the web is increasingly LLM-generated, then LLMs are learning from themselves, leading potentially to a downward spiral of quality. But this is probably true only if you assume LLMs provide lower quality or less novel results.

Second, there is a financial aspect. Websites rely on traffic for their revenue. If LLMs take most of the traffic, many other websites will suffer. This is already an issue with Google’s LLM results.

Finally, as a research community we have had a good run designing, running and studying search engines. Have we reached the end of that road?

May 28, 2025

How to cheat your way to a top-tier Computer Science publication

(In case this isn’t clear, the first part of this post is written in bitter irony. The second part might actually be useful to Conference Chairs)

So, you want to have your name on a paper that’s published in a top-tier Computer Science conference? You have two options: One is to work hard on some worthwhile research, submit it to a conference, and eventually perhaps get it accepted. This post focuses on the other possibility: Simply cheat!

The most important thing to know about Computer Science conferences is that the people running them assume good intentions. If yours are not, it is easy to take advantage of this basic assumption.

Here are several tried and tested methods (each with an easy name for you to remember them by):

The cuckoo method: Put your name on another person’s paper. Specifically, ask a colleague to put your name on their paper. No one checks if your name should be there, and even if they did, it’s impossible to verify that your name ought to be on the paper. Bonus: If you have your own paper, offer your colleague reciprocity!
Idea laundering: Plagiarizing is bad, but copying with minor changes is bad and difficult to detect. This works better in lower-tier conferences and with older papers. Here’s how you do it: Find someone’s obscure paper and submit it as your own after making minor changes to the title, text and perhaps the equations and figures. This also works well for your own previously accepted papers.
Parallel parking: Most conferences don’t allow the submission of papers to multiple conferences in parallel. Anyone who’s worked on parallel processing knows this is sub-optimal, if you are optimizing to get your paper published. Therefore, submit your paper in parallel to a few conferences. There’s a lot of randomness in acceptance, so one of the conferences may take your work. The likelihood that Chairs of one conference will compare notes with other Chairs is very small, so your risk is minuscule. To be on the safe side, it is recommended to change the text slightly whenever you resubmit the paper.
50 shades of same: Generative AI models are great at rephrasing. Use them to create several versions of your paper and submit them all to a conference of your choice. Conferences are a raffle, and you’re just buying several tickets to it.
Scratch my back and I’ll scratch yours: Many conferences require you to label people with whom you have a conflict of interest and therefore won’t review your paper. Some do this automatically. One way to get around this is to make sure to mark everyone as a conflict of interest, except three or four colleagues whom you told about your paper and know that they should expect to review it. The assignment system will have no choice but to assign your paper to your friends, and they’ll write glowing reviews to your paper. Alternatively, if you have to bid for papers, bid for your friends’ papers and have them bid on yours. If you have more than one friend (not obvious in CS), it’s not too difficult to arrange a ring of reviews: You bid on friend A’s paper. Friend A bids on Friend B’s papers, and Friend B bids on yours. That way, it’s much harder to uncover.
You do you: Similar to the previous method, why not create your own fake profile on the conference reviewing platform and volunteer to review? If you work your conflicts of interest well, you’ll be chosen to review your own work, which is, completely objectively, amazing, isn’t it?
Bait and switch: Most conferences allow minor changes to be made to papers after they are accepted and before they are published in the proceedings. Take your accepted paper, switch it with another paper of the same title (or remove most of the contents from your accepted paper to save it for the next conference) and submit that as the paper for the proceedings.

What to do if you are caught: First, relax. The chances of you getting caught are minuscule. Most Conference Chairs assume fair play and aren’t interested in catching bad behavior.

On the off-chance that a reviewer finds that you plagiarized, or that some PC Chair goes out of their way and detects your shenanigans, your first response is not to respond. Ignore any emails for a while. The additional work may cause the Chairs to go away.

If they persist, accuse them of being bad at their job. Attack is the best defense.

In the worst case, lay the blame on the most junior person on the author list (Bonus: Have said junior author admit guilt in writing to the Chairs). This is because the Chairs are less likely to want to pursue ethics charges with a junior colleague so as not to ruin their career. Important note: Make sure you are not the junior author!

Most often your paper will be rejected from that conference (but if you’ve been following the above, you’ll have submitted it to plenty of other conferences, so it is not really an issue). There will be no record of your transgressions. Only very rarely will the Chairs pursue steps with their organization’s ethics committee. If that happens, repeat the above process.

On a more serious note

All the above examples are things that I either saw myself or heard about first-hand from those who did. The main problem is, in my opinion, that we don’t give academic dishonesty enough attention. Granted, over 90% of authors are honest, but if we don’t take care of the offenders, the problem will only grow (as it is essentially a free rider problem). Therefore, if you’re a Chair of a conference, I recommend:

Be very explicit in your Call for Papers. State what’s not allowed even if it seems obvious to you.
If your organization supports it, check to see if authors are listed as ones who are barred from submitting because of past transgressions.
Do not allow any change in authorship (including ordering of authors) after submission. If the authors “forgot” someone, suggest they withdraw the paper and resubmit it to another venue in future.
Work with the Chairs of other conferences in adjacent areas that are running within the same 6-month period (before or after your conference). Compare the papers submitted to your conference with the ones submitted to theirs. It is enough to measure the Jaccard distance of titles, authors and abstracts to catch many offenders.
Make sure that you set the review system such that it doesn’t allow people to match to a small number of papers.
Compare reviewed papers to camera-ready papers and reject those who made significant changes.
Consider having an Ethics Chair to take care of these issues, as their volume may overwhelm the PC Chairs.
Make sure to submit complaints to the relevant bodies, such as the ACM Ethics & Plagiarism Committee (https://www.acm.org/publications/publications-board-committees), the IEEE Plagiarism Information Center for IEEE Publication Volunteers (https://www.ieee.org/publications/pet) and the AAAI Executive Council (https://aaai.org/about-aaai/ethics-and-plurality/). It’s a lengthy process, but one that is critical to go through.
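To make the cross-conference comparison recommended above a little more concrete, here is a minimal sketch of a Jaccard-distance check on titles. The tokenization, the 0.5 threshold and the example submissions are all assumptions made for illustration; real data would also include authors and abstracts, and any flagged pair still needs a human to look at it.

```python
import re

def tokens(text):
    """Lowercase word set of a string (a simple, assumed tokenization)."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def jaccard_distance(a, b):
    """1 - |A ∩ B| / |A ∪ B| over word sets; 0 means identical sets."""
    set_a, set_b = tokens(a), tokens(b)
    if not set_a and not set_b:
        return 0.0
    return 1.0 - len(set_a & set_b) / len(set_a | set_b)

def flag_possible_duplicates(ours, theirs, threshold=0.5):
    """Return (our_id, their_id, distance) for pairs closer than threshold.

    The threshold is a hypothetical starting point, not a tuned value.
    """
    flagged = []
    for our_id, our_text in ours.items():
        for their_id, their_text in theirs.items():
            distance = jaccard_distance(our_text, their_text)
            if distance < threshold:
                flagged.append((our_id, their_id, round(distance, 2)))
    return flagged

# Made-up submissions to two conferences.
ours = {"A1": "Learning to rank web pages with graph neural networks"}
theirs = {
    "B7": "Learning to rank web pages using graph neural networks",
    "B9": "A survey of privacy in mobile health applications",
}
print(flag_possible_duplicates(ours, theirs))
```

Here the pair A1/B7 differs by a single word and is flagged, while A1/B9 shares no words and is not. Word sets ignore word order, which is useful against light rephrasing but will not catch a thorough rewrite.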

Any other methods that I missed? Do you have ideas to combat these issues? Please do comment!

April 30, 2025

Insights from serving as one of TheWebConference 2025 Program Chairs

TheWebConference 2025 began today, marking the culmination of almost a year’s work for Helen Huang, Liane Lewin-Eytan and myself as the Program Committee Chairs. In the opening session today, we gave a few statistics which may be of interest for others engaged in similar endeavors and perhaps to scientists in general.

We received 2846 abstracts and, a week later, 2496 full papers. As we approached the deadline the rate of submission increased exponentially, as shown in the figure below.

Number of submitted abstracts as a function of hours to the deadline.

We desk rejected 434 papers because they did not adhere to the guidelines (e.g., not anonymous, over length), were submitted in parallel to multiple conferences, or were irrelevant to the conference. Thus, 2062 papers were sent for review and, following the review process, 409 were accepted to the conference. The acceptance rate was therefore just shy of 20%, in line with previous conferences.

The paper funnel at TheWebConference 2025

We sent each paper to 5 reviewers and, on average, received 4.6 reviews per paper. We had at least 3 reviews per paper and, when we didn’t receive the reviews in time, we assigned emergency reviewers, so a few papers received up to 8 reviews. Nevertheless, reviewer load was very reasonable, averaging 3.3 papers per reviewer.

The figure below shows the distribution of the number of authors per paper. Interestingly, accepted papers had a median of 6 authors per paper, while rejected papers had a median of only 5 (statistically significant, Kruskal-Wallis test, P<10^-6). Perhaps one should invite more co-authors to increase the chance of success?

A histogram of the number of authors per paper

This year we recruited reviewers before the submission deadline, but we also required the authors of each paper to designate one author to review papers. As the figure below shows, these reviewers seemed to act strategically, assigning a lower novelty and quality score to other (competing) papers, compared to the recruited reviewers, and claiming to be more confident. However, the difference in scores was low (0.2 points or less on a 7-point scale), so probably meaningless.

Incidentally, the figure was created by matching reviewers by papers, that is, we calculated the difference in scores in each paper and then aggregated across papers.
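For readers who want to try the same kind of matched comparison, here is a minimal sketch of the idea: compute the score difference between the two reviewer groups within each paper, then aggregate across papers. The row layout, the group labels and the scores are made up for illustration; this is not the code we used.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical rows: (paper_id, reviewer_group, novelty_score).
reviews = [
    ("p1", "author", 4), ("p1", "recruited", 5), ("p1", "recruited", 4),
    ("p2", "author", 3), ("p2", "recruited", 3),
    ("p3", "recruited", 6),  # no author-reviewer, so it cannot be matched
]

def matched_differences(rows):
    """Per-paper difference: mean author-reviewer score minus mean recruited score."""
    by_paper = defaultdict(lambda: defaultdict(list))
    for paper, group, score in rows:
        by_paper[paper][group].append(score)
    diffs = {}
    for paper, groups in by_paper.items():
        # Only papers reviewed by both groups contribute to the comparison.
        if groups["author"] and groups["recruited"]:
            diffs[paper] = mean(groups["author"]) - mean(groups["recruited"])
    return diffs

diffs = matched_differences(reviews)
overall = mean(list(diffs.values()))  # aggregate across matched papers
print(diffs, overall)
```

Matching within papers means that each difference compares reviews of the same submission, so differences in paper quality between the two reviewer pools cancel out.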

Other significant differences were that males were slightly more confident than females and senior reviewers (e.g., professors) were more confident than junior ones (e.g., students). Perhaps also of note is that there was no difference in novelty and quality scores given by males and females, and by junior and senior reviewers.

Difference in scores between reviewer groups. Only statistically significant differences (P<0.05 with Bonferroni correction) are shown.

After the review period we had a response and rebuttal period, where authors could comment on the reviews and clarify outstanding issues. As the figure below shows, 15% of reviews were changed by the end of this period, with 46% changing the novelty score, 67% the quality score and 8% the confidence score (some changed more than one score), with the vast majority increasing their scores.

Changes in reviews during the rebuttal period

The NIPS experiment showed that the more reviews a paper receives, the less likely it is to be accepted. We tested this with observational data (see below) and did not find such a trend (P>0.05). However, ours was not an A/B test like the NIPS experiment.

Percentage of papers accepted and rejected by the number of reviews received. Data are normalized for each series (accepted/rejected).

Finally, we tried to see if ChatGPT could save us the arduous review process. We sent 200 random papers to ChatGPT4o and asked it to score the papers with instructions similar to those given to reviewers. We ran two experiments, sending either the titles and abstracts or, separately, the full papers to the LLM. The scores given by ChatGPT were basically no better than random at predicting the outcome of the human review process: The Area Under the Curve (AUC) for the titles and abstracts was 0.58 and for the full papers 0.43. For comparison, the scores from a random reviewer would give an AUC of 0.8 and the average score of the reviewers an AUC of 0.9. Therefore, if we assume that the human review process is the standard we should be aiming for, ChatGPT is not a viable alternative.
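As a reminder of what these AUC numbers mean, here is a minimal sketch that computes the AUC directly from its definition: the probability that a randomly chosen accepted paper received a higher score than a randomly chosen rejected one, with ties counted as half. The scores and decisions are made up for illustration.

```python
def auc(scores, labels):
    """AUC from pairwise comparisons; labels are 1 (accepted) or 0 (rejected)."""
    accepted = [s for s, y in zip(scores, labels) if y == 1]
    rejected = [s for s, y in zip(scores, labels) if y == 0]
    if not accepted or not rejected:
        raise ValueError("need at least one accepted and one rejected paper")
    wins = 0.0
    for a in accepted:
        for r in rejected:
            if a > r:
                wins += 1.0
            elif a == r:
                wins += 0.5  # a tie counts as half a win
    return wins / (len(accepted) * len(rejected))

# Hypothetical scores and the corresponding human decisions.
scores = [5, 3, 4, 2, 4, 1]
labels = [1, 0, 1, 0, 0, 1]
print(round(auc(scores, labels), 2))
```

An AUC of 0.5 corresponds to scores that carry no information about the decision, which is why values of 0.58 and 0.43 are essentially chance level.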

October 7, 2024

Bias in studies of facial recognition bias

A few years ago a lot of publicity (e.g., NY Times) was given to papers which showed that AI systems, especially commercial ones, were biased against certain minorities. As is often the case in science, one paper that gets lots of publicity is followed up by additional papers that investigate the same or similar phenomena. Now that there are quite a few papers of this sort, it’s time to do a meta-analysis and see if there was substance to these claims.

Meta-analyses are useful tools to collect all the experiments that have been done about a topic and summarize them. Experiments vary in their size, methodology, data, etc. Meta-analyses try to pool them together and return a single message that takes all that’s known into account. As Ben Goldacre showed in his brilliant (and highly recommended) talk, there’s a simple chart that can be used to conduct this summary. It’s called a Funnel Plot.

In this two-dimensional plot, the horizontal axis represents the effect size reported in the experiment and the vertical axis the size of the population in the experiment. Each dot on the plot is one experiment. Small experiments (close to the horizontal axis) will have lots of noise and the effect size will be either larger or smaller than the true effect size. As one conducts experiments on more and more samples, the noise is reduced, and the results concentrate around the average effect size. The result should look like an isosceles triangle, where the top of the triangle is the true effect size.

What happens if there is publication bias? In that case, the triangle disappears: For example, it’s been shown that with medical drug trials, some trials go unreported (think about those done by the pharmaceutical company or its collaborators). It’s hard to hide the results of large trials, so the trials that go missing are small trials that showed a negative (or no) effect. In that case, the funnel plot will be missing some of the points on the left-hand side. In extreme cases the triangle will become a right-angled triangle.

I decided to do a small meta-analysis of studies that looked at bias in face recognition systems. These are systems that infer the gender of an individual from a photo, usually of their face. The claim was that these systems tend to make more mistakes when the image presented to them is of a female or of someone with darker skin color. To conduct the meta-analysis I used Google Scholar and Elicit to look for experiments that quantified bias in face recognition systems. I found 6 such papers, with a total of 98 experiments.

I quantified the effect size (the horizontal axis) as the logarithm of the ratio between the error of the presumably disadvantaged class (females or dark-skinned individuals) to that of the presumably not disadvantaged class (males or light-skinned individuals). The experiment size (the vertical axis) is the logarithm of the number of cases in each experiment.
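The two axes can be sketched in a few lines. This is only an illustration: the text above does not fix the base of the logarithm, so the natural logarithm here is an assumption, and the error rates and experiment size are illustrative values rather than data from the papers.

```python
import math

def log_error_ratio(err_disadvantaged, err_reference):
    """Effect size: log of the ratio of the two error rates (natural log assumed)."""
    return math.log(err_disadvantaged / err_reference)

def relative_increase(log_ratio):
    """Convert a log ratio back to a relative increase in error rate."""
    return math.exp(log_ratio) - 1.0

# Illustrative values: 1% error for the reference class, 1.26% for the other.
effect = log_error_ratio(0.0126, 0.01)
size = math.log(5000)  # vertical axis: log of a hypothetical number of cases

print(round(effect, 3), round(relative_increase(effect), 2))
```

An effect of zero means that both classes have the same error rate, a positive effect means more errors for the presumably disadvantaged class, and taking the logarithm makes a doubling and a halving of the error rate symmetric around zero.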

You can see the result in the graphs below. They differ in their inclusion or exclusion of the point on the right-hand side which showed a very large effect. Regardless of the figure you choose, two things can be seen: One is that there are indeed missing experiments on the left-hand side, meaning that there’s publication bias in studies of bias… The second is that the effect size is around 26% (or 38% in the second graph). That is a relative effect size, meaning that, for example, if the error rate for detection of males is 1%, the error rate for females is 1.26%. Significant, but quite small.

Funnel plot for all experiments. Blue dots are experiments where bias of females to males is quantified and orange dots are those where bias of dark skin to light skin is quantified. Effects larger than zero mean that the error of the disadvantaged class (e.g., females) is greater than that of the non-disadvantaged class (e.g., males)

Funnel plot excluding the rightmost point. All other details are as in the previous graph.

However, if my findings are correct, why are these studies missing? It’s hard to know for certain. It could be that there’s not enough work in this field. My guess is that people who found negative effects chose not to publish them or were unable to do so because publication venues prefer to show a specific message (here’s an interesting case in an unrelated field).

This is a very small study, based on only 6 studies (and 98 experiments). My decision on the “disadvantaged class” is arbitrary (though made in the papers), and there are very few datasets analyzed in the papers. So take this with a grain of salt (do we need meta-analyses of meta-analyses?).

However, assuming that there are missing experiments, my guess is that it’s by design. This is based on my anecdotal experience: Over the years I’ve worked with people from many scientific disciplines. There’s one scientific discipline where my experience has been a little strange: Ethical AI — the same field that’s interested in bias in face recognition systems. While I’ve worked with several excellent scientists in that field, I also had three projects where ethical AI researchers came to me for help in looking into specific questions in their field, using my methods and datasets. I analyzed the data and tried to do my best to see if they supported the hypothesis. When it turned out they didn’t, two of the researchers disappeared. They simply stopped responding. One told me outright that she wouldn’t publish data that goes against her beliefs.

What’s the takeaway from all this? Perhaps only the trivial one: When you hear a strong scientific claim, perhaps it’s better to wait a while until more evidence is collected before making up one’s mind about that claim.

April 2, 2024

Who are journalists writing for?

“Helping set the day’s agenda … that was a journalistic high point.” Walter Cronkite

Google search queries reflect people’s interests. Journalistic reports reflect journalists’ interests and perhaps the journalists’ attempt to set the agenda (as the quote shows). One would expect that people’s interests and those of journalists would be similar when they report on news events, since journalists are (ostensibly) living among their fellow citizens. But is that the case? Come with me on a short journey and learn how journalists try (unsuccessfully) to set an agenda about the war in Gaza.

I downloaded data about the volume of search queries about the Gaza strip from Google Trends, between September 1st, 2023 and March 31st, 2024. I did this for each country with 20 million or more people. I used the “Gaza strip” topic, so it deals with different languages but, of course, not with differences in access to the internet. One example of the queries over time is here, from Canada:

Google Trends search volume for the “Gaza strip” topic, over time

I then computed the similarity between the search volume over time between countries. You can see the result below. In this graph, countries that are joined towards the bottom are more similar to each other in the change of their queries over time.

Similarity between search volume among countries

This graph (known as a dendrogram) shows some reasonable clusters: Muslim countries cluster in purple, and joining them is Turkey. France, Spain and Italy are together, as are the USA, Canada, UK, Australia, India and Germany. Perhaps we can use this to study similarities between countries? But I digress.
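For those who want to build a similar dendrogram, the merge order can be sketched as follows. The similarity measure and linkage are not specified above, so the Pearson correlation distance and single linkage used here are assumptions, and the three short series are made up.

```python
from itertools import combinations
from statistics import mean

def pearson(a, b):
    """Pearson correlation between two equal-length series."""
    mean_a, mean_b = mean(a), mean(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / (var_a * var_b) ** 0.5

def merge_order(series):
    """Single-linkage agglomerative clustering; returns the merges in order."""
    def distance(p, q):
        return 1.0 - pearson(series[p], series[q])

    def linkage(pair):
        # Single linkage: distance between the closest members of two clusters.
        return min(distance(p, q) for p in pair[0] for q in pair[1])

    clusters = [frozenset([name]) for name in series]
    merges = []
    while len(clusters) > 1:
        a, b = min(combinations(clusters, 2), key=linkage)
        merges.append((sorted(a), sorted(b)))
        clusters = [c for c in clusters if c not in (a, b)] + [a | b]
    return merges

# Hypothetical weekly search volumes for three countries.
volumes = {
    "Country X": [1, 5, 3, 2],
    "Country Y": [2, 10, 6, 5],
    "Country Z": [5, 1, 2, 4],
}
merges = merge_order(volumes)
print(merges)
```

Countries that merge early correspond to those joined near the bottom of a dendrogram. A correlation-based distance ignores the overall level of interest and compares only how it changes over time.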

I then downloaded media interests, as computed by the GDELT project. Similar to the Google Trends data, these provide the volume of media interests over time. The dendrogram for these data looks weird. There are fewer clusters that make sense. Egypt and Syria are together, but why Canada and Indonesia?

Similarity between volume of media publications among countries

I think this is because the media is no longer representing people’s interests, nor is it driving them. To test this, I normalized the time series and ranked the countries by the square difference between the two time series: Media interests and query volume. Smaller differences mean that the two time series are more similar to each other. Here is the result:

Difference between search volume and media volume in different countries
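The ranking above can be sketched as follows. The kind of normalization is not stated above, so standardizing each series to zero mean and unit variance is an assumption, and the two countries and their series are made up.

```python
from statistics import mean, pstdev

def zscore(series):
    """Standardize a series to zero mean and unit variance (assumed normalization)."""
    mu, sigma = mean(series), pstdev(series)
    if sigma == 0:
        raise ValueError("a constant series cannot be standardized")
    return [(x - mu) / sigma for x in series]

def squared_difference(search, media):
    """Sum of squared differences between the two normalized series."""
    if len(search) != len(media):
        raise ValueError("series must cover the same time points")
    return sum((s - m) ** 2 for s, m in zip(zscore(search), zscore(media)))

# Hypothetical (search volume, media volume) series per country.
data = {
    "Country A": ([10, 80, 40, 20], [12, 78, 41, 22]),
    "Country B": ([10, 80, 40, 20], [30, 70, 65, 60]),
}
ranking = sorted(data, key=lambda country: squared_difference(*data[country]))
print(ranking)
```

In this toy example Country A, where media coverage tracks searches closely, ranks first. In Country B coverage stays high after searches have dropped, so its difference is larger.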

And here are three examples of what these differences mean:

As you can see, in countries where the differences are small (e.g., Indonesia), there is a close correspondence between the two time series meaning that the volume of journalism is similar to that of people’s interests. However, in many countries journalists continue to report on Gaza, but people, by and large, are uninterested (e.g., France). This difference could be due to many reasons including, for example, news fatigue.

Personally, I think it might be related to journalists’ political leanings, which are not representative of society (see, for example, Do all sides deserve equal coverage? U.S. journalists and public differ). If this is true, I don’t think it’s a good thing, because people are getting only some of the story, eroding trust in journalism. It’s certainly true for this war, where reporting on Gaza is severely biased to Hamas’ point of view (e.g., casualty figures drawn from obviously biased sources).

December 7, 2023

Antisemitism and Islamophobia in the United States

In the USA recently, attempts to address antisemitism are often linked with those to address Islamophobia. Examples include those of the White House, Harvard, and Columbia University, to name but a few. The question is, why do the two appear together, and why these two and not hate against other religious groups?

To be sure, both antisemitism and Islamophobia are problems in the United States (and obviously elsewhere, as I’ve written about in the past). But why the sudden need to mention both whenever antisemitism is mentioned (and not, for example, vice versa or other religious groups)? Is Islamophobia such a significant issue compared to antisemitism?

The FBI’s Crime Data Explorer provides an analysis of reported hate crimes by who they were directed at. Here’s a snapshot of the latest data, that of 2022.

FBI hate crime statistics by the group they were perpetrated against, 2022 data.

There were 1124 anti-Jewish (antisemitic) crimes, the second largest category after anti-Black hate crimes. Anti-Islamic (Islamophobic) crimes are in 15th place with 158 crimes. Pretty bad, but only one seventh of the number of antisemitic crimes. If one adds anti-Catholic, anti-Protestant and other anti-Christian crimes we find that there were 375 such crimes, more than Islamophobic crimes, but POTUS didn’t mention anti-Christian hate in their publication.

Of course, both Islam and Judaism are minority religions in the USA, so perhaps it isn’t fair to compare them to Christianity. According to Wikipedia there are 7.15 million Jews and 3.45 million Muslims in the USA. If we normalize the data per-capita, there are still 3.5 times more antisemitic crimes than Islamophobic crimes reported to the FBI. However, on a per-capita basis, anti-Sikh crimes (currently ranked 14th in the number of instances) are much worse than both. Note, however, that there are wildly different estimates for the number of people according to religious affiliations: The US Religion Census claims 4.45 million adherent Muslims and 2.07 million adherent Jews. If we use these estimates there are 15 times more crimes against Jews than against Muslims on a per-capita basis!
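The per-capita comparison is simple arithmetic, sketched here using the crime counts and population estimates quoted above. The dictionary keys and the function name are just labels chosen for this example.

```python
# Reported hate crimes in 2022 (FBI figures quoted above).
crimes = {"anti-Jewish": 1124, "anti-Islamic": 158}

# Population estimates in millions, from the two sources quoted above.
populations = {
    "Wikipedia": {"anti-Jewish": 7.15, "anti-Islamic": 3.45},
    "US Religion Census": {"anti-Jewish": 2.07, "anti-Islamic": 4.45},
}

def per_capita_ratio(crime_counts, population):
    """Ratio of per-capita antisemitic crimes to per-capita Islamophobic crimes."""
    jewish_rate = crime_counts["anti-Jewish"] / population["anti-Jewish"]
    muslim_rate = crime_counts["anti-Islamic"] / population["anti-Islamic"]
    return jewish_rate / muslim_rate

for source, population in populations.items():
    print(source, round(per_capita_ratio(crimes, population), 1))
```

With these inputs the ratios come out at about 3.4 and 15.3, close to the rounded figures in the text. The wide gap between the two shows how sensitive the conclusion is to the population estimate used.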

Unfortunately, the FBI’s data doesn’t cover 2023. Some organizations such as the Anti-Defamation League have reported a 5-fold increase in antisemitic crimes (see CNN’s coverage, and when you do, look for the graphs worthy of “How to lie with statistics”). So what do Google searches tell us?

Looking at searches for the topic of antisemitism and Islamophobia since August 2023, we see a huge jump in the former, whereas the latter is almost at zero. The jump begins almost immediately after Hamas’ attack on Israel on October 7th.

Searches for the topics “Antisemitism” and “Islamophobia” in the USA for the 6 months starting August 7th, 2023

However, we don’t know how Google groups queries into topics. Therefore, I also looked at queries which begin with “Why are Jews” and “Why are Muslims” which, in the past at least, were associated with hate. Here the rise in Islamophobia is greater than we saw in the previous graph, but it’s still small. The total volume (as measured by the area under the graph) of antisemitic searches is 2.9 times that of Islamophobic searches. Taking into account per-capita, that is still 1.4 times greater (using Wikipedia’s population estimates) or 6.3 times greater (using the Religion Census estimates).

Perhaps the only optimistic observation from this graph is that the jump was related to Hamas’ October 7th attack and both Islamophobia and antisemitism are going down fairly quickly.

Searches for the queries beginning “why are Jews” and “why are Muslims” in the USA for the 6 months starting August 7th, 2023

Going back to my original question, why do antisemitism and Islamophobia appear together, and why these two? It seems I don’t have good data to answer the question.

November 26, 2023

Hamas’ attack on Israel through the lens of Google queries

On October 7th, 2023, Hamas attacked Israel from the Gaza Strip, which it has been governing since 2007. The details of the attack are horrific, and I won’t describe them here. Suffice to say that they are in line with the acts of the Nazi regime and ISIS.

We know from the past that Google search volume tells an interesting story about major world events, so here are a few graphs about that attack, through the lens of Google Search Trends.

First, the attack has caused major trauma to Israelis, as visible in this graph of search volume for anxiety and for depression. The latter serves as a comparator. As the graph shows, searches for anxiety spiked on October 7th and were high for around a month after then. As somewhat expected, searches for depression were attenuated for the same period, which is perhaps similar to an effect we saw at the start of the COVID19 pandemic.

Normalized Google search volume for anxiety (orange) and depression (blue) in Israel over time. The dotted line marks the date of the attack.

In Israel, interest in news spiked at the start of the war but has since gone down significantly. Similarly, queries about Hamas and the Gaza Strip spiked and then went down, though not to baseline (yet).

Normalized Google search volume for news in Israel over time. The dotted line marks the date of the attack.

Normalized Google search volume for Hamas (orange) and the Gaza Strip (blue) in Israel over time. The dotted line marks the date of the attack.

Worldwide the trend is as expected from the literature, which shows a half-life of a few days for major news events.

Normalized Google search volume for Hamas in Israel (orange) and the world (blue) over time. The dotted line marks the date of the attack.

What about effects of this war on the world at large? Here’s interest in the phrase “Free Palestine”. Note how it is almost nonexistent until October 9th but then, just two days after Hamas’ atrocities, it spikes. Somewhat similar to interest in Hamas it’s decaying quickly, but is not yet back to baseline.

Normalized Google search volume for “Free Palestine” in the world over time. The dotted line marks the date of the attack.

Perhaps it’s more about fashion than about substance? Let’s check the USA. Here we see a strong political effect of these queries, i.e., more Democratic states are more likely to be interested in Free Palestine. The voting share for Biden-Harris in the 2020 Presidential elections explains almost 70% of the variance. This is interesting because, of course, President Biden is a Democrat and he took a strong pro-Israel stance in this war.

Normalized Google search volume for “Free Palestine” per US state as a function of the voting share for Biden in the 2020 elections in that state. DC is excluded (though it falls nicely on the line). The dotted line is a linear regression line with R²=0.69.
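For reference, this is how the "variance explained" (R²) of such a regression line can be computed. It is a minimal sketch using ordinary least squares, and the five data points are made up; they are not the state-level data behind the figure.

```python
from statistics import mean

def linear_fit(x, y):
    """Ordinary least-squares slope and intercept for y = slope * x + intercept."""
    mean_x, mean_y = mean(x), mean(y)
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def r_squared(x, y):
    """Share of the variance in y explained by the fitted line."""
    slope, intercept = linear_fit(x, y)
    mean_y = mean(y)
    ss_res = sum((yi - (slope * xi + intercept)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot

# Hypothetical states: vote share (%) and normalized search volume.
vote_share = [35, 45, 50, 55, 65]
search_volume = [20, 40, 35, 60, 70]
print(round(r_squared(vote_share, search_volume), 2))
```

An R² of 0.69 means that the fitted line accounts for 69% of the variation in search volume across states, with the rest left unexplained by vote share.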

As Greta Thunberg noticed, Free Palestine is highly correlated with another political issue – climate change. The correlation explains more than 60% of the variance! The outliers on the top (meaning, more interest in climate change than expected according to the interest in Free Palestine) are mostly states that are likely to be affected by climate change such as Hawaii, Vermont and Alaska. Outliers on the other side are (in my opinion) Democrat states with large universities, but this warrants more careful research.

Normalized Google search volume for “Free Palestine” per US state as a function of normalized search volume for Climate Change. The dotted line is a linear regression line with R²=0.62.

My prediction is that interest in Hamas and this war will soon wane as the world moves to the next crisis. In Israel, expect a spike in pregnancy queries in a few months.

September 7, 2022

Antisemitism: (Almost) everyone has their favorite reason

“Oh, the Protestants hate the Catholics,

And the Catholics hate the Protestants,

And the Hindus hate the Moslems,

And everybody hates the Jews.”

(Tom Lehrer, National Brotherhood Week)

Today, something a little different and not entirely related to health: Antisemitism. It’s not entirely divorced from health either, as the bones of my forefathers, scattered from Spain to Poland, will testify, but thankfully these days the physical aspects of antisemitism are on a somewhat less grandiose scale than in previous generations. I used Google Trends to see which Jewish conspiracies were searched for in different countries. Unfortunately, it isn’t always easy to capture an entire topic with a single query, so I couldn’t encompass all the hate around this issue, but here is the volume of queries since 2004 for several antisemitic tropes:

Prevalence of queries for common Jewish conspiracy theories

It’s interesting to see that each conspiracy has its own fan base, though some countries (perhaps owing to Internet penetration and population size) are represented in the maps of more than one conspiracy. It’s also notable that there is little correlation between the size of the Jewish population in a country and the volume of antisemitic searches therein: Pakistan, Norway, etc., have tiny Jewish populations (if any) and are not neighbors of Israel, and yet too many people in those countries have a favorite Jewish conspiracy theory.

How prevalent are those “theories”? It’s difficult to say with confidence. Querying for something doesn’t mean that a person believes in it, only that they are interested in the topic. Google Trends data has a few other drawbacks. Nevertheless, anecdotally, Jewish conspiracy theories seem to have a worldwide search volume of the same order of magnitude as common anti-Muslim theories. However, there are around 100 times as many Muslims as there are Jews.

Is there a lesson here? I don’t know. Perhaps it’s just that antisemitism is too common and that it manifests itself in a variety of ways. Perhaps it’s another demonstration that online activity offers a window into all aspects of human behavior, even the less savory ones.

April 3, 2022

A warning for internet platforms

China has, once again, instructed Bing to turn off the autosuggest feature of the search engine. The reason given by China’s State Information Office is, to quote from The Register article, that “Bad use of algorithms affects the normal communication order, market order and social order, posing challenges to maintaining ideological security, social fairness and justice and the legitimate rights and interests of netizens.”

I don’t know the details of why the Chinese government asked to remove autosuggest, nor whether and why Bing complied, but it seems to me that there is a lesson here for search engine operators and for people interested in algorithmic fairness.

Search engines are perhaps the most widely used internet service. They’ve replaced libraries for many of the information acquisition tasks we perform. When Google started, its stated mission was “to organize the world’s information and make it universally accessible and useful.” This implies that the results it provides reflect the world’s information. Indeed, many writers (e.g., this one in The Atlantic) wrote about the idea that search engines are automatic and reflect the knowledge available in the world. More recently, Google’s CEO said in testimony to the US Congress that “We use a robust methodology to reflect what is being said about any given topic at any particular time. It is in our interest to make sure we reflect what’s happening out there in the best objective manner possible. I can commit to you and I can assure you, we do it without regards to political ideology. Our algorithms do it with no notion of political sentiment.”

Unfortunately, as anyone in this business knows, a lot of manual work goes into an automated search engine. That manual work is done by people who have opinions, as do their managers. These people’s opinions can affect the results, and there is currently quite a lot of evidence that results don’t reflect the world’s knowledge anymore. Instead, they reflect the world as some people would like it to be.

I could provide many examples that seem to have this bias, but let’s take one of my favorites: Consider the search results for the innocuous query “Renaissance Europe art” below. Before you do, think to yourself what art you think should be shown. Botticelli? The Mona Lisa? The Sistine Chapel?

Now click on the spoiler below to see a screenshot of the Bing results for this query.

Click to see the Bing results

Notice the preference for paintings of particular people?

It seems to me that governments have taken notice of the fact that “algorithmic results” are no longer algorithmic (and in fact, they probably never were entirely algorithmic). If results are human-generated, they say, why shouldn’t we, the representatives of the people, decide what the results should be? Why should workers at internet platforms who may have specific views of the world get to decide that these are the “right” views?

This is a logical argument, though the devil, as they say, is in the details. If a government takes a heavy hand and decides to censor views it doesn’t like, what happens to how people learn about the world? There are parallels between this problem and that of book banning at public libraries (see, for example, this overview), especially now that search engines have replaced libraries.

It is hard to say whether this situation could have been averted, and in any case this Pandora’s box may already be open. But I do wonder whether a little more modesty in changing algorithmic results would have kept us from getting to where we are today.

I work for Microsoft, which operates Bing. The views in this post (as indeed, the entire blog) are my own and not those of my employer. I do not have any inside information on which queries undergo manual editing.

December 19, 2021 / December 20, 2021

The cost of data privacy

The other day I was talking to a friend of mine, a senior medical doctor at a research hospital. We were discussing clinical trials and how the recent staff shortages in the US have made it difficult to start new clinical trials there. He mentioned offhand that clinical trials have been difficult to do in Europe for a few years now because of GDPR.

The European Union’s General Data Protection Regulation, or GDPR, is a regulation on data protection and privacy. It provides people with rights related to their data including, for example, the right to ask companies for the data they collect about an individual (Article 15). GDPR applies in the countries of the European Union and the European Economic Area. Other countries have data protection rules that the European Commission has recognized as adequate, and I treat them here as countries which chose to implement it. This latter group includes Andorra, Argentina, Canada (only commercial organizations), Faroe Islands, Guernsey, Israel, Isle of Man, Jersey, New Zealand, Switzerland, Uruguay, Japan, the United Kingdom and South Korea.

The flip side of GDPR is that, for both companies and other organizations, it’s much harder to collect and process data. This may be a good idea when we’re thinking about companies which use these data to sell us more stuff, but it may be that these regulations have a less than beneficial effect for medical science. I wanted to see if there’s evidence for the latter.

In recent years medical researchers have begun registering their clinical trials on the US government’s ClinicalTrials.gov website. This can help patients find relevant trials, improve recruitment, and also reduce the likelihood of cheating (see Ben Goldacre’s wonderful talk on this subject). I took these data and extracted from them the country where each clinical trial is held (some clinical trials are held in multiple countries and I accounted for those) and the date at which it was first registered.

The figure below shows the number of clinical trials registered each month between January 2010 and July 2021. I divided the countries where the trials were held into three groups: the United States, countries where GDPR was implemented, and all other countries of the world.

Number of new clinical trials per month in the United States (top), GDPR-implementing countries (middle) and other countries (bottom). Light colors are before May 2018 and dark ones after it. Dotted lines are linear fits to the curves, with the slopes and fit shown below them.

In the graph the light colors are the number of clinical trials per month prior to the implementation of GDPR and the dark colors are the same numbers after it. I’ve also fit linear regression curves to each of these. As one can see, the number of clinical trials up to May 2018, when GDPR was implemented, rises slowly. Interestingly, it rises more slowly in the US than in the other two groups.
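A minimal sketch of this kind of before-and-after trend comparison is shown here, assuming the input is simply a list of registration dates for one group of countries. The helper names (`month_index`, `monthly_counts`, `trend_slope`, `slopes_before_after`) are hypothetical, the cutoff month is counted as part of the "after" segment, and the dates in the usage example are made up rather than taken from ClinicalTrials.gov.

```python
from collections import Counter
from datetime import date


def month_index(d, start):
    """Number of calendar months from start's month to d's month."""
    return (d.year - start.year) * 12 + (d.month - start.month)


def monthly_counts(dates, start, end):
    """Registrations per calendar month, from start's month through end's month."""
    n_months = month_index(end, start) + 1
    counts = Counter(month_index(d, start) for d in dates)
    # Dates outside the window get an index that is never looked up.
    return [counts.get(i, 0) for i in range(n_months)]


def trend_slope(counts):
    """Least-squares slope of counts against consecutive month numbers."""
    n = len(counts)
    if n < 2:
        raise ValueError("need at least two months to fit a line")
    mean_x = (n - 1) / 2
    mean_y = sum(counts) / n
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(counts))
    return sxy / sxx


def slopes_before_after(dates, start, end, cutoff):
    """Fit separate lines to the months before the cutoff month and from it onward."""
    counts = monthly_counts(dates, start, end)
    k = month_index(cutoff, start)
    return trend_slope(counts[:k]), trend_slope(counts[k:])


# Made-up registration dates, for illustration only.
toy_dates = (
    [date(2018, 2, 10)] * 1 + [date(2018, 3, 5)] * 2 + [date(2018, 4, 20)] * 3
    + [date(2018, 5, 2)] * 5 + [date(2018, 6, 9)] * 5 + [date(2018, 7, 30)] * 5
)
before, after = slopes_before_after(
    toy_dates, start=date(2018, 2, 1), end=date(2018, 7, 1), cutoff=date(2018, 5, 1)
)
print(f"trials per month gained each month: before={before:.2f}, after={after:.2f}")
```

Running the same function once per group of countries gives the slopes that can then be compared across groups.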

After May 2018 the rise in the number of clinical trials in the US and in countries where GDPR was implemented abruptly stops and flattens. However, in countries which did not implement GDPR (and are not the US), the pace of growth rises dramatically, accounting for the growth expected in this group of countries as well as most of the growth we would have expected in the other two groups. It seems as though GDPR put a brake on clinical trials in countries where it was implemented, as well as in the United States.

Which countries benefited from this move away from the US and GDPR-implementing countries? To test this, I computed, for each country, the fraction of its clinical trials in the registry that were conducted after the implementation of GDPR. I only looked at countries which had at least 500 clinical trials in the data. The 5 countries which had the largest fraction of trials post-GDPR are Pakistan, Egypt, Turkey, Indonesia, and China. Unfortunately, these countries are not bastions of human rights. According to Freedom House they are judged either “Not free” or “Partly free”.
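The per-country computation can be sketched roughly as follows. This is an illustration under stated assumptions, not the exact script behind the numbers above: the record layout (a registration date plus a list of countries), the function names, and the use of May 25, 2018 as the cutoff are my choices for the example (the text above only says May 2018), and the records in the usage example are made up.

```python
from collections import defaultdict
from datetime import date

# Assumed cutoff: the day GDPR became applicable.
GDPR_START = date(2018, 5, 25)


def post_gdpr_fraction(trials, cutoff=GDPR_START, min_trials=500):
    """For each country, the fraction of its trials registered on or after cutoff.

    trials: iterable of (registration_date, list_of_countries) pairs.
    A multi-country trial is counted once for every country it is held in.
    Countries with fewer than min_trials trials in total are dropped.
    """
    total = defaultdict(int)
    after = defaultdict(int)
    for registered, countries in trials:
        for country in set(countries):  # guard against duplicate entries
            total[country] += 1
            if registered >= cutoff:
                after[country] += 1
    return {c: after[c] / n for c, n in total.items() if n >= min_trials}


def top_countries(fractions, k=5):
    """The k countries with the largest post-cutoff fraction."""
    return sorted(fractions, key=fractions.get, reverse=True)[:k]


# Made-up records with placeholder country names, for illustration only.
toy_trials = [
    (date(2017, 1, 10), ["A", "B"]),
    (date(2019, 3, 2), ["A"]),
    (date(2020, 6, 1), ["B"]),
    (date(2020, 9, 15), ["B"]),
    (date(2021, 1, 5), ["C"]),
]
fractions = post_gdpr_fraction(toy_trials, min_trials=2)
print(fractions, top_countries(fractions, k=1))
```

The `min_trials` threshold matters: without it, a country with a handful of recent trials would top the list on noise alone.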

Thus, it seems that one of the negative aspects of GDPR was the movement of clinical trials from countries which implemented it to those which did not. Whether this is a price worth paying is a personal judgment. To me, it seems that GDPR must be changed so that studies which improve the lives of people can continue, even at a minimal cost to data privacy.

The current state of things reminds me of a story, possibly apocryphal, told to me by a lecturer during my graduate studies: A colleague of my lecturer who was a pain researcher from one of the industrialized countries took his sabbatical in Libya. This was, I think, in the late 1980s. My lecturer said that he asked the researcher, “why Libya?”. The reply was “it’s easier to do work there”…

Let’s not have GDPR cause medical research to move to countries which don’t take human rights seriously.

Caveat: I know there may be confounders that appeared at similar times. This isn’t a scientific paper, so take my explanations above with a grain of salt.

Queries shown in the figure “Prevalence of queries for common Jewish conspiracy theories”: The Protocols of the Elders of Zion, Zionist Occupation Government, Holocaust denial, Jewish lobby, Jewish bankers, Jewish Bolshevism.