On September 6th this year the Centers for Disease Control and Prevention (CDC) put out an Investigation Notice concerning a (suspected) outbreak of lung illness associated with using E-cigarette products. According to this notice, the CDC is reviewing reports of a severe pulmonary disease associated with E-cigarette products, following reports from 33 US states.
People who are suspected to have this disease report the following symptoms:
- cough, shortness of breath, or chest pain
- nausea, vomiting, or diarrhea
- fatigue, fever, or weight loss
Brief summary if you want to decide whether or not to read further: Bing data seems to show that these symptoms appear in people who are likely using E-cigarettes, and offers a few additional likely symptoms.
I suspect it took a while to recognize the possible adverse reactions associated with E-cigarettes because nobody thought to ask people who turned up at the doctor’s whether they were using E-cigarettes. Additionally, the CDC reports that it can take weeks and sometimes longer for symptoms to develop.
Late-appearing symptoms and ones that might not immediately seem obvious to a doctor are exactly the kinds of symptoms that people’s search engine queries are good at detecting. Thus, I turned to search data to see what it might show.
I extracted 9 months (October 2018 – June 2019) of Bing search data. I chose this period because it was well before information about the new pulmonary disease was widely reported in the media. These data include searches by people in the United States. Each record comprises the text of the search, its time and date, and an anonymous user identifier.
To analyze the data I followed the methodology Evgeniy Gabrilovich and I developed for our paper on pharmacovigilance, which showed that it was possible to discover new side effects of drugs from search data. Specifically, I filtered the data to focus on those users who mentioned E-cigarette products. My list consisted of general terms related to electronic cigarettes and vaporizers, as well as the brand names of popular E-cigarettes. Although not everyone who mentions an E-cigarette in their queries uses one, our experience with other products suggests that many of those who mention them are users. Approximately half a million users mentioned these products in their queries during the data period.
I then found all mentions of one of 195 medical symptoms that these users made before or after the first time they queried for an E-cigarette product. As a control population I found all the users who mentioned symptoms in their queries but did not mention an E-cigarette product. For those users I picked a random reference date between their first and last query in our data. I also removed topical queries (which spiked for a few days and then disappeared) and popular queries that were obviously unrelated to medical symptoms. These include, for example, queries mentioning celebrities and their medical issues.
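The cohort construction described above can be sketched roughly as follows. This is a minimal illustration, not the code used for the analysis: the term lists are hypothetical stand-ins (the actual E-cigarette terms and the 195 symptoms are not listed here), the function names are made up, and the removal of topical and popular queries is left out.

```python
import random
from datetime import datetime

# Hypothetical stand-ins; the real term lists are not given in this post.
ECIG_TERMS = {"e-cigarette", "vape", "vaporizer"}
SYMPTOM_TERMS = {"cough", "fever", "headache"}


def mentions_any(query, terms):
    """True if the query text contains any of the given terms (case-insensitive)."""
    text = query.lower()
    return any(term in text for term in terms)


def reference_dates(records, rng=None):
    """records: iterable of (user_id, timestamp, query_text) tuples.

    Returns (exposed, control), two dicts mapping user_id -> reference datetime.
    Exposed users get the time of their first E-cigarette query. Control users
    (symptom query, no E-cigarette query) get a random time between their
    first and last query.
    """
    rng = rng or random.Random(0)
    first_ecig, first_seen, last_seen = {}, {}, {}
    has_symptom = set()
    for user, ts, query in records:
        first_seen[user] = min(ts, first_seen.get(user, ts))
        last_seen[user] = max(ts, last_seen.get(user, ts))
        if mentions_any(query, ECIG_TERMS):
            first_ecig[user] = min(ts, first_ecig.get(user, ts))
        if mentions_any(query, SYMPTOM_TERMS):
            has_symptom.add(user)

    control = {}
    for user in has_symptom:
        if user not in first_ecig:
            span = last_seen[user] - first_seen[user]
            control[user] = first_seen[user] + span * rng.random()
    return first_ecig, control
```

Symptom queries can then be timed relative to each user's reference date, which puts the two populations on a comparable "days since reference" axis.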
I then scored each symptom using QLRS statistics (see our 2013 paper). Briefly stated, a symptom will receive a high score if we saw a significant rise in the likelihood that it will be queried in the population that also queried for E-cigarettes after their first mention of the product, compared to the control population.
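To make the intuition concrete, here is a deliberately simplified stand-in for the scoring step. It is not the QLRS statistic from the paper: it just compares the before/after change in the E-cigarette population with the same change in the control population, using hypothetical function names, an assumed smoothing constant, and made-up counts.

```python
def rise_score(exposed_before, exposed_after, control_before, control_after,
               smoothing=1.0):
    """Ratio of the after/before change in the exposed group to that in the control.

    Each argument is the number of users who queried the symptom before or
    after their reference date. The smoothing term (an assumption of this
    sketch) avoids division by zero for rare symptoms.
    """
    exposed_change = (exposed_after + smoothing) / (exposed_before + smoothing)
    control_change = (control_after + smoothing) / (control_before + smoothing)
    return exposed_change / control_change


def rank_symptoms(counts):
    """counts: dict symptom -> (exposed_before, exposed_after,
    control_before, control_after). Returns symptoms, highest score first."""
    scores = {symptom: rise_score(*c) for symptom, c in counts.items()}
    return sorted(scores, key=scores.get, reverse=True)


# Made-up counts, for illustration only.
example = {"cough": (9, 19, 99, 99), "rash": (9, 9, 99, 99)}
print(rank_symptoms(example))
```

A score above 1 means the symptom became more common after the reference date in the E-cigarette population than it did in the control population. Unlike a proper statistic, this ratio says nothing about significance.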
The symptoms that received the highest scores are shown in the table below. Notice that among the top 10 symptoms at least 4 are also mentioned in the CDC report. The top 3 are all known symptoms.
Symptom | Mentioned in CDC report |
Pain | Y |
Cough | Y |
Weight loss | Y |
Depression | |
Anxiety | |
Perspiration | |
Headache | |
Fever | Y |
Rash | |
Itch | |
Incidentally, some of the other symptoms reported by CDC are ranked high, though not in the top 10. For example, diarrhea is ranked 12th.
The temporal profile of symptom mentions seems to support the CDC report. In the figure below I plotted the likelihood that a person in the E-cigarette population would ask about cough over time, normalized by the likelihood of asking about cough in the control population. According to this figure, in the first few days after the first mention of an E-cigarette product, cough is slightly less likely than in the control population. However, within a few weeks, cough becomes more prominent, to the point that it’s about 20% more likely than in the control population.
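A normalized profile of this kind could be computed along the following lines. This is only a sketch under assumptions of mine: weekly bins, hypothetical function names, and day offsets measured from each user's reference date (first E-cigarette query, or the random reference date for control users).

```python
from collections import defaultdict


def weekly_fraction(offsets_by_user, n_users):
    """offsets_by_user: dict user -> list of day offsets (query date minus
    reference date) at which the user queried the symptom.

    Returns dict week_index -> fraction of the n_users in the population who
    queried the symptom during that week.
    """
    users_per_week = defaultdict(set)
    for user, offsets in offsets_by_user.items():
        for days in offsets:
            users_per_week[days // 7].add(user)
    return {week: len(users) / n_users for week, users in users_per_week.items()}


def relative_likelihood(exposed, n_exposed, control, n_control):
    """Per-week likelihood in the exposed population divided by the control's.

    Weeks with no control queries are skipped to avoid dividing by zero.
    """
    e = weekly_fraction(exposed, n_exposed)
    c = weekly_fraction(control, n_control)
    return {week: e[week] / c[week] for week in sorted(e) if c.get(week)}
```

A value of 1.2 for a given week would correspond to the symptom being about 20% more likely in the E-cigarette population than in the control population during that week.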
Given these findings I would suggest that search query data shows the traces of this mysterious new pulmonary disease, recently reported by CDC.
These results also suggest that researchers should investigate the possibility of additional adverse effects of E-cigarette use, including depression, anxiety, and perspiration.