Measuring weight truthfully using social media: An exercise in Data Science

In 2012, Dan Pelleg, Yoelle Maarek and I looked at where people are truthful (and when they are not) on social media (see “Would you believe an anonymous contributor?”). We found that when people have an information need, they must reveal their true intentions; otherwise, they would receive wrong information. The opposite happens when people want to obfuscate the truth. My favorite example is the distribution of the weight of males on OkCupid compared with the real world.

Now, suppose we want to measure the weight of the population over time using social media. Where would we get the data? One place to look is forums where people state their weight, such as diet forums. However, this is a biased sub-population, because people who want to lose weight are probably not representative of the general population.

Where could we get weights of a less-biased population? My proposal is to look at the bust volumes of women, as reported through bra sizes. These data are (often) truthful if there is a real information need. Moreover, they are easy to parse, containing a two-digit number and a letter or two (OK, sometimes a few more), e.g., 32B.
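To make the parsing step concrete, here is a minimal sketch of how such sizes could be pulled out of post text. It is not the code used for the analysis: the pattern (a two-digit band immediately followed by one to four cup letters in the range A to K) and the function name extract_bra_sizes are my own assumptions for illustration.

```python
import re

# Assumed pattern: two-digit band size directly followed by 1-4 cup letters (A-K).
# This is a simplification; strings such as "20kg" would also match, so a real
# pipeline would need extra filtering on top of this.
BRA_SIZE = re.compile(r"\b(\d{2})([A-Ka-k]{1,4})\b")


def extract_bra_sizes(text):
    """Return a list of (band, cup) pairs found in text, with cups upper-cased."""
    return [(int(band), cup.upper()) for band, cup in BRA_SIZE.findall(text)]


if __name__ == "__main__":
    print(extract_bra_sizes("I wear a 32B but was fitted as 30dd."))
```

Requiring the letters to follow the digits directly keeps ordinary phrases like "lost 15 pounds" from matching, at the cost of missing sizes written with a space.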

Therefore, I extracted 155,602 posts (made by at least 74,279 authors) from Reddit, made between 2015 and 2024, that seemed to contain a bra size. To convert bra sizes to breast volume, I used the data in a 2023 preprint, “Optimized table describing the relationship between breast volume and breast size used for implant size selecting in surgery”.

First, a few sanity checks: Let’s compare the average breast volume of women who posted in forums for breast reduction (Reduction, PlasticSurgery) to that of other women. The average breast volume of the former is 1137ml, compared to 962ml in the others. That means that women who write about breast reduction have a breast volume that is larger, on average, by about 175ml (per breast) than that of other women.
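A sanity check of this kind boils down to comparing two group means. The sketch below shows one way to do it, assuming each post has already been reduced to a (subreddit, volume in ml) pair. The function name mean_volume_gap and the toy numbers are hypothetical and are not taken from the actual data.

```python
from statistics import mean

# Subreddits treated as "breast reduction" forums, as named in the text.
REDUCTION_SUBS = {"Reduction", "PlasticSurgery"}


def mean_volume_gap(posts):
    """Given (subreddit, volume_ml) pairs, return (mean_inside, mean_outside, gap)."""
    inside = [vol for sub, vol in posts if sub in REDUCTION_SUBS]
    outside = [vol for sub, vol in posts if sub not in REDUCTION_SUBS]
    m_in, m_out = mean(inside), mean(outside)
    return m_in, m_out, m_in - m_out


if __name__ == "__main__":
    # Made-up toy values, only to show the call; not real measurements.
    toy_posts = [
        ("Reduction", 1200),
        ("PlasticSurgery", 1100),
        ("ABraThatFits", 900),
        ("braswap", 1000),
    ]
    print(mean_volume_gap(toy_posts))
```

The same function works for any other split (for example, image-based versus text-based subreddits) by swapping the set of subreddit names.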

Women who complain of having small breasts (in the aptly named smallboobproblems subreddit) have an average breast volume of 357ml, compared to 1246ml in those complaining of bigboobproblems. That is almost a 3.5-fold difference!

Transgender people transitioning from female to male have an average breast volume of 711ml, while those transitioning from male to female average 574ml (compared to the average of 971ml).

The largest number of posts were made in subreddits where users probably have a real information need, including ABraThatFits, bigboobproblems, Reduction, and braswap. However, there were several less-popular subreddits where users were probably not so truthful, for example, dirtypenpals, SluttyConfessions, and sexstories. How do bra sizes differ between the two?

The average breast volume in the latter subreddits is lower, at 806ml, than that in other subreddits (967ml). However, if we break this down by subreddits where pictures are posted versus those which are text-based, something interesting happens: The average volume in image-based subreddits is 959ml, but in the text-based ones it’s 822ml. It looks like the audience of the images likes larger busts, but those reading stories prefer smaller ones. Perhaps it’s a male/female distinction?

Going back to my original question, do bust sizes correlate with known weight? I couldn’t find good data on female weight, but here’s a proxy: the percentage of obese people over the years. This does indeed correlate with the average reported breast volume, with an R² of 0.43. Therefore, it looks like there is a reasonable correlation between breast volume and weight, and that social media data can reveal this information to us.
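For readers who want to reproduce this kind of check on their own data, here is a minimal sketch of computing R² between two yearly series. It assumes a simple linear fit with one predictor, in which case R² equals the squared Pearson correlation, so an R² of 0.43 corresponds to a correlation of roughly 0.66 in absolute value. The function name r_squared is my own, and no real data are included.

```python
def r_squared(xs, ys):
    """Squared Pearson correlation between two equal-length numeric sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length, at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of cross-products and squared deviations from the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("a constant series has no defined correlation")
    return (sxy * sxy) / (sxx * syy)
```

Here xs would hold the yearly obesity percentages and ys the yearly average reported breast volumes, one pair per year.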