{"id":178,"date":"2024-10-07T09:16:05","date_gmt":"2024-10-07T09:16:05","guid":{"rendered":"https:\/\/yom-tov.info\/blog\/?p=178"},"modified":"2024-10-07T09:16:05","modified_gmt":"2024-10-07T09:16:05","slug":"bias-in-studies-of-facial-recognition-bias","status":"publish","type":"post","link":"https:\/\/yom-tov.info\/blog\/2024\/10\/07\/bias-in-studies-of-facial-recognition-bias\/","title":{"rendered":"Bias in studies of facial recognition bias"},"content":{"rendered":"\n<p>As few years ago there was a lot of publicity (e.g., <a href=\"https:\/\/www.nytimes.com\/2019\/12\/19\/technology\/facial-recognition-bias.html\">NY Times<\/a>) to papers which showed that AI systems, especially commercial ones, were biased against certain minorities. As is often the case in science, one paper that gets lots of publicity is followed up by additional papers that investigate the same or similar phenomena. Now that there are quite a few papers of this sort, it\u2019s time to do a meta-analysis and see if there was substance to these claims.<\/p>\n\n\n\n<p>Meta-analyses are useful tools to collect all the experiments that have been done about a topic and summarize them. Experiments vary in their size, methodology, data, etc. Meta-analyses try to pool them together and return a single message that takes all that\u2019s known into account. As Ben Goldacre showed in his brilliant (and highly recommended) <a href=\"https:\/\/www.ted.com\/talks\/ben_goldacre_battling_bad_science\">talk<\/a>, there\u2019s a simple chart that can be used to conduct this summary. It\u2019s called a <a href=\"https:\/\/en.wikipedia.org\/wiki\/Funnel_plot\">Funnel Plot<\/a>. <\/p>\n\n\n\n<p>In this two-dimensional plot, the horizontal axis represents the effect size reported in the experiment and the vertical axis the size of the population in the experiment. Each dot on the plot is one experiment. 
Small experiments (close to the horizontal axis) will have lots of noise, and the reported effect size will be either larger or smaller than the true effect size. As one conducts experiments on more and more samples, the noise is reduced and the results concentrate around the average effect size. The result should look like an isosceles triangle, where the top of the triangle is the true effect size.<\/p>\n\n\n\n<p>What happens if there is publication bias? In that case, the triangle disappears: For example, it\u2019s been shown that with medical drug trials, some trials go unreported (think of those conducted by the pharmaceutical company or its collaborators). It\u2019s hard to hide the results of large trials, so the trials that go missing are small trials that showed a negative (or no) effect. In that case, the funnel plot will be missing some of the points on the left-hand side. In extreme cases the triangle becomes a right-angled triangle.<\/p>\n\n\n\n<p>I decided to do a small meta-analysis of studies that looked at bias in face recognition systems. These are systems that infer the gender of an individual from a photo, usually of their face. The claim was that these systems tend to make more mistakes when the image presented to them is of a female or of someone with darker skin color. To conduct the meta-analysis I used <a href=\"https:\/\/scholar.google.com\/\">Google Scholar<\/a> and <a href=\"https:\/\/elicit.org\/\">Elicit<\/a> to look for experiments that quantified bias in face recognition systems. I found six such papers, with a total of 98 experiments.<\/p>\n\n\n\n<p>I quantified the effect size (the horizontal axis) as the logarithm of the ratio between the error of the presumably disadvantaged class (females or dark-skinned individuals) and that of the presumably non-disadvantaged class (males or light-skinned individuals). 
The experiment size (the vertical axis) is the logarithm of the number of cases in each experiment.<\/p>\n\n\n\n<p>You can see the result in the graphs below. They differ in their inclusion or exclusion of the point on the right-hand side that showed a very large effect. Regardless of which figure you choose, two things can be seen: One is that there are indeed missing experiments on the left-hand side, meaning that there\u2019s publication bias in studies of bias\u2026 The second is that the effect size is around 26% (or 38% in the second graph). That is a relative effect size, meaning that, for example, if the error rate for detecting males is 1%, the error for females is 1.26%. Significant, but quite small.<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"875\" height=\"656\" src=\"https:\/\/yom-tov.info\/blog\/wp-content\/uploads\/2024\/10\/bias_010.jpg\" alt=\"\" class=\"wp-image-179\" srcset=\"https:\/\/yom-tov.info\/blog\/wp-content\/uploads\/2024\/10\/bias_010.jpg 875w, https:\/\/yom-tov.info\/blog\/wp-content\/uploads\/2024\/10\/bias_010-300x225.jpg 300w, https:\/\/yom-tov.info\/blog\/wp-content\/uploads\/2024\/10\/bias_010-768x576.jpg 768w\" sizes=\"auto, (max-width: 767px) 89vw, (max-width: 1000px) 54vw, (max-width: 1071px) 543px, 580px\" \/><figcaption class=\"wp-element-caption\">Funnel plot for all experiments. Blue dots are experiments where the bias of females relative to males is quantified and orange dots are those where the bias of dark skin relative to light skin is quantified. 
Effects larger than zero mean that the error of the disadvantaged class (e.g., females) is greater than that of the non-disadvantaged class (e.g., males).<\/figcaption><\/figure>\n\n\n\n<p><\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"875\" height=\"656\" src=\"https:\/\/yom-tov.info\/blog\/wp-content\/uploads\/2024\/10\/bias_014.jpg\" alt=\"\" class=\"wp-image-180\" srcset=\"https:\/\/yom-tov.info\/blog\/wp-content\/uploads\/2024\/10\/bias_014.jpg 875w, https:\/\/yom-tov.info\/blog\/wp-content\/uploads\/2024\/10\/bias_014-300x225.jpg 300w, https:\/\/yom-tov.info\/blog\/wp-content\/uploads\/2024\/10\/bias_014-768x576.jpg 768w\" sizes=\"auto, (max-width: 767px) 89vw, (max-width: 1000px) 54vw, (max-width: 1071px) 543px, 580px\" \/><figcaption class=\"wp-element-caption\">Funnel plot excluding the rightmost point. All other details are as in the previous graph.<\/figcaption><\/figure>\n\n\n\n<p>If my findings are correct, why are these studies missing? It\u2019s hard to know for certain. It could be that there\u2019s simply not enough work in this field. My guess is that people who found negative effects chose not to publish them or were unable to do so because publication venues prefer to show a specific message (<a href=\"https:\/\/math.uchicago.edu\/~shmuel\/Modeling\/Flawed%20Social%20Network%20Analysis.pdf\">here&#8217;s an interesting case in an unrelated field<\/a>).<\/p>\n\n\n\n<p>This is a very small study: it is based on only six papers (and 98 experiments), my designation of the \u201cdisadvantaged class\u201d is arbitrary (though it follows the papers), and very few datasets are analyzed in the papers. So take this with a grain of salt (do we need meta-analyses of meta-analyses?).<\/p>\n\n\n\n<p>However, assuming that there are missing experiments, my guess is that it&#8217;s by design. 
This is based on my anecdotal experience: Over the years I\u2019ve worked with people from many scientific disciplines. There\u2019s one scientific discipline where my experience has been a little strange: Ethical AI &#8212; the same field that&#8217;s interested in bias in face recognition systems. While I\u2019ve worked with several excellent scientists in that field, I also had three projects where ethical AI researchers came to me for help in looking into specific questions in their field, using my methods and datasets. I analyzed the data and did my best to see whether they supported the hypothesis. When it turned out they didn\u2019t, two of the researchers disappeared. They simply stopped responding. One told me outright that she wouldn\u2019t publish data that goes against her beliefs.<\/p>\n\n\n\n<p>What\u2019s the takeaway from all this? Perhaps only the trivial one: When you hear a strong scientific claim, it may be better to wait a while until more evidence is collected before making up one\u2019s mind about that claim.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>A few years ago, papers showing that AI systems, especially commercial ones, were biased against certain minorities received a lot of publicity (e.g., in the NY Times). As is often the case in science, one paper that gets lots of publicity is followed up by additional papers that investigate the same or similar phenomena. 
&hellip; <\/p>\n<p class=\"link-more\"><a href=\"https:\/\/yom-tov.info\/blog\/2024\/10\/07\/bias-in-studies-of-facial-recognition-bias\/\" class=\"more-link\">Continue reading<span class=\"screen-reader-text\"> &#8220;Bias in studies of facial recognition bias&#8221;<\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[1],"tags":[],"class_list":["post-178","post","type-post","status-publish","format-standard","hentry","category-uncategorized"],"_links":{"self":[{"href":"https:\/\/yom-tov.info\/blog\/wp-json\/wp\/v2\/posts\/178","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/yom-tov.info\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/yom-tov.info\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/yom-tov.info\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/yom-tov.info\/blog\/wp-json\/wp\/v2\/comments?post=178"}],"version-history":[{"count":0,"href":"https:\/\/yom-tov.info\/blog\/wp-json\/wp\/v2\/posts\/178\/revisions"}],"wp:attachment":[{"href":"https:\/\/yom-tov.info\/blog\/wp-json\/wp\/v2\/media?parent=178"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/yom-tov.info\/blog\/wp-json\/wp\/v2\/categories?post=178"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/yom-tov.info\/blog\/wp-json\/wp\/v2\/tags?post=178"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}