We often evaluate the success of medical treatments or social programs by how much of the population they help, but this can be misleading.
Like, suppose we're treating a disease that afflicts both people and cats, and among the 1 cat and 4 people we treat, the cat and 1 person recover and 3 people die.
And of the 4 cats and 1 person we don't treat, 3 of the cats recover while the person and 1 cat die.
In the real world, these numbers might be more like 300 and 100, or whatever, but we'll keep them small so they're easier to keep track of.
So, in our sample, 100% of treated cats survive while only 75% of untreated cats do, and 25% of treated humans survive while 0% of untreated humans do.
Which makes it seem like the treatment improves chances of recovery.
Except that if we aggregate the data, among all people and cats treated, only 40% survive, while among all people and cats left on their own, 60% survive.
Which makes it seem like the treatment reduces chances of recovery.
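To double-check those percentages, here's a minimal Python sketch that recomputes them from the counts in the example. The names `counts`, `survival_rate`, and `pooled_survival_rate` are just illustrative choices, not anything from the original.

```python
from fractions import Fraction

# (survived, total) counts taken from the cat-and-human example above.
counts = {
    ("cat", "treated"): (1, 1),
    ("cat", "untreated"): (3, 4),
    ("human", "treated"): (1, 4),
    ("human", "untreated"): (0, 1),
}

def survival_rate(species, arm):
    """Survival rate for one species within one arm (treated/untreated)."""
    survived, total = counts[(species, arm)]
    return Fraction(survived, total)

def pooled_survival_rate(arm):
    """Survival rate for one arm with cats and humans lumped together."""
    survived = sum(s for (_, a), (s, _) in counts.items() if a == arm)
    total = sum(t for (_, a), (_, t) in counts.items() if a == arm)
    return Fraction(survived, total)

# Broken down by species, the treated group does better each time...
for species in ("cat", "human"):
    for arm in ("treated", "untreated"):
        print(f"{species:5} {arm:9} {float(survival_rate(species, arm)):.0%}")

# ...but lumped together, the treated group does worse.
for arm in ("treated", "untreated"):
    print(f"all   {arm:9} {float(pooled_survival_rate(arm)):.0%}")
```

The comparison flips only because the treated group is mostly humans and the untreated group is mostly cats, so the pooled rates are averaging over very different mixes.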
So which is it?
This is an illustration of Simpson's paradox, a statistical paradox where it's possible to draw two opposite conclusions from the same data depending on how you divide things up.
Statistics alone cannot help us resolve it: we have to go outside statistics and understand the causality involved in the situation at hand.
For example, if we know that humans get the disease more seriously and are therefore more likely to be prescribed treatment, then it can make sense that fewer of the individuals who get treated survive, even if the treatment increases the chances of recovery, since the individuals who got treated were more likely to die in the first place.
On the other hand, if we know that humans, regardless of how sick they are, are more likely to get treated than cats because no one wants to pay for kitty healthcare, then the fact that 4 out of 5 humans died while only 1 in 5 cats died suggests that, indeed, the treatment may be a bad choice.
So if you're doing a controlled experiment, you need to make sure not to let anything causally related to the experiment influence how you apply your treatments, and if you have an uncontrolled experiment, you have to be able to take those outside biases into account.
As a more tangible example, Wisconsin has repeatedly had higher overall 8th grade standardized test scores than Texas, so you might think Wisconsin is doing a better job.
However, when the scores are broken down by race (which, via entrenched socioeconomic differences, is a major factor in standardized test scores), Texas students performed better than Wisconsin students on all fronts: black Texas students scored higher than black Wisconsin students, and likewise with Hispanic and white students.
The difference in the overall ranking arises because Wisconsin has proportionally far fewer black and Hispanic students and proportionally more white students than Texas.
So the takeaway should not be that Wisconsin has better education than Texas, just that it has (proportionally) more socioeconomically advantaged people.
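The mechanism here is just a weighted average, which a small Python sketch can make concrete. Every number below is invented purely for illustration (these are not real scores or demographics for Wisconsin or Texas), and `state_a`, `state_b`, and `overall_average` are hypothetical names.

```python
# HYPOTHETICAL numbers, made up for illustration only; they are NOT real
# test scores or real demographics for either state.
# Each entry: group -> (average score, percent of students in that group)
state_a = {"group_1": (270, 10), "group_2": (290, 90)}  # stands in for Wisconsin
state_b = {"group_1": (275, 50), "group_2": (295, 50)}  # stands in for Texas

def overall_average(state):
    """Overall score: group averages weighted by each group's share."""
    return sum(score * percent for score, percent in state.values()) / 100

# state_b scores higher within every single group...
for group in state_a:
    print(group, state_a[group][0], state_b[group][0])

# ...yet state_a has the higher overall average, because most of its
# students are in the higher-scoring group.
print(overall_average(state_a))  # 288.0
print(overall_average(state_b))  # 285.0
```

With these made-up inputs, the overall ranking reflects the group mix rather than how well either state does within any group.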
In some situations there's also a nice graphical way to picture Simpson's paradox: as two separate trends that each go one way, while the overall trend across both populations goes the other way.
Like, maybe more money makes people sadder, and more money makes cats sadder, but if cats are both much happier and richer than people to start with, the overall trend appears, incorrectly, to be that more money makes you happier.
In this case, being a cat makes you happier, but is also correlated with having more money.
And you can also misinterpret this graph to show that, overall, more money makes you a cat, which I think illustrates very well how easy it is to lie or reach incorrect conclusions by blindly using statistics without context!
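Here's a rough Python sketch of that graph, fitting a least-squares line to each group and then to everyone together. The data points are made up for illustration, and `slope`, `people`, and `cats` are hypothetical names; any data with rich, happy cats and poor, sad people would show the same thing.

```python
def slope(points):
    """Least-squares slope of happiness (y) against money (x)."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in points)
    var = sum((x - mean_x) ** 2 for x, _ in points)
    return cov / var

# MADE-UP (money, happiness) points, for illustration only.
people = [(1, 5), (2, 4), (3, 3)]  # poorer and sadder to start with
cats = [(7, 9), (8, 8), (9, 7)]    # richer and happier to start with

print(slope(people))         # negative: more money, sadder people
print(slope(cats))           # negative: more money, sadder cats
print(slope(people + cats))  # positive: the pooled trend points the other way
```

The pooled line mostly measures the gap between the two groups, not what money does within either one.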
Of course, this is not to say that statistics are always going to be paradoxical or confusing.
It's quite possible that everything will just make sense from the get-go. For instance, if people and cats both get sadder when you give them more money, and cats are both poorer and happier than people, then the overall trend is no longer paradoxical: more money = more sadness.
But it's important to be aware that paradoxes like Simpson's paradox are possible, and we often need more context to understand what a statistic actually means.