

I’ll restrict myself to the main finding that among blacks found guilty of killing whites, death penalty application was associated with stereotypical blackness ratings for defendant faces. Based on a high-low median split, 22 defendants were rated as stereotypically black, and the other 22 defendants were rated as not stereotypically black. For each defendant, there were also measurements on aggravating factors. Eberhardt and her coauthors claimed that death penalty rates significantly differed for the two groups defined by stereotypical blackness, even after control for aggravating factors.

1. In testing their main hypothesis, the authors report F(1,36)=4.11, p &lt; 0.05. Time to launch SPSS. Nope. If F(1,36)=4.11, then p exceeds 0.05.
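You can check this without SPSS. A minimal sketch, using only the Python standard library: an F(1, df2) variable is the square of a t(df2) variable, so the upper-tail p-value of the F-statistic equals the two-tailed t p-value at sqrt(F), which we can get by numerically integrating the t density.

```python
import math

def t_pdf(x, df):
    # Student-t density function
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1.0 + x * x / df) ** (-(df + 1) / 2)

def f_pvalue(F, df2, upper=60.0, n=20000):
    # P(F(1, df2) > F) = 2 * P(T(df2) > sqrt(F));
    # integrate the t tail on [sqrt(F), upper] with Simpson's rule.
    t = math.sqrt(F)
    h = (upper - t) / n
    s = t_pdf(t, df2) + t_pdf(upper, df2)
    for i in range(1, n):
        s += (4 if i % 2 else 2) * t_pdf(t + i * h, df2)
    return 2 * s * h / 3

p = f_pvalue(4.11, 36)
```

The result comes out just above 0.05 (roughly 0.0501), confirming the complaint: the critical F(1,36) value at the 0.05 level is about 4.113, and 4.11 falls short of it.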

2. The only innocent explanation for this discrepancy is that an F-statistic sitting just above the critical value of roughly 4.113 was rounded down to 4.11. In complicated studies where significance is just barely achieved, it can be fruitful to look for signs of data dredging. Here is one.

It is highly suspicious that the authors don’t report any unadjusted association between stereotypical blackness and the death penalty. That is, whether anything was significant before screwing with aggravating factor covariates. What were the raw death penalty rates in the two groups: the stereotypically black group, and the not stereotypically black group? I’d like to know, and would also be willing to bet the difference in rates wasn’t significant. If it were, my guess is that the unadjusted rates would appear somewhere in the paper.
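For the record, the unadjusted comparison I’m asking for is trivial to run. Here is a sketch with a stdlib-only Fisher exact test on a 2x2 table of death sentences by group; the counts plugged in at the bottom are invented for illustration, since the paper doesn’t report the real ones.

```python
from math import comb

def fisher_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]],
    summing the probabilities of all tables at least as extreme as the
    observed one under the hypergeometric null."""
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2
    def pr(x):
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)
    p_obs = pr(a)
    lo = max(0, col1 - row2)
    hi = min(col1, row1)
    return sum(pr(x) for x in range(lo, hi + 1) if pr(x) <= p_obs + 1e-12)

# Hypothetical counts: 12 of 22 stereotypically black defendants sentenced
# to death vs. 6 of 22 in the other group. NOT the study's actual numbers.
p = fisher_two_sided(12, 10, 6, 16)
```

With two groups of 22, differences this size hover around conventional significance thresholds, which is exactly why the missing raw rates matter.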

3. The presentation of results must have been learned at Stanford from the Claude Steele course on ANCOVA and misleading bar plots. See Paul Sackett’s critique if you don’t get the joke.

The caption for Eberhardt’s Figure 2 reads “Percentage of death sentences imposed in (a) cases involving White victims and (b) cases involving Black victims as a function of the perceived stereotypicality of Black defendants’ appearance.” Plotted in (a) are results given earlier in the paper: “In fact, 24.4% of those Black defendants who fell in the lower half of the stereotypicality distribution received a death sentence, whereas 57.5% of those Black defendants who fell in the upper half received a death sentence.”

Nowhere is it made clear that these are the adjusted numbers from the ANCOVA, not the raw unadjusted group rates I noted earlier were conspicuously absent. Even the stereotype threat researchers wouldn’t have the balls to publish such a plot, and the ambiguity has the potential to be very misleading. If “Looking Deathworthy” had made a larger media splash, many would have undoubtedly made the wrong interpretation.

If there was so much ambiguity, why am I so sure the reported numbers weren’t the raw rates? Because if you divide a whole number like the number of death sentences by a group size of 22, then even after rounding you couldn’t get 0.244 or 0.575.
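The arithmetic is easy to verify by brute force: enumerate every rate a whole-number count out of 22 can produce and check whether any of them rounds to the reported figures.

```python
# Can any whole-number count of death sentences out of a group of 22
# defendants round to the reported 24.4% or 57.5%? Enumerate and check.
group_size = 22
reported = {0.244, 0.575}
possible = {round(k / group_size, 3) for k in range(group_size + 1)}
overlap = possible & reported
```

The intersection is empty: the nearest achievable raw rates are 5/22 = 0.227 and 6/22 = 0.273 around 24.4%, and 12/22 = 0.545 and 13/22 = 0.591 around 57.5%. So the plotted numbers cannot be unadjusted group rates.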

4. Even more problematic: I can’t figure out how the ANCOVA was implemented. The authors described their method as follows, where the observational units in the regression are defendants:

“We computed an analysis of covariance (ANCOVA) using stereotypicality (low-high median split) as the independent variable, the percentage of death sentences imposed as the dependent variable, and six nonracial factors known to influence sentencing as covariates.”

What in the world is the “percentage of death sentences imposed” for a particular defendant? A defendant is either given the death penalty or not given the death penalty, meaning the percentage is either 100 or 0. The more traditional covariance adjustment method for such dichotomous response data would involve logistic regression. I’d be willing to make a second bet: that logistic regression would fail to show a significant effect for stereotypicality.
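For concreteness, here is a minimal sketch of the adjustment I’m proposing: logistic regression of a 0/1 death-penalty outcome on a stereotypicality indicator plus a covariate, fit by plain gradient ascent on the log-likelihood. Everything here is simulated with made-up coefficients; nothing is taken from the actual study.

```python
import math
import random

def sigmoid(z):
    # numerically stable logistic function
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

random.seed(0)
n = 44
stereo = [1] * 22 + [0] * 22                         # high/low median split
aggrav = [random.gauss(0.0, 1.0) for _ in range(n)]  # stand-in aggravating factor
# simulated 0/1 outcomes with a built-in stereotypicality effect
y = [1 if random.random() < sigmoid(-0.5 + 0.8 * s + 0.6 * a) else 0
     for s, a in zip(stereo, aggrav)]

# fit by gradient ascent on the averaged log-likelihood
beta = [0.0, 0.0, 0.0]  # intercept, stereotypicality, aggravating factor
for _ in range(2000):
    grad = [0.0, 0.0, 0.0]
    for s, a, yi in zip(stereo, aggrav, y):
        x = (1.0, float(s), a)
        resid = yi - sigmoid(sum(b * xj for b, xj in zip(beta, x)))
        for j in range(3):
            grad[j] += resid * x[j]
    beta = [b + 0.5 * g / n for b, g in zip(beta, grad)]
```

The point is that the outcome stays dichotomous throughout, and the stereotypicality coefficient comes with a sensible interpretation (a log odds ratio), unlike an ANCOVA run on a 0/100 “percentage” for each defendant.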

5. No matter how the adjustment actually went down, aren’t results after covariance adjustment stronger evidence of racism, and of Eberhardt’s point, than the unadjusted results, since skeptics like me might excuse the latter as due to confounding? Yes and no.

Unadjusted findings set a threshold for credibility: if the effect doesn’t show up unadjusted, it’s usually time to stop looking. After an unadjusted effect has been found, cross-tabulation and regression can be valuable for probing whether confounding can explain it away. However, a lot of assumptions are needed if regression adjustments are to be taken seriously rather than treated as merely exploratory, including linearity, constant coefficients, and exogenous errors.

Furthermore, it is much easier with regression than with raw rates to get results you want by messing with different variables or nontraditional and inappropriate models, making findings harder to trust, especially when other parts of the presentation are so deceptive.
