**I pick up on the first two things that jumped out at me in this paper. It’s just some very scribbly notes. No, I don’t care that it’s not a formal scientific rebuttal – the flaws are so blindingly obvious that even someone with no real education in stats, like me, can point them out.**

Overall sample is: 30,435 – 11,763 yes crime – 18,672 no crime

Mean composite score was 4.76, SD 2.18, and they say it was roughly normal. This is plotted roughly below:

- Let’s ignore that more education (esp. in elementary maths) will CAUSE higher IQ (‘practice effect’), and that children of lower socioeconomic status will strongly tend to be less educated AND will be far, far more likely to commit a crime due to exactly those socioeconomic factors. I digress.

**IQ Test-Retest is poor**

Apparently IQ has a test-retest correlation of about 0.8. That looks something like this, with test one on the horizontal axis and test two on the vertical. (I’m doing these with a fairly small Monte Carlo, which is why they’re not very smooth. That doesn’t detract from the point.)
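For anyone who wants to redo the scatter, here is a minimal sketch of that kind of Monte Carlo. It assumes the two scores are bivariate normal, and it uses the conventional IQ scale (mean 100, SD 15), which is my assumption and not something taken from the paper. The function names are made up for illustration.

```python
import math
import random


def simulate_test_retest(n, r, mean=100.0, sd=15.0, seed=0):
    """Draw n (test1, test2) pairs from a bivariate normal with correlation r.

    mean/sd default to the usual IQ scale; that is an assumption, not paper data.
    """
    rng = random.Random(seed)
    pairs = []
    for _ in range(n):
        z1 = rng.gauss(0.0, 1.0)
        # z2 shares a fraction r of z1; the rest is independent noise
        z2 = r * z1 + math.sqrt(1.0 - r * r) * rng.gauss(0.0, 1.0)
        pairs.append((mean + sd * z1, mean + sd * z2))
    return pairs


def pearson(pairs):
    """Sample Pearson correlation of a list of (x, y) pairs."""
    n = len(pairs)
    mx = sum(x for x, _ in pairs) / n
    my = sum(y for _, y in pairs) / n
    sxy = sum((x - mx) * (y - my) for x, y in pairs)
    sxx = sum((x - mx) ** 2 for x, _ in pairs)
    syy = sum((y - my) ** 2 for _, y in pairs)
    return sxy / math.sqrt(sxx * syy)


pairs = simulate_test_retest(n=20000, r=0.8)
print(round(pearson(pairs), 2))  # should land close to the requested 0.8
```

Plot the pairs as a scatter (test one horizontal, test two vertical) and you get the fuzzy diagonal cloud described above. A bigger `n` just makes it smoother.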

Applying this to the scoring they use in this paper: if you take their composite test and get the average, 4.76, the first time, what will you get the second time, assuming a test-retest correlation of 0.8 like IQ? Plot is below:

So if on the first test (horizontal) you got a score of 4.76, your second test score (vertical) will follow a roughly normal distribution with mean 4.76 and SD ~0.65. This is plotted below:

**In blue** is the population’s composite score distribution (mean 4.76, SD 2.18). **In green** is a single person’s score distribution on retest, assuming test-retest correlation of 0.8 (mean 4.76, SD ~0.65).
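As a cross-check on the green curve, here is the closed-form version, sketched under the assumption that the two test scores are bivariate normal with the same mean and SD. In that case the retest score given a first score $x$ has mean $\mu + r(x - \mu)$ and SD $\sigma\sqrt{1 - r^2}$. The function name is hypothetical.

```python
import math


def retest_given_first(first, r, pop_mean, pop_sd):
    """Mean and SD of the second score given the first.

    Assumes a bivariate normal with equal means and SDs on both sittings.
    """
    cond_mean = pop_mean + r * (first - pop_mean)
    cond_sd = pop_sd * math.sqrt(1.0 - r * r)
    return cond_mean, cond_sd


cond_mean, cond_sd = retest_given_first(4.76, r=0.8, pop_mean=4.76, pop_sd=2.18)
print(cond_mean, round(cond_sd, 2))
```

Under that assumption the formula gives an SD of about 1.31, which is wider than the ~0.65 I eyeballed from the small Monte Carlo. Either way the individual's spread is large compared with the differences the paper is talking about, and the textbook figure only makes it worse.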

So you can see that even something as simple as the test-retest correlation being as low as 0.8 has a dramatic effect. The amount of variation of an **individual’s** IQ score is huuuuuge relative to the effects described in the paper.

**Moving on… now for their shitty line plots.**

Overall sample is: 30,435 – 11,763 yes crime – 18,672 no crime.

What they don’t show you is the **noise**.

They say that just over 5000 people were rated a 5 on the composite score.

Then they plot lines of the means like this below, without ever showing you the underlying data. At no point do they fully describe the distribution, only the mean.

This is just like the one Taleb points out in his original Medium article:

This resulting correlation is **far, far more noise than signal.**

We know that ~50% of 1s had ‘any criminal behaviour’ and that ~35% of 5s did. There were 5000 5s, so 1750 committed crimes.

But that’s all we’re given. So all we know about the spread (i.e. the error bars) around the mean of ~0.6 crimes per person for 5s is that it could look like any of these:

But we have no idea which of those curves is closest to the real data. It might be that the std dev for 5s is really low. Great, in that case I would have a little more faith in this data. But they don’t tell us.
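To make the "we only get the mean" complaint concrete, here is a small sketch that builds two made-up crime-count distributions for the 5s. Both match the two numbers read off the paper above (~35% with any crime, mean ~0.6 crimes per person), but the shapes themselves are purely hypothetical and chosen by me for illustration, as are the function names.

```python
import math


def mean_and_sd(dist):
    """dist maps a crime count to its probability; returns (mean, SD)."""
    mean = sum(k * p for k, p in dist.items())
    var = sum(p * (k - mean) ** 2 for k, p in dist.items())
    return mean, math.sqrt(var)


def two_point_offenders(p_any, overall_mean, low, high):
    """Hypothetical shape: (1 - p_any) commit zero crimes; offenders commit
    either `low` or `high` crimes, weighted so the overall mean is matched."""
    offender_mean = overall_mean / p_any  # average crimes among offenders
    w_low = (high - offender_mean) / (high - low)
    return {0: 1.0 - p_any, low: p_any * w_low, high: p_any * (1.0 - w_low)}


# Same headline numbers, very different spread (both shapes are invented)
tight = two_point_offenders(0.35, 0.6, low=1, high=2)
heavy = two_point_offenders(0.35, 0.6, low=1, high=20)

for name, dist in (("tight", tight), ("heavy", heavy)):
    mean, sd = mean_and_sd(dist)
    print(name, round(mean, 2), round(sd, 2))
```

Both distributions would produce exactly the same point on a line plot of means, yet the second has a spread several times larger than the first. Without the underlying data there is no way to tell which world we are in.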

AND THEN, remember that we have the test-retest, so an **individual’s** total score looks like the red distribution below:

(No I can’t draw!)

So an individual will be some (BROAD, SD ~0.65) cut across the test score range, giving us some sort of noisy prediction of the number of crimes they’ll commit (but we don’t know how noisy, they don’t tell us).

So the data and plots presented to us are badly lacking. Without being able to see the actual data, we have no idea how much noise there is (like the Taleb plot above).

Meanwhile, the low test-retest means that any **individual** will end up all over the shop.

The paper cherry-picks like crazy and ignores things like test-retest that would add massive uncertainty to the means they present.

Calling it a day for now.