If it helps, I think a simplified version of what they're doing would be this:
Say a 5-minute video of yours has been watched by 20 people. It's compared with other videos of about 5 minutes in length.
In the first 30 seconds, 7 of your viewers either click away or skip forward to another point in the video. Given the size of your audience and the averages from other videos, the expected number is 4. More people left than expected, so your audience retention at that point is deemed below average.
Further into the video, at the 1:30 mark, 12 viewers are watching. Given the averages, YouTube would have predicted only 8 viewers at this point, so your audience retention there is above average.
The same comparison is made throughout your video to decide where each point on the graph is plotted.
Sorry if that's too simple an explanation. I'm still trying to figure it out myself, and I've found the best way to wrap my head around it is to write it out like a Maths question. I've made up the figures in the example above.
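For anyone who finds code easier to follow than words, here's the same idea as a rough Python sketch. It uses the made-up figures from my example, and the function name and the "above/below average" labels are my own invention. I don't know how YouTube actually calculates this, so treat it as a guess at the principle rather than their method.

```python
# Made-up figures from the example; YouTube's real calculation isn't known to me.
TOTAL_VIEWERS = 20

# Each checkpoint: (timestamp, viewers actually watching, viewers expected
# to be watching based on averages from videos of similar length).
checkpoints = [
    ("0:30", TOTAL_VIEWERS - 7, TOTAL_VIEWERS - 4),  # 7 left, 4 were expected to leave
    ("1:30", 12, 8),                                 # 12 watching, 8 expected
]


def relative_retention(actual, expected):
    """Hypothetical helper: label one point on the graph."""
    if actual > expected:
        return "above average"
    if actual < expected:
        return "below average"
    return "average"


# Work out where each point would sit relative to the average line.
results = {
    timestamp: relative_retention(actual, expected)
    for timestamp, actual, expected in checkpoints
}

for timestamp, label in results.items():
    print(timestamp, label)
```

Swapping in your own viewer counts at different timestamps would show which parts of a video sit above or below the line, assuming you had some idea of the expected figures.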
It seems like it would be a fun thing to experiment with if you think you know what needs to be changed for the results to improve. I must try this sometime!