Judging is Accurate but the Scoring System is Not

Wayne's World

Judging by the overwhelming response to my article on the half-point, this is an issue with which many people in dressage identify, and a change they would like to see implemented. National and international riders, trainers, judges, a few Official judges (“O” judges) and marketing communications experts have written to me in support of the idea.

Since the last article, I have had expert input from people in other fields, who have spent considerable time and effort calculating the theoretical accuracy of integer scoring versus half points and analysing the actual results of over 1,800 Grand Prix tests. I will incorporate their conclusions in the discussion below. But first, I think it is fair to give an airing to the very few but influential dissenters.

Dissenters

Several O judges had concerns and the more substantive were:

That it would just encourage the less experienced judges to give a “safe 6.5” rather than take the “risk” and go for a 7.
Even worse, it would stop the awarding of 9s and 4s.
To improve accuracy, we should instead try adding one or two more judges around the arena.
The better judges use a “big 6” or a “small 7” anyway. This is a sort of carry-forward system whereby a judge will give a 7 for a “small 7” and, on one of the next movements, give a 6 if it is also a “small 7”. The same idea applies to a “big 6”.
In Germany they use decimal point scoring for the freestyle and this results in lower marks.

My response

The O judges I have spoken to seem to fall into two camps: those who are not mathematical and feel that it would stop the higher marks being awarded, and those who are mathematical and see that this would bring greater accuracy to the sport.

Psychology

First, it is worth looking at the psychology of riders and judges – we have different motivations:

Senior judges want other judges to give riders a clear statement of what they believe the mark should be and to be brave enough to say “it is a choice between a 7 and an 8 and it really is more an 8 so I will give it!” I can understand this, and I believe they are right when they speak as judges training other judges.
Riders want judges to give a clear statement too, but we are not so concerned about them being brave and reaching for the higher mark. We just want to know what we have earned. If they could say it was a 6.7 (and if normally our mark was a 6.5) then we would be delighted, because we would know that we had made progress. It is too far from a 6 to a 7 and most judges, when in doubt, give the lower mark.

Detailed response

To look at these points in more detail:

The safe mark would become a 6.5. If the movement is a 6.5 then give a 6.5. If it is a 7, then give a 7. The idea is to be accurate, not generous or mean. It is a lot easier to take the “risk” and go from 6.5 to 7 than from 6 to 7.

But taking the point at face value, do less experienced judges take the risk and round up to a 7? I don’t really believe it – it is a lot easier and safer for a judge to give a 6 rather than a 7 when they are in doubt.
It would stop the awarding of 9s and 4s. This is really the same point as above: it is about accuracy, not risk. This is a sport, not a casino. If it is between an 8 and a 9 then give an 8.5. If it is nearer the 9, give a 9. A 9 is closer to an 8.5 than to an 8, so going for the 9 is a much less risky decision, if judges believe they have to take risks.
More judges around the arena will make the average more accurate. Well, actually it won’t. As I will explain below, the basic accuracy of our system is just under 1%. As I showed last time, when you get your score of 64%, it could have been anywhere between 59% and 69%, and on average it was probably somewhere between 63% and 65%. Having more judges using an inaccurate system does not make it more accurate.

What it will do is reduce the influence of one rogue judge who randomly gives marks that are too high or too low, distorting the placings. If a judge were consistent in awarding higher marks to all competitors then they would not have a positive or negative influence. It is when they are high for some competitors and low for others that they distort the test results.

Actually, our senior judges are very good and, within the accuracy afforded by the system, get it spot on most of the time. Rogue judges are not really our problem, so why fix this when we have a real problem in system accuracy?
The Big 6 or Little 7 concept. Many of the more experienced judges I spoke to felt that they already had a “big 6” or a “small 7” and that they would then adjust up or down on the next movement to compensate. This requires more explanation: when a judge sees something that is not quite a 7 but is better than a 6, they give a 7 but remember that it was a “small 7”. On the next movement that is a small 7 they then give a 6. This makes the average 6.5 for each movement. So what they are doing is mentally awarding half-points!

This sort of works some of the time for the more experienced judges and I see this a lot but it has downsides:
1. It reduces transparency. When I see a rider get an 8 for one pirouette and a 7 for the next and they both look about the same to me, I wonder what the judges are doing. Even though I know they are probably doing the Big 6 Little 7 thing, it does not help transparency in the sport. We are really not helping the amateur rider, or even the layman whom we are trying to attract to the sport, with this trick.
2. They should mark what they see. There is no reason why we should be restricted to integer scoring, and the judges would find it a lot easier to give half points. I really don’t see the downside.
3. Real life means that it is not always possible to carry the half point forward. Something happens in the test, there are a couple of problems, and the judge forgets that he or she “owes” a 7 for the next big 6. I am told by several judges that this happens quite often in many tests.
4. Can every judge do this? In reality, no. Most inexperienced judges are so worried about getting it wrong, or so busy working out what the mark should be, that they don’t really have time to think about carrying anything forward. At the top of the sport, the national and international judges can do this. That is why they are at the top of the sport. I believe it behoves them to give their less experienced colleagues the tools that they themselves take for granted.
Decimal scoring results in lower marks for the freestyle – the implication is that half-point scoring will result in lower scores. This point is also saying that we prefer to be inaccurate because it enables us to give higher scores. As a competitor, I want accuracy. I don’t want the gift of higher scores, I want to earn my points.

Anyway, I can see how this happens in the kur. The artistic mark consists of only 5 “movements” and accounts for 50% of the total mark, so each movement has a very large coefficient. With decimal scoring the result would be accurate: whatever you get is what you really should have had. With half-point scoring, each movement is rounded to the nearest 0.5, and each 0.5 accounts for at least 0.5% of the total test. So if the judges round up on all five artistic marks, it can add up to 2.5% to the final result compared with decimal scoring.
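
The kur arithmetic can be checked with a few lines of Python. This is only a rough sketch, assuming the five artistic marks are equally weighted, are each marked out of 10, and together make up exactly 50% of the total. The constant and function names are hypothetical, and real tests may weight the artistic marks differently.

```python
# Assumptions for illustration: five equally weighted artistic marks,
# each out of 10, together worth 50% of the final result.
ARTISTIC_SHARE = 0.50
N_ARTISTIC_MARKS = 5
MAX_MARK = 10.0


def percent_effect(mark_change, n_marks=1):
    """Change in the final percentage when n_marks artistic marks
    each move by mark_change points."""
    share_per_mark = ARTISTIC_SHARE / N_ARTISTIC_MARKS
    return 100 * share_per_mark * (mark_change / MAX_MARK) * n_marks


one_mark = percent_effect(0.5)
all_marks = percent_effect(0.5, N_ARTISTIC_MARKS)
print(f"Half a point on one artistic mark: {one_mark:.2f}% of the total")
print(f"Half a point on all five marks:    {all_marks:.2f}% of the total")
```

Under these assumptions half a point on a single artistic mark moves the final result by 0.5%, and half a point on all five moves it by 2.5%.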

Greater analytical analysis

Mathematical analysis

It is clear that such a change needs this kind of debate and investigation, after which the idea will be tried or rejected. To further support this proposal I asked a mathematician at Imperial College, London, to give his view on our current system. His view was extremely interesting.

He analysed our current scoring system and pointed out a possible perverse result of integer scoring. Take two horses: Horse A is a 6.4 mover for all 36 moves, and Horse B is a 5.7 mover for 34 moves and a 6.6 mover for the other two.

Clearly, Horse A is the better horse of the two and should score more highly. However, when you work out their scores for a Grand Prix test, the result is amazing:

Horse A scores 6.4 at all 36 moves and hence is awarded a mark of 6 for every move, giving an average of 6.0 or 60%. Horse B scores 5.7 at 34 moves and 6.6 at the two remaining moves, which happen to have a coefficient of two. The judges would award the mark 6 thirty-four times and the mark 7 twice. The average is 6.083, which amounts to a total mark of 60.83%.
So Horse B would win by 0.83%.
If the individual scores were recorded to decimal precision, then the correct scores would be Horse A 64% and Horse B 57.75%. This is the problem with integer scoring!
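
For readers who like to check the arithmetic, here is a minimal Python sketch of the two-horse comparison. It is only an illustration under stated assumptions: the function names are made up for this example, the test is taken literally as 34 moves at coefficient one plus two moves at coefficient two, and each mark is rounded to the nearest step with exact halves going up. Under that literal reading the percentages differ slightly from the figures quoted above, which depend on how the coefficients are counted, but the reversal of the ranking is the same.

```python
import math


def round_to_step(mark, step):
    """Round a mark to the nearest multiple of step (exact halves go up)."""
    return math.floor(mark / step + 0.5) * step


def percentage(moves, step=None):
    """Final percentage for a list of (true_mark, coefficient) pairs.

    step=None keeps the decimal marks, step=1.0 is integer scoring,
    step=0.5 is half-point scoring.
    """
    total = 0.0
    weight = 0
    for mark, coefficient in moves:
        awarded = mark if step is None else round_to_step(mark, step)
        total += awarded * coefficient
        weight += coefficient
    return 10 * total / weight


# Assumed layout: every move has coefficient 1 except Horse B's two 6.6 moves.
horse_a = [(6.4, 1)] * 36
horse_b = [(5.7, 1)] * 34 + [(6.6, 2)] * 2

for label, step in [("integer", 1.0), ("half-point", 0.5), ("decimal", None)]:
    a = percentage(horse_a, step)
    b = percentage(horse_b, step)
    print(f"{label:>10}: Horse A {a:.2f}%  Horse B {b:.2f}%")
```

With integer marks Horse B comes out ahead (about 61.05% against 60.00%). With half-point or decimal marks Horse A is comfortably in front, as it should be.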

FEI Dressage Committee

So, to summarise:

Several mathematicians, including theoreticians, statisticians and experimentalists, have shown that we can easily improve the accuracy of our system by using half points
Implicitly, experienced judges do it anyway
It will improve transparency and this is very important
It has a wide base of support from riders, trainers, judges and communications experts

There is now considerable information available that I believe warrants further investigation by the FEI Dressage Committee and/or the International Dressage Judges' Club.

For fun, please vote For or Against and we will publish the results!

Please let me know your view on this subject at wayne@eurodressage.com.