Bias in NFL Referees

While football fans have always been quick to see errors by the officiating crews, recently there has been a much more serious tone. Fans in Denver are upset about Bill Vinovich’s officiating in Bronco games, and Eagles fans are incensed with Pete Morelli. Eagles fans point to the horrible penalty disparity, while Broncos fans point to an anomalous win/loss ratio. The fan bases feel betrayed by the officials.

The concerns have generated some responses from higher up. Mike Florio addressed conspiracy theories by superciliously dismissing them, the NFLRA issued a statement to refute them, and Nick wrote an excellent post summarizing the main arguments. Unfortunately, the arguments tend to be either appeals to authority or arguments from incredulity, both of which are no more rigorous than the uncritical denunciation of the officials. As a scientist, I wanted to get past the emotion and look at the actual data to see if there were discernible trends that would point to some sort of bias.

The reason we can’t just take Florio’s or the NFLRA’s word on there being no grand conspiracy is that a conspiracy by nature is secret. If the conspirators are even moderately good, Florio would have no clue it was going on, any more than Congress had a clue that Oliver North was trading guns or Zuckerberg had a clue about the Russian ads. And of course, the NFLRA is part of the conspiracy. Adding fuel to the conspiracy theory is the shroud of secrecy that the referees pull around themselves. Coaches and players are not allowed to criticize officials, and the internal evaluation is not transparent. The main reason for the secrecy is to maintain the aura of infallibility that a sports official must have in order to maintain their authority on the field, but that secrecy makes it easier for angry fans to focus their wrath on a questionable call.

A few years ago when the rumors about Vinovich dressing in ~~drag~~ Chargers apparel first appeared, I took the opportunity to look for apparent bias. With limited data, I found that there was a statistically significant difference between the Broncos’ win-loss records when Vinovich was refereeing versus other referees. This time I decided to cast a wider net at the league-wide conspiracy and look at all teams and a bigger selection of referees.

I extracted the win-loss records for all teams from Pro Football Reference. I also discovered PFR has a list of every official for every game. Using this list, I was able to derive the win-loss record for every team when a particular official was the referee. I then computed the differential between the team’s overall winning percentage and the winning percentage with a particular referee. For the referees, I chose seven officials with service as referees between 2012 and present: Bill Vinovich, Ed Hochuli, Walt Anderson, Pete Morelli, Tony Corrente, Jeff Triplette, and Walt Anderson.
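For anyone who wants to repeat the exercise, here is a minimal sketch of that differential calculation. The game records below are made up for illustration and are not the PFR data; the tuple layout and the function name `win_pct_differentials` are my own assumptions, not anything PFR provides.

```python
from collections import defaultdict

# Hypothetical records: (team, referee, team_won). NOT real PFR data.
games = [
    ("DEN", "Vinovich", False),
    ("DEN", "Vinovich", True),
    ("DEN", "Hochuli", True),
    ("DEN", "Hochuli", True),
    ("SD", "Vinovich", True),
    ("SD", "Hochuli", False),
]

def win_pct_differentials(games):
    """Return {(team, referee): win pct with referee - overall win pct}."""
    overall = defaultdict(lambda: [0, 0])   # team -> [wins, games played]
    by_ref = defaultdict(lambda: [0, 0])    # (team, referee) -> [wins, games played]
    for team, referee, won in games:
        # Update both the team's overall tally and its tally with this referee.
        for tally in (overall[team], by_ref[(team, referee)]):
            tally[0] += int(won)
            tally[1] += 1
    return {
        (team, ref): wins / played - overall[team][0] / overall[team][1]
        for (team, ref), (wins, played) in by_ref.items()
    }

diffs = win_pct_differentials(games)
for key in sorted(diffs):
    print(key, round(diffs[key], 2))
```

A positive number means the team does better with that referee than it does overall, and a negative number means it does worse.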

The Numbers

The graph below shows the normalized deviation for all teams with a particular referee. Note that the chart is a bit misleading since I did not set a minimum number of games, so a single game with a referee can have an outsized result, but in general most teams had three to seven games with any given referee. Each line represents the difference between the team’s win percentage with a given referee and its overall win percentage. For example, the Patriots have won 100% of their games called by Jeff Triplette, but their 77% overall winning percentage brings their line to 23%. Looking at it another way, at 77% wins, we would expect the Patriots to win 4 of the 5 games, so winning 5 of 5 is probably random noise. Based upon the numbers as a whole, our cut-off for significant, non-random deviation is around 40%.
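The Patriots example works out as follows. This is just the arithmetic from the paragraph above written as a sketch; the helper names are mine, and the last line assumes each game is an independent 77% proposition, which is a simplification.

```python
def deviation(wins, games, overall_pct):
    """Win rate with the referee minus the team's overall win rate."""
    return wins / games - overall_pct

def expected_wins(games, overall_pct):
    """Wins we would expect with this referee at the team's overall rate."""
    return games * overall_pct

patriots_dev = deviation(5, 5, 0.77)          # about 0.23, the height of the line
patriots_expected = expected_wins(5, 0.77)    # about 3.85, so roughly 4 of 5

# Assumption: games are independent and each is won with probability 0.77.
patriots_sweep = 0.77 ** 5                    # chance of going 5 for 5, about 0.27

print(round(patriots_dev, 2), round(patriots_expected, 2), round(patriots_sweep, 2))
```

Under that assumption a 77% team sweeps five games better than one time in four, which is why a 23% line is nothing to get excited about.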

Looking at the overall chart, we see a random distribution above and below the line across all referees. This is as it should be, since one team wins and one team loses. We would also hope to see all the lines as close to zero as possible, and certainly none above 40%. Unfortunately, some lines do cross 40%, so we need to dig a bit deeper.

Looking at specific referees, it is clear that Broncos fans have reason to hate it when Bill Vinovich referees their game. In five seasons, he has called seven Broncos games, of which the Broncos have won exactly one. For a team with an overall winning percentage of 73% in the past five years (second only to New England’s 77%), a 14% winning rate is pretty noticeable, and the 59% deviation between Vinovich and the Broncos is the highest of all combinations. There are other teams on the Vinovich-hate-train as well: the poor Rams are 0-5 when Vinovich is around, while the Dolphins are 1-5. These teams significantly underperform around Vinovich.
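To put a rough number on how unusual 1-6 is for a 73% team, here is a sketch using a simple binomial model. The model is my assumption: it treats every game as an independent coin flip weighted at the Broncos’ overall win rate, which ignores opponent strength, home field, and everything else.

```python
from math import comb

def prob_at_most(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p): chance of k or fewer wins in n games."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

# Broncos: 7 games with Vinovich, 1 win, 73% overall win rate.
broncos_tail = prob_at_most(1, 7, 0.73)
print(f"{broncos_tail:.4f}")   # roughly 0.002
```

Under those assumptions, a 73% team wins one or fewer of seven games about 0.2% of the time.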

On the other hand, there are teams that are happy to see Vinovich. The Lions, Steelers, and Buccaneers have all performed significantly better than expected. The Steelers always win with Vinovich, while the Buccaneers are 5-1 for an 83% win rate compared to their normal 35% win rate.

Zooming in on the graph shows the deviation from expected for teams when Vinovich is referee sorted from lowest to highest. So there are three teams that are worse with Vinovich (the 40% win rate Bears are not allowed to have any angst because they lost their only game), and two teams that benefit from Vinovich. But, the Bucs? Really? Vinovich should be helping the Chargers, right? Well, it turns out that the Chargers are 3-3 with Vinovich, only slightly better than their paltry 42% win rate. And the Chiefs and Raiders have benefited more than the Chargers, so I’d be tempted to put it down to poor Vinovich suffering from chrysophobia but the Bucs wear orange also.

Are there any referees who are friendly to the Broncos? Ed Hochuli is probably the friendliest, but the deviation in favor of the Broncos doesn’t hit significance. In fact we can see Hochuli doesn’t like the Falcons, but he does like the Titans and the Chargers. Chargers fans need to get over that blown fumble call.

The other referees also have similar patterns, and all except Morelli show a significant bias for or against a few teams.

So we have found some individual bias, but individual referees seem to like different teams. If there were systematic bias at a league level against teams, we would see the patterns emerge across all the referees. So by sorting the teams by Vinovich’s biases, we should be able to see a pattern in the other referees’ bars. In fact we don’t. There’s potentially some minor patterning with the Redskins, Jaguars, and Browns but not outside the bounds of their crappy records.
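I did this comparison by eye, but one way to make it less subjective would be to correlate two referees’ per-team deviations. The sketch below is not what I actually ran; the deviation values are hypothetical, and the Pearson correlation is simply one reasonable choice of measure.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation between two equal-length lists of deviations."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Hypothetical per-team deviations, listed in the same team order for both referees.
ref_a = [-0.50, -0.30, 0.10, 0.25, 0.45]
ref_b = [0.05, -0.10, 0.20, -0.15, 0.00]

r = pearson(ref_a, ref_b)
print(round(r, 2))
```

A value near +1 across many referee pairs would suggest the same teams are being helped or hurt by everyone, which is what a league-wide scheme would look like. Values scattered around zero are what you would expect if each referee’s quirks are his own.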

The last gasp is to turn the graph sideways and look for patterns by team. Again, we find a pretty random distribution.

So now we must admit that there is no league-wide desire to skew the results. But aside from wanting to prove Florio wrong, there was never any serious thought that a league-wide conspiracy could exist. There is no chance that the 32 owners could ever agree on such a thing, and someone would have blown the whistle. Probably Spanos would have smuggled the meeting notes out in his underwear.

But wait, don’t throw away that tinfoil hat quite yet! We still have some serious allegations of bias on the part of certain referees for and against certain teams, and we have numerical evidence. How does this compete with the arguments from the NFLRA?

Making Bias Work for You

The NFLRA makes two points. First is the claim that end-of-game penalty statistics don’t really represent bias. This is more or less true, which is why I steered clear of including them. Occasionally there will be a huge penalty disparity between teams, but to determine if it was on purpose would require film study. The second point the NFLRA made is a lot less reliable. Switching crews around falls into the same bucket as the seven-man crew argument in suggesting there is no way that a referee could influence the game on his own. This is in fact not true.

To understand how the referee could influence the game and develop a pattern of bias without being blatant, let’s review the positions and responsibilities of each official:

Referee – Positioned behind the offense, watches QB and tackles, announces calls, adjudicates disputes.
Umpire – Positioned behind the offense, watches interior line
Linesman (2) – Positioned at line of scrimmage, watches for false starts and off-sides, marks forward progress for runs, manages chains
Field judge (3) – Positioned in defensive backfield, watches for pass interference, marks forward progress for passes.

Linesmen and field judges all have overlapping responsibilities and have at least two sets of eyes on the ball. A pass interference will quite often draw multiple flags. Similarly, on a false start the linesmen are both running in to stop the play.

Referees and umpires have slightly different responsibilities and do not overlap as much. The umpire sets up closer to the line of scrimmage and is watching the linemen for holding. The referee is further back and follows the runner on run plays or is watching the tackles and QB. Basically, on passing plays the umpire is watching the front of the pocket while the referee is watching the back of the pocket looking for holding.

Holding is arguably the most disastrous penalty for the offense. Not only do you lose the gain from the previous play, but the ten-yard assessment doubles the number of yards you must gain for a first down. The probability of converting a first down falls as the distance grows. Calling (or not calling) holding can make a huge difference in the game.

When any other official throws a flag, there is a meeting between the responsible officials and the referee. If a field judge throws a flag downfield, the field judge and the side judge meet with the referee to explain what they saw, then the referee makes the call. However, if the referee throws a flag there usually is no meeting because there is typically no overlapping responsibility. Because the referee has sole responsibility for the back of the pocket, if he does not like your defensive end he can let the offensive tackle get away with a bit more grabbing. Alternatively, if he dislikes the QB he can be a bit lax on how physical the defense can get before calling roughing the passer.

The players of course pick up on this. They push the envelope early on to see what will and won’t draw a flag. If the referee doesn’t throw a flag on the tackle, he will continue to increase his holding until such a time as he can control the rusher or the flags come out. When the flag comes out, the tackle decreases his grabbing a couple shades for a few plays and then keeps on going.

We also occasionally see the referee expanding his zone of influence. If you see the flag come in from the right of the QB to call holding on the center, the referee is expanding his zone. When that happens, the referee will usually meet with the umpire to exchange information. But remember the referee is the crew chief, and he both holds the final say and grades the rest of the crew. And this is where the psychology of the situation comes in.

Becoming an official for a major-league sport requires years of dedication and work, probably even more than becoming a player. NFL officials have spent years running up and down the field after high-school and college players, gone to rules clinics and training sessions on their own dime, and in general worked very hard to get where they are.

At the beginning of the league year, they put a crew together and start training. You work with these guys for months. The crew chief gets you working as a team, coaching and mentoring you. Now, a few weeks into the season you see your crew chief make a few bad calls. You trust this guy, and figure he’s just having a bad day. You don’t say anything, and in the review meeting the crew chief notes where he made some bad calls. The next Sunday he’s back to normal and you relax, knowing that your bond with the rest of the crew is intact and you won’t need to risk the position you worked hard for. So the very thing that the NFLRA is touting as a positive is also a negative. While shuffling crews each season removes the chances for collusion, it also reduces the chances for someone to observe the bad behavior from inside. A crew will only see any given team once or twice a year, so there is likely no way to pick up on a pattern from inside the crew.

Bias patterns should be picked up from the grading of the officials. But grading is done in secret and rarely made public. Even when it is made public, there is relatively little that is negative: Jeff Triplette has made major gaffes over the years, yet he still has his six-figure part-time job. That’s why the NFLRA response rings hollow. Like any labor association, the NFLRA is geared toward protecting members rather than protecting integrity.

So…

It is pretty clear there is no league-wide conspiracy at work here. It makes little sense for the league to mandate outcomes. There is a lot of money flowing into the NFL coffers without needing to fudge the results. Plus, in this case I will indulge in an argument from incredulity since I cannot imagine any way the owners would ever agree to such a thing, or would be able to keep it quiet.

However, individual bias is clearly shown by the numbers. While possible, it is statistically improbable that Denver simply has a bad football day with Vinovich by coincidence. It is much more likely that Vinovich is either consciously or subconsciously influencing the outcome from his position of power.

The obvious question is why? What motivation would he have to consciously screw up a game for the Broncos? I can offer possibilities ranging from point-shaving (a lot of money gets lost when the heavy favorite loses) to a straight-up dislike for the Broncos. The complexity of human behavior defies an easy external answer from me. All I can say is the evidence exists, and until the NFLRA and the NFL admit there might be a problem with bias or show they have bias under control, the fans will continue to speculate.