Toward a uniform (quantitative) worldwide rating system: A plea to the community to develop standards
At present, player ability ratings are highly subjective and highly variable. Each individual creator selects their preferred values while denigrating everyone else's ratings. These ratings are heavily colored by the creator's own biases and preferences. But the process of assigning a value to a player is, and should be, a statistical problem. I want to outline a possible statistical approach, define some of the problems with this approach, clarify methods for evaluating the effectiveness of a rating system, and encourage disagreement about specific facets of the approach. Much of my professional training is in applied statistical techniques for evaluating performance and testing hypotheses about behavioral health. I hope to apply some of this in the current endeavor.
That is not an argument from authority, however. If you have specific knowledge or even a hunch that there may be a problem with a given decision or approach, voice it. Even if we ultimately disagree, it is better to explore alternatives than to assume they are incorrect at the outset. I'm not a mathematician. If you spot something wrong with the math, point it out and propose a fix.
What would a good rating system do?
Do not expect a good approach to accurately identify all players, nor indeed all attributes for a specific player. A statistical approach is a heuristic. It is not gospel. That means that you must expect good statistical approaches to be wrong in isolated situations. They may even be wrong a lot. The point is for the system to be right more often than it is wrong, and to be close enough to believable values for most players that fixing the remaining values doesn't take the community much time.
A good approach will develop a guide that roughly works for most players in the world using as little information as possible (Occam's razor). The values it derives should be mostly believable within a margin of error. If we're talking about a difference of less than about 5 Overall points, I think we're splitting hairs.
A good approach needs to be pliable. That means that others can make subtle changes and test the impact of those changes quantitatively and subjectively.
A good approach should mirror reality to an extent. But, do not be fooled into believing that it will be perfect.
A good approach will prioritize data that is available for most of the world's teams, leagues, and national teams rather than opting for data available only for the elite clubs.
What is a valid way to critique a good rating system?
It is not valid to pick a single player or a handful of players and ridicule the system for not accurately predicting their breakout performance. For instance, the system I will propose would have undervalued a player like Joevin Jones by quite a lot. It also would value someone like Harry Kane very little in a relative sense. These individuals are exceptions. We talk about them, and they stand out in our minds, because they are so unusual. A good system may accurately predict most players in a given league, but it may miss a few hidden gems.
Most of the professional systems have this problem. It is the purview of many scouting departments for professional clubs and independent firms to predict gems in an otherwise average pool. We can't expect to be better at this than they are. And, they miss often.
A statistical rating system will have parameters and sets of decisions. A parameter is like a variable (often entered into an equation) whose value one hopes to either set or estimate. If a parameter is entered into the system, it is valid to critique its inclusion or estimated value. It is also valid to critique the behavior of that parameter. For instance, say that we estimate a population mean to be x. If you find that when we apply x to a team in the game they tend to dramatically and consistently over-perform, we need to reevaluate that parameter estimate. Perhaps our estimate for x is too high. Perhaps that parameter shouldn't be included in the way we include it. Perhaps it should be moderated (a multiplicative relationship) by another parameter.
Decisions should also be critiqued. Most statistical systems have arbitrary or biased decisions built into them. This one is no different. An eventual goal is to reduce those or at least reduce the impact of them. If a decision is made and is found to have little basis or is dramatically and consistently impacting the behavior of certain teams and/or players, that decision should be criticized and alternatives should be explored.
The take-home message for critiquing a statistical system is that your gut matters, but evidence matters more. Instead of stating your opinion that a given team, league or player is overrated or underrated, run a simulation in-game and test whether it supports your position.
Also remember that this is a worldwide system, not an elite Western European system. I expect it to be off when it comes to some of the elite leagues and players. But these have been extensively scouted by many football experts. Their existing values are at least fairly representative of most of the community's beliefs about a player's value. Extensive scouting can trump mathematical heuristics. This is a system for the teams, leagues and players that EA didn't include (though it may work well for some of the ones that it did include).
A baseline proposal
This is a system I have been developing and feel comfortable releasing with the caveat that it is not perfect, is not finished, and is deserving of specific critiques and readjustments. When critiquing, be specific about your concerns, provide evidence, and discuss mathematical alternatives.
The overall approach
In previous iterations I attempted to base league ratings on performance in intercontinental club competitions. A good system needs to find some common way of measuring performance relative to all other countries. The problem, however, is that very few matches are played in these competitions, and fewer matches means greater uncertainty. Under this approach, leagues and teams that are well thought of globally, and that consistently produce players who perform well at the international level, were rated so low that massive disagreement arose when trying to apply these players to national teams. One would expect the best domestic players to fill out a national team roster, with the best players picked up by stronger foreign leagues usually making up most of the starters and best players on that roster. The wide disagreement was just unacceptable.
What uniform measure can be used as a global basis for comparison?
Instead of basing comparisons on international cup competitions, I base them on the Elo rankings of national teams. Every national team is rated by the Elo system. It is better accepted than the FIFA system. I doubt it is as good as the FiveThirtyEight system, but they don't publish their full list. That does not mean that it is perfect or even ideal. It's just the best we have.
Levels of analysis
The process I use drills down to a player value for a specific player in a club from a league in a country with a national team of a given Elo rank. That means the levels of analysis proceed as follows:
National Team Best Player (assuming no significant outliers or foreign players). If foreign players are in the national team, their value is either estimated by that league's parameter estimates or by EA's scouts/process.
League best player (assuming no significant outliers)
Club team best player (assuming no significant outliers)
Individual overall value estimate (assuming he/she is not an outlier).
National Team Best Player
When estimating the national team best player, the following parameters are used:
Let y-hat denote the outcome value for the best player on the national team (assuming they aren't a superstar who has been independently professionally scouted and shown to be an outlier, e.g., Messi, Ronaldo).
x = the nation's Elo ranking
a = coefficient of the first polynomial term (x^3)
f = coefficient of the second polynomial term (x^2)
g = coefficient of the third polynomial term (x)
b = polynomial y-intercept
I use a polynomial to describe national team best player abilities. As with most equations, it is meant to describe most players, not outliers. When a player is already in the FIFA game, I defer to EA, trusting that they scouted the player or have a larger, more elaborate system. I could be wrong about that.
The parameter values in this equation can and should be tested and critiqued (but you should suggest an alternative value or equation instead of only criticizing).
y-hat = (a*x^3)-(f*x^2)-(g*x)+b
y-hat = (0.000005*x^3)-(0.0011*x^2)-(0.3343*x)+83
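As a quick way to try the equation above, here is a minimal Python sketch. The function name national_best_player and the sample ranks are my own choices for illustration; the coefficients are the ones given above, and x is read as the nation's position in the Elo ranking (1 = top), which is an assumption based on the scale of the coefficients.

```python
def national_best_player(elo_rank, a=0.000005, f=0.0011, g=0.3343, b=83.0):
    """Estimate y-hat, the Overall of a nation's best non-outlier player.

    elo_rank is assumed to be the nation's position in the Elo ranking
    (1 = top ranked), not its Elo point total.
    """
    x = elo_rank
    return (a * x**3) - (f * x**2) - (g * x) + b


if __name__ == "__main__":
    # Sample ranks chosen only for illustration.
    for rank in (1, 50, 100, 150):
        print(rank, round(national_best_player(rank), 2))
```

With these coefficients, a nation ranked 100th gets a best player of roughly 43.6 Overall. Changing a, f, g or b in the call is an easy way to test an alternative curve before proposing it.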
League best player
Now, let's assume that the best player in the domestic league is also the best domestic player on the national team (not a big leap). Again, ignoring foreign players, let the value obtained above for y-hat also represent the best player in a domestic league.
y-hat = the best player in a league = international best domestic player
Let us also suggest that soccer talent within a country can be understood via a normal distribution. Apply this normal distribution to the entire population, not just professionals or professionals in the first division. Now, let us say that the best player from a given league is approximately q standard deviations above the mean.
Use q to represent the exact number of standard deviations above the mean a best player is.
Therefore y-hat should also equal μ + q*σ where:
σ = standard deviation of the population of soccer players in a given country
q = the number of standard deviations above the mean for the best player in a league
μ = population mean. Can be obtained by taking (y-hat)-q*σ
In my system I assume (all assumptions can be challenged):
σ = 5
q = 3
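To make the relationship between y-hat, q, σ and μ concrete, here is a small Python sketch. The function name population_mean is hypothetical, and the example y-hat of 43.57 is simply what the polynomial gives for a nation ranked 100th; σ = 5 and q = 3 are the assumptions stated above.

```python
SIGMA = 5.0  # assumed population standard deviation (from the text)
Q = 3.0      # assumed SDs above the mean for a league's best player (from the text)


def population_mean(y_hat, q=Q, sigma=SIGMA):
    """Recover mu from the best player's value: mu = y-hat - q*sigma."""
    return y_hat - q * sigma


if __name__ == "__main__":
    y_hat = 43.57  # example: best player for a nation ranked 100th
    print(population_mean(y_hat))
```

For that example the population mean comes out at about 28.6. Because μ depends directly on q and σ, anyone challenging those two assumptions can see the effect immediately by passing different values.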
Estimating best players on other teams
For a given league select the largest number of points available in the league table for a specific evaluation period. Call this t. Remember that the fewer the number of observations, the more uncertain your estimates will be. If the team only played 6 matches, there is a high degree of uncertainty about their relative ability. If they have played 46 matches, you can better estimate their relative ability.
t = highest number of points attained in league table
u = a specific team's number of points attained in league table during the same evaluation period.
v-hat = a given team's best player's z-score
v-hat = q-((t-u)*.01)
I have played with alternative calculations for v-hat such as using a log of the (t-u) term, but I am concerned that the drop off in talent is far too steep from the best team to the next few. It gives a huge advantage for the team at the top of the table. You can see an example of what I mean in the TT Pro League and Jamaican Red Stripe Leagues that I released (v0.1).
EDIT: The calculation for v-hat is proving especially difficult. When using a logarithmic relationship with (t-u), there are very few top-notch players in the league. Maybe that is okay, but it felt like there were perhaps too few (subjective). Using the calculation above (the one included in the released files), I think there is almost no separation between the best teams in the league and the worst. The old calculation was:
v-hat = 3-(log((t+1)-u))
I think I may revert to that calculation.
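To compare the two candidate calculations for v-hat side by side, here is a rough Python sketch. The league table is invented purely for illustration, the function names are hypothetical, and the base of the logarithm is an assumption: I use base 10, which is what a bare LOG() gives in most spreadsheets, but the formula above does not say.

```python
import math

Q = 3.0  # assumed SDs above the mean for the league's best player


def v_hat_linear(t, u, q=Q, step=0.01):
    """Linear version: v-hat = q - ((t - u) * .01)."""
    return q - ((t - u) * step)


def v_hat_log(t, u, q=Q):
    """Older version: v-hat = q - log((t + 1) - u). Base 10 is an assumption."""
    return q - math.log10((t + 1) - u)


if __name__ == "__main__":
    # Hypothetical points totals, not taken from any real league.
    table = {"Leaders": 60, "Second": 57, "Mid-table": 40, "Bottom": 15}
    t = max(table.values())
    for club, u in table.items():
        print(club, round(v_hat_linear(t, u), 2), round(v_hat_log(t, u), 2))
```

On this made-up table the linear version separates the top and bottom clubs by less than half a standard deviation, while the log version drops the second-placed club about 0.6 standard deviations below the leaders for a 3-point gap. That is the trade-off described above, and other functional forms can be dropped in the same way for comparison.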
Getting an individual player's rating
The final step is to obtain a value for an individual player. To do this, put the players in order from best to worst (at least as far as you can tell based on the data you have available). You may not know exactly which player is 7th versus 8th, for example, but you should be able to get enough information to make an educated guess.
r = player rank in squad.
z = z-score for player in population
o-hat = estimated Overall value for the player
z = (v-hat)-(log(r))
I tried some other formulas for z, but each had fundamental flaws. One thing to avoid here is using another arbitrarily selected constant. We want to minimize how many constants we're adding.
o-hat = z*σ+μ
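Putting the last two formulas together, here is a short Python sketch of the per-player step. The function name player_overall and the example inputs (a club with v-hat = 2.8 in a country with μ = 28.57) are assumptions for illustration, and log is again taken as base 10, which the formula does not specify.

```python
import math


def player_overall(rank_in_squad, v_hat, mu, sigma=5.0):
    """Estimate o-hat for the player ranked r in his or her squad.

    z = v-hat - log(r), then o-hat = z*sigma + mu.
    Base-10 log is an assumption; sigma = 5 is the value assumed in the text.
    """
    z = v_hat - math.log10(rank_in_squad)
    return z * sigma + mu


if __name__ == "__main__":
    mu = 28.57   # example population mean
    v_hat = 2.8  # example z-score of the club's best player
    for r in (1, 2, 5, 11, 18):
        print(r, round(player_overall(r, v_hat, mu), 1))
```

One useful sanity check: for the best player (r = 1) on the best team (v-hat = q), the result reduces to μ + q*σ, which is the national best player value y-hat that the process started from.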
So there is a preliminary system for the community to play with. It is statistical. It is uniform. And I have included several specific points where the system can be modified and debated. I encourage using simulations in game to drive your decisions about modifications to different parameters. Feel free to discuss, but make sure this is a positive experience for all.
Goals moving forward should be fewer arbitrarily selected values, more calculated values, and fewer predictors if possible. Also, I haven't yet figured out a quantitative way to drop down to second division teams. I'm not sure if they should simply start at a z-score of something like 10% above the lowest in the first division table. This is another good point for the community to discuss.
Do not try to chase outliers. It is a waste of our time. No one has been effective at identifying extreme outliers.
Remember, nothing is perfect. We're not aiming for perfect. We're aiming for a decent standardized heuristic. Once this system spits out values for every player on the planet, you can change the 5-10 values you want to change by hand.
Here is a worksheet which you can use to calculate one player (first sheet) or an array of players (second sheet) where you can get some estimated values and begin to play with the system.
Here is a downloadable version which you can use to modify the formulas, change certain parameter values and complete offline calculations: