Let's Level the Replacement Level Playing Field


According to Warren Buffett, the record for mathematical lunacy in the name of simplification has been held by the Indiana House of Representatives for more than one hundred years, a reign threatened only once, by a misguided 2004 committee bill regulating stock options in the United States Congress.  In 1897, in an attempt to make life easier for its citizens, the Indiana House passed a bill which decreed that pi, or π, would be equal to 3.2 as opposed to the infinitely extending, and presumably difficult to remember, 3.14159 . . . .

It’s quite possible that by wading into the debate on the various sabermetric calculations of Wins Above Replacement, or WAR, I will unseat the Indiana House of Representatives for that distinction and a heaping dose of derision will be my reward for this piece. 

If so, Sam Miller is to blame.

Sam Miller formerly covered the Los Angeles Angels beat for the Orange County Register, but his work has come to the attention of a national audience since he started writing a little more than a year ago for Baseball Prospectus, where he now writes full-time.  Sam is a terrific writer, extremely funny, and clearly passionate about baseball.  He’s quickly become a prolific must-read at BP and is a worthy successor in its long line of writing talent.  (Full disclosure:  Sam has been assigned the job of reviewing my upcoming book, Trading Bases, for Baseball Prospectus.  When BP’s Editor-in-Chief, Ben Lindbergh, informed me of Sam’s assignment last fall, I told him I was thrilled and basically expressed the same sentiments as above.)

Last week, on Friday, December 14, Sam posted a piece at BP titled “Which WAR(P) Are You?”  In it, he listed the divergent WAR (FanGraphs, Baseball-Reference) and WARP (Baseball Prospectus) calculations of a dozen players making off-season headlines.  The catch was that he kept the source of each WAR blind until the end of the article.  The tone was whimsical, and Sam basically challenged the reader to pick, for each player, the WAR that matched the reader’s impression of the player.  By the end of the article, Sam reasoned, the reader would have an idea which WAR calculation matched his or her “eye test.”

As was certainly intended, the article was fun and, based on my Twitter timeline, attracted a lot of attention.  There were some grumblings about the divergent calculations, but Jonah Keri probably summed up the feelings of today’s analysts best when he tweeted, “None of this negates the value of WAR as one of the many tools you can use to evaluate players.  Just requires patience and nuance.”

He’s right of course, but if we could just get the three sources to agree on one aspect of their calculations, the amount of patience required would be a lot less and we’d truly be left talking about nuance.  But as things stand now, well . . . let me show you.

The “RP” in WARP, of course, stands for Replacement Player (FanGraphs and Baseball-Reference drop the P) and, incredibly, this concept often gets ridiculed by some members of the mainstream press as being a contrived, impossible-to-comprehend theory, as pointless to attempt to grasp as a plume of smoke.  That astounds me because, like many employees, reporters confront the idea of low-cost replacement often in their own jobs.  When an editor tosses copy back at a scribe and mutters, “I can get an intern copywriter in the steno pool to write this crap.  Get me a source!” the writer is being challenged to justify his existence and perform at a level above replacement. (It’s possible my idea of newsroom banter never evolved beyond viewings of Lou Grant.)

The difference between baseball and other professions is that, thanks to the dozens of players who change teams each year as unsigned free agents, non-roster invitees, or waiver additions, it’s possible to actually quantify the performance of replacement players.  Evaluate a number of years of performance and you’ve got the baseline value of a replacement player.  Beyond some minor arguments on the fringes, there really shouldn’t be much debate about that population of players.  However, that’s not the case, because the differences in the value of replacement players among the three sites in Sam’s article are so vast that they completely cloud the concept of comparative analysis.

To see this, one only needs to calculate the total WAR created by the population of MLB players in a single year.  Here are the total WARs for each site in 2012:

FanGraphs: 1,130 WAR, or an average of just under 38 wins per team.

Baseball-Reference: 874 WAR, or just over 29 wins per team.

Baseball Prospectus: 686 WARP, or just under 23 wins per team.

Those are huge differences in total WAR for exactly the same players with exactly the same statistics, etc.  This isn’t unique to 2012.  Total WAR in 2011 and 2010 were virtually identical to 2012 for each site.  (This will become important when we get back to Sam’s examples, which often took a multiple-year look at performance.)  Subtract the average wins per team from the average MLB wins of 81 and you get each site’s baseline for a team that fields nothing but replacement level players:

FanGraphs: 43 wins (.267 winning percentage)

Baseball-Reference: 52 wins (.320 winning percentage)

Baseball Prospectus: 58 wins (.359 winning percentage) 
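
The arithmetic behind those baselines can be sketched in a few lines of Python.  This is only an illustration: the 30-team league and 162-game schedule are my assumptions (the text above only gives the 81-win average), and the function name is hypothetical.

```python
# Assumptions not stated in the text: 30 MLB teams, 162-game schedule.
TEAMS = 30
GAMES = 162

# Total WAR(P) for 2012 as quoted above.
total_war = {
    "FanGraphs": 1130,
    "Baseball-Reference": 874,
    "Baseball Prospectus": 686,
}

def replacement_baseline(total, teams=TEAMS, games=GAMES):
    """Return (WAR per team, replacement-team wins, replacement winning pct)."""
    per_team = total / teams
    wins = games / 2 - per_team  # an average team wins half its games
    return per_team, wins, wins / games

baselines = {site: replacement_baseline(t) for site, t in total_war.items()}

for site, (per_team, wins, pct) in baselines.items():
    print(f"{site}: {per_team:.1f} WAR per team, "
          f"{wins:.0f} replacement wins, {pct:.3f} winning pct")
```

Run as written, this should land on the 43, 52, and 58 win baselines and the .267, .320, and .359 winning percentages listed above.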

These are huge differences, and before you decide which level best passes a logic test, let’s quantify exactly what effect these differences have at the individual player level.  Conveniently, and to me a complete validation of the weight each site places on batting, fielding, and pitching, the total WARs listed above break out between pitchers and hitters to within one-tenth of a percent of each other.  In other words, at FanGraphs and Baseball Prospectus, hitters account for 59.3% of total WAR(P) and it’s 59.2% at Baseball-Reference.  At the individual player level, that means a batter who plays an entire season (I use 685 plate appearances as my baseline) will have a WAR at FanGraphs 1.0 higher than at Baseball Prospectus, as follows: the total excess wins at FanGraphs due to the difference in baselines, 15 games (58 – 43), multiplied by the 59.3% attributable to everyday players, equals 8.9 WAR per team.  Divide that by 9 batters getting 685 plate appearances apiece and you get 1 WAR per full-time player.  Using that logic, here’s a summary of the differences, per season, for a full-time batter and pitcher (32 starts) using FanGraphs as a baseline:

Baseball-Reference:  Add .6 wins per batter for each 685 plate appearances and add .75 wins for each 32 starts by a pitcher.

Baseball Prospectus:  Add 1 win per batter for each 685 plate appearances and add 1.2 wins for each 32 starts by a pitcher. 
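
For anyone who wants to check the batter side of that adjustment, here is a rough sketch.  The function names are hypothetical, the sample player at the end is made up purely for illustration, and applying each site's own hitter share to the gap is my assumption.  The pitcher adjustment is left out because the text above doesn't spell out the divisor it uses.

```python
# Rounded replacement-team win baselines quoted earlier in this piece.
replacement_wins = {
    "FanGraphs": 43,
    "Baseball-Reference": 52,
    "Baseball Prospectus": 58,
}

# Share of total WAR(P) credited to hitters at each site, as quoted above.
hitter_share = {
    "FanGraphs": 0.593,
    "Baseball-Reference": 0.592,
    "Baseball Prospectus": 0.593,
}

LINEUP_SLOTS = 9      # nine full-time batters per team
FULL_SEASON_PA = 685  # the full-season baseline used in the text

def batter_adjustment(site, reference="FanGraphs"):
    """Wins to add per 685 PA to restate a site's WAR to the reference baseline."""
    gap = replacement_wins[site] - replacement_wins[reference]
    return gap * hitter_share[site] / LINEUP_SLOTS

def restate_batter_war(war, plate_appearances, site, reference="FanGraphs"):
    """Prorate the adjustment by playing time and add it to the site's WAR."""
    seasons = plate_appearances / FULL_SEASON_PA
    return war + batter_adjustment(site, reference) * seasons

for site in ("Baseball-Reference", "Baseball Prospectus"):
    print(f"{site}: add {batter_adjustment(site):.2f} wins per {FULL_SEASON_PA} PA")

# Hypothetical player: 3.0 WARP at Baseball Prospectus over 685 PA.
restated = restate_batter_war(3.0, 685, "Baseball Prospectus")
print(f"Restated to the FanGraphs baseline: {restated:.1f}")
```

The per-season figures come out at about 0.59 and 0.99 wins, which round to the .6 and 1 listed above.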

Just normalizing each player to a standard baseline makes an enormous difference.  How much?  Consider the article that started this discussion: Including Joey Votto from the lede, Sam listed the WAR(P) for 13 different players.  Here were the sums of those thirteen figures, by site:

FanGraphs: 107.2 WAR

Baseball-Reference: 85.3 WAR

Baseball Prospectus: 64.8 WARP 

Here they are restated to a common baseline as shown above, in this case FanGraphs:

FanGraphs: 107.2 WAR

Baseball-Reference: 104.3 WAR

Baseball Prospectus: 95.8 WARP

Look at that!  Suddenly, things look a lot more similar than dissimilar.  To be sure, there are still some meaningful differences at the individual player level but now we can truly have a meaningful discussion about nuance. (Email me at the address below if you’d like any supporting data.)

The massive differences in assumed replacement levels distort (senselessly, in my mind) the comparison of individual players.  If you have in your mind that it takes 50 WAR to have a Hall of Fame discussion, think about how much these differences add up over the course of a career.  When I tell you Derek Jeter has 15 more WAR at FanGraphs than WARP at Baseball Prospectus, you might assume it’s due to the sites’ differing evaluations of his defensive skills.  No!  It’s simply because Jeter has amassed 17 seasons’ worth of plate appearances over his 18-year career.  At roughly 1 WAR per full season, the baseline gap alone is worth 17 more WAR at FanGraphs versus Baseball Prospectus.  Both sites are actually in extreme agreement over his career value.

If you’re a passionate baseball fan, and a regular visitor to the three indispensable sites, you know there are certainly egos involved in the creation and oversight of the data at each site.  Just as fingerprints uniquely identify individuals, so does prose, although only prose can also give the reader insight into an individual’s personality.  So I imagine common ground might be difficult.  But champions of critical reasoning are at their core rational people.  Can’t these three sites come to an agreement on replacement level, as it’s essentially a provable point?  By using 43 wins as its baseline, is FanGraphs really correct that only one team in the 40+ year division era, the 43-119 Detroit Tigers of 2003, has come close to being a replacement level team?  And using the currency of runs scored and allowed, the Tigers actually performed at a 49-win level.  A 43-win team would do something like score 500 runs and allow 900 over the course of a season; no expansion team in the era of player movement has ever been close to that bad.
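
To put a number on the runs-to-wins translation in that paragraph, here is a minimal sketch using the Pythagorean expectation, a common estimator of winning percentage from runs scored and allowed.  The paragraph above doesn't say which conversion it relies on, so both the formula and the 1.83 exponent (one commonly used value) are my assumptions here, not necessarily the method behind the figures quoted.

```python
def pythagorean_wins(runs_scored, runs_allowed, games=162, exponent=1.83):
    """Estimate season wins from runs scored and allowed.

    Uses the Pythagorean expectation: RS^x / (RS^x + RA^x).
    The exponent and the 162-game schedule are assumptions.
    """
    rs = runs_scored ** exponent
    ra = runs_allowed ** exponent
    return games * rs / (rs + ra)

# The 500 runs scored / 900 runs allowed team described above.
wins_500_900 = pythagorean_wins(500, 900)
print(f"Estimated wins: {wins_500_900:.1f}")
```

Under those assumptions, a 500-run, 900-runs-allowed team projects to roughly 41 wins, in the same neighborhood as the 43-win baseline.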

At the other end of the spectrum, what about Baseball Prospectus and its 58-win level?  Is every team that loses 100 games really just about replacement level?  Perhaps, as that probably is close to what happens when expansion teams are formed, but I wonder if there isn’t some middle ground.  Baseball-Reference’s reliance on the 52-win/.320 winning percentage replacement level team is well documented on its site.  Couldn’t some common ground be found on this topic to enhance comparisons?

I have very strong opinions on some of the various calculations of WAR, including defense and the inclusion of “what if” instead of “what did” elements (think FIP), but I readily admit those opinions fall under the category of Debates Worth Having.  Those debates, without question, enhance our understanding and enjoyment of baseball.  In the case of the different replacement levels, as I hope I’ve shown above using Sam’s article as a springboard, I think the differences needlessly cloud the picture.

If you disagree, please excuse me; I need to borrow the Irsay family’s legendary Mayflower moving van to transport a certain award from Indianapolis to San Francisco, where it just may reside for another one hundred years.


