Demystifying the Math – The Rise of Statistics in Baseball

By Conway West 11.26.14

It has been impossible to be a baseball fan and not witness (or take part in) the holy war that is the statistical analysis of the sport. With Moneyball’s rise to fame, this war has reached levels that transcend baseball fandom. When I tell folks I write about baseball, they immediately talk Moneyball: “oh, so you are like Billy Beane/Brad Pitt/Nerd With A Calculator Who Squeaks Nerdy Information When Commanded By The Executive”. Stats referred to as sabermetrics (from SABR: the Society for American Baseball Research) are everywhere in the sport, familiar even to those who know the game only casually.

While I may love stats, particularly sabermetrics, I also love the fickle nature of baseball. I love that one day runs are scored in bunches, and the next they are at a premium. I love how baseball blends predictability and statistical anomalies in every game of an arduously long season.

The stance I take toward statistics in the game boils down to two main tenets, which I will talk about in depth:

  • Almost all stats have at least some value
  • This value can be used as an evaluative tool or a predictive tool

The first of these tenets, finding the value in each statistic, is covered in this article.


Moneyball foreshadowed a huge transition: MLB teams investigating, and investing in, more meaningful and valuable statistical measures.

Value of statistics

The most misunderstood aspect of new-age baseball statistics is their purpose. As Jonah Hill points out in Moneyball, the point of the game is to score runs and win. As baseball analysts, we try to refine that goal into models: what is valuable on a baseball field? The main purpose of sabermetrics is to determine which answers are more correct than others, with the understanding that this evaluation is an ever-evolving tool.

Nearly all baseball statistics have value. What should be debated is how much stock we put into each one. Wins and batting average are two of the most talked-about stats on this topic. Contrary to what is often said in analysis, both of these stats do have (some) value. When there is one game left in the playoffs, you had better believe that whichever pitcher gets the win matters to that team. Over the course of a season, wins tell how many games a pitcher had some involvement in winning, often interpreted as ‘outdueling’ the opposing pitcher. This view dates back to the days of old, when one pitcher often pitched the entire game, or at least until it was out of reach, and it was surmised that the pitchers on either side “won” or “lost” the game for their teams.

The problem is that there are far, far better tools to determine the value of a pitcher. Gone are the days of pitchers going deep into games every time out. A “win” measures nothing beyond that single game, and even within that game it depends on many things outside the pitcher’s control, such as run support, defense, and the bullpen. So “wins” capture very little of a pitcher’s value, both for that game and for games to come. What about the elements of the game that the pitcher does have more control over?

The next step in evaluating a pitcher is usually earned run average (ERA), a tried-and-true measurement. Unfortunately, ERA has its flaws too. First, there is ongoing debate over the very idea of an “earned run”. Deciding whether a run is earned takes errors into account, and errors are arbitrarily assigned by a member of the media (the “official scorer”) watching that game. This mechanic is flawed. Additionally, there is the inherited runners issue. ERA charges runs to whichever pitcher put the runner on base, even if the next pitcher comes into the game and lets that runner score. Why should the first pitcher be charged for a run he didn’t give up?
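To make the two mechanics above concrete, here is a minimal Python sketch. It assumes the standard ERA definition (nine times earned runs, divided by innings pitched) and box-score innings notation, where “6.2” means six and two-thirds innings. The `charge_runs` helper and the pitcher names are hypothetical and deliberately simplified: the helper ignores the earned/unearned distinction and only shows each run following the pitcher who put that runner on base.

```python
from collections import Counter
from fractions import Fraction


def innings_from_notation(ip: str) -> Fraction:
    """Convert box-score notation ("6.2" = 6 and 2/3 innings) to an exact value."""
    whole, _, outs = ip.partition(".")
    outs_recorded = int(outs) if outs else 0
    if outs_recorded not in (0, 1, 2):
        raise ValueError(f"invalid innings notation: {ip!r}")
    return int(whole) + Fraction(outs_recorded, 3)


def era(earned_runs: int, ip: str) -> float:
    """Earned run average: earned runs allowed per nine innings pitched."""
    return float(9 * earned_runs / innings_from_notation(ip))


def charge_runs(scoring_runners: list[str], responsible_pitcher: dict[str, str]) -> Counter:
    """Hypothetical helper: charge each run to the pitcher who put that runner on base,
    regardless of who was pitching when the run scored (simplified; ignores errors)."""
    return Counter(responsible_pitcher[runner] for runner in scoring_runners)


# A starter leaves with a runner on first; the reliever then allows a two-run homer.
runs = charge_runs(
    ["runner_on_first", "batter"],
    {"runner_on_first": "Starter", "batter": "Reliever"},
)
print(runs)  # one run charged to each pitcher
print(era(3, "6.2"))  # → 4.05
```

Even though the reliever threw the pitch that was hit out, half of the damage lands on the starter’s line, which is exactly the inherited runners complaint.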

ERA is a decent measure, but like good modelers, baseball statisticians have identified and demonstrated what pitchers have more control over: strikeouts, walks, and avoiding hard contact. That’s all there is to it for a pitcher: if he can strike out a lot of batters while allowing only weak contact and no walks, he is doing his job. This is where fielding independent pitching (FIP), expected fielding independent pitching (xFIP), and skill-interactive ERA (SIERA) come into play. These stats aim to isolate just the things that pitchers have sole control over.
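As a rough illustration, FIP is commonly defined as (13 × HR + 3 × (BB + HBP) − 2 × K) / IP, plus a league constant chosen so that league-average FIP matches league-average ERA; xFIP swaps actual home runs for fly balls times a league-average home-run-per-fly-ball rate. The sketch below assumes those definitions. The constant (3.10) and HR/FB rate (10%) used in the example are placeholder assumptions, not official values for any season, and SIERA is more involved, so it is not shown.

```python
def fip(hr: int, bb: int, hbp: int, k: int, innings: float, constant: float) -> float:
    """Fielding independent pitching, scaled to look like ERA via a league constant.

    `innings` is true innings (6 2/3 innings = 6.667), not box-score "6.2" notation.
    """
    return (13 * hr + 3 * (bb + hbp) - 2 * k) / innings + constant


def xfip(fly_balls: int, bb: int, hbp: int, k: int, innings: float,
         lg_hr_per_fb: float, constant: float) -> float:
    """Expected FIP: replaces actual home runs with a league-average expectation."""
    expected_hr = fly_balls * lg_hr_per_fb
    return (13 * expected_hr + 3 * (bb + hbp) - 2 * k) / innings + constant


# Placeholder inputs for a hypothetical 200-inning season.
CONSTANT = 3.10      # assumption: real constants are recomputed every season
LG_HR_PER_FB = 0.10  # assumption: illustrative league HR/FB rate

print(fip(hr=20, bb=50, hbp=5, k=200, innings=200.0, constant=CONSTANT))
print(xfip(fly_balls=250, bb=50, hbp=5, k=200, innings=200.0,
           lg_hr_per_fb=LG_HR_PER_FB, constant=CONSTANT))
```

In this made-up season the pitcher allowed 20 home runs, fewer than the 25 that the assumed league rate predicts from 250 fly balls, so his xFIP comes out higher than his FIP: xFIP treats the low home run total as something he is unlikely to repeat.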

For hitting, let’s start with batting average. Batting average has been used since the 1800s and describes how effective a player is at hitting the ball to get on base, particularly over large samples. It is also predictive over large samples: players with this skill can repeat it. This explains why Joe Mauer, Ichiro, and others have had years near the league lead in average, and why Tony Gwynn was a first-ballot Hall of Famer. Baseball lore has always loved the “good hitter”, someone who can go up to the plate, square up a baseball, and put it where he wants it in the field. For decades, scouts’ ‘hit’ tool has been quantified by batting average. Batting average has a special, special place in some fans’ hearts.

If baseball were a contest of who could hit the most pitches squarely where they wanted, batting average might be more useful. But baseball is a sport where runs matter, and runs only. And runs are scored mostly through either powerful hits (doubles, triples, home runs) or avoiding outs (stringing together hits, walks, and hit by pitches). Simple tools such as on-base percentage (OBP) and slugging percentage (SLG) have been around for decades, but have become more and more popular as the game develops. OBP is easy to calculate, needs no conversions or “park factors”, and is a simple way to show how good a batter is at avoiding outs.
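To illustrate the difference, here is a minimal Python sketch of the standard AVG, OBP, and SLG formulas applied to two hypothetical hitters. The stat lines are invented for illustration; the point is that the hitter with the lower batting average can reach base and hit for power far more often.

```python
from dataclasses import dataclass


@dataclass
class BattingLine:
    ab: int       # at-bats
    singles: int
    doubles: int
    triples: int
    hr: int
    bb: int       # walks
    hbp: int      # hit by pitch
    sf: int       # sacrifice flies

    @property
    def hits(self) -> int:
        return self.singles + self.doubles + self.triples + self.hr

    def avg(self) -> float:
        """Batting average: hits per at-bat (walks and HBP are ignored entirely)."""
        return self.hits / self.ab

    def obp(self) -> float:
        """On-base percentage: rate of reaching base via hit, walk, or hit by pitch."""
        return (self.hits + self.bb + self.hbp) / (self.ab + self.bb + self.hbp + self.sf)

    def slg(self) -> float:
        """Slugging percentage: total bases per at-bat, so extra-base hits count more."""
        total_bases = self.singles + 2 * self.doubles + 3 * self.triples + 4 * self.hr
        return total_bases / self.ab


# Two invented stat lines with the same number of at-bats.
contact_hitter = BattingLine(ab=500, singles=130, doubles=15, triples=2, hr=3, bb=20, hbp=2, sf=3)
power_hitter = BattingLine(ab=500, singles=80, doubles=30, triples=2, hr=28, bb=80, hbp=5, sf=5)

for name, line in [("contact", contact_hitter), ("power", power_hitter)]:
    print(f"{name}: AVG {line.avg():.3f}  OBP {line.obp():.3f}  SLG {line.slg():.3f}")
```

With these numbers, the contact hitter bats .300 with a .328 OBP and .356 SLG, while the power hitter bats only .280 but posts a .381 OBP and .516 SLG.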

Once again, better models have been developed to show how valuable each skill has proven to be, using thousands and thousands of actual baseball games as data. Stats such as weighted on-base average (wOBA), weighted runs created plus (wRC+), and run expectancy based on the 24 base-out states (RE24) have been developed from these samples, and are far more useful for evaluating batting. By assigning weights to events (singles, doubles, homers) and adjusting for ballparks, quality of pitching, and situations (did the hit happen with runners on base?), baseball analysts can tell much better how a player is performing, and how sustainable that performance is.
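For a sense of how these weighted stats work, here is a minimal Python sketch of two of them. wOBA follows its commonly published form: each way of reaching base gets a run-value weight, and the total is divided by roughly the number of plate appearances (AB + BB − IBB + SF + HBP). The weights below are illustrative approximations only; real weights are re-derived every season (FanGraphs publishes them). RE24 credits a plate appearance with the change in run expectancy between base-out states (3 out counts × 8 base configurations = 24), plus any runs that scored. The run expectancy table itself has to be estimated from play-by-play data, so the function takes it as an input rather than hard-coding values.

```python
from typing import Dict, Optional, Tuple

# Illustrative wOBA weights (assumption: approximate values, not any season's official set).
WOBA_WEIGHTS = {"ubb": 0.69, "hbp": 0.72, "1b": 0.89, "2b": 1.27, "3b": 1.62, "hr": 2.10}


def woba(ab: int, bb: int, ibb: int, hbp: int, singles: int, doubles: int,
         triples: int, hr: int, sf: int, weights: Dict[str, float] = WOBA_WEIGHTS) -> float:
    """Weighted on-base average: run-value weights per event, over (roughly) plate appearances."""
    ubb = bb - ibb  # intentional walks are excluded from both numerator and denominator
    numerator = (weights["ubb"] * ubb + weights["hbp"] * hbp + weights["1b"] * singles
                 + weights["2b"] * doubles + weights["3b"] * triples + weights["hr"] * hr)
    return numerator / (ab + bb - ibb + sf + hbp)


# The 24 base-out states: 0-2 outs times 8 base configurations ("1_3" = runners on first and third).
BASES = ["___", "1__", "_2_", "__3", "12_", "1_3", "_23", "123"]
State = Tuple[int, str]
STATES = [(outs, bases) for outs in range(3) for bases in BASES]


def re24_value(re_table: Dict[State, float], start: State,
               end: Optional[State], runs_scored: int) -> float:
    """Run value of one plate appearance under RE24.

    `re_table` maps each base-out state to the runs expected over the rest of the inning
    (estimated from play-by-play data). `end=None` means the inning ended, worth 0.
    """
    end_value = 0.0 if end is None else re_table[end]
    return end_value - re_table[start] + runs_scored
```

For example, a home run with the bases empty leaves the base-out state unchanged, so its RE24 value is exactly the one run that scored, while a single with runners on base earns more because it moves those runners into higher-scoring states.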

So many fans are irate about baseball becoming a numbers game. The truth is, baseball has always been a numbers game; we are just using more refined tools to figure out how the numbers matter. So while wins and batting average can be counted, they do not come close to telling the entire story.

Ultimately, all these numbers are used to try to produce wins on the baseball field, something no one has mastered perfectly. But as Jonah Hill shows in Moneyball, we are certainly getting better. And contrary to some beliefs, these numbers are just trying to tell a better story of how and why some teams make it and some don’t.

PS: We will be back after a short hiatus for Thanksgiving! Check back the week beginning 12.05.14

