Defining Descriptive, Predictive, and Prescriptive Statistics in Baseball
When sabermetric thought was first popularized by Bill James and other early leaders of the field, its primary concern was baseball's accounting methodology. Players were being given either too much or too little credit, which caused widespread misunderstanding of how they should be valued. This was addressed by the creation of descriptive sabermetrics, or numbers that describe a player more accurately. Teams could now properly understand the skills a given player had demonstrated relative to the competition, but that was the limit of the scope. Just measuring players proved not to be enough - teams needed to know more. This led to the development of predictive and prescriptive analytics in baseball, which are types of measures that look toward the future in some way. With different statistics trying to do different things, fans often get confused when referencing numbers about their players and end up using the wrong type of measure to prove a point. It is quite easy to do this, since most baseball numbers don’t explicitly state what they’re trying to accomplish. This makes identifying and differentiating these statistics all the more important.
The most basic and well-known of the three, descriptive sabermetrics are measures that attempt to quantify the skill demonstrated in a given play or year by calculating the value added or lost. These numbers are the cornerstone of modern advanced statistics, as well as the type that most everyday fans use. To be considered descriptive, a statistic needs to pass a few criteria:
The statistic must describe what has physically happened on the field. If the formula regresses any of its inputs in order to adjust for future performance, then the number probably cannot be considered descriptive.
Any value weights must be related to historical performance. Regression can still be used in descriptive stats, but the weights it yields must be derived from past performance. Such weights are often necessary for the more accurate descriptive numbers, as proper accounting usually requires regression to determine which events are worth more than others when comparing players.
The number must try to indicate some sort of added skill value to the game of baseball. This point is crucial in differentiating between descriptive and prescriptive, as some numbers would otherwise fall into both categories. A descriptive statistic must show some sort of skill, like fielding or extra-base hitting ability. Statcast measures (e.g., Launch Angle, Exit Velocity) do not qualify, as these numbers don’t directly reveal value added. They reveal the ability to influence descriptive statistics (for example, a higher Exit Velocity tends to lead to a higher wOBA) but don’t actually provide the value. Descriptive statistics explicitly show the value.
If the stat passes all of these criteria, one can correctly treat it as a descriptive statistic. When using these numbers, it is important to note their boundaries. Their main focus is to tell accurately what went on during the season. They rarely have much predictive value, and in comparison to predictive metrics they shouldn’t be leaned on heavily when evaluating a player’s future worth. The value measures may also be limited in accuracy, as using weights at all invites the possibility of assigning value to the wrong places. As baseball gains knowledge, the margin of error will slowly shrink, but some error will likely persist due to the very nature of the sport.
In analysis, these types of numbers are ideal for comparing two players and their performances, and for evaluating MVP, Cy Young, and other award votes. In the long run, they are also helpful for Hall of Fame cases. Since that type of voting does not require guessing the future, these statistics are well suited to helping voters make the right decisions.
Examples of Descriptive Sabermetrics: wOBA, wRC+, OPS, DRS, UZR, WAR
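To make the idea of historical value weights concrete, here is a minimal Python sketch of a wOBA-style calculation. The structure follows the commonly published wOBA formula (weighted events divided by AB + BB - IBB + SF + HBP), but the weights below are illustrative placeholders rather than official values for any season, and the names BattingLine, ILLUSTRATIVE_WEIGHTS, and woba are hypothetical names chosen for this example.

```python
from dataclasses import dataclass

# Illustrative linear weights only (an assumption for this sketch).
# Published weights are re-derived each season from historical run values.
ILLUSTRATIVE_WEIGHTS = {
    "ubb": 0.69,     # unintentional walk
    "hbp": 0.72,     # hit by pitch
    "single": 0.89,
    "double": 1.27,
    "triple": 1.62,
    "hr": 2.10,
}


@dataclass
class BattingLine:
    """Counting stats for one player-season (hypothetical container)."""
    ab: int
    bb: int
    ibb: int
    hbp: int
    sf: int
    single: int
    double: int
    triple: int
    hr: int


def woba(line: BattingLine, weights: dict = ILLUSTRATIVE_WEIGHTS) -> float:
    """Credit each offensive event by its weight, then divide by opportunities."""
    unintentional_walks = line.bb - line.ibb
    numerator = (
        weights["ubb"] * unintentional_walks
        + weights["hbp"] * line.hbp
        + weights["single"] * line.single
        + weights["double"] * line.double
        + weights["triple"] * line.triple
        + weights["hr"] * line.hr
    )
    denominator = line.ab + line.bb - line.ibb + line.sf + line.hbp
    if denominator == 0:
        return 0.0  # no opportunities, so there is nothing to describe
    return numerator / denominator
```

Nothing in this calculation looks forward: every input is something that already happened on the field, and the weights only decide how much credit each event earns. That is what keeps a number like this descriptive.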
Of the three types, predictive statistics get the worst treatment. Many fans genuinely dislike the idea of people trying to predict America’s Pastime with a bunch of numbers, claiming that it brings unnecessary complications. While the merits of these attacks are questionable, refuting them is beside the point here, which is to explain what predictive sabermetrics actually are and how they can be applied correctly. Predictive sabermetrics are measures that try to articulate what should have happened or what will happen, regressing certain rates within a player's performance to indicate future value. To be considered predictive, a statistic must pass the following tests:
The statistic must attempt to forecast what is likely to happen in the future. While this may seem obvious to the reader, it is important for clarifying the difference between the types. Forecasting is generally done by regressing a player's numbers against past performance to yield new expected stats. By assigning weights and working through formulas with several variables, these numbers show the likely outcomes for a given player based on his type of play and performance.
The number must adjust for currently unlikely factors. With xStats specifically in mind for this criterion, the number must attempt to discount outliers and show the most probable outcome. xStats may be showing a current year's expected performance, but they aren’t showing a player’s actual value - they are showing his expected value without the unlikely events, which is only useful for predicting the future under a normal environment. A normal event is the one most likely to happen in any given circumstance, and adjusting toward it is necessary for this type of statistic.
If the statistic one wants to use passes both of these tests, then it should be treated as a predictive statistic. With that in mind, acknowledging the potential shortcomings remains useful for proper evaluation. These numbers should not be used for comparisons of actual value, as that deviates from their objective and makes the comparison meaningless. Predictive measurements are also subject to several flaws because, frankly, no one can predict the future with 100% certainty. Deviations from the norm happen constantly - all one can do is identify the most likely outcome. They should be considered the best guess - nothing more, nothing less.
In analysis, these numbers are best used for projection arguments. When the question becomes “Who will win MVP next year?”, these measurements are extremely relevant. The same goes for team records, which can be projected for next season based on many factors. Once a player or team stops playing, predictive numbers are arguably useless. But before then, they serve as the best glimpse of the future.
Examples of Predictive Sabermetrics: xwOBA, ZiPS Projections, xSLG, xERA, pCRA
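The idea of regressing an observed rate toward what is normal can be sketched in a few lines of Python. This is a generic regression-toward-the-mean estimate, not the method behind xStats, ZiPS, or any other published system. The function name and the ballast parameter (a count of imaginary league-average trials blended into the sample) are assumptions made for illustration.

```python
def regress_to_mean(observed_rate: float, sample_size: float,
                    league_rate: float, ballast: float) -> float:
    """Shrink an observed rate toward the league rate.

    ballast is a hypothetical tuning value: the number of league-average
    trials mixed into the player's sample. The larger it is relative to
    sample_size, the harder the estimate is pulled toward the league rate.
    """
    total = sample_size + ballast
    if total <= 0:
        raise ValueError("sample_size + ballast must be positive")
    return (observed_rate * sample_size + league_rate * ballast) / total
```

With the ballast held fixed, a hot streak over a handful of plate appearances is pulled most of the way back to the league rate, while the same rate sustained over a full season barely moves. The output is a best guess about true talent going forward, not a record of what happened.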
Descriptive and predictive metrics were at the forefront of baseball’s sabermetric revolution for many years. But with the arrival of loads of data in team front offices, prescriptive numbers are more important than ever, and they even have a major influence on many predictive numbers. Specifically, prescriptive sabermetrics are numbers that tie certain physical factors to performance and help yield recommendations as to how a player or team might adjust. To be considered prescriptive, a statistic must meet the following criteria:
The statistic must reveal some sort of underlying information. The whole point of prescriptive analytics is to be able to figure out what is causing a player to act a certain way through statistical methods. Player X saw his batting average go down - what might have caused it to go down? The numbers must convey something that is not the batting average itself, but a factor about the player that might affect that.
The statistic must have a degree of impact on player performance. If the number being used correlates with a player’s ability to produce value, then it has passed. These numbers must show an ability of some kind, which would hopefully translate into a player adding value for his team.
The statistic must quantify action. For this criterion, action is defined as broadly as possible, so long as it is physical in some way. Physical action can include sprint speeds, arm angles, exit velocities, etc. As long as it can be physically measured and turned into usable numbers that are shown to influence a player’s performance, it should be considered prescriptive.
If the statistic being used meets these qualifications, it can be treated as a prescriptive number. Now, for the limitations. These numbers provide no estimate whatsoever of the value a player produced or might produce. They can be impressive feats that show what the human body is capable of, but they generally won’t translate into how many runs were added on the scoreboard, or how many might be added next year. These metrics can also be heavily misleading, as a lot of assumptions have to be made for them to make sense and add value. It is possible to discover apparent laws linking performance and prescriptive statistics, but such a relationship may be spurious and merely happen to correlate very well. Most of these numbers never directly translate into success or failure, but they can prove to be helpful.
In analysis, these metrics are ideal for judging a player’s physical ability, as well as his potential for future success, by looking for transferable skills. They are also great for diagnosing problems, as large deviations in the underlying statistics may mean that something is amiss. For example, a pitcher with a lengthened stride or decreased velocity may be hurt. He may not consciously know that he is, but the way his body compensates reveals that he is struggling. These numbers best represent the underlying causes of performance.
Examples of Prescriptive Sabermetrics: Exit Velocity, Throw Speed, Spin Rate, Pitch Tilt
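The diagnostic use described above, watching for large deviations in an underlying measurement, can be sketched as a simple baseline comparison. The function name, the two-standard-deviation threshold, and the velocity readings in the usage lines are all made up for illustration. A real workflow would also need far more context (pitch type, weather, workload) before drawing any conclusion.

```python
from statistics import mean, stdev


def flag_deviations(baseline: list[float], recent: list[float],
                    threshold: float = 2.0) -> list[float]:
    """Return recent readings that sit far from the baseline average.

    A reading is flagged when it lies more than `threshold` sample
    standard deviations from the baseline mean. The default of 2.0 is
    an arbitrary choice for this sketch, not an established standard.
    """
    if len(baseline) < 2:
        raise ValueError("need at least two baseline readings")
    center = mean(baseline)
    spread = stdev(baseline)
    if spread == 0:
        # Every baseline reading was identical, so any change stands out.
        return [x for x in recent if x != center]
    return [x for x in recent if abs(x - center) / spread > threshold]


# Made-up fastball velocities in mph, purely for demonstration.
baseline_velocity = [94.0, 95.0, 96.0, 95.0, 94.0, 96.0]
recent_velocity = [95.2, 92.0, 94.5]
flagged = flag_deviations(baseline_velocity, recent_velocity)
```

A flagged reading says nothing about runs gained or lost. It only points to a physical change worth investigating, which is the role prescriptive numbers play.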
Acknowledging that different types of sabermetrics serve different purposes is crucial to understanding the new look of baseball knowledge, as numbers are often meaningless without context. Being able to differentiate the types provides that context, which should help eliminate some erroneous or misguided evaluations. Descriptive sabermetrics describe the actual value produced on the field, which can help in evaluating two players against each other in a season. Predictive sabermetrics attempt to predict what should have happened and what will likely happen, trying to bring some certainty to arguments about the future. Prescriptive sabermetrics showcase the underlying quantifiable causes of the ability to produce value, proving beneficial in evaluating player potential with limited statistics or in identifying problems. These types provide a great outline for the goals of sabermetrics - to describe, predict, or prescribe.
This examination may appear to be black-and-white, with hard lines that cleanly separate the types. But I encourage the reader to set that kind of thinking aside entirely when it comes to these categories. There are many shades of gray between types, with many statistics overlapping more than one. A number may be designated as descriptive or prescriptive but happen to have predictive qualities in other respects. The categories are only meant to serve as guidelines, not as strict rules. However they are taken, it is crucial to view a statistic with as open a mind as possible. Know its strengths, know its weaknesses, know its goals - only then can one make a proper evaluation of what it is trying to find.