From Joe DiMaggio’s 56-game hit streak to Nolan Ryan’s 5,714 career strikeouts, baseball is a game of numbers. Statistical information increases your understanding of player performance and deepens your appreciation for the game.
We’ve put together a resource guide for budding sabermetricians looking to dive into the data. Start with a powerful computer like the Dell XPS or Lenovo IdeaPad 310, with plenty of storage and processing power to work with large data sets. Then you’ll be ready to take advantage of the many analytic websites that steward baseball statistics. The resources that follow will get you in playing shape whether you’re putting together your fantasy lineup or just want to be a better student of the game.
This popular site is cherished among seamheads for its huge statistical database, insightful analysis, and detailed projections. But the biggest draw for rookie number crunchers is the FanGraphs Library, a primer on advanced baseball metrics. Here, you can read plain-English explanations of sabermetric stats such as FIP (Fielding Independent Pitching) and BABIP (Batting Average on Balls in Play) and learn how to use them to see the game behind the game.
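To give a feel for what those two stats measure, here is a minimal Python sketch of the commonly published formulas. The function names, the sample numbers, and the default FIP constant are assumptions made for illustration; the real constant is recalculated each season to put FIP on the same scale as league ERA, so look up the published value before relying on the result.

```python
def babip(h, hr, ab, so, sf):
    """Batting Average on Balls in Play: hits that stayed in the park,
    divided by at-bats that ended with a ball in play."""
    balls_in_play = ab - so - hr + sf
    if balls_in_play <= 0:
        raise ValueError("no balls in play")
    return (h - hr) / balls_in_play


def fip(hr, bb, hbp, so, ip, constant=3.10):
    """Fielding Independent Pitching: weighs only the outcomes a pitcher
    controls directly (home runs, walks, hit batters, strikeouts).
    `constant` is a placeholder assumption, not an official value."""
    if ip <= 0:
        raise ValueError("innings pitched must be positive")
    return (13 * hr + 3 * (bb + hbp) - 2 * so) / ip + constant


# Made-up season lines, for illustration only.
hitter_babip = babip(h=150, hr=20, ab=500, so=100, sf=5)
pitcher_fip = fip(hr=20, bb=50, hbp=5, so=200, ip=200)
print(f"BABIP: {hitter_babip:.3f}")
print(f"FIP:   {pitcher_fip:.2f}")
```

Both functions take plain counting stats, so you can feed them numbers copied from any stat page and compare your results with the site's own figures.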
This long-running site is a treasure trove of historical baseball data, including both traditional and advanced statistics. But the real prize is what you can do with it. The Play Index allows you to perform incredibly granular queries to get to the deep details of player performance. Want to know which active American League starting pitchers have had the longest winning streaks with at least five strikeouts and no walks per game? This is where to find it.
The Lahman Database
At some point, you’ll want to manipulate raw data yourself rather than rely on the preset breakdowns a stat site provides. The downloadable Lahman Database currently provides complete batting, pitching, and fielding statistics from 1871 to 2015, as well as post-season data, managerial records, and other information. Best of all, it comes as a relational database, so you can perform powerful searches limited only by your own curiosity. You can download it in MySQL and Microsoft Access versions or as comma-delimited files for use with a spreadsheet program like Excel.
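If you would rather script your analysis than work in a spreadsheet, the comma-delimited version can be read with a few lines of Python. The sketch below is only an illustration: the rows are made up, the player and team IDs are hypothetical, and the column names (playerID, yearID, stint, teamID, AB, H, HR) are assumed to match the batting table in the download. It adds up each player's career totals, combining seasons split between two teams.

```python
import csv
import io
from collections import defaultdict

# Made-up rows laid out like a subset of the batting table's columns.
SAMPLE_BATTING_CSV = """playerID,yearID,stint,teamID,AB,H,HR
exampla01,2014,1,AAA,500,150,20
exampla01,2015,1,AAA,300,80,10
exampla01,2015,2,BBB,200,60,5
examplb01,2015,1,CCC,400,100,30
"""


def career_totals(csv_file):
    """Sum at-bats, hits, and home runs per player across all rows."""
    totals = defaultdict(lambda: {"AB": 0, "H": 0, "HR": 0})
    for row in csv.DictReader(csv_file):
        player = totals[row["playerID"]]
        for col in ("AB", "H", "HR"):
            # Treat blank cells (common in older seasons) as zero.
            player[col] += int(row[col] or 0)
    return dict(totals)


totals = career_totals(io.StringIO(SAMPLE_BATTING_CSV))
for player_id, t in sorted(totals.items()):
    avg = t["H"] / t["AB"] if t["AB"] else 0.0
    print(player_id, t["H"], t["HR"], f"{avg:.3f}")
```

To run it on the real data, pass an opened file instead of the sample text, for example `open("Batting.csv", newline="")`, assuming that is the name of the batting file in your download.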
Pitch analysis is where much of the cutting-edge baseball research is being done. This site leverages Major League Baseball’s PITCHf/x data (a record of every pitch thrown in a game, including the type of pitch, its trajectory, and where it crossed the plate) to provide insights into both pitchers and batters. Select a date, game, and pitcher, and you can export the tabular data into Excel.
Retrosheet is a unique and indispensable resource in the world of baseball data mining. With the help of a devoted volunteer corps, it aspires to recreate play-by-play records of every major league game since 1914. In addition to the box scores and narratives, you get team and player statistical splits, umpire data, and histories of every major league franchise. You can also download game logs, play-by-play files, and schedules to do your own analysis.
Big data has changed the way we play and understand the game. But statistical analysis isn’t just for baseball’s coaches and front-office executives. As a passionate fan, you only stand to deepen your love of the game by watching it with an empirical eye. Step up to the plate!