Diamond-Theory.com

News and commentary from around the world of baseball


Where does Rickey Henderson fit?

This year Rickey Henderson is on the Hall of Fame ballot for the first time. To commemorate this occasion, I want to play a little game: Here is a list of players. Tell me the first thing that comes to mind.

Pete Rose, Barry Bonds, Ty Cobb, Carl Yastrzemski, Hank Aaron

Last Updated on Friday, 06 March 2009 23:17 Read more...
 

Revisiting MVP voting


A little over a week ago I made two posts about a model I built to predict how the BBWAA would vote on the MVP. Now that the votes are out, I wanted to go back and see how my model did. Below are plots for the National and American League vote outcomes:

Last Updated on Friday, 06 March 2009 23:17 Read more...
 

Predicting the American League MVP

As promised, here is my model prediction for the American League MVP voting. It looks like I should have added a "plays in the AL East" variable, because I know those guys are going to end up doing better than this model shows. It's interesting to see that this race is much closer than the NL one I posted a few days ago (see below). That is not surprising, since I didn't really see any clear frontline candidates when looking at the stats before the simulation (again, I used the top ten players in OPS, runs, and RBI in the model). This simulation brought to light a few issues I didn't mention in the earlier article, but will below. Again, the values given in the table are (1) the average MVP win probability, (2) the probabilities rescaled to sum to 1, and (3) the probabilities scaled to the predicted winner (i.e., Morneau's probability of winning is 0.67 times that of Josh Hamilton).
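
The two rescalings described above, (2) and (3), can be sketched as follows. This is only a minimal illustration: the player labels and average probabilities are made-up placeholders, not values from the model, and the function name `rescale` is hypothetical.

```python
def rescale(avg_probs):
    """Return (sum_to_one, relative_to_winner) for a dict of average win probabilities."""
    total = sum(avg_probs.values())
    best = max(avg_probs.values())
    # (2) divide by the total so the probabilities sum to 1
    sum_to_one = {name: p / total for name, p in avg_probs.items()}
    # (3) divide by the predicted winner's probability, so the winner is 1.0
    relative = {name: p / best for name, p in avg_probs.items()}
    return sum_to_one, relative


# Placeholder inputs (assumed for illustration only)
avg_probs = {"Player A": 0.30, "Player B": 0.20, "Player C": 0.10}
sum_to_one, relative = rescale(avg_probs)
for name in avg_probs:
    print(name, round(sum_to_one[name], 3), round(relative[name], 3))
```

Read this way, a value of 0.67 in column (3) means that player's average win probability is about two thirds of the predicted winner's.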
Last Updated on Friday, 06 March 2009 23:17 Read more...
 

Predicting the National League MVP

This year's postseason awards are upon us. Yesterday Geovany Soto and Evan Longoria won the NL and AL Rookie of the Year Awards, respectively. While neither of those two picks comes as a huge surprise to me, there have been times in the past when I felt different players should have received some BBWAA hardware (I know this wasn't the BBWAA, but how does Aramis Ramirez beat out Albert Pujols as the best offensive player in the NL this year?).

Anyway, I have often wondered what separates an MVP from a player who merely had a really good year. To this end I have tried a number of different models over the past couple of years, but recently I came across the ideas of classification trees and "bagging". Classification, or decision, trees have become popular in the media recently. Instead of trying to explain what such a tree is, here is an example (along with some props thrown to FlowingData.com). The basic idea is that you classify each individual by one variable at a time into one of three options: group A, group B, or no decision yet. At each fork in the tree, with the exception of the last fork, one of the two options is no decision. This seemed like an interesting idea to use for MVP voting, among other baseball applications that I have tried to model at some point, such as HOF status. In fact, that may lead to another post.
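
To make the "one variable at a time" idea concrete, here is a toy sketch of a tree of that shape. Everything in it is an assumption for illustration: the stats, the cutoffs, and the labels are invented rather than fitted to any data, and the bagging step (fitting many such trees to resampled data and averaging their votes) is not shown.

```python
# Each early fork either assigns a label or leaves "no decision yet".
# Cutoffs below are invented for illustration, not fitted values.
FORKS = [
    # (stat, test, label assigned when the test is true)
    ("ops", lambda v: v < 0.850, "not MVP"),
    ("rbi", lambda v: v >= 130, "MVP"),
]
# The last fork must decide one way or the other.
LAST_FORK = ("runs", lambda v: v >= 110, "MVP", "not MVP")


def classify(player):
    """Walk the forks in order; stop at the first one that makes a decision."""
    for stat, test, label in FORKS:
        if test(player[stat]):
            return label
        # otherwise: no decision yet, move on to the next fork
    stat, test, if_true, if_false = LAST_FORK
    return if_true if test(player[stat]) else if_false


# Hypothetical player line
print(classify({"ops": 0.900, "rbi": 100, "runs": 115}))
```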

Last Updated on Friday, 06 March 2009 23:17 Read more...
 

Randomized Wins: Predicting team wins using game run totals


By now, most people are fairly familiar with Bill James' Expected Wins (referred to as XW) formula, but for those who are not, here is a quick description:

  • Step 1: (Expected Winning Percentage) = ( (Total Runs Scored)^1.82 ) / ( (Total Runs Scored)^1.82 + (Total Runs Allowed)^1.82 )
  • Step 2: Multiply the expected winning percentage by the number of games played to get the expected wins estimate
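
The two steps above can be written out as a short sketch. The function names and the season totals in the usage lines are placeholders chosen for illustration, not figures from the article's tables.

```python
def expected_win_pct(runs_scored, runs_allowed, exponent=1.82):
    """Step 1: expected winning percentage from season run totals."""
    rs = runs_scored ** exponent
    ra = runs_allowed ** exponent
    return rs / (rs + ra)


def expected_wins(runs_scored, runs_allowed, games_played, exponent=1.82):
    """Step 2: scale the expected winning percentage by games played."""
    return expected_win_pct(runs_scored, runs_allowed, exponent) * games_played


# Hypothetical team: 800 runs scored, 700 allowed, 162 games
xw = expected_wins(800, 700, 162)
print(round(xw, 1))
```

Subtracting a team's XW from its actual win total gives the over- or under-performance discussed below.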

The beauty of this process is that a fairly simple formula gives surprisingly accurate results, as the table and images at the bottom of this article show. For the 2007 and 2008 seasons, the average error made by the XW formula is 0.0411 wins, and half of all errors fall between about -2.5 and 3 wins. The idea of the XW statistic is to see how much teams have over- or under-performed throughout the season. Whether one attributes this difference in performance to randomness or to the skill of the team, manager, or players at winning close games is a personal decision. Most tend to assume any difference is due to luck, but I don't know if I fully agree with that assessment.

From its introduction, I have been wary of the XW procedure. The XW formula treats every run a team scores throughout the course of a season equally. This, we all know, is not true. Some runs are more important than others. This includes not just runs that win (or lose) crucial games, but crucial runs within games. In almost every case, a team's 10th or 12th run of a game means much less than its 3rd or 4th run of that game, since the team would most likely win with or without that, say, 12th run. I felt that the XW formula did not account for this difference in importance.

Enter Randomized Wins.

Last Updated on Friday, 06 March 2009 23:20 Read more...
 

