Tuesday, February 21 at 4:00 PM ET
Today's blog will outline some of the recent tweaks and discoveries related to strength-of-schedule adjustments in college sports.
College Strength-of-schedule Adjusting
In 115 DI college basketball games, Jeremy Lin shot 54.7% from two, 33.3% from three, 73.3% from the free-throw line and averaged 17.7 points, 5.9 rebounds, 4.8 assists, 0.8 blocks, 2.7 steals and 3.9 turnovers per 40 minutes played. In 87 DI college basketball starts - every game he played in his sophomore, junior and senior seasons at Harvard - Lin shot 55.5% from two, 33.9% from three, 72.5% from the free-throw line and averaged 19.0 points, 6.0 rebounds, 5.0 assists, 0.9 blocks, 2.8 steals and 3.9 turnovers per 40 minutes played. Yeah, but he went to Harvard. He dominated the ball for a bad team and beat up on worse teams. Right?
He definitely dominated the ball for the better part of his three starting seasons, but the difference between playing for an elite team and playing for Harvard may not mean as much as most - including me to a good degree - believe. It's a small sample size, but in six starts (209 minutes played) against "BCS" college basketball teams, Lin actually performed better than those other numbers quoted, shooting 61.4% from two, 38.1% from three, 79.3% from the free-throw line and averaging 22.4 points, 5.6 rebounds, 5.4 assists, 1.0 blocks, 3.4 steals and 3.4 turnovers per 40 minutes. For comparison's sake, in his nine NBA games with at least 30 minutes, Lin is shooting 54.4% from two, 33.3% from three, 71.0% from the free-throw line and averaging 26.5 points, 3.6 rebounds, 8.6 assists, 0.2 blocks, 2.5 steals and 5.8 turnovers per 40 minutes played. The per-40-minute numbers are not pace-adjusted (the Knicks play much, much faster), and Lin has actually handled the ball more with the Knicks than he was expected to at Harvard, but it is still pretty easy to see that there are obvious links between the college profile and what Lin has done in the pros - especially in that he is shooting almost exactly the same.
This example is not meant to illustrate why he was missed or that Lin should not have been missed by scouts. Technically, I have not done any pre-draft NBA analysis (just post-draft) since starting PredictionMachine.com, so I never actually crunched the numbers to look at Lin (he was not drafted) coming out of college, but my guess is that I would have seen a player not all that dissimilar to Norris Cole (though Cole was not quite as consistent throughout his career at Cleveland State), who played four years in a weak, yet respectable conference, filled the stat sheet, shot the ball well and was undersized. That's not necessarily the point. It so happens that Lin is the perfect example of something I uncovered in college sports over the last month or so (and fully implemented for all games on February 8).
I like to track the performance of our simulation results against other, purely objective, analytical approaches in the hopes of finding ways in which simulation can better account for the intricacies of individual games than power ratings or similar techniques. Many of these analysts are widely known, but I will leave all names out of this because I do not want to give the impression that I understand everything that is going on "under the hood" of another entity (just like I would hope they would do with me). This is also a good way to gauge how people of our ilk are performing relative to the sportsbooks. The hope/assumption is that sports wagering, due to the biased nature of individuals connected to teams, will always remain an inefficient market - even if Las Vegas is flooded with "nerds" like the many bright analysts that have come before, exist now and will likely exponentially increase in volume in the coming years. I also believe that our simulation approach provides the best overall combination of analytics due to the fusion of strength-of-schedule (and ballpark) adjustments (to remove bias from the numbers), accounting for injury and allowance for interaction of players. I believed that was the case two weeks ago and I think it is better now. That all being said, over the course of the week from January 23 through January 30, every "normal" or better college basketball ATS pick that we got wrong agreed with the two most notable (and generally well respected - to the point that I know that most sportsbooks check their numbers frequently) analytical approaches to the sport. There is/was certainly the possibility that this was a fluke, but analysis of the period after that week and before modifications were made (and, to the best of our ability, tracking back through historical performance) has indicated a strong correlation between our strong-opinion, incorrect picks and those approaches.
Something that most of us are/have been doing is off - and there are very few things that could be throwing it off in the same direction.
Simulation is pretty straightforward. It's a choose-your-own-adventure book with a finite set of options at each decision. The complications to what we do lie outside of the actual engines. Garbage in can equal garbage out, so the manipulation of the data is of the utmost importance. I have a great deal of experience (and confidence gained from that experience) in doing this over eight years professionally (as do the previously alluded to analysts - and then some), yet that does not mean that I/we cannot miss anything or be humbled by analysis that improves upon what I/we have done. The most difficult, yet important things to get right when manipulating data are: removing bias from previous opponents (strength-of-schedule adjusting), deciphering what portion of performance in a new season is "real" (as opposed to what career numbers indicate), role changes (mostly having to do with how a coach will use a player) and evaluating the impact of injuries on a player who is going to play (the Rob Gronkowski rule). I will put our track record of handling these issues up against anyone - especially in pro sports. But, when it comes to strength-of-schedule adjusting college sports, I have to acknowledge that I feel as though the books passed us up (temporarily)... Needless to say, I finally figured it out.
I do not mean to claim that all other approaches were doing exactly what we were doing that created inaccuracies, nor that what Vegas is doing with college SOS is perfect. It's an ever-changing business. Sometimes (hopefully most of the time in this analogy - as is actually the case in most sports as well) the offense has the advantage. Sometimes it's the defense. If a talented offense can figure out what the defense is doing, it should be able to exploit it (even more than vice versa). This is not a static process where our opponent will use the same approach indefinitely and we just need to figure out a way to adapt and beat it. To a good degree (while balancing public bias), our opponent (which, in this case, is actually the books and the other players who have the greatest impact on line movements) is trying to adapt and beat us (or at least to understand where we may have inefficiencies and to allow them to be exploited). It's an evolving process where both can always improve - which is what we are always trying to do (it's just not always that obvious how to do it, but I/we spend a huge amount of time looking through numbers, tracking performance and analyzing new approaches to the data manipulation concerns outlined above). As long as the market includes fans though, it will remain exploitable by people and organizations like us.
We cannot give everything away, but the gist of it is that I fully believe that adjusting for opponents in college sports in our analysis was excessive (in most areas - double accounting in some). Basketball provides the best example. Jeremy Lin played in a sport with 345 different teams, on a team that would rank somewhere around the middle of that enormous group and against opponents that would rank anywhere from the top ten to the bottom ten. Yet choosing any sample of 25 consecutive games in his career (25 is around the threshold for statistical significance in this example) - including college and NBA - unveils a player who shoots around 55% from two, 33% from three, 73% from the free-throw line (though FTs were never impacted by opponent in our numbers), has about 1.5 assists for every turnover, rebounds a high percentage of the game's misses relative to his position and is typically the best on the floor with respect to steals per defensive possession. It's the same guy on any court. We trust that is the case for professional athletes, but that concept is true for most DI college athletes as well. Far more so than ever, DI college athletes are talented and consistent. Furthermore, not nearly as much separates the good from the bad as there used to be.
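The "any 25 consecutive games" check described above can be sketched in a few lines of Python. This is only an illustration of the idea, not the engine's code: the function name rolling_pct is made up for this example, and the game log below is synthetic placeholder data, not Lin's actual box scores.

```python
def rolling_pct(makes, attempts, window=25):
    """Shooting percentage over every run of `window` consecutive games.

    `makes` and `attempts` are per-game counts in chronological order.
    Hypothetical helper for illustration only.
    """
    results = []
    for start in range(len(makes) - window + 1):
        made = sum(makes[start:start + window])
        tried = sum(attempts[start:start + window])
        # Skip windows with no attempts rather than divide by zero.
        if tried > 0:
            results.append(made / tried)
    return results


# Synthetic game log (placeholder values): 30 games, 10 two-point
# attempts per game, alternating 5 and 6 makes.
attempts = [10] * 30
makes = [5, 6] * 15

windows = rolling_pct(makes, attempts)
spread = max(windows) - min(windows)
print(f"{len(windows)} windows, low {min(windows):.3f}, "
      f"high {max(windows):.3f}, spread {spread:.3f}")
```

A small spread between the lowest and highest window is what "the same guy on any court" would look like in the numbers. A large spread that lines up with changes in opponent quality would argue for a heavier opponent adjustment.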
In a bit of a crude example (because it never was quite this way in the engine), if a player shoots 40% from three against the tough schedule that Michigan has faced, he is probably not going to shoot 50% from three against the easy schedule that Southern has played. And, he likely would not be able to play against high school kids and hit 60% from three for the season. Defenses do far more to limit shot opportunities than they do to impact a shot's likelihood to go in (especially with three point attempts - this is not an original thought and actually became a topic of much discussion online a day after the final engine tweaks were implemented).
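To make the crude example concrete, here is an equally crude sketch in Python. As noted, it never was quite this way in the engine, so everything here is an assumption made for illustration: the function name adjust_pct, the multiplicative formula, the weight parameter and all of the input percentages are hypothetical.

```python
def adjust_pct(raw_pct, opp_allowed_pct, league_avg_pct, weight=1.0):
    """Opponent-adjust a shooting percentage (illustrative formula only).

    weight=1.0 applies the full opponent ratio, weight=0.0 ignores
    opponents entirely, and values in between dampen the adjustment.
    """
    # Ratio is above 1 when past opponents allowed less than the
    # league average, i.e. the schedule was tougher than normal.
    ratio = league_avg_pct / opp_allowed_pct
    return raw_pct * (1.0 + weight * (ratio - 1.0))


# Hypothetical inputs, not real data.
raw = 0.40         # shooter's three-point percentage
league_avg = 0.345 # assumed average three-point percentage allowed
tough_opp = 0.30   # assumed percentage allowed by his tough opponents

full = adjust_pct(raw, tough_opp, league_avg, weight=1.0)
damped = adjust_pct(raw, tough_opp, league_avg, weight=0.25)

print(f"full adjustment:   {full:.3f}")
print(f"damped adjustment: {damped:.3f}")
```

With these made-up inputs, the full adjustment credits the shooter with several extra percentage points just for his schedule, while the damped version keeps him close to what he actually shot. The second behavior is the direction described in this post, though the actual weights and mechanics in the engine are not disclosed here.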
"It does not just matter what you have done in sports, but against whom you have done it" - a phrase I have uttered many times - is still valid, yet not to the degree or in the same ways that I/we had been applying it since the site's inception (January 2010). What does this mean? Well, since modifying the numbers, it has definitely meant that our ability to draw perceived value from analyzing basketball games against the number has dropped significantly. Far fewer "normal" or better plays are being generated. Actual value has increased, however, because performance has improved. We do not have the extra noise of invalid SOS adjustment clouding the picks. The engine takes over and better assesses how teams relate. Understandably, it is still early (and we still expect to do better in March when teams play neutral court games with high motivation and we have more public to exploit), but I personally feel very comfortable with what we have done. AND, I look forward to rolling it out for college football as well.
As usual, if you have any of your own comments about this article or suggestions about how to improve the site, please do not hesitate to contact us at any time. We respond to every support contact as quickly as we can (usually within a few hours) and are very amenable to suggestions. I firmly believe that open communication with our customers and user feedback is the best way for us to grow and provide the types of products that will maximize the experience for all. Thank you in advance for your suggestions, comments and questions.