Sports analytics: Nate Silver, Jeff Ma and Daryl Morey discuss all things data

This weekend, geeks of all shapes and sizes gathered at the Boston Convention Center for the Sloan Sports Analytics Conference. Exactly what counts as analytics remains a bit fuzzy, but the crowd was a mix of analysts from NBA, NFL, NHL, MLB, and Premier League teams, statistical consultants, eager engineering students, and a good number of business students.

Michael Lewis, Mark Cuban, Nate Silver, and Daryl Morey open the Sports Analytics Conference. Revenge of the Nerds indeed!

As for the content itself, the early returns were a mixed bag. On one hand, the breadth of sports covered was breathtaking. We all know that analytics has swept through the major professional sports, but who knew so much attention was being given to tennis, triathlon, MMA, NASCAR and video games? It was very cool to see people digging into these different games, each with its own set of challenges for making predictions.

On the other hand, analytics in all of these fields, including the major sports, is still in its infancy. Much of the work centered on finding simple correlations in the data, with little or no attempt to address the underlying mechanisms driving the overall pattern. There were no new Bayesian approaches to sports analytics on display, though I suspect those are largely being kept under wraps. The inherent weakness of most of the analyses shown at this conference is that they apply only to a very narrow set of conditions. Faced with novel conditions, it is unclear how well these models will hold up. That's not a great proposition for a realm as dynamic as sports.

The most interesting session included Nate Silver, Jeff Ma and Daryl Morey. Silver is the statistician behind FiveThirtyEight, whose accurate forecasts of the 2008 and 2012 elections made him famous; he is also the author of The Signal and the Noise. Jeff Ma used his statistical background in the card-counting strategy that made him the subject of Bringing Down the House: The Inside Story of Six M.I.T. Students Who Took Vegas for Millions. Daryl Morey is the Houston Rockets general manager who brought an analytics approach to team management and co-founded the conference.

During the panel, Silver and Ma both emphasized the need to separate the analytics process from outcomes. Teams that judge results too narrowly, over short time horizons or small sample sizes, turn any analytics effort into a dicey proposition. Patience and the discipline to stick with the model's predictions are essential, but they are difficult to maintain when short-term success is so (understandably) important to most teams. Losing streaks and short-term underperformance are powerful motivators to deviate from the plan. Unfortunately, expecting data to deliver a definitive prediction of success is far less realistic than using it to understand and reduce risk. Benjamin Alamar, Professor of Sport Management at Menlo College, who was also on the panel, suggested that analytics simply serves to narrow the range of noise. Successful teams must understand that analytics is a long-term strategy.
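To put a number on why short samples mislead, here is a minimal sketch under a simplifying assumption that is mine, not the panel's: each game is an independent trial won with a fixed probability. The 55% win probability and the helper name `win_pct_standard_error` are hypothetical. The standard error of an observed winning percentage is sqrt(p(1 - p)/n), so it shrinks only with the square root of the number of games.

```python
import math


def win_pct_standard_error(p: float, games: int) -> float:
    """Standard error of an observed winning percentage, treating each game
    as an independent trial won with fixed probability p (a simplifying
    assumption: real games are not independent coin flips)."""
    return math.sqrt(p * (1 - p) / games)


# A hypothetical team that truly wins 55% of its games.
true_p = 0.55
for games in (10, 20, 82):
    se = win_pct_standard_error(true_p, games)
    # Under a normal approximation, roughly 95% of observed records
    # fall within about 2 standard errors of the true rate.
    low, high = true_p - 2 * se, true_p + 2 * se
    print(f"{games:>3} games: observed win% likely between {low:.3f} and {high:.3f}")
```

Under these assumptions, a genuinely good team's 10-game record can plausibly land anywhere from well below .500 to well above it, while over 82 games the plausible range narrows considerably. That is why judging a process on a short stretch is so risky.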

MIT Sloan Sports Analytics Conference

Owners, the panel went on to explain, often want experts who can tell them which moves will guarantee a win, but analytics can't do that. At best, it can recommend strategies that shift the odds. This puts the statistician in a bind, because they generally work alongside scouts who KNOW they are right, whereas the statistician can only speak in probabilities. Morey described how the Rockets, lacking any superstars, needed to find high-variance players. High variance here means high risk: the players they acquired could have flamed out, but they also gave the team a chance to improve markedly. Silver mentioned that the Blue Jays under J.P. Ricciardi made a habit of drafting low-variance college players who were closer to the majors but lacked the upside of players taken out of high school. Over the long term, that strategy didn't appear to work out all that well, especially in an AL East loaded with high-end talent.
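Morey's reasoning can be illustrated with a minimal sketch, assuming player outcomes are normally distributed on some made-up rating scale. The threshold, mean and spreads below are hypothetical, as is the helper name `prob_exceeds`; the point is that when the expected value sits below the bar for a difference-maker, widening the spread raises the chance of clearing it.

```python
import math


def prob_exceeds(threshold: float, mean: float, sd: float) -> float:
    """P(X > threshold) for X ~ Normal(mean, sd), via the complementary error function."""
    z = (threshold - mean) / sd
    return 0.5 * math.erfc(z / math.sqrt(2))


# Hypothetical numbers: two prospects with the same expected rating (50 on a
# made-up scale) but different spreads of possible outcomes.
star_level = 70  # rating assumed necessary to become a difference-maker
expected = 50
for label, sd in (("low variance", 5), ("high variance", 15)):
    p = prob_exceeds(star_level, expected, sd)
    print(f"{label}: P(reaching star level) = {p:.4f}")
```

The same widening also raises the odds of a flameout, which is exactly the trade-off Morey described: a team without superstars may rationally accept more downside in exchange for a real shot at the upside.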

Jeff Ma suggested that successful outcomes can serve as a measure of progress for an analytics system, but only over long time periods. Cleveland Browns president Alec Scheiner offered a glimpse of that kind of long-term discipline. It's generally understood, he contends, that a team should not give up draft picks lightly to move up in the draft, but we also know that quarterback is one of the most valuable positions. That made for a difficult choice about trading up in the 2012 NFL draft to pick Robert Griffin III. Scheiner contends that the models said no and that picking RG3 was, “at its core,” the wrong pick. Even in the face of so much early success, one season was not a large enough sample to change his opinion. Scheiner suggested that it may take five years to properly assess RG3's value, given his potential injury risk.

Morey and Silver then drew comparisons to weather forecasters, who sometimes bias their predictions toward rain because their job incentives are asymmetric. Wrongly forecasting rain on a sunny day is forgiven, but missing a storm is held strongly against them. This, they contend, is where sports analytics can help: by avoiding the big mistakes. Another interesting point was that weather forecasters often avoid forecasting a 50 percent chance of rain, because who needs a forecast that appears no better than a coin flip? I wonder how often the same thing happens in sports analytics.
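The forecasters' incentive problem reduces to a standard expected-cost rule. This is a minimal sketch of that textbook rule, not a model presented on the panel; the function names and the 4-to-1 cost ratio are assumed for illustration. A warning minimizes expected cost whenever the probability of a storm exceeds cost_false_alarm / (cost_false_alarm + cost_miss).

```python
def should_warn(p_storm: float, cost_miss: float, cost_false_alarm: float) -> bool:
    """Warn when the expected cost of staying quiet (p * cost_miss) exceeds
    the expected cost of warning ((1 - p) * cost_false_alarm)."""
    return p_storm * cost_miss > (1 - p_storm) * cost_false_alarm


def warning_threshold(cost_miss: float, cost_false_alarm: float) -> float:
    """Break-even storm probability above which warning minimizes expected cost."""
    return cost_false_alarm / (cost_false_alarm + cost_miss)


# Symmetric costs: warn only when a storm is more likely than not.
print(warning_threshold(1, 1))  # 0.5
# Missing a storm assumed to be 4x as costly as a false alarm: warn at 20%.
print(warning_threshold(4, 1))  # 0.2
print(should_warn(0.3, 4, 1))   # True
```

The more a missed storm costs relative to a false alarm, the lower the bar for warning, which produces exactly the rain bias the panel described. The same arithmetic applies in sports whenever one kind of mistake is punished far more harshly than the other.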

So what is the real value of computer models versus human experience and intuition? Silver contends that humans are good at pattern recognition. When the NBA came back from the lockout, it was clear that poor performance was related to players being rusty and out of shape. Humans grasp that sort of context easily, whereas computers cannot. However, overconfidence in this very skill also leads to many errors in judgement. There are special cases where the models do not apply and human intuition can help, but one must always guard against calling every case special. In his review of Silver's book, Princeton professor Sam Wang sums it up best: “Heuristics are no substitute for careful and rigorous study – in other words, expertise”.

Ma agreed that one strength of computers is that they check bias, but cautioned that models must be not only appropriate but also stable over time. Morey offered an example: the inability of existing models to assess the potential of Ivy League players was one reason Jeremy Lin was overlooked. To this, Silver added that drawing analogies from other systems may help identify which characteristics that work in other environments carry over.

So, how does one create a culture that values process over outcomes? Ma argues that the big coming shift is that people who hate numbers will begin to embrace the analytics approach. In this respect, effective communication is paramount. Morey's position with the Rockets illustrates the challenges that lie ahead. Even though the team's moves have been consistent with its analytics approach, the Rockets have not yet won a title under Morey. The owners nonetheless continue to believe firmly that sound process will lead to successful outcomes.

Visual Tracking

Another theme this weekend was the rising interest in visual tracking data. For instance, Sportvision has developed its PITCHf/x system to track pitches in 3D. The baseball tracking system can be used to analyze how effective pitchers are against hitters (see Mariano Rivera video below). The technology holds much promise because teams may track not only the flight of the ball but also the movement of body parts, improving injury prevention by picking up small changes in arm slot or delivery that may not be obvious to the eye.

Another company, SportVU, has adapted missile-tracking technology to follow the motion of players, referees and the ball on NBA courts. The system, which uses six cameras tracking motion at 25 Hz, is currently installed in about half of NBA arenas. These new spatial approaches have the potential to fundamentally change the scope of what analytics can provide. The most significant bottleneck appears to be that only a select pool of experts has the skills to make sense of the large amount of data produced, which brings us to our next topic: Big Data.

Big Data

Another recurring theme was the need to deal with big data. The volume of unstructured data generated for every player in every game will only grow as visual tracking systems become more commonplace. The question is whether much of the Big Data being collected provides any robust insight. Teams are still struggling to find systems that let people ask the right questions and get the information they need to make management decisions. On this front, there were many commercial demonstrations from big vendors whose products embed tools for manipulating large data sets. Since this isn't an advertisement for those companies, suffice it to say that many tools are available, at a cost.

Communication challenges

Another challenge touched on was the need to communicate findings to decision-makers as actionable information. For instance, SportVU can deliver data to teams within 60 seconds, but can that data be communicated in a way that is useful for making in-game decisions? The only real message here was that the onus lies on the communicator. Clear ideas, actionable information and effective visuals should be the goal of any analytics work.

The disconnect between clear communication and science literacy was on full display in the data visualization session. Martin Wattenberg of Google emphasized the need to indicate error on graphs, a simple point from any Science 101 class. On the very same stage, Ben Fry of Fathom followed with a bubble plot of Wonderlic scores for the different positions in football. The graph was clean and showed differences in scores among positions. However, it lacked any measure of variance, leaving the reader with no sense of whether the differences are actually meaningful. Remember: the signal and the noise!
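Adding a measure of variance need not be hard. Below is a minimal sketch that computes a mean with an approximate 95% confidence interval using the normal approximation (mean ± 1.96 × s/√n). The function name `mean_with_ci` and both groups of scores are made up for illustration; they are not the Wonderlic data from the talk. With samples this small, a t-based interval would be somewhat wider.

```python
import math
import statistics


def mean_with_ci(values, z=1.96):
    """Return (mean, half_width) of an approximate 95% confidence interval
    for the mean, using the normal approximation mean ± z * s / sqrt(n).
    For small samples a t critical value would give a wider, more honest interval."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two observations to estimate spread")
    m = statistics.mean(values)
    se = statistics.stdev(values) / math.sqrt(n)  # sample SD divided by sqrt(n)
    return m, z * se


# Made-up scores for two hypothetical groups, purely for illustration.
group_a = [22, 25, 19, 28, 24, 21]
group_b = [24, 27, 20, 30, 26, 23]
for name, scores in (("A", group_a), ("B", group_b)):
    m, hw = mean_with_ci(scores)
    print(f"group {name}: {m:.1f} ± {hw:.1f}")
```

With these toy numbers the two intervals overlap substantially, so the roughly two-point gap between the group means should not be read as meaningful. Overlapping intervals are only a rough visual check, not a formal test, but even that much context was missing from the bubble plot.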

Where were the Seattle teams?

Sadly, there were no representatives from the Seattle Seahawks, Seattle Mariners or Seattle Sounders at the conference. Maybe the Seattle statheads were too busy analyzing the effect of the proposed Sonics arena on local traffic patterns. Not surprisingly, the Patriots and 49ers had the largest NFL contingents. The Canucks and Oilers were both there, but nobody from the Calgary Flames.

