Great (and Not-So-Great) Expectations
Posted by Neil Paine on October 31, 2008
The 2008-09 NBA season is finally getting started, and just about every fan and media member is concerned with what we can expect from each team. So here's a question: from a strictly statistical perspective, which teams in NBA history most exceeded their preseason expectations? By the same token, which teams were the most disappointing in light of their expectations?
Well, first of all, we're going to have to come up with a formula to define "expectations" for every team. Logically, you'd think that past W-L records would be the place to start, so I created 5 linear regression models, regressing a team's winning percentage for the year in question (year Y) on the previous 5, 4, 3, 2, & 1 years (Y-5, Y-4, etc.). The results weren't surprising: the only year that really matters in terms of establishing expectations is Y-1, the season immediately prior to the one we're trying to predict. In other words, for every 2002 Lakers and 1998 Spurs — where Y-1 didn't accurately reflect the true abilities of a team as well as Y-2 did — there are countless other examples of teams whose expectations were best set by simply looking at the previous year's record.
As a matter of fact, we can even do one better than simply looking at W-L records. A common piece of APBRmetrics wisdom is that point differential is actually more informative than winning % when assessing a team's strength. And here at Basketball-Reference, we happen to have the ultimate incarnation of point differential: the Simple Rating System, which also adjusts for strength of schedule. Sure enough, when we use the SRS to run the same series of regressions that we did with winning percentage, we find that Y-1 is again the only season that is significant in establishing our win expectations in year Y — and we also see that the SRS is a slightly better predictor of future performance than winning percentage (R-squared of 0.44 vs. 0.42). Armed with that information, we can use the following equation to create our "expected wins" for any given season:
xWins_Y = 41 + (1.88 * SRS_Y-1)
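In code, the expectation formula is a one-liner. This is just a minimal sketch of the equation above; the function name `expected_wins` is mine, not anything from the site:

```python
def expected_wins(srs_prev):
    """Expected wins in year Y from the prior season's SRS (year Y-1).

    The intercept of 41 is simply a .500 record over an 82-game
    schedule; each point of SRS is worth roughly 1.88 expected wins.
    """
    return 41.0 + 1.88 * srs_prev

# The 2008 Celtics entered the season off a -3.706 SRS:
print(round(expected_wins(-3.706), 1))  # 34.0
```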
So, according to our very simple method for establishing preseason expectations, which teams were the biggest surprises of all-time?
Year  Team  SRS_Y-1  xWins  Wins  Diff
2008  BOS   -3.706   34.0   66.0  32.0
1998  SAS   -7.926   26.1   56.0  29.9
1990  SAS   -7.450   27.0   56.0  29.0
1980  BOS   -4.775   32.0   61.0  29.0
2005  PHO   -2.941   35.5   62.0  26.5
1970  MIL   -5.067   31.5   56.0  24.5
1989  PHO   -4.801   32.0   55.0  23.0
1996  CHI    4.311   49.1   72.0  22.9
1972  LAL    3.264   47.1   69.0  21.9
2000  LAL    2.675   46.0   67.0  21.0
What's the common thread here? Each of these teams either had key offseason additions, or they exploded for an historically great season (or both). In the first category, last year's Celtics obviously added Kevin Garnett, Ray Allen, and James Posey; the '98 Spurs added Tim Duncan & a healthy David Robinson; in 1990, San Antonio added Robinson, Terry Cummings, & Sean Elliott; the '80 Celtics added Larry Bird; the 2005 Suns added Steve Nash; the '70 Bucks added Kareem Abdul-Jabbar; and the '89 Suns added Tom Chambers & a full season of Kevin Johnson. The other 3 teams were already good in Y-1, but each peaked in year Y as one of the greatest teams in NBA history. Needless to say, rare outbursts like that are pretty hard to see coming.
And how about the most disappointing teams of all-time?
Year  Team  SRS_Y-1  xWins  Wins  Diff
1999  CHI    7.244   54.6   21.3  -33.3
1997  SAS    5.975   52.2   20.0  -32.2
1965  SFW    4.390   49.3   17.4  -31.8
1983  HOU   -0.393   40.3   14.0  -26.3
2007  MEM    3.738   48.0   22.0  -26.0
1973  PHI   -3.441   34.5    9.0  -25.5
1953  PHW   -1.071   39.0   14.3  -24.7
1985  NYK    3.789   48.1   24.0  -24.1
1991  DEN    1.562   43.9   20.0  -23.9
2008  MIA   -1.209   38.7   15.0  -23.7

(Note: All teams were pro-rated to an 82-game schedule.)
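The pro-rating mentioned in the note is simple to sketch (the helper name is hypothetical): wins are scaled by 82 over the games actually played, which is how the lockout-shortened 1999 Bulls end up with the fractional 21.3 shown above.

```python
def prorate_to_82(wins, games_played):
    """Scale a team's win total to an 82-game schedule."""
    return wins * 82.0 / games_played

# The 1999 Bulls won 13 games in the 50-game lockout season:
print(round(prorate_to_82(13, 50), 1))  # 21.3
```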
The causes of these collapses are more varied and complex: Michael Jordan et al.'s departure from the Bulls is easily the single biggest mass exodus of talent from any team in NBA history, while other franchises crumbled due to key injuries (Robinson's Spurs, Pau Gasol's Grizzlies), isolated personnel losses (Wilt Chamberlain leaving the Warriors), or just plain old attrition ('85 Knicks, '08 Heat). Catastrophes like these are, for the most part, easier to predict than the pleasant surprises that filled the previous section, assuming you have all the facts in hand: it's always risky to guess how a player will fit within a new team's system, but it doesn't take a rocket scientist to see that a team is going to fall apart without its superstar player(s).
That said, what kind of expectations does the model set for the 2008-09 season? Here's what happens when we apply the equation to last year's SRS numbers:
Year  Team  SRS_Y-1  xWins
2009  BOS    9.307   58.5
2009  LAL    7.344   54.8
2009  UTA    6.867   53.9
2009  DET    6.671   53.5
2009  NOH    5.464   51.3
2009  PHO    5.138   50.7
2009  SAS    5.104   50.6
2009  HOU    4.835   50.1
2009  ORL    4.788   50.0
2009  DAL    4.702   49.8
2009  DEN    3.739   48.0
2009  TOR    2.469   45.6
2009  GSW    2.381   45.5
2009  PHI    0.188   41.4
2009  POR   -0.520   40.0
2009  CLE   -0.525   40.0
2009  WAS   -0.605   39.9
2009  SAC   -1.854   37.5
2009  IND   -1.864   37.5
2009  ATL   -2.228   36.8
2009  CHI   -3.191   35.0
2009  CHA   -4.484   32.6
2009  NJN   -5.146   31.3
2009  MEM   -5.752   30.2
2009  MIN   -6.254   29.2
2009  NYK   -6.543   28.7
2009  LAC   -6.561   28.7
2009  MIL   -6.912   28.0
2009  OKC   -8.037   25.9
2009  MIA   -8.530   25.0
Just eyeballing the list, it looks like Denver and at least one of the Phoenix/Dallas/San Antonio triad are good bets to underperform their expected win totals, while Miami, Cleveland, Philadelphia, and (especially) Houston could exceed these expectations.
As an aside, I wonder how these expected records would stand up in Erich Doerr's projection comparison? I have a hunch that they'd do surprisingly well, since most of the competitors in last year's APBRmetrics prediction challenge (including myself) had standard errors well above 9.0. Which just goes to show that you can have the most sophisticated projection system in the world, but there's a good chance it won't predict the standings any better than last year's Simple Ratings.
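For reference, the "standard error" benchmark being compared here is essentially the root-mean-square error of a set of win projections against actual win totals. A quick sketch of the computation, using made-up toy numbers rather than any actual projections:

```python
import math

def projection_rmse(projected, actual):
    """Root-mean-square error of a set of win projections."""
    n = len(projected)
    return math.sqrt(sum((p - a) ** 2 for p, a in zip(projected, actual)) / n)

# Toy numbers just to illustrate the calculation:
proj = [58.5, 54.8, 25.0]
act = [62, 65, 15]
print(round(projection_rmse(proj, act), 2))
```

A system with a standard error "well above 9.0" is, on average, missing each team's win total by more than nine games.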
October 31st, 2008 at 11:06 am
If you do a regression with both y-1 wins and srs, do you still get an r^2 around .44 since they're so highly correlated?
October 31st, 2008 at 11:10 am
One variable to consider adding would be the previous year's win share weighted average age. That could be one number that might pick up the direction a team's headed in.
October 31st, 2008 at 12:48 pm
Yeah, that's exactly what happens. When I regress SRS_Y-1 and Win%_Y-1 on Win%_Y, the r-squared is still 0.44, because they're so correlated with each other (the r-value between SRS and winning % is .96). In fact, in the multivariate regression, the Win%_Y-1 variable isn't even significant at the 5% level, which just reinforces SRS' predictive superiority over straight-up winning %.
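The collinearity effect described here is easy to see with simulated data (synthetic numbers, not the actual historical seasons; `r_squared` is a hypothetical helper): when two predictors carry nearly the same information, adding the second one barely moves the r-squared.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulate 300 team-seasons where SRS drives next year's winning %,
# and prior-year Win% is itself just noisy SRS (so the two predictors
# are very highly correlated with each other).
srs = rng.normal(0, 4.5, 300)
wpct_prev = 0.5 + 0.03 * srs + rng.normal(0, 0.02, 300)
wpct_next = 0.5 + 0.03 * srs + rng.normal(0, 0.06, 300)

def r_squared(predictors, y):
    """R^2 of an OLS fit of y on the given predictor columns."""
    X = np.column_stack([np.ones(len(y))] + list(predictors))
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return 1 - resid.var() / y.var()

print(round(r_squared([srs], wpct_next), 3))             # SRS alone
print(round(r_squared([srs, wpct_prev], wpct_next), 3))  # nearly identical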
October 31st, 2008 at 1:09 pm
That's a good idea, adding age as a variable, except I'd prefer to weight by minutes or possessions used (or both), to get at the size of a player's role on his team. I wouldn't use Win Shares, though, because they can be negative for really bad players even if they have big roles.
Let's take the 1992-93 Mavericks as an example, since they had several notably negative players by WS playing large roles... If you look at their average age when weighted by minutes played and possessions used, it's basically the same (24.9 by MP, 25.0 by Poss). But if you weight by Win Shares, their average age ends up being 31.2, because they had 7 players with negative WS, including 22-year-old Jim Jackson with the absolute worst single-season WS total of any NBA player for the years we're able to calculate the stat.
But I do think minute- or possession-weighted age could improve the model. Between 2 equally bad Y-1 teams, one old and one young, the younger team should obviously be expected to do better in year Y, and right now our simple model doesn't make that distinction.
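The negative-weight problem is easy to demonstrate with a sketch. The roster below is made up to illustrate the point, not the actual '93 Mavericks numbers: a young player with heavy minutes but negative Win Shares gets a negative weight, which drags the WS-weighted average age far away from reality.

```python
def weighted_avg_age(players, weight_key):
    """Average age weighted by a given stat (minutes, possessions, WS, ...)."""
    total = sum(p[weight_key] for p in players)
    return sum(p["age"] * p[weight_key] for p in players) / total

# Hypothetical roster: the 22-year-old plays the most but has negative WS.
roster = [
    {"age": 22, "mp": 2800, "ws": -1.5},
    {"age": 25, "mp": 2000, "ws": 1.0},
    {"age": 33, "mp": 1200, "ws": 2.5},
]
print(weighted_avg_age(roster, "mp"))  # 25.2 -- sensible
print(weighted_avg_age(roster, "ws"))  # 37.25 -- absurdly old
```

The negative weight effectively subtracts the young player's age from the numerator, which is exactly the distortion described in the 1992-93 Mavericks example above.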
October 31st, 2008 at 4:21 pm
If the previous year is the only year that "really matters in terms of establishing expectations," I wonder what you'd find if you weighted the previous year's performance by month, giving somewhat greater weight to later months. That would more closely capture trade impacts, rookie development, coach/team adjustments, etc. I know the most recent month or game isn't necessarily more predictive on its own, but given those factors it might be worth running some alternatives to see whether the prediction can be improved even further. I'm not sure how much the NBA schedules conference and division games toward the later months; that could be a confounding factor, but perhaps one that could be adjusted for.
November 1st, 2008 at 10:40 am
I hadn't followed all the changes to win shares. I knew decimals were added, but I didn't know it could be negative now. If you just zero out the negative contributors, you probably get good results. Minutes and possessions would, of course, be reasonable too.
November 1st, 2008 at 1:49 pm
If previous year is only year that “really matters in terms of establishing expectations” would that suggest a change from
"Give the 2007-08 season a weight of 5, the 2006-07 season a weight of 4, and the 2005-06 season a weight of 3 and calculate the weighted sum of minutes played."
to
"Give the 2007-08 season a weight of 5, the 2006-07 season a weight of 2-3, and the 2005-06 season a weight of 1-2"?
November 1st, 2008 at 1:57 pm
Team regression doesn't necessarily imply a change for players, but I wonder whether these player-season weights work better than Marcel's. It could be checked, if it hasn't already been.
November 2nd, 2008 at 12:46 am
I don't know exactly where Tango's Marcel weights come from (though I remember Bill James coming up with a similar weighting system for projections), but I do know that player performance is a lot more stable from year to year than team performance. While Y-1 might be the only significant variable in predicting teams, think about how much personnel turnover happens from year to year -- between trades, the draft, free agency, etc., a team's "true talent level" can drastically change over the course of 2+ years.
Players, on the other hand, largely retain their skillsets from year to year; they may improve certain aspects of their game or get older and decline, but, barring injury, "true talent" doesn't change anywhere near as quickly and dramatically at the individual level as it does for teams. That's why I think the Simple Projection System's approach of weighting performance over multiple seasons is appropriate for individual players, even if Y-1 is the only past season that matters when creating expectations at the team level.
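The 5/4/3 weighting quoted earlier in the thread reduces to a simple weighted average, most recent season first (the helper name is hypothetical):

```python
def weighted_stat(seasons, weights=(5, 4, 3)):
    """Weight a player's last three seasons 5/4/3, most recent first,
    as in the minutes-played rule quoted above."""
    return sum(w * s for w, s in zip(weights, seasons)) / sum(weights)

# Minutes played in 2007-08, 2006-07, 2005-06:
print(weighted_stat([3000, 2400, 1800]))  # 2500.0
```

Testing alternative weight sets, like the 5 / 2-3 / 1-2 suggested above, is just a matter of swapping in a different `weights` tuple.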
November 2nd, 2008 at 2:40 am
I shouldn't have tied the two together, but I'd still wonder whether a different weight set does better. If not as big a shift as I suggested, maybe 5, 3.5, 2.5-3. A small difference, but if the goal is nudging the overall result, it might be worth a look.
November 2nd, 2008 at 3:28 pm
"tied"
or perhaps some improvement could be obtained by discounting outlier games to some extent