Monday, August 25, 2014

Combining Scouting And Stats

As hockey fans know, it’s been a big summer for analytics.  While some NHL teams have been using advanced stats behind closed doors for years, the scene was given greater exposure this off-season thanks to organizations plucking various bloggers from the internet and moving them into front offices around the league.

For the past 18 months or so I’ve taken an interest in the topic, looking to learn as much as I could in an effort to see the game from another angle and expand my knowledge base.

Over that time one thing has become very clear to me.  Statistics are a great tool to have, but they are not the perfect solution.  The game of hockey is simply too fast, too random and too complex when compared to baseball, for example, to really be able to put too much stock purely into statistics (at least for the time being). 

With that in mind, it occurred to me that it would be ideal to come up with a metric to combine the statistical side of the game with the scouting part of the game.  This would end up being a number that would roll a player’s skillset and in-game production into one comfortable package.  If done thoroughly across the board, it can give you a ranking player-by-player and team-by-team to show you which players are the best combinations of production and style of play. 

After fiddling around some, I came up with something that accomplishes that at a base-level.  Just a fair warning, some parts of this won’t be explained in explicit detail.  I don’t quite feel like giving away the secrets to this formula online, free for others to use in the exact same way as I will be doing.  Hopefully you can understand my stance with that.

So the first thing I needed to do was convert a scouting report into a numerical “Player Rating”.  This player rating is a metric out of 100 that outlines a player’s skillset, similar to the way a player is listed in an NHL video game.  This number includes 5 main “scouted” traits, each scored on a scale out of 20.  These 5 categories are then added together to give you an overall score, or “PR”, out of 100.  I won’t go into detail about which 5 player traits or categories I have selected for my formula, given the circumstances.  These numbers shouldn’t get plucked out of mid-air, either.  They need to come after watching live games and filing reports, analyzing the strengths and weaknesses of the players at hand.
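To make the arithmetic concrete, here is a minimal sketch of the Player Rating step only.  The five traits are left unnamed on purpose, since they aren’t disclosed here, and the class name, function name and sample scores are all made up for illustration rather than taken from my actual reports.

```python
from dataclasses import dataclass

TRAIT_MAX = 20   # each scouted trait is scored out of 20
NUM_TRAITS = 5   # five traits, so the PR tops out at 100


@dataclass(frozen=True)
class ScoutingReport:
    """Hypothetical container for one scout's numbers on one player."""
    player: str
    trait_scores: tuple  # five scores, one per (undisclosed) trait


def player_rating(report: ScoutingReport) -> int:
    """Add the five trait scores together to get a PR out of 100."""
    scores = report.trait_scores
    if len(scores) != NUM_TRAITS:
        raise ValueError(f"expected {NUM_TRAITS} trait scores, got {len(scores)}")
    for score in scores:
        if not 0 <= score <= TRAIT_MAX:
            raise ValueError(f"trait score {score} is outside 0-{TRAIT_MAX}")
    return sum(scores)


# Invented scores for an invented player, purely to show the addition.
example = ScoutingReport("Hypothetical Forward", (18, 16, 17, 15, 19))
example_pr = player_rating(example)
```

The range checks are only there to catch a mistyped score before it gets added into the total.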

The second part of the equation is where things can (if you choose to) get more complicated.  I won’t go into very much detail about what I have used to generate my “Stat Rating” or “SR”.  This part of the equation could be made more complicated, but when creating the basis of this model I was simply looking to apply it to junior hockey and/or prospects, given the nature of my work as a scout.  Unfortunately, the statistics available for players at the junior level are quite basic.  You get games played, wins, losses, goals, assists, points and plus-minus, but no shots on goal and definitely no minutes played, Corsi or Fenwick.

This created a bit of an issue.  But, despite that challenge I am comfortable with the rating system devised here to capture, in essence, a player’s ability to contribute to his team’s success. 

So after coming up with my formula, I decided to give it a test on some well-known junior hockey players.  My first tests of it showed me that players like Nikita Scherbak (a 1st round pick in 2014 by Montreal) and Nic Petan (a 2nd round pick in 2013 by Winnipeg) scored around the 95 overall mark.  Meanwhile, players with less skill and production ranged anywhere from the 30’s up into the 80’s.

That’s a big range, but to me it sums up junior hockey pretty well.  You have a handful of elite players who light it up, get drafted, signed and jump to pro hockey.  You also have your mid-range guys who may have minor pro careers.  You have your guys who will top out as CIS players.  Then you have your group of guys who are in the league and happily earning money to use toward their education later down the road.  It’s not a simple 0 to 100 scale, as with my formula the best players in the given league will be able to crack the 100 overall mark.

As an example of how some of these early results have ended up looking, I took the time to break down the 2013-14 Saskatoon Blades.  As you can see below, I list the player’s “Player Rating”, “Stat Rating” and a category called “NR”.  The NR is essentially the player’s final and overall grade, a metric of how they stack up against other players on their team and in the league when comparing their combination of on-ice skill and on-ice production.  Please note that the player’s “Stat Rating” only consists of regular season production, leaving out playoff or Memorial Cup results.

I watched every single one of the Blades games last year and used those viewings to rate the players on my scale.  Based on that knowledge of how the team performed on the ice, I would say the above data does a great job breaking down the players with regard to their skill level, production and importance to the team.

As you can see from the data provided, it becomes clear just how much better Nikita Scherbak was than his teammates last season.  His basic stats back that up, as do his independently-tracked analytics from his rookie season.  Hat-tip to Bruce Peter for his work and insight in that area. 

So what can this information be used for exactly?  I think it could potentially be a great tool to analyze junior hockey.  It could be used by CHL teams to track the quality of each player on opposing rosters, as long as you have up-to-date and consistent scouting reports of those players.  It could also be used by CHL teams when looking at their own bantam or first-year player drafts.  The model I’ve created can definitely be transferred to bantam players for the WHL Draft, for example, as long as you have access to those players’ basic stats.  I also think it has some practicality for NHL teams who are looking to draft said CHL players.

So this all seems like a fun concept and a potentially valuable one, but what about the issues it comes with?

- Originality? I have to say that while I’m sure this isn’t a 100% original idea (I’m sure there are pro sports teams who have their own formulas for rating players), I haven’t seen it talked about as much as you might think given the rise of analytics and the always-present knowledge of player ratings for video games. 

- The biggest potential issue I could foresee with this type of a concept is the fact that the scouting side of the equation carries some major variables.  Even if you have one scout, assistant GM or GM doing the number scale on all players, you will still have some slight variance from player to player and team to team.  That’s just the nature of the beast and why many teams look to do their best at avoiding a number system when rating players for a draft.  An 8 out of 10 for one scout is different than an 8 out of 10 for another, as an example.

- The number-based scouting reports also play a hand in another issue.  In order to track a player’s growth, you would need to have an updated “Player Rating” for each year of the player’s development.  For example, it would be ideal to be able to track a 16-year-old’s rookie year in comparison to his draft season as a 17-year-old, rather than just having a career figure represented.  Once again, to avoid this problem you would need to update each skater’s “Player Rating” for each year they are in the given league.  This type of hurdle might be avoidable for an NHL team, but not so much for a junior team.

- Along with that, to REALLY see if this type of system works it would be nice to be able to look back at past NHL Drafts to judge if the system helped predict future NHL success.  You could do a rough example by reading old reports, but it’s not quite the same as doing it “in the moment” as I will be able to do from now on, if I choose to do so.

- The equation will likely tend to favour gifted offensive players.  Based on my formula, the more offensively productive a player is, the higher their NR will be.  Yes, if a player is a great scorer and gets bad Player Rating numbers, it will counteract that to some degree, but not to a 100% perfect level.  In the end this might not be the worst thing, as good offensive production is a pretty telltale sign of a productive pro.  This piece from earlier in the summer proves that, as Sham Sharron out-picked the Vancouver Canucks scouting staff when basing his selections only on a player’s offensive production.

- The equation could be beefed up immensely with more stats.  I did some rough examples of the formula on NHL players using more advanced stats and it really helped give a well-rounded picture of the player.  Adding Corsi to the mix could be a great benefit to this model and I will look to do a piece on that sometime in the future.

- The Player Rating scales are only viable when comparing players who are playing in the same league.  For example an elite skater at the junior level might not be able to step into the NHL and be considered an elite skater.  So, while Nikita Scherbak might be a 91 overall, that doesn’t mean he’s the same quality of player as Jamie Benn, who would likely be in that range on the NHL scale of things.  Apples to oranges.

- While computing junior hockey players’ SRs, you only have so much data to use.  At most, these players can play in the league for 5 seasons so the numbers you can draw from are limited.  Along with that, most players don’t immediately step into the league and produce as rookies.  That obviously means that a player’s SR should change for the positive as they get more games and production under their belt.  This would be a great thing when ranking a team’s depth chart to show who is the most valuable, but does add some challenge when using this technique to try and find prospects for an NHL Draft.

- The final issue that immediately became apparent is the difference between forwards and defencemen in this formula (it’s for skaters only, no goalies).  Given the nature of the positions and the offence produced while playing them, defenders’ numbers are instantly lower than forwards’.  It’s not the end of the world, as you can still compare the positions as separate entities.
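On that last point about forwards and defencemen, treating the positions as separate entities could look something like the rough sketch below.  It only groups and sorts finished NR numbers; it doesn’t touch how the NR itself is calculated.  The function name, the “F”/“D” labels and the sample values are all invented for illustration and are not real ratings.

```python
from collections import defaultdict


def rank_by_position(players):
    """Group (name, position, nr) rows by position and sort each group by NR, best first."""
    groups = defaultdict(list)
    for name, position, nr in players:
        groups[position].append((name, nr))
    return {
        position: sorted(entries, key=lambda entry: entry[1], reverse=True)
        for position, entries in groups.items()
    }


# Made-up players and NR values, not taken from any real roster.
sample = [
    ("Forward A", "F", 88.0),
    ("Defender A", "D", 61.0),
    ("Forward B", "F", 72.5),
    ("Defender B", "D", 66.0),
]
rankings = rank_by_position(sample)
```

This way a defender is only ever ranked against other defenders, so the lower offensive numbers at that position don’t bury him under a pile of forwards.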

Going forward, I hope to investigate this formula more.  I think there is some potential value with it, as long as it’s used the correct way and used consistently.  I think teams are doing themselves a disservice if they don’t have a way to rank players league-wide on a numbers system.  Including stats and eventually advanced stats in that type of a system would only round out the process, give it more depth and essentially allow teams and their hockey operations staff to make better decisions with their on-ice personnel.