Welcome back tennis fans… and no, we didn’t forget about your thirst for tennis and data analytics! Our new web app allows you to easily see a player’s ranking history, similar to the ATP site, but with an added ranking forecast, packaged as an interactive visualization not available anywhere else!*
Check out the app here (15-second load time**):
More info below.
What’s the intent behind this app?
The intent is to visualize the biggest monthly predicted rank movers in the ATP top 500. Since the model takes into account recent performance and rankings, you can see whether a player’s trajectory is predicted to continue, level off, or reverse. By comparing the red (predicted ranking) and blue (actual ranking) lines, you can tell whether a player has historically over- or underperformed relative to their trend.
For example, Juan Martin del Potro was predicted to finish 6th in June 2018 but actually outperformed his forecast by finishing 4th (as of the June 11th ATP rankings). He received a boost from his Semi-Finals finish at the 2018 French Open, a tournament where he only reached the Round of 32 the year before.
By viewing the historical trend and adjusting Advanced Options: Add Nearby Ranked Players, you can see how your favorite player compares to others over time.
Among the players ranked within 10 spots of del Potro in June 2018 (rank 1 to rank 16), del Potro was the worst-ranked 24 months earlier. This means that he made the biggest ranking improvement (after coming back from an injury) compared to the others, who were already in the top 70.
You can also use this functionality to compare similarly ranked players and their stats. For example, del Potro had a 1st Serve In % of around 68% in May 2018, compared to 72% for John Isner, who leads the selected cohort.
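As a rough illustration of the nearby-players cohort, here is a minimal Python sketch. It assumes the window is centered on a single rank and clamped so it never goes below rank 1, which matches the "rank 1 to rank 16" window quoted above for a center of 6. The function names and the sample players are hypothetical and are not taken from the app's code.

```python
def nearby_rank_window(center_rank, spread=10):
    """Return the (lowest, highest) rank in the cohort around center_rank.

    Assumption: the window is clamped at rank 1, so a player ranked 6th
    with spread=10 gets the window 1..16 rather than -4..16.
    """
    lowest = max(1, center_rank - spread)
    highest = center_rank + spread
    return lowest, highest


def select_cohort(ranks_by_player, center_rank, spread=10):
    """Keep only the players whose rank falls inside the window."""
    lowest, highest = nearby_rank_window(center_rank, spread)
    return {
        name: rank
        for name, rank in ranks_by_player.items()
        if lowest <= rank <= highest
    }


# Hypothetical ranks, for illustration only.
sample_ranks = {"Player A": 1, "Player B": 6, "Player C": 16, "Player D": 17}
cohort = select_cohort(sample_ranks, center_rank=6)
```

With these sample values, Player D (rank 17) falls just outside the window and is dropped from the cohort.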
What do we mean by live?
Live in this context means that predictions are automatically generated on demand using the newest data available. Our infrastructure now pulls new match data, rankings data, and player data every week after an ATP tournament is played. This new data is used to generate ranking predictions 1 month ahead. This differs from our past forecasts, which had to be updated by hand, as in our 2017 Indian Wells match forecasts.
Which month is being forecasted?
If today’s day of the month is before the 15th, the current month is being forecasted using historical data. This means that even if newer data is available, such as a ranking file for the current month, it is not used for predictions. From the 15th onward, the next month is being forecasted (this logic may be updated in the future).
Data is typically updated weekly, on the Tuesday after a tournament is played. Match data is attributed to the month of the first day of the tournament. For example, although the French Open ran from May 27 to June 10, all of its match results are attributed to May.
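The cutoff rule can be sketched in a few lines of Python. This is only an illustration of the logic as described here, not the app's actual code. The function name and the December rollover handling are assumptions.

```python
from datetime import date
from typing import Tuple

CUTOFF_DAY = 15  # from the post: before the 15th, forecast the current month


def forecast_month(today: date) -> Tuple[int, int]:
    """Return the (year, month) being forecasted on a given calendar date."""
    if today.day < CUTOFF_DAY:
        return today.year, today.month
    # On or after the 15th, forecast the next month (assumed to roll over
    # into January of the following year when today is in December).
    if today.month == 12:
        return today.year + 1, 1
    return today.year, today.month + 1


early_june = forecast_month(date(2018, 6, 11))  # (2018, 6)
mid_june = forecast_month(date(2018, 6, 15))    # (2018, 7)
```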
(you can sort the table by clicking on any column header)
Once Nadal won the French Open, those matches were attributed to his May stats. Prior to June 15th, the model forecasts his June ranking and, given his recent performance (winning the French Open), has him at #1.
There are 2 things that actually make me believe this was a good forecast:
- The model can’t physically know Roger Federer was skipping the clay season. It does have some data points about his performance last year (when he also skipped it), but that signal is indirect.
- Rafael Nadal won the French Open last year, so he has to defend those ranking points. Any finish other than a victory means he could drop in the rankings.
Prior to the start of Wimbledon, Denis Kudla, an American player, was forecasted to move up 24 spots in the rankings in July. This forecast was based on his Semi-Finals run in Halle, his recent trend of improving his ranking every month, and his historically good performance during the grass season. Unfortunately, he ran into the #17 seed, Lucas Pouille, in the 1st Round at Wimbledon and lost in 4 sets.
Why do the rankings have decimals?
Rankings are typically published weekly, so if a player is ranked #1 for 1 week and #2 for 1 week during a month, the table shows the average, or 1.5.
Why do the points differ from the ATP site?
Same as with rankings, points are averaged for a given month.
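To make the averaging concrete, here is a small Python sketch that groups weekly values by calendar month and takes the mean. The same function works for rankings or points. The function name and the sample dates are made up for illustration, and the app may bucket weeks differently.

```python
from collections import defaultdict
from datetime import date
from statistics import mean


def monthly_average(weekly_values):
    """Average weekly (date, value) pairs within each (year, month)."""
    buckets = defaultdict(list)
    for published_on, value in weekly_values:
        buckets[(published_on.year, published_on.month)].append(value)
    return {month: mean(values) for month, values in buckets.items()}


# Hypothetical player: ranked #1 one week and #2 the next week in the same month.
weekly_ranks = [(date(2018, 6, 4), 1), (date(2018, 6, 11), 2)]
averaged = monthly_average(weekly_ranks)  # {(2018, 6): 1.5}
```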
Why does it take so long to load?
There is a lot of data being loaded, both to generate new predictions and to measure the performance of the model. Around 1,000 features are generated for each player. This means that when predicting, for example, a player’s July 2018 ranking, the model uses data from June 2018 and goes backwards.
(An original version of the app took over 45 seconds to load. This has been improved to under 15 seconds)
How does the model work? How well does the model perform?
The machine learning model is trained to minimize the error when predicting a player’s future ranking points. Players are then ordered based on the predicted points and a ranking is assigned.
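The second step, turning predicted points into a ranking, can be sketched in Python as follows. The player names and point totals are invented, and the tie-breaking rule (ties keep their input order) is an assumption, since the post does not say how ties are handled.

```python
def assign_ranks(predicted_points):
    """Map each player to a rank, where rank 1 has the most predicted points.

    Assumption: players with equal predicted points keep their input order,
    because Python's sort is stable.
    """
    ordered = sorted(
        predicted_points.items(), key=lambda item: item[1], reverse=True
    )
    return {name: rank for rank, (name, _points) in enumerate(ordered, start=1)}


# Hypothetical predicted point totals.
predictions = {"Player A": 500.0, "Player B": 900.0, "Player C": 700.0}
ranking = assign_ranks(predictions)
```

Here Player B comes out at rank 1, Player C at rank 2, and Player A at rank 3.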
As of the June 11th rankings, the model’s predictions are, on average, within 63 points of a player’s actual total. Since the ATP top 500 players range from 70 points to over 10,000 points, this performance is in line with a solid model.
Other metrics are calculated, broken out by whether points or ranking is being predicted. They can be seen by clicking on the performance tab. % Correct Direction measures the percentage of the time that the model correctly predicts whether a player will move up or down.
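For readers who want to reproduce these kinds of metrics, here is a hedged Python sketch. It reads "within 63 points on average" as a mean absolute error, which is one plausible interpretation; the post does not name the exact metric. The treatment of players who do not move (counted as a hit only if the model also predicts no move) is likewise an assumption, and all numbers are invented.

```python
def mean_absolute_error(actual, predicted):
    """Average absolute gap between actual and predicted values."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def _sign(x):
    """Return -1, 0, or 1 depending on the sign of x."""
    return (x > 0) - (x < 0)


def pct_correct_direction(previous, actual, predicted):
    """Percentage of players whose predicted move matches the actual move.

    A move is the change from the previous value; directions are compared
    by sign, so 'no change' only matches a prediction of no change.
    """
    hits = sum(
        _sign(a - prev) == _sign(p - prev)
        for prev, a, p in zip(previous, actual, predicted)
    )
    return 100.0 * hits / len(previous)


# Hypothetical points for three players.
mae = mean_absolute_error([1000, 500, 250], [950, 560, 240])  # 40.0

# Hypothetical rankings for four players: last month, actual, predicted.
direction = pct_correct_direction(
    previous=[10, 20, 30, 40],
    actual=[8, 25, 30, 35],
    predicted=[9, 18, 31, 30],
)  # 50.0
```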
If you’ve read this far, no doubt you’ve noticed the coming soon sections of the app! Next up, we will be remaking our match forecasting model to predict future match results.
We’ll also work to enhance this model by adding more data points, such as separate win categories for the tournament level, to further improve the accuracy of the model.
Let us know if you have any feedback on what you’d like to see next.
* A search revealed a similar app, albeit without any visualizations and an unknown prediction scheme: https://live-tennis.eu/en/forecast-atp-ranking
** One of the goals of this blog is to see how far I can push free or open source software/resources such as R, Tableau Public, Amazon AWS, Google Drive, GitHub, WordPress, and Shiny to create useful data apps. Hence, the app is run on https://www.shinyapps.io free tier, which limits it to 25 hrs of up-time per month, limited CPU/memory usage, and a limited number of concurrent users. If the app is slow or non-responsive, check back later.