TidyTuesday, a data science and R learning community, recently published my beach volleyball match database. I wanted to take the time to share some of my favorite community-made use cases.
1. Ricardo Santos is the Most Traveled Player
Ricardo Santos traveled to 14 countries in 2005 alone, making him one of the most traveled players on the tour! It shows some amazing charisma to go from tournament to tournament.
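For anyone who wants to try a count like this themselves, here is a minimal sketch using only the Python standard library. It is not the code behind the original visual. The rows are toy data invented for illustration, and the column names (`year`, `country`, `w_player1`, `w_player2`, `l_player1`, `l_player2`) are assumptions about a one-row-per-match layout, so check them against the data dictionary first.

```python
from collections import defaultdict

# Toy rows invented for illustration; column names are assumptions.
matches = [
    {"year": 2005, "country": "Brazil",
     "w_player1": "Player A", "w_player2": "Player B",
     "l_player1": "Player C", "l_player2": "Player D"},
    {"year": 2005, "country": "Germany",
     "w_player1": "Player C", "w_player2": "Player D",
     "l_player1": "Player A", "l_player2": "Player B"},
    {"year": 2005, "country": "Germany",
     "w_player1": "Player A", "w_player2": "Player B",
     "l_player1": "Player E", "l_player2": "Player F"},
]

PLAYER_COLUMNS = ("w_player1", "w_player2", "l_player1", "l_player2")


def countries_per_player_year(rows):
    """Count the distinct host countries each player appeared in, per year."""
    visited = defaultdict(set)
    for row in rows:
        for column in PLAYER_COLUMNS:
            # A set ignores repeat matches played in the same country.
            visited[(row[column], row["year"])].add(row["country"])
    return {key: len(countries) for key, countries in visited.items()}


counts = countries_per_player_year(matches)
```

Players have to be collected from both the winning and the losing columns, otherwise a tournament where a player lost every match would not count as a trip.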
2. Brazilian Teams are the Most Dominant
Brazilians had the most wins on the FIVB World Tour, followed by Americans and Germans. Winning teams tend to be older and taller than losing teams. However, since 2000, the tour has become younger and taller overall. I really like the clarity of the conclusions and the clean graphs with flags.
Another post had a similar graph of Brazilian teams with the most wins.
Good catch on their part to filter out the AVP tour (American players only) in their code, as including it could have biased their results.
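To see why the filter matters, here is a small sketch in plain Python. The rows are made up for illustration, and the column names `circuit` and `w_p1_country` are assumptions about the dataset layout. I also assume that both partners on a team share a country, so one player's country stands in for the team.

```python
from collections import Counter

# Toy rows invented for illustration; column names are assumptions.
matches = [
    {"circuit": "FIVB", "w_p1_country": "Brazil"},
    {"circuit": "FIVB", "w_p1_country": "United States"},
    {"circuit": "AVP", "w_p1_country": "United States"},
    {"circuit": "AVP", "w_p1_country": "United States"},
    {"circuit": "FIVB", "w_p1_country": "Brazil"},
]


def wins_by_country(rows, circuit=None):
    """Count wins per country, optionally restricted to a single circuit."""
    return Counter(
        row["w_p1_country"]
        for row in rows
        if circuit is None or row["circuit"] == circuit
    )


all_wins = wins_by_country(matches)                   # AVP matches included
fivb_wins = wins_by_country(matches, circuit="FIVB")  # AVP matches filtered out
```

In this toy data the United States leads when every match is counted, but Brazil leads once the AVP rows are dropped, which is exactly the kind of bias the filter avoids.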
3. Machine Learning and Beach Volleyball
Julia Silge’s blog has a great step-by-step guide on building an XGBoost model to predict winning teams based on their stats. It’s not surprising to see the model pick kills and errors as the most important variables, as they translate directly to points scored.
I like the box-and-whisker plots comparing the distributions of stat features by gender and winning team. From the visual, it looks like blocks also show a big margin between winning and losing teams, which is something the model should pick up on.
A second model, a random forest, included hitting percentage (i.e. attacking efficiency), which turned out to be the most predictive feature in picking winners. Both models reach relatively good performance of ~83-84% accuracy, with good recall and precision metrics. However, a major drawback of both approaches is that they use stats recorded after a match has finished. To truly forecast who will win a match beforehand, the models would have to be retrained using only historical data from before each individual match. Nonetheless, their findings are helpful for seeing which skills are most relevant to winning a match.
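Here is a rough sketch of what "only data from before each match" could look like in practice. It is not taken from either model. Hitting percentage is computed with the standard formula, (kills - errors) / attack attempts. The rows are toy data invented for illustration, and the field names (`winner`, `loser`, `w_kills`, and so on) are hypothetical simplifications, not the real column names.

```python
from collections import defaultdict


def hitting_pct(kills, errors, attempts):
    """Standard hitting percentage: (kills - errors) / attack attempts."""
    return (kills - errors) / attempts if attempts else 0.0


# Toy rows invented for illustration; field names are hypothetical.
matches = [
    {"date": "2019-01-01", "winner": "Team A", "loser": "Team B",
     "w_kills": 20, "w_errors": 5, "w_attacks": 50,
     "l_kills": 15, "l_errors": 10, "l_attacks": 50},
    {"date": "2019-01-02", "winner": "Team A", "loser": "Team C",
     "w_kills": 18, "w_errors": 3, "w_attacks": 50,
     "l_kills": 10, "l_errors": 5, "l_attacks": 50},
    {"date": "2019-01-03", "winner": "Team B", "loser": "Team A",
     "w_kills": 22, "w_errors": 2, "w_attacks": 50,
     "l_kills": 12, "l_errors": 8, "l_attacks": 50},
]


def pre_match_features(rows):
    """For each match, average each team's hitting pct over EARLIER matches only."""
    history = defaultdict(list)  # team -> hitting pcts from matches already played
    features = []

    def prior_average(team):
        past = history[team]
        return sum(past) / len(past) if past else None  # None = no history yet

    # ISO-formatted date strings sort chronologically.
    for row in sorted(rows, key=lambda r: r["date"]):
        # Read the history first, so the current match's stats cannot leak in.
        features.append({
            "date": row["date"],
            "winner_prior_hitpct": prior_average(row["winner"]),
            "loser_prior_hitpct": prior_average(row["loser"]),
        })
        # Only after recording the features do we add this match to the history.
        history[row["winner"]].append(
            hitting_pct(row["w_kills"], row["w_errors"], row["w_attacks"]))
        history[row["loser"]].append(
            hitting_pct(row["l_kills"], row["l_errors"], row["l_attacks"]))
    return features


features = pre_match_features(matches)
```

The ordering inside the loop is the whole point: features are read before the current match updates the history. In a real model you would also want to shuffle which team is listed first, since labeling the columns "winner" and "loser" gives away the answer.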
4. Network Clusters of Beach Volleyball Players
This really cool visual shows the networks of player partnerships along with clusters of the highest and lowest win rates (circled). I was able to reverse-engineer the code to see some really interesting insights. For example, the clusters highlight the highest-winning Brazilian pairings on both the men’s and women’s sides! They include Emanuel Rego playing with Tande Ramos, Ricardo Santos, and Alison Cerutti. The highest-winning cluster also has Alison Cerutti playing with Bruno Oscar Schmidt and other Brazilian players.
On the women’s side, the highest-winning cluster has Larissa Franca playing with Talita Antunes and Juliana Felisberta, along with other Brazilian pairings. Super cool!
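The building block for a network like this is a win rate per partnership. Below is a minimal sketch of that step in plain Python. It is not the code behind the visual, and it leaves out the clustering entirely. The rows are toy data invented for illustration, and the column names are assumptions about the dataset layout.

```python
from collections import defaultdict

# Toy rows invented for illustration; column names are assumptions.
matches = [
    {"w_player1": "Player A", "w_player2": "Player B",
     "l_player1": "Player C", "l_player2": "Player D"},
    {"w_player1": "Player A", "w_player2": "Player B",
     "l_player1": "Player E", "l_player2": "Player F"},
    {"w_player1": "Player D", "w_player2": "Player C",
     "l_player1": "Player B", "l_player2": "Player A"},
]


def partnership_win_rates(rows):
    """Return the win rate for every pair of partners seen in the data."""
    record = defaultdict(lambda: [0, 0])  # pair -> [wins, matches played]
    for row in rows:
        # frozenset makes the pair order-insensitive: (A, B) equals (B, A).
        winners = frozenset((row["w_player1"], row["w_player2"]))
        losers = frozenset((row["l_player1"], row["l_player2"]))
        record[winners][0] += 1
        record[winners][1] += 1
        record[losers][1] += 1
    return {pair: wins / played for pair, (wins, played) in record.items()}


rates = partnership_win_rates(matches)
```

Each pair becomes an edge between two players, and the win rate can be used as the edge weight when drawing or clustering the network.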
5. How Height Influences Blocking
This clean visual shows the unsurprising correlation between a player’s height and blocks. It’s a well-executed, straightforward graph that communicates the message well.
A similar visual shows that hitting percentage does not appear to decline with age.
Check out some of the resources below to help you get started with the data:
- Data dictionary and some starter code to download the data, although you should download it directly from my GitHub repo to get the latest matches
- David Robinson’s guide to reshaping the data to do useful analysis
Thanks to the TidyTuesday community for taking the time to share their work with everyone.
Edited by: Chase Youngblood