A community resource for tennis research topics, ideas, and resources
About: I have more tennis research questions and ideas than I have time to pursue. (I suspect I'm not the only one.) People often ask me for suggestions for research topics. Maybe we can address both issues at once.
How this works: I'm going to share some of my notes, in the form of research topics and questions. If there is published work on the question, I may include a link or citation. Depending on how long the list gets, I may make some attempt to organize by category, maybe even splitting into multiple files. We'll see.
How you can help:
- Add a topic/question
- Add a link or citation to one of the existing topics or questions
- Research something on the list. (And if you do so, add a link to your work.)
If you want to add something, please submit a pull request. As long as your submission is on topic (tennis analytics), doesn't duplicate something already on the list, and isn't overtly self-promotional or obnoxious, I'll accept it.
(These are numbered for convenience, not to imply any kind of ranking.)
-
Are serve-and-volleyers streakier than other players?
-
How often do betting odds favor the lower-ranked player? Is it more frequent on clay? Is it rare enough that using higher ranking as a proxy for market favorite is acceptable?
-
Are breaks more frequent immediately after winning a set? (Jeff might have done this one.)
-
Which players have the greatest variation in serve speed? (Among first serves, or between first and second serves.) Does it have any effect on, say, the success of second serves?
-
Does winning a very long game (e.g. a game with 6+ deuces) influence the outcome of the next game, or represent some more general momentum shift?
-
Can Elo ratings be improved to take surface into account?
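As a starting point for the surface question, here is a minimal sketch of the standard Elo expectancy and update, plus one possible way to fold in surface. The K-factor of 32, the 50/50 weighting, and the `blended_rating` helper are illustrative assumptions, not an established method; finding weights that actually improve predictions is the open question.

```python
def expected_score(rating_a, rating_b):
    """Standard Elo win expectancy for player A against player B."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def update_ratings(winner, loser, k=32.0):
    """Return (new_winner, new_loser) after one match; k=32 is an assumed K-factor."""
    delta = k * (1.0 - expected_score(winner, loser))
    return winner + delta, loser - delta


def blended_rating(overall, surface, surface_weight=0.5):
    """Hypothetical blend of an all-surface rating and a surface-only rating."""
    return surface_weight * surface + (1.0 - surface_weight) * overall


# Made-up example: a clay specialist (1900 overall, 2050 on clay)
# against an opponent rated 1950 both overall and on clay.
clay_a = blended_rating(1900, 2050)
clay_b = blended_rating(1950, 1950)
print(round(expected_score(clay_a, clay_b), 3))
```

One way to evaluate a variant like this is to compare its forecast accuracy against plain Elo on held-out matches, surface by surface.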
-
Can Elo ratings be improved to take missed time (due to injury, etc) into account?
-
How much does Elo inflation affect ATP/WTA ratings? Can this be addressed for better era comparisons?
-
How to integrate challenger (and other lower-level) results into Elo ratings?
-
Do big servers (Isner/Karlovic) win more return points in tiebreaks (or when close in sets) than otherwise? Might be a proxy for whether they're tanking return games.
-
Any way to use MCP data -- rally length, shot direction -- to come up with approximate distance run, by combining MCP data with distance run when distance is available?
-
Who are the best "frontrunners?" Win% when up a break ... when up a set.
-
Serve + 1 tactic -- who does it the most often? Success rates when attempting it (i.e. winners vs UFEs on third shot) ... relationship to other stats, more general success rate.
-
Are there players who get a lot of service winners (or unreturned serves in general) but not a lot of aces? Or are aces a reliable proxy for unreturned serves in general?
-
When serving for the set -- does failing to do so swing momentum in the other direction, especially after failing at 6-5 and then playing a tiebreak?
-
"life change effect" for late first-time slam winners (like Wawrinka) -- do they play worse than expected for some period after the slam win?
-
Can we get a more rounded perspective on surface speed using MCP data? (aces + service winners, serve+1, rally length)
-
Which players best disguise serve direction? (Maybe quantify by comparing ace % to serve speed; best disguises are players with more aces than expected at any given speed.)
-
What's the difference in success rate immediately after reaching tour level for older players vs younger players?
-
which players are better/worse on more volatile points?
-
effect of serve speeds -- slow serving doesn’t help, but how much does it hurt? is there a magic number for men? women? which players win the most service points despite slow serving, and vice versa? [see donald young, decent data from 2011 usopen]
-
Dominance, as measured by % of available ranking points, or % of ranking points of entire top 100. Could also be used to measure best #2, #3, or best top 2, top 5, etc.
-
Prob of comeback from one set down -- who is good/bad at coming back, good/bad at preventing it?
-
Probability of withdrawals/retirements--more likely for underdogs, etc?
-
Players who excel against higher-ranked opponents, less successful against lower-ranked opps.
-
Is it increasingly difficult to hold deeper into a set? Into a match?
-
Are players really more likely to break after being broken themselves?
-
Are players using challenges wisely? (use leverage index on challenge situations)
-
Based on LEV, do double faults occur at crucial moments? Does this vary player to player?
-
Win prob effect of missing first serve
-
Hot hand, using point-by-point:
  - aces
  - double faults
  - first serves
  - points won
  - points won in each court
- Deuce/ad court differences
  - certain players with stronger differences?
  - lefty/righty differences?
  - ace / 1st sv% differences?
- AGING
  - What is typical peak?
  - Does peak vary with (a) player type (big servers vs. counterpunchers), (b) country of origin, (c) college experience / age of going pro
  - If you break into top 100 (200? etc.) by age 18/19/20/21/etc., what is probability of reaching top 10/etc.?
  - Of all players who reached top 10 (/5/1/etc.), how early/late did they break into top (200/100/50/etc.)
  - Aging delayed for injury (less time on court)?
  - Aging delayed for clay-courters?
  - adjustments for height/weight? (may overlap with player type)
- Comparing scoring systems
  - maximize average leverage index
  - also maximize p(better player wins)
  - consider time management aspect
- Prize money
  - more/less predictive than ranking points?
  - theory that players want to maximize points and/or prize money, means ranking points are consistently predictive despite quirks/changes in system?
-
Quantifying volatility in rankings -- particular levels (mid-100) where rankings are particularly meaningless/volatile?
-
Challenges [from carl email]
  - Are players more likely to win the point after a successful challenge than they are for all other points in match with same serving situation?
  - How about after an unsuccessful challenge, i.e., is there any psychological carryover? I ask because I know of researchers looking at carryover effects from points in which a player wins with lucky net cords.
  - If we have names of umps for each match, or court, maybe check if number of successful challenges is tied to particular umps (bad eyes, or hesitant to overrule) or courts (maybe some harder to see or call).
- Certain players less successful after first matchup with a given player?
  - Or, is there an initial surge into the highest level, like Raonic or Dolgopolov?
  - theory being, players get familiar with your game, they learn how to beat it
-
Age distribution at different tournaments, particularly Challenger vs ATP
-
Lefties
  - prevalence over time
  - most/least successful against them?
  - tactics that are more/less successful?
  - different aging patterns?
  - more common player types or surface preferences among lefties?
  - RH/LH doubles combos more successful?
-
Lefty tactics -- are they different than RH tactics? Do they play differently vs other lefties? In what ways is Nadal a typical lefty?
-
Clutch: using leverage index, see who does the best in high-leverage points
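To make "leverage" concrete, here is a simplified sketch that works at the game level only: the leverage of a point is the server's chance of winning the game if they win the point, minus that chance if they lose it. It assumes every point is won by the server independently with the same probability `p`. A full leverage index would use match win probability rather than game win probability, so this is an illustration of the idea, not the metric itself.

```python
def game_win_prob(p, a=0, b=0):
    """P(server wins the game) from a-b in points, assuming i.i.d. points."""
    if a >= 4 and a - b >= 2:
        return 1.0
    if b >= 4 and b - a >= 2:
        return 0.0
    if a >= 3 and b >= 3:
        # From deuce the server must win two points in a row before the returner does.
        deuce = p * p / (p * p + (1 - p) ** 2)
        if a == b:
            return deuce
        if a > b:
            return p + (1 - p) * deuce  # advantage server
        return p * deuce  # advantage returner
    return p * game_win_prob(p, a + 1, b) + (1 - p) * game_win_prob(p, a, b + 1)


def game_leverage(p, a, b):
    """Swing in game win probability riding on the next point at score a-b."""
    return game_win_prob(p, a + 1, b) - game_win_prob(p, a, b + 1)


# Points are counted 0, 1, 2, 3 for love, 15, 30, 40.
for score in [(3, 0), (2, 2), (2, 3)]:
    print(score, round(game_leverage(0.62, *score), 3))
```

A clutch measure could then compare a player's points won weighted by leverage against their unweighted points won.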
-
when 2 players meet, and one wins x% of points on serve, while opponent’s opponents generally win y% of service points, what is result? Jeff guesses not exactly (x+y)/2, and it depends if x or y is exactly the average for service-points won -- if it is, then the other should dominate the result. Can extend this to other stats -- ace percentage, double-fault rate, and, where available, winner and unforced-error rate. And maybe even if enough data, get more sophisticated -- does extent to which result falls between x and y depend on surface? On home court? On experience levels/age/ranking of both players? On how many times they’ve played?
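Two candidate baselines for combining x and y are sketched below. Both have the property guessed at above: if one input equals the tour average, the other input determines the result. Neither formula comes from these notes, and the function names and the sample numbers are made up; they are hypotheses to test against match data.

```python
def additive_estimate(x, y, avg):
    """Add each side's deviation from the tour-average serve-points-won rate."""
    return x + y - avg


def odds_ratio_estimate(x, y, avg):
    """Combine the two rates on the odds scale, relative to the tour average."""
    def odds(p):
        return p / (1.0 - p)

    combined = odds(x) * odds(y) / odds(avg)
    return combined / (1.0 + combined)


# Made-up example: server wins 68% of service points, the returner's opponents
# usually win 60%, and the tour average is 62%.
x, y, avg = 0.68, 0.60, 0.62
print(round((x + y) / 2, 4))
print(round(additive_estimate(x, y, avg), 4))
print(round(odds_ratio_estimate(x, y, avg), 4))
```

The additive version can leave the 0-1 range for extreme inputs, which is one reason to try the odds-based version as well.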
-
which players choose tournaments most strategically, playing in ones with easiest fields relative to points/prize money on offer? (and relatedly, which tournaments are best to play in, in those terms)
-
relatedly which players optimize their wins most/least, for most bang for buck i.e. clustering wins at one tournament for more ranking points/prize money vs. a guy who always wins in first round, loses in second
-
What would be the most exciting format, if tennis were to revamp its score system (think world team tennis)? E.g. How would series of best of 9 supertiebreaks work instead of sets? You’d probably want a system that maximizes expected median leverage over points, or 25th percentile leverage; while also delivering the better player the win some high enough percentage of the time?
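The "better player wins often enough" half of that trade-off can be estimated by simulation. The sketch below assumes points are independent, that `p_a` and `p_b` are each player's chance of winning a point on their own serve, that a supertiebreak is first to 10 and win by two with the usual tiebreak serve rotation, and that the first server alternates between tiebreaks. All function names are illustrative.

```python
import random


def play_supertiebreak(p_a, p_b, rng, a_serves_first=True, target=10):
    """Return True if player A wins a first-to-`target`, win-by-two tiebreak."""
    a = b = played = 0
    while True:
        # Serve order: one point, then each player serves two points in a row.
        first_server_turn = ((played + 1) // 2) % 2 == 0
        a_serving = first_server_turn == a_serves_first
        if a_serving:
            a_wins_point = rng.random() < p_a
        else:
            a_wins_point = rng.random() >= p_b
        if a_wins_point:
            a += 1
        else:
            b += 1
        played += 1
        if max(a, b) >= target and abs(a - b) >= 2:
            return a > b


def play_best_of(n_breakers, p_a, p_b, rng):
    """Return True if A wins a best-of-`n_breakers` series of supertiebreaks."""
    needed = n_breakers // 2 + 1
    wins_a = wins_b = 0
    while wins_a < needed and wins_b < needed:
        a_first = (wins_a + wins_b) % 2 == 0
        if play_supertiebreak(p_a, p_b, rng, a_serves_first=a_first):
            wins_a += 1
        else:
            wins_b += 1
    return wins_a == needed


def better_player_win_rate(n_breakers, p_a, p_b, n_sims=2000, seed=0):
    """Simulated share of series won by player A."""
    rng = random.Random(seed)
    wins = sum(play_best_of(n_breakers, p_a, p_b, rng) for _ in range(n_sims))
    return wins / n_sims


print(better_player_win_rate(1, 0.70, 0.60))
print(better_player_win_rate(9, 0.70, 0.60))
```

The same loop could record point-level leverage to estimate the median or 25th-percentile leverage of each format.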
-
Relatedly, compare average leverage/volatility to other sports during average match
-
Tweak the doubles ranking to take into account singles ranking. how much does one affect the other? What is optimal use of singles ranking to make dubs ranking more predictive, and vice versa?
-
Separate the home-court advantage among courts that do and don’t have a challenge system
-
Which tourneys have the most upsets, or most surprising results?
-
Whether players get better at returning tough servers after the first or second time they play, or in second or third set
-
betting lines, what influences them (some sort of regression model on rankings and other factors), any wacky ones that stand out and might be signs of funny business, or perhaps just low volume
-
injury patterns, how they vary based on playing style, and who are biggest outliers
-
guys who have the biggest variation between sets (bageled in one set, bagels opponent in next)
-
Difficulty of winning masters 1000 vs. major
-
time per point, game, set, match on various surfaces
-
Does time per point decrease as careers go on (and guys shorten points to reduce stress on their bodies)
-
controlling for everything else, once past first round of tourney is it better to have had a bye or not? [not sure how to control for everything else, because top eight seeds usually are ones with byes. Maybe compare how No. 9-16 seeds do at IW and Miami vs. at other Masters?]
-
what's more predictive of success at majors, other tourneys on that surface; or other majors, regardless of surface; or most recent results
-
break rates in doubles w or w/o no-ad scoring
-
upset rates w/ no ad scoring, 10-pt tiebreak
-
players who fade during matches, or get better during matches, in terms of score difference over successive sets
-
which tournaments play most like clay, grass, how can we define each? particularly useful for categorizing various hard courts [what stats characterize them, in terms of length of points, ace rate, break rate]
-
does home court matter more in decisive sets (third in best of 3, fifth in best of 5)?
-
analyze tour schedule for average weather at outdoor locations -- is sked optimized to avoid extreme heat or cold and precipitation?
-
categorize players into different styles by length of rallies, ace rate, aced rate, break rate, broken rate, and perhaps height/weight. then see how various groups match up -- is there some predictive power, above and beyond your rankings and H2H, by using player type?
-
also based on categorizing players, is there a correlation between player type (or spot on aggressiveness spectrum) and consistency/inconsistency?
-
are there fewer upsets among men in general than women? Perhaps rankings matter less for women (i.e. Serena)?
-
percentage of sets that are 6-1 that have 2 breaks vs. 3, and same for 6-3 sets (1 break or 2)
-
how close are games after a player goes up 1 or 2 breaks in a set, compared with earlier in the set? do players kinda tank the set, particularly if it doesn’t mean they lose the match
-
who clusters return points most/least, is there overall clustering beyond what you’d expect from randomness (which would suggest returners are saving energy when down 30-0 or 40-0, or that servers are streaky with their serves)?
-
How common are breaks at love, who has the most such breaks and the most times broken at love?
-
deciding points percentages in doubles -- how often does return team win, and how does that compare to the same teams’ success overall in return games
-
return points won for each player in doubles team, how that corresponds to who takes deciding pts -- is it the guy who usually wins more return points who returns, the one who is winning more that day, or is it random or qualitative who is chosen?
-
hold percentages for first vs. second server in doubles
-
mixed doubles: men/women hold percentages
-
percentage of time server is broken when reaches deuce, man vs. woman, compare different players
-
is there more (or less) than expected clustering of aces, double faults, winners, unforced errors, points won during matches
-
hold percentages with new balls, with oldest balls
-
how often after saving break points a player breaks in next game
-
aces, double faults on break point, set point, match point, etc., do they differ from rates in rest of match for same match
-
percentage of match points won on serve, on return -- and percentage of time loser held match points, and that holder of match points loses (can derive second from first)
-
how often servers are broken in deciding game of set, is it more or less often than would be expected, who is best/worst at breaking or avoiding being broken in that game
-
do players hit more or fewer unforced errors/winners when have to hold serve to stay in set?
-
streakiest and steadiest players, game to game and set to set
-
turning points in matches -- Is it possible to define such moments quantitatively, and then does anything characterize them or are they random
-
relatedly, do rain delays or injury delays or other sorts of delays look like breaking points/turning points or are any differences between parts of matches before and after such delays about what you’d expect
-
effect of return errors/winners on subsequent serves, do they lead to more aces, dfs
-
are consecutive points/games won at about the rate you’d expect by chance (simulate matches then see if actual ones are more/less random)
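One simple version of that comparison is a permutation test: keep the observed number of points won, shuffle their order many times, and see how often chance alone produces a streak as long as the real one. The sketch below assumes `outcomes` is a list of 1s and 0s (point won or lost) and ignores who is serving, which a real analysis would need to handle, for example by testing service and return points separately.

```python
import random


def longest_run(outcomes):
    """Length of the longest streak of identical consecutive outcomes."""
    best = current = 0
    previous = None
    for outcome in outcomes:
        current = current + 1 if outcome == previous else 1
        best = max(best, current)
        previous = outcome
    return best


def streak_p_value(outcomes, n_sims=2000, seed=0):
    """Share of random reorderings whose longest run is at least the observed one."""
    rng = random.Random(seed)
    observed = longest_run(outcomes)
    shuffled = list(outcomes)
    hits = 0
    for _ in range(n_sims):
        rng.shuffle(shuffled)
        if longest_run(shuffled) >= observed:
            hits += 1
    return hits / n_sims


# Made-up sequence: 1 = point won, 0 = point lost.
points = [1, 1, 1, 0, 1, 0, 0, 1, 1, 1, 1, 1, 0, 0, 1, 0, 1, 1, 0, 0]
print(longest_run(points), streak_p_value(points))
```

A small p-value suggests more streakiness than chance; the longest run is only one possible statistic, and the total number of runs is a common alternative.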
-
ace/double fault rates at various scores
-
fed is said to have the fewest unreturned returns. is that true? who does?
-
do bk pt conversions on 30-40 vary depending on how score got there (was it 0-40 then 30-40, or 30-0 then 30-40)
-
are service games closer later in set
-
does winner/ufe percentage stay steady between sets
-
are there more breaks later in matches -- are players starting to time serves?
-
shots per unforced error/winner/forced error, how that varies per player
-
winning percentage from various spots on court, various speed shots
-
how certain players do after certain number of shots in rally
-
percentage of balls in net, men vs. women (and net clearance)
-
for various players, percentage of points lost from winners, vs. errors into net, long, wide
-
is there bias in results of challenges for certain players or types of players, like big servers? i.e. do big servers have serves that were in called out more often than others?
-
advantages for players after being the beneficiary of luck -- hit line, or lucky net cord? do such players lose next point because of unconscious guilt? These guys found that happened to lucky winners of points: http://www.sciencedirect.com/science/article/pii/S0022103110001241
-
which kinds of shots force errors, and are winners
-
can wind be guesstimated from data for a match?
-
speed of serve, shots vs. success -- how does ue/w/pt won rate vary by speed of shot?
-
success at net when brought to net by opponent’s short ball vs. choosing to come to net
-
effect of return errors on subsequent serves, in terms of speed, placement
-
challenges midpoint -- interrupting a point you’re still in: how often successful
-
winning percentage on second serves after unsuccessful challenge on out call on first serves
-
whether players are good at game theory, mixing up shots optimally based on how successful different ones are (like penalty kickers in soccer: http://www.slate.com/id/2144182/)
-
what percentage of serves are aces/unreturned/won in 3 shots/4, etc. how this varies by player, surface, and stage of match
-
how common are service-return winners, how these vary by player, surface, and stage of match
-
are there more overheads hit in first set of 5 setter vs. fifth set, are they more or less successful
-
test the notion that fewer breaks occur during the first few games in a match
-
do players change their approach after a double fault, taking something off their serve to get the next first serve in, and do the receivers attack those points differently?
-
is the same true at break point?
-
quantify the impact of the new atp 250 qualification draw rules (draw size reduced from 32 to 16, only two rounds need to be won, still 4 qualifiers into main draw). is it harder/easier now for youngsters to jump to atp level via qualies (see thiem 2014, kokkinakis 2015)? how often did players enter a quali-draw pre-2016 which would now not be able to compete (due to smaller draw size)? how often did these players successfully qualify? how did they perform in the main draw (compared to the other qualifiers, which would have also entered the 16-draw size quali draws)?