24 Matching Annotations
  1. Apr 2021
    1. Correlation matrix

      Maybe not the best dataset for a correlation matrix, but it is a must for the initial analysis of any dataset: basic info, descriptive statistics, the distribution of each column, and the correlation matrix.
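      A minimal sketch of that first pass with pandas. The `points` frame and its column names are hypothetical stand-ins, not the notebook's actual data.

```python
import pandas as pd

# Hypothetical stand-in for the points table; column names and values are assumptions.
points = pd.DataFrame({
    "duration_s": [12.0, 25.0, 40.0, 55.0, 300.0],
    "rally_length": [1, 3, 5, 8, 30],
    "serve_speed_kmh": [190.0, 185.0, 170.0, 200.0, 160.0],
})

points.info()                # dtypes and non-null counts per column
summary = points.describe()  # count, mean, std, min, quartiles, max per column
corr = points.corr()         # pairwise Pearson correlation of the numeric columns

print(summary)
print(corr.round(2))
```

      With only a handful of numeric columns the matrix is small, but it is still a cheap way to spot redundant or suspicious columns early.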

    2. Median of point duration is about 40 seconds, with the three longest points lasting around 5 minutes, played in the middle of the game. One fourth of all points ended in less than 25 seconds and over one fourth of all points lasted over 50s. With a match duration of nearly 5 hours, without a doubt it is one of the longest, if not the longest, Wimbledon finals.

      great sum up

    3. First and second service direction

      Thumbs up for this analysis. It's not something that's just sitting in a column as an integer, so it needs some additional work. Great job :)

    4. result of this set (1-6) didn't actually mean that Đoković "gave

      And a great conclusion. It's also important to have this kind of insight because we can maybe exclude it from some specific analysis as an outlier set.

    5. plt.show()

      Great, simple chart showing the temporal dynamics. A further idea would be to involve more features (winners, rallies, forced errors, etc.); maybe they could be combined into a single metric.

    1. Federer had 250% more aces than Nole

      Nole likes things the hard way :) typical Serb :D But it's a good match to represent the "importance" or "relativity" of the statistics.

    2. detected_ballhit_count

      Good observation. Also, 'detected_ballhit_count' is the output of the 'ball_hit' algorithm, so it's probably down to the algorithm's accuracy. A good check would be to use ball hits inferred from the 'point_description' column.

    3. Result analysis

      Great commenting. A small remark: it would probably have been easier to read with the results on a single plot. In addition, you could emphasize the difference in absolute points or percentage-wise.
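      A rough sketch of the absolute and percentage-wise difference. The numbers and the names `player_a` / `player_b` are made up for illustration and are not the match statistics.

```python
import pandas as pd

# Hypothetical per-player totals; values are illustrative only.
stats = pd.DataFrame(
    {"player_a": [20, 50, 40], "player_b": [8, 45, 50]},
    index=["aces", "winners", "unforced_errors"],
)

# Absolute difference in counts.
stats["abs_diff"] = stats["player_a"] - stats["player_b"]

# Relative difference: "x% more than player_b" means (a - b) / b * 100.
stats["pct_diff"] = (stats["abs_diff"] / stats["player_b"] * 100).round(1)

print(stats)
```

      Showing both columns side by side makes it clear when a large percentage comes from a small absolute count.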

    4. Federer has Nole beat in every single category

      Yup, it's considered one of "those" matches where, going by the regular statistics, you would assume that things ended up differently.

    5. np.linspace

      Bins are always a good checkpoint for discussions, so here it goes :) As with any kind of histogram, we underrepresent the data in order to have a "meaningful" plot, and depending on the bins the "story" can go one way or the other. So it's cool to use some specific bin size to represent something, but it's always a good idea to check that the data "behaves" the same when the bins are smaller. Here specifically, because the data is not that "linear", it would probably be better to distribute the points into equal-frequency bins (each containing roughly the same number of points) or to go with "domain knowledge" and use that (e.g. short rallies 1-4, medium 4-9, and long >9).
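      The three binning options can be compared in a few lines. This is only a sketch: `rally_length` holds made-up values, and the domain boundaries below (up to 4, 5 to 9, above 9) are one possible reading of the ranges in the comment.

```python
import numpy as np
import pandas as pd

# Hypothetical rally lengths (shots per point); not taken from the match data.
rally_length = pd.Series([1, 1, 2, 2, 3, 3, 4, 5, 6, 7, 9, 12, 15, 22, 30])

# Equal-width bins built from np.linspace: skewed data piles up in the first bin.
edges = np.linspace(rally_length.min(), rally_length.max(), 4)
equal_width = pd.cut(rally_length, bins=edges, include_lowest=True)

# Equal-frequency bins: each bin holds roughly the same number of points.
equal_freq = pd.qcut(rally_length, q=3, duplicates="drop")

# Domain-knowledge bins (assumed boundaries): short <= 4, medium 5-9, long > 9.
domain = pd.cut(
    rally_length,
    bins=[0, 4, 9, np.inf],
    labels=["short", "medium", "long"],
)

print(equal_width.value_counts().sort_index())
print(equal_freq.value_counts().sort_index())
print(domain.value_counts().sort_index())
```

      Comparing the three count tables shows quickly whether the "story" depends on the binning choice.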

    6. Hm, at first, it seems that Federer had two matchpoints. But, we can't be certain, it could just be the same ongoing point. Let's see the indexes of these matchpoints.

      Just to emphasize the things already discussed, it's nice to have frequent comments. Great storytelling and thinking out loud.

    1. My guess

      Great that you've picked up the story part. Also, it's nice to see that you add comments that are facts, like conclusions, as well as some additional thoughts (my guess). So you're getting two things: it's much easier to go through the notebook, and you also document your own line of thinking. It'll be much easier when/if you return to continue with the project later on.

    2. plt.show()

      Nice set of plots. It would be cool to include 'unforced errors' (though this is relevant from a domain-knowledge point of view). It's also interesting to put into perspective who won which set (this can be added info on the plot).

      storytelling part: e.g. It's also interesting that despite the statistics (provided here) in the 5th set, Djokovic still managed to take the victory

    1. plt.show()

      Great flow in how you elaborated from services to winners and errors. Add a few comments/observations on the charts and it's a full house.
