Correlation matrix
Maybe not the best dataset for a correlation matrix, but this is a must for the initial analysis of any dataset: getting basic info, descriptive statistics, the distribution of each column, and the correlation matrix.
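For reference, a minimal sketch of that initial pass, assuming the data sits in a DataFrame called data (the file name here is hypothetical):

import pandas as pd
import matplotlib.pyplot as plt

data = pd.read_csv("wimbledon_2019_final.csv")  # hypothetical file name

data.info()                    # dtypes, non-null counts, memory usage
print(data.describe())         # descriptive statistics of numeric columns

data.hist(figsize=(14, 10), bins=30)  # distribution of each numeric column
plt.tight_layout()
plt.show()

print(data.corr(numeric_only=True))   # correlation matrix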
The median point duration is about 40 seconds, with the three longest points, around 5 minutes each, played in the middle of the game. One fourth of all points ended in less than 25 seconds, and over one fourth lasted more than 50 seconds. With a match duration of nearly 5 hours, it is without a doubt one of the longest, if not the longest, Wimbledon finals.
great summary
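For the record, those numbers are quick to reproduce, assuming point_duration is in seconds and data is the frame read above:

print(data["point_duration"].quantile([0.25, 0.5, 0.75]))  # 25th/50th/75th percentile
print(data["point_duration"].nlargest(3))                  # the three longest points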
maybe avoiding mistakes
yup, they tend to risk less, so you go for the central path. Good catch.
First and second service direction
Thumbs up for this analysis. It's not something that's just sitting in a column as an integer, so it needs some additional work. Great job :)
result of this set (1-6) didn't actually mean that Đoković "gave
And a great conclusion. It's also important to have this kind of insight because we might want to exclude this set from some specific analyses as an outlier.
There are no NaNs in the dataset
There's a fantastic example of how much you can achieve with only a single column, one that the majority of people would just neglect and disregard. If you're acquainted with the Titanic dataset and its ML prediction tasks, check out this notebook, it's a jewel :) https://www.kaggle.com/ccastleberry/titanic-cabin-features/notebook
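To give a taste of the idea, a rough sketch of squeezing features out of the Titanic 'Cabin' column alone (the linked notebook goes much further):

import pandas as pd

titanic = pd.read_csv("train.csv")  # the standard Kaggle Titanic training file

# deck letter is the first character of the cabin code; missing cabins become 'U' (unknown)
titanic["Deck"] = titanic["Cabin"].str[0].fillna("U")

# whether a cabin was recorded at all is informative by itself
titanic["HasCabin"] = titanic["Cabin"].notna().astype(int)

print(titanic.groupby("Deck")["Survived"].mean())  # survival rate per deck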
plt.show()
Great, simple chart giving the temporal dynamics. A further idea could be to involve more features (winners, rallies, forced errors, etc.), maybe combining them into a single metric.
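As a sketch of that combined-metric idea; the column names p1_winners and p1_unforced_errors are hypothetical placeholders for whatever the dataset actually calls them:

import matplotlib.pyplot as plt

# hypothetical per-point indicator columns; a rolling sum gives a momentum-like curve
momentum = (data["p1_winners"] - data["p1_unforced_errors"]).rolling(window=20, min_periods=1).sum()

plt.plot(momentum)
plt.xlabel("point index")
plt.ylabel("winners minus unforced errors (rolling 20-point sum)")
plt.show()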
Federer had 250% more aces than Nole
Nole likes things the hard way :) typical Serb :D But it's a good match to illustrate the "importance" or "relativity" of statistics.
Let's say that the
I like how you handled the outliers. Made reasonable assumptions, documented them and moved on with the analysis.
detected_ballhit_count
good observation; also, 'detected_ballhit_count' is the output of the 'ball_hit' algorithm, so the discrepancy is probably down to the algorithm's accuracy. A good check would be to compare against ball hits inferred from the 'point_description' column.
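A sketch of that cross-check; I'm assuming here that point_description lists one shot description per stroke, comma-separated, so adjust the parsing to the real format:

# assumed format: comma-separated shot descriptions, one per stroke
inferred_hits = data["point_description"].str.count(",") + 1

diff = data["detected_ballhit_count"] - inferred_hits
print(diff.describe())
print((diff != 0).sum(), "points where the algorithm and the description disagree")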
Seems
great remark
Result analysis
great commenting; a small remark that it would probably be easier to read if the results were on a single plot. In addition, you could emphasize the difference, either in absolute points or percentage-wise.
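Something along these lines for the single-plot version; the categories and numbers below are placeholders:

import numpy as np
import matplotlib.pyplot as plt

stats = ["aces", "winners", "double faults"]  # placeholder categories
djokovic = np.array([10, 54, 9])              # placeholder values
federer = np.array([25, 94, 6])

x = np.arange(len(stats))
width = 0.35

fig, ax = plt.subplots()
ax.bar(x - width / 2, djokovic, width, label="Djokovic")
ax.bar(x + width / 2, federer, width, label="Federer")

# annotate the absolute difference above each pair of bars
for i in range(len(stats)):
    d = int(federer[i] - djokovic[i])
    top = max(djokovic[i], federer[i])
    ax.annotate(f"{d:+d}", (x[i], top + 1), ha="center")

ax.set_xticks(x)
ax.set_xticklabels(stats)
ax.legend()
plt.show()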
Federer has Nole beat in every single category
yup, it's considered one of "those" matches where, going by the regular statistics, you would assume things ended up differently.
Reading csv
Concerning the organizational part, there's a great nbextension for generating automatic table of contents, so check it out. https://jupyter-contrib-nbextensions.readthedocs.io/en/latest/nbextensions/toc2/README.html
And a collection of community-contributed unofficial extensions: https://jupyter-contrib-nbextensions.readthedocs.io/en/latest/
A little about this project
nice intro :)
np.linspace
Bins are always a good checkpoint for a discussion, so here it goes :) As with any histogram, we underrepresent the data in order to get a "meaningful" plot, and depending on the bins the "story" can go one way or the other. So it's fine to use a specific bin size to represent something, but it's always a good idea to check that the data "behaves" the same when the bins are smaller. Here specifically, because the data is not that "linear", it would probably be better to distribute the points into equal bins (each containing roughly the same number of points) or to go with domain knowledge and use that (e.g. short rallies 1-4, medium 4-9, and long >9).
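Both alternatives in code, assuming the number of shots per point sits in a column I'll call rally_count (the name is a guess):

import pandas as pd

# equal-count bins: each bin holds roughly the same number of points
equal_count = pd.qcut(data["rally_count"], q=4, duplicates="drop")
print(equal_count.value_counts().sort_index())

# domain-knowledge bins: short (1-4), medium (4-9), long (>9)
domain = pd.cut(data["rally_count"],
                bins=[0, 4, 9, data["rally_count"].max()],
                labels=["short", "medium", "long"])
print(domain.value_counts())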
Hm, at first it seems that Federer had two match points. But we can't be certain; it could just be the same ongoing point. Let's see the indexes of these match points.
Just to emphasize the things already discussed, it's nice to have frequent comments. Great storytelling and thinking out loud.
My guess
Great that you've picked up the storytelling part. It's also nice to see that you mix comments that are facts (like conclusions) with additional thoughts ("my guess"). You get two things out of this: it's much easier to go through the notebook, and you also document your own line of thinking. That will help a lot when/if you return to continue the project later on.
plt.show()
Nice set of plots; it would be cool to include 'unforced errors' (this one is relevant from a domain-knowledge point of view). It's also interesting when you put it into the perspective of who won which set (that can be added as extra info on the plot).
storytelling part: e.g. it's also interesting that, despite the statistics shown here for the 5th set, Djokovic still managed to take the victory
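One way to fold the set winners into the plot, assuming the per-set stats are drawn in a loop over subplots (Djokovic took sets 1, 3 and 5, Federer sets 2 and 4):

import matplotlib.pyplot as plt

set_winners = ["Djokovic", "Federer", "Djokovic", "Federer", "Djokovic"]

fig, axes = plt.subplots(1, 5, figsize=(18, 4), sharey=True)
for set_no, ax in enumerate(axes, start=1):
    # ... draw the per-set stats on ax here ...
    ax.set_title(f"Set {set_no} (won by {set_winners[set_no - 1]})")
plt.show()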
plt.show()
It's a great flow, how you elaborated from services to winners and errors; a few comments/observations on the charts and it's a full house.
Time elapsed for each set
Not the same format as your end data, but I just wanted to give you a glimpse of the power of groupby. It takes some time to get accustomed to the groupby logic (like writing SQL queries), just as with vectorization, as we discussed.
data.groupby(['p1_sets','p2_sets'])['point_duration'].sum().values/60  # total point time per set score, in minutes
a more elaborate read: https://pandas.pydata.org/docs/user_guide/groupby.html
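And a small follow-up: the same aggregation kept as a labelled frame, so the set score travels with the minutes (again assuming point_duration is in seconds):

set_minutes = (
    data.groupby(["p1_sets", "p2_sets"], as_index=False)["point_duration"]
        .sum()
        .rename(columns={"point_duration": "minutes"})
)
set_minutes["minutes"] = (set_minutes["minutes"] / 60).round(1)
print(set_minutes)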