 Last 7 days

www.newscientist.com www.newscientist.com

Spiegelhalter, D. (2020). David Spiegelhalter: How to be a coronavirus statistics sleuth. New Scientist. Retrieved from: https://www.newscientist.com/article/mg24732954000davidspiegelhalterhowtobeacoronavirusstatisticssleuth/?utm_term=Autofeed&utm_campaign=echobox&utm_medium=social&utm_source=Twitter#Echobox=1597271080

 Dec 2020

www.metacritic.com www.metacritic.com

In addition, for music and movies, we also normalize the resulting scores (akin to "grading on a curve" in college), which prevents scores from clumping together.


twitter.com twitter.com

Stuart Ritchie [@StuartJRitchie] (2020). This encapsulates the problem nicely. Sure, there’s a paper. But actually read it & what do you find? p-values mostly juuuust under .05 (a red flag) and a sample size that’s FAR less than “25m”. If you think this is in any way compelling evidence, you’ve totally been sold a pup. Twitter. Retrieved from: https://twitter.com/StuartJRitchie/status/1305963050302877697


pubmed.ncbi.nlm.nih.gov pubmed.ncbi.nlm.nih.gov

Lakens, D., & Etz, A. J. (2020). Too True to be Bad: When Sets of Studies With Significant and Nonsignificant Findings Are Probably True. PubMed. Retrieved from: https://pubmed.ncbi.nlm.nih.gov/29276574/


via3.hypothes.is via3.hypothes.is

Inferential statistics are the statistical procedures that are used to reach conclusions about associations between variables. They differ from descriptive statistics in that they are explicitly designed to test hypotheses.
Inferential statistics, unlike descriptive statistics, are used specifically to test hypotheses.

 Nov 2020

hypothes.is hypothes.is
 Oct 2020

seeingtheory.brown.edu seeingtheory.brown.edu

Kunin, D. (n.d.). Seeing Theory. Retrieved October 27, 2020, from http://seeingtheory.io


www.youtube.com www.youtube.com

David Spiegelhalter and False Positives. (2020, October 14). https://www.youtube.com/watch?v=XmiEzi54lBI&feature=youtu.be


www.bmj.com www.bmj.com

Smith, G. D., Blastland, M., & Munafò, M. (2020). Covid-19’s known unknowns. BMJ, 371. https://doi.org/10.1136/bmj.m3979


twitter.com twitter.com

Dominique Heinke on Twitter. (n.d.). Twitter. Retrieved October 12, 2020, from https://twitter.com/Epi_D_Nique/status/1314753256556552192


www.inquirer.com www.inquirer.com

McCrystal, J. M., Oona Goodin-Smith, Laura. (n.d.). 1 in 4 Philadelphians knows someone who has died of COVID-19, and nearly half have lost jobs or wages, Pew study says. https://www.inquirer.com. Retrieved October 9, 2020, from https://www.inquirer.com/news/coronaviruscovid19pandemicphiladelphiaprotestsgeorgefloydcitykenneyresponsepewsurvey20201007.html


www.politico.com www.politico.com

CDC reverses course on testing for asymptomatic people who had Covid-19 contact
Take Away
Viable SARS-CoV-2 can be transmitted even by an infected but asymptomatic individual. Some people never become symptomatic; that group usually becomes noninfectious 14 days after initial infection. For people who do develop symptoms, SARS-CoV-2 RNA can be detected 1 to 2 days before symptom onset. (1)
The Claim
Asymptomatic people who had SARS-CoV-2 contact should be tested.
The Evidence
Yes, this is a reversal of August 2020 advice. What is the importance of asymptomatic testing?
Studies show that asymptomatic individuals have infected others prior to displaying symptoms. (1)
According to the CDC’s September 10th 2020 update approximately 40% of infected Americans are asymptomatic at time of testing. Those persons are still contagious and are estimated to have already transmitted the virus to some of their close contacts. (2)
In a report appearing in the July 2020 Journal of Medical Virology, 15.6% of SARS-CoV-2-positive patients in China were asymptomatic at time of testing. (3)
Asymptomatic infection also varies by age group as older persons often have more comorbidities causing them to be susceptible to displaying symptoms earlier. A larger percentage of children remain asymptomatic but are still able to transmit the virus to their contacts. (1) (3)
Transmission modes
Droplet transmission is the primary proven mode of transmission of the SARS-CoV-2 virus, although it is believed that touching a contaminated surface and then touching mucous membranes (for example, the mouth and nose) can also transmit the virus. (1)
It is still unclear how big or small a dose of exposure to viable viral particles is needed for transmission; more research is needed to elucidate this. (1)
Citations
(1) https://www.who.int/news room/commentaries/detail/transmissionofsarscov2 implicationsforinfectionpreventionprecautions
(2) https://www.cdc.gov/coronavirus/2019 ncov/hcp/planningscenarios.html
(3) He J, Guo Y, Mao R, Zhang J. Proportion of asymptomatic coronavirus disease 2019: A systematic review and meta-analysis. J Med Virol. 2020;1–11. https://doi.org/10.1002/jmv.26326

 Sep 2020

lockdownsceptics.org lockdownsceptics.org

The lowest value for false positive rate was 0.8%. Allow me to explain the impact of a false positive rate of 0.8% on Pillar 2. We return to our 10,000 people who’ve volunteered to get tested, and the expected ten with virus (0.1% prevalence or 1:1000) have been identified by the PCR test. But now we’ve to calculate how many false positives are accompanying them. The shocking answer is 80. 80 is 0.8% of 10,000. That’s how many false positives you’d get every time you were to use a Pillar 2 test on a group of that size.
Take Away: The exact frequency of false positive test results for COVID-19 is unknown. Real-world data on COVID-19 testing suggest that rigorous testing regimes likely produce fewer than 1 in 10,000 (<0.01%) false positives, orders of magnitude below the frequency proposed here.
The Claim: The reported numbers for new COVID-19 cases are overblown due to a false positive rate of 0.8%.
The Evidence: In this opinion article, the author correctly conveys the concern that for large testing strategies, case rates could become inflated if (a) the test has a high false positive rate and (b) the prevalence of the virus within the population is very low. The false positive rate proposed by the author is 0.8%, based on the "lowest value" for similar tests given by a briefing to the UK's Scientific Advisory Group for Emergencies (1).
In fact, the briefing states that, based on another analysis of false positive rates from 43 external quality assessments, the interquartile range for the false positive rate was 0.8% to 4.0%. The actual lowest value for false positive rate from this study was 0% (2).
An upper limit for false positive rate can also be estimated from the number of tests conducted per confirmed COVID-19 case. In countries with low infection rates that have conducted widespread testing, such as Vietnam and New Zealand, at multiple periods throughout the pandemic they have achieved over 10,000 tests per positive case (3). Even if every single positive was false, the false positive rate would be below 0.01%.
The prevalence of the virus within a population being tested can affect the positive predictive value of a test, which is the likelihood that a positive result is due to a true infection. The author here assumes the current prevalence of COVID-19 in the UK is 1 in 1,000 and the expected rate of positive results is 0.1%. Data from the University of Oxford and the Global Change Data Lab show that the current (Sept. 22, 2020) share of daily COVID-19 tests that are positive in the UK is around 1.7% (4). Therefore, based on real-world data, the probability that a patient is positive for the test and does have the disease is 99.4%.
(2) https://www.medrxiv.org/content/10.1101/2020.04.26.20080911v3.full.pdf+html
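The arithmetic behind the quoted claim and the tests-per-positive bound can be checked with a short sketch. The function name `positive_predictive_value` and the assumption of 100% sensitivity are illustrative choices, not taken from the article or the annotation.

```python
def positive_predictive_value(prevalence, false_positive_rate, sensitivity=1.0):
    """Share of positive results that are true positives (Bayes' rule).

    sensitivity=1.0 is an assumption made for this sketch.
    """
    true_pos = prevalence * sensitivity
    false_pos = (1.0 - prevalence) * false_positive_rate
    return true_pos / (true_pos + false_pos)


# Figures from the quoted claim: 0.1% prevalence, 0.8% false positive rate.
ppv_claim = positive_predictive_value(prevalence=0.001, false_positive_rate=0.008)

# If only 1 test in 10,000 comes back positive, the false positive rate
# cannot exceed 1/10,000, even if every positive were false.
fpr_upper_bound = 1 / 10_000

# Same prevalence, with that bound used as the false positive rate.
ppv_bound = positive_predictive_value(prevalence=0.001, false_positive_rate=fpr_upper_bound)

print(f"PPV under the claimed 0.8% false positive rate: {ppv_claim:.1%}")
print(f"PPV with a 0.01% false positive rate: {ppv_bound:.1%}")
```

The comparison shows how strongly the conclusion depends on the assumed false positive rate when prevalence is low.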


robjhyndman.com robjhyndman.com

cross-validation is sometimes not valid for time series models
What? Why? Does he mean k-fold specifically?
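One way to see the usual concern with plain k-fold on ordered data is to compare it with a rolling-origin split, in which every training window ends before its test point. The sketch below is a generic illustration and not the method from the linked post; `rolling_origin_splits` and `min_train` are hypothetical names.

```python
def rolling_origin_splits(n_obs, min_train):
    """Yield (train_indices, test_index) pairs that never train on the future."""
    for test_index in range(min_train, n_obs):
        # Train on everything strictly before the test observation.
        yield list(range(test_index)), test_index


splits = list(rolling_origin_splits(n_obs=6, min_train=3))
for train, test in splits:
    print(train, "->", test)
```

In shuffled k-fold, some training folds contain observations that come after the test observations, which is the situation this scheme avoids.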


psycnet.apa.org psycnet.apa.org

Harris, A. J. L., & Hahn, U. (2011). Unrealistic optimism about future life events: A cautionary note. Psychological Review, 118(1), 135–154. https://doi.org/10.1037/a0020997



Transport use during the coronavirus (COVID-19) pandemic. (n.d.). GOV.UK. Retrieved September 18, 2020, from https://www.gov.uk/government/statistics/transportuseduringthecoronaviruscovid19pandemic


www.theguardian.com www.theguardian.com

Facts v feelings: How to stop our emotions misleading us. (2020, September 10). The Guardian. http://www.theguardian.com/science/2020/sep/10/factsvfeelingshowtostopemotionsmisleadingus


bobbywlindsey.com bobbywlindsey.com

H not
I'm sorry but this is kind of lazy from the author. Either write H0, \(H_0\) or H naught. H not sounds like you're saying H "not" (negation)


www.youtube.com www.youtube.com

Susan Athey, July 22, 2020. (2020, August 2). https://www.youtube.com/watch?v=hqTOPrUxDzM


maxkasy.github.io maxkasy.github.io

Kasy, M. (2020). How to run an adaptive field experiment. Retrieved from https://maxkasy.github.io/home/files/slides/adaptive_field_slides_kasy.pdf


github.com github.com

Viechtbauer, W. (2020). Wviechtb/forest_emojis [R]. https://github.com/wviechtb/forest_emojis (Original work published 2020)


metascience.com metascience.com

Steven Goodman: Statistical methods as social technologies versus analytic tools: Implications for metascience and research reform (Video). (n.d.). Metascience.com. Retrieved 2 September 2020, from https://metascience.com/events/metascience2019symposium/stevengoodmanstatisticalmethodsversusanalytictools/

 Aug 2020

academic.oup.com academic.oup.com

van Smeden, M., Lash, T. L., & Groenwold, R. H. H. (2020). Reflection on modern methods: Five myths about measurement error in epidemiological research. International Journal of Epidemiology, 49(1), 338–347. https://doi.org/10.1093/ije/dyz251



Karl Friston: Up to 80% not even susceptible to Covid-19. (2020, June 4). UnHerd. https://unherd.com/2020/06/karlfristonupto80notevensusceptibletocovid19/


onlinelibrary.wiley.com onlinelibrary.wiley.com

Frias‐Navarro, D., Pascual‐Llobell, J., Pascual‐Soler, M., Perezgonzalez, J., & Berrios‐Riquelme, J. (n.d.). Replication crisis or an opportunity to improve scientific production? European Journal of Education, n/a(n/a). https://doi.org/10.1111/ejed.12417


www.journalofsurgicalresearch.com www.journalofsurgicalresearch.com

Althouse, A. D. (2020). Post Hoc Power: Not Empowering, Just Misleading. Journal of Surgical Research, 0(0). https://doi.org/10.1016/j.jss.2019.10.049


www.journalofsurgicalresearch.com www.journalofsurgicalresearch.com

Bababekov, Y. J., Hung, Y.-C., Hsu, Y.-T., Udelsman, B. V., Mueller, J. L., Lin, H.-Y., Stapleton, S. M., & Chang, D. C. (2019). Is the Power Threshold of 0.8 Applicable to Surgical Science?—Empowering the Underpowered Study. Journal of Surgical Research, 241, 235–239. https://doi.org/10.1016/j.jss.2019.03.062


panopto.lshtm.ac.uk panopto.lshtm.ac.uk

CSM_seminar Causal Inference Isn't What You Think It Is. (2020). Retrieved 24 August 2020, from https://panopto.lshtm.ac.uk/Panopto/Pages/Viewer.aspx?id=ac88b49f7e63458d823eabe50152fb66


www.youtube.com www.youtube.com

Communicating statistics, risks and uncertainty in the age of COVID-19 - David Spiegelhalter - 5x15. (n.d.). Retrieved 19 August 2020, from https://www.youtube.com/watch?v=m_D9egJHfCw


twitter.com twitter.com

JASP Statistics on Twitter: “How to copy tables directly into your word processor using JASP. #stats #openSource https://t.co/slson1Hxlh” / Twitter. (n.d.). Twitter. Retrieved August 18, 2020, from https://twitter.com/JASPStats/status/1295057741216485376



Laghaie, A., & Otter, T. (2020). Measuring evidence for mediation in the presence of measurement error [Preprint]. PsyArXiv. https://doi.org/10.31234/osf.io/5bz3f


psyarxiv.com psyarxiv.com

Speelman, C., & McGann, M. (2020). Statements about the Pervasiveness of Behaviour Require Data about the Pervasiveness of Behaviour [Preprint]. PsyArXiv. https://doi.org/10.31234/osf.io/bxzm4


www.cebm.net www.cebm.net

Public Health England has changed its definition of deaths: Here’s what it means. (n.d.). CEBM. Retrieved 14 August 2020, from https://www.cebm.net/covid19/publichealthenglanddeathdatarevised/



Diewert, W. Erwin, and Kevin J Fox. ‘Measuring Real Consumption and CPI Bias under Lockdown Conditions’. Working Paper. Working Paper Series. National Bureau of Economic Research, May 2020. https://doi.org/10.3386/w27144.


onlinelibrary.wiley.com onlinelibrary.wiley.com

Collins, G. S., & Wilkinson, J. (n.d.). Statistical issues in the development of COVID-19 prediction models. Journal of Medical Virology, n/a(n/a). https://doi.org/10.1002/jmv.26390


covid19.iza.org covid19.iza.org

Exponential-Growth Prediction Bias and Compliance with Safety Measures in the Times of COVID-19. COVID-19 and the Labor Market. (n.d.). IZA – Institute of Labor Economics. Retrieved August 5, 2020, from https://covid19.iza.org/publications/dp13257/


www.bloomberg.com www.bloomberg.com

U.S. Economy Shrinks at Record 32.9% Pace in Second Quarter. (2020, July 30). Bloomberg.Com. https://www.bloomberg.com/news/articles/20200730/useconomyshrinksatrecord329paceinsecondquarter


jaspstats.org jaspstats.org

Introducing JASP 0.11: The Machine Learning Module. (2019, September 24). JASP - Free and User-Friendly Statistical Software. https://jaspstats.org/2019/09/24/introducingjasp011themachinelearningmodule/


www.bbc.co.uk www.bbc.co.uk

BBC Radio 4—The Political School, Episode 1. (n.d.). BBC. Retrieved August 2, 2020, from https://www.bbc.co.uk/programmes/m000kv6v

 Jul 2020

walkerdata.com walkerdata.com

Working with Census microdata. (n.d.). Retrieved July 31, 2020, from https://walkerdata.com/tidycensus/articles/pumsdata.html


www.sg.uu.nl www.sg.uu.nl

Dr. Maarten van Smeden (2020, May 11). Understanding the statistics of the coronavirus. Universiteit Utrecht. https://www.sg.uu.nl/video/2020/06/understandingstatisticscoronavirus



Gleeson, J. P., Onaga, T., Fennell, P., Cotter, J., Burke, R., & O’Sullivan, D. J. P. (2020). Branching process descriptions of information cascades on Twitter. ArXiv:2007.08916 [Physics]. http://arxiv.org/abs/2007.08916


twitter.com twitter.com

Maarten van Smeden on Twitter: “This is a kind reminder that most issues with data (e.g. measurement error, incomplete data, confounding, selection) do not disappear just because you have N = ginormous” / Twitter. (n.d.). Twitter. Retrieved July 19, 2020, from https://twitter.com/MaartenvSmeden/status/1283313496382373890


osf.io osf.io

Adjiwanou, V., Alam, N., Alkema, L., Asiki, G., Bawah, A., Béguy, D., Cetorelli, V., Dube, A., Feehan, D., Fisker, A. B., Gage, A., Garcia, J., Gerland, P., Guillot, M., Gupta, A., Haider, M. M., Helleringer, S., Jasseh, M., Kabudula, C., … You, D. (2020). Measuring excess mortality during the COVID-19 pandemic in low- and lower-middle-income countries: The need for mobile phone surveys [Preprint]. SocArXiv. https://doi.org/10.31235/osf.io/4bu3q



Luppi, F., Arpino, B., & Rosina, A. (2020). The impact of COVID-19 on fertility plans in Italy, Germany, France, Spain and UK [Preprint]. SocArXiv. https://doi.org/10.31235/osf.io/wr9jb


www.youtube.com www.youtube.com

Communicating statistics, risk and uncertainty in the age of Covid—Prof. David Spiegelhalter. (2020, June 30). https://www.youtube.com/watch?v=Dq7W1l7RptQ&feature=youtu.be



Uchikoshi, F. (2020). COVerAGE-JP: COVID-19 Deaths by Age and Sex in Japan [Preprint]. SocArXiv. https://doi.org/10.31235/osf.io/cpqrt


projecteuclid.org projecteuclid.org

Shmueli, G. (2010). To Explain or to Predict? Statistical Science, 25(3), 289–310.


www.theguardian.com www.theguardian.com

Spiegelhalter, D. (2020, July 5). Risks, R numbers and raw data: How to interpret coronavirus statistics. The Observer. https://www.theguardian.com/world/2020/jul/05/risksrnumbersandrawdatahowtointerpretcoronavirusstatistics


www.jclinepi.com www.jclinepi.com

Sperrin, M., Martin, G. P., Sisk, R., & Peek, N. (2020). Missing data should be handled differently for prediction than for description or causal explanation. Journal of Clinical Epidemiology, 0(0). https://doi.org/10.1016/j.jclinepi.2020.03.028


jaspstats.org jaspstats.org

Introducing JASP 0.13. (2020, July 2). JASP - Free and User-Friendly Statistical Software. https://jaspstats.org/?p=6483

 Jun 2020

psyarxiv.com psyarxiv.com

Lakens, D. (2019). The practical alternative to the p-value is the correctly used p-value [Preprint]. PsyArXiv. https://doi.org/10.31234/osf.io/shm8v


psyarxiv.com psyarxiv.com

Parsons, Sam. ‘Reliability Multiverse’, 26 June 2020. https://doi.org/10.31234/osf.io/y6tcz.


twitter.com twitter.com

Amy Perfors on Twitter: “I’ve been having a difficult time lately — partly because of [insert frantic gesturing at the state of the world], partly personal — but one thing has been a real bright light for me in the last few months. I think it has some broader lessons that might give some hope, so THREAD” / Twitter. (n.d.). Twitter. Retrieved June 26, 2020, from https://twitter.com/amyperfors/status/1275931919897595904


www.lshtm.ac.uk www.lshtm.ac.uk

Causal inference isn’t what you think it is. (n.d.). LSHTM. Retrieved June 26, 2020, from https://www.lshtm.ac.uk/newsevents/events/causalinferenceisntwhatyouthinkit


Local file Local file

higher when Ericksen conflict was present (Figure 2A)
Yeah, in single neurons you can show the detection of general conflict this way, and it was not partitionable into different responses...

G)
Very clear effect! Suspicious? How exactly did they select the pseudopopulations? It is not clear to me from the methods.

pseudotrial vector x
one trial for all different neurons in the current pseudopopulation matrix?

The separating hyperplane for each choice i is the vector (a) that satisfies: [equation not preserved in this excerpt] Meaning that βi is a vector orthogonal to the separating hyperplane in neuron-dimensional space, along which position is proportional to the log odds of that correct response: this is the coding dimension for that correct response
Makes sense: if βi is proportional to the log odds of a correct response, a is the hyperplane that provides the best cutoff, which must be orthogonal to it. The dot product of two orthogonal vectors is 0.

X is the trials by neurons pseudopopulation matrix of firing rates
So these pseudopopulations were random agglomerates of single neurons that were recorded, so many fits for random groups, and the best were kept?

Within each neuron, we calculated the expected firing rate for each task condition, marginalizing over distractors, and for each distractor, marginalizing over tasks.
Distractor = specific stimulus / location (e.g. '1' or 'left')?
Task = conflict condition (e.g. Simon or Ericksen)?

condition-averaged within neurons (9 data points per neuron, reflecting all combinations of the 3 correct responses, 3 Ericksen distractors, and 3 Simon distractors)
How do all combinations of 3 responses lead to only 9 data points per neuron? 3x2x2 = 12.


twitter.com twitter.com

Twitter. (n.d.). Twitter. Retrieved June 22, 2020, from https://twitter.com/JASPStats/status/1274764017752592384


twitter.com twitter.com

Prof Shamika Ravi on Twitter: “1) ACTIVE cases...shows which countries have 1) Peaked: Germany, S Korea, Japan, Italy, Spain... 2) Plateaued: France 3) Yet to peak: US, UK, Brazil, India...active cases still rising. 4) Second wave: Iran and.... Spain (?) https://t.co/C5c3gAhINc” / Twitter. (n.d.). Twitter. Retrieved June 2, 2020, from https://twitter.com/ShamikaRavi/status/1267664491040440322


iebh.bond.edu.au iebh.bond.edu.au

Institute for Evidence-Based Healthcare. (n.d.). 2 week systematic reviews (2weekSR). https://iebh.bond.edu.au/educationservices/2weeksystematicreviews2weeksr


medium.com medium.com

Morey, R. D. (2020, June 12). Power and precision. Medium. https://medium.com/@richarddmorey/powerandprecision47f644ddea5e


www.rbloggers.com www.rbloggers.com

Dablander, F. (2020, June 11). Interactive exploration of COVID-19 exit strategies. R-bloggers. https://www.rbloggers.com/interactiveexplorationofcovid19exitstrategies/



Brodeur, A., Cook, N., & Heyes, A. (2020). A Proposed Specification Check for p-Hacking. AEA Papers and Proceedings, 110, 66–69. https://doi.org/10.1257/pandp.20201078


rviews.rstudio.com rviews.rstudio.com

Views, R. (2020, May 20). An R View into Epidemiology. /2020/05/20/somerresourcesforepidemiology/


psyarxiv.com psyarxiv.com

Hopp, F. R., Fisher, J. T., Cornell, D., Huskey, R., & Weber, R. (2020). The Extended Moral Foundations Dictionary (eMFD): Development and Applications of a Crowd-Sourced Approach to Extracting Moral Intuitions from Text [Preprint]. PsyArXiv. https://doi.org/10.31234/osf.io/924gq


www.tandfonline.com www.tandfonline.com

Efron, B. (2020). Prediction, Estimation, and Attribution. Journal of the American Statistical Association, 115(530), 636–655. https://doi.org/10.1080/01621459.2020.1762613


twitter.com twitter.com

Adam Kucharski on Twitter: “I’m getting asked more about the ‘k’ parameter that describes variation in the reproduction number, R (i.e. describes superspreading). But what does this parameter actually mean? A short statistical thread... 1/” / Twitter. (n.d.). Twitter. Retrieved June 4, 2020, from https://twitter.com/AdamJKucharski/status/1267737631481364480


psyarxiv.com psyarxiv.com

Han, H., & Dawson, K. J. (2020). JASP (Software) [Preprint]. PsyArXiv. https://doi.org/10.31234/osf.io/67dcb


journals.sagepub.com journals.sagepub.com

Rosenbusch, H., Hilbert, L. P., Evans, A. M., & Zeelenberg, M. (2020). StatBreak: Identifying “Lucky” Data Points Through Genetic Algorithms. Advances in Methods and Practices in Psychological Science, 2515245920917950. https://doi.org/10.1177/2515245920917950


twitter.com twitter.com

Probability Fact on Twitter: “Random phenomena are not obligated to follow one of the few dozen distributions that humans have given names to.” / Twitter. (n.d.). Twitter. Retrieved June 2, 2020, from https://twitter.com/probfact/status/1267204212972236808



Cantwell, G. T., Liu, Y., Maier, B. F., Schwarze, A. C., Serván, C. A., Snyder, J., & St-Onge, G. (2020). Thresholding normally distributed data creates complex networks. Physical Review E, 101(6), 062302. https://doi.org/10.1103/PhysRevE.101.062302

 May 2020

twitter.com twitter.com

🔥Kareem Carr🔥 on Twitter: “I want to talk about bugs in statistical analyses. I think many data analysts worry unnecessarily about this. I do think it’s important to put a good faith effort into avoiding bugs, but I know data analysts that live in terror of hearing there’s a bug in published work. 1/6” / Twitter. (n.d.). Twitter. Retrieved May 30, 2020, from https://twitter.com/kareem_carr/status/1266029701392412673


www.theguardian.com www.theguardian.com

Richardson, S., & Spiegelhalter, D. (2020, April 12). Coronavirus statistics: What can we trust and what should we ignore? The Observer. https://www.theguardian.com/world/2020/apr/12/coronavirusstatisticswhatcanwetrustandwhatshouldweignore


www.nytimes.com www.nytimes.com

Roberts, D. C. (2020, May 22). Putting the Risk of Covid-19 in Perspective. The New York Times. https://www.nytimes.com/2020/05/22/well/live/puttingtheriskofcovid19inperspective.html


psyarxiv.com psyarxiv.com

Cuskley, C., & Wallenberg, J. (2020, May 14). Noise resistance in communication: Quantifying uniformity and optimality. https://doi.org/10.31234/osf.io/wpvq4


www.nature.com www.nature.com

Li, A., Zhou, L., Su, Q., Cornelius, S. P., Liu, Y.-Y., Wang, L., & Levin, S. A. (2020). Evolution of cooperation on temporal networks. Nature Communications, 11(1), 1–9. https://doi.org/10.1038/s4146702016088w


www.estimationstats.com www.estimationstats.com

For comparisons between 3 or more groups that typically employ analysis of variance (ANOVA) methods, one can use the Cumming estimation plot, which can be considered a variant of the Gardner-Altman plot.
Cumming estimation plot

Efron developed the bias-corrected and accelerated bootstrap (BCa bootstrap) to account for the skew whilst obtaining the central 95% of the distribution.
The bias-corrected and accelerated bootstrap (BCa bootstrap) deals with skewed sample distributions. However, it must be noted that it "may not give very accurate coverage in a small-sample non-parametric situation" (simply put, take caution with small datasets).

We can calculate the 95% CI of the mean difference by performing bootstrap resampling.
Bootstrap: a simple but powerful technique that creates multiple resamples (with replacement) from a single set of observations and computes the effect size of interest on each of these resamples. It can be used to determine the 95% CI (confidence interval).
We can use bootstrap resampling to obtain a measure of precision and confidence about our estimate. It gives us 2 important benefits:
 Non-parametric statistical analysis: no need to assume a normal distribution of our observations. Thanks to the Central Limit Theorem, the resampling distribution of the effect size will approach normality
 Easy construction of the 95% CI from the resampling distribution. For 1000 bootstrap resamples of the mean difference, the 25th and 975th values (in sorted order) can be used as the boundaries of the 95% CI.
Bootstrap resampling can be used for such an example:
Computers can easily perform 5000 resamples:
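The resampling procedure described in these notes can be sketched with the standard library alone. This is a plain percentile bootstrap, not the BCa variant mentioned above; the function name `bootstrap_mean_diff_ci` and the two small samples are made up for illustration.

```python
import random
import statistics


def bootstrap_mean_diff_ci(control, test, n_resamples=5000, seed=0):
    """Percentile 95% CI for mean(test) - mean(control)."""
    rng = random.Random(seed)
    diffs = []
    for _ in range(n_resamples):
        # Resample each group with replacement, keeping its original size.
        c = rng.choices(control, k=len(control))
        t = rng.choices(test, k=len(test))
        diffs.append(statistics.fmean(t) - statistics.fmean(c))
    diffs.sort()
    # For 1000 resamples these are the 25th and 975th sorted values.
    lower = diffs[n_resamples * 25 // 1000 - 1]
    upper = diffs[n_resamples * 975 // 1000 - 1]
    return lower, upper


# Hypothetical measurements, invented for this sketch.
control = [10.1, 9.8, 10.4, 10.0, 9.7, 10.3, 10.2, 9.9]
test = [12.9, 13.2, 12.7, 13.1, 13.4, 12.8, 13.0, 13.3]

lower, upper = bootstrap_mean_diff_ci(control, test)
print(f"95% CI of the mean difference: [{lower:.2f}, {upper:.2f}]")
```

A fixed seed keeps the interval reproducible between runs; with small samples like these, the caveat about coverage quoted above applies.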


psyarxiv.com psyarxiv.com

Zinn, S., & Gnambs, T. (2020, April 18). Analyzing nonresponse in longitudinal surveys using Bayesian additive regression trees: A nonparametric event history analysis. https://doi.org/10.31234/osf.io/82c3w


github.com github.com

McElreath, R. Statistical Rethinking: A Bayesian Course Using R and Stan. GitHub. https://github.com/rmcelreath/statrethinking_winter2019
Entire course with materials online.


statmodeling.stat.columbia.edu statmodeling.stat.columbia.edu

Statistical Modeling, Causal Inference, and Social Science. (2020 April 22). Blog Post: New analysis of excess coronavirus mortality; also a question about poststratification. https://statmodeling.stat.columbia.edu/2020/04/22/analysisofexcesscoronavirusmortalityalsoaquestionaboutpoststratification/

 Apr 2020

towardsdatascience.com towardsdatascience.com

the limitations of the PPS
Limitations of the PPS:
 Slower than correlation
 Score cannot be interpreted as easily as the correlation (it doesn't tell you anything about the type of relationship). PPS is better for finding patterns and correlation is better for communicating found linear relationships
 You cannot compare the scores for different target variables in a strict math way because they're calculated using different evaluation metrics
 There are some limitations of the components used underneath the hood
 You've to perform forward and backward selection in addition to feature selection

How to use the PPS in your own (Python) project
Using PPS with Python
 Download ppscore:
pip install ppscore
 Calculate the PPS for a given pandas dataframe:
import ppscore as pps
pps.score(df, "feature_column", "target_column")
 Calculate the whole PPS matrix:
pps.matrix(df)

The PPS clearly has some advantages over correlation for finding predictive patterns in the data. However, once the patterns are found, the correlation is still a great way of communicating found linear relationships.
PPS:
 good for finding predictive patterns
 can be used for feature selection
 can be used to detect information leakage between variables
 interpret PPS matrix as a directed graph to find entity structures
Correlation:
 good for communicating found linear relationships

Let’s compare the correlation matrix to the PPS matrix on the Titanic dataset.
Comparing correlation matrix and the PPS matrix of the Titanic dataset:
findings about the correlation matrix:
 Correlation matrix is smaller because it doesn't work for categorical data
 Correlation matrix shows a negative correlation between TicketPrice and Class. For PPS, TicketPrice is a strong predictor of Class (0.9 PPS), but not the other way around, from Class to TicketPrice (a ticket of 5000 to 10000$ is most likely the highest class, but the highest class itself cannot determine the price)
findings about the PPS matrix:
 First row of the matrix tells you that the best univariate predictor of the column Survived is the column Sex (Sex was dropped for correlation)
 TicketID uncovers a hidden pattern as well as its connection with the TicketPrice

Let’s use a typical quadratic relationship: the feature x is a uniform variable ranging from -2 to 2 and the target y is the square of x plus some error.
In this scenario:
 we can predict y using x
 we cannot predict x using y as x might be negative or positive (for y=4, x=2 or -2)
 the correlation is 0, both from x to y and from y to x, because the correlation is symmetric (more often relationships are asymmetric!). However, the PPS from x to y is 0.88 (not 1 because of existing error)
 PPS from y to x is 0 because there's no relationship that y can predict if it only knows its own value
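The zero correlation in this quadratic example can be reproduced with a few lines. As a simplification, the sketch uses an evenly spaced grid on [-2, 2] instead of a random uniform sample and leaves out the error term; `pearson` is a helper written for this illustration.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


xs = [i / 10 for i in range(-20, 21)]  # symmetric grid from -2 to 2
ys = [x ** 2 for x in xs]              # y = x^2, noise omitted in this sketch

r = pearson(xs, ys)
print(f"correlation between x and y: {r:.6f}")

# y is fully determined by x, but x is not determined by y:
roots = [x for x in xs if x ** 2 == 4.0]
print("values of x with y = 4:", roots)
```

Because the grid is symmetric around zero, positive and negative x contribute equal and opposite terms to the covariance, so the correlation vanishes even though y depends entirely on x.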

how do you normalize a score? You define a lower and an upper limit and put the score into perspective.
Normalising a score:
 you need to put a lower and upper limit
 upper limit can be F1 = 1, and a perfect MAE = 0
 lower limit depends on the evaluation metric and your data set. It's the value that a naive predictor achieves

For a classification problem, always predicting the most common class is pretty naive. For a regression problem, always predicting the median value is pretty naive.
What is a naive model:
 predicting common class for a classification problem
 predicting median value for a regression problem
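The normalisation described above can be sketched for the regression case. The formula `1 - MAE_model / MAE_naive`, clipped at 0, is an assumption about how the limits are combined and may differ in detail from what ppscore does internally; the function names are hypothetical.

```python
import statistics


def mean_absolute_error(y_true, y_pred):
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def normalized_regression_score(y_true, y_pred):
    """Scale MAE so the naive median predictor scores 0 and a perfect one scores 1.

    Assumes y_true is not constant, otherwise the naive MAE is 0.
    """
    naive_pred = [statistics.median(y_true)] * len(y_true)
    mae_naive = mean_absolute_error(y_true, naive_pred)
    mae_model = mean_absolute_error(y_true, y_pred)
    # Models worse than the naive baseline are clipped to 0.
    return max(0.0, 1.0 - mae_model / mae_naive)


y_true = [1.0, 2.0, 3.0, 4.0, 10.0]
perfect = normalized_regression_score(y_true, y_true)
naive = normalized_regression_score(y_true, [3.0] * 5)
print(perfect, naive)
```

The lower limit here depends on the data: a different target column gives a different naive MAE, which is one reason scores for different targets are hard to compare directly.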

Let’s say we have two columns and want to calculate the predictive power score of A predicting B. In this case, we treat B as our target variable and A as our (only) feature. We can now calculate a cross-validated Decision Tree and calculate a suitable evaluation metric.
If the target (B) variable is:
 numeric: we can use a Decision Tree Regressor and calculate the Mean Absolute Error (MAE)
 categorical: we can use a Decision Tree Classifier and calculate the weighted F1 (or ROC)

More often, relationships are asymmetric
a column with 3 unique values will never be able to perfectly predict another column with 100 unique values. But the opposite might be true

there are many nonlinear relationships that the score simply won’t detect. For example, a sine wave, a quadratic curve or a mysterious step function. The score will just be 0, saying: “Nothing interesting here”. Also, correlation is only defined for numeric columns.
Correlation:
 doesn't work with nonlinear data
 doesn't work for categorical values
Examples:


math.stackexchange.com math.stackexchange.com

Suppose you have only two rolls of dice. then your best strategy would be to take the first roll if its outcome is more than its expected value (ie 3.5) and to roll again if it is less.
Expected payoff of a dice game:
Description: You have the option to throw a die up to three times. You will earn the face value of the die. You have the option to stop after each throw and walk away with the money earned. The earnings are not additive. What is the expected payoff of this game?
Rolling twice: $$\frac{1}{6}(6+5+4) + \frac{1}{2}3.5 = 4.25.$$
Rolling three times: $$\frac{1}{6}(6+5) + \frac{2}{3}4.25 = 4 + \frac{2}{3}$$
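The same values follow from backward induction: with some rolls left, keep a face only if it beats the expected value of continuing. The sketch below uses exact fractions; `expected_payoff` is a name chosen for this illustration.

```python
from fractions import Fraction


def expected_payoff(rolls_left):
    """Expected payoff under the optimal stopping rule with `rolls_left` rolls."""
    if rolls_left == 0:
        return Fraction(0)
    continuation = expected_payoff(rolls_left - 1)
    # Keep the face if it beats the value of rolling again, otherwise continue.
    return sum(max(Fraction(face), continuation) for face in range(1, 7)) / 6


for k in (1, 2, 3):
    print(k, expected_payoff(k))
```

For three rolls the continuation value after the first roll is 4.25, so only a 5 or 6 is kept, which is why the first term of the three-roll formula sums over 6 and 5 only.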


math.stackexchange.com math.stackexchange.com

Therefore, $$E_n=2^{n+1}-2=2(2^n-1)$$
Simplified formula for the expected number of tosses (e) to get n consecutive heads (n≥1): $$e_n=2(2^n-1)$$
For example, to get 5 consecutive heads, we expect to toss the coin 62 times on average:
$$e_5=2(2^5-1)=62$$
We can also start with the longer analysis of the possible scenarios (here e is the expected number for n=5):
 If we get a tail immediately (probability 1/2) then the expected number is e+1.
 If we get a head then a tail (probability 1/4), then the expected number is e+2.
 If we get two heads then a tail (probability 1/8), then the expected number is e+3.
 If we get three heads then a tail (probability 1/16), then the expected number is e+4.
 If we get four heads then a tail (probability 1/32), then the expected number is e+5.
 Finally, if our first 5 tosses are heads (probability 1/32), then the expected number is 5.
Thus:
$$e=\frac{1}{2}(e+1)+\frac{1}{4}(e+2)+\frac{1}{8}(e+3)+\frac{1}{16}(e+4)+\frac{1}{32}(e+5)+\frac{1}{32}(5)$$
which solves to e=62.
We can also generalise the formula to:
$$e_n=\frac{1}{2}(e_n+1)+\frac{1}{4}(e_n+2)+\frac{1}{8}(e_n+3)+\frac{1}{16}(e_n+4)+\cdots +\frac{1}{2^n}(e_n+n)+\frac{1}{2^n}(n) $$
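The generalised equation is linear in the unknown, so it can be solved exactly and compared with the closed form. The sketch below does this with exact fractions; `expected_tosses` is a hypothetical helper name.

```python
from fractions import Fraction


def expected_tosses(n):
    """Solve e = sum_{k=1..n} (1/2^k)(e + k) + (1/2^n) * n for e, exactly."""
    # Moving the e terms to the left gives e * (1 - sum_k 1/2^k) = e / 2^n,
    # so e equals 2^n times the constant terms on the right.
    constant = sum(Fraction(k, 2 ** k) for k in range(1, n + 1)) + Fraction(n, 2 ** n)
    return constant * 2 ** n


for n in range(1, 6):
    print(n, expected_tosses(n), 2 * (2 ** n - 1))
```

Each printed row shows the solution of the recursive equation next to the value of the simplified formula.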


psyarxiv.com psyarxiv.com

Derks, K., de Swart, J., van Batenburg, P., Wagenmakers, E., & Wetzels, R. (2020, April 28). Priors in a Bayesian Audit: How Integration of Existing Information into the Prior Distribution Can Increase Transparency, Efficiency, and Quality. Retrieved from psyarxiv.com/8fhkp


stats.stackexchange.com stats.stackexchange.com

Repeated measures involves measuring the same cases multiple times. So, if you measured the chips, then did something to them, then measured them again, etc it would be repeated measures. Replication involves running the same study on different subjects but identical conditions. So, if you did the study on n chips, then did it again on another n chips that would be replication.
Difference between repeated measures and replication


psyarxiv.com psyarxiv.com

Olapegba, P. O., Ayandele, O., Kolawole, S. O., Oguntayo, R., Gandi, J. C., Dangiwa, A. L., … Iorfa, S. K. (2020, April 12). COVID-19 Knowledge and Perceptions in Nigeria. https://doi.org/10.31234/osf.io/j356x
Tags
 news
 perception
 symptom
 knowledge
 COVID19
 data collection
 China
 media
 precaution
 information
 Nigeria
 public health
 prevention
 behavior
 lang:en
 health information
 questionnaire
 transmission
 descriptive statistics
 general public
 misinformation
 is:preprint
 infection
 lockdown
 misconception


arxiv.org arxiv.org

Taleb, N. N. (2019). On the Statistical Differences between Binary Forecasts and Real World Payoffs. ArXiv:1907.11162 [Physics, q-Fin]. http://arxiv.org/abs/1907.11162


doi.org doi.org

Hossain, M. A. (2020). Is the spread of COVID-19 across countries influenced by environmental, economic and social factors? [Preprint]. Epidemiology. https://doi.org/10.1101/2020.04.08.20058164


users.ox.ac.uk users.ox.ac.uk

Bird, S., & Nielsen, B. (2020, April 20). Nowcasting of Covid-19 deaths in English Hospitals. http://users.ox.ac.uk/~nuff0078/Covid/index.htm
