23 Matching Annotations

Mar 2024
static1.squarespace.com

XPT.pdf

16
1. daaronr 07 Mar 2024
  
  AI-concerned think the risk that a genetically engineered pathogen will kill more than 1% of people within a 5-year period before 2100 is 12.38%, while the AI skeptics forecast a 2% chance of that event, with 96% of the AI-concerned above the AI skeptics’ median forecast
  
  This seems like a somewhat ad hoc way of splitting the data. What exactly is the question here, and why is this the best way to answer it?
2. daaronr 07 Mar 2024
  
  Those who did best on reciprocal scoring had lower forecasts of extinction risk.
  72 We separately compare each forecaster’s forecast of others’ forecasts on ten key questions, for both experts and superforecasters. We rank each forecaster’s accuracy on those 20 quantities relative to other participants, and then we compute each forecaster’s average rank to calculate an overall measure of intersubjective accuracy.
  73 This may be because superforecasters are a more homogenous group, who regularly interact with each other outside of forecasting tournaments like this.
  74 Pavel Atanasov et al., “Full Accuracy Scoring Accelerates the Discovery of Skilled Forecasters,” SSRN Working Paper (February 14, 2023), http://dx.doi.org/10.2139/ssrn.4357367.
  
  This seems to be the case visually, but I don't see metrics or statistical inference here.
3. daaronr 07 Mar 2024
  
  Within both groups (experts and superforecasters), more accurate reciprocal scores were correlated with lower estimates of catastrophic and extinction risk. In other words, the better experts were at discerning what other people would predict, the less concerned they were about extinction
  
  But couldn't this just be because people who think x-risk is high expect others to think like themselves? Is 'better reciprocal accuracy' capturing something more finely grained than that?
4. daaronr 07 Mar 2024
  
  Total Catastrophic Risk
  
  The differences in total x-risk are not quite so striking: about 2:1 vs. 6:1. What accounts for this? Hmm, this looks different from the 'Total Extinction Risk' in Table 4. A notebook would be helpful here. Ah, it's because this is for catastrophic risk, not extinction risk.
5. daaronr 07 Mar 2024
  
  First, we can rule out the possibility that experts can’t persuade others of the severity of existential risks simply because of a complete lack of sophistication, motivation, or intelligence on the part of their audience. The superforecasters have all those characteristics, and they continue to assign much lower chances than do experts.
  
  This paragraph seems a bit loosely argued.
6. daaronr 06 Mar 2024
  
  Question and resolution details
  
  They seem to have displayed the questions along with particular “Prior Forecasts”; is that appropriate? Could that be driving the persistent difference between the superforecasters and experts?
7. daaronr 06 Mar 2024
  
  general x-risk experts
  
  What are 'general x-risk experts'? Give some examples.
8. daaronr 06 Mar 2024
  
  The median participant who completed the tournament earned $2,500 in incentives, but this figure is expected to rise as questions resolve in the coming years.
  
  Fairly substantial incentives, but participation may have been time-consuming: how many hours did it take? And how much variation was there in the incentive pay, i.e., how sensitive was it to the predictions?
9. daaronr 06 Mar 2024
  
  with 111 completing all stages of the tournament
  
  Would this attrition matter?
10. daaronr 06 Mar 2024
  
  1. Participants made individual forecasts
  2. Teams comprised entirely of either superforecasters or experts deliberated and updated their forecasts
  3. Blended teams from the second stage, consisting of one superforecaster team and one expert team, deliberated and updated their forecasts
  4. Each team saw one wiki summarizing the thinking of another team and again updated their forecasts
  
  Were there incentives for accuracy (or 'intersubjective' accuracy) at each stage, or only at the very end? Also, were there incentives for making strong comments and (?) convincing others?
11. daaronr 06 Mar 2024
  
  We also advertised broadly, reaching participants with relevant experience via blogs and Twitter. We received hundreds of expressions of interest in participating in the tournament, and we screened these respondents for expertise, offering slots to respondents with the most expertise after a review of their backgrounds.[1]
  
  Recruitment of experts.
12. daaronr 06 Mar 2024
  
  We explained that after the tournament we would show the highest-quality anonymized rationales (curated by independent readers) to panels of online survey participants who would make forecasts before and after reading the rationale. Prizes go to those whose rationales helped citizens update their forecasts toward greater accuracy, using both proper scoring rules for resolvable questions and intersubjective accuracy for unresolvable questions.[21]
  
  Is this approach valid? Would it give powerful incentives to be persuasive? What are these rationales used for? Note that 'intersubjective accuracy' is not a ground truth for the latter questions.
13. daaronr 06 Mar 2024
  
  One common challenge in forecasting tournaments is to uncover the reasoning behind predictions.
  
  How does this 'uncover the reasoning behind predictions'?
14. daaronr 06 Mar 2024
  
  scoring ground rules: questions resolving by 2030 were scored using traditional forecasting metrics where the goal was to minimize the gap between probability judgments and reality (coded as zero or one as a function of the outcome). However, for the longer-run questions, participants learned that they would be scored based on the accuracy of their reciprocal forecasts: the better they predicted what experts and superforecasters would predict for each question, the better their score.
  
  Is the 'reciprocal scoring' rule likely to motivate honest (incentive-compatible) predictions? Is it likely to generate useful information in this context?
15. daaronr 06 Mar 2024
  
  When we report probabilities of long-run catastrophic and existential risk in this report, we report forecasters’ own (unincentivized) beliefs. But we rely on the incentivized forecasts to calculate measures of intersubjective accuracy
  
  This is a bit confusing. The language needs clarification. What exactly is 'intersubjective accuracy'?
16. daaronr 06 Mar 2024
  
  the XPT:
  • What will be the global surface temperature change as compared to 1850–1900, in degrees Celsius? (By 2030, 2050, 2100)
  • By what year will fusion reactors deliver 1% of all utility-scale power consumed in the U.S.?
  • How much will be spent on compute [computational resources] in the largest AI experiment? (By 2024, 2030, 2050)
  • What is the probability that artificial intelligence will be the cause of death, within a 5-year period, for more than 10% of humans alive at the beginning of that period? (By 2030, 2050, 2100)
  • What is the overall probability of human extinction or a reduction in the global population below 5,000? (By 2030, 2050, 2100)
  18 Participants also consented to participate in this study, via the University of Pennsylvania’s Institutional Review Board. The consent form detailed the format of the study.
  19 We define a catastrophic event as one causing the death of at least 10% of humans alive at the beginning of a five-year period. We define extinction as reduction of the global population to less than 5,000.
  
  I appreciate these links to the full question content.
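Annotations 2 and 14 quote two related scoring ideas. First, forecasters are rewarded for predicting what experts and superforecasters will forecast. Second, each forecaster is ranked on each of the 20 quantities, and the ranks are averaged into an intersubjective-accuracy measure. A Python sketch follows. It rests on stated assumptions and is not the report's implementation:

- The error on each quantity is taken to be the absolute gap between a forecaster's prediction and the group's realized median forecast.
- Tied errors share the mean of the ranks they span.
- The function names, forecaster labels, and numbers are all hypothetical.

```python
from statistics import mean


def reciprocal_errors(predicted, realized):
    """Absolute error of each forecaster's prediction of each group-level quantity.

    ASSUMPTION (hypothetical): accuracy on a quantity is |predicted - realized group median|.
    predicted: {forecaster: {quantity: predicted value}}
    realized:  {quantity: realized value of that quantity}
    """
    return {
        forecaster: {q: abs(value - realized[q]) for q, value in preds.items()}
        for forecaster, preds in predicted.items()
    }


def average_ranks(errors):
    """Rank forecasters on every quantity (rank 1 = smallest error), then average.

    Assumes every forecaster has an error for the same set of quantities.
    Tied errors share the mean of the ranks they span (an assumption).
    """
    forecasters = list(errors)
    quantities = list(next(iter(errors.values())))
    ranks = {f: [] for f in forecasters}
    for q in quantities:
        ordered = sorted(forecasters, key=lambda f: errors[f][q])
        i = 0
        while i < len(ordered):
            # Extend j over any run of forecasters tied with ordered[i].
            j = i
            while j + 1 < len(ordered) and errors[ordered[j + 1]][q] == errors[ordered[i]][q]:
                j += 1
            shared_rank = (i + j) / 2 + 1  # mean of 1-based ranks i+1 .. j+1
            for f in ordered[i : j + 1]:
                ranks[f].append(shared_rank)
            i = j + 1
    # Lower average rank = better intersubjective accuracy.
    return {f: mean(r) for f, r in ranks.items()}


if __name__ == "__main__":
    # Hypothetical realized group medians on two quantities.
    realized = {"experts_q1": 0.06, "supers_q1": 0.01}
    predicted = {
        "A": {"experts_q1": 0.05, "supers_q1": 0.01},
        "B": {"experts_q1": 0.10, "supers_q1": 0.03},
        "C": {"experts_q1": 0.06, "supers_q1": 0.02},
    }
    print(average_ranks(reciprocal_errors(predicted, realized)))
    # prints {'A': 1.5, 'B': 3.0, 'C': 1.5}
```

A lower average rank indicates better intersubjective accuracy. With the report's data, each forecaster would have 20 quantities: ten questions, each forecast for both groups.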
Annotators

daaronr

URL

static1.squarespace.com/static/635693acf15a3e2a14a56a4a/t/64f0a7838ccbf43b6b5ee40c/1693493128111/XPT.pdf
Dec 2023
static1.squarespace.com

XPT.pdf

7
1. daaronr 04 Dec 2023
  
  3. How the XPT works
  
  A website or wiki with dynamic explanations seems better suited to this section.
2. daaronr 04 Dec 2023
  
  1.33% [0.17,
  
  Tables should be formatted better.
3. daaronr 04 Dec 2023
  
  The median is straightforward to calculate, transparent, robust to extreme outlying observations, and understandable to people with a basic knowledge of statistics. Also, reassuringly, it is never the highest nor the lowest of the five methods we considered as potential aggregation methods. For these reasons, we think the median provides an ideal middle ground for aggregating forecasts in this project.
  
  This seems very much ad-hoc and not meant for a specialist audience. There is a whole literature on this, and much more theoretically grounded approaches, as you know. The justification given here is rather incomplete.
4. daaronr 04 Dec 2023
  
  Total Extinction Risk
  
  This material could be better presented as a dashboard or a hosted Quarto-type document.
5. daaronr 04 Dec 2023
  
  bold claims that attract audiences and funding, and to keep their predictions vague enough so they can never be proven wrong.
  
  This seems somewhat contradictory.
6. daaronr 04 Dec 2023
  
  Some have argued more broadly
  
  If this were part of the project being evaluated, we would ask for a reference here ('who are these people?'). But it may be OK for an executive summary.
7. daaronr 04 Dec 2023
  
  in Public
  
  I'm not sure a PDF is the best format for this. I suspect a more interactive web presentation would be better.
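Annotation 3 quotes a robustness argument for the median. A toy example can check it. The sketch below compares three aggregators:

- the mean;
- the median;
- the geometric mean of odds (averaging log-odds), as one example of the more theoretically grounded pooling methods the annotation alludes to.

The forecasts are hypothetical. The geometric mean of odds is swapped in purely for illustration and is not necessarily one of the five methods the report considered.

```python
import math
from statistics import mean, median


def geometric_mean_of_odds(probs):
    """Pool probabilities by averaging their log-odds and mapping back.

    Requires every probability to lie strictly between 0 and 1.
    Included only as an alternative for comparison (an assumption, not
    necessarily one of the report's five candidate methods).
    """
    log_odds = [math.log(p / (1 - p)) for p in probs]
    pooled_odds = math.exp(mean(log_odds))
    return pooled_odds / (1 + pooled_odds)


def compare_aggregators(probs):
    """Return the mean, median, and geometric-mean-of-odds pool of probs."""
    return {
        "mean": mean(probs),
        "median": median(probs),
        "geo_mean_odds": geometric_mean_of_odds(probs),
    }


if __name__ == "__main__":
    # Hypothetical forecasts: four close together, one extreme outlier.
    forecasts = [0.01, 0.02, 0.03, 0.05, 0.60]
    bigger_outlier = [0.01, 0.02, 0.03, 0.05, 0.95]
    for label, probs in [("outlier 0.60", forecasts), ("outlier 0.95", bigger_outlier)]:
        results = compare_aggregators(probs)
        print(label, {k: round(v, 3) for k, v in results.items()})
```

Raising the single outlying forecast from 0.60 to 0.95 leaves the median unchanged at 0.03. Both the mean and the log-odds pool move upward. That stability is the robustness property the report cites.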
Annotators

daaronr

URL

static1.squarespace.com/static/635693acf15a3e2a14a56a4a/t/64f0a7838ccbf43b6b5ee40c/1693493128111/XPT.pdf

David Reinstein

Dr. David Reinstein, Senior Economist, Rethink Priorities https://daaronr.github.io/markdown-cv/

Project pages: innovationsinfundraising.org, giveifyouwin.org

Twitter: @givingtools

Annotations: 1,066

Joined: July 17, 2019

Location: Western Massachusetts

Link: daaronr.github.io/markdown-cv/

ORCID: 0000-0002-0470-4991
