23 Matching Annotations
  1. Mar 2024
  2. static1.squarespace.com
    1. AI-concerned think the risk that a genetically engineered pathogen will kill more than 1% of people within a 5-year period before 2100 is 12.38%, while the AI skeptics forecast a 2% chance of that event, with 96% of the AI-concerned above the AI skeptics’ median forecast

      This seems like a somewhat ad hoc way of breaking up the data. What exactly is the question here, and why is this the best way to answer it?
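
      As a sketch of one more standard way to ask whether the two groups' forecast distributions differ, one could report a rank-based test alongside the "share above the other group's median" statistic. All numbers below are hypothetical placeholders; the report only gives summary figures.

      ```python
      # Hypothetical per-forecaster probabilities for the engineered-pathogen question.
      import numpy as np
      from scipy import stats

      rng = np.random.default_rng(0)
      ai_concerned = rng.beta(2, 14, size=30)   # placeholder draws for the AI-concerned
      ai_skeptics = rng.beta(1, 40, size=30)    # placeholder draws for the AI skeptics

      # Report-style statistic: share of AI-concerned above the AI skeptics' median.
      share_above = np.mean(ai_concerned > np.median(ai_skeptics))

      # Alternative: a rank-based test of whether the AI-concerned forecast higher.
      u_stat, p_value = stats.mannwhitneyu(ai_concerned, ai_skeptics, alternative="greater")
      print(f"share above skeptics' median: {share_above:.2f}, Mann-Whitney p = {p_value:.3g}")
      ```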

    2. Those who did best on reciprocal scoring had lower forecasts of extinction risk. [fn. 72: We separately compare each forecaster’s forecast of others’ forecasts on ten key questions, for both experts and superforecasters. We rank each forecaster’s accuracy on those 20 quantities relative to other participants, and then we compute each forecaster’s average rank to calculate an overall measure of intersubjective accuracy.] [fn. 73: This may be because superforecasters are a more homogenous group, who regularly interact with each other outside of forecasting tournaments like this.] [fn. 74: Pavel Atanasov et al., “Full Accuracy Scoring Accelerates the Discovery of Skilled Forecasters,” SSRN Working Paper (February 14, 2023), http://dx.doi.org/10.2139/ssrn.4357367.]

      This seems visually the case, but I don't see metrics or statistical inference here.
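
      One way to attach a metric and a test to the claimed pattern, as a rough sketch with made-up data (the per-forecaster data behind the figure are not reproduced here):

      ```python
      # Hypothetical data: does a better (lower) intersubjective-accuracy rank
      # go with a lower extinction-risk forecast?
      import numpy as np
      from scipy import stats

      rng = np.random.default_rng(1)
      n = 80
      accuracy_rank = rng.permutation(np.arange(1, n + 1))      # 1 = most intersubjectively accurate
      extinction_forecast = (0.001 * accuracy_rank + rng.normal(0, 0.02, n)).clip(min=0)

      rho, p_value = stats.spearmanr(accuracy_rank, extinction_forecast)
      print(f"Spearman rho = {rho:.2f}, p = {p_value:.3g}")
      ```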

    3. Within both groups—experts and superforecasters—more accurate reciprocal scores were correlated with lower estimates of catastrophic and extinction risk. In other words, the better experts were at discerning what other people would predict, the less concerned they were about extinction

      But couldn't this just be because people who think x-risk is high also expect others to think like themselves? Or is 'better reciprocal accuracy' a more fine-grained measure than that?
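
      A toy illustration of that worry, with entirely made-up numbers: if forecasters simply project their own belief when predicting what the other group will say, then whenever the realized group median is low, higher own estimates mechanically produce worse reciprocal accuracy.

      ```python
      # Pure-projection toy model: reciprocal forecast = own forecast.
      import numpy as np
      from scipy import stats

      rng = np.random.default_rng(2)
      own_forecast = rng.beta(1.2, 30, size=100)   # own extinction-risk estimates, mostly low
      realized_median = 0.01                       # the other group's (low) realized median, hypothetical
      reciprocal_forecast = own_forecast           # "they think like me"
      reciprocal_error = np.abs(reciprocal_forecast - realized_median)

      rho, p = stats.spearmanr(reciprocal_error, own_forecast)
      print(f"rho(reciprocal error, own forecast) = {rho:.2f} under pure projection (p = {p:.1e})")
      ```

      So the reported correlation can arise without any underlying difference in skill; the question is whether the report's measure can separate the two.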

    4. Total Catastrophic Risk

      The differences in the total risk here are not quite so striking (about 2:1 vs. 6:1). What accounts for this? Hmm, this looks different from the 'Total Extinction Risk' in Table 4. Here a notebook would be helpful. Ah, it's because this table is for catastrophic risk, not extinction risk.

    5. First, we can rule out the possibility that experts can’t persuade others of the severity of existential risks simply because of a complete lack of sophistication, motivation, or intelligence on the part of their audience. The superforecasters have all those characteristics, and they continue to assign much lower chances than do experts.

      This paragraph seems a bit loosely argued.

    6. Question and resolution details

      They seem to have displayed the questions along with particular “Prior Forecasts” — is that appropriate? Could that be driving the persistent difference between the superforecasters and experts?

    7. general x-risk experts

      What are 'general x-risk experts'? Give some examples.

    8. The median participant who completed the tournament earned $2,500 in incentives, but this figure is expected to rise as questions resolve in the coming years.

      Fairly substantial incentives, but participation may have been time-consuming; how many hours did it take? And how much variation was there in the incentive pay, i.e., how sensitive was it to the predictions made?

    9. with 111 completing all stages of the tournament

      Would this attrition matter?

    10. Participants made individual forecasts
        2. Teams comprised entirely of either superforecasters or experts deliberated and updated their forecasts
        3. Blended teams from the second stage, consisting of one superforecaster team and one expert team, deliberated and updated their forecasts
        4. Each team saw one wiki summarizing the thinking of another team and again updated their forecasts

      With incentives for accuracy (or 'intersubjective' accuracy) at each stage, or only at the very end? Also, were there incentives for making strong comments and (?) for convincing others?

    11. We also advertised broadly, reaching participants with relevant experience via blogs and Twitter. We received hundreds of expressions of interest in participating in the tournament, and we screened these respondents for expertise, offering slots to respondents with the most expertise after a review of their backgrounds. [fn. 1]

      Recruitment of experts.

    12. We explained that after the tournament we would show the highest-quality anonymized rationales (curated by independent readers) to panels of online survey participants who would make forecasts before and after reading the rationale. Prizes go to those whose rationales helped citizens update their forecasts toward greater accuracy, using both proper scoring rules for resolvable questions and intersubjective accuracy for unresolvable questions. [fn. 21]

      Is this approach valid? Would it give powerful incentives to be persuasive? What are these rationales used for? Note that 'intersubjective accuracy' is not a ground truth for the latter (unresolvable) questions.
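
      A sketch of one way such a "helped citizens update toward accuracy" prize could be scored for resolvable questions; the organizers' exact metric is not given in this excerpt, and the function names below are hypothetical.

      ```python
      # Reward = improvement in Brier score from before to after reading a rationale.
      def brier(prob: float, outcome: int) -> float:
          return (prob - outcome) ** 2

      def rationale_value(before: float, after: float, outcome: int) -> float:
          """Positive if the rationale moved the reader toward the realized outcome."""
          return brier(before, outcome) - brier(after, outcome)

      # Example: a reader moves from 0.40 to 0.25 on a question that resolves "no" (0).
      print(rationale_value(before=0.40, after=0.25, outcome=0))   # 0.0975
      ```

      For the unresolvable questions, "accuracy" would instead have to be measured against something like the expert/superforecaster medians, which is exactly where the ground-truth concern bites.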

    13. One common challenge in forecasting tournaments is to uncover the reasoning behind predictions.

      How does this 'uncover the reasoning behind predictions'?

    14. scoring ground rules: questions resolving by 2030 were scored using traditional forecasting metrics where the goal was to minimize the gap between probability judgments and reality (coded as zero or one as a function of the outcome). However, for the longer-run questions, participants learned that they would be scored based on the accuracy of their reciprocal forecasts: the better they predicted what experts and superforecasters would predict for each question, the better their score.

      Is the 'reciprocal scoring' rule likely to motivate honest (incentive-compatible) predictions? Is it likely to generate useful information in this context?
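
      A minimal sketch of the two rules as described in the quoted passage, assuming a quadratic (Brier-style) loss for resolvable questions and absolute error against the group's realized median for the reciprocal part; the report's exact formulas may differ.

      ```python
      # Lower scores are better in both cases.
      def brier_score(prob: float, outcome: int) -> float:
          """Traditional scoring for questions resolving by 2030 (outcome coded 0 or 1)."""
          return (prob - outcome) ** 2

      def reciprocal_score(my_forecast_of_group: float, group_median: float) -> float:
          """Reciprocal scoring for long-run questions: gap between my prediction of
          the group's forecast and the group's realized median forecast."""
          return abs(my_forecast_of_group - group_median)

      print(brier_score(prob=0.7, outcome=1))                                  # 0.09
      print(reciprocal_score(my_forecast_of_group=0.05, group_median=0.02))    # ~0.03
      ```

      Note that the reciprocal rule rewards predicting the group rather than the world, which is the crux of the incentive-compatibility question above.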

    15. When we report probabilities of long-run catastrophic and existential risk in this report, we report forecasters’ own (unincentivized) beliefs. But, we rely on the incentivized forecasts to calculate measures of intersubjective accuracy

      This is a bit confusing. The language needs clarification. What exactly is 'intersubjective accuracy'?
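
      From footnote 72 (quoted under annotation 2 above), the construction appears to be an average of per-question accuracy ranks. A sketch under assumed data shapes and an assumed absolute-error metric:

      ```python
      # Rank each forecaster's reciprocal-forecast error on each of the 20 quantities
      # (10 key questions x {expert median, superforecaster median}), then average ranks.
      import numpy as np
      from scipy import stats

      rng = np.random.default_rng(3)
      n_forecasters, n_quantities = 50, 20
      # errors[i, j] = |forecaster i's prediction of quantity j - realized group median for j|
      errors = rng.random((n_forecasters, n_quantities))

      ranks = stats.rankdata(errors, axis=0)         # rank 1 = smallest error on that quantity
      intersubjective_accuracy = ranks.mean(axis=1)  # lower average rank = more intersubjectively accurate
      print(intersubjective_accuracy[:5])
      ```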

    16. the XPT:
        • What will be the global surface temperature change as compared to 1850–1900, in degrees Celsius? (By 2030, 2050, 2100)
        • By what year will fusion reactors deliver 1% of all utility-scale power consumed in the U.S.?
        • How much will be spent on compute [computational resources] in the largest AI experiment? (By 2024, 2030, 2050)
        • What is the probability that artificial intelligence will be the cause of death, within a 5-year period, for more than 10% of humans alive at the beginning of that period? (By 2030, 2050, 2100)
        • What is the overall probability of human extinction or a reduction in the global population below 5,000? (By 2030, 2050, 2100)
        [fn. 18: Participants also consented to participate in this study, via the University of Pennsylvania’s Institutional Review Board. The consent form detailed the format of the study.]
        [fn. 19: We define a catastrophic event as one causing the death of at least 10% of humans alive at the beginning of a five-year period. We define extinction as reduction of the global population to less than 5,000.]

      I appreciate these links to the full question content.

  3. Dec 2023
  4. static1.squarespace.com
    1. 3. How the XPT works

      A website/wiki with dynamic explanations seems like it would work better for this section.

    2. 1.33% [0.17,

      Tables should be formatted better.

    3. The median is straightforward to calculate, transparent, robust to extreme outlying observations, and understandable to people with a basic knowledge of statistics. Also, reassuringly, it is never the highest nor the lowest of the five methods we considered as potential aggregation methods. For these reasons, we think the median provides an ideal middle ground for aggregating forecasts in this project.

      This seems very much ad hoc and not aimed at a specialist audience. There is a whole literature on this, with much more theoretically grounded approaches, as you know. The justification given here is rather incomplete.
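
      For reference, a quick sketch of some standard pooling methods from that literature; these are illustrative choices, not necessarily the five methods the report compared.

      ```python
      import numpy as np
      from scipy import stats

      def geo_mean_of_odds(p: np.ndarray) -> float:
          """Geometric mean of odds, a commonly advocated alternative to the median."""
          odds = p / (1 - p)
          pooled = np.exp(np.mean(np.log(odds)))
          return pooled / (1 + pooled)

      forecasts = np.array([0.01, 0.02, 0.03, 0.05, 0.10, 0.20])  # hypothetical individual forecasts
      print("median:            ", np.median(forecasts))
      print("mean:              ", forecasts.mean())
      print("trimmed mean (20%):", stats.trim_mean(forecasts, 0.2))
      print("geometric mean:    ", stats.gmean(forecasts))
      print("geo. mean of odds: ", round(geo_mean_of_odds(forecasts), 4))
      ```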

    4. Total Extinction Risk

      This material could be better presented as a dashboard or a hosted Quarto document.

    5. bold claims that attract audiences and funding—and to keep their predictions vague enough so they can never be proven wrong.

      This seems somewhat contradictory: can a claim be both bold enough to attract audiences and funding and vague enough never to be proven wrong?

    6. Some have argued more broadly

      If this were a part of the project being evaluated, we would ask for a reference here ('who are these people?'). But it is maybe OK for an executive summary.

    7. I"m not sure a pdf is the best format for this. I suspect more interactive web presentation would be better