  1. Feb 2023
    1. Evaluation 1 (Seth Benzell)

Editors' note (David Reinstein): I converted math and Greek letters into LaTeX format, mainly to demonstrate this capacity. I made no other changes.

Thanks to the Unjournal for their invitation to review “Artificial Intelligence and Economic Growth”. In this essay the authors have three announced goals: help set an agenda for research on the impact of AI on growth, refine research questions on the subject, and summarize and recontextualize key previous findings with an emphasis on Baumol’s cost disease. This is an ambitious task, but the authors largely succeed!

In the first four sections of the paper, the authors do a wonderful job of outlining a general neoclassical model of automation. They explain how the key parameters of the model determine the impact of automation. They distinguish between two types of economic singularity, and show how the more extreme variety emerges naturally from some parameterizations of their model – something I believe is an important innovation of this paper (an advance over Nordhaus (2015), a direct antecedent). These models stimulate the reading researcher to ask how these parameters could be estimated, opening a door for applied economists to contribute to the macroeconomic question of growth and AI. After this, section 5 is a bit of a disappointment. It lists several additional economic phenomena that might be caused by AI and automation, and occasionally ties these ideas back to economic growth, but in a less organized way, without the assistance of a model. The essay closes with empirical evidence on capital shares and automation, which was empirically adequate for the time but somewhat lacking in its interpretation of the data.

      Let me start by going into detail about what I liked about the first several sections, including some complementary thoughts it inspired in me. Then I’ll explain what I consider the main factor omitted in these sections: the impact of automation and AI on saving and investment. I’ll close with some thoughts on the limitations of sections 5 and 6, and how they might be improved.

Section 2 of the paper lays out a general neoclassical model of automation, drawing on Zeira (1998) and Acemoglu and Restrepo (2016). The key equations are clearly presented. The authors highlight Baumol’s "cost disease" -- the phenomenon that an increase in output of one sector of the economy will make goods in a complementary sector more expensive -- as a key phenomenon to be understood for projecting AI- and automation-led growth. \(\rho\) is the parameter in the model that governs how substitutable different goods (for example, automatable and non-automatable ones) are in the economy. When \(\rho\) is smaller, the economy is relatively more limited by its scarce labor than it is boosted by automation; interest rates and the capital share may even decline with greater automation. This effect is exacerbated by capital accumulation over time, in contrast to labor, which is inelastically supplied. The way I once heard this phenomenon described is "You can always have more capital per capita, but you can't have more capita per capita", and the authors do a good job of explaining this theme from the previous literature.
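For concreteness: the discussion presumes a CES aggregator over goods or tasks, something like the following (a sketch; the notation may differ slightly from the paper's):

\[ Y = \left( \int_0^1 Y_i^{\rho} \, di \right)^{1/\rho}, \qquad \sigma = \frac{1}{1-\rho}, \]

where \(\sigma\) is the elasticity of substitution. When \(\rho < 0\) (so \(\sigma < 1\)), goods are gross complements, and the slowest-growing, non-automated sectors come to dominate total expenditure -- Baumol's cost disease. As \(\rho \to 1\) (\(\sigma \to \infty\)), goods become perfect substitutes and the bottleneck disappears.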

The authors do a great job of highlighting the importance of \(\rho\) to economic growth. Implicitly, the authors are urging applied researchers to go out and measure this elasticity: between automated and non-automated tasks, or between relatively capital-intensive and labor-intensive sectors, for example.

The authors explain several special cases of their model, to show how the other parameters balance against each other as well. They focus on the role of \(\beta\), the share of sectors which are automated. I think the authors are correct in taking a narrative approach to the possible paths \(\beta\) can take, rather than following Acemoglu and Restrepo (2016) and trying to endogenize it to the decisions of scientists. It's the right level of detail to stop at, given their more general concerns.

Sections 3 and 4 go farther beyond the current state of the literature, introducing AI as an input to technology production functions and considering versions of an economic singularity. Section 3's formalization is clear, but I might have appreciated a note from the authors that other approaches to modelling "AI in the idea production function" might be better -- whereas I think the model in section 2 is more solidly paradigmatic. The key parameter here turns out to be \(\phi\), the rate at which knowledge growth is increasing/decreasing in the stock of knowledge.
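Schematically, the idea production function at issue is something like (my notation, as a sketch):

\[ \dot{A}_t = S_t \, A_t^{\phi}, \]

where \(A_t\) is the stock of knowledge and \(S_t\) is the research input (scientists, or AI-provided research capital). A \(\phi > 0\) means past knowledge makes new ideas easier to find ("standing on shoulders"); \(\phi < 0\) means ideas get harder to find ("fishing out").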

In section 4, the authors lay out what I think is the best taxonomy of economic singularities I've seen (the best alternative in the literature at the time being Nordhaus's (2015)). While these are somewhat extreme scenarios, the authors immediately ground them by showing how a Type I case is the natural result of the oldest economic model of automation -- the AK growth model. I would make the connection between the AK growth model and the "\(\rho=\infty\)" (i.e. all goods are perfect substitutes) case of the general model in section 3 more explicit. The authors then show that the key parameter determining whether a Type II singularity occurs is \(\phi\). In the simpler model (example 2), \(\phi\) being greater than 0 is enough to create an infinite-economic-output singularity. In the third example, the condition is a slightly more complicated function of \(\phi\). The section closes with a reasonable discussion of some more general related concerns regarding an economic singularity, returning again to \(\rho\) and the role of 'scarce bottlenecks' in output.
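To see why \(\phi > 0\) delivers a singularity, consider a minimal sketch (mine, not the paper's exact derivation): if research is fully automated and the research input itself scales with knowledge, say \(S_t \propto A_t\), the idea production function becomes \(\dot{A} = c A^{1+\phi}\). This separable ODE solves to

\[ A(t) = \left( A_0^{-\phi} - c \phi t \right)^{-1/\phi}, \]

which hits a vertical asymptote at the finite time \(t^* = A_0^{-\phi} / (c \phi)\) whenever \(\phi > 0\). For \(\phi \le 0\), the knowledge stock remains finite at all times.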

I really appreciated these sections, and feel they do a generally good job of agenda setting for both theorists and applied researchers. For applied researchers, I think the way the paper identifies \(\rho\), \(\phi\), and \(\beta\) as especially important serves as a useful directive towards what they should attempt to measure. What might have made the paper even better is a small table with the empirical evidence on these parameters so far, to give the applied researcher inspired by this paper a starting point.

For the theorist, the mind swims with possible extensions to and variations on the approaches presented. Obviously a paper like this can't cover or even suggest every possibility. One might imagine variations of a growth model that allow \(\rho\) - which can be interpreted as a taste parameter - to be endogenous in some way. In section 5, the authors hint that markups changing over time could be important. They do the same in referencing Acemoglu and Restrepo (2016) on making \(\beta\) endogenous. Another natural extension makes labor supply endogenous, or might explore an automation → politics → growth public-choice mechanism. I don't think it's a problem that the authors failed to mention all these possibilities, but some of them, I think, are more interesting and more directly connect AI and growth than some of the other epiphenomena discussed in section 5 (some of which are less clearly reasoned -- for example, isn't it just as plausible to think that AI will increase centralization and superstar firms as that it will decrease them?).

Still, I do think the authors fall short in not focusing more heavily on the role of saving in the model. Throughout the paper, the saving rate is assumed to be constant -- a hypothesis that isn't well grounded in either a representative agent model (which achieves a constant interest rate in the long run) or an OLG model (in which saving will be a function of many other considerations). I think this is an important oversight for a document that wants to set the agenda.

I’ll admit I’m a bit of a partisan on this issue, having considered it in Benzell et al. (2015) and Benzell et al. (2022). In the first paper, we show how, in OLG models, automation technologies can actually lower output and welfare for future generations. The reason is that savings are made by the young out of their labor income, for consumption in retirement. When automation accumulates, the share of income going to young and laboring savers decreases, and the share going to old spenders increases. This reduces the amount which is saved and reinvested. In certain cases, the reduced-saving effect is large enough to more than offset the productivity-growth effect of automation. The possibility that a new technology could lower long-run output is not allowed for in the authors' model, ruling out certain conceptually coherent scenarios such as the one imagined in Asimov’s “The Caves of Steel”, where highly productive AGIs and automation exist, but a low saving and reinvestment rate by a socialist government keeps society impoverished.
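A stylized two-period sketch of the mechanism (mine, not the full Benzell et al. model): if the young save a fraction \(s\) of their labor income, next period's capital stock is

\[ K_{t+1} = s \, w_t L_t = s \, (1 - \alpha_t) \, Y_t, \]

where \(\alpha_t\) is the capital share. Automation that raises \(Y_t\) while also raising \(\alpha_t\) has offsetting effects on \(K_{t+1}\); when the falling labor share dominates, investment and long-run output can decline even as the technology itself becomes more productive.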

More generally, the exogenous saving framework pursued by the authors doesn't allow for any inter-generational analysis of the impact of automation. On a more practical level, interpreting the decrease in the global interest rate as telling us something about automation (for example, see the recent "cite") requires accounting for the global demographic and distributional factors that have created a "global saving glut" (cite). In Benzell et al. (2021), we find that even a rate of automation at 5x the historical rate would fail to overcome this headwind and increase interest rates.

This brings me to the final section of the paper, on the evidence to date on automation and capital shares. Karabarbounis and Neiman (2014) is correctly taken as the starting point, and I think the discussion is adequate for the time overall. My main quibble is with the characterization of Autor et al. (2017) and Barkai (2017). These are presented as 'alternative theories of the capital share's increase', but they're more like alternative theories of what K+N are measuring. These papers and Barkai and Benzell (2018) claim it is the profit share of income which is increasing, not the capital share, a theory which is consistent with the microevidence on markups (for example, De Loecker et al. 2020). That has tremendous implications for its interpretation in a model of automation. For example, Benzell et al. (2022) theorize that the profit share has increased because certain inelastically supplied inputs in the economy are complements to automation and are measured as profits. Why do I mention this? Well, because it has dramatic implications for whether the \(\rho<1\) or \(\rho>1\) case is true: if \(\rho<1\) then the "capital share" shouldn't be increasing, especially if interest rates and growth are low. On the other hand, \(\rho>1\) implies an AK world asymptotically, which also seems unlikely. We think it more likely that \(\rho < 1\) but physical capital's share is actually decreasing, which is how Benzell et al. (2022) resolve this riddle.

      Works Cited:

      Acemoğlu, D., & Restrepo, P. (2016). The race between machines and humans: Implications for growth, factor shares and jobs.

      Autor, D., Dorn, D., Katz, L. F., Patterson, C., & Van Reenen, J. (2020). The fall of the labor share and the rise of superstar firms. The Quarterly Journal of Economics, 135(2), 645-709.

      Barkai, S. (2020). Declining labor and capital shares. The Journal of Finance, 75(5), 2421-2463.

      Barkai, S., & Benzell, S. G. (2018). 70 years of US corporate profits (No. 277). Working Paper.

Benzell, S. G., Brynjolfsson, E., & Saint-Jacques, G. (2022). Digital Abundance Meets Scarce Architects: Implications for Wages, Interest Rates, and Growth.

      Benzell, S. G., Kotlikoff, L. J., LaGarda, G., & Sachs, J. D. (2015). Robots are us: Some economics of human replacement (No. w20941). National Bureau of Economic Research.

      Benzell, S. G., Kotlikoff, L. J., LaGarda, G., & Ye, V. Y. (2021). Simulating Endogenous Global Automation (No. w29220). National Bureau of Economic Research.

      De Loecker, J., Eeckhout, J., & Unger, G. (2020). The rise of market power and the macroeconomic implications. The Quarterly Journal of Economics, 135(2), 561-644.

Karabarbounis, L., & Neiman, B. (2014). The global decline of the labor share. The Quarterly Journal of Economics, 129(1), 61-103.

Nordhaus, W. D. (2015). Are we approaching an economic singularity? Information technology and the future of economic growth (No. w21547). National Bureau of Economic Research.

      Zeira, J. (1998). Workers, machines, and economic growth. The Quarterly Journal of Economics, 113(4), 1091-1117.

      Evaluator details

      How long have you been in this field?

      I started my PhD in Economics in 2012. I became interested in the impact of automation on economic growth shortly after, so about 10 years.

      How many proposals and papers have you evaluated?

      I have reviewed about 30 papers. I’d say about ⅓ to ½ of these are broadly on the subject of automation.

    2. Author response

      Thanks to Seth and Phil for their careful, thorough, and generous reading of our paper. They make many excellent points, and we agree with them all.

      Before commenting on some specifics, we'll take this opportunity to reminisce a bit about the backstory behind this paper. Ajay Agrawal, Joshua Gans, and Avi Goldfarb organized the inaugural NBER conference on The Economics of Artificial Intelligence. They solicited authors from across the different subfields of economics, including each of us independently, and put together a phenomenally broad and interesting set of experts. The three of us had all thought a bit about the growth consequences of artificial intelligence, but many of our thoughts were nebulous and not obviously suited to a standard economics journal. The conference organizers suggested we might work together, and we quickly agreed. Great fun ensued!

      So that's part of the explanation as to why, as both reviewers indicate, the paper reads to a great extent as a bunch of different ideas and suggestions, rather than a laser-like focus on a single idea that is worked through thoroughly. We found it intriguing that the Baumol-like forces that make some essential tasks hard to automate might constrain the effects of automation. But we also found it fascinating that it is easier than we had guessed to get explosive growth and singularities, even without complete automation. Our hope then and now is that these ideas will stimulate researchers to work on this topic.

And now an important mea culpa: we apologize for the embarrassing number of typos and for one significant mistake that Phil caught and corrected (in the proof of the singularity in Example 3 on page 256). We maintain a "corrected proof" on our web pages that includes all corrections we've found or received, and we'll continue to do that.

      Seth raises many interesting points, especially about what might happen if the saving rate were endogenized. He is surely right that it would be interesting to push our work in this direction, and his research on this topic highlights several insights.

Each of the referees’ suggestions (endogenous labor supply, the effect of automation on politics and growth policy, endogenous savings, endogenizing the automation of research tasks…) deserves a paper of its own. And when commenting on Section 5, the referees accurately stress that the remarks laid out in that section remain speculative and deserve deeper analyses. Here it may be worth mentioning that each of us is separately pursuing the “AI-Automation” research agenda in ways that partly intersect with the referees’ comments. For example, Seth puts forward the idea that AI should lead to more centralization and favor the emergence of superstar firms. A recent paper entitled “A Theory of Falling Growth and Rising Rents” develops and calibrates a growth model where the IT revolution leads in this direction. The bottom line is that the effect of the IT and AI revolutions on long-run productivity growth can become negative if competition policy does not adapt.

      As we thought about the topic of artificial intelligence and economic growth, what stood out most to us is the incredible range of possibilities. Both evaluations correctly point out that this remains a wide-open area for more research, both on the theory side and certainly in terms of using evidence to help convert the possibilities to probabilities. Thanks again for the excellent and insightful evaluations.

    3. Ratings and Predictions

Ratings (0-100)

| Rating category | Evaluator 1 (Seth Benzell): Rating (0-100) | Benzell: 90% CI (0-100)* | Benzell: Comments | Evaluator 2 (Phil Trammell): Rating (0-100) | Trammell: 90% CI (0-100)* | Trammell: Comments |
|---|---|---|---|---|---|---|
| Overall assessment | 80 | 70-90 | | 92 | 80-100 | |
| Advancing knowledge and practice | 75 | 65-85 | | 97 | 80-100 | |
| Methods: Justification, reasonableness, validity, robustness | 80 | 75-85 | | 70 | 40-90 | Somewhat scattered, and in that sense less robust |
| Logic & communication | 70 | 60-80 | | 45 | 30-70 | An awkward combination of intensive focus on some things and selective breadth in others. Also, unusually many typos and minor errors. On the other hand, logical and very clearly written. |
| Open, collaborative, replicable | 95 | 90-100 | | 80? It’s theory, and clearly written | ? | |
| Relevance to global priorities | 90 | 85-100 | | 92 | 80-100 | Sharpens our thinking about an extremely important topic, but does not include direct discussions about decision-relevance. |

Journal predictions (0-5)

| Prediction metric (0 = lowest/none, 5 = highest/best) | Evaluator 1 (Benzell): Rating (0-5) | Benzell: 90% CI (0-5)* or low to high | Benzell: Comments | Evaluator 2 (Trammell): Rating (0-5) | Trammell: 90% CI (0-5)* or low to high | Trammell: Comments |
|---|---|---|---|---|---|---|
| What ‘quality journal’ do you expect this work will be published in? | I don’t think it will be published | 90% | | N/A (published). But I think I would have guessed ~3.5, i.e. a decent field journal, in light of other AI-and-growth-theory papers’ long periods of languishing before publication (such as Nordhaus’s). | | |
| On a ‘scale of journals’, what tier journal should this be published in? | 4 | 3.5-5.0 | I think it would be a good fit for the Journal of Economic Perspectives or Journal of Economic Literature | 5 | Medium-high | As currently written, it is a chapter in a wide-ranging book, not a journal article. But I think that the content could have been written as (perhaps two) journal articles, each of which would have had very important new things to say about a very important subject. |

    4. Editorial Note (David Reinstein)

We are grateful to the authors of this paper for agreeing to participate and engage with the Unjournal’s evaluation of this paper, and for following through with this. (Although this was an NBER working paper, it was selected before we began the “Unjournal Direct” track.)

In our current phase, The Unjournal is mainly targeting empirical papers (and papers with quantitative simulations, impact evaluations, direct policy recommendations, etc.). This paper would instead mainly be considered ‘applied macroeconomic/growth theory’. Nonetheless, we saw this work as particularly important and influential for the reasons mentioned here: it considers tradeoffs between positive and negative consequences of AI; it offers explicit economic modeling of ‘singularities’; it appears in ‘economics of effective altruism and longtermism’ syllabi; and it has nearly 500 citations.

We are also grateful for the extremely diligent work of the evaluators. My impression (from my own experience, from discussions, and given the incentives we have in place) is that referees and colleagues rarely actually read and check the math and proofs in their peers’ papers. Here Phil Trammell did so and spotted an error in a proof of one of the central results of the paper (the ‘singularity’ in Example 3). Thankfully, he was able to communicate with the authors and work out a corrected proof of the same result (see philiptrammell.com, “Growth given Cobb-Douglas Automation”), currently linked here.

The authors have acknowledged this error (and a few smaller bugs), confirmed the revised proof, and link a marked-up version on their page. This is ‘self-correcting research’, and it’s great!

      Even though the same result was preserved, I believe this provides a valuable service.

1. Readers of the paper who saw the incorrect proof (particularly students) might be deeply confused. They might think ‘Can I trust this paper’s other statements?’ ‘Am I deeply misunderstanding something here? Am I not suited for this work?’ This happened to me a lot in graduate school; at least some of the time it may have been because of errors and typos in the paper.
2. I suspect many math-driven papers also contain flaws which are never spotted, and these sometimes may affect the substantive results (unlike in this case).

Again, I’m grateful to the present authors for being willing to put their work through this public checking, and for acknowledging and correcting the errors. I now have more confidence that the paper’s results are valid, and that the authors have confidence in their work. This makes their research output more credible overall, and it sets a great example for the field.

Evaluators were asked to follow the general guidelines available here. For this paper we did not give specific suggestions on ‘which aspects to evaluate’. In addition to written evaluations (similar to journal peer review), we ask evaluators to provide quantitative metrics on several aspects of each article. These are put together below.

    5. Evaluation 2 (Phil Trammell)

      Written report

      This piece is the chapter on AI and economic growth in Agrawal et al.’s 2019 Economics of Artificial Intelligence: An Agenda. In introducing their chapter, Aghion et al. write that their “primary goal” with it “is to help shape an agenda for future research.” In total, the piece seems to have three goals. First: section 2 contributes to the theory of bread-and-butter automation and industrial growth, supported in part by empirical observations presented in section 6. Second: sections 3 and 4 contribute to the theory of AI and economic growth, in the setting of an R&D-based growth model. (An appendix does so in the setting of a Schumpeterian growth model.) Finally: section 5 informally discusses the implications of AI for growth within models that give firm incentives a central role, and topics for future research in this area.

      It is a shame that the authors felt compelled to pack so much in. Each of these three components could easily have generated an excellent piece of its own. Indeed, in my judgment, both of the paper’s original contributions far outshine its commentary on future research directions. Some compression of this kind was probably warranted in context, given the rest of the Agenda’s relative neglect of growth, and how much on the subject there is to say. Nevertheless, the result is a document that both abounds with a truly remarkable array of important new insights about AI and growth, and has somewhat more than the usual share of mistakes and awkward inclusions or omissions.

The outright mistakes are perhaps the more minor flaws, since they are easily corrected on a close reading. Indeed, the PDF on one author’s website at the time this review was written had already corrected five, in red, from the version published in the Agenda. Writing this review uncovered five more (now incorporated in a further edited PDF, mostly in blue). Of course, some mistakes are understandable, and none so far identified overturn the paper’s central conclusions. Still, they make it harder for a reader to trust any results he has not checked.

The greater flaws, in my view, are the scattered inclusions and important-seeming omissions. Furthermore, as discussed below, these decisions on both counts tend to steer the paper away from scenarios in which AI produces a departure from the “Kaldor Facts” of constant growth rates and factor shares.

      The body of the paper opens by exploring how AI might come to replace human labor in every task yet fail to produce any break in economic trends. It does so by introducing in section 2 a simple model in which, over time, asymptotically 100% of tasks are automated, yet the stylized facts of historical growth all asymptotically obtain. In particular, the model asymptotically yields a constant and positive labor share, growth rate, level of capital-augmenting technology, and growth rate in labor-augmenting technology.

      Though the model is presented as a baseline from which to explore AI and growth further, it is a brilliant insight on its own. Uzawa’s (1961) Theorem teaches us that to match the broad strokes of industrialized growth, all technology growth can—and sometimes must—be modeled as labor-augmenting. This result offers a valuable guide to closed-form modeling, but no intuition about how technology develops “under the hood”. The image it most directly invokes—of workers buzzing about their work ever faster, and capital accumulating unchanged beside them—is absurd. But more realistic models, in which technological progress consists primarily in the creation of more capable machinery (and leaves workers’ flesh and bones largely untouched), had proved difficult to reconcile with the stylized facts above. Zeira’s (1998) model of automation, for instance, predicts an ever-increasing growth rate and capital share. For offering such an elegant, tractable, and intuitive reconciliation of automation with the stylized facts, I would say that the model of section 2 deserves a place in all but the most elementary introductions to growth theory.

      Its quality as a contribution to the theory of historical growth, in turn, strengthens it as a contribution to the theory of growth under AI. The insight that, in the long run, an arbitrarily high fraction of human jobs may be automated without changing the labor share or growth rate is valuable, and at odds with much of the public conversation around automation and work. But after reflecting briefly on the implications of low substitutability across tasks, it is not very surprising that one can write down some model in which this occurs. The surprise, at least to me, is that arguably the most reasonable stylized account of historical automation to date turns out to be just such a model. This observation constitutes a powerful argument for the classic view that, for the foreseeable future, AI advances will amount only to “more of the same”.

      The case for this model, or at least this view, is bolstered by the observation in section 6 that in industries with more automation, labor productivity rises but not the capital share.

      Having delivered this excellent contribution, section 2 closes with a rather ad hoc simulation in which automation proceeds not continuously but on and off in 30-year spurts. The simulation reveals that exogenous fluctuations in automation can produce fluctuations in growth rates and factor shares, and can generate a capital share that rises, falls, or stays constant over the longer run. The motivation for this flourish is evidently that the simple model generates constant growth and a capital share that rises over time (albeit asymptotically, to a value below 1), whereas the received wisdom is that growth rates and factor shares fluctuate but exhibit no trend at all.

On its own, the fact that fluctuations in produce fluctuations out is no surprise. Moreover, the fact that the simpler model produces an asymptotically rising capital share is to my mind not a weakness but yet another strength. The capital share has risen over time, both recently and over the longer run, as documented by e.g. Piketty (2014). This trend has coincided with a rise in the capital-to-output ratio: a coincidence that, given a conventional CES production function, would imply that labor and capital are already gross substitutes.
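The logic is straightforward under a conventional CES technology (a sketch): with

\[ Y = \left( \alpha K^{\rho} + (1-\alpha) L^{\rho} \right)^{1/\rho}, \qquad \frac{\partial Y}{\partial K} = \alpha K^{\rho - 1} Y^{1 - \rho}, \]

the competitive capital share is \(s_K = K \frac{\partial Y}{\partial K} / Y = \alpha (K/Y)^{\rho}\), which rises with the capital-to-output ratio if and only if \(\rho > 0\), i.e. if and only if the elasticity of substitution \(\sigma = 1/(1-\rho)\) exceeds one (gross substitutes).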

      Piketty famously accepts this conclusion, despite extensive evidence against it from other domains, and makes it the cornerstone of his policy agenda. The Aghion et al. model of automation, meanwhile, departing only slightly from conventional CES, manages to reconcile the evidence of a historically (but not boundlessly) rising capital share with the evidence that labor and capital are still gross complements. Any reflections on the significance of this reconciliation, however, are seemingly crowded out of the paper by an awkward model-tweak that eliminates the reconciliation so as to hew to the “stylized facts” even more closely.

With this foundation, sections 3 and 4 explore conditions under which even more thorough automation does produce more extreme consequences. In particular, they explore a Jones (1995)-style R&D-based growth model in which both a “final goods” sector and a “research” sector may be automated. Unfortunately, the results are presented in a way that somewhat deemphasizes the most radical growth possibilities. Still, they are taken more seriously than in any other economics publication to date.

      The paper’s first contribution in this direction is a labeling of explosive growth scenarios. Those in which the time-path of output has a vertical asymptote—a time before which output exceeds any finite level—are termed “Type II” growth explosions. (These vertical asymptotes are the mathematical singularities for which techno-accelerationist views are sometimes called “singularitarian”.) Growth scenarios in which the exponential growth rate of output rises boundlessly without producing a vertical asymptote are termed “Type I” growth explosions. Objections that either scenario is physically impossible miss the point. Eternal exponential growth, and even eternally constant output, are presumably impossible as well. What a taxonomy of this kind gives us is a guide to the circumstances under which AI developments should be expected to accelerate growth, and, at least in qualitative terms, how dramatically.
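Simple functional forms make the taxonomy concrete (illustrative examples of mine, not the paper's):

\[ \text{Type I:} \quad Y(t) = e^{t^2}, \quad g(t) = \frac{\dot{Y}}{Y} = 2t \to \infty, \ \text{with } Y(t) \text{ finite at every finite } t; \]

\[ \text{Type II:} \quad Y(t) = \frac{1}{t^* - t}, \quad Y(t) \to \infty \ \text{as} \ t \to t^*. \]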

      Section 3 explains that asymptotic automation of research tasks, along the lines of section 2’s asymptotic automation of good production tasks, can allow for exponential growth in research inputs, technology, and thus output, even without population growth or any automation of final good production. Absent research automation, one of the latter two processes would be necessary for exponential output growth.

      The discussion here feels incomplete. Since the automation of research tasks is presumably itself the result of technological development, one wonders under what conditions this process can sustain itself. Here, however, the automation of research is simply presented as exogenous.

      Section 4.1 gives four examples of scenarios in which automation, within the frameworks introduced so far, can yield a growth explosion. Again, the discussion feels incomplete, now for two reasons. First, the examples are not systematic. Indeed, the scenario that would follow most straightforwardly from section 3—it turns out that, for some parameter values, growth is not only sustained but explosive when research automation is modeled as the output of technological development—is not discussed at all. Second, the discussion of the scenarios themselves is sometimes patchy, as outlined below.

      Example 1 notes that full automation of final good production generates an “AK” economy. Output thus grows exponentially absent growth in technology (“A”), and double-exponentially given exponential growth in A. Not discussed is that output exhibits a Type I growth explosion even if we don’t simply stipulate exponential growth in A, but instead maintain the standard Jones idea production function and a constant population. In this case, A rises subexponentially but still unboundedly, and the exponential growth rate of output accordingly does the same.
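The arithmetic behind this observation (a sketch): with \(Y = AK\) and a constant saving rate \(s\),

\[ \dot{K} = sY = sAK \quad \Rightarrow \quad \frac{\dot{K}}{K} = sA, \]

so constant \(A\) gives exponential growth, and exponentially growing \(A\) gives double-exponential growth. Maintaining instead the standard Jones idea production function \(\dot{A} = S A^{\phi}\) with \(\phi < 1\) and constant research input \(S\) yields \(A(t) \sim t^{1/(1-\phi)}\): \(A\) grows subexponentially but without bound, so the growth rate \(sA(t)\) rises without bound as well -- a Type I explosion.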

      Examples 2 and 4 find that full automation of idea production alone suffices to produce a Type II growth explosion as long as ideas do not “get harder to find” too quickly. (That said, as noted in section 4.2, recent estimates suggest they do.)

Example 3 finds that sufficient automation of good and idea production together produces a Type II growth explosion. A fortiori, it thus finds that the full automation of good and idea production—i.e., simply general AI—always produces a Type II growth explosion, whatever the rate at which ideas get harder to find and whatever values any other parameters take on. This could have been the paper’s headline result, but it is not even quite stated, let alone emphasized.

      Section 4’s discussion of explosive growth scenarios concludes by giving various roadblocks to them the “last word”. Some tasks may be near-impossible to automate, for instance, or near-impossible to make more productive even once automated. In the face of bottlenecks like these, singularitarian dynamics might break down.

Finally, section 5 informally explores the growth implications of AI on a variety of views in which growth depends centrally on firm incentives. An appendix then delves formally into one such view: a Schumpeterian model in which AI can slow growth by making it easier for actors to steal or replace each other’s innovations, thus disincentivizing their development.

The three classes of considerations discussed in sections 5.1, 5.2, and 5.3 are AI’s growth implications via impacts on market structure, resource allocation across sectors, and firm organization, respectively. In short, AI could increase or decrease an industry’s competitiveness, by making it easier to overcome barriers to entry (say, verifying quality in the absence of reputation) or to erect them (say, with closed networks). Models like that of Aghion and Howitt (1992), in turn, teach us that increases in competitiveness can increase or decrease innovation incentives. AI could also affect growth in other ways (say, by facilitating adjudication in the face of incomplete contracts). Collectively, these considerations render AI’s growth implications complex and ambiguous. They allow for the paper’s most explicit, and most wide-ranging, calls for follow-up research.

      It is not clear why this broad and open-ended discussion is reserved for firm-centric growth considerations in particular. As noted earlier, one can imagine a version of this paper that remains focused on formal results within the Jones-style R&D-based framework. But a paper surveying AI’s growth possibilities more broadly would ideally explore these implications from something closer to the full range of mainstream growth perspectives, including e.g. those with a central role for institutions or for human (and given AI, presumably machine) capital accumulation. Also, even within the firm-centric discussion, a singularity-sympathetic reader will again find something of a de-emphasis of AI’s radical potential. The singularities of section 4.1 are followed in 4.2 by the point that automation could face bottlenecks, for instance; the observation that attempts at growth-slowing idea theft could face bottlenecks too is left to the reader.

      The conclusion reinforces this slant. The only paragraph on explosive growth opens by noting it as a “(theoretical) possibility”, and goes on primarily to summarize why the possibility may fail.

      In fairness, this reticence may be due to a perception that many economists, jaded by the Luddite track record, would react poorly to models in which capital ever thoroughly substitutes for labor. For what it’s worth, Patrick Francois’s comment on the paper, published just after it in the Agenda, offers at least some evidence to the contrary: he quickly accepts the plausibility of near-term general AI, but muses on its implications for political economy rather than growth.

      All that said, in sum, Aghion et al. provide excellent and wide-ranging analyses of automation and of AI-driven growth. They take several valuable steps beyond prior work in either area (such as Nordhaus’s 2020-published exploration of a model with high substitutability in final goods but mere exogenous growth in technology). In effect, they synthesize and rigorize a number of observations about AI’s growth potential from the likes of Solomonoff (1985)—formerly perhaps best summarized by Sandberg (2010)—and bring them to economists’ attention. They then augment these observations with powerful new results and framings of their own. The result is the best economics paper published to date on what has as good a claim as anything to being the most important subject in the history of the world.

      Link to corrected proof.

      Evaluator details

How long have you been in this field?

~4 years, if you mean doing original research in economic theory with a large proportion of my time; ~2.5 years, if you mean having some particular focus on growth theory or the economics of AI.

How many proposals and papers have you evaluated?

I’m not sure how to interpret this.

      1. I’ve peer-reviewed one paper on the economics of AI.
      2. I’ve “evaluated” around 30 papers on the economics of AI in the course of writing a literature review on the subject.
      3. More generally, I’ve given informal feedback on many research ideas and papers in progress by fellow researchers at GPI, fellow economics graduate students, and people (usually undergraduates) interested in doing EA-relevant economics work who reach out or are put in touch with me in some way.
    1. Compiled ratings, author response, and editorial comment


      Ratings and predictions

Ratings (0-100)

| Rating category | Evaluator 1: Rating (0-100) | Evaluator 1: 90% CI | Evaluator 2: Rating (0-100) | Evaluator 2: 90% CI | Evaluator 3: Rating (0-100) | Evaluator 3: Confidence |
|---|---|---|---|---|---|---|
| Overall assessment | 40 | 20-60 | 80 | 60-90 | 65 | Medium |
| Advancing knowledge and practice | 30 | 20-60 | 80 | 70-90 | 70 | Medium |
| Methods: Justification, reasonableness, validity, robustness | 50 | 40-60 | 70 | 50-90 | Not qualified | |
| Logic & communication | 60 | 40-75 | 85 | 65-95 | 80 | Medium-to-high |
| Open, collaborative, replicable | 70 | 40-75 | 73 | 50-95 | Not qualified | |
| Relevance to global priorities | 90 | 60-95 | 85 | 70-90 | 80 | High |

      Journal predictions (1-5)

| Prediction metric | Evaluator 1: Rating (0-5) | Evaluator 1: 90% CI | Evaluator 2: Rating (0-5) | Evaluator 2: 90% CI | Evaluator 3: Rating (0-5) | Evaluator 3: Confidence |
|---|---|---|---|---|---|---|
| What ‘quality journal’ do you expect this work will be published in? | 2 | 1-2 | 3.5 | 3-5 | 3.5 | Medium |
| On a ‘scale of journals’, what tier journal should this be published in? | 2 | 1-2 | 4 | 3-5 | 3.5 | High |

      Author response

To start, we would like to commend the format and the reviewer comments, which were of extremely high quality. The evaluations provided well-thought-out and constructively critical analysis of the work, pointing out several assumptions which could impact the findings of the paper while also recognizing the value of the work in spite of some of these assumptions. Research in this space is difficult due to the highly interdisciplinary nature of the questions being asked and the major uncertainties that need to be addressed. We value good epistemics and understand that it takes many people critically looking at a problem to achieve this, which is what motivated our participation in the Unjournal pilot. A format which allows work to be published and reviewed in an open, nuanced manner can reduce the friction of working on such questions and speed up communal sense-making on important questions. We are excited to have participated and look forward to seeing how the Unjournal progresses. We hope that future work highlighted by the reviewers, addressing the assumptions and issues of the paper, will be undertaken by external parties who are better equipped to critically analyse this area of research, improving epistemics in relation to nuclear risk, resilient foods, AGI safety, and the broader existential-risk space.

To clarify, AGI safety was chosen as the comparison for resilient foods because AGI safety is considered the greatest x-risk by many. Consequently, the comparison of the cost-effectiveness of resilient foods to AGI safety was intended to highlight the merit of resilient foods and motivate further investment, as opposed to motivating the redirection of funding from AGI safety to resilient foods.

      We have included responses to aspects of the evaluations below.

      Evaluation 1

      Structure of cost-effectiveness argument

• The biggest issue with interpretability this causes is that I struggle to understand which features of the analysis are making resilient food appear cost-effective because of some feature of resilient food, and which are making resilient food appear cost-effective because of some feature of AI. The methods used by the authors mean that a mediocre case for resilient food could be made to look highly cost-effective with an exceptionally poor case for AI, since their central result is the multiplier of value on a marginally invested dollar for resilient food vs AI. This is important, because the authors’ argument is that resilient food should be funded because it is more effective than AI risk management, but this is motivated by AI risk proponents agreeing that AI risk is important – in scenarios where AI risk is not worth investing in, this assumption is broken and cost-effectiveness analysis against a ‘do nothing’ alternative is required. For example, the authors do not investigate scenarios where the benefit of the intervention in the future is negative because “negative impacts would be possible for both resilient foods and AGI safety and there is no obvious reason why either would be more affected”. While this is potentially reasonable on a mathematical level, it does mean that it would be perfectly possible for resilient foods to be net harmful without the paper correctly identifying that funding them is a bad idea – simply because funding AI risk reduction is an even worse idea, and this is the only given alternative. If the authors want to compare AGI risk mitigation and resilient foods against each other without a ‘do nothing’ common comparator (which I do not think is a good idea), they must at the very least do more to establish that the results of their AI risk model map closely to the results which cause the AI risk community to fund AI risk mitigation so much. As this is not done in the paper, a major issue of interpretability is generated.

We could have compared to the Open Philanthropy ‘last dollar’ if that had been available at the time of publishing ($200 trillion per world saved, or 0.05 basis points of existential risk per $billion): https://forum.effectivealtruism.org/posts/NbWeRmEsBEknNHqZP/longterm-cost-effectiveness-of-founders-pledge-s-climate. Our median for spending $100 million is ~2x10^-10 far future potential increase per dollar, or 500 basis points per $billion, or ~10,000 times as cost-effective. Ours is about 500 times as cost-effective as the upper bound on that page.

      • More generally, this causes the authors to have to write up their results in a non-natural fashion. As an example of the sort of issues this causes, conclusions are expressed in entirely non-natural units in places (“Ratio of resilient foods mean cost effectiveness to AGI safety mean cost effectiveness” given $100m spend), rather than units which would be more natural (“Cost-effectiveness of funding resilient food development”). I cannot find expressed anywhere in the paper a simple table with the average costs and benefits of the two interventions, although a reference is made to Denkenberger & Pearce (2016) where these values were presented for near-term investment in resilient food. This makes it extremely hard for a reader to draw sensible policy conclusions from the paper unless they are already an expert in AGI risk and so have an intuitive sense of what an intervention which is ‘3-6 times more cost-effective than AGI risk reduction’ looks like. The paper might be improved by the authors communicating summary statistics in a more straightforward fashion.

Figure 5 shows far future potential increase per dollar, which is an absolute value. That said, we acknowledge that the presentation of findings throughout could have been made more straightforward for non-expert readers, and we will aim to communicate summary statistics in a more accessible way in future work.

Continuing on from this point, I don’t understand the conceptual framework that has the authors consider the value of invested dollars in resilient food at the margin. The authors’ model of the value of an invested dollar is an assumption that it is distributed logarithmically. Since the entire premise of the paper hinges on the reasonability of this argument, it is very surprising there is no sensitivity analysis considering different distributions of the relationship between intervention funding and value. Nevertheless, I am also confused by the model even on the terms the authors describe; the authors’ model appears to be that there is some sort of ‘invention’ step where the resilient food is created and discovered (this is mostly consistent with Denkenberger & Pearce (2016), and is the only interpretation consistent with the question asked in the survey). In that case, the marginal value of the first invested dollar is zero, because the ‘invention’ of the food is almost a discrete and binary step. The marginal value per dollar continues to be zero until the 86 millionth dollar, at which point the marginal value is the entire value of the resilient food. There seems to be no reason to consider the marginal dollar value of investment when a structural assumption made by the authors is that there is a specific level of funding which entirely saturates the field, and this would make presenting results significantly more straightforward – it is highly nonstandard to use marginal dollars as the unit of cost in a cost-effectiveness analysis, and indeed is so nonstandard I’m not certain the fundamental assumptions of cost-effectiveness analysis still hold.

In the survey, we ask about the impact of spending $100 million, but we then refer to the cost-per-life-saved paper, which discusses separate interventions of research, planning, and piloting. Some of these interventions, such as early-stage research, do not cost very much money and increase the probability of success, which is why we argue marginal thinking makes sense. For instance, significant progress has been made in the last year in prioritizing the most cost-effective resilient foods that also feed a lot of people; this could lead to the development and deployment of much more effective food production methods for such scenarios.
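To make the disagreement about marginal returns concrete, here is a minimal numerical sketch (all numbers hypothetical, not taken from the paper) contrasting a logarithmic-returns view of funding with the threshold or ‘invention’ reading described by the evaluator:

```python
import numpy as np

budget = np.linspace(1e6, 100e6, 500)  # dollars invested (hypothetical grid)
total_value = 1.0                      # benefit at full funding, normalized

# View 1: logarithmic returns -- early dollars (cheap research, planning)
# buy disproportionate value; normalized so the curve reaches 1.0 at $100M.
log_returns = total_value * np.log(budget / 1e6) / np.log(100)

# View 2: threshold returns -- no benefit until an assumed $86M
# "invention" cost is reached, then the full benefit arrives at once.
threshold_returns = np.where(budget >= 86e6, total_value, 0.0)

# Marginal value per dollar differs sharply between the two views.
marginal_log = np.gradient(log_returns, budget)              # smooth, declining
marginal_threshold = np.gradient(threshold_returns, budget)  # zero, then a spike
```

Under the first view, marginal-dollar cost-effectiveness is well defined at every funding level; under the second, it is zero almost everywhere, which is the concern the evaluator raises.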

      Methods

The presentation of the sensitivity analysis as ‘number of parameters needed to flip’ is nonstandard, but a clever way to intuitively express the level of confidence the authors have in their conclusions. Although clever, I am uncertain whether the approach is appropriately implemented; the authors limit themselves to the 95% CI for their definition of an ‘unfavourable’ parameter, and I think this approach hides massive structural uncertainty in the model. For example, in Table 5 the authors suggest their results would only change if the probability of nuclear war per year was 4.8x10^-5 (plus some other variables changing) rather than their estimate of 7x10^-3 (incidentally, I think the values for the S model and E model are switched in Table 5 – the value for pr(nuclear war) in the table’s S model column corresponds to the probability given in the E model).

This appears to be a coincidence: the lowest 5th percentile value of all nuclear war probabilities was used, which was given by the furthest year into the future with no nuclear war. For the S model this is 49 years into the future, with a value of 4.8x10^-5, and for the E model this is 149 years into the future, with a value of 1.8x10^-4 (see the figure links below).

S model: 5th percentile of nuclear war probability per year after x years of no nuclear war (lowest probability of nuclear war per year after 49 years of no nuclear war): Figure link

E model: 5th percentile of nuclear war probability per year after x years of no nuclear war (lowest probability of nuclear war per year after 149 years of no nuclear war): Figure link

Third, the authors could have done more to make it clear that the ‘Expert Model’ was effectively just another survey with an n of 1. Professor Sandberg, who populated the Expert Model, is also an author on this paper, so it is unclear what validation of the Expert Model, if any, could reasonably have been undertaken – the E model is therefore likely to suffer from the same drawbacks as the S model. It is also unclear whether Professor Sandberg knew the results of the S model before parameterising his E model – although this seems highly likely given that 25% of the survey’s respondents were Professor Sandberg’s co-authors. This could be a major source of bias, since presumably the authors would prefer the two models to agree, and the expert parameterising the model is a co-author.

Professor Sandberg was not shown the S model parameters, to avoid introducing bias. That said, we acknowledge that the small size of the existential risk field, and the influence of several highly cited early works such as the FHI technical report “Global Catastrophic Risks Survey”, have the potential to introduce anchoring bias.

      Parameter estimates

      Notwithstanding my concerns about the use of the survey instrument, I have some object level concerns with specific parameters described in the model.

      • The discount rate for both costs and benefits appears to be zero, which is very nonstandard in economic evaluation. Although the authors make reference to “long termism, the view that the future should have a near zero discount rate”, the reference for this position leads to a claim that a zero rate of pure time preference is common, and a footnote observing that “the consensus against discounting future well-being is not universal”. To be clear, pure time preference is only one component of a well-constructed discount rate and therefore a discount rate should still be applied for costs, and probably for future benefits too. Even notwithstanding that I think this is an error of understanding, it is a limitation of the paper that discount rates were not explored, given they seem very likely to have a major impact on conclusions.

      Thank you for highlighting this point, this is an important consideration that would make valuable future work.

• A second concern I have relating to parameterisation is the conceptual model leading to the authors’ proposed costing for the intervention. The authors explain their conceptual model linking nuclear war risk to agricultural decline commendably clearly, and this expands on the already strong argument in Denkenberger & Pearce (2016). However, I am less clear on their conceptual model linking approximately $86m of research to the wide-scale post-nuclear deployment of resilient foods. The assumption seems to be (and I stress this is my assumption based on Denkenberger & Pearce (2016) – it would help if the authors could make it explicit) that $86m purchases the ‘invention’ of the resilient food, and once the food is ‘invented’ then it can be deployed when needed with only a little ongoing training (covered by the $86m). This seems to me to be an optimistic assumption; there is no cost associated with disseminating the knowledge, or with any raw materials necessary to culture the resilient food. Moreover, the model seems to structurally assume that distribution chains survive the nuclear exchange with 100% certainty (or that the materials are disseminated to every household, which would increase costs), and that an existing resilient food pipeline exists at the moment of nuclear exchange which can smoothly take over from the non-resilient food pipeline.

Denkenberger & Pearce (2016) does not include post-GCR costs; it considers only R&D, response and preparedness planning, and related pre-disaster costs. This pre-disaster work would likely make post-disaster expenditure significantly lower than stored food.

      I have extremely serious reservations about these points. I think it is fair to say that an economics paper which projected benefits as far into the future as the authors do here without an exploration of discount rates would be automatically rejected by most editors, and it is not clear why the standard should be so different for existential risk analysis. A cost of $86m to mitigate approximately 40% of the impact of a full-scale nuclear war between the US and a peer country seems prima facie absurd, and the level of exploration of such an important parameter is simply not in line with best practice in a cost-effectiveness analysis (especially since this is the parameter on which we might expect the authors to be least expert). I wouldn’t want my reservations about these two points to detract from the very good and careful scholarship elsewhere in the paper, but neither do I want to give the impression that these are just minor technical details – these issues could potentially reverse the authors’ conclusions, and should have been substantially defended in the text.

      We agree that this estimate from the published work is likely low and have since updated our view on cost upwards. The nuclear war probability utilized does not include other sources of nuclear risk such as accidental detonation of nuclear weapons leading to escalation, intentional attack, or dyads involving China.

      Evaluation 2

The Methods section is well organised and documented, but once in a while it lacks clarity, and it uses terminology that may or may not be appropriate. Here’s a list of things I found a bit confusing:

      • Terminology
        • The submodels for food and AGI are said to be “independent”; is this meant in a probabilistic way? Are there no hidden/not modelled variables that influence both?

      In reality we anticipate that there are a myriad of ways in which nuclear risk and AGI would interact with one another. Are AI systems implemented in nuclear command and control? If so when and how does this change nuclear war probability? What will data sets used to train AI systems post nuclear exchange look like compared to present? Post nuclear exchange will there be greater pressure to utilize autonomous systems? How many/which chip fabs will be destroyed during a nuclear exchange?

      Capturing such interactions in the model in a rigorous way would have required a considerable section within the paper, which was beyond the scope of what could be included. We noted that the submodels are independent in order to make readers aware of this simplifying assumption.

      We believe that investigating the interdependence of x-risks is an important open question that would make valuable future work.

      • The “expert” model was quite confusing for me, maybe because “Sandberg” and the reference number after “Sandberg” don’t match, or maybe because I was expecting a survey vs. expert judgement quantification of uncertainty. As I said (structured) expert judgement is one of my interests: https://link.springer.com/book/10.1007/978-3-030-46474-5

      There is an error in the referencing; this should have linked to the following Guesstimate model: Denkenberger, D., Sandberg, A., Cotton-Barratt, O., Dewey, D., & Li, S. (2019b). Food without the sun and AI X risk cost effectiveness general far future impact publication. Guesstimate. https://www.getguesstimate.com/models/11691

      • In the caption of fig 2, “index nodes” and “variable nodes” are introduced. Index nodes are later described, but I don't think I understood what was meant by “variable” nodes. Aren’t all probabilistic nodes variable?

      This language comes from Analytica’s taxonomy of the different types of nodes; it simply describes what the nodes are in the Analytica implementation. See this link for more information: https://docs.analytica.com/index.php/Create_and_edit_nodes

      • Underlying assumptions/definitions
        • The structure of the models is not discussed. How did you decide that this is a robust structure? (No sensitivity analysis on the structure was performed, as far as I understood.)

      An earlier model considered only the collapse and non-recovery of civilization as the route to far-future impact. The current model develops that structure further and is more inclusive.

      • What is meant by “the data from surveys was used directly instead of constructing continuous distributions”?

      Instead of sampling from a continuous distribution fitted to the survey data, the model randomly draws a raw survey response value from the index of values for each of the 32,000 model runs.
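
      A minimal sketch of this direct-draw approach (the response values below are placeholders, not the survey’s actual data):

      ```python
      import numpy as np

      rng = np.random.default_rng(0)

      # Placeholder survey responses for one input -- not the actual survey data.
      responses = np.array([0.02, 0.05, 0.10, 0.15, 0.20, 0.25, 0.40, 0.60])

      # Each of the 32,000 runs draws one raw response at random, rather than
      # sampling from a continuous distribution fitted to the responses.
      draws = rng.choice(responses, size=32_000, replace=True)
      print(draws[:5])
      ```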

      It is great that the models are available upon request, but it would be even better if they were public so that computational reproducibility could be evaluated as well.

      The models are available at the following links:

      S-model: https://www.getguesstimate.com/models/13082

      E-model: https://www.getguesstimate.com/models/11691


      Editorial note

      Evaluators were asked to follow the general guidelines available here. They were also provided with this document with additional resources specific to the paper, rationale for its selection, and an ‘editorial’ first pass of aspects of the paper to consider.

      Note that this evaluation was organized during the Unjournal pilot phase and was managed manually using several Google Docs. The format may differ from future evaluations that will be managed with the Kotahi Platform.

      The evaluations were conducted on the version of the article published in the International Journal of Disaster Risk Reduction; however, we have posted the evaluations on the preprint for accessibility.

  2. Jan 2023
    1. Evaluation 3


      Ratings and predictions

      Ratings (1-100)

      • Overall assessment: 65 Confidence: Medium
      • Advancing knowledge and practice: 70 Confidence: Medium
      • Methods: Justification, reasonableness, validity, robustness: Not qualified
      • Logic & communication: 80 Confidence: Medium-to-high
      • Open, collaborative, replicable: Not qualified
      • Relevance to global priorities: 80 Confidence: High

      Journal predictions (1-5)

      • What ‘quality journal’ do you expect this work will be published in? 3.5 Confidence: Medium
      • On a ‘scale of journals’, what tier journal should this be published in? 3.5 Confidence: Medium

      Written report

      I am a political scientist specializing in science policy (i.e., how expertise and knowledge production influence the policymaking process and vice versa), with a focus on “decision making under conditions of uncertainty,” R&D prioritization, and the governance of systemic and catastrophic risk. With respect to the various categories of expertise highlighted by the authors, I can reasonably be considered a “policy analyst.”

      Potential conflict of interest/source of bias: one of the authors (Dr. Anders Sandberg) is a friend and former colleague. He was a member of my PhD dissertation committee.

      A quick further note on the potential conflict of interest/bias of the authors (three of the four are associated with ALLFED, which, as the authors note, could stand to benefit financially from the main implication of their analysis - that significant funding be allocated to resilient food research in the short-term). In my opinion, this type of “self-advocacy” is commonplace and, to some extent, unavoidable. Interest and curiosity (and by extension, expertise) on a particular topic motivates deep analysis of that topic. It’s unlikely that this kind of deep analysis (which may or may not yield these sorts of “self-confirming” conclusions/recommendations) would ever be carried out by individuals who are not experts on - and often financially implicated in - the topic. I think their flagging of the potential conflict of interest at the end of the paper is sufficient - and exercises like this Unjournal review further increase transparency and invite critical examinations of their findings and “positionality.”

      I am unqualified to provide a meaningful evaluation of several of the issues “flagged” by the authors and editorial team, including: the integration of the sub-models, sensitivity analysis, and alternative approaches to the structure of their Monte Carlo analysis. Therefore, I will focus on several other dimensions of the paper.

      Context and contribution

      This paper has two core goals: (1) to explore the value and limitations of relative long-term cost effectiveness analysis as a prioritization tool for disaster risk mitigation measures in order to improve decision making and (2) to use this prioritization tool to determine if resilient foods are more cost effective than AGI safety (which would make resilient food the highest priority area of GCR/X-risk mitigation research). As I am not qualified to directly weigh in on the extent to which the authors achieved either goal, I will reflect on the “worthiness” of this goal within the broader context of work going on in the fields of X-risk/GCR, long-termism, science policy, and public policy - and the extent to which the authors’ findings are effectively communicated to these audiences.

      Within this broader context, I believe that these are indeed worthy (and urgent) objectives. The effective prioritization of scarce resources to the myriad potential R&D projects that could (1) reduce key uncertainties, (2) improve political decision-making, and (3) provide solutions that decrease the impact and/or likelihood of civilization-ending risk events is a massive and urgent research challenge. Governments and granting agencies are desperate for rigorous, evidence-based guidance on how to allocate finite funding across candidate projects. Such prioritization is impeded by uncertainty about the potential benefits of various R&D activities (partially resulting from uncertainty about the likelihood and magnitude of the risk event itself - but also from uncertainty about the potential uncertainty-reducing and harm/likelihood-reducing “power” of the R&D). Therefore, the authors’ cost-effectiveness model, which attempts to decrease uncertainty about the potential uncertainty-reducing and harm/likelihood-reducing “power” of resilient food R&D and compare it to R&D on AGI safety, is an important contribution. It combines and applies a number of existing analytical tools in a novel way and proposes a tool for quantifying the relative value of (deeply uncertain) R&D projects competing for scarce resources.

      Overall, the authors are cautious and vigilant in qualifying their claims - which is essential when conducting analysis that relies on the quasi-quantitative aggregation of the (inter)subjective beliefs of experts and combines several models (each with their own assumptions).

      Theoretical/epistemic uncertainty

      I largely agree with the authors’ dismissal of theoretical/epistemic uncertainty (not that they dismiss its importance or relevance - simply that they believe there is essentially nothing that can be done about it in their analysis). Their suggestion that “results should be interpreted in an epistemically reserved manner” (essentially a plea for intellectual humility) should be a footnote in every scholarly publication - particularly those addressing the far future, X-risk, and value estimations of R&D.

      However, the authors could have bolstered this section of the paper by identifying some potential sources of epistemic uncertainty and suggesting some pathways for further research that might reduce it. I recognize that they are both referring to acknowledged epistemic uncertainties - which may or may not be reducible - as well as unknown epistemic uncertainties (i.e., ignorance - or what they refer to as “cluelessness”). It would have been useful to see a brief discussion of some of these acknowledged epistemic uncertainties (e.g., the impact of resilient foods on public health, immunology, and disease resistance) to emphasize that some epistemic uncertainty could be reduced by exactly the kind of resilient food R&D they are advocating for.

      Presentation of model outputs

      When effectively communicating uncertainties associated with research findings to multiple audiences, there is a fundamental tradeoff between the rigour demanded by other experts and the digestibility/usability demanded by decision makers and lay audiences. For example, this tradeoff has been well-documented in the literature on the IPCC’s uncertainty communication framework (e.g., Janzwood 2020). What fellow-modelers/analysts want/need is usually different from what policymakers want/need. The way that model outputs are communicated in this article (e.g. 84% confidence that the 100 millionth dollar is more cost-effective) leans towards rigour and away from digestibility/usability. A typical policymaker who is unfamiliar with the modeling tools used in this analysis may assume that an 84% probability value was derived from historical frequencies/trials in some sort of experiment - or that it simply reflects an intersubjective assessment of the evidence by the authors of the article. Since the actual story for how this value was calculated is rather complex (it emerges from a model derived from the aggregation of the outputs of two sub-models, which both aggregate various types of expert opinions and other forms of data) - it might be more useful to communicate the final output qualitatively.

      This strategy has been used by the IPCC to varying levels of success. These qualitative uncertainty terms can align with probability intervals. For example, 80-90% confidence could be communicated as “high confidence” or “very confident.” >90% could be communicated as “extremely confident.” There are all sorts of interpretation issues associated with qualitative uncertainty scales - and some scales are certainly more effective than others (again, see Janzwood 2020) - but it is often useful to communicate findings in two “parallel tracks” - one for experts and one for a more lay/policy-focused audience.
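
      To illustrate, a simple mapping from model-output probabilities to qualitative terms might look like the sketch below. The cut-points follow the intervals suggested above; the labels themselves are assumptions for illustration, not an established scale:

      ```python
      def confidence_label(p: float) -> str:
          """Map a model-output probability to an illustrative qualitative term.

          Cut-points follow the intervals suggested in the text; the labels
          are assumptions for illustration, not an established scale.
          """
          if p > 0.90:
              return "extremely confident"
          if p >= 0.80:
              return "very confident"
          if p >= 0.60:
              return "moderately confident"
          return "low confidence"

      print(confidence_label(0.84))  # -> "very confident"
      ```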

      Placing the article’s findings within the broader context of global priorities and resource allocation

      Recognizing the hard constraints of word counts - and that a broader discussion of global priorities and resource allocation was likely “out of scope” - this article could be strengthened (or perhaps simply expanded upon in future work) by such a discussion. The critical piece of context is the scarcity of resources and attention within the institutions making funding decisions about civilization-saving R&D (governments, granting organizations, private foundations, etc.). There are two dimensions worth discussing here. First, R&D activities addressing risks that are generally considered low-probability/high-impact with relatively long timelines (although I don’t think the collapse of global agriculture would qualify as low-probability - nor is the likely timeline terribly long - but those are my priors) are competing for scarce funding/attention against R&D activities addressing lower-impact risks believed to be shorter-term and more probable (e.g., climate change, the next pandemic, etc.). I think most risk analysts - even hardcore “long-termists” - would agree that an ideal “R&D funding portfolio” should be somewhat diversified across these categories of risk. It is important to acknowledge the complexity associated with resource allocation - not just between X-risks but between X-risks and other risks.

      Second, there is the issue of resource scarcity itself. On the one hand, there are many “high value” candidate R&D projects addressing various risks that societies can invest in - but only a finite amount of funding and attention to allocate between them. So, these organizations must make triage decisions based on some criteria. On the other hand, there are also a lot of “low” or even “negative value” R&D activities being funded by these organizations - in addition to other poor investments - that are providing little social benefit or are actively increasing the likelihood/magnitude of various risks. I believe that it is important in these sorts of discussions about R&D prioritization and resource scarcity to point out that the resource pool need not be this shallow - and to identify some of the most egregious funding inefficiencies (e.g., around fossil fuel infrastructure expansion). It should go without saying, but ideally, we could properly resource both resilient food and AGI safety research.


      Evaluator details

      1. Name: Scott Janzwood
      2. How long have you been in this field? 10 years
      3. How many proposals and papers have you evaluated? ~25 proposals, ~10 papers
    2. Evaluation 2


      Ratings and predictions

      Ratings (1-100)

      • Overall assessment: 80 CI: 60-90
      • Advancing knowledge and practice: 80 CI: 70-90
      • Methods: Justification, reasonableness, validity, robustness: 70 [(60+80+70+70)/4] CI: 50-90
        • Comment: There are many components to rate, hence the composite uncertainty in my rating. Many choices were left unexplained (on the other hand, there were so many choices that it would have been hard to justify them all), but the majority seemed reasonable. Validity and robustness are hard for me to assess; I based my numbers on the discussion of uncertainties and the sensitivity analysis.
      • Logic & communication: 85 [(90 +80)/2] CI: 65-95
        • Comment: Both excellent, but because the paper is so dense it’s sometimes hard to follow.
      • Open, collaborative, replicable: 73 [(80+80+60)/3] CI: 50-95
        • Comment: The large uncertainty and a reduced score for “replicable” are due to the various data sources/references used in the quantification of the models. Many can be recovered from other papers, which may or may not be replicable/straightforward. A table with all data sources and functional relationships would have been useful. Also, a bit more information on model E would help reduce my uncertainty here.
      • Relevance to global priorities: 85 CI: 70-90

      Journal predictions (1-5)

      • What ‘quality journal’ do you expect this work will be published in? 3.5 CI: 3–5
        • Comment: Biased by the knowledge that it has been published and my trust in the reviewing process.
      • On a ‘scale of journals’, what tier journal should this be published in? 4 CI: 3-5

      Written report

      I really enjoyed reading this paper and I did learn a lot as well, so thank you for putting it together. It is a very clearly presented, very dense piece of research. My area of expertise however is risk and decision analysis under uncertainty. I am a probabilistic modeller with loads of experience in structured expert judgement used to quantify uncertainty when data are sparse or lacking. My evaluation will therefore not cover the application area as such, as I have no experience with catastrophic or existential risk. I assume the cited literature is appropriate and representative, but I did not spend time verifying this assumption.

      At a first read, both the title and the abstract left me wondering if the present analysis compares cost-effectiveness (of resilient foods) with safety (of AGI), which would have been a strange comparison to make. However, after reading the very clear (and dense) Introduction, things fell into place. The only minor comment I have about the Introduction is that it sounds more ambitious than what the results deliver with respect to the second objective.

      The Methods section is well organised and documented, but once in a while it lacks clarity and it uses terminology that may or may not be appropriate. Here’s a list of things I found a bit confusing:

      • Terminology
        • The first sentence mentions “parameters” without the context of what these parameters may be (sometimes random variables are called parameters, at other times the parameters of a distribution are referred to as parameters, etc.)
        • The probability distribution of the “expected cost effectiveness”. Is “expected” in this context meant in a probabilistic sense, i.e., the expectation of the random variable “cost effectiveness”?
        • The submodels for food and AGI are said to be “independent”; is this meant in a probabilistic way? Are there no hidden/not modelled variables that influence both?
        • The “expert” model was quite confusing for me, maybe because “Sandberg” and the reference number after “Sandberg” don’t match, or maybe because I was expecting a survey vs. expert judgement quantification of uncertainty. As I said, (structured) expert judgement is one of my interests (https://link.springer.com/book/10.1007/978-3-030-46474-5)
        • In the caption of fig 2, “index nodes” and “variable nodes” are introduced. Index nodes are later described, but I don't think I understood what was meant by “variable” nodes. Aren’t all probabilistic nodes variable?
      • Underlying assumptions/definitions
        • Throughout the Methods section I missed a table listing all variables: how they were measured, on what sort of scale or using what formula, and where they were quantified from (data, surveys, literature + reference) - and, if taken from other studies, what the limitations of those studies were
        • Some of the parameters of the, say, Beta distributions are mentioned but not justified
        • The structure of the models is not discussed. How did you decide that this is a robust structure? (No sensitivity analysis on the structure was performed, as far as I understood.)
        • What is meant by “the data from surveys was used directly instead of constructing continuous distributions”?
        • The arcs in Fig. 1 are unclear: some seem misplaced, while others seem to be missing. This may be a misunderstanding on my part, so maybe more text about Fig. 1 would help
        • It is unclear if the compiled data sets are compatible. I think the quantification of the model should be documented better or in a more compact way.

      The Results section is very clear and neatly presented and I did enjoy the discussion on the several types of uncertainty.

      It is great that the models are available upon request, but it would be even better if they were public so that computational reproducibility could be evaluated as well.

      Some of the references are missing links in the text, and at least one does not link to the desired bibitem.


      Evaluator details

      1. Name: Anca Hanea
      2. How long have you been in this field? It depends what you mean by “this field”. I have been working in risk and uncertainty analysis and decision making under uncertainty since 2009 (I defended my PhD in December 2008). I have never collaborated on a global catastrophic or existential risk application. I am familiar with food (in)security and collaborated on a project on that once.
      3. How many proposals and papers have you evaluated? I lost count of the papers. I am a reviewer for more than 20 journals, I have co-edited special issues and I am one of the editors for one journal. I have evaluated less than ten research proposals throughout the years.
    3. Evaluation 1


      Ratings and predictions

      Ratings (1-100)

      • Overall assessment: 40 CI: 20-60
        • Comment: See main review
      • Advancing knowledge and practice: 30 CI: 20-60
        • Comment: The paper itself makes an important argument about resilient foods, but I don’t know if the additional element of AGI risk adds much to Denkenberger & Pearce (2016)
      • Methods: Justification, reasonableness, validity, robustness: 50 CI: 40-60
        • Comment: Very major limitations around the survey method, and implementation of certain parts of the parameter sensitivity analysis. However, many elements are of a high standard
      • Logic & communication: 60 CI: 40-75
        • Comment: Major limitations around the logic and communication of the theoretical model of cost-effectiveness used in the paper. Minor limitations of readability and reporting which could have been addressed before publication (such as reporting 95% CIs without medians, and not reporting overall cost and benefit estimates)
      • Open, collaborative, replicable: 70 CI: 40-75
        • Comment: Provided models are shared with any reader who asks, I couldn’t ask for more here. Limitations of survey replicability (particularly E model) prevent perfect score
      • Relevance to global priorities: 90 CI: 60-95
        • Comment: I’d be surprised if I ever read a paper with more relevance to global priorities, although as mentioned there are a few versions of this argument circulating such as Denkenberger & Pearce (2016)

      Journal predictions (1-5)

      • What ‘quality journal’ do you expect this work will be published in? 2 CI: 1-2
      • On a ‘scale of journals’, what tier journal should this be published in? 2 CI: 1-2

      Written report

      This is a very interesting paper on an important and neglected topic. I’d be surprised if I ever again read a paper with such potential importance to global priorities. The authors motivate the discussion well, and should be highly commended for their clear presentation of the structural features of their model, and the thoughtful nature in which uncertainty was addressed head-on in the paper.

      Overall, I suspect the biggest contribution this paper will make is contextualising the existing work done by the authors on resilient food into the broader literature of long-termist interventions. This is a significant achievement, and the authors should feel justifiably proud of having accomplished it. However, the paper unfortunately has a number of structural and technical issues which should significantly reduce a reader’s confidence in the quantitative conclusions which aim to go beyond this contextualisation.

      In general, there are three broad areas where I think there are material issues with the paper:

      1. The theoretical motivation for their specific philosophy of cost-effectiveness, and specifically whether this philosophy is consistent throughout the essay
      2. The appropriateness of the survey methods, in the sense of applying the results of a highly uncertain survey to an already uncertain model
      3. Some specific concerns with parameterisation

      None of these concerns touch upon what I see as the main point of the authors, which I take to be that ‘fragile’ food networks should be contextualised alongside other sources of existential risk. I think this point is solidly made, and important. However, these concerns do suggest that significant additional work may be needed to properly prove the headline claim of the paper: that, in addition to addressing a source of existential risk, investing in resilient food is amongst the highest benefit-per-cost of any existential risk mitigation.

      Structure of cost-effectiveness argument

      One significant highlight of the paper is the great ambition it shows in resolving a largely intractable question. Unfortunately, I feel this ambition is also something of a weakness of the paper, since the logic of the argument ends up being difficult to follow throughout.

      • Structurally, the most challenging element of this paper in terms of argumentative flow is the decision to make the comparator for cost-effectiveness analysis ‘AGI Catastrophe’ rather than ‘do nothing’. My understanding is that the authors make this decision to clearly highlight the importance of resilient food – noting that, “if resilient foods were more cost effective than AGI safety, they could be the highest priority [for the existential risk community]” (since the existential risk community currently spends a lot on AGI Risk mitigation). So roughly, they start with the assumption that AI Risk must be cost-effective, and argue that anything more cost-effective than this must therefore also be cost-effective. The logic is sound, but this decision causes a number of problems with interpretability, since it requires the authors to compare an already highly uncertain model of food resilience against a second highly uncertain model of AGI risk.
      • The biggest issue with interpretability this causes is that I struggle to understand what features of the analysis are making resilient food appear cost-effective because of some feature of resilient food, and which are making resilient food appear cost-effective because of some feature of AI. The methods used by the authors mean that a mediocre case for resilient food could be made to look highly cost-effective with an exceptionally poor case for AI, since their central result is the multiplier of value on a marginally invested dollar for resilient food vs AI. This is important, because the authors’ argument is that resilient food should be funded because it is more effective than AI Risk management, but this is motivated by AI Risk proponents agreeing AI Risk is important – in scenarios where AI Risk is not worth investing in then this assumption is broken and cost effectiveness analysis against a ’do nothing’ alternative is required. For example, the authors do not investigate scenarios where the benefit of the intervention in the future is negative because “negative impacts would be possible for both resilient foods and AGI safety and there is no obvious reason why either would be more affected”. While this is potentially reasonable on a mathematical level, it does mean that it would be perfectly possible for resilient foods to be net harmful and the paper not correctly identify that funding them is a bad idea – simply because funding AI Risk reduction is an even worse idea, and this is the only given alternative. If the authors want to compare AGI risk mitigation and resilient foods against each other without a ‘do nothing’ common comparator (which I do not think is a good idea), they must at the very least do more to establish that the results of their AI Risk model map closely to the results which cause the AI Risk community to fund AI Risk mitigation so much. As this is not done in the paper, a major issue of interpretability is generated.
      • A second issue this causes is that the authors must make an awkward ‘assumption of independence’ between nuclear risk, food security risk and AI risk. Although the authors identify this as a limitation of their modelling approach, the assumption does not need to be made if AI risk is not included as a comparator in the model. I don’t think this is a major limitation of the work, but an example of how the choice of comparator has an impact on structural features of the model beyond just the comparator.
      • More generally, this causes the authors to have to write up their results in a non-natural fashion. As an example of the sort of issues this causes, conclusions are expressed in entirely non-natural units in places (“Ratio of resilient foods mean cost effectiveness to AGI safety mean cost effectiveness” given $100m spend), rather than units which would be more natural (“Cost-effectiveness of funding resilient food development”). I cannot find expressed anywhere in the paper a simple table with the average costs and benefits of the two interventions, although a reference is made to Denkenberger & Pearce (2016) where these values were presented for near-term investment in resilient food. This makes it extremely hard for a reader to draw sensible policy conclusions from the paper unless they are already an expert in AGI risk and so have an intuitive sense of what an intervention which is ‘3-6 times more cost-effective than AGI risk reduction’ looks like. The paper might be improved by the authors communicating summary statistics in a more straightforward fashion. For example, I have spent some time looking for the probability the model assigns to no nuclear war before the time horizon (and hence the probability that the money spent on resilient food is ‘wasted’ with respect to the 100% shortfall scenario) but can’t find this – that seems to be quite an important summary statistic but it has to be derived indirectly from the model.

      Fundamentally, I don’t understand why both approaches were not compared to a common scenario of ‘do nothing’ (relative to what we are already doing). The authors’ decision to compare AGI Risk mitigation to resilient foods directly would only be appropriate if the authors expect that increasing funding for resilient food would decrease funding for AI safety (that is to say, the authors are claiming that there is a fixed budget for AI-safety-and-food-resilience, and so funding for one must come at the expense of the other). This might be what the authors have in mind as a practical consequence of their argument, as there is an implication that funding for resilient foods might come from existing funding deployed to AGI Risk. But it is not logically necessary that this is the case, and so it creates great conceptual confusion to include it in a cost-effectiveness framework that requires AI funding and resilient food funding to be strictly alternatives. To be clear, the ‘AI subunit’ is interesting and publishable in its own right, but in my opinion simply adds complexity and uncertainty to an already complex paper.

      Continuing on from this point, I don’t understand the conceptual framework that has the authors consider the value of invested dollars in resilient food at the margin. The authors model the value of an invested dollar by assuming that returns to funding are logarithmic. Since the entire premise of the paper hinges on the reasonability of this argument, it is very surprising there is no sensitivity analysis considering different distributions of the relationship between intervention funding and value. Nevertheless, I am also confused as to the model even on the terms the authors describe; the authors’ model appears to be that there is some sort of ‘invention’ step where the resilient food is created and discovered (this is mostly consistent with Denkenberger & Pearce (2016), and is the only interpretation consistent with the question asked in the survey). In which case, the marginal value of the first invested dollar is zero because the ‘invention’ of the food is essentially a discrete, binary step. The marginal value per dollar continues to be zero until the 86 millionth dollar, where the marginal value is the entire value of the resilient food in its entirety. There seems to be no reason to consider the marginal dollar value of investment when a structural assumption made by the authors is that there is a specific level of funding which entirely saturates the field, and this would make presenting results significantly more straightforward – it is highly nonstandard to use marginal dollars as the unit of cost in a cost-effectiveness analysis, and indeed is so nonstandard I’m not certain fundamental assumptions of cost-effectiveness analysis still hold. I can see why the authors have chosen to bite this bullet for AI risk given the existing literature on the cost of preventing AI Catastrophe, but there seems to be no reason for it when modelling resilient food and it departs sharply from the norm in cost-effectiveness analysis.
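
      To make this contrast concrete, consider a minimal sketch of the two stylised funding-value models; all parameter values, including the normalisation around the $86m threshold, are illustrative assumptions rather than the paper’s:

      ```python
      import numpy as np

      def value_logarithmic(x, scale=1.0):
          # Diminishing returns: every extra dollar has positive marginal value.
          return scale * np.log1p(x)

      def value_invention_step(x, threshold=86.0, total_value=1.0):
          # 'Invention' model: no benefit until the field saturates at ~$86m,
          # then the full benefit arrives at once.
          return np.where(x >= threshold, total_value, 0.0)

      x = np.linspace(0.0, 120.0, 7)  # cumulative funding in $m
      print(value_logarithmic(x))
      print(value_invention_step(x))
      # The marginal value per dollar differs sharply: a smooth decay versus a
      # single spike at the threshold and zero everywhere else.
      ```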

      Finally, I don’t understand the structural assumptions motivating the cost-effectiveness of the 10% decline analysis. The authors claim that the mechanism by which resilient foods save lives in the 10% decline analysis is that “the prices [of non-resilient food] would go so high that those in poverty may not be able to afford food”, with the implication that resilient foods would be affordable to those in poverty and hence prevent starvation. However, the economic logic of this statement is unclear. It necessitates that the production costs of resilient food are less than the production costs of substitute non-resilient food at the margin, which further implies that producers of resilient food can command supernormal profits during the crisis, which is to say the authors are arguing that resilient foods represent potentially billions of dollars of value to their inventor within the inventor’s lifetime. It is not clear to me why a market-based solution would not emerge for the ‘do nothing’ scenario, which would be a critical issue with the authors’ case since it would remove the assumption that ‘resilient food’ and ‘AGI risk’ are alternative uses of the same money in the 10% scenario, which is necessary for their analysis to function. The authors make the further assumption that preparation for the 100% decline scenario is highly correlated with preparation for the 10% decline scenario, which would mean that a market-based solution emerging prior to nuclear exchange would remove the assumption that ‘resilient food’ and ‘AGI risk’ are alternative uses of the same money in the 100% decline scenario. A supply and demand model might have been a more appropriate model for investigating this effect. Once again, I note that the supply and demand model alone would have been an interesting and publishable piece of work in its own right.
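
      A toy version of the suggested supply-and-demand framing, with resilient food acting as a ‘backstop’ supply at constant marginal cost; all numbers are illustrative assumptions, not estimates from the paper:

      ```python
      def equilibrium_price(demand_intercept, demand_slope, supply, backstop_cost):
          # Inverse demand p = a - b*q with inelastic conventional supply; the
          # backstop caps the price at its marginal cost once it becomes binding.
          return min(demand_intercept - demand_slope * supply, backstop_cost)

      # Before the shock the backstop (resilient food) is idle...
      print(equilibrium_price(10.0, 1.0, supply=6.0, backstop_cost=4.5))  # 4.0
      # ...after a 10% supply shortfall it binds and limits the price spike.
      print(equilibrium_price(10.0, 1.0, supply=5.4, backstop_cost=4.5))  # 4.5
      ```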

      Overall, I think the paper would have benefitted from more attention being paid to the underlying theory of cost-effectiveness motivating the investigation. Decisions made in places seem to have multiplied uncertainty which could have been resolved with a more consistent approach to analysis. As I highlighted earlier, the issues only stem from the incredible ambition of the paper and the authors should be commended for managing to find a route to connect two separate microsimulations, an analysis of funding at the margin and a supply-and-demand model. Nevertheless, the combination of these three approaches weakens the ability to draw strong conclusions from each of these approaches individually.

      Methods

      With respect to methods, the authors use a Monte Carlo simulation with distributions drawn from a survey of field experts. The use of a Monte Carlo technique here is an appropriate choice given the significant level of uncertainty over parameters. The model appears appropriately described in the paper, and functions well (I have only checked the models in Guesstimate, as I could not make the secondary models in Analytica function). A particular highlight of the paper is the figures clearly laying out the logical interrelationship of elements of the model, which made it significantly easier to follow the flow of the argument. I note the authors use ‘probability more effective than’ as a key result, which I think is a natural unit when working in Guesstimate. This is entirely appropriate, but a known weakness of the approach is that it can bias in favour of poor interventions with high uncertainty. The authors could also have presented a SUCRA analysis which does not have this issue, but they may have considered and rejected this approach as unnecessary given the entirely one-sided nature of the results which a SUCRA would not have reversed.
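
      The bias mentioned above is easy to demonstrate with a hedged sketch (parameters invented for illustration): an intervention with a markedly lower median cost-effectiveness can still ‘beat’ a tightly estimated one in a large share of Monte Carlo draws, because the win-probability metric ignores the magnitude of wins and losses:

      ```python
      import numpy as np

      rng = np.random.default_rng(0)
      n = 100_000

      # Tightly estimated intervention, median cost-effectiveness ~1.0.
      certain = rng.lognormal(mean=0.0, sigma=0.2, size=n)
      # Highly uncertain intervention with a much lower median (~0.6).
      uncertain = rng.lognormal(mean=-0.5, sigma=2.0, size=n)

      # 'Probability more effective than' is still roughly 0.4 despite the far
      # worse central estimate, because heavy upper tails dominate the comparison.
      print((uncertain > certain).mean())
      ```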

      The presentation of the sensitivity analysis as ‘number of parameters needed to flip’ is nonstandard, but a clever way to intuitively express the level of confidence the authors have in their conclusions. Although clever, I am uncertain if the approach is appropriately implemented; the authors limit themselves to the 95% CI for their definition of an ‘unfavourable’ parameter, and I think this approach hides massive structural uncertainty with the model. For example, in Table 5 the authors suggest their results would only change if the probability of nuclear war per year was \(4.8 \times 10^{-5}\) (plus some other variables changing) rather than their estimate of \(7 \times 10^{-3}\) (incidentally, I think the values for S model and E model are switched in Table 5 – the value for pr(nuclear war) in the table’s S model column corresponds to the probability given in the E model). But it is significantly overconfident to say that risk of nuclear war per year could not possibly be below \(4.8 \times 10^{-5}\), so I think the authors overstate their certainty when they say “reverting [reversing?] the conclusion required simultaneously changing the 3-5 most important parameters to the pessimistic ends”; in fact it merely requires that the authors have not correctly identified the ‘pessimistic end’ of any one of the five parameters, which seems likely given the limitations in their data which I will discuss momentarily. I personally would have found one- and two-dimensional threshold analysis a more intuitive way to present the results, but I think the authors have a reasonable argument for their approach. As described earlier, I have some concerns about whether an appropriate amount of structural sensitivity analysis was undertaken, but the presentation of uncertainty analysis is appropriate in its own terms (if somewhat nonstandard).
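
      As an illustration of the one-dimensional threshold analysis I have in mind, the sketch below solves for the flip point of a single parameter in a deliberately toy model; the linear scaling and the folded constant are assumptions for illustration, not the paper’s structure:

      ```python
      # Toy model: suppose the cost-effectiveness ratio scales linearly in the
      # annual probability of nuclear war, with every other parameter folded
      # into a single constant. The constant is an assumption, chosen so the
      # central estimate p_war = 7e-3 gives a ratio of about 3.
      OTHER_FACTORS = 430.0

      def ratio(p_war):
          return p_war * OTHER_FACTORS

      print(ratio(7e-3))          # ~3.0 at the central estimate
      print(1.0 / OTHER_FACTORS)  # ~2.3e-3: the p_war at which the ratio hits 1
      ```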

      Overall, I have no major concerns about the theory or application of the modelling approach. However, I have a number of concerns with the use of the survey instrument:

      First, the authors could have done more to explain the level of uncertainty their survey instrument contains. They received eight responses, which is already a very low number of responses for a quantitative survey. In addition, two of the eight responses were from authors of the paper. The authors discuss ‘response bias’ and ‘demand characteristic bias’ which would not typically be applied to data generated by an approximately autoethnographic process – it is obvious that the authors of a survey instrument know what purpose the instrument is to be used for, and have incentives to make the survey generate novel and interesting findings. It might have been a good sensitivity analysis to exclude responses from the authors and other researchers associated with ALLFED since there is a clear conflict of interest that could bias results here.

      Second, issues with survey data collection are compounded by the fact that some estimates which are given in the S Model are actually not elicited with the survey technique – they are instead cited to Denkenberger & Pearce (2016) and Denkenberger & Pearce (2017). This is described appropriately in the text, but not clearly marked in the summary Table 1 where I would expect to see it, and the limitation this presents is not described clearly. To be explicit, the limitation is that at least two key parameters in the model are based on a sample of the opinions of two of the eight survey respondents, rather than the full set of eight respondents. As an aside on presentation, the decision to present lower and upper credible intervals in Table 1 rather than median is non-standard for an economics paper, although perhaps this is a discipline-specific convention I am unaware of. Regardless, I’m not sure it is appropriate to present the lowest of eight survey responses as the ‘5th percentile’, as it is actually the 13th percentile and giving 95% confidence intervals implies a level of accuracy the survey instrument cannot reach. While I appreciate the 13th percentile of 8 responses will be the same as the 5th centile of 100 samples drawn from those responses, this is not going to be clear to a casual reader of the paper. ‘Median (range)’ might be a better presentation of the survey data in this table, with better clarity on where each estimate comes from. Alternatively, the authors could look at fitting a lognormal distribution to the survey results using e.g. method of moments, and then resample from the new distribution to create a genuine 95% CI. Regardless, given the low number of responses, it might have been appropriate simply to present all eight estimates for each relevant parameter in a table.
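
      A minimal sketch of the method-of-moments suggestion, using placeholder values rather than the paper’s survey data:

      ```python
      import numpy as np

      # Placeholder survey responses (n = 8) -- not the actual survey data.
      responses = np.array([0.02, 0.05, 0.05, 0.10, 0.15, 0.20, 0.40, 0.60])

      m = responses.mean()
      v = responses.var(ddof=1)

      # Method-of-moments lognormal fit: match the sample mean and variance.
      sigma2 = np.log(1.0 + v / m**2)
      mu = np.log(m) - sigma2 / 2.0

      rng = np.random.default_rng(0)
      fitted = rng.lognormal(mu, np.sqrt(sigma2), size=100_000)

      # A genuine 95% interval from the fitted distribution, rather than
      # relabelling the sample minimum as the '5th percentile'.
      print(np.percentile(fitted, [2.5, 50, 97.5]))
      ```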

      Third, the authors could have done more to make it clear that the ‘Expert Model’ was effectively just another survey with an n of 1. Professor Sandberg, who populated the Expert Model, is also an author on this paper and so it is unclear what if any validation of the Expert Model could reasonably have been undertaken – the E model is therefore likely to suffer from the same drawbacks as the S model. It is also unclear if Professor Sandberg knew the results of the S Model before parameterising his E Model – although this seems highly likely given that 25% of the survey’s respondents were Professor Sandberg’s co-authors. This could be a major source of bias, since presumably the authors would prefer the two models to agree and the expert parameterising the model is a co-author. I also think more work needed to be done to establish the Expert’s credentials in the field of agricultural R&D (necessary for at least some of the parameter estimates); although I happily accept Professor Sandberg is a world expert on existential risk and a clear choice to act as the parameterising ‘expert’ for most parameters, I think there may have been alternative choices (such as agricultural economists) who may have been more obviously suited to giving some estimates. There is no methodological reason why one expert had to be selected to populate the whole table, and no defence given in the text for why one expert was selected - the paper is highly multidisciplinary and it would be surprising if any one individual had expert knowledge of every relevant element. Overall, this limitation makes me extremely hesitant to accept the authors’ argument that the fact that S model and E model are both robust means the conclusion is equally robust.

      Generally, I am sympathetic to the authors’ claim that there is unavoidable uncertainty in the investigation of the far future. However, the survey is a very major source of avoidable uncertainty, and it is not a reasonable decision of the authors to present the uncertainty due to their application of survey methods as the same kind of thing as uncertainty about the future potential of humanity. There are a number of steps the authors could have taken to improve the validity and reliability of their survey results, some of which would not even have required rerunning the survey (to be clear however, I think there is a good case for rerunning the survey to ensure a broader panel of responses). With the exception of the survey, however, methods were generally appropriate and valid.

      Parameter estimates

      Notwithstanding my concerns about the use of the survey instrument, I have some object level concerns with specific parameters described in the model.

      • The discount rate for both costs and benefits appears to be zero, which is very nonstandard in economic evaluation. Although the authors make reference to “long termism, the view that the future should have a near zero discount rate”, the reference for this position leads to a claim that a zero rate of pure time preference is common, and a footnote observing that “the consensus against discounting future well-being is not universal”. To be clear, pure time preference is only one component of a well-constructed discount rate and therefore a discount rate should still be applied for costs, and probably for future benefits too. Even notwithstanding that I think this is an error of understanding, it is a limitation of the paper that discount rates were not explored, given they seem very likely to have a major impact on conclusions (a short sketch after this list illustrates how strongly).
      • A second concern I have relating to parameterisation is the conceptual model leading to the authors’ proposed costing for the intervention. The authors explain their conceptual model linking nuclear war risk to agricultural decline commendably clearly, and this expands on the already strong argument in Denkenberger & Pearce (2016). However, I am less clear on their conceptual model linking approximately $86m of research to the widescale post-nuclear deployment of resilient foods. The assumption seems to be (and I stress this is my assumption based on Denkenberger & Pearce (2016) – it would help if the authors could make it explicit) that $86m purchases the ‘invention’ of the resilient food, and once the food is ‘invented’ then it can be deployed when needed with only a little bit of ongoing training (covered by the $86m). This seems to me to be an optimistic assumption; there seems to be no cost associated with disseminating the knowledge, or any raw materials necessary to culture the resilient food. Moreover, the model seems to structurally assume that distribution chains survive the nuclear exchange with 100% certainty (or that the materials are disseminated to every household which would increase costs), and that an existing resilient food pipeline exists at the moment of nuclear exchange which can smoothly take over from the non-resilient food pipeline.
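
      Returning to the discount-rate point above, the present-value factor \((1+r)^{-t}\) shows how strongly even modest rates compress benefits realised a century out; the rates and horizon below are illustrative:

      ```python
      def pv_factor(r, t):
          # Present value of one unit of benefit received t years from now.
          return (1.0 + r) ** (-t)

      for r in (0.0, 0.01, 0.03):
          print(r, round(pv_factor(r, 100), 3))
      # r = 0.00 -> 1.0    (the paper's implicit choice)
      # r = 0.01 -> ~0.37
      # r = 0.03 -> ~0.052
      ```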

      I have extremely serious reservations about these points. I think it is fair to say that an economics paper which projected benefits as far into the future as the authors do here without an exploration of discount rates would be automatically rejected by most editors, and it is not clear why the standard should be so different for existential risk analysis. A cost of $86m to mitigate approximately 40% of the impact of a full-scale nuclear war between the US and a peer country seems prima facie absurd, and the level of exploration of such an important parameter is simply not in line with best practice in a cost-effectiveness analysis (especially since this is the parameter on which we might expect the authors to be least expert). I wouldn’t want my reservations about these two points to detract from the very good and careful scholarship elsewhere in the paper, but neither do I want to give the impression that these are just minor technical details – these issues could potentially reverse the authors’ conclusions, and should have been substantially defended in the text.

      Conclusions

      Overall, this is a novel and insightful paper which is unfortunately burdened with some fairly serious conceptual issues. The authors should be commended for their clear-sighted contextualisation of resilient foods as an issue for discussion in existential risk, and for the scope of their ambition in modelling. Academia would be in a significantly better place if more authors tried to answer far-reaching questions with robust approaches, rather than making incremental contributions to unimportant topics.

      Where the issues of the paper lie are structural weaknesses with the cost-effectiveness philosophy deployed, methodological weaknesses with the survey instrument and two potentially conclusion-reversing issues with parameterisation which should have been given substantially more discussion in the text. I am not convinced that the elements of the paper which are robust are sufficiently robust to overcome these weaknesses – my view is that it would be premature to reallocate funding from AI Risk reduction to resilient food on the basis of this paper alone. The most serious conceptual issue which I think needs to be resolved before this can happen is to demonstrate that ‘do nothing’ would be less cost-effective than investing $86m in resilient foods, given that the ‘do nothing’ approach would potentially include strong market dynamics leaning towards resilient foods. I agree with the authors that an agent-based model might be appropriate for this, although a conventional supply-and-demand model might be simpler.

      I really hope the authors are interested in publishing follow-on work, looking at elements which I have highlighted in this review as sitting somewhat outside the scope of the paper as published, but which are nevertheless potentially important contributions to knowledge. In particular, the AI subunit is novel and important enough for its own publication.


      Evaluator details

      1. Name: Alex Bates
      2. How long have you been in this field? In the field of cost-effectiveness analysis, 10 years. I wouldn’t consider myself to be in the field of x-risk
      3. How many proposals and papers have you evaluated? I’ve lost count, but probably mid double figures - perhaps 50?