- Mar 2024
-
static1.squarespace.com XPT.pdf 16
-
AI-concerned think the risk that a genetically engineered pathogen will kill more than 1% of people within a 5-year period before 2100 is 12.38%, while the AI skeptics forecast a 2% chance of that event, with 96% of the AI-concerned above the AI skeptics’ median forecast
this seems like a sort of ad-hoc way of breaking up the data. What exactly is the question here, and why is this the best way to answer it?
-
Those who did best on reciprocal scoring had lower forecasts of extinction risk.
[Footnote 72] We separately compare each forecaster’s forecast of others’ forecasts on ten key questions, for both experts and superforecasters. We rank each forecaster’s accuracy on those 20 quantities relative to other participants, and then we compute each forecaster’s average rank to calculate an overall measure of intersubjective accuracy.
[Footnote 73] This may be because superforecasters are a more homogenous group, who regularly interact with each other outside of forecasting tournaments like this.
[Footnote 74] Pavel Atanasov et al., “Full Accuracy Scoring Accelerates the Discovery of Skilled Forecasters,” SSRN Working Paper, (February 14, 2023), http://dx.doi.org/10.2139/ssrn.4357367.
This seems visually the case, but I don't see metrics or statistical inference here.
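To make the footnoted procedure concrete, here is a minimal sketch of the rank-averaging step. It assumes each forecaster's per-quantity accuracy is an absolute error (the quoted passage does not state the error metric), and the function name and toy numbers are hypothetical, not from the report.

```python
from statistics import mean

def average_ranks(errors):
    """Map {forecaster: [error on each quantity]} to {forecaster: mean rank}.

    Rank 1 is the smallest error on a quantity; tied forecasters share
    the average of the positions they occupy.
    """
    names = list(errors)
    n_quantities = len(next(iter(errors.values())))
    ranks = {name: [] for name in names}
    for q in range(n_quantities):
        ordered = sorted(names, key=lambda nm: errors[nm][q])
        i = 0
        while i < len(ordered):
            j = i
            # Extend j over the run of forecasters tied with position i.
            while j + 1 < len(ordered) and errors[ordered[j + 1]][q] == errors[ordered[i]][q]:
                j += 1
            shared = (i + 1 + j + 1) / 2  # average of 1-based positions i+1..j+1
            for k in range(i, j + 1):
                ranks[ordered[k]].append(shared)
            i = j + 1
    return {name: mean(r) for name, r in ranks.items()}

# Toy data: three forecasters, two quantities (hypothetical numbers).
toy_errors = {"a": [0.1, 0.3], "b": [0.2, 0.1], "c": [0.2, 0.2]}
toy_ranks = average_ranks(toy_errors)
```

A correlation (with a confidence interval) between these average ranks and each forecaster's extinction-risk forecast is the kind of statistic I would want reported alongside the figure.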
-
Within both groups, experts and superforecasters, more accurate reciprocal scores were correlated with lower estimates of catastrophic and extinction risk. In other words, the better experts were at discerning what other people would predict, the less concerned they were about extinction
But couldn't this just be because people who think x-risk is high expect others to think like themselves? Is the 'better reciprocal accuracy' measure more fine-grained than that?
-
Total Catastrophic Risk
The differences in total x-risk are not quite so striking: about 2:1 vs 6:1. What accounts for this? Hmm, this looks different from the 'Total Extinction Risk' in Table 4. Here a notebook would be helpful. Ahh, it's because this is for catastrophic risk, not extinction risk.
-
First, we can rule out the possibility that experts can’t persuade others of the severity of existential risks simply because of a complete lack of sophistication, motivation, or intelligence on the part of their audience. The superforecasters have all those characteristics, and they continue to assign much lower chances than do experts.
This paragraph seems a bit loosely argued.
-
Question and resolution details
They seem to have displayed the questions along with particular “Prior Forecasts” — is that appropriate? Could that be driving the persistent difference between the superforecasters and experts?
-
general x-risk experts
What are 'general x-risk experts'? Give some examples.
-
The median participant who completed the tournament earned $2,500 in incentives, but this figure is expected to rise as questions resolve in the coming years.
Fairly substantial incentives, but it may have been time-consuming: how many hours did it take? And how much variation was there in the incentive pay, and how sensitive was it to the predictions?
-
with 111 completing all stages of the tournament
Would this attrition matter?
-
1. Participants made individual forecasts
2. Teams comprised entirely of either superforecasters or experts deliberated and updated their forecasts
3. Blended teams from the second stage, consisting of one superforecaster team and one expert team, deliberated and updated their forecasts
4. Each team saw one wiki summarizing the thinking of another team and again updated their forecasts
With incentives for accuracy (or 'intersubjective' accuracy) at each stage, or only at the very end? Also incentives for making strong comments and (?) convincing others?
-
We also advertised broadly, reaching participants with relevant experience via blogs and Twitter. We received hundreds of expressions of interest in participating in the tournament, and we screened these respondents for expertise, offering slots to respondents with the most expertise after a review of their backgrounds.1
Recruitment of experts.
-
We explained that after the tournament we would show the highest-quality anonymized rationales (curated by independent readers) to panels of online survey participants who would make forecasts before and after reading the rationale. Prizes go to those whose rationales helped citizens update their forecasts toward greater accuracy, using both proper scoring rules for resolvable questions and intersubjective accuracy for unresolvable questions.21
Is this approach valid? Would it give powerful incentives to be persuasive? What are these rationales used for? Note that 'intersubjective accuracy' is not a ground truth for the latter questions.
-
One common challenge in forecasting tournaments is to uncover the reasoning behind predictions.
How does this 'uncover the reasoning behind predictions'?
-
scoring ground rules: questions resolving by 2030 were scored using traditional forecasting metrics where the goal was to minimize the gap between probability judgments and reality (coded as zero or one as a function of the outcome). However, for the longer-run questions, participants learned that they would be scored based on the accuracy of their reciprocal forecasts: the better they predicted what experts and superforecasters would predict for each question, the better their score.
Is the 'reciprocal scoring' rule likely to motivate honest (incentive-compatible) predictions? Is it likely to generate useful information in this context?
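A minimal sketch of the two scoring regimes may help pin down the question. The Brier score is a standard proper scoring rule for 0/1 outcomes; the reciprocal score here is an assumption on my part (squared distance from the realized group median), since the quoted passage does not give the exact formula. Function names and numbers are hypothetical.

```python
from statistics import median

def brier_score(prob, outcome):
    """Squared gap between a forecast probability and the 0/1 outcome (lower is better)."""
    return (prob - outcome) ** 2

def reciprocal_score(predicted_group_median, group_forecasts):
    """Squared gap between a guess of the group's median forecast and the realized median.

    Assumed form: note that the forecaster's own belief about the event
    does not enter this score at all.
    """
    return (predicted_group_median - median(group_forecasts)) ** 2

# Hypothetical numbers: a resolved question and an unresolvable one.
short_run = brier_score(0.8, 1)
long_run = reciprocal_score(0.05, [0.01, 0.02, 0.10])
```

Under this assumed form, the long-run score rewards predicting the group rather than the event, which is why the incentive-compatibility question matters for the unincentivized beliefs the report goes on to cite.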
-
When we report probabilities of long-run catastrophic and existential risk in this report, we report forecasters’ own (unincentivized) beliefs. But, we rely on the incentivized forecasts to calculate measures of intersubjective accuracy
This is a bit confusing. The language needs clarification. What exactly is 'intersubjective accuracy'?
-
the XPT:
• What will be the global surface temperature change as compared to 1850–1900, in degrees Celsius? (By 2030, 2050, 2100)
• By what year will fusion reactors deliver 1% of all utility-scale power consumed in the U.S.?
• How much will be spent on compute [computational resources] in the largest AI experiment? (By 2024, 2030, 2050)
• What is the probability that artificial intelligence will be the cause of death, within a 5-year period, for more than 10% of humans alive at the beginning of that period? (By 2030, 2050, 2100)
• What is the overall probability of human extinction or a reduction in the global population below 5,000? (By 2030, 2050, 2100)
[Footnote 18] Participants also consented to participate in this study, via the University of Pennsylvania’s Institutional Review Board. The consent form detailed the format of the study.
[Footnote 19] We define a catastrophic event as one causing the death of at least 10% of humans alive at the beginning of a five-year period. We define extinction as reduction of the global population to less than 5,000.
I appreciate these links to the full question content.
-
- Dec 2023
-
www.nber.org
-
feed these motivations through several potential mechanism
I don't see the connection to household bargaining here
-
conflict environments can also exacerbate existing gender-based inequalities or challenge traditional gender roles in a society
This seems logically incoherent. Why would conflict challenge gender roles? And if the opposite, why would greater inequality (favoring men, I guess) make them want to resort to violence to 'reassert' their power?
-
establishment of peace in the public space has substantial positive spillover effects in enhancing women’s well-being in the private space
relevant to prioritization
-
Therefore, we identify the population average treatment effect (PATE) of armed combat exposure
whole population or only 90%?
-
Probing the mechanisms, our analysis first renders the use of violence as an instrumental behavior in intrahousehold bargaining as an unlikely mechanism by eliminating labor market outcomes and economic- and social-controlling behaviors from the list of usual suspects.
This sentence is confusing. And why would I expect that 'violence ... [for] intrahousehold bargaining' would be particularly driven by having been assigned to a conflict zone?
-
-
hughjonesd.github.io
-
Usage
I think you need to explain a possible workflow here. E.g.,
- Open the tex, markdown, etc. file in a text editor
- Apply your suggestions using the above syntax and save it with a different name
- Use the 'suggs' tools below to manage this
Or
1. Open old.txt (a tex, markdown, etc. file) in a text editor
2. Just make your suggested changes (and comments?) and save it as new.txt, without using the syntax
3. Use the utility 'suggs diff' to make a third file that highlights these suggestions (what is the name of the new file? do I need to pipe it into something?)
Also, would you want to have this syntax and style of suggs mapped for key text editors for syntax highlighting and maybe shortcut keys?
-
Create a suggestions file from the difference between old.txt and new.txt: suggs diff old.txt new.txt
I'm curious how this will work -- who does it attribute the changes to?
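To illustrate what a diff-to-suggestions step could look like, here is a rough word-level sketch using Python's standard `difflib`. It is not how `suggs diff` is implemented. The `++[ ]++` addition markup is taken from the documentation quoted in this feed; the `--[ ]--` deletion markup and the function name are my assumptions.

```python
import difflib

def mark_suggestions(old, new):
    """Return `old` annotated with ++[additions]++ and --[deletions]-- (word level).

    The deletion markup is an assumption; only the addition markup
    appears in the quoted documentation.
    """
    old_words, new_words = old.split(), new.split()
    matcher = difflib.SequenceMatcher(a=old_words, b=new_words, autojunk=False)
    out = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            out.extend(old_words[i1:i2])
        if tag in ("delete", "replace"):
            out.append("--[" + " ".join(old_words[i1:i2]) + "]--")
        if tag in ("insert", "replace"):
            out.append("++[" + " ".join(new_words[j1:j2]) + "]++")
    return " ".join(out)

marked = mark_suggestions(
    "The original text, and more text.",
    "The original text, your suggested addition, and more text.",
)
```

A plain two-file diff like this carries no author information, so any attribution would have to be supplied separately, which is the point of my question.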
-
To review suggestions:
Can you signpost or tease whether this will be done manually or whether there will be suggested shortcut keys etc?
-
The original text, ++[your suggested addition,]++
why two "+" signs and not just one?
-
The handle must start with @ and must be the last word:
See above suggestion about how people grok the "@". I suggest
--David
or something like this instead... maybe even three dashes to avoid confusion with actual double-dash content.
-
You can sign the comment with a @handle as the last word.
"@" always makes me think you are flagging the OTHER guy ... and you expect it alerts them somehow. Maybe a double dash instead?
%%[ This clarifies the argument, right, @stephen? --Reinstein ]%%
-
To make a comment, enclose it in %%[ and [%%:
Typo: the closing delimiter should be ]%%, not [%%.
-
like this: The original text, ++[your suggested addition,]++ and more text.
Formatting of this documentation file -- I can barely see the text in those boxes ... make the boxes taller
-
-
static1.squarespace.com XPT.pdf 7
-
3. How the XPT works
A web site/wiki thing with dynamic explanations seems better for this section
-
1.33% [0.17,
tables should be formatted better
-
The median is straightforward to calculate, transparent, robust to extreme outlying observations, and understandable to people with a basic knowledge of statistics. Also, reassuringly, it is never the highest nor the lowest of the five methods we considered as potential aggregation methods. For these reasons, we think the median provides an ideal middle ground for aggregating forecasts in this project.
This seems very much ad-hoc and not meant for a specialist audience. There is a whole literature on this, and much more theoretically grounded approaches, as you know. The justification given here is rather incomplete.
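For a sense of how much the choice of aggregator can matter, here is a small sketch comparing three common ones: the median, the arithmetic mean, and the geometric mean of odds. Whether these are among the report's five methods is not stated in the quoted passage, and the forecasts below are made up.

```python
import math
from statistics import mean, median

def geo_mean_odds(probs):
    """Average the forecasts in log-odds space, then map back to a probability."""
    log_odds = [math.log(p / (1 - p)) for p in probs]
    odds = math.exp(mean(log_odds))
    return odds / (1 + odds)

def aggregate(probs):
    """Return the three aggregates side by side for comparison."""
    return {
        "median": median(probs),
        "mean": mean(probs),
        "geo_mean_odds": geo_mean_odds(probs),
    }

# Hypothetical forecasts with one high outlier.
toy_aggregates = aggregate([0.01, 0.02, 0.03, 0.5])
```

With a single high outlier the mean is pulled up far more than the other two, which illustrates the robustness argument. It does not settle which aggregator is best calibrated, which is where the literature I mention comes in.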
-
Total Extinction Risk
This stuff could be better presented as a dashboard/hosted Quarto type thing
-
bold claims that attract audiences and funding—and to keep their predictions vague enough so they can never be proven wrong.
this seems somewhat contradictory
-
Some have argued more broadly
if this were a part of the project being evaluated we would ask for a reference here ('who are these people?'). But maybe OK for exec. summary.
-
I'm not sure a PDF is the best format for this. I suspect a more interactive web presentation would be better.
-
- Nov 2023
-
globalprioritiesinstitute.org
-
Do ‘broad’ approaches to improving effective governance, and ultimately serving the far future, tend to be more or less effective in expectation than ‘narrow’ approaches (such as working on reducing the risk of bioengineered pandemics)?
A very big question -- it would be helpful to pose some possible building blocks for answering it, to give people a hint at how to take a stab at it.
-
How can evidence be disseminated most effectively?
Disseminate: By whom, to whom, with what theory of change/path to impact?
-
Under what conditions should a social planner preserve ‘option value’ by delaying an important, irreversible decision to acquire more information, thereby delegating decision-making authority to future agents with potentially different values and preferences (cf. Bishop 1982; Dixit and Pindyck 1994)?
To me this seems distinct from the rest of the bullet point
-
hen et al 2023; Toma and Bell 2023
biblio entries missing
-
Intergenerational governance and policy-making
It's unclear whether we are talking about A. "intergenerational governance and intergenerational policymaking" or
B. "1. Intergenerational governance and 2. Policymaking in general".
The latter bullet points and cited papers (e.g., Vivalt and Coville) do not seem to always relate to intergenerational governance
-
Vivalt, Coville, KC 2023
This seems relevant for The Unjournal's consideration/evaluation (but this may fall into the 'ask authors' permission' category). It is empirical and apparently rigorously quantitative, and seems highly relevant to policymaking, impact evaluation research, and influencing policy, all crucial to 'the impact agenda'. Hopefully it also follows open, robust science standards (prereg, etc.).
-
Gonzalez-Ricoy and Gosseries 2017
biblio entry missing
-
Can we design mechanisms to ensure that AI systems exhibit desirable behavioursuch as truth-telling or a lack of deception?
Perhaps this should be elaborated and defined somewhat more formally. Reference a particular issue in mechanism design here that is particularly relevant to AI systems, perhaps.
-
govern powerful non-state entities
I'm not sure what is meant by 'powerful non-state entities' here. This seems under-defined.
-
inequalities,
political inequalities?
-
Read and Toma, 2023
biblio entry missing. Would be useful to know what this one is.
-
Song et al 2012)
biblio missing
-
Healy and Malhorta 2009
-
economic models predict the impact of advanced AI systems on political institutions and inequalities
A reference would be very helpful here. It's hard for me to see what sort of economic models are relevant here.
-
(Acemoglu, 2023
biblio missing. This seems potentially relevant for an Unjournal evaluation, although we tend not to focus on 'broad think piece' work, which this might be
-
Bersiroglu and Erdil, 2023
biblio entry missing
-
Shulman, C., & Thornley, E. (2023). How Much Should Governments Pay to Prevent Catastrophes? Longtermism's Limited Role. In Barratt, Greaves, Thorstad (eds.) Essays on Longtermism.
interesting but probably not quantitative/formally specific enough for The Unjournal
-
Alexandrie and Eden, forthcoming
biblio missing
-
E.g. Jordà et al. 2022
not sure this is getting at 'long run' in the sense that longtermists care about
-
Based on the historical record of such events, what is the tail distribution of harmful impacts (e.g., fatalities) from pandemics, asteroids, wars, and other potential disasters? (E.g. Marani 2021;
not really economics but that's not so important
-
Aschenbrenner 2019
biblio entry missing
-
2023
This is mainly about the welfare tradeoff between economic growth and x-risk in a theoretical sense; I don't think it's about the 'impact of growth on GCRs' per se
-
Klenow et al. 2023
biblio entry missing
-
To what extent are forecasting methods informative for assessing the probability of global catastrophic risks and other future events of special importance for social welfare? (Karger
empirical and seems very relevant and strong; adding it to the Unjournal database
-
Karger, E., Rosenberg, J., Jacobs, Z., Hickman, M., Hadshar, R., Gamin, K., ... & Tetlock, P. E. (2023). Forecasting Existential Risks Evidence from a Long-Run Forecasting Tournament. FRI Working Paper No. 1.
-
Kalai & Kalai 2001)
biblio entry missing
-
Andreoni 2018
this is a practical applied policy paper that seems informative for donors considering their own charity decisionmaking
-
1.1 Strategic issues in altruistic decision
they largely mention theory papers (micro theory, optimization, axiomatic/normative), not empirical work here
-
(cf. Andreoni & Payne 2003)
The Andreoni and Payne paper is about the government crowdout of private philanthropy (there are a bunch of papers about this), not about the reverse nor about crowding out among donors.
-
What determines the optimal spending schedule for altruistic decision-makers?
Practically speaking, this seems largely about the impact of interventions (funded by charity) over time; however, it does connect with donors' decisions to the extent it involves personal finance and issues like value drift.
OK but the Trammell paper is addressing something different -- coordination in a public goods provision model.
-
An altruistic decision-maker that funds a charitable intervention may crowd out funding from other actors (e.g., governments or philanthropists)
I might add a related issue -- decisions to give to one charity may crowd out other donations; the extent to which this is the case ('donations are substitutes') informs strategies for convincing people to give 'more effectively' vs 'give to effective causes.' See my notes: https://daaronr.github.io/ea_giving_barriers/chapters/substitution.html
-
Research agenda draft for GPI Economics
Does anyone know if this is the most updated statement of GPI’s economics agenda?
“Economics ‘draft agenda’” Anyone know when it was updated?
-
- Oct 2023
-
-
single chat requests: chatgpt_single()
chatgpt_single(prompt_role = 'user', prompt_content = 'say something relevant' , temperature = 0.8 , n = 2 , max_tokens = 15)
n = 2 requests two responses. max_tokens = 15 looked to me like 'try 15 times to get the best 2', but assuming these arguments are passed straight through to the OpenAI chat completions API, it caps the length of each response at 15 tokens.
prompt_role = 'user' -- just ask a question for the gpt ('assistant') to respond to. I'm not sure why it would make sense for single chat requests to choose the role 'system' or 'assistant', as this wouldn't persist (?)
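For reference, here is a sketch of the request body such a call would plausibly produce. It assumes `chatgpt_single()` forwards its arguments unchanged to the OpenAI chat completions endpoint, which I have not verified. The field names are the API's own; the helper name and the default model are hypothetical.

```python
def build_chat_request(prompt_role, prompt_content, temperature=0.8, n=2,
                       max_tokens=15, model="gpt-3.5-turbo"):
    """Assemble a chat-completions request body without sending it.

    The model default is an assumption; the R call above does not name one.
    """
    return {
        "model": model,
        # A single-message conversation: nothing persists between calls.
        "messages": [{"role": prompt_role, "content": prompt_content}],
        "temperature": temperature,  # sampling randomness
        "n": n,                      # number of completions to return
        "max_tokens": max_tokens,    # upper bound on tokens per completion
    }

request_body = build_chat_request("user", "say something relevant")
```

With `max_tokens` this low, each of the two completions is likely to be cut off mid-sentence.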
-
-
deliverypdf.ssrn.com
-
Because military shocks generate plausibly exogenous variation in economic production, the findings improve upon the correlational evidence in Section 3. Second, the implied biodiversity-GDP elasticities from the quasi-experiment are larger than their OLS counterparts. By leveraging shocks in a 2SLS setting, the quasi-experiment-based elasticity estimates alleviate classic measurement error and endogeneity problems. However, the fact that the two methods produce elasticities of similar order of magnitude adds confidence to the overall credibility of the estimates.
are they ignoring the heterogeneity and "LATE" issue?
-
states’ relative differences in response to aggregate military buildups (which are themselves largely driven by geopolitical factors) – are unlikely to be correlated with unobservable determinants of local biodiversity. That is, we assume the United States will not increase national military spending because states that receive larger military procurement contracts have worse biodiversity
but couldn't both of these be driven by a third factor ... the state is less environmentally friendly. or maybe I am missing something
-
First, we produce causal estimates of the elasticities between biodiversity outcomes and air pollution. We use a research design that isolates variation in local pollution driven by transported pollution from distant, upwind cities (e.g., Deryugina et al., 2019; Anderson, 2020). We show that “upwind pollution” coming from areas over 300 km away generates substantial variation in local air quality, and these imported pollution shocks cause reductions in local biodiversity outcomes. Second, we estimate the impact of the military spending shocks on air pollution, and multiply these estimates by the biodiversity-pollution elasticities we obtain from step one. Together, these exercises give us the expected impact of the military shocks on biodiversity through air pollution. We find that pollution accounts for 20-60 percent of the reduced form effect of military shocks, suggesting air pollution is a first-order pathway underlying the production-biodiversity link
have they successfully shown this 'mediation' channel ... with 2 separate sources of exogenous variation? (That's always very challenging to identify)
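The 'multiply and compare' step in the quoted passage is a product-of-coefficients calculation. Here is a minimal sketch with made-up numbers (none are from the paper; the function and argument names are hypothetical).

```python
def mediated_share(shock_to_pollution, pollution_to_biodiversity, reduced_form):
    """Share of the reduced-form effect attributed to the pollution channel.

    indirect effect = (effect of shock on pollution)
                      * (effect of pollution on biodiversity)
    share           = indirect effect / reduced-form effect of shock on biodiversity
    """
    indirect = shock_to_pollution * pollution_to_biodiversity
    return indirect / reduced_form

# Hypothetical elasticities: the shock raises pollution by 0.5, pollution
# lowers biodiversity by 0.4, and the reduced-form effect is -0.5.
share = mediated_share(0.5, -0.4, -0.5)
```

The arithmetic is simple. The hard part is that the two factors come from different instruments (upwind pollution and military shocks), so the product is only interpretable if both estimates describe the same margin of response.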
-
Second, there is substantial distributional heterogeneity, where the negative association at the lowest decile of biodiversity is almost twice as large as the average.
But could this reflect something mechanical like a nonlinearity?
-
- Aug 2023
-
github.com
-
FC = /opt/homebrew/bin/gfortran-11
Compiler for Fortran
-
CXX = /opt/homebrew/bin/g++-11
Compiler for C++
-
CC = /opt/homebrew/bin/gcc-11
Tells R which C compiler to use
-
- Jul 2023
-
unjournal.github.io
-
book
It's a work in progress
-
-
forum.effectivealtruism.org
-
We are now planning a further update in response to additional comments (e.g., from James Snowden and GiveWell). We expect this will include updating our analysis with recently completed studies and refining some technical aspects of the analysis, including:
- Our systematic review, and the weight we place on different sources of evidence
- Estimated spillover benefits for household members
- Cost estimates
- Technical details, such as:
  - How long do the effects of psychotherapy last?
  - How important is the expertise of the deliverer or number of sessions?
  - Are the effects of psychotherapy affected by publication bias?
This seems extremely high-value and potentially ideal for the Unjournal's non-academic stream. Ryan 'had this in mind too'
-
but longtermists often claim priorities such as AI alignment and preventing pandemics are important, even if we solely consider present wellbeing, so we shouldn’t dismiss the possibility.
I don't see how this argues against the 'suspicious convergence' claim... OK, I see Jack Malde's comment now, which basically gets at my doubts here.
-
we have a full list of research ideas that we hope to explore
this is the list you linked above under 'organizations'. fwiw it's an interesting list but it's very sparsely populated (most columns have a name only). Some fleshing out and ranking/prioritizing could be helpful here.
-
If such views are true, that would count against longtermism
I don't see this as a promising research agenda. My sense of it is that it is pretty intractable. (I'm not saying if it is true/false/wrong/right, just that I am not sure if there will be a lot of practical value in pursuing it? OK I see some approaches that might be helpful, if one has a tractable way to model welfare considerations with PAV it might win some people over.)
-
credible cause for longtermists.
'cause' or indirect instrumental goal?
-
We’ve published two working papers on moral uncertainty: The property rights approach to moral uncertainty and Wheeling and dealing: An internal bargaining approach to moral uncertainty, which both explore a novel, bargaining-based approach to acting when you’re uncertain what’s morally right. (This is very roughly akin to the ‘moral parliament’ approach.) We’re currently working with two external co-authors on a new paper that combines these ideas, which we plan to publish in an academic journal.
Potentially relevant to #unjournalresearchprioritization, depending on the approach
-
5.1 Using WELLBYs to compare the value of extending lives against improving lives
Somewhat relevant to #unjournalresearchprioritization
-
Although unlikely, we may also do some work relating to animal welfare; a challenge is that we prefer to rely on self-reports, which animals can’t give.
this could be relevant for Unjournal. How unlikely? If it's so unlikely, why mention it?
-
- Assess the social desirability bias and other self-reporting biases in SWB data (for example: Do people give answers surveyors want? Is it a problem? If so, can anything be done?)
- Explore whether the measure of SWB matters (for example, if the key outcome is happiness rather than life satisfaction, do we get different priorities?)
#unjournalresearchprioritization
-
Our working paper A Happy Possibility about Happiness (and other) Scales, a working paper attempts to provide the first overview of both the theory and evidence of the comparability of subjective wellbeing scales (e.g., is your 7/10 the same as my 7/10?). We plan to revise this for publication in an academic journal.
-
Our article To WELLBY or not to WELLBY? sets out the WELLBY method, its strengths, weaknesses, and areas for future work. To expand on this, we are:
This seems very relevant for Unjournal
-
3. The nature of wellbeing
Probably not relevant for the Unjournal at this point, but there may be some overlap
-
existing work (where a public document is available)
academics (at least in my field) would distinguish a fourth stage 'having been accepted in a journal after peer review'. Not sure how important that distinction is for you.
Note that The Unjournal is trying to make that last stage less burdensome and more informative by commissioning public evaluation and rating of work (rather than relying on the tedious and imprecise 'which journal was it published in' measure)
-
Some work is both existing and current (where we have extant research we are updating).
that's the best, I like to see all research as 'permanent alpha' mode
-
which means we have a number of ongoing projects.
'which means' --- the implication is not clear here
-
The notable ones are:
for me the more concrete measurement issues are at least as important ... you include these, but I don't see it in this paragraph.
-
We also have a long list of organisations we would like to explore, including the Shamiri Institute, Action for Happiness, and Koko.
- The airtable view is linking interventions and cause areas, not organizations
- Why and how did you choose and prioritize these? It's a huge space to explore?
-
We expect others will provide different types of mental health interventions, such as social-emotional learning. We expect to examine Friendship Bench, Sangath, and CorStone unless we find something more promising.
Does that mean you will need to assess (and consider research evidence) on these other non-psychotherapy interventions? If so, that deserves its own section perhaps?
-
Based on our cause area report on mental health and our cost-effectiveness analysis of psychotherapy, we think mental health is a promising area in which to find cost-effective interventions to improve wellbeing.
This paragraph seems unneeded and repetitive. Or am I missing something here?
-
so we will also update our assessment of StrongMinds after we update our psychotherapy evaluation.
Maybe restate this to clarify that you are not reevaluating SM as an organization again, but will update the evaluation of their impact in light of your updated evaluation of the intervention?
-
2.1 Updated evaluation of psychotherapy
this part still seems like an intervention not an organization
-
Are the effects of psychotherapy affected by publication bias?
Pedantic: I'd say the 'estimated effects' here ... obviously the effects themselves are not affected by this bias
-
We’ve found that psychotherapy for depression is several times more cost-effective than cash transfers for improving happiness, deworming has an unclear long-term effect,
in the statement in the 1-pager, you stated
we’ve found that psychotherapy is several times more cost-effective than cash transfers or deworming for improving happiness.
That's not entirely consistent with this sentence
-
From this work, we’ve found that psychotherapy is several times more cost-effective than cash transfers or deworming for improving happiness. We concluded that comparing psychotherapy to antimalarial bednets, a life-saving intervention, depends heavily on various philosophical assumptions: treating depression ranges from about as good as to several times better than antimalarial bednets, depending on the assumptions.
these sentences repeat the sentences above in the one-pager
-
evaluated the cost-effectiveness of organisations that provide psychotherapy
The psychotherapy report doesn't seem to be about a particular organization. I'm a bit confused about the structure here. How does this differ from a 'cause area exploration' at this point?
-
e, including psychedelics, opioids, poverty, loneliness, sleep, and air pollution.
These again combine problems with potential remedies and interventions. And 'psychedelics' -- is that aimed at curing 'problems' or boosting the upper end joys of life?
-
Longlist of future cause areas to explore
this needs a header (it now is under 1.4). Not sure this long list is helpful here though? What's actionable about this? Is it linked to an appeal for more funding?
-
1.2 Child development effects (e.g., abuse, trauma, nutrition)
I suspect some strong Unjournal/academic research links here. Also to the house improvement ones and possibly the fistula ones too.
-
that may have large impacts on wellbeing as well. So far we have completed shallow reviews on pain, lead exposure, and immigration.
For unjournal research prioritization, I guess I will have to dig into those reviews to identify the most pivotal research to have evaluated?
These link articles but don't contain a 'list of works cited' at bottom. Could you provide that ... even better an 'annotated/categorized/prioritized list' explaining which ones you rely on most heavily, and which you have the most uncertainty over?
-
Research agenda
In the previous agenda you tried
to articulate, within each research area, where additional research seems more (or less) useful, and therefore what our research agenda is for the next one to two years.
This seemed particularly relevant to helping the Unjournal help you. Not sure this new agenda does this as much.
-
A working paper exploring a bargaining-based approach to moral uncertainty
How are you defining and considering a 'working paper' here?
"Units of value" .. maybe add a few more words to clarify this?
I assume this will be a theoretical paper (i.e., no surveys or data?)
-
A revised paper on the theory and current evidence on scale cardinality (e.g., is your 7/10 the same as my 7/10?)
I see a lot of benefit in engaging with academics on this paper, and getting and responding to feedback, possibly within The Unjournal's framework
-
An academic paper setting out our method for measuring impact using wellbeing
Not sure what is meant by 'an academic paper'. Typically it would be hard to publish a paper in an academic journal that simply 'describes' (or even justifies) the approach that a particular organization takes.
You might have to frame it more as answering or providing evidence on a question of general interest, and/or formally arguing for the 'most appropriate approach' under certain defensible criteria.
-
We will conduct new research on how to measure and interpret subjective wellbeing measures:
A 2021 priority was "Examining how best to convert between different SWB, as well as other, measures (1.2.1)" This seems to have strong academic links relevant to the Unjournal. Is it still a priority?
-
2. Organisation evaluations
I expect Unjournal-evaluated research to provide inputs relevant to these evaluations, but not to directly evaluate particular organisations. However, we might be able to cover some of this within our 'less academic stream'.
-
- Non-mood-related mental health issues (e.g., psychotic and trauma-related disorders)
- Child development effects (e.g., abuse, trauma, nutrition)
- Fistula repair surgery
- Basic housing improvements (e.g., concrete floors)
I suspect there are a range of academic papers (in development economics, health economics, psychology, policy, and the social-sciency side of biomedicine) that will inform this, that Unjournal might evaluate.
This can include work that constitutes
- impact evaluation of specific interventions, including RCTs and non-experimental causal inference
- work exploring the impact of specific paths to impact through these interventions (e.g., the career costs of childhood trauma)
- work exploring costs and impacts on the market (e.g., impact of housing improvements on the local economy, price elasticities, etc.)
-
1. Cause area explorations
In the 2021 agenda you stated "Our main current focus, and where the majority of our effort will go, is Area 2.3: using subjective well-being scores to compare the cost-effectiveness of highly-regarded health and development interventions used in low-income countries."
Is this still your priority? Is this in line with the 'Cause area explorations' category here?
-
Conduct further theoretical work:
The boundary between theoretical and applied is not always clear here. Some research, maybe the methodological and measurement research in particular, has both theoretical aspects and very applicable and even empirical aspects. Calling this 'theoretical' might confuse people who would conflate theoretical with philosophical. E.g., research into which survey and other reporting instruments are more reliable, better reflect the actual measures of interest ... this seems very applied to me, and probably relevant to The Unjournal's scope as well.
-
asurement of wellbeing. This has included evaluating philosophical views of wellbeing and life satisfaction, pioneering methods to conduct cost-effectiveness analyses using wellbeing, and conducting novel research on wellbeing measurement. We
I think you may have moved a lot of the content outlined in the previous Research Agenda into those linked reports?
-
Where relevant, we hone in and compare the top organisations implementing those interventions.
Is this necessarily in the wheelhouse of HLI? If your focus is on assessing impact from the wellbeing perspective, does that interact with things like 'organizational capabilities' at all?
Maybe better to outsource the latter?
-
picking new cause areas to investigate, then narrowing down to the specific organisations – which will enable us to look broadly and deeply at the same time.
When will you do each? Do you anticipate returning to broad cause areas you've previously decided not to pursue?
How much will you defer to other orgs and researchers in the broader prioritization?
-
large, solvable, and unduly neglected.
why not just namecheck the ITN framework here?
-
broad analyses of different causes.
how do you divide up the 'cause space' and define each 'cause'? Give some examples here? E.g., is 'animal welfare' a cause area ... or 'farmed animal welfare' or 'chicken welfare' or 'promoting regulation of chicken farms' (the latter is more of an intervention IMHO)
-
An ultimate goa
one of several ultimate goals? To be pedantic, the 'improve global wellbeing' would seem to be the ultimate goal .... the 'identify the opportunities' is instrumental to that
-
We will explore whether we should improve the wellbeing of people alive now or in future generations:
Why not both? Maybe rephrase this?
Also, will you consider the empirical tradeoffs here, or deeper philosophical issues, or?
-
An academic journal book review of Will MacAskill’s What We Owe The Future
what academic journal are you thinking?
-
An experimental survey to test assumptions about subjective wellbeing measures, including comparability, linearity, and the neutral point
I talked to someone recently who had done some survey work in this area -- maybe remind me to get back to you on it.
-
An academic paper on life satisfaction theories of wellbeing
This seems underexplained ... what and why? (add a bit or link)
-
the cost-effectiveness of several organisations (partially informed by our cause exploration work)
mention how you will do it better/different or add value to what other orgs are doing?
-
Applied research to maximise global wellbeing
Doing this yourselves? Synthesizing work? Sponsoring work?
-
Non-mental health organisations (organisations TBD)
Non mental-health orgs within the global health or global health and development space, or much more widely ranging, across vastly different causes etc?
-
To find new promising solutions to the biggest problems,
this phrase seems a bit vague? Also, are these 'cause areas' or possible interventions, or a mix?
-
From this work, we’ve found that psychotherapy is several times more cost-effective than cash transfers or deworming for improving happiness.
I think you would be more convincing if you linked or footnoted some of the critiques of this, as well as your responses. This reads well for a general audience but maybe not for a research and EA audience?
-
engineering a paradigm shift towards a wellbeing approach among decision-makers
This seems to move a bit towards advocacy, perhaps in contrast to the more neutral approach you mention elsewhere or previously. In other writings it's more like 'get people to consider a well-being based approach, and whether and when it makes sense to use it'.
-
Ultimately, we measure impact in WELLBYs (wellbeing-adjusted life years), a method born in academia
A citation/link here would be great
-
For the first time in human history
Very small comment ... 'for the first time in human history' tends to come across as overblown whenever people use it. At least it has that connotation
-
The idea that the quality of a society should be judged by the happiness of its people is an old idea, stretching back at least to the Enlightenment, if not Aristotle.
Your previous "Research Agenda and Context" spent a lot of time defining and arguing for this. I don't see that here (for better or for worse).
-
[This post contains the Happier Lives Institute's research agenda for next 18 month. After a foreword, we give a brief summary of our plans, then go into more depth]
Typo -- '18 month'
-
- May 2023
-
evalresearch.weebly.com evalresearch.weebly.comReport1
-
Making referee payments or charity donations: Three-quarters of our respondents said that referees would do a better job if they were better rewarded for their effort. Among them, about 75% indicated that referees should be paid for timely completion of the report. This payment could take many forms e.g., a donation to a charity or research fund.
unjournal
-
-
daaronr.github.io daaronr.github.io
-
OftW pre-giving-tuesday-email upselling split test (considering ‘impact vs emotion’) c
PUT THE TAKEAWAYS HERE!!!
-
-
daaronr.github.io daaronr.github.io
-
Phase 2: EAMTT – Bringing together and engaging Academics, EA orgs, and marketers
Jack - single biggest barrier (stated) is "you should give where you live" ...
Move people who are somewhat aligned?
Effektiv Spenden ... some people are drawn
Which segment to appeal to?
-
-
willemsleegers.com willemsleegers.com
-
The output shows we need to set a prior on sigma, the Intercept, and on the male coefficient.
I'm trying to interpret the output. So it's suggesting a 'flat' (uniform?) prior on b male (or also on another b? -- but what is b?) and a Student's t distribution for the intercept and for sigma, maybe with the latter being truncated? Why does it make these particular choices of distributions?
And it doesn't seem to be saying anything about the distribution of the outcome around its mean, correct?
-
- Apr 2023
-
charity-elections.netlify.app charity-elections.netlify.app
-
Post-event (complete) response rates, ratings (1-6), by school
something clearly went wrong here, but I think it's fixed now. Will push the results again
-
-
willemsleegers.com willemsleegers.com
-
Add
library(readr)
url <- "https://raw.githubusercontent.com/rmcelreath/rethinking/master/data/Howell1.csv"
read_delim(url, delim = ";")
to help people plug and play
-
- Mar 2023
-
www.metacausal.com www.metacausal.com
-
we can change the question to “What is the probability that this intervention is better than 1x (i.e. cash transfers)?” We can set a critical value for that threshold (e.g. we accept programs that we are 90% sure are better than cash transfers). As above, that value comes straight out of the distribution from our PSA: it’s simply the proportion of outcome results from our PSA-generated distribution which are ≥1.
But this would require some assumptions over the underlying distribution of effectiveness
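A minimal sketch of the quoted rule, to make the dependence on distributional assumptions concrete. The lognormal draws and their parameters are purely hypothetical stand-ins for PSA output; only the "share of draws ≥ 1" step and the 90% cutoff come from the quoted text.

```python
import random


def prob_better_than_cash(draws, threshold=1.0):
    """Share of simulated cost-effectiveness multiples at or above the threshold."""
    return sum(d >= threshold for d in draws) / len(draws)


rng = random.Random(0)

# Hypothetical PSA output: multiples of cash transfers.
# The lognormal shape and its parameters are assumptions for illustration only.
psa_draws = [rng.lognormvariate(0.5, 0.8) for _ in range(10_000)]

p = prob_better_than_cash(psa_draws)
accept = p >= 0.90  # the 90% critical value mentioned in the quoted passage

print(f"P(better than cash) = {p:.3f}, accept = {accept}")
```

Changing the assumed shape or spread of `psa_draws` changes `p`, which is the point of the comment above.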
-
as a preview of what might happen with a full accounting of uncertainty. Code, data, and modified workbook are available.
This seems to be done in the file sensitivity analysis.R, pulling parameters from the linked Gsheet
-
Looked for a handful of key parameters pertaining to the overall effectiveness of the program and prevalence of the issue being addressed. Traced those parameter values back to the original data, and located the statistical sampling uncertainty provided
Focused on the statistical uncertainty only
-
we see that there is a substantial bias in the programs that we select. Programs that we select have a large positive bias on average.
the standard 'winner's curse'
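A small simulation of the selection effect, as a sketch only. The data-generating process (lognormal true effects, additive unbiased normal noise) is an assumption of mine, not the post's model; the acceptance threshold of 3 and the "better than cash" benchmark of 1 are taken from the quoted text.

```python
import random
import statistics

rng = random.Random(1)
N = 20_000
THRESHOLD = 3.0  # accept programs whose estimated CE exceeds 3x cash

# Assumed data-generating process (illustrative only):
# true CE is lognormal with median 1; estimates add unbiased normal noise.
true_ce = [rng.lognormvariate(0.0, 0.75) for _ in range(N)]
estimated_ce = [t + rng.gauss(0.0, 1.5) for t in true_ce]

pairs = list(zip(estimated_ce, true_ce))
overall_bias = statistics.mean(e - t for e, t in pairs)
selected_bias = statistics.mean(e - t for e, t in pairs if e > THRESHOLD)

# "Lucky" programs: selected although truly worse than cash transfers
false_positives = sum(1 for e, t in pairs if e > THRESHOLD and t < 1.0)

print(f"bias over all programs:      {overall_bias:+.3f}")
print(f"bias over selected programs: {selected_bias:+.3f}")
print(f"selected but truly below 1x: {false_positives}")
```

The estimates are unbiased across all programs, yet conditioning on "estimate > 3" picks out the draws with positive noise, so the selected set is biased upward.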
-
n the first tab (“True vs false rejections”), we see the distribution of programs we accept and reject compared with whether or not they were truly better or worse than cash transfers. As expected, we generate many false rejections, due largely to the decision threshold of 3 being a “hedge” of sorts. More importantly, we observe false positives (i.e. programs that got “lucky”).
- "Selected" contains only where estimated CE > 3
- "Rejected" are all other programs (estimated CE<3)
-
-
-
nce this set of matched communities has been generated generalised linear mixed models (e.g., multilevel models) will be used to assess changes in outcomes before and after the intervention, at different time periods, while controlling for other variables including whether the area is a control or intervention area.
somewhat non-specific
-
-
forum.effectivealtruism.org forum.effectivealtruism.org
-
All we need to do is change the units of the calculation and see if the result changes because of it. If it does, the calculation violates scale invariance, and for some reason the result depends on the units of measure that are used to calculate it.
that is awesome!
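The check described in the quote can be sketched as follows. The two ranking rules and the option data are hypothetical examples of mine, chosen so that one rule passes and one fails; they are not taken from the post.

```python
def rank_by_ratio(options, cost_scale=1.0):
    """Rank by benefit per unit cost; rescaling cost multiplies every score by the same factor."""
    ranked = sorted(options, key=lambda o: o["benefit"] / (o["cost"] * cost_scale), reverse=True)
    return [o["name"] for o in ranked]


def rank_by_difference(options, cost_scale=1.0):
    """Rank by benefit minus cost; this mixes units, so the result can depend on the cost unit."""
    ranked = sorted(options, key=lambda o: o["benefit"] - o["cost"] * cost_scale, reverse=True)
    return [o["name"] for o in ranked]


def is_scale_invariant(rank_fn, options, scales=(1.0, 0.001, 1000.0)):
    """Re-run the calculation with cost in different units and compare against the baseline."""
    baseline = rank_fn(options, 1.0)
    return all(rank_fn(options, s) == baseline for s in scales)


# Hypothetical options; benefit and cost are in arbitrary units
options = [
    {"name": "A", "benefit": 10.0, "cost": 2.0},
    {"name": "B", "benefit": 30.0, "cost": 25.0},
]

print(is_scale_invariant(rank_by_ratio, options))
print(is_scale_invariant(rank_by_difference, options))
```

Here the ratio rule keeps the same ranking under every cost unit, while the difference rule flips from A to B when cost is expressed in thousands.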
-
-
squiggle-language.com squiggle-language.com
-
Major Future Additions
What about multivariate/correlated distributions? Or is there an easy compositional way to do this that I'm overlooking? Like maybe a 'shared random variable' that feeds into two distributions? But I'm not sure if that can be done in the current system, because ... can the 'draws from one distribution' be carried over as inputs into the 'draws from another distribution'?
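To make the 'shared random variable' idea concrete, here is a sample-based sketch in Python rather than Squiggle. Whether Squiggle carries draws through in this way is exactly the open question above, so nothing here should be read as describing Squiggle's behavior; the variable names and noise levels are illustrative assumptions.

```python
import math
import random


def pearson(xs, ys):
    """Sample Pearson correlation, written out to avoid depending on newer stdlib helpers."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)


rng = random.Random(2)
n = 10_000

# One set of draws from the shared factor, reused draw-by-draw in both quantities
shared = [rng.gauss(0.0, 1.0) for _ in range(n)]
x = [s + rng.gauss(0.0, 0.5) for s in shared]
y = [2.0 * s + rng.gauss(0.0, 0.5) for s in shared]

# Same marginal recipe, but each quantity gets fresh draws of the factor
x_indep = [rng.gauss(0.0, 1.0) + rng.gauss(0.0, 0.5) for _ in range(n)]
y_indep = [2.0 * rng.gauss(0.0, 1.0) + rng.gauss(0.0, 0.5) for _ in range(n)]

corr_shared = pearson(x, y)
corr_indep = pearson(x_indep, y_indep)
print(f"shared factor: {corr_shared:.3f}, independent: {corr_indep:.3f}")
```

The correlation appears only when the i-th draw of the shared factor feeds the i-th draw of both outputs, which is why carrying draws across distributions matters.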
-
Static / sensitivity analysis Guesstimate has Sensitivity analysis that's pretty useful. This could be quite feasible to add, though it will likely require some thinking.
Yes!
-
Right now Squiggle mostly works with probability distributions only, but it should also work smoothly with probabilities.
not sure what this means
-
-
squiggle-language.com squiggle-language.com
-
Gallery
do any of these allow correlations between the elements we are uncertain about? (I guess in principle, correlated variables could be combined into a distribution of some function of these variables, but that seems like part of the work Squiggle is meant to do)
-
-
squiggle-language.com squiggle-language.com
-
Some distribution operations (like horizontal shift) return an unnormalized distriibution.
explain what this means
-
distriibution
typo
-
Second argument to SampleSet.fromDist must be a number.
??
-
Recall the three formats of distributions. We can force any distribution into SampleSet format
Don't say 'recall' because this only comes up later!
-
For every point on the x-axis, operate the corresponding points in the y axis of the pdf.
Explain better how this differs from adding the distributions.
A comparison like uniform(3,4) - uniform(0,1) vs uniform(3,4) .- uniform(0,1) could be helpful.
Also note the false intuition 'the distribution of the difference between draws from uniform distributions should be uniformly distributed' can be checked by thinking about and plotting uniform(0,1) - uniform(0,1)
However, that distribution should be triangular, and the simulated distribution in your plot looks somewhat far from this. Why not make that an analytical computation?
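A quick check of the triangular claim, as a sketch in Python (not Squiggle). For independent U1, U2 ~ Uniform(0, 1), the difference has density 1 - |d| on [-1, 1]; the code compares the analytical probability of |D| < 0.5 with a simulation.

```python
import random


def triangular_cdf(d):
    """CDF of D = U1 - U2 for independent U1, U2 ~ Uniform(0, 1); the density is 1 - |d| on [-1, 1]."""
    if d <= -1.0:
        return 0.0
    if d >= 1.0:
        return 1.0
    if d < 0.0:
        return (1.0 + d) ** 2 / 2.0
    return 1.0 - (1.0 - d) ** 2 / 2.0


rng = random.Random(4)
diffs = [rng.random() - rng.random() for _ in range(50_000)]

simulated = sum(abs(d) < 0.5 for d in diffs) / len(diffs)
analytical = triangular_cdf(0.5) - triangular_cdf(-0.5)

print(f"P(|D| < 0.5): simulated {simulated:.3f}, analytical {analytical:.3f}")
```

If the difference were uniform on [-1, 1], this probability would be 0.5; the triangular density concentrates more mass near zero.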
-
Pointwise operations are done with PointSetDist internals rather than SampleSetDist internals.
I have no idea what this means
-
TODO: this isn't in the new interpreter/parser yet.
It seems to work in the playground though
-
A projection over a stretched x-axis.
for consistency with the above, you should characterize this mathematically
-
-
squiggle-language.com squiggle-language.com
-
Samples are converted into PDF shapes automatically using kernel density estimation and an approximated bandwidth. Eventually Squiggle will allow for more specificity.
I thought Kernels can smooth things. Above it seems like a linear interpolation
-
mixture(1,2,normal(5,2)), the first two arguments will get converted into point mass distributions with values at 1 and 2.
and it gives 1/3 mass to each of 1, 2, and the distribution
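A sketch of what equal weighting means when sampling, written in Python as an illustration of the comment above rather than of Squiggle's internals. The helper name and the sampling scheme (pick a component uniformly, then draw from it) are my assumptions.

```python
import random


def sample_equal_mixture(components, rng):
    """Draw once from an equally weighted mixture: pick a component uniformly, then sample it."""
    component = rng.choice(components)
    return component(rng)


rng = random.Random(3)

# Two point masses and one normal(5, 2), mirroring mixture(1, 2, normal(5, 2))
components = [
    lambda r: 1.0,
    lambda r: 2.0,
    lambda r: r.gauss(5.0, 2.0),
]

draws = [sample_equal_mixture(components, rng) for _ in range(30_000)]
share_at_1 = sum(d == 1.0 for d in draws) / len(draws)
share_at_2 = sum(d == 2.0 for d in draws) / len(draws)
share_continuous = 1.0 - share_at_1 - share_at_2

print(f"mass at 1: {share_at_1:.3f}, at 2: {share_at_2:.3f}, normal part: {share_continuous:.3f}")
```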
-
mixture(pointMass(1),pointMass(2),pointMass(5,2)).
this throws an error in the Playground
-
Array of Distributions Input
Not sure what this is doing
-
-
squiggle-language.com squiggle-language.com
-
Most functions are namespaced under their respective types to keep functionality distinct. Certain popular functions are usable without their namespaces.
not sure what point you are trying to make here. Note the first one crashes the playground
-
For example,
In the playground, the first entry, a = List.upTo(0, 5000) |> SampleSet.fromList, throws "This page crashed. Minimum discrete weight must be an integer. Try again"
-
Squiggle dictionaries work similarly to Python dictionaries. API.
OK these just store collections of things?
-
-
squiggle-language.com squiggle-language.com
-
Example
I don't get what these are supposed to do. This snippet throws an error when I try it in the playground:
Error merge is not defined Stack trace: <top> at line 19, column 12
-
-
squiggle-language.com squiggle-language.com
-
mixture
this needs more documentation perhaps?
-
If both values are above zero, a lognormal distribution is used. If not, a normal distribution is used.
This should be highlighted elsewhere!
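For readers who want to see what the lognormal case implies, here is a sketch that converts a positive 'low to high' range into lognormal parameters. It assumes the two values are the 5th and 95th percentiles (a 90% interval); that reading is my understanding of Squiggle's `to` and should be checked against its documentation. The function name is hypothetical.

```python
import math
from statistics import NormalDist


def lognormal_from_90ci(low, high):
    """Return (mu, sigma) of a lognormal whose 5th and 95th percentiles are low and high.

    Assumes 0 < low < high; the 90% interval reading is an assumption, not confirmed by the source.
    """
    if not 0 < low < high:
        raise ValueError("need 0 < low < high")
    z = NormalDist().inv_cdf(0.95)  # about 1.645
    mu = (math.log(low) + math.log(high)) / 2.0
    sigma = (math.log(high) - math.log(low)) / (2.0 * z)
    return mu, sigma


# Example range taken from a snippet annotated elsewhere in this feed: 5 to 50
mu, sigma = lognormal_from_90ci(5, 50)
z = NormalDist().inv_cdf(0.95)
median = math.exp(mu)
mean = math.exp(mu + sigma ** 2 / 2.0)

print(f"mu = {mu:.3f}, sigma = {sigma:.3f}, median = {median:.2f}, mean = {mean:.2f}")
```

Note that the implied median is the geometric mean of the two endpoints, well below their midpoint, which is one reason the lognormal default deserves to be highlighted.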
-
-
forum.effectivealtruism.org forum.effectivealtruism.org
-
[5].
Not sure these footnotes line up
-
Say that $2B to $20B, or 10x to 100x the amount that Open Philanthropy has already spent, would have a 1 to 10% chance of succeeding at that goal [5].
What is this benchmarked against? If I had said a 1-3% chance or a 10-50% chance, would that have seemed equally plausible?
-
[0]. This number is $138.8 different than the $138.8M given in Open Philanthropy's website, which is probably not up to date with their grants database.
What does this mean? the two 138.8's here suggest a typo
-
For completeness, I do estimate the impacts of a standout intervention as well.
This means for some 'great intervention' ... best in class or something
-
cost = 2B to 20B
Another huge wild guess? But should the cost really vary? Shouldn't this just be done for a particular level of cost?
Also, I guess the prob. of success is likely to be related to the amount spent
-
probabilityOfSuccess = 0.01 to 0.1 # 1% to 10%.
Huge wild guess, and probably should be correlated to the acceleration and reduction in prison pop terms?
-
counterfactualAccelerationInYears = 5 to 50
huge wild guess
-
-
www.givewell.org www.givewell.org
-
We guess that implementation challenges would limit effectiveness and funding opportunities. As a result, we do not anticipate doing further research on this program in the near future.
What is the model (a VOI model?) for what to focus GW attention on?
-
-
www.givewell.org www.givewell.org
-
we estimate that this net distribution will reduce the number of deaths each year within this population from 12 to about 11.4
how does this number 'depend' on the 12.0 used above?
-
Step Four: 12 of those people are expected to die every year of any cause In order to estimate how many lives these nets might save, we first need to know how many people in this population would have died without the protection of the nets. The mortality rates and population demographics in Guinea suggest that about twelve out of 1,431 people would have died per year of any cause (including malaria).8
It's not clear how this would be included in the equation. Show the equation
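As a starting point, the only relationship that can be read off the two quoted figures is plain arithmetic. This is not GiveWell's actual model, just what the numbers 12 and 11.4 imply:

```latex
\text{deaths averted per year} = 12 - 11.4 = 0.6,
\qquad
\text{implied relative reduction} = \frac{12 - 11.4}{12} = 0.05
```

So the quoted result is equivalent to a 5% reduction in all-cause deaths applied to the baseline of 12 per 1,431 people; how that 5% is derived is what the page should spell out.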
-
-
forum.effectivealtruism.org forum.effectivealtruism.org
-
skeptical that a 4- to 8-week program like StrongMinds would have benefits that persist far beyond a year.
is this a reasonable justification for a skeptical prior?
-
t conclusions.
the table below should be better formatted
-
they seem unintuitive to us and further influence our belief that StrongMinds is less cost-effective than HLI’s estimates.
But this seems a bit overly driven by priors/double-counting
-
-
forum.effectivealtruism.org forum.effectivealtruism.org
-
This post provides an overview and analysis of the Doing Good Better book giveaway through Effective Altruism New Zealand (EANZ). The analysis covers data collected from survey responses between 05-Jan-17 and 17-Dec-19, for which there were a total of 298 responses, with appreciable variance in the amount of the survey which was completed. This analysis was initially completed around Jan 2020 so any reference to "to date" refers to then.
Hypothes.is comments are different -- that's the functionality I was looking for, more or less
-
-
jg-sponsorship.netlify.app jg-sponsorship.netlify.app
-
We found that Fundraising Pages which received the £5 donation raised £118 more (on average) than the pages in the control group.
We should try to replicate for effective pages
-
- Feb 2023
-
replicats.research.unimelb.edu.au replicats.research.unimelb.edu.au
-
ngle claim published in a paper to evaluating the credibility of published papers more holistically. In phase 2, which began in 2021, 200 "bushel" papers were evaluated holistically. Participants working in IDEA groups evaluated the seven credibility signals:
I'm a little unclear on what this is. Is there a concise explanation of a 'bushel' or of how this is 'holistic'?
-
-
bookdown.org bookdown.org
-
drop_na(ends_with("_s"))
ends_with("perc")?
-
-
rethinkpriorities.github.io rethinkpriorities.github.io
-
‘BOTEC’: Back of the envelope calculations are central to RP’s work
relevance: see e.g., this private thread: https://rethinkpriorities.slack.com/archives/C04N8T10XC0/p1675265493647089
-
- Jan 2023
-
www.nber.org www.nber.org
-
A NATIONWIDE TWITTER EXPERIMENT PROMOTING VACCINATION IN INDONESIA
test comment
-
-
www.nber.org www.nber.org
-
SECTOR
test a note
-
-
osf.io osf.io
-
some must use IDEA protocol, but most can use a single round of elicitation. What they all have in common is that they mathematically aggregate judgments about the probability of some event or subjective degrees of belief, into a single, value.
Will this work for continuous outcomes like the Unjournal is currently asking for?
-
-
willemsleegers.com willemsleegers.com
-
Interestingly, this also means that the prior for σ is now dependent on the prior for the slope, because
come back to this, we might be able to put this explicitly into the model
-
This means that the estimate for sigma is the square root of 1 minus the variance of the slope estimate (0.75²). I
Could/should we make this explicitly part of the model, i.e., constrain this?
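The constraint in question, sketched in Python. It assumes both outcome and predictor are standardized (variance 1), so that slope² plus residual variance equals 1; the function name is hypothetical, and this is not brms code.

```python
import math


def residual_sd_from_slope(slope):
    """Residual SD implied by a standardized slope: sigma = sqrt(1 - slope^2).

    Assumes outcome and predictor both have variance 1, so the slope equals the correlation.
    """
    if not -1.0 <= slope <= 1.0:
        raise ValueError("a standardized slope must lie in [-1, 1]")
    return math.sqrt(1.0 - slope ** 2)


sigma = residual_sd_from_slope(0.75)
print(f"slope = 0.75 implies sigma = {sigma:.3f}")
```

Constraining the model this way would leave the slope as the only free parameter of the pair, with sigma computed from it rather than given its own prior.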
-
prior(normal(0, 0.5), class = "b", lb = -1, ub = 1)
seems, with brms, you can set lb and ub on classes but not on individual parameters
-
add_predicted_draws(model_height_weight) %>%
here we draw 'predicted entries'
-
add_epred_draws(model_height_weight) %>%
draws of the expected outcome (the regression line, using intercept and slope), without the residual noise
-
- Dec 2022
-
exploratory-altruism.org exploratory-altruism.org
-
CEARCH discovering a Cause X every three years and significantly increasing support for it.
This seems like 'assuming the result' ... why every 3 years?
-
-
willemsleegers.com willemsleegers.com
-
sample_prior = TRUE,
Does the 'prior predictive simulation' stuff here too
-
The output shows us that we need to set two priors, one for the Intercept and one for sigma. brms also already determined a default prior for each, but we’ll ignore that for now.
It's not clear to me what get_prior is doing here, or what its logic is. It would seem to be using the data to suggest priors, which McElreath seems to be against (but the 'empirical bayes' people seem to like). Of course, it does at least remind you what objects you need to set priors over
-
The prior for the slope is a lot easier now. We can simply specify a normal distribution with a mean of 0 and a standard deviation equal to the size of the effect we deem likely, together with a lower bound of 0 and upper bound of 1.
Update: I was wrong on the below, the SD is not 1 here, because it's the SD for the residual term in the linear model, not the SD for the raw outcome variable.
Previous comment:...
I’m ‘worried’ that if you give it data you know has sigma=1, but you allow it to choose any combination of beta and sigma, you may be getting it to give a weird posterior to both of the parameters, in a way you know can’t make sense, in order to find the most likely parameters for the weird geocentric model you imposed.
On the other hand, I would have thought that it would tend to converge to a sigma=1 anyways as the most likely, as that is ‘allowed’ by your model.
My take is that the cauchy prior you impose in that part is heliocentric; let me expand on this. I think you know that the true std deviation of the ‘standardized heights from this population’ is 1. What you don’t know is whether it is indeed normal (i.e., whether family = gaussian is right here). Thus it might be finding ‘a sigma far from 1 is likely’ under this model, because that makes your ‘skewed’ or ‘fat tailed’ data seem more likely under the normal prior. A better approach might be to allow a different distribution with some sort of ‘skew’ parameter, while imposing that the sd must be 1.
-