Hypothesis

292 Matching Annotations

Mar 2026
Local file

TEA Comparison: What Do the Studies Actually Say? – Cultured Chicken Cost Model

9
1. unjournal 31 Mar 2026
  
  in Public
  
  GFI amino acid report 2025
  
  Let's also incorporate GFI reports on other components, especially the growth factors.
2. unjournal 31 Mar 2026
  
  in Public
  
  $6/lb figure is for a 50/50 hybrid product
  
  It's not really an important caveat, as we can adjust for it. Also, please avoid bold.
3. unjournal 31 Mar 2026
  
  in Public
  
  Main Caveat
  
  column width!
4. unjournal 31 Mar 2026
  
  in Public
  
  Time horizon: Current technology vs. 2030 projections vs. steady-state
  
  Which papers do what? Add a tooltip if necessary.
5. unjournal 31 Mar 2026
  
  in Public
  
  Hydrolysate-based vs. pharma-grade vs. animal-component-free
  
  Which papers do what? If it's complicated, then explain it in a tooltip.
6. unjournal 31 Mar 2026
  
  in Public
  
  Pure wet cell mass vs. cultivated ingredient vs. hybrid product vs. retail-equivalent
  
  Add a quick tooltip and also link to an explainer (in the 'learn' section) detailing the differences between these definitions of output ... Also, which papers do what here? If it's complicated, do that in a tooltip.
7. unjournal 31 Mar 2026
  
  in Public
  
  Chicken (Pasitka 2024) vs. generic mammalian/CHO-like cells (Humbird 2021) vs. unspecified (CE Delft 2021)
  
  Tooltip: direct quotes evidencing these
8. unjournal 31 Mar 2026
  
  in Public
  
  Basis
  
  Column widths are off. Make the columns with more text wider than the columns with little text. Make that a skill or a sort of general instruction. It comes up a lot whenever you generate these HTML documents.
9. unjournal 31 Mar 2026
  
  in Public
  
  Read This First
  
  fold this -- header should be 'caveat'
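The column-width request in annotation 8 above could be captured as a general rule along these lines. This is only a minimal sketch under stated assumptions, not the site's actual code: the function name `column_width_percents`, the `min_percent` floor, and the sample rows are all hypothetical.

```python
def column_width_percents(rows, min_percent=8.0):
    """Suggest one width (in percent) per column, proportional to its longest cell.

    `rows` is a list of equal-length lists of cell text, header row included.
    `min_percent` is a hypothetical floor so short columns stay readable.
    """
    n_cols = len(rows[0])
    # Longest cell text in each column, measured in characters.
    longest = [max(len(str(row[c])) for row in rows) for c in range(n_cols)]
    total = sum(longest) or 1  # avoid dividing by zero on an all-empty table
    raw = [100.0 * length / total for length in longest]
    # Apply the floor, then rescale so the widths again sum to about 100.
    # (After rescaling, a floored column can end up slightly below the floor.)
    floored = [max(p, min_percent) for p in raw]
    scale = 100.0 / sum(floored)
    return [round(p * scale, 1) for p in floored]


# Hypothetical table: header names echo the annotations, cell text is made up.
sample_rows = [
    ["Study", "Basis", "Main Caveat"],
    ["A", "short", "a much longer cell of explanatory text that needs room"],
]
widths = column_width_percents(sample_rows)
```

The resulting percentages could then be written into `<col style="width: ...%">` elements or an equivalent CSS rule, so the text-heavy column gets most of the space.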
Annotators

unjournal
uj-prioritization-prototype.netlify.app

High-Impact Research Candidates - The Unjournal

2
1. unjournal 28 Mar 2026
  
  in Public
  
  here Animal welfare & food systems High Priority Mar 24, 2026 |EAFORUM gpt-5.4-mini ▾ Details This looks l
  
  the EA forum linked papers are not showing the actual paper titles
2. unjournal 25 Mar 2026
  
  in Public
  
  Comment directly on this page using the Hypothes.is sidebar (look for the < tab on the right edge of the page). Highlight any text and add your annotation — visible to all Hypothes.is users. You can also use the feedback buttons on each paper card.
  
  Add a filter by year as well.
Visit annotations in context

Annotators

unjournal

URL

uj-prioritization-prototype.netlify.app/
unjournal.github.io

Cultured Chicken Production Cost Model – Cultured Chicken Cost Model

16
1. unjournal 28 Mar 2026
  
  in Public
  
  A latent factor (0=nascent, 1=mature) that affects all technology adoption, reactor costs, and financing. High maturity = correlated improvements.
  
  Give a better explanation (or link) of how this particular modeling approach was chosen, as well as the defaults here.
2. unjournal 28 Mar 2026
  
  in Public
  
  Cell Density Range Code viewof density_lo = Inputs.range([10, 100], { value: 30, step: 10, label: "Cell Density Low (g/L)" }) viewof density_hi = Inputs.range([50, 300], { value: 200, step: 10, label: "Cell Density High (g/L)" })
  
  allow 'reset default' here too
3. unjournal 28 Mar 2026
  
  in Public
  
  Model Parameters Code viewof simpleMode = Inputs.toggle({label: "Simplified view (recommended)", value: true})
  
  A button to 'hide parameter settings' and 'show parameter settings' could help; then, when it's hidden, the rest of the page content could be bigger so we can see the charts better.
4. unjournal 27 Mar 2026
  
  in Public
  
  Two controls for growth factors:
  
  This is too much information for this dashboard, and I think most of it is present in either the learn or the technical reference dashboard. Give it as mostly a TL;DR, and then link that section for further explanation.
5. unjournal 27 Mar 2026
  
  in Public
  
  How far along the price reduction curve we are within each regime:
  
  This needs further explanation. What year are you talking about for this "How far along"?
6. unjournal 27 Mar 2026
  
  in Public
  
  Component Distributions
  
  if possible, let them click a distribution to expand it
7. unjournal 27 Mar 2026
  
  in Public
  
  Cost Breakdown by Component (Total: $122.59/kg)
  
  make chart below bigger
8. unjournal 27 Mar 2026
  
  in Public
  
  Variable Operating Costs (VOC):
  
  link the 'learn' explainer sections here
9. unjournal 27 Mar 2026
  
  in Public
  
  Technology Adoption by 2036
  
  these should have 'reset' buttons, where it goes back to the default
10. unjournal 27 Mar 2026
  
  in Public
  
  Total: $122.59
  
  round the total to the nearest dollar
11. unjournal 27 Mar 2026
  
  in Public
  
  Unit Production Cost ($/kg) →
  
  make the units clearer, and also make it clearer that this refers to pure cell mass, not a blended product
12. unjournal 27 Mar 2026
  
  in Public
  
  Basic Parameters
  
  enable a 'blending share' slider, if they tick that box, and other things should adjust accordingly #implement
13. unjournal 27 Mar 2026
  
  in Public
  
  $$
  
  double $ sign here -- remove one
14. unjournal 27 Mar 2026
  
  in Public
  
  Model Structure Code viewof include_capex = Inputs.toggle({label: "Include capital costs (CAPEX)", value: true}) viewof include_fixed_opex = Inputs.toggle({label: "Include fixed operating costs", value: true}) viewof include_downstream = Inputs.toggle({label: "Include downstream processing", value: false})
  
  You should have a box to show/hide the 'blending share' parameter
15. unjournal 27 Mar 2026
  
  in Public
  
  Pure cells vs. consumer products: Most cultivated meat products on the market or in development are hybrid products — blending a fraction of cultured cells with plant-based or mycoprotein ingredients. A product with (say) 20% cultured cells and 80% plant-based filler at $3/kg would have a blended ingredient cost far below the pure-cell cost shown here. The "price parity with conventional meat" threshold may therefore be achievable at higher per-kg cell costs than these numbers suggest.
  
  Tooltip some specific quotes on blending share
16. unjournal 27 Mar 2026
  
  in Public
  
  GitHub issues: Open an issue Email: contact@unjournal.org
  
  merge this with the above folding section #implement
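The blending point in annotation 15 (and the 'blending share' slider requested in annotations 12 and 14) comes down to a share-weighted average. A minimal sketch, assuming the $122.59/kg dashboard total is the pure-cell cost and using the 20% share and $3/kg filler from the quoted passage; the function name `blended_cost_per_kg` is hypothetical and not taken from the dashboard's code.

```python
def blended_cost_per_kg(cell_share, cell_cost, filler_cost):
    """Share-weighted ingredient cost ($/kg) of a hybrid product.

    cell_share: mass fraction of cultured cells, between 0 and 1.
    cell_cost, filler_cost: cost per kg of each ingredient.
    """
    if not 0.0 <= cell_share <= 1.0:
        raise ValueError("cell_share must be between 0 and 1")
    return cell_share * cell_cost + (1.0 - cell_share) * filler_cost


# Assumption: the dashboard total quoted above is the pure-cell cost.
pure_cell_cost = 122.59  # $/kg
blended = blended_cost_per_kg(0.20, pure_cell_cost, 3.0)

# Annotation 10 asks for the displayed total to be rounded to the nearest dollar.
rounded_total = round(pure_cell_cost)
```

With these inputs the blended ingredient cost is about $26.92/kg, against a rounded pure-cell total of $123/kg, which illustrates why the passage says parity may be reachable at higher per-kg cell costs.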
Visit annotations in context

Annotators

unjournal

URL

unjournal.github.io/cm_pq_modeling/
unjournal.github.io

Technical Reference – Cultured Chicken Cost Model

1
1. unjournal 27 Mar 2026
  
  in Public
  
  Quick Reference: All Equations at a Glance
  
  some text ... "2b ... growth factors" was not rendered correctly. Fix it #implement
Visit annotations in context

Annotators

unjournal

URL

unjournal.github.io/cm_pq_modeling/docs.html
uj-prioritization-dashboard.netlify.app

High-Impact Research Candidates - The Unjournal

2
1. unjournal 26 Mar 2026
  
  in Public
  
  The paper’s object is an abstract characterization of strategy-proof social choice rules for selecting a public-good level. While public decision rules can matter in principle, the abstract theorem is not tied to a concrete policy domain, institution, or implementation setting. There is no evident link to a specific decision-maker, welfare question, or operational policy lever where an evaluation would affect choices at scale.
  
  So why did you rate it 10/10 for decision relevance?
2. unjournal 26 Mar 2026
  
  in Public
  
  This is a strong Unjournal candidate: it is directly about improving job recommendation systems used by a public employment service, has clear welfare implications for job seekers, and uses randomized field experiments rather than purely predictive metrics. The paper addresses a decision-relevant policy question—how to design algorithms that improve worker outcomes rather than platform clicks/applications—and appears to offer actionable guidance for public and private labor-market intermediaries. As a working paper with experimental evidence and a model-based welfare metric, it has high timing value and likely benefit from independent evaluation.
  
  I don't see what global priorities relevant decision this targets. Not sure why this was prioritized.
Visit annotations in context

Annotators

unjournal

URL

uj-prioritization-dashboard.netlify.app/
uj-cm-workshop.netlify.app

Cultivated Meat Workshop | The Unjournal

5
1. unjournal 24 Mar 2026
  
  in Public
  
  Following our evaluation of Rethink Priorities' cultured meat forecasting work and ongoing TEA evaluations, this workshop focuses on what the evidence tells us about cultivated meat's production cost trajectory. We recognize that consumer acceptance, regulatory pathways, and environmental implications also matter — but we're centering on costs because this seems among the most pivotal and tractable questions right now, and we want to bring focused expertise to bear. Pivotal Questions Initiative → 📊 Cost Modeling Dashboard → EA Forum: CM Viability → CM_01 on Metaculus → RP Evaluation →
  
  this feels overwhelming/too many links -- find a way to make it less cluttered
2. unjournal 24 Mar 2026
  
  in Public
  
  Async Discussion & Suggestions
  
  we'll just do this so remove this question. #implement
3. unjournal 24 Mar 2026
  
  in Public
  
  Or mark your availability on the grid (optional)
  
  make this a folding box, folded by default #implement
4. unjournal 24 Mar 2026
  
  in Public
  
  Or mark your availability on the grid (optional) Click cells for any time blocks you could join. Click a date to select that row, a time header to select that column, or a week label to select the whole week. All times US Eastern; hover for UK/CET.
  
  Adjust this to start on April 15th and go through the first week of May #implement
5. unjournal 24 Mar 2026
  
  in Public
  
  Note: This workshop is still in early planning. We're gathering initial interest and availability. Final dates and agenda will be confirmed once we have responses from key participants.
  
  Make it clear that we're planning for late April or very early May #implement
Visit annotations in context

Annotators

unjournal

URL

uj-cm-workshop.netlify.app/schedule
uj-wellbeing-workshop.netlify.app

Wellbeing Pivotal Questions | The Unjournal

2
1. unjournal 19 Mar 2026
  
  in Public
  
  We're continuing the discussion asynchronously and will be publicly sharing key materials soon. This site is evolving into a resource page.
  
  We're continuing the discussion asynchronously and will be publicly sharing key materials soon. This site is evolving into a resource page and hub for feedback, dialogue, and belief elicitation.
2. unjournal 11 Mar 2026
  
  in Public
  
  1. WELLBY Reliability and Value
  
  make an anchorable link here and for the other headers.
Visit annotations in context

Annotators

unjournal

URL

uj-wellbeing-workshop.netlify.app/beliefs
uj-wellbeing-workshop.netlify.app

About This Workshop | The Unjournal

1
1. unjournal 19 Mar 2026
  
  in Public
  
  Join the discussion (Google Doc)
  
  probably moving to have this discussion more in hypothes.is on web content and less in that Google doc; it's hard to make the Gdoc attractive and organized.
Visit annotations in context

Annotators

unjournal

URL

uj-wellbeing-workshop.netlify.app/about.html
uj-wellbeing-workshop.netlify.app

Recommended Readings | Wellbeing Workshop | The Unjournal

2
1. unjournal 19 Mar 2026
  
  in Public
  
  Evaluation: Cash Transfers vs Psychotherapy in Liberia (McGuire et al.) Unjournal Evaluation Summary Applied Comparison Direct experimental comparison of cash transfers and psychotherapy in an LMIC context. Particularly relevant because it measures multiple outcomes—psychological distress, consumption, life satisfaction—allowing cross-metric comparison. Evaluation Summary
  
  This is not the title nor the authors -- fix this hallucination
2. unjournal 11 Mar 2026
  
  in Public
  
  Essential
  
  'Essential' is too strong. Maybe 'Most important for discussion'. And note there's no way to do a thorough read of all of these in 2 hours. Just leave that 'time allotment' out.
Visit annotations in context

Annotators

unjournal

URL

uj-wellbeing-workshop.netlify.app/readings
uj-wellbeing-workshop.netlify.app

Linear WELLBY Analysis | Wellbeing Workshop | The Unjournal

6
1. unjournal 18 Mar 2026
  
  in Public
  
  The Controversy: Happier Lives Institute estimated StrongMinds a
  
  Use this link instead -- https://www.happierlivesinstitute.org/report/strongminds-cost-effectiveness-analysis/
  
  @Samuel_Dupret let me know if you think a better link is appropriate.
  
  You might be wondering why I'm still bothering with this at the workshop - I want to turn this into a resource page for further practical work and discussion.
2. unjournal 18 Mar 2026
  
  in Public
  
  potentially more cost-effective than AMF. GiveWell's 2023 assessment disagreed, citing concerns about: (1) mapping depression scales to LS, (2) assumed effect duration, (3) demand effects in self-reported outcomes, and (4) publication bias.
  
  Link needs fixing -- https://www.givewell.org/international/technical/programs/strongminds-happier-lives-institute
  
  Also mention and link HLI's response to this assessment here
3. unjournal 16 Mar 2026
  
  in Public
  
  Peasgood et al. (unpublished)
  
  We have a copy
4. unjournal 15 Mar 2026
  
  in Public
  
  Unit-change comparability
  
  I'm not sure this is stated correctly. It seems to overlap cardinality.
5. unjournal 14 Mar 2026
  
  in Public
  
  📚 Further Reading: Unjournal Evaluations The Unjournal has commissioned independent evaluations of papers relevant to this debate: → StrongMinds & Friendship Bench Evaluation — Critical assessment of HLI's meta-analysis and cost-effectiveness claims → Long-Run Effects of Psychotherapy on Depression — Cuijpers et al. meta-analysis on therapy durability → Cash Transfers vs Psychotherapy: Comparative Impact — McGuire et al. direct comparison in Liberia → Mental Health Therapy as a Core Strategy (Ghana) — Barker et al. on scaling community-based therapy
  
  Put this somewhere else - I don't think it belongs within the focal case folding box. It should have its own folding box in the reading section and references
6. unjournal 13 Mar 2026
  
  in Public
  
  mortality-focused interventions
  
  When comparing among interventions, some of which affect mortality.
Visit annotations in context

Annotators

unjournal

URL

uj-wellbeing-workshop.netlify.app/linear-wellby-analysis
uj-wellbeing-workshop.netlify.app

DALY/QALY↔WELLBY Conversion | Wellbeing Workshop | The Unjournal

1
1. unjournal 16 Mar 2026
  
  in Public
  
  Practical guidance for funders now Given the uncertainties above, what should funders actually do? This section offers a decision-oriented framework, not a single prescription.
  
  I didn't want the AI to give this 'practical guidance' -- that's meant to come out of the session!!
Visit annotations in context

Annotators

unjournal

URL

uj-wellbeing-workshop.netlify.app/daly-wellby-conversion
uj-wellbeing-workshop.netlify.app

Live Workshop Sessions | Wellbeing Measures Workshop | The Unjournal

2
1. unjournal 15 Mar 2026
  
  in Public
  
  Zoom chat for quick reactions;
  
  No, I only want the Zoom chat to be used by the session organizers and mainly just to guide people on the structure of the workshop and where we're going next
2. unjournal 12 Mar 2026
  
  in Public
  
  Segment structure is set; timing may adjust slightly. Updated March 11, 2026
  
  12 Mar 2026 -- Not entirely set -- we may add some small things. But close to set, and trying to harden the timings so we can send out a schedule soon that people can trust
Visit annotations in context

Annotators

unjournal

URL

uj-wellbeing-workshop.netlify.app/live/
uj-wellbeing-workshop.netlify.app

Wellbeing Pivotal Questions | The Unjournal

3
1. unjournal 12 Mar 2026
  
  in Public
  
  calibrated
  
  Give the definition of 'calibration' here as a footnote/tooltip. Roughly: when you say something will happen X% of the time, it in fact occurs X% of the time, not much more nor less.
  
  If you are asked to give 80% CIs, the true values should fall in those intervals close to 80% of the time. If that happens less than 8/10 times, you're being overconfident and stating intervals that are too narrow. If it happens more than 8/10 times, you're being underconfident and stating intervals that are overly wide.
2. unjournal 12 Mar 2026
  
  in Public
  
  Consider the value obtained when using the best feasible measure for cross-intervention comparison in contexts like the focal context. What share of this value is obtained, in expectation, from using the simple linear WELLBY measure (as defined above) for all interventions?
  
  Above the 'operationalized version' Add a discussion box here for people to answer the more general question.
3. unjournal 12 Mar 2026
  
  in Public
  
  Consider the value obtained
  
  add a sub-sub-header "Operationalized version" here
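The calibration definition in annotation 1 above can be checked mechanically once the true values are known. A minimal sketch, assuming each elicited 80% interval is stored as a (low, high) pair; the function names, the `tolerance` band used to decide what counts as "close to 80%", and the sample numbers are all hypothetical.

```python
def interval_coverage(intervals, true_values):
    """Fraction of true values that fall inside their stated (low, high) interval."""
    hits = sum(
        1 for (low, high), value in zip(intervals, true_values)
        if low <= value <= high
    )
    return hits / len(true_values)


def calibration_verdict(coverage, nominal=0.80, tolerance=0.05):
    """Compare observed coverage with the nominal level.

    `tolerance` is an assumed band for "close to" the nominal level.
    """
    if coverage < nominal - tolerance:
        return "overconfident"   # intervals too narrow
    if coverage > nominal + tolerance:
        return "underconfident"  # intervals too wide
    return "roughly calibrated"


# Made-up example: ten 80% intervals, six of which contain the true value.
intervals = [(0, 10)] * 10
true_values = [5, 5, 5, 5, 5, 5, 12, 12, 12, 12]
coverage = interval_coverage(intervals, true_values)
verdict = calibration_verdict(coverage)
```

In this made-up case only 6 of the 10 intervals contain the true value, so the forecaster would be classed as overconfident.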
Visit annotations in context

Annotators

unjournal

URL

uj-wellbeing-workshop.netlify.app/beliefs.html
uj-wellbeing-workshop.netlify.app

About This Workshop | The Unjournal

12
1. unjournal 10 Mar 2026
  
  in Public
  
  We're organizing the discussion around four key questions:
  
  Restate this to more directly address the question in the heading on "what we want to achieve".
  
  We want to: - Help researchers understand practitioners' highest-value questions and considerations and trade-offs. - Help practitioners understand the most relevant and useful up to date research and its implications - Enable communication and collaboration, by getting on the same page, agreeing on terminology, identifying points of consensus and high-value cruxes, etc. - State and measure our beliefs about key issues and questions openly, with precision and calibrated uncertainty, driving high "value of information" Bayesian updating - Drive better decisions over measuring the impact of interventions in LMICs and using existing measures, leading to better funding decisions
  
  (This is a bit long -- just adjust the basic first sentence a tiny bit, and then footnote this more detailed theory of change. ) #implement
2. unjournal 10 Mar 2026
  
  in Public
  
  The neutral point is the life satisfaction level representing neither positive nor negative welfare—essentially the boundary between "life worth living" and "suffering." Estimates range from 2-5 on the 0-10 scale. Peasgood et al. (2018) tentatively estimate ~2.
  
  Add: "This is particularly important for comparing interventions that have impacts on mortality (and perhaps fertility). We should discuss this in this workshop to an extent, but we might de-emphasize it to avoid overstretching the scope, depending on interest and timing."
3. unjournal 10 Mar 2026
  
  in Public
  
  evaluation summary
  
  Link it here https://unjournal.pubpub.org/pub/evalsumstrongminds/ -- however, I don't see anything in that summary that provides details suggesting this order of magnitude thing. Find a better reference.
4. unjournal 10 Mar 2026
  
  in Public
  
  QALYs (quality-adjusted life years)
  
  Link one authoritative external resource presenting these in detail
5. unjournal 10 Mar 2026
  
  in Public
  
  instruments like EQ-5D
  
  dead link
6. unjournal 10 Mar 2026
  
  in Public
  
  Other measures include QALYs (quality-adjusted life years), income-equivalent measures, and multi-dimensional poverty indices. QALYs are similar to DALYs but measure health gained rather than lost.
  
  This is being adjusted. NB we focus more on DALY than QALY because it's used a lot more in the LMIC intervention context, largely due to its ease of collection
7. unjournal 10 Mar 2026
  
  in Public
  
  —and what would change their minds?
  
  remove 'and what would change their minds' -- this doesn't fit. #implement
8. unjournal 10 Mar 2026
  
  in Public
  
  Unlike WELLBYs, DALYs are based on expert-derived disability weights rather than self-reported wellbeing—weights are constructed through surveys of health professionals rating hypothetical health states.
  
  Are you sure that it's through surveys of health professionals? I thought the surveys were of people in the general population. And this explanation doesn't mention how an individual's DALY is constructed based on asking them about their health states or something. What's the data used?
9. unjournal 09 Mar 2026
  
  in Public
  
  Vignette exercises: respondents rate hypothetical people's life satisfaction based on descriptions, revealing how individuals anchor the scale and enabling cross-person calibration.
  
  Do they actually do this in the paper? doublecheck
10. unjournal 09 Mar 2026
  
  in Public
  
  Calibration questions ask respondents to rate well-defined scenarios (e.g., "How satisfied would you be if you won $1,000?"). By observing how people rate the same reference points, researchers can estimate individual differences in scale use.
  
  Is this a reasonable example? Do they ask questions like that in the exercises mentioned in the paper?
11. unjournal 09 Mar 2026
  
  in Public
  
  Cost-effectiveness estimates vary by an order of magnitude depending on how WELLBYs are valued relative to DALYs.
  
  What's the source for this OOM claim? Find and link it with a verbatim quote. #implement
  
  Also, it's not in our 'evaluation summary', as far as I know.
12. unjournal 09 Mar 2026
  
  in Public
  
  Open Philanthropy
  
  It's now "Coefficient Giving" -- correct this on every page. And hyperlink "https://coefficientgiving.org/research/cost-effectiveness/" here. #implement
Visit annotations in context

Annotators

unjournal

URL

uj-wellbeing-workshop.netlify.app/about
uj-wellbeing-workshop.netlify.app

Linear WELLBY Analysis | Wellbeing Workshop | The Unjournal

4
1. unjournal 05 Mar 2026
  
  in Public
  
  Each scale point represents an equal welfare increment. If violated, summing is invalid and interventions targeting different baselines become incomparable.
  
  David Reinstein --- personally, this is the one I find least plausible and most important.
2. unjournal 05 Mar 2026
  
  in Public
  
  Interpersonal Comparability LS_A = 7 ≈ LS_B = 7 implies U_A ≈ U_B When two people report the same score, they experience similar welfare. Scale-use heterogeneity violates this assumption.
  
  I don't think this one is necessary if we can (instead) assume that differences are equivalent. For example, if we assume that person A is actually experiencing higher welfare at all levels of reported score, but the differences between the scores are comparable, then compared to interventions for measured differences in well-being, that shouldn't matter.
  
  I think it could also still be reliable if the distribution between the two populations is the same, even though we don't have specific inter-person comparability between any two compared individuals.
3. unjournal 05 Mar 2026
  
  in Public
  
  requires four implicit assumptions
  
  Give a linked source and citation for this.
4. unjournal 05 Mar 2026
  
  in Public
  
  1 WELLBY = 1-point increase on a 0-10 life satisfaction scale × 1 person × 1 year W = Σ_i Σ_t LS_it
  
  Those are not clearly defined here, nor the indexing
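One plausible reading of the quoted formula, offered as an assumption rather than the page's own definition: i indexes people, t indexes years, and LS_it is person i's change in reported life satisfaction (0-10 scale) in year t relative to no intervention. Under that reading, a minimal sketch of the linear aggregation is as follows (the function name `total_wellbys` is hypothetical).

```python
def total_wellbys(ls_changes):
    """Sum life-satisfaction changes over people (outer list) and years (inner list).

    ls_changes[i][t] is the assumed change in life satisfaction, in scale
    points, for person i in year t.
    """
    return sum(sum(person_years) for person_years in ls_changes)


# One person moving from 1 to 3 for one year: a 2-point change.
one_person_two_points = total_wellbys([[2]])

# Two people each moving from 1 to 2 for one year: two 1-point changes.
two_people_one_point = total_wellbys([[1], [1]])
```

Both cases come to 2 WELLBYs. This is exactly the equal-increment (linearity) assumption that the annotations on this page question: the sum treats a 2-point gain for one person as equivalent to 1-point gains for two people.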
Visit annotations in context

Annotators

unjournal

URL

uj-wellbeing-workshop.netlify.app/linear-wellby-analysis.html
uj-wellbeing-workshop.netlify.app

About This Workshop | The Unjournal

8
1. unjournal 03 Mar 2026
  
  in Public
  
  We'll produce a practitioner-focused summary document, belief elicitation results with confidence intervals, and structured notes.
  
  Change this to "we hope to" and "We will share outputs". -- I can't guarantee right now that we'll get enough input or have bandwidth to produce this. #implement
2. unjournal 03 Mar 2026
  
  in Public
  
  Participants can opt out of recording for specific segments if needed
  
  Add "and we will ask for final approval before posting anything". #implement
3. unjournal 03 Mar 2026
  
  in Public
  
  (Note: QALYs may be more directly comparable than DALYs for this purpose.)
  
  Leave out the QALYs parentheses bit here. Add "(or QALYs)" after "~1 SD in DALYs". #implement
4. unjournal 03 Mar 2026
  
  in Public
  
  scale?
  
  Add "is a move from 1-3 for one person as good as a move from 1-2 for 2 people"? At the end of this paragraph... "even if these don't hold, does the linear WELLBY aggregation yield 'nearly as much value' for decisionmaking as other potential measures"? #adjust #implement
5. unjournal 03 Mar 2026
  
  in Public
  
  Where is the "neutral point" on the scale?
  
  Remind me why the neutral point is important.
6. unjournal 03 Mar 2026
  
  in Public
  
  When comparing a mental health intervention (measured in WELLBYs) to a physical health intervention (measured in DALYs)
  
  Either of these, especially the physical health intervention, could be measured either way. This overstates it a bit. Perhaps, just to give this as an example, suppose there is a case... #adjust #implement
7. unjournal 03 Mar 2026
  
  in Public
  
  but more work is needed.
  
  "more work is needed": that's very vague. It would be nice to have at least one specific point suggesting that the difference in scale means potentially matters and merits more study.
8. unjournal 03 Mar 2026
  
  in Public
  
  Each has strengths and limitations—and how they relate to each other, and whether either reliably captures what matters for human welfare, directly affects which interventions get prioritized.
  
  I'm allergic to platitudes. IIRC you should have some notes somewhere providing at least one case where this matters.
Visit annotations in context

Annotators

unjournal

URL

uj-wellbeing-workshop.netlify.app/
Feb 2026
daaronr.github.io

Just Ask the Model: One-Shot LLM Research Evaluation and Structured Expert Review

8
1. unjournal 26 Feb 2026
  
  in Public
  
  adversarial manipulation.
  
  I don't think we discussed adversarial manipulation or have any results on it, so I'm a little worried that whatever generated this discussion is doing a sort of generic pandering and putting in what it generally expects to see in papers like this.
2. unjournal 26 Feb 2026
  
  in Public
  
  Our results support AI as structured screening and decision support rather than full automation,
  
  This seems like a sort of milquetoast generic caveat. In what sense is this what our AI results support? This seems a bit pandering.
3. unjournal 26 Feb 2026
  
  in Public
  
  exhibiting consistent failure modes: compressed rating scales, uneven criterion coverage, and variable identification of expert-flagged concerns.
  
  I'm guessing this is a bit premature/too much rounding up a few observations to general conclusions, but let me look at the results a bit more carefully.
4. unjournal 26 Feb 2026
  
  in Public
  
  often approach the ceiling implied by human inter-rater variability on several criteria,
  
  This is interesting and strong. It comes across maybe a little bit overstated, so we just need to be careful about how we're framing this result.
5. unjournal 26 Feb 2026
  
  in Public
  
  high-quality but noisy reference signal
  
  I think this is right, but the term "reference signal" sounds technical in an information theoretic sense, and we want to make sure we're not misapplying it.
6. unjournal 26 Feb 2026
  
  in Public
  
  narrative critiques
  
  Yes, we focus on the critiques here, but the Unjournal evaluations do more than just critique. They discuss, they offer suggestions, implications, et cetera.
7. unjournal 26 Feb 2026
  
  in Public
  
  covering economics and social-science working papers
  
  "covering ... working papers" Is mostly accurate but not quite right. We don't cover all working papers, and we have a specific focus on research relevant to global priorities. We can also evaluate post-journal publication, but I'm not sure how to best summarize this in a simple way in the abstract.
  
  The idea of "open evaluation platform" also could be a bit confusing here because it's not mainly about crowd sourcing. Yes, the "paid expert review packages" cover this, but I don't quite think this is worded in the best possible way.
8. unjournal 26 Feb 2026
  
  in Public
  
  Peer review is strained, and AI tools generating referee-like feedback are already adopted by researchers and commercial services—yet field evidence on how reliably frontier LLMs can evaluate research remains scarce.
  
  This is a decent first sentence, although it bears the marks of AI-generated text. But also I'm not sure if it's really in line with our newest spin on this.
Visit annotations in context

Annotators

unjournal

URL

daaronr.github.io/llm-paper-mirror/
Nov 2025
llm-uj-research-eval.netlify.app

Comparing LLM and human reviews of social science research using data from Unjournal.org - Data and methods

3
1. unjournal 04 Nov 2025
  
  in Public
  
  “high” reasoning effort
  
  Not relevant to Pro -- cut this
2. unjournal 04 Nov 2025
  
  in Public
  
  OpenAI Responses API
  
  "Responses" is the newer one (as of 4 Nov 2025)
3. unjournal 04 Nov 2025
  
  in Public
  
  returned file id keyed by path, size, and modification time.
  
  what does this mean? "Keyed by" ?
  
  This implies it is kept on the server and won't need a later upload.
Visit annotations in context

Annotators

unjournal

URL

llm-uj-research-eval.netlify.app/methods
llm-uj-research-eval.netlify.app

Comparing LLM and human peer reviews and ratings of social science research using data from Unjournal.org evaluations

1
1. unjournal 04 Nov 2025
  
  in Public
  
  d the best performance from top reasoning models
  
  Best relative to what? Better than the 'non-top reasoning models'? @valik
Visit annotations in context

Annotators

unjournal

URL

llm-uj-research-eval.netlify.app/
Sep 2025
llm-uj-research-eval.netlify.app

Comparing LLM and human peer reviews and ratings of social science research using data from Unjournal.org evaluations

3
1. unjournal 09 Sep 2025
  
  in Public
  
  Zhang and Abernethy (2025) propose deploying LLMs as quality checkers to surface critical problems instead of
  
  Is this the only empirical work? I thought there were others underway. Worth our digging into. Fwiw I can do an elicit.org query.
2. unjournal 09 Sep 2025
  
  in Public
  
  but still recommend human oversight.
  
  why? based on some evidence of LLM limitations or risks?
3. unjournal 09 Sep 2025
  
  in Public
  
  emphasize
  
  I'd say 'they argue' instead of 'emphasize'; the latter seems like a statement of absolute truth that we agree with.
Visit annotations in context

Annotators

unjournal

URL

llm-uj-research-eval.netlify.app/
llm-uj-research-eval.netlify.app

Comparing LLM-Generated Reviews to Human Evaluations in Social Science Research - Quantitative metrics

1
1. unjournal 09 Sep 2025
  
  in Public
  
  The population of papers
  
  Should we adjust "the population of papers" to "the reference is" ? to be more explicit?
Visit annotations in context

Annotators

unjournal

URL

llm-uj-research-eval.netlify.app/numerical_ratings
