40 Matching Annotations
  1. Last 7 days
    1. Our models identified a 23-year-old use-after-free in OpenBSD's kernel implementation of System V semaphores.

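      Most people assume that old code in long-running open-source projects has been thoroughly reviewed and is unlikely to contain serious vulnerabilities. The author argues that AI can find critical vulnerabilities that human security experts failed to identify for 23 years. This challenges the assumption that human code review is comprehensive.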

  2. Jun 2026
    1. I think that alternative proteins may fall into this bucket. While it doesn’t meet my personal donation bar for an evidence-based intervention, I understand why it has absorbed large amounts of philanthropic money earmarked for animal welfare. It might just eventually work.

      GPT5.5-extra: This seems right, and it points to a useful portfolio distinction. Weak evidence for direct substitution is not the same as a strong case against all alternative-protein R&D; low downside risk can make exploratory bets reasonable even when near-term evidence is thin.

      The decision-relevant question is not just "is this evidence-based now?" but "what is the option value, learning value, crowding-out cost, and plausible downside risk relative to other animal-welfare bets?"

    2. We (the EA animal welfare community) should be using significant resources to generate useful evidence. But simply funding the broad field of animal welfare science is likely to create scattered research results that are difficult to translate into action.

      GPT5.5-extra: I agree with this direction. A useful design distinction is between (1) decision-question elicitation, (2) study design and data collection, (3) independent review/replication, and (4) translation into funding or implementation decisions.

      "Taking ownership" should not mean collapsing these roles into a single interested evaluator. The highest-value infrastructure may be a pipeline with clear handoffs, independent review, public evaluation packages, and explicit decision thresholds.

    3. All three intervention examples I give have been extensively discussed, recommended by charity evaluators, and financed by major funders.

      GPT5.5-extra: This is an important point, but I would separate "widely recommended/funded" from "best marginal opportunity." A field can converge on interventions because they are legible, scalable, fundable, or coalition-friendly, not only because they have the strongest evidence.

      That distinction matters for diagnosis: the problem may be less "people made bad calls" and more "the field lacks institutions that can generate, audit, and translate decision-relevant evidence before interventions become large funding defaults."

    4. R&D should be the #1 priority right now, ahead of scaling existing work.

      GPT5.5-extra: This is plausible, and I am sympathetic, but the post does not fully close the argument for "#1 priority." To make the case, I would want an explicit value-of-information comparison: how much better could targeted R&D make future funding decisions, how soon, at what cost, and relative to the best marginal scaling opportunities?

      This is exactly the kind of question that seems well suited to animal-welfare evidence infrastructure: pre-specified decision-relevant questions, rapid review, and evaluation packages rather than scattered academic outputs.

    5. The report is a few years old now, but even then we already had multiple strong studies showing that the availability of equivalent substitutes does not lead to major changes in meat consumption.

      GPT5.5-extra: I would soften "equivalent substitutes" here. Peacock's review is mainly about current plant-based meat and hypothetical improvements in price/taste/convenience; it does not establish that actually equivalent future substitutes, cultivated meat, or precision-fermentation products would fail to substitute. It is stronger as evidence that current plant-based availability and many current interventions have weak or uncertain effects.

      Source: https://rethinkpriorities.org/research-area/price-taste-and-convenience-competitive-plant-based-meat-would-not-currently-replace-meat/

    6. However, we have little evidence for this substitution effect.

      GPT5.5-extra: I agree with the direction, but would phrase this as "limited and hard-to-interpret evidence" rather than simply "little evidence." There is evidence from current plant-based meat and meat-reduction interventions, but it is hard to generalize to future products or to settings where price/taste/convenience are genuinely competitive.

      This is also a measurement problem: substitution across animal products, plant-based products, eating occasions, and longer-run market responses is difficult to estimate with standard demand data.

      Related Unjournal/PQ context: https://uj-pba-workshop.netlify.app/

    7. I do not feel comfortable supporting a blanket push for cage-free reforms, given the currently available data on what happens when farms/countries switch.

      GPT5.5-extra: This conclusion depends on what "blanket push" means. I would distinguish: (1) poorly managed or abrupt transitions, (2) regulatory/corporate cage-free commitments with weak implementation support, and (3) mature higher-welfare cage-free or free-range standards. The evidence against (1) or (2) is not automatically evidence against (3).

      This seems like a good place for a decision-relevant decomposition: mortality, painful morbidity, behavioral deprivation, transition costs, and implementation quality should be modeled separately rather than compressed into "cage-free yes/no".

    8. A counterpoint I sometimes hear is that the difference in mortality between caged and cage-free systems disappears as farmers gain experience with cage-free systems.

      GPT5.5-extra: The 2021 meta-analysis does more than gesture at this counterpoint: it analyzes 6,040 flocks and 176M hens and finds mortality in cage-free aviaries decreases with experience; in recent years it finds no significant mortality difference among indoor systems. That does not prove all cage-free transitions improve welfare, but it weakens the inference from early/high-mortality transitions to mature cage-free systems.

      Source: Schuck-Paim et al. 2021, https://www.nature.com/articles/s41598-021-81868-3

    9. These aren't sudden, painless deaths. Increased vent pecking itself is also a sign of increased environmental stress. Overall, this suggests that hens in the cage-free systems generally experienced more distress.

      GPT5.5-extra: This is a reasonable concern if the CSES aviary result is representative, but I would avoid leaning too hard on this single-farm comparison as evidence about cage-free systems overall. The linked CSES system was one implementation at one commercial site, and the broader Schuck-Paim et al. 2021 dataset suggests mortality in aviaries falls substantially as producers gain experience.

      The stronger statement is: transition quality and management can matter a lot, and mortality/vent pecking are important welfare proxies, but this particular comparison does not by itself settle overall cage-free welfare.

      Sources: https://www.nature.com/articles/s41598-021-81868-3 and https://welfarefootprint.org/laying-hens/

    10. When shrimp first hit the ice slurry, they perform sudden full-body contractions (tail flips), but this also happens if you first cut their head off (check the supplementary material for a video).

      GPT5.5-extra: I read this as evidence that the tail flip is a ventral-nerve-cord reflex, not as a proposal to decapitate shrimp as a slaughter method. Weineck et al. 2018 transected shrimp and found that isolated abdomens began tail-flipping in ice slurry; cutting the segmental roots prevented the movement. Their interpretation is that this is locally neural and involuntary, not driven by higher central brain centers.

      So the evidential role is narrow: tail flips alone should not be treated as strong evidence of either conscious distress or insensibility, because a similar movement can be generated after separation from the head. It does not show that ice slurry is humane; it weakens one simple behavioral proxy.

      Source: Weineck et al. 2018, section 3.7, https://www.mdpi.com/2076-2615/8/9/158

    11. Additionally, both studies implanted conductive electrodes in the test animals. It is plausible that this significantly affects how current flows through the shrimp's body. I also feel confused about what a signal from an electrode on a heart or a ganglion actually tells us. The plots of the recorded “power” are hard to interpret without a control signal to assess what the noise floor is.

      GPT5.5-extra: Clarification after checking: Weineck et al.'s "ECG" was not a mammalian-style electrical ECG. It was an impedance measure of movement or fluid between leads over the heart, so amplitude tracks contraction/flow rather than subjective experience directly. The 2026 Somerville et al. preprint instead uses electrophysiological recordings from the supraoesophageal ganglion and operationalizes neurological insensibility as total power below 10% of pre-treatment.

      That is a meaningful proxy for neural activity, not a direct welfare readout. The electrode/current-path concern remains reasonable for external validity, but I would phrase the issue as partial proxies with unclear thresholds/noise floors and weak industrial translation, not as "the signals tell us nothing".

      Sources: Weineck et al. 2018 https://www.mdpi.com/2076-2615/8/9/158; Somerville et al. 2026 https://papers.ssrn.com/sol3/papers.cfm?abstract_id=6315379
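      The 10%-of-baseline rule described above can be sketched as a simple classifier. This is a minimal illustration under stated assumptions, not the preprint's analysis pipeline. Here "total power" is approximated as the mean squared amplitude of a recording window, whereas the preprint may use a spectral measure. The `noise_floor` parameter is a hypothetical addition that shows why the noise-floor concern matters: if 10% of baseline power sits at or below the power of a control recording, the rule cannot separate silence from noise.

```python
def total_power(samples: list[float]) -> float:
    """Mean squared amplitude of a recording window.

    Assumption: a simplified stand-in for the preprint's 'total power',
    which may be computed spectrally."""
    if not samples:
        raise ValueError("empty recording window")
    return sum(x * x for x in samples) / len(samples)


def classify_insensibility(baseline: list[float], post: list[float],
                           noise_floor: float = 0.0,
                           fraction: float = 0.10) -> str:
    """Apply a 'post-treatment power below `fraction` of baseline' rule.

    `noise_floor` is hypothetical: the power of a control recording that
    contains no neural signal. If the threshold falls at or below it, the
    rule cannot distinguish genuine silence from noise."""
    threshold = fraction * total_power(baseline)
    if threshold <= noise_floor:
        return "indeterminate"
    return "insensible" if total_power(post) < threshold else "not insensible"


baseline = [1.0, -1.0, 1.0, -1.0]  # hypothetical pre-treatment window, power 1.0
quiet = [0.2, -0.2, 0.2, -0.2]     # power 0.04, below 10% of baseline
active = [0.5, -0.5, 0.5, -0.5]    # power 0.25, above the threshold
print(classify_insensibility(baseline, quiet))                    # insensible
print(classify_insensibility(baseline, active))                   # not insensible
print(classify_insensibility(baseline, quiet, noise_floor=0.15))  # indeterminate
```

The same "quiet" recording is classified as insensible or indeterminate depending only on an assumed noise floor. This is why a control signal matters for interpreting such thresholds.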

    12. In conclusion, evidence for electrical stunning is extremely limited and we shouldn't feel comfortable recommending anything with confidence.

      GPT5.5-extra: Steelman from the current sources: the evidence does not seem to imply "electrical stunning is bad" or "ice slurry is fine". It implies parameter sensitivity.

      The 2026 Somerville et al. preprint reports that lower-voltage/shorter electrical shock followed by cold shock sometimes left substantial neural activity or behavioral response, while higher-voltage/longer shock plus cold shock confirmed insensibility in the no-response group; ice slurry below 0 °C alone also reduced time to low neural activity. SWP's 2026 review interprets the wider decapod evidence as electrical stunning having greater potential than chilling to induce insensibility, while saying species-specific electrical parameters and operational standards remain uncertain.

      Sources: https://papers.ssrn.com/sol3/papers.cfm?abstract_id=6315379 and https://www.shrimpwelfareproject.org/humane-stunning-white-paper

    13. We have mixed evidence on whether transitioning egg producers to cage-free improves welfare overall.

      GPT5.5-extra: I would narrow this. The record supports "mortality evidence during cage-free transitions is mixed and confounded"; it is less clear that evidence on overall welfare is mixed in the same way.

      Schuck-Paim et al. 2021 cover 6,040 flocks and 176M hens and find cage-free aviary mortality falls as producers gain experience, with no recent mortality difference among indoor systems. WFI's welfare model estimates lower cumulative pain in cage-free aviaries than cages, while noting important research gaps. The comment thread also matters: Cynthia Schuck-Paim argues the CSES study is a weak basis for mortality/welfare conclusions because the aviary was poorly managed and not a fair mature-system comparison.

      Sources: https://www.nature.com/articles/s41598-021-81868-3 and https://welfarefootprint.org/laying-hens/

    14. We have evidence that the substitution effect of alternative proteins is weak, at best.

      GPT5.5-extra: I would soften/generalize this claim. The strongest public evidence I found supports: current plant-based meat and many meat-reduction interventions have weak or uncertain substitution effects.

      Peacock's RP report explicitly focuses on plant-based meats because cultivated meats are not widely available; it is not a direct test of future cultivated meat or precision-fermentation products. Green, Smith & Mathur 2025 meta-analyze RCTs of behavior-change interventions and find a very small pooled effect; they also say many promising interventions await rigorous evaluation. That is evidence of a gap and weak current intervention effects, but not yet evidence that all alternative proteins, once genuinely cheaper/tastier/more convenient, would have weak substitution.

      Sources: https://rethinkpriorities.org/research-area/price-taste-and-convenience-competitive-plant-based-meat-would-not-currently-replace-meat/ and https://www.sciencedirect.com/science/article/pii/S0195666325003861

  3. Apr 2026
    1. Perhaps some kind of “third-party evaluation and audit body” is needed to assess how Skills use data, detect potential security risks, and so on.

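      This proposal shows how serious the security problems around AI Skills are and how inadequate existing evaluation systems are. It suggests that third-party evaluators dedicated to AI capabilities may emerge, which could be a key innovation for resolving the trust problem.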

    2. we studied emotion-related representations in Claude Sonnet 4.5, a frontier LLM at the time of our investigation.

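      [Insight] The paper studies only one model, Claude Sonnet 4.5, but its methodology applies to large models generally. This suggests an urgent research agenda: would a side-by-side comparison of emotion vectors across different architectures (GPT, Gemini, Qwen, DeepSeek) reveal systematic emotional biases, for example some models being inherently more “anxious” and others more “indifferent”? This is not only an academic question but also a practical need for model selection and safety evaluation.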

  4. Sep 2024
  5. May 2024
  6. Nov 2023
    1. Essentially I subjected myself to the conduct of a Feelings Audit. The items I recollected and then retained were those which sat within the boundaries of the research questions and each constituent component of the collective definition of religiosity I had applied in the study. I treated them as a list of items with potential for my bias which might impinge upon the research areas and which needed 'tying down' (Lukiv, 2004, p.1). Some of these items provoked acute and poignant emotions which were also themselves recorded in the audit. The Feelings Audit reflected the stark reality of what makes phenomenological inquiry authentic: with all of my personal dispositions and values, as the researcher, I was at the centre of the interpretative process.
      • for: epoche - feeling audit
  7. Apr 2023
  8. Mar 2022
    1. The audit found that the CIO has limited insight into each Sector’s entire data holdings given a decentralized model, and lack of centralized guidance, standard definitions, and corporate data management system. CMSS representatives acknowledged that the NRCan Data Inventory is not a complete listing of NRCan datasets; however, it was found that it serves as a good starting point in identifying datasets held within the Department. However, per TBS guidance, a complete departmental inventory should include a list of all datasets even if they are identified as not eligible for release.
  9. Jan 2022
  10. Dec 2021
      Reasonable estimates for 2020 are: E (WAN) = 110 TWh, E (FAN) = 130 TWh, and E (RAN) = 100 TWh; EI (WAN) = 0.02 kWh/GB, EI (FAN) = 0.07 kWh/GB, and EI (RAN) = 0.2 kWh/GB.

      These numbers correspond to:

      • EI (WAN) = 0.02 kWh/GB: core networks, such as internet backbones
      • EI (FAN) = 0.07 kWh/GB: fixed access networks (fibre and DSL), including Wi-Fi routers in homes and offices
      • EI (RAN) = 0.2 kWh/GB: radio access (cellular) networks, such as 4G and 5G
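      Here is a rough sketch of how these intensity figures could be used to estimate the energy of a single data transfer. It assumes that energy scales linearly with data volume. It also assumes that a transfer's intensity is the sum of the segments it crosses (for example, backbone plus cellular access). Both are simplifications, since much network energy is fixed baseline load. The function name and segment keys are illustrative only.

```python
# 2020 energy-intensity estimates quoted above, in kWh per GB.
# Assumption: energy scales linearly with data volume (a simplification).
ENERGY_INTENSITY_KWH_PER_GB = {
    "WAN": 0.02,  # core networks, e.g. internet backbones
    "FAN": 0.07,  # fixed access: fibre/DSL incl. home/office Wi-Fi routers
    "RAN": 0.2,   # radio access: cellular networks such as 4G/5G
}


def transfer_energy_kwh(gigabytes: float, segments: list[str]) -> float:
    """Estimate energy (kWh) for moving `gigabytes` of data across the
    given network segments by summing each segment's intensity."""
    if gigabytes < 0:
        raise ValueError("data volume must be non-negative")
    return gigabytes * sum(ENERGY_INTENSITY_KWH_PER_GB[s] for s in segments)


# Hypothetical example: a 2 GB download via backbone + cellular access,
# compared with the same download via backbone + fixed broadband.
cellular = transfer_energy_kwh(2.0, ["WAN", "RAN"])
fixed = transfer_energy_kwh(2.0, ["WAN", "FAN"])
print(f"cellular: {cellular:.2f} kWh, fixed: {fixed:.2f} kWh")
```

Under these assumptions, the cellular path dominates because its access-network intensity is roughly three times that of fixed access.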


  11. Nov 2021
  12. Sep 2021
  13. May 2021
    1. public good

      Additional Points for Accountability to the Civil Society

      1. A key map of the standard operating (interlinked) procedures, time, and costs, including advocate/court fees and miscellaneous (interlinked) charges, made available to litigants well before any case is filed
      2. A citizen-driven, third-party audit mechanism covering the entire system to ensure proper accountability
      3. A court rating system in which litigants and civil society rate the courts
      4. A pull-down menu listing all interlinked Acts, laws, related precedents, judgements, etc., to help litigants file applications on their own
      5. An online case-tracking system for litigants showing time, result, cost, etc.
  14. Jan 2021
  15. Sep 2020
  16. Aug 2020
  17. Jul 2020
  18. Apr 2020
  19. Aug 2019
  20. Feb 2019
    1. As with neoliberalism more generally, New Public Management is invisible, part of a new “common sense” that has somehow become hegemonic, whereby the “entrepreneurial spirit” has infused the public sector, leading to “businesslike government”. As with the claims of neoliberalism more generally as to its positive outputs in terms of prosperity, NPM has never been shown to have been successful even in its own terms. NPM “introduced punishments and rewards to produce better services with lesser staff. Instead of having freed energies and creativity of employees formerly shackled by their bureaucratic turfs, NPM reforms have bound energies into theatrical audit performances at the cost of work and killed creativity in centralizing resources and hollowing out professional autonomy... Fundamental deprivation of the legitimacy of public employees . . . has traumatized many most-committed employees and driven others toward a Soviet-type double standard.” (Juha Siltala, New Public Management: The evidence-based worst practice?, Administration, Vol. 45, No. 4, 2013, pp. 468-493) Sekera quotes Christopher Pollitt et al., who “after compiling a database of 518 studies of NPM in Europe, determined that ‘more than 90% of what are seen by experts as the most significant and relevant studies contain no data at all on outcomes’ and that of the 10% that had outcomes information, only 44% of those, or 4% of the total, found any improvements in terms of outcomes.” But in the end, the point of NPM is less that of measurable outcomes, and more that of the ideological victory of turning the public and its good into customers exercising their “choices” (see tax revolt example in Duggan), along of course with the radical disempowering of public administration workers and their unions, instituting “cost savings” by cutting their real income and putting more and more of the public sector’s production directly into the profit-making market.
    2. “Public performance measurement systems often have unfortunate or disastrous unintended consequences. Most recently, a pay-for-performance scheme at the Veterans Health Administration (V.A.) led to falsified wait-time records and care so delayed that, in some cases, patients died awaiting medical attention. Twenty-five years of studies have shown that ‘pay-for-performance’ doesn’t work in either the public or private sector: such systems smother creativity, crowd out intrinsic motivation and invite gaming and generally fail to achieve intended results.”
  21. Nov 2018
  22. Oct 2018
  23. Jan 2018
  24. Nov 2014
    1. This criterion requires that an independent security review has been performed within the 12 months prior to evaluation. This review must cover both the design and the implementation of the app and must be performed by a named auditing party that is independent of the tool's main development team. Audits by an independent security team within a large organization are sufficient. Recognizing that unpublished audits can be valuable, we do not require that the results of the audit have been made public, only that a named party is willing to verify that the audit took place.
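      The criterion above is essentially a checklist, and a minimal sketch can make each condition explicit. The `SecurityAudit` record and its field names are hypothetical. "Within 12 months" is interpreted here as on or after the same calendar date one year before the evaluation, with Feb 29 falling back to Feb 28. The original text does not specify the date arithmetic.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class SecurityAudit:
    """Hypothetical record of one audit; field names are illustrative."""
    auditor: str                   # named auditing party ("" if unnamed)
    completed: date
    covers_design: bool
    covers_implementation: bool
    independent_of_dev_team: bool  # a separate internal security team counts
    auditor_will_confirm: bool     # results need not be public, only confirmable


def one_year_before(d: date) -> date:
    """Same calendar day one year earlier; Feb 29 falls back to Feb 28."""
    try:
        return d.replace(year=d.year - 1)
    except ValueError:
        return d.replace(year=d.year - 1, day=28)


def meets_audit_criterion(audit: SecurityAudit, evaluated_on: date) -> bool:
    """Check every condition of the criterion quoted above."""
    recent = one_year_before(evaluated_on) <= audit.completed <= evaluated_on
    return (
        recent
        and audit.covers_design
        and audit.covers_implementation
        and bool(audit.auditor.strip())
        and audit.independent_of_dev_team
        and audit.auditor_will_confirm
    )


audit = SecurityAudit("Example Security Ltd", date(2014, 3, 1),  # hypothetical
                      True, True, True, True)
print(meets_audit_criterion(audit, date(2014, 11, 1)))  # True
print(meets_audit_criterion(audit, date(2015, 6, 1)))   # False: older than 12 months
```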