25 Matching Annotations
  1. Apr 2026
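    1. Perhaps some kind of "third-party evaluation and auditing body" is needed to assess how Skills use data, detect potential security risks, and so on.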

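      This proposal highlights how serious AI skill security problems are and how inadequate existing evaluation systems remain. It suggests that third-party bodies dedicated to assessing AI capabilities may emerge, which could be a key innovation for solving the trust problem.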

    1. we studied emotion-related representations in Claude Sonnet 4.5, a frontier LLM at the time of our investigation.

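      [Insight] This paper studies only one model, Claude Sonnet 4.5, but its methodology applies to all large models. That suggests an urgent research agenda: compare emotion vectors across different architectures (GPT, Gemini, Qwen, DeepSeek). Would such a comparison reveal systematic emotional biases, for example some models being inherently more "anxious" and others more "indifferent"? This is not only an academic question but also a practical need for product selection and safety evaluation.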

  2. Sep 2024
  3. May 2024
  4. Nov 2023
    1. Essentially I subjected myself to the conduct of a Feelings Audit. The items I recollected and then retained were those which sat within the boundaries of the research questions and each constituent component of the collective definition of religiosity I had applied in the study. I treated them as a list of items with potential for my bias which might impinge upon the research areas and which needed 'tying down' (Lukiv, 2004, p.1). Some of these items provoked acute and poignant emotions which were also themselves recorded in the audit. The Feelings Audit reflected the stark reality of what makes phenomenological inquiry authentic: with all of my personal dispositions and values, as the researcher, I was at the centre of the interpretative process.
      • for: epoche - feeling audit
  5. Apr 2023
  6. Mar 2022
    1. The audit found that the CIO has limited insight into each Sector’s entire data holdings given a decentralized model, and lack of centralized guidance, standard definitions, and corporate data management system. CMSS representatives acknowledged that the NRCan Data Inventory is not a complete listing of NRCan datasets; however, it was found that it serves as a good starting point in identifying datasets held within the Department. However, per TBS guidance, a complete departmental inventory should include a list of all datasets even if they are identified as not eligible for release.
  7. Jan 2022
  8. Dec 2021
    1. Reasonable estimates for 2020 are: E (WAN) = 110 TWh, E (FAN) = 130 TWh, and E (RAN) = 100 TWh; EI (WAN) = 0.02 kWh/GB, EI (FAN) = 0.07 kWh/GB, and EI (RAN) = 0.2 kWh/GB.

      In plain terms, the energy intensity (EI) figures are:

      • EI (WAN) = 0.02 kWh/GB - wide area (core) networks such as internet backbones
      • EI (FAN) = 0.07 kWh/GB - fixed access networks: fibre and DSL, including the Wi-Fi routers in the home or office
      • EI (RAN) = 0.2 kWh/GB - radio access networks: cellular networks such as 4G and 5G
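
      As a rough illustration of how these intensities can be applied, the sketch below multiplies each EI figure by a data volume and sums over the network segments a transfer crosses. The 10 GB volume and the two paths (core plus fixed access, core plus radio access) are assumptions chosen for illustration, not figures from the source, and `transfer_energy_kwh` is a hypothetical helper name.

```python
# Energy intensity (EI) figures quoted above, in kWh per GB (2020 estimates).
ENERGY_INTENSITY_KWH_PER_GB = {
    "WAN": 0.02,  # core networks
    "FAN": 0.07,  # fixed access (fibre, DSL, home/office Wi-Fi routers)
    "RAN": 0.2,   # cellular access (4G, 5G)
}


def transfer_energy_kwh(gigabytes, segments):
    """Sum EI * data volume over the network segments the data crosses."""
    return sum(ENERGY_INTENSITY_KWH_PER_GB[s] * gigabytes for s in segments)


# Hypothetical 10 GB transfer over two assumed paths.
fixed_kwh = transfer_energy_kwh(10, ["WAN", "FAN"])
mobile_kwh = transfer_energy_kwh(10, ["WAN", "RAN"])

print(f"Fixed path:  {fixed_kwh:.2f} kWh")
print(f"Mobile path: {mobile_kwh:.2f} kWh")
```

      Under these assumptions the mobile path comes out at roughly two and a half times the energy of the fixed path for the same volume, driven almost entirely by the higher RAN intensity.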

  9. Nov 2021
  10. Sep 2021
  11. May 2021
    1. public good

      Additional Points for Accountability to the Civil Society

      1. A key map of the (interlinked) standard operating procedures, with time and cost, including advocate/court fees and miscellaneous (interlinked) charges, made available to litigants well before any case is lodged.
      2. A citizen-driven, third-party audit mechanism covering the entire system, to build in proper accountability.
      3. A court rating system in which litigants and civil society are asked to rate the courts.
      4. A pull-down menu of all interlinked acts, laws, related precedents, judgements, etc., to help litigants file applications on their own.
      5. An online case tracking system for litigants showing time, result, cost, etc.
  12. Jan 2021
  13. Sep 2020
  14. Aug 2020
  15. Jul 2020
  16. Apr 2020
  17. Aug 2019
  18. Feb 2019
    1. As with neoliberalism more generally, New Public Management is invisible, part of a new “common sense” that has somehow become hegemonic, whereby the “entrepreneurial spirit” has infused the public sector, leading to “businesslike government”. As with the claims of neoliberalism more generally as to its positive outputs in terms of prosperity, NPM has never been shown to have been successful even in its own terms. NPM “introduced punishments and rewards to produce better services with lesser staff. Instead of having freed energies and creativity of employees formerly shackled by their bureaucratic turfs, NPM reforms have bound energies into theatrical audit performances at the cost of work and killed creativity in centralizing resources and hollowing out professional autonomy... Fundamental deprivation of the legitimacy of public employees ... has traumatized many most-committed employees and driven others toward a Soviet-type double standard.” (Juha Siltala, New Public Management: The evidence-based worst practice?, Administration; Vol. 45, No. 4; 2013, pp. 468-493) Sekera quotes Christopher Pollitt et al., who “after compiling a database of 518 studies of NPM in Europe, determined that “more than 90% of what are seen by experts as the most significant and relevant studies contain no data at all on outcomes” and that of the 10% that had outcomes information, only 44% of those, or 4% of the total, found any improvements in terms of outcomes.” But in the end, the point of NPM is less that of measurable outcomes, and more that of the ideological victory of turning the public and its good into customers exercising their “choices” (see tax revolt example in Duggan), along, of course, with the radical disempowering of public administration workers and their unions, instituting “cost savings” by cutting their real income and putting more and more of the public sector’s production directly into the profit-making market.
    2. “Public performance measurement systems often have unfortunate or disastrous unintended consequences. Most recently, a pay-for-performance scheme at the Veterans Health Administration (V.A.) led to falsified wait-time records and care so delayed that, in some cases, patients died awaiting medical attention. Twenty-five years of studies have shown that “pay-for-performance” doesn’t work in either the public or private sector: such systems smother creativity, crowd out intrinsic motivation and invite gaming and generally fail to achieve intended results.”
  19. Nov 2018
  20. Oct 2018
  21. Jan 2018
  22. Nov 2014
    1. This criterion requires that an independent security review has been performed within the 12 months prior to evaluation. This review must cover both the design and the implementation of the app and must be performed by a named auditing party that is independent of the tool's main development team. Audits by an independent security team within a large organization are sufficient. Recognizing that unpublished audits can be valuable, we do not require that the results of the audit have been made public, only that a named party is willing to verify that the audit took place.