Hypothesis

7 Matching Annotations

Jun 2026
www.oversightboard.com

https://www.oversightboard.com/news/non-public-figures-need-more-protection-over-sexualized-deepfakes

1
1. fxp007 26 Jun 2026
  
  in Public
  
  Meta said that when the content was flagged, the company had no indication that the individual depicted in the video was 'a real person' because they did not report the content.
  
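  Most people assume platforms should rely on victim reports to confirm whether content is authentic, but the author questions this practice, implying that platforms have a responsibility to proactively identify AI-generated sexualized content even when no victim has reported it. This view challenges the prevailing understanding of where platform responsibility ends and asks platforms to take on more preventive responsibility.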
  
  non-consensus platform-responsibility ai-detection

Tags

platform-responsibility

non-consensus

ai-detection

Annotators

fxp007

URL

oversightboard.com/news/non-public-figures-need-more-protection-over-sexualized-deepfakes
May 2026
www.anthropic.com

Natural Language Autoencoders

1
1. fxp007 15 May 2026
  
  in Public
  
  An auditor equipped with NLAs successfully uncovered the target model's hidden motivation between 12% and 15% of the time, even without access to the training data that implanted it. Without NLAs, the auditor won less than 3% of the time, even when provided other interpretability tools.
  
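  NLAs let auditors extract hidden motivations directly from the AI's thinking without relying on training data, which greatly improves the efficiency of AI alignment audits and offers a new way to uncover a model's internal biases.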
  
  AI auditing misalignment detection

Tags

AI auditing

misalignment detection

Annotators

fxp007

URL

anthropic.com/research/natural-language-autoencoders
Apr 2026
www.adriankrebs.ch

https://www.adriankrebs.ch/blog/design-slop/

1
1. fxp007 30 Apr 2026
  
  in Public
  
  This ultimately also leads to false positives, but my manual QA run verified it's maybe 5-10%.
  
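  Most people assume AI detection systems should aim for zero errors, but the author accepts a 5-10% false-positive rate, which challenges perfectionist standards for technical detection. This pragmatic stance suggests that AI identification requires a trade-off between accuracy and practicality rather than a blind pursuit of perfection.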
  
  counterintuitive ai-detection error-tolerance

Tags

counterintuitive

error-tolerance

ai-detection

Annotators

fxp007

URL

adriankrebs.ch/blog/design-slop/
github.com

https://github.com/fxp/aegis-core

1
1. fxp007 17 Apr 2026
  
  in Public
  
  Real-time monitoring of agent actions with a 12-category anomaly detection system derived from frontier model safety evaluations. Three-level alert system: PROHIBITED (immediate block), HIGH_RISK_DUAL_USE (human review), DUAL_USE (log and track).
  
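  This three-level alert system shows how fine-grained AI safety monitoring has become: agent actions are sorted into risk levels ranging from outright prohibition to log-and-track only. The classification reflects the complexity of the 'dual-use' challenge in AI safety, where the same technology can serve both defense and attack.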
  
  anomaly-detection risk-assessment ai-safety
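
  A minimal sketch of how the three alert levels quoted above might map to responses. Only the level names and their described responses come from the quoted text; the enum, the mapping, and the `respond` function are hypothetical names chosen for illustration and are not taken from aegis-core.

```python
from enum import Enum


class AlertLevel(Enum):
    # Level names as they appear in the quoted passage.
    PROHIBITED = "PROHIBITED"
    HIGH_RISK_DUAL_USE = "HIGH_RISK_DUAL_USE"
    DUAL_USE = "DUAL_USE"


# Hypothetical response labels summarizing the quoted descriptions:
# immediate block, human review, log and track.
RESPONSES = {
    AlertLevel.PROHIBITED: "block",
    AlertLevel.HIGH_RISK_DUAL_USE: "human_review",
    AlertLevel.DUAL_USE: "log_and_track",
}


def respond(level: AlertLevel) -> str:
    """Return the response label for an alert level (illustrative only)."""
    return RESPONSES[level]


if __name__ == "__main__":
    for level in AlertLevel:
        print(level.value, "->", respond(level))
```

  How an action gets assigned to one of the 12 anomaly categories, and from there to a level, is not described in the quoted text and is left out of this sketch.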

Tags

risk-assessment

anomaly-detection

ai-safety

Annotators

fxp007

URL

github.com/fxp/aegis-core
www.anthropic.com

Project Glasswing: Securing critical software for the AI era

1
1. fxp007 16 Apr 2026
  
  in Public
  
  It also discovered a 16-year-old vulnerability in FFmpeg—which is used by innumerable pieces of software to encode and decode video—in a line of code that automated testing tools had hit five million times without ever catching the problem.
  
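  Surprisingly, Claude Mythos Preview discovered a 16-year-old vulnerability in FFmpeg that had gone undetected even though automated testing tools had executed that line of code 5 million times. This shows that AI brings a distinctive insight to code analysis that traditional automated tools cannot match.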
  
  surprising code-analysis ai-detection

Tags

code-analysis

ai-detection

surprising

Annotators

fxp007

URL

anthropic.com/glasswing
access.infobase.com

Article: Why it’s so hard to tell if a piece of text was written by AI – even for AI

1
1. lurquijo1 13 Apr 2026
  
  in Public
  
  For example, people who themselves use AI writing tools heavily have been shown to accurately detect AI-written text. A panel of human evaluators can even outperform automated tools in a controlled setting
  
  This statement alone is very interesting to me because, in my opinion, AI can be a great tool for learning, but at the same time it can hinder our ability to learn.
  
  AI detection.

Tags

AI detection.

Annotators

lurquijo1

URL

access.infobase.com/article/12924616-why-its-so-hard-to-tell-if-piece-text-was-written-by-ai-even-for-ai
Oct 2020
www.coe.int

AI and control of Covid-19 coronavirus

1
1. ErikStuchly 15 Oct 2020
  
  in BehSci
  
  AI and control of Covid-19 coronavirus. (n.d.). Artificial Intelligence. Retrieved October 15, 2020, from https://www.coe.int/en/web/artificial-intelligence/ai-and-control-of-covid-19-coronavirus
  
  is:webpage lang:en COVID-19 overview AI crisis management technology media data science public health response risk mitigation detection mobility big data forecast

Tags

lang:en

is:webpage

media

mobility

public health response

crisis management

detection

technology

overview

risk mitigation

data science

forecast

COVID-19

big data

AI

Annotators

ErikStuchly

URL

coe.int/en/web/artificial-intelligence/ai-and-control-of-covid-19-coronavirus