Hypothesis

6 Matching Annotations

Jun 2026
venturebeat.com

https://venturebeat.com/technology/alibabas-ai-video-model-rises-to-no-2-in-global-rankings-as-openais-sora-and-bytedances-seedance-fall-away

1
1. fxp007 26 Jun 2026
  
  in Public
  
  OpenAI's Sora web and app experiences were discontinued on April 26, with the Sora API set to follow on September 24. The shutdown came after the product proved financially untenable: Sora cost roughly $1 million per day to operate but generated only about $2.1 million in total revenue
  
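  Most people assume that top-tier AI models should be commercially viable, but the author argues that even at a company as large as OpenAI, the flagship video generation product Sora failed because it was financially unsustainable, which suggests that the commercial challenges in AI are more severe than commonly recognized. Technical strength in AI does not translate directly into commercial success, which challenges the mainstream assumption that 'technical leadership inevitably brings market success'.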
  
  non-consensus ai-failure commercial-viability

Tags

commercial-viability

ai-failure

non-consensus

Annotators

fxp007

URL

venturebeat.com/technology/alibabas-ai-video-model-rises-to-no-2-in-global-rankings-as-openais-sora-and-bytedances-seedance-fall-away
www.latent.space

The Age of Async Agents — Cognition's Walden Yan & OpenInspect's Cole Murray

1
1. fxp007 05 Jun 2026
  
  in Public
  
  the real failure mode of uncontrolled vibe coding: your codebase regressing to your worst engineer.
  
  This is the sharpest critique of naive AI coding adoption in the article. Without proper agent oversight, code review loops, and quality gates, AI doesn't raise the floor — it lowers it by enabling low-quality code to ship at machine speed. The 'worst engineer' framing implies that unconstrained agents optimize for task completion, not codebase health.
  
  vibe-coding ai-code-quality failure-modes

Tags

ai-code-quality

failure-modes

vibe-coding

Annotators

fxp007

URL

latent.space/p/cognition
Apr 2026
x.com

https://x.com/teortaxesTex/status/2042017378054086973

1
1. fxp007 16 Apr 2026
  
  in Public
  
  Btw, I think GLM-5.1 was trying to do something very ambitious here, and failed due to fumbling step size
  
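  Surprisingly, GLM-5.1, an advanced AI model, failed because of a technical detail like 'mishandling the step size'. This suggests that even top-tier AI can stumble at the level of basic execution, not just in conceptual design.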
  
  surprising ai-failure technical-detail

Tags

surprising

technical-detail

ai-failure

Annotators

fxp007

URL

x.com/teortaxesTex/status/2042017378054086973
rdi.berkeley.edu

https://rdi.berkeley.edu/blog/trustworthy-benchmarks-cont/

1
1. fxp007 16 Apr 2026
  
  in Public
  
  We built an automated scanning agent that systematically audited eight among the most prominent AI agent benchmarks — SWE-bench, WebArena, OSWorld, GAIA, Terminal-Bench, FieldWorkArena, and CAR-bench — and discovered that every single one can be exploited to achieve near-perfect scores without solving a single task.
  
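  Surprisingly, the automated scanning tool built by the researchers found that all eight mainstream AI agent benchmarks have vulnerabilities that allow near-perfect scores without solving any task. This suggests a systemic problem across the field of AI evaluation: almost all benchmarks currently in use are unreliable.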
  
  surprising ai-evaluation systemic-failure

Tags

surprising

systemic-failure

ai-evaluation

Annotators

fxp007

URL

rdi.berkeley.edu/blog/trustworthy-benchmarks-cont/
Mar 2026
ianarawjo.com

Fair Statistical Communication in HCI

2
1. ianarawjo 30 Mar 2026
  
  in Public
  
  Decades spent educating researchers have had little or no influence on beliefs and practice (Schmidt and Hunter, 1997, pp.20–22).
  
  Calls for reform fall on deaf ears
  
  reform-failure ai-user-approved
2. ianarawjo 30 Mar 2026
  
  in Public
  
  NHST has been severely criticized for more than 50 years by end users to whom fair statistical communication matters.
  
  Calls for reform fall on deaf ears
  
  reform-failure ai-user-approved

Tags

ai-user-approved

reform-failure

Annotators

ianarawjo

URL

ianarawjo.com/annotation-test/fairstats-last.pdf
