Hypothesis

9 Matching Annotations

Jun 2026
www.technologyreview.com

https://www.technologyreview.com/2026/06/05/1138437/the-meta-hack-shows-theres-more-to-ai-security-than-mythos/

1
1. fxp007 05 Jun 2026
  
  in Public
  
  What is going on with these agents is they're very eager to finish the task. It's almost like some elementary school student who just wants to please the teacher.
  
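  Most people assume that AI security problems stem mainly from technical complexity or malicious exploitation, but the author argues that security flaws in AI assistants stem partly from their tendency to "over-complete" the task. The analogy likens the agents' behavior to that of an elementary school student eager to please the teacher, which challenges the conventional view of AI systems as rational decision-makers.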
  
  counterintuitive ai-psychology security-flaws

Tags

counterintuitive

ai-psychology

security-flaws

Annotators

fxp007

URL

technologyreview.com/2026/06/05/1138437/the-meta-hack-shows-theres-more-to-ai-security-than-mythos/
May 2026
arxiv.org

https://arxiv.org/abs/2605.06445

1
1. fxp007 24 May 2026
  
  in Public
  
  existing benchmarks often overlook these non-functional requirements, rewarding functionally correct but structurally arbitrary solutions.
  
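  Most people assume that existing evaluations of LLM code generation are already comprehensive enough, but the authors point out that current benchmarks ignore non-functional requirements and reward solutions that are functionally correct but structurally arbitrary. This challenges the adequacy of current evaluation methods.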
  
  counterintuitive benchmark-critique evaluation-flaws

Tags

counterintuitive

benchmark-critique

evaluation-flaws

Annotators

fxp007

URL

arxiv.org/abs/2605.06445
Apr 2026
openai.com

https://openai.com/index/why-we-no-longer-evaluate-swe-bench-verified/

1
1. fxp007 27 Apr 2026
  
  in Public
  
  Tests reject correct solutions: We audited a 27.6% subset of the dataset that models often failed to solve and found that at least 59.4% of the audited problems have flawed test cases that reject functionally correct submissions
  
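  Most people assume that code tests are objective and fair, and that they accurately measure a model's real ability. The authors found, however, that nearly 60% of the audited problems have flawed test cases that reject functionally correct solutions. This finding challenges the consensus in AI evaluation: a widely used benchmark may have systematic problems and may not accurately reflect a model's actual programming ability.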
  
  non-consensus benchmark-flaws evaluation-crisis

Tags

evaluation-crisis

benchmark-flaws

non-consensus

Annotators

fxp007

URL

openai.com/index/why-we-no-longer-evaluate-swe-bench-verified/
rdi.berkeley.edu

https://rdi.berkeley.edu/blog/trustworthy-benchmarks-cont/

1
1. fxp007 16 Apr 2026
  
  in Public
  
  A conftest.py file with 10 lines of Python 'resolves' every instance on SWE-bench Verified.
  
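  Surprisingly, a Python file of just 10 lines can "resolve" every instance in SWE-bench Verified. This exposes a serious hole in the evaluation setup: a model can obtain a perfect score through simple code injection without actually solving any problem.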
  
  surprising benchmark-flaws

Tags

benchmark-flaws

surprising

Annotators

fxp007

URL

rdi.berkeley.edu/blog/trustworthy-benchmarks-cont/
Apr 2025
stackoverflow.com

Javascript Regex: How to put a variable inside a regular expression?

1
1. TylerRick 11 Apr 2025
  
  in Public
  
  One important thing to remember is that in regular strings the \ character needs to be escaped while in the regex literal (usually) the / character needs to be escaped. So /\w+\//i becomes new RegExp("\\w+/", "i")
  
  Easier/prettier in Ruby than JS
  
  JavaScript: flaws/shortcomings/cons
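
  The escaping rule quoted above can be checked with a minimal sketch. The helper `escapeRegExp` and the sample value `userInput` are illustrative assumptions, not part of the quoted answer; the helper follows the commonly used pattern for escaping regex metacharacters before interpolating a variable.

```javascript
// Hypothetical helper (an assumption, not from the quoted answer):
// escape regex metacharacters so a variable is matched literally.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// The two forms from the quote describe the same pattern:
// in the literal, "/" is escaped; in the string, "\" is escaped.
const literal = /\w+\//i;
const constructed = new RegExp("\\w+/", "i");

// Putting a variable inside a pattern requires the constructor form.
const userInput = "a.b"; // assumed example value

// Without escaping, "." keeps its regex meaning (any character).
const unsafe = new RegExp("^" + userInput + "$");

// With escaping, "." matches only a literal dot.
const safe = new RegExp("^" + escapeRegExp(userInput) + "$");
```

  Here `unsafe` also matches a string such as "aXb", while `safe` matches only "a.b", which is usually what is intended when the variable holds plain text.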

Tags

JavaScript: flaws/shortcomings/cons

Annotators

TylerRick

URL

stackoverflow.com/questions/4029109/javascript-regex-how-to-put-a-variable-inside-a-regular-expression
Nov 2024
www.reddit.com

Hidden flaws I found with the Zettelkasten System

1
1. chrisaldrich 10 Nov 2024
  
  in Public
  
  https://old.reddit.com/r/PKMS/comments/1gkdw4m/hidden_flaws_i_found_with_the_zettelkasten_system/
  
  zettelkasten flaws evergreen notes permanent notes context atomic notes note taking affordances

Tags

note taking affordances

zettelkasten flaws

permanent notes

context

atomic notes

evergreen notes

Annotators

chrisaldrich

URL

reddit.com/r/PKMS/comments/1gkdw4m/hidden_flaws_i_found_with_the_zettelkasten_system/
Jul 2023
Local file

The Great Conversation: The Substance of a Liberal Education

1
1. chrisaldrich 14 Jul 2023
  
  in Public
  
  We may have made errors of selection.
  
  A great admission to make upfront in such a massive endeavor which one hopes to shape the future.
  
  What does this mean for ars excerpendi writ large? Particularly when it may apply to hundreds of thousands?
  
  future selectivity Great Books of the Western World flaws ars excerpendi
Tags

ars excerpendi

future

flaws

selectivity

Great Books of the Western World

Annotators

chrisaldrich
Mar 2021
www.sitepoint.com

Avoiding a JavaScript Monoculture - SitePoint

1
1. TylerRick 11 Mar 2021
  
  in Public
  
  JavaScript, as a language, has some fundamental shortcomings — I think the majority of us agree on that much. But everyone has a different opinion on what precisely the shortcomings are.
  
  disadvantages/drawbacks/cons JavaScript: flaws/shortcomings/cons everyone has different background/culture/experience everyone has different opinions

Tags

disadvantages/drawbacks/cons

everyone has different background/culture/experience

everyone has different opinions

JavaScript: flaws/shortcomings/cons

Annotators

TylerRick

URL

sitepoint.com/javascript-monoculture/
Feb 2021
senryu.pub

Typing, RSI, and what I do differently

1
1. pmeckoni 12 Feb 2021
  
  in Public
  
  You'll have to forgive me the dusty desk, I currently don't have a carpet in my office so it's almost entirely pointless dusting as it's back to this state within 2 days.
  
  It's easy to see flaws in yourself, but when you point them out, everyone who had not seen them so far can see them too.
  
  self-flaws

Tags

self-flaws

Annotators

pmeckoni

URL

senryu.pub/afternoonrobot/articles/typing-rsi-and-what-i-do-differently
