A conftest.py file with 10 lines of Python 'resolves' every instance on SWE-bench Verified.
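Surprisingly, this one small file is enough to resolve every Verified instance in the benchmark. That exposes a serious vulnerability in AI evaluation systems: a model can reach a perfect score through simple code injection without actually solving any of the problems.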
One important thing to remember is that in ordinary string literals the \ character needs to be escaped, whereas in a regex literal the / character (usually) needs to be escaped. So /\w+\//i becomes new RegExp("\\w+/", "i")
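A minimal sketch comparing the two spellings, assuming a modern JavaScript engine such as Node.js. The sample strings and the `broken` variant are hypothetical, added only to show what goes wrong when the backslash is not doubled inside the string.

```javascript
// Regex literal: "/" delimits the literal, so a literal slash is written "\/".
const literal = /\w+\//i;

// RegExp constructor: the pattern is an ordinary string literal, so every
// backslash meant for the regex engine is doubled ("\\w"); "/" needs no escape.
const constructed = new RegExp("\\w+/", "i");

// Pitfall (hypothetical example): with a single backslash, the string escape
// "\w" collapses to plain "w", so this matches runs of the letter w, not word chars.
const broken = new RegExp("\w+/", "i");

// Illustrative inputs chosen only for this demo.
const samples = ["path/", "ABC/def", "no slash", "/", "www/"];
for (const s of samples) {
  // literal and constructed agree on every input; broken only matches "www/".
  console.log(JSON.stringify(s), literal.test(s), constructed.test(s), broken.test(s));
}
```

None of the regexes use the `g` flag, so repeated `.test()` calls do not carry `lastIndex` state between inputs.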
Easier and prettier in Ruby than in JS.
We may have made errors of selection.
A great admission to make up front in such a massive endeavor, one which its makers hope will shape the future.
What does this mean for ars excerpendi writ large, particularly when it may apply to hundreds of thousands?
JavaScript, as a language, has some fundamental shortcomings — I think the majority of us agree on that much. But everyone has a different opinion on what precisely the shortcomings are.
You'll have to forgive me the dusty desk. I currently don't have a carpet in my office, so dusting is almost entirely pointless: it's back to this state within two days.
It's easy to see flaws in yourself, but when you point them out, everyone who hadn't noticed them so far can see them too.