Auto-generate screen reader specs from UI designs
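Surprisingly, this feature moves accessibility design to the start of the development workflow rather than its traditional place at the end. AI agents can generate screen reader and ARIA specs directly from the actual design components. This may be a major shift in accessibility practice, making accessibility a core part of the design process rather than an afterthought.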
The integration also connects to Upwork's AI agent Uma, which helps automate parts of the hiring and execution process once a project is underway.
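AI is evolving from a single tool into a complete work ecosystem, and this automated integration from hiring through execution shows how AI can reshape an entire workflow. It improves efficiency, but it may also make traditional intermediary roles disappear while creating a new market for AI services. The impact of this shift on different industries deserves careful thought.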
Ovren puts AI frontend and backend engineers on it - they work inside your real codebase, execute scoped tasks, and deliver reviewable code updates.
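This represents a surprising leap in AI engineering capability: from suggesting code to actually executing work. AI is no longer only an assistive tool; it can carry out tasks directly in a real codebase and produce reviewable code updates. This may be the most disruptive direction for AI in software development.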
The AI toolkit for building and maintaining browser automations
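This project combines AI with browser automation, which is an exciting research direction. Integrating AI models with browser automation tools makes it possible to build automation systems that understand page content, carry out complex interactions, and solve problems autonomously, which greatly extends what traditional automation tools can do.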
Add dev-tools package with wt worktree manager CLI - New packages/dev-tools with standalone wt CLI for git worktree management - Commands: wt new, wt scratch, wt prune - Uses Vertex AI (gemini-2.5-flash) for branch name generation via gcloud ADC
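Surprisingly: this project is not only a browser automation tool, it also ships a Git worktree manager that uses AI to generate branch names. It uses Google's Vertex AI and the gemini-2.5-flash model to create meaningful branch names automatically, an inventive use of AI in the development workflow.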
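To make the idea concrete for myself, a minimal sketch of what "create a worktree with a generated branch name" could look like. This is not the project's code: `slugify_branch`, `worktree_command`, the `feature/` prefix and the `../worktrees` directory are all hypothetical, and the Vertex AI call is replaced by a plain deterministic slug function. Only the `git worktree add -b <branch> <path>` form is standard Git. The sketch builds the command without running it.

```python
import re


def slugify_branch(description: str, prefix: str = "feature", max_words: int = 6) -> str:
    """Turn a task description into a branch name.

    Hypothetical stand-in for the model-generated name; the real tool
    asks gemini-2.5-flash instead of slugifying.
    """
    words = re.findall(r"[a-z0-9]+", description.lower())
    slug = "-".join(words[:max_words]) or "work"  # fall back if nothing usable remains
    return f"{prefix}/{slug}"


def worktree_command(description: str, base_dir: str = "../worktrees") -> list[str]:
    """Build (but do not run) the git command for a new worktree on a new branch."""
    branch = slugify_branch(description)
    path = f"{base_dir}/{branch.replace('/', '-')}"  # flatten the branch name into one directory
    return ["git", "worktree", "add", "-b", branch, path]


print(worktree_command("Fix login redirect loop"))
```

Passing the returned list to `subprocess.run` inside a repository would create the worktree. Swapping `slugify_branch` for a model call is the only part that needs AI, which raises the question of how much the model adds over a slug.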
Performance: dev-browser: 3m53s, $0.88, 100% success rate — beats MCP configs, Chrome extensions, 'browser skill' stacks.
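Surprisingly: this new approach beats traditional methods not only on features but also on the performance numbers. A 100% success rate at relatively low cost suggests technical maturity and practicality, and it may quickly make existing browser automation solutions obsolete.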
One Agent can now: open X (Twitter), scroll the feed, extract tweets, return clean JSON. No plugins. No extensions. No orchestration.
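Surprisingly: a single AI agent can now complete a complex social media data extraction task on its own, with no plugins, extensions, or orchestration. This shows striking progress in autonomous operation and may fundamentally change data collection and automation workflows.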
Austin built the whole pipeline from his Claude Code terminal using the Notion API. He brain-dumped the desired outcome using Monologue, let Claude Code create the database and data pipeline, and pasted the generated instructions into the Notion custom agent setup.
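Surprisingly: a non-technical person can describe what they need to an AI through a speech-to-text tool (Monologue), and the AI then builds the whole data pipeline and agent system. This lowers the technical barrier considerably, so non-technical team members can build complex AI workflows too.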
Open Loop + Infinite Demand = Creative Amplifiers. Content creation & marketing strategy. AI can generate a thousand ad variations or blog posts.
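Surprisingly: in creative marketing, AI can now generate thousands of ad variations or blog posts in an instant, which shows its potential as a creative amplifier. The final selection still requires human judgment, though, which points to a complementary relationship between AI and human creativity.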
Closed Loop + Finite Demand = Efficiency Plays. AI bookkeeping categorizes transactions, reconciles accounts, files returns. Deterministic rules applied to numbers.
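Surprisingly: even in finite-demand domains, AI can deliver significant efficiency gains by applying deterministic rules. AI bookkeeping systems can handle categorization, reconciliation, and tax filing automatically, which suggests that even in finance, where human judgment was traditionally required, AI can create value through standardized processes.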
accounting and auditing showing nearly a 20 percent jump on GDPval and even domains like police / detective work showing a nearly 30 percent improvement.
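Accounting and auditing ability up 20% in 4 months, police / detective work up nearly 30%: these two numbers stand for two very different threats. The first is that automation pressure on white-collar knowledge work (accountants) is accelerating. The second is more unsettling: rapid AI progress in criminal investigation means surveillance and law enforcement capability is rising at the same pace. That GDPval puts both on the same axis is itself a design choice worth thinking about.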
METR conclude that “the length of tasks AI can do is doubling every 7 months”. I’m not convinced that pattern will continue to hold, but it’s an eye-catching way of illustrating current trends in agent capabilities.
A potential pattern to watch, even if it doesn't follow an exponential trajectory. If the pattern stays intact, by August we should see days of software engineering work being done independently by models.
The chart shows tasks that take humans up to 5 hours, and plots the evolution of models that can achieve the same goals working independently. As you can see, 2025 saw some enormous leaps forward here with GPT-5, GPT-5.1 Codex Max and Claude Opus 4.5 able to perform tasks that take humans multiple hours—2024’s best models tapped out at under 30 minutes.
Interesting metric. Until 2024, models were capable of independently executing software engineering tasks that take a person under 30 minutes. This chimes with my personal observation that for such tasks there was no real time saving involved, or regular automation could handle them. In 2025 that jumped to tasks taking a person multiple hours, with Claude Opus 4.5 reaching 4:45 hrs. That is a big jump. How do you leverage that personally?
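A back-of-envelope check on what the doubling claim would imply, starting from the 4:45 hrs figure. This is a sketch under loud assumptions: a clean 7-month doubling (which the quoted author doubts will hold), 4.75 hours as the starting point, and an 8-hour workday as my own conversion unit. The function name is made up for illustration.

```python
def projected_horizon_hours(start_hours: float, months_ahead: float,
                            doubling_months: float = 7.0) -> float:
    """Task horizon after `months_ahead`, assuming clean exponential doubling."""
    return start_hours * 2 ** (months_ahead / doubling_months)


start = 4.75  # 4 h 45 min, the Claude Opus 4.5 figure quoted above
for months in (0, 7, 14, 21):
    hours = projected_horizon_hours(start, months)
    # 8-hour workday is an assumption used only to express hours as "days of work"
    print(f"+{months:2d} months: {hours:6.2f} h, about {hours / 8:.1f} workdays")
```

Under these assumptions one doubling takes 4.75 hours to 9.5 hours, a bit over one workday, and two doublings reach 19 hours. So "days of work" follows after one or two doublings, but only if the trend really holds.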
If you define agents as LLM systems that can perform useful work via tool calls over multiple steps then agents are here and they are proving to be extraordinarily useful. The two breakout categories for agents have been for coding and for search.
Recognisable: AI agents as chunked / abstracted-away automation. This also creates the pitfall, seen in [[After claiming to redeploy 4,000 employees and automating their work with AI agents, Salesforce executives admit We were more confident about…. - The Times of India]], where regular automation is replaced by AI.
Most useful for search and for coding
Home security company Vivint, which uses Agentforce to handle customer support for 2.5 million customers, experienced these reliability problems firsthand. Despite providing clear instructions to send satisfaction surveys after each customer interaction, The Information reported that Agentforce sometimes failed to send surveys for unexplained reasons. Vivint worked with Salesforce to implement "deterministic triggers" to ensure consistent survey delivery.
wtf? Why ever use AI to send out a survey, something you probably already had fully automated beforehand? 'Deterministic triggers' is a euphemism for regular scripted automation like 'clicking done on a ticket triggers an e-mail for feedback', which we've had for decades.
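For contrast, a minimal sketch of what such a 'deterministic trigger' amounts to. Everything here is hypothetical (the `Ticket` and `Helpdesk` classes, the outbox list standing in for a mail service); it has nothing to do with how Agentforce or Vivint actually implement it. The point is only that the survey is sent by an unconditional code path, not by a model deciding whether to follow an instruction.

```python
from dataclasses import dataclass, field


@dataclass
class Ticket:
    ticket_id: int
    customer_email: str
    status: str = "open"


@dataclass
class Helpdesk:
    # Stand-in for a real mail service: (recipient, message) pairs
    outbox: list = field(default_factory=list)

    def close_ticket(self, ticket: Ticket) -> None:
        """Mark the ticket done; the survey is a fixed side effect of closing."""
        ticket.status = "done"
        self._send_survey(ticket)  # unconditional: runs on every close

    def _send_survey(self, ticket: Ticket) -> None:
        message = f"How did we do on ticket #{ticket.ticket_id}?"
        self.outbox.append((ticket.customer_email, message))


desk = Helpdesk()
desk.close_ticket(Ticket(ticket_id=101, customer_email="customer@example.com"))
print(desk.outbox)
```

Closing a ticket always queues exactly one survey, so there is nothing left to fail 'for unexplained reasons'.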
"All of us were more confident about large language models a year ago," Parulekar stated, revealing the company's strategic shift away from generative AI toward more predictable "deterministic" automation in its flagship product, Agentforce.
Salesforce is moving back from fully embracing LLMs towards regular automation. I think this is symptomatic of DIY enthusiasm too: there is likely an existing 'regular' automation that helps more.
On AI agents, and the engineering to get one going. A few things stand out at first glance: it frames agents as the next hype (cf. the plateau in model development), says they are for personal tools (which doesn't square with the hype, which is VC-fuelled, and personal tools are of no interest to VCs), and mentions a few personal use cases, e.g. automation; cf. [[Open Geodag 20241107100937]], Ed Parsons of Google AI on the same topic.
I believe the final policy shall contain robust rationale and, in the best way possible, avoids the perception of rAIcial discrimination
“You have to assume that things can go wrong,” shared Waymo’s head of cybersecurity, Stacy Janes. “You can’t just design for this success case – you have to design for the worst case.”
Future proofing by asking "what if we're wrong?"
Hoffman, R., Mueller, S., Klein, G., & Litman, J. (2021). Measuring Trust in the XAI Context. PsyArXiv. https://doi.org/10.31234/osf.io/e3kv9
Side note: When I flagged yours as a dupe during review, the review system slapped me in the face and seriously accused me of not paying attention, a ridiculous claim by itself since locating a (potential) dupe requires quite a lot of attention.
Yes, autoexpect is a good tool, but it is only used to create Tcl Expect scripts automatically by watching the user. So it amounts to the same thing as writing Expect scripts by hand.
You can now distribute your add-on. Note, however, that your add-on may still be subject to further review; if it is, you'll receive notification of the outcome of the review later.
that can be partially automated but still require human oversight and occasional intervention
but then have a tool that will show you each of the change sites one at a time and ask you either to accept the change, reject the change, or manually intervene using your editor of choice.
Overestimating robots and AI underestimates the very people who can save us from this pandemic: Doctors, nurses, and other health workers, who will likely never be replaced by machines outright. They’re just too beautifully human for that.
Yes - we used to have human elevator operators and telephone operators that would manually connect your calls. We now have automated check-out lines in stores and toll booths. In the future, we will have automated taxis and, yes, even some automated health care. Automated healthcare will enable better healthcare coverage with the same number of healthcare workers (or the same level of coverage with fewer workers). There can be good things or bad things about it - the way we do it will absolutely matter. We just need to think through how best to obtain the good without much of the bad ... rather than assuming it won't ever happen.
the demand for products will keep climbing as well, as we’re seeing with this hiring bonanza.
Probably not. The increase in demand is a result of the social-distancing and the hoarding. This is not a steady state. The demand for many things will return to normal (or below) once people figure out what they are using and what is still available. For example - you don't use that much more toilet paper when you are at home ... but you buy more if you don't know when it will be available again.
Last week, Amazon officials announced that in response to the coronavirus they were hiring 100,000 additional humans to work in fulfillment centers and as delivery drivers, showing that not even this mighty tech company can do without people.
Amazon has adopted automation in a very big and increasing way. Just because it has not automated everything yet, doesn't mean that complete automation isn't possible. We already know automated delivery is in the works. Amazon, Uber and Google are all working on the details of autonomous navigation ... and the ultimate result will absolutely impact future drivers (pun intended).
Why haven’t the machines saved us yet?
because machines don't buy tickets to fly on planes and vacation on cruise ships.
And that’s all because of the vulnerabilities of the human worker.
It has more to do with the vulnerabilities of the human traveler and the human guest (and less to do with the workers). The demand for these services has simply gone down while people try to avoid spreading the virus.