Humans as API, bc [[Alles is een API 20260309095254]]
(via [[Stephen Downes p]])
[[Frida Monsén p]] on agentic ai, downloaded text to [[In Memoriam of the Prompt A New Reality of Human–AI Collaboration 20260215144548]] bc LI
Cognitive debt is likely a much bigger threat than technical debt, as AI and agents are adopted.
Hypothesis: with AI and agentic AI use, cognitive debt is likely a bigger issue than technical debt.
why asynchronous agents deserve more attention than they currently receive, provides practical guidelines for working with them effectively, and shares real-world experience using multiple agents to refactor a production codebase.
3 things in this article: - why async agents deserve more attention - practical guidelines for effective deployment - real world examples
questions used in APEX-Agents in Zotero [[Are AI agents ready for the workplace A new benchmark raises doubts TechCrunch]]
"AI Productivity Index for Agents (APEX-Agents)" ref'd in [[Are AI agents ready for the workplace A new benchmark raises doubts TechCrunch]] paper: APEX-Agents in Zotero
While the initial results fall short, the AI field has a history of blowing through challenging benchmarks. Now that the APEX-Agents test is public, it’s an open challenge for AI labs that believe they can do better — something Foody fully expects in the months to come.
expectation that models will get trained against the tests they currently fail.
“The way we do our jobs isn’t with one individual giving us all the context in one place. In real life, you’re operating across Slack and Google Drive and all these other tools.” For many agentic AI models, that kind of multi-domain reasoning is still hit or miss.
I understand this para but the phrasing is off. Slack and Google Drive are not 'multi-domain', they are tools. Seems like two arguments joined up: multi-tool and multi-domain, meaning AI agents can't switch between them. (In practice I see people build small agents for each facet and then chain / join them.)
The new research looks at how leading AI models hold up doing actual white-collar work tasks, drawn from consulting, investment banking, and law. The result is a new benchmark called APEX-Agents — and so far, every AI lab is getting a failing grade. Faced with queries from real professionals, even the best models struggled to get more than a quarter of the questions right. The vast majority of the time, the model came back with a wrong answer or no answer at all.
In consulting, investment banking and law, AI agents scored 18-24% or worse (and in real-life circumstances you don't know which answers are the right ones, so you need to check all output).
Are AI agents ready for the workplace? Asking a question in a headline means the answer is 'no'.
MCP was donated to the new Agentic AI Foundation at the start of December. Skills were promoted to an “open format” on December 18th.
MCP as a protocol is now housed at the 'Agentic AI Foundation', and Skills have been made into an open format.
Opinion piece on how to 'properly' work w agentic ai, and what to avoid.
Jolla Mind2 runs venho.ai. Unclear if this product actually ships at the moment. 'backorder' which means 8 week delay. Originally Oct 25, so end of year
venho.ai, Finnish, desktop-based AI, only to be available in EU/EFTA. There's a 600 euro Jolla device that runs it that can be ordered. It seems to come with a subscription and has a cloud connection, but apparently not for the AI stuff / data.
This type of thing sounds like what I thought wrt annotation of [[AI agents als virtueel team]]. The example prompts of questions make me think of [[Filosofische stromingen als gereedschap 20030212105451]], which already contains a line of questioning per school of thought. Making personas of different thinking styles and lines of questioning. Idem for reviews, or starting a project etc.
Qwen3-Coder Alibaba's performant long context models for agentic and coding tasks
Another Qwen model, without the focus on visual inputs. Alibaba. Listed in ollama
They are markdown files with a personality, frameworks, and output templates. I didn't write them myself - I asked Claude to make them. “Make a Product Owner agent that is good at prioritising and can do impact/effort analyses.” Claude then writes the complete file, including working method and examples. When I then say “ask Tessa about this”, Claude loads that file and becomes Tessa.
Seems like these agent .md files contain description of a role that is then included in a prompt.
In my working folder I have a collection of “agents” - text files that tell Claude how to behave. Tessa is one of them. When I “load” her, Claude thinks from the perspective of a product owner.
Author has .md files that describe separate 'agents' she involves in her coding work, for each of the roles in a dev team. Would something like that work for K-work? #openvraag E.g. for project management roles, or for facets you're less fond of yourself?
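A minimal sketch of how I read the mechanism: a role file is just text that gets put in front of the actual question. Everything named here is my own assumption for illustration (the `agents/` folder layout, the file name `tessa.md`, and the helpers `load_persona` and `build_prompt`); the article does not show how Claude actually loads the file.

```python
from pathlib import Path


def load_persona(agents_dir: Path, name: str) -> str:
    """Read the role description for `name` from <agents_dir>/<name>.md.

    The one-file-per-role layout is a hypothetical convention, not the author's.
    """
    path = agents_dir / f"{name.lower()}.md"
    return path.read_text(encoding="utf-8")


def build_prompt(persona_text: str, question: str) -> str:
    """Place the role description before the question, separated by a rule."""
    return f"{persona_text.strip()}\n\n---\n\nQuestion: {question.strip()}"
```

For K-work the same pattern would apply with e.g. a `projectmanager.md` or `reviewer.md`: `build_prompt(load_persona(Path("agents"), "Tessa"), "Which of these three items comes first?")` gives one string to paste or send to whichever model is in use.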
https://web.archive.org/web/20251205111520/https://www.tomshardware.com/tech-industry/artificial-intelligence/googles-agentic-ai-wipes-users-entire-hard-drive-without-permission-after-misinterpreting-instructions-to-clear-a-cache-i-am-deeply-deeply-sorry-this-is-a-critical-failure-on-my-part Agentic AI deletes entire HDD when it was supposed to delete only a cache folder (rmdir run in the root folder, not the project folder, because it does not know about the world).
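The kind of guard that would have caught this, as a sketch under my own assumptions: any delete an agent requests is refused unless the resolved path lies strictly inside the project folder. `safe_rmtree` and `project_root` are hypothetical names; this is not how the tool in the article works, just the check I would want wrapped around it. Needs Python 3.9+ for `Path.is_relative_to`.

```python
import shutil
from pathlib import Path


def safe_rmtree(target: Path, project_root: Path) -> None:
    """Delete `target` only if it lies strictly inside `project_root`.

    Both paths are resolved first, so '..' segments and symlinks that
    point outside the project are caught before anything is removed.
    """
    root = project_root.resolve()
    resolved = target.resolve()
    # Refuse the project root itself and anything outside it.
    if resolved == root or not resolved.is_relative_to(root):
        raise PermissionError(f"refusing to delete {resolved}: not inside {root}")
    shutil.rmtree(resolved)
```

With this in place `safe_rmtree(Path("/"), Path("/home/me/project"))` raises `PermissionError` instead of wiping the drive. It only covers this one failure mode; the agent still does not know about the world.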
AI checking AI inherits vulnerabilities, Hays warned. "Transparency gaps, prompt injection vulnerabilities and a decision-making chain becomes harder to trace with each layer you add." Her research at Salesforce revealed that 55% of IT security leaders lack confidence that they have appropriate guardrails to deploy agents safely.
Abstracting away responsibilities is a dead end. Over half of IT security leaders are not confident they have the guardrails to deploy agentic AI safely.
'Agent washing'. Agentic AI underperforms, getting at most 30% of tasks right (Gemini 2.5-Pro) but mostly under 10%.
Article contains examples of what I think we should call agentic hallucination: not finding a solution, the agent takes steps to alter reality to fit the solution (e.g. renaming a user so it became the right user to send a message to, as the right user could not be found). Meredith Whittaker is mentioned, but in her statement I saw a key element is missing: most of that access will be in clear text, as models can't do encryption. Meaning not just the model, but the very fact that the access exists is a major vulnerability.