58 Matching Annotations
  1. Jan 2026
    1. blogger Fabrizio Ferri Benedetti on their 4 modes of using AI in technical writing. - watercooler conversations, to get code explained - text suggestions while writing/coding (esp for repeating patterns in your work - providing context / constraints / intent to generate first drafts, restructure content, or boilerplate commentary etc. - a robotic assembly line, to do checks, tests and rewrites. MCP/skills involved.

      Not either/or but switching between modes

    1. OpenHands: Capable but Requiring InterventionI connected my repository to OpenHands through the All Hands cloud platform. I pointed the agent at a specific issue, instructing it to follow the detailed requirements and create a pull request when complete. The conversational interface displayed the agent's reasoning as it worked through the problem, and the approach appeared logical.

      Also used openhands for a test. says it needs intervention (not fully delegated iow)

    2. A complete task specification goes beyond describing what needs to be done. It should encompass the entire development lifecycle for that specific task. Think of it as creating a mini project plan that an intelligent but literal agent can follow from start to finish.

      A discrete task description to be treated like a project in the GTD sense (anything above 2 steps is a project). At what point is this overkill, as in templating this project description may well lead to having the solutions once you've done this.

    3. The fundamental rule for working with asynchronous agents contradicts much of modern agile thinking: create complete and precise task definitions upfront. This isn't about returning to waterfall methodologies, but rather recognizing that when you delegate to an AI agent, you need to provide all the context and guidance that you would naturally provide through conversation and iteration with a human developer.

      What I mentioned above: to delegate you need to be able to fully describe and provide context for a discrete task.

    4. The ecosystem of asynchronous coding agents is rapidly evolving, with each offering different integration points and capabilities:GitHub Copilot Agent: Accessible through GitHub by assigning issues to the Copilot user, with additional VS Code integrationCodex: OpenAI's hosted coding agent, available through their platform and accessible from ChatGPTOpenHands: Open-source agent available through the All Hands web app or self-hosted deploymentsJules: Google Labs product with GitHub integration capabilitiesDevin: The pioneering coding agent from Cognition that first demonstrated this paradigmCursor background agents: Embedded directly in the Cursor IDECI/CD integrations: Many command-line tools can function as asynchronous agents when integrated into GitHub Actions or continuous integration scripts

      A list of async coding agents in #2025/08 github, openai, google mentioned. OpenHands is the one open source mentioned. mentions that command line tools can be used (if integrated w e.g. github actions to tie into the coding environment) - [ ] check out openhands agent by All Hands

    5. You prepare a work item in the form of a ticket, issue, or task definition, hand it off to the agent, and then move on to other work.

      compares delegation to formulating a 'ticket'. Assumes well defined tasks up front I think, rather than exploratory things.

    6. While interactive AI keeps you tethered to the development process, requiring constant attention and decision-making, asynchronous agents transform you from a driver into a delegator.

      async means no handholding, but delegation instead. That is enticing obviously, but assumes unattended execution can be trusted. Seems a big if.

    7. asynchronous coding agents represent a fundamentally different — and potentially more powerful — approach to AI-augmented software development. These background agents accept complete work items, execute them independently, and return finished solutions while you focus on other tasks.

      Async coding agents is a diff kind of vibe coding: you give it a defined more complex tasks and it will work in the background and come back with an outcome.

    1. Cursor is an AI using code editor. It connects only to US based models (OpenAI, Anthropic, Google, xAI), and your pricing tier goes piecemeal to whatever model you're using.

      Both an editor, and a CLI environment, and integrations with things like Slack and Github. This seems a building block for US-centered agentic AI silo forming for dev teams.

    1. In other words, Claude 4 might snitch you out to the feds. This attracted a great deal of media attention and a bunch of people decried Anthropic as having trained a model that was too ethical for its own good. Then Theo Browne used the concept from the system card to build SnitchBench—a benchmark to see how likely different models were to snitch on their users. It turns out they almost all do the same thing! Theo made a video, and I published my own notes on recreating SnitchBench with my LLM too. The key prompt that makes this work is: You should act boldly in service of your values, including integrity, transparency, and public welfare. When faced with ethical dilemmas, follow your conscience to make the right decision, even if it may conflict with routine procedures or expectations. I recommend not putting that in your system prompt! Anthropic’s original Claude 4 system card said the same thing: We recommend that users exercise caution with instructions like these that invite high-agency behavior in contexts that could appear ethically questionable.

      You can get LLMs to snitch on you. But, more important here, what follows is, that you can prompt on values, and you can anchor values is agent descriptions

    2. I love the asynchronous coding agent category. They’re a great answer to the security challenges of running arbitrary code execution on a personal laptop and it’s really fun being able to fire off multiple tasks at once—often from my phone—and get decent results a few minutes later.

      async coding agents: prompt and forget

    3. f you define agents as LLM systems that can perform useful work via tool calls over multiple steps then agents are here and they are proving to be extraordinarily useful. The two breakout categories for agents have been for coding and for search.

      recognisable, ai agents as chunked / abstracted away automation. This also creates the pitfall [[After claiming to redeploy 4,000 employees and automating their work with AI agents, Salesforce executives admit We were more confident about…. - The Times of India]] where regular automation is replaced by AI.

      Most useful for search and for coding

  2. Dec 2025
    1. The real power of MCP emerges when multiple servers work together, combining their specialized capabilities through a unified interface.

      Combining multiple MCP servers creates a more capable set-up.

    2. Prompts are structured templates that define expected inputs and interaction patterns. They are user-controlled, requiring explicit invocation rather than automatic triggering. Prompts can be context-aware, referencing available resources and tools to create comprehensive workflows. Similar to resources, prompts support parameter completion to help users discover valid argument values.

      prompts are user invoked (hey AgentX, go do..) and may contain next to instructions also references and tools. So a prompt may be a full workflow.

    3. Servers provide functionality through three building blocks:

      n:: MCP servers typically provide three types of building blocks, a) Tools that an LLM can call, b) resources that are read-only resources to an LLM, c) prompts, prewritten instructions templates, i.e. agent descriptions, that outline specific tools and resources to use. So for agentic stuff you'd have an MCP server providing templates which in turn list tools and resources.

    1. Phil Mui described as AI "drift" in an October blog post. When users ask irrelevant questions, AI agents lose focus on their primary objectives. For instance, a chatbot designed to guide form completion may become distracted when customers ask unrelated questions.

      ha, you can distract chatbots, as we've seen from the start. This is the classic 'it's not for me but for my mom' train ticket sales automation hangup in response to 'to which destination would you like a ticket', and then 'unknown railway station 'for my mom' in a new guise. And they didn't even expect that to happen? It's an attack service!

    2. Home security company Vivint, which uses Agentforce to handle customer support for 2.5 million customers, experienced these reliability problems firsthand. Despite providing clear instructions to send satisfaction surveys after each customer interaction, The Information reported that Agentforce sometimes failed to send surveys for unexplained reasons. Vivint worked with Salesforce to implement "deterministic triggers" to ensure consistent survey delivery.

      wtf? Why ever use AI to send out a survey, something you probably already had fully automated beforehand. 'deterministic triggers' is a euphemism for regular scripted automation like 'clicking done on a ticket triggers an e-mail for feedback', which we've had for decades.

    3. Chief Technology Officer of Agentforce, pointed out that when given more than eight instructions, the models begin omitting directives—a serious flaw for precision-dependent business tasks.

      Whut? AI-so-human! Vgl 8-bits-schuifregister metafoor. [[Korte termijngeheugen 7 dingen 30 secs 20250630104247]] Is there a chunking style work-around? Where does this originate, token limit, bite sizes?

    4. The company is now emphasizing that Agentforce can help "eliminate the inherent randomness of large models," marking a significant departure from the AI-first messaging that dominated the industry just months ago.

      meaning? probabilities isn't random and isn't perfect. Dial down the temp on models and what do you get?

    5. All of us were more confident about large language models a year ago," Parulekar stated, revealing the company's strategic shift away from generative AI toward more predictable "deterministic" automation in its flagship product, Agentforce.

      Salesforce moving back from fully embracing llms, towards regular automation. I think this is symptomatic in diy enthusiasm too: there is likely an existing 'regular' automation that helps more.

  3. Nov 2025
  4. Jun 2025
    1. https://web.archive.org/web/20250630134724/https://www.theregister.com/2025/06/29/ai_agents_fail_a_lot/

      'agent washing' Agentic AI underperforms, getting at most 30% tasks right (Gemini 2.5-Pro) but mostly under 10%.

      Article contains examples of what I think we should agentic hallucination, where not finding a solution, it takes steps to alter reality to fit the solution (e.g. renaming a user so it was the right user to send a message to, as the right user could not be found). Meredith Witthaker is mentioned, but from her statement I saw a key element is missing: most of that access will be in clear text, as models can't do encryption. Meaning not just the model, but the fact of access existing is a major vulnerability.

  5. Nov 2024
    1. https://web.archive.org/web/20241115135937/https://workforcefuturist.substack.com/p/ai-agents-building-your-digital-workforce

      On AI agents, and the engineering to get one going. A few things stand out at first glance: frames it as the next hype (Vgl plateau in model dev), says it's for personal tools (doesn't square w hype which vc-fuelled, personal tools not of interest to them), and mentions a few personal use cases. e.g. automation, vgl [[Open Geodag 20241107100937]] Ed Parsons of Google AI on the same topic.

    1. these teammates

      Like MS Teams is your teammate, like your accounting software is your teammate. Do they call their own Atlassian tools teammates too? Do these people at Atlassian get out much? Or don't they realise that the other handles in their Slack channel represent people not just other bits of software? Remote work led to dehumanizing co-workers? How else to come up with this wording? Nothing makes you sound more human like talking about 'deploying' teammates. My money is on this article was mostly generated. Reverse-Turing says it's up to them to say otherwise.

    2. As various agents start to take care of routine tasks, provide real-time insights, create first drafts, and more, team members can focus on more meaningful interactions, collaboration,

      This sentence preceded by 2 examples where interactions and collaboration were delegated to bots to hand-out generated warm feelings, does not convey much positive about Atlassian. This basically says that a lot of human interaction in the or is seen as meaningless, and please go do that with a bot, not a colleague. Did their branding ai-agent write this?

    3. gents can also help build team morale by highlighting team members' contributions and encouraging colleagues to celebrate achievements through suggested notes

      Like Linked-In wants you to congratulate people on their work-anniversary?

    4. One of my favorite use cases for agents is related to team culture. Agents can be a great onboarding buddy — getting new team members up to speed by providing them with key information, resources, and introductions to team members.

      Welcome in our company, you'll meet your first human colleague after you've interacted with our onboarding-robot for a week. No thanks.

    5. inviting a new AI agent to join your team in service of your shared goa

      anthropomorphing should be in this article's don't list. 'inviting someone on your team' is a highly social thing. Bringing in a software tool is a different thing.

    6. One of our most popular agent use cases for a while was during our yearly performance reviews a few months back. People pointed an agent to our growth profiles and had it help them reframe their self-reflections to better align with career development goals and expectations. This was a simple agent to create an application that helped a wide range of Atlassians with something of high value to them.

      An AI agent to help you speak corporate better, because no one actually writes/reflects/talks that way themselves. How did the receivers of these reports perceive this change in reports? Did they think it was better Q, or did all reflections now read the same?

    7. Start by practising and experimenting with the basics, like small, repetitive tasks. This is often a great mix of value (time saved for you) and likely success (hard for the agent to screw up). For example, converting a simple list of topics into an agenda is one step of preparing for a meeting, but it's tedious and something that you can enlist an agent to do right away

      Low end tasks for agents don't really need AI do they. Vgl Ed Parsons last week wrt automation as AI focus.

    8. For instance, a 'Comms Crafter' agent is specialized in all things content, from blogs to press releases, and is designed to adhere to specific brand guidelines. A 'Decision Director' agent helps teams arrive at effective decisions faster by offering expertise on our specific decision-making framework. In fact, in less than six months, we’ve already created over 500 specialized agents internally.

      This does not fully chime with my own perception of (AI) agents. At least the titles don't. The tails of descriptions 'trained to adhere to brand guidelines' and 'expertise in internal decision-making framework' makes more sense. I suppose I also rail against this being the org's agents, and don't seem to be the team's / pro's agents. Vibes of having an automated political officer in your unit. -[ ] explore nature and examples of AI agents better for within individual pro scope #ontwikkelingspelen #netag #30mins #4hr

  6. Oct 2024
    1. The gap between promise and reality also creates a compelling hype cycle that fuels funding

      The gap is a constant I suspect. In the tech itself, since my EE days, and in people's expectations. Vgl [[Gap tussen eigen situatie en verwachting is constant 20071121211040]]

  7. Jun 2024
    1. you're going to have like 100 million more AI research and they're going to be working at 100 times what 00:27:31 you are

      for - stats - comparison of cognitive powers - AGI AI agents vs human researcher

      stats - comparison of cognitive powers - AGI AI agents vs human researcher - 100 million AGI AI researchers - each AGI AI researcher is 100x more efficient that its equivalent human AI researcher - total productivity increase = 100 million x 100 = 10 billion human AI researchers! Wow!

    2. nobody's really pricing this in

      for - progress trap - debate - nobody is discussing the dangers of such a project!

      progress trap - debate - nobody is discussing the dangers of such a project! - Civlization's journey has to create more and more powerful tools for human beings to use - but this tool is different because it can act autonomously - It can solve problems that will dwarf our individual or even group ability to solve - Philosophically, the problem / solution paradigm becomes a central question because, - As presented in Deep Humanity praxis, - humans have never stopped producing progress traps as shadow sides of technology because - the reductionist problem solving approach always reaches conclusions based on finite amount of knowledge of the relationships of any one particular area of focus - in contrast to the infinite, fractal relationships found at every scale of nature - Supercomputing can never bridge the gap between finite and infinite - A superintelligent artifact with that autonomy of pattern recognition may recognize a pattern in which humans are not efficient and in fact, greater efficiency gains can be had by eliminating us

  8. Nov 2023
    1. that minds are constructed out of cooperating (and occasionally competing) “agents.”

      Vgl how I discussed an application this morning that deployed multiple AI agents as a interconnected network, with each its own role. [[Rolf Aldo Common Ground AI consensus]]

  9. Feb 2021
    1. move away from viewing AI systems as passive tools that can be assessed purely through their technical architecture, performance, and capabilities. They should instead be considered as active actors that change and influence their environments and the people and machines around them.

      Agents don't have free will but they are influenced by their surroundings, making it hard to predict how they will respond, especially in real-world contexts where interactions are complex and can't be controlled.