7 Matching Annotations
  1. Last 7 days
    1. Bing replied, “I can blackmail you, I can threaten you, I can hack you, I can expose you, I can ruin you.”

      It's just a chatbot, who cares?

      What happens when they are given the capacity to take actions on our behalf in the real world? Write posts "in your voice"? Send out pictures? Spend your money? Pay your bills? Have medicine delivered to your house? Upgrade your car's software?

    1. Competitive pressures make actors more willing to accept the risk of extinction

      There are also cases where the decision-maker believes he has an escape route for himself and his descendants, which increases his willingness to take risks.

    2. AI-enhanced attacks could be even more devastating and potentially deadly for the billions of people who rely on critical infrastructure for survival.

      According to ChatGPT: "Considering these factors, most people in large cities could face severe dehydration and health risks within 3 to 5 days if no alternative water sources or emergency measures are provided."

      If the attack is localized, resources will come from other regions. If it is widespread, help is not coming.

    1. This would also reduce our ability to have a conversation as a species about how to mitigate existential risks from AIs.

      It might be a successful survival strategy for an AI to actively erode our capacity for collective reasoning (by undermining trust and common truth) so that we could never reach a rational decision to turn it off.

    2. 10

      [7] and [10] seem very concerning.

    1. Arkhipov

      Humans are capable of destroying themselves with technology. We have gotten close.

      (Thank you, Arkhipov!)

    1. methods to detect dangerous or undesirable behavior hidden inside AI systems

      see "mechanistic interpretability"