91 Matching Annotations
  1. Oct 2023
    1. a binary point in the middle

      It seems that clicking "656.515" generates the number 646.515 instead.

  2. Sep 2022
    1. I will also make the case that modesty—the part of this process where you go into an agonizing fit of self-doubt—isn’t actually helpful for figuring out when you might outperform

      This seems highly dependent on the individual.

    2. the overall competence of human civilization is such that we shouldn’t be surprised to find the professional economists at the Bank of Japan doing it wrong.

      This makes sense, but I worry about the way this links with a general rationalist attitude of exasperation and dismissiveness towards the rest of the world.

    3. We have a picture of the world where it is perfectly plausible for an econblogger to write up a good analysis of what the Bank of Japan is doing wrong, and for a sophisticated reader to reasonably agree

      We should still be skeptical a priori that we are a sophisticated reader, surely. Eliezer has reason to believe so based on his IQ, successful writing, etc.

    4. start off by following the majority opinion; and then only adopt a different view for good and convincing reasons.

      Majority of informed experts, or of society in general? The latter seems hard to capture on non-political issues, and tainted by aggressive persuasion on political ones.

  3. Jul 2022
    1. While hallucinating, she might “act as she wants unencumbered,” but she could hardly be said to be acting of her own free will.

      Actually, the conclusion - that this is hardly an example of acting of one's own free will - is not obvious to me. When I laugh at a stand-up routine, I am not controlling the audio and visual input my brain receives, just as a hallucinator is not controlling their hallucinations. If the former can be an act of free will, why not the latter?

    1. be those

      "be to those", not "be those"

    2. It’s not the type of power we think of ourselves as having.

      Doesn't this instead point to a lack of power, necessitated by determinism?

    3. that about

      "about", not "that about"

    4. part why

      "part of why", not "part why"

    5. think of all these cases as involving an inconsistency between the policy that an agent would want to adopt, at some prior point in time/from some epistemic position (e.g., before the aliens invade, before we know the value of the X-th digit, before Omega makes her predictions), and the action that Guaranteed Payoffs would mandate given full information.

      A CDT agent might argue that the "Guaranteed Payoffs" condition is met even in the original Newcomb case, because in every world the payoff from two-boxing is greater. So again it feels as though this just begs the question of causality.

    6. if you understand all of this, and make your decisions in light of full information, additional disputes about what compliments and insults are appropriate don’t seem especially pressing.

      I think that trivializes the point; if we are applying decision theory once we are in the city, the different suggestions seem to vary on whether to decide "in light of full information" or to pretend that some implications aren't true.

    7. the only remaining dispute is whether, given these facts, we should baptize the action of paying in the city with the word “rational,” or if we should instead call it “an irrational action, but one that follows from a disposition it’s rational to cultivate

      Doesn't this just depend on whether paying in the city is a juncture at which we apply free will, in the same way that when free will applies is central to Newcomb's problem?

    8. if, in the desert, I could set-up some elaborate and costly self-binding scheme – say, a bomb that blows off my arm, in the city, if I don’t pay — such that paying in the city becomes straightforwardly incentivized, I would want to do it. But if that’s true, we might wonder, why not skip all this expensive faff with the bomb

      CDT would agree here, so I'm not sure why this is presented as a separate framing.

    9. Guaranteed Payoffs: When you’re certain about what the pay-offs of your different options would be, you should choose the option with the highest pay-off.

      Doesn't this just beg the question of whether pay-offs lie only in the future (because of causality) or not?

    10. should you pay him? Both CDT and EDT answer: no. By the time you get to the city, the risk of death in the desert is gone.

      I see why this is decisive for CDT, but why is EDT responsive to "by the time you" arguments in this case, and not in Newcomb's case? Shouldn't acausal influence be at play here regardless?
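
      To make the contrast concrete, here is a toy sketch (my own illustrative numbers and code, not from the essay) of how EDT reaches different verdicts in the two cases:

      ```python
      # Illustrative only: EDT in Newcomb's problem vs. the desert-driver case.
      # Assumed toy numbers: predictor accuracy 0.99, opaque box $1,000,000, clear box $1,000.

      def edt_newcomb(accuracy=0.99, big=1_000_000, small=1_000):
          # EDT conditions the likely prediction on the action taken.
          one_box = accuracy * big
          two_box = accuracy * small + (1 - accuracy) * (big + small)
          return "one-box" if one_box > two_box else "two-box"

      def edt_city(payment=1_000):
          # In the city the rescue has already happened, so conditioning on "pay"
          # vs. "don't pay" no longer shifts the probability of having survived;
          # the only remaining difference is the payment itself.
          return "pay" if -payment > 0 else "don't pay"

      print(edt_newcomb())  # -> "one-box"
      print(edt_city())     # -> "don't pay", matching CDT, as the excerpt says
      ```

      If that sketch is right, EDT's "by the time you" verdict comes from the survival already being settled and known, not from causality per se, which is what makes the asymmetry with Newcomb feel puzzling to me.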

    11. CDT ignores letters like this. But CDT also gets given termites once a year. EDT, by contrast, pays, and stays termite free.

      Not if CDT thinks its actions this year will influence the prediction next year.

    12. I predicted that

      "I predicted whether", not "I predicted that"?

    13. paying seem

      "paying seems", not "paying seem"

    14. “you win your”

      “you win your bet”, not “you win your”

    15. Imagine, for example, that I decide to refrain from smoking a million times in a row. If the case’s hypothesized correlations hold, then I will in fact spawn, consistently, without the lesion. In that case, though, it starts to look like my choice of whether to smoke or not actually is exerting a type of “control”

      Again, the word "decide" no longer feels appropriate here, if we hypothesize that the lesion really does causally control things with such certainty.

    16. maybe the right thing to say here is just that the correlations posited in smoking lesion don’t persist under conditions of “play around however you want”

      Aren't those precisely the conditions in which decision theory is interesting/relevant, e.g. conditions in which free will exists?

    17. it turns

      "it turns out", not "it turns"

    18. case a

      "case as a", not "case a"

    19. view just

      "view is just", not "view just"

    20. some via

      "via some", not "some via"

    21. e.g

      "e.g.,", not "e.g,"

    22. give

      "given", not "give"

    23. my conviction about one-boxing start

      "my conviction... starts" or "my convictions... start", not "my conviction... start"

    24. one boxing

      "one-boxing", not "one boxing"

    25. one box

      "one-box", not "one box"

    26. you grandfather

      "your grandfather", not "you grandfather"

    27. a living in

      "living in an", not "a living in"

    28. it prompts CDT

      maybe "it prompts you" instead of "it prompts CDT"?

    29. /or

      seems like this should be "and/or", "/" or "or", but not "/or"

    30. All of three

      "All three", not "All of three"?

    31. does feel

      "does it feel", not "does feel"

    32. that I think

      "I think", not "that I think"

    33. been been

      "been", not "been been"

    34. that shows

      "shows", not "that shows"

    35. these distinctions

      "these are distinctions", not "these distinctions"

    36. one who want

      "wants", not "want"

    37. in an already-painted painting, the future has already been fixed, too: you just don’t know what it is. And when you act, you start to find out. Insofar as you can choose how to act – and per compatibilism, you can – then you can choose what you’re going to find out, and in that sense, influence it.

      Doesn't this just mean compatibilism is wrong? The word "influence" seems completely devoid of meaning in an already-painted painting.

    38. As long as you and your copy’s choice are correlated, CDT is going to ignore that correlation, hold p constant given different actions

      Isn't decision theory useful only insofar as a decision is "free", and therefore by definition uncorrelated?
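
      To spell out the point being questioned, here is a toy comparison (my own payoffs and correlation, purely illustrative) of how CDT holds p fixed while EDT conditions on it:

      ```python
      # Illustrative only: twin prisoner's dilemma with standard payoffs
      # (both cooperate = 3, both defect = 1, lone defector = 5, lone cooperator = 0).

      def cdt_value(action, p_copy_cooperates):
          # CDT holds the copy's probability of cooperating fixed across your actions.
          if action == "cooperate":
              return p_copy_cooperates * 3
          return p_copy_cooperates * 5 + (1 - p_copy_cooperates) * 1

      def edt_value(action, correlation=0.99):
          # EDT expects the copy to mirror your choice with probability `correlation`.
          if action == "cooperate":
              return correlation * 3
          return (1 - correlation) * 5 + correlation * 1

      for p in (0.1, 0.5, 0.9):
          # For any fixed p, defection dominates under CDT.
          assert cdt_value("defect", p) > cdt_value("cooperate", p)
      print(edt_value("cooperate"), edt_value("defect"))  # ~2.97 vs. ~1.04: EDT cooperates
      ```

      Whether that conditioning is legitimate is exactly what the "free, and therefore uncorrelated" worry is questioning.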

    39. CDT imagines that we have severed the ties between you and your copy, between you and the history that determines every aspect of you.

      Isn't this what any decision theory does, on the premise that decisions exist in the first place?

    40. you can change the past, here, about as much as you can change the future in a deterministic world: that is, not at all, and enough to matter for practical purposes.

      I agree with "not at all", so how does this matter for practical purposes?

    41. We must distinguish between the ability to “change things” in this sense, and the ability to “control” them in some broader sense.

      What does "control" mean, if not to change things?

    42. If you choose cooperate, it will always have been the case that Monday-Joe + process P outputs cooperate. If you choose defect, it will always have been the case that Monday-Joe + process P outputs defect. In this very real sense – the same sense at stake in every choice in a deterministic world – you get to choose what will have always been the case, even before your choice.

      The word "choose" doesn't belong in a deterministic world.

    43. because only one of (a) or (b) is compatible with the past/the physical laws, and because you are free to choose (a) or (b), it turns out that in some sense, you’re free to choose the past/the physical laws (or, their computational analogs).

      What?

    44. it is genuinely up to you what you write, or do.

      I don't think this is true in the sense most people would assume, given the premise of deterministic twins.

    45. your copy is your puppet. But equally, you are his puppet. But more truly, neither of you are puppets. Rather, you are both free men

      How are you "free"? I feel like, over and over, a semantic trick is going on in this post, where words like "free" and "control" are used in a situation where they clearly do not apply.

    46. it feels like what compels me is a direct, object-level argument, which could be made equally well before the copying or after.

      I disagree. The copying seems significant, in that it bakes into the premise of the situation a complete lack of free will.

    47. if you find yourself reasoning about scenarios where he presses one button, and you press another – e.g., “even if he cooperates, it would be better for me to defect” – then you are misunderstanding your situation. Those scenarios just aren’t on the table.

      In general, this isn't an iron-clad proof of reasoning in the wrong direction. An only slightly related example: the correct solution to the blue-eyed monk riddle involves considering hypothetical scenarios that everyone knows are off the table (e.g. if it were true that there were only 95 blue-eyed monks...).

    48. you can make him do whatever you want

      To see why this is absurd, consider that the reverse must also be true. We think of "control" as a one-way thing, from puppet master to puppet, so how can it apply in a symmetric situation?

    49. for all intents and purposes, you control what he does.

      This feels like a misuse of the word "control", which, for me at least, has the concept of causality baked into it.

    50. CDT, in this case, defects. After all, your choice can’t causally influence your copy’s choice

      How is any decision theory at all relevant to this situation? The premise, that I am a deterministic system, precludes any "decisions" being made, because free will does not exist. It might still feel like I'm making a decision, but that's an interesting quirk of emergent psychology, not a matter relevant to decision theory.

    1. you get public criticism for doing things and making mistakes, not for failing to do anything at all.

      Perhaps we should try to fix this? Like the "Jeff Bezos did not end world hunger" Twitter account.

    2. the perception of extravagance often has little to do with the amount of money actually being spent.

      In the example of a fancy venue that's discounted by its non-profit owner: objecting still seems rational, as long as you believe the objection to extravagance is a signal that can propagate up to the foundation that owns the venue (e.g. you should spend your money on other things, or rent your venue to corporate clients at full price and donate the proceeds).

  4. Jun 2022
    1. most organizations don't have plans, because I haven't taken the time to personally yell at them.

      Why such fatalism that it's too late to make such plans?

    2. it is still Eliezer Yudkowsky writing up this list, says that humanity still has only one gamepiece

      He seems quite confident that the lack of similar work from others is not due to well-thought-out disagreement.

    3. are (a) people who might not be able to do equally great work away from tight feedback loops, (b) people who chose a field where their genius would be nicely legible

      (b) seems reasonable, since it's easy to observe that effecting change requires power and credentialing is helpful for this.

    4. on tackling its enormous lethal problems.  These problems are in fact out of reach; the contemporary field of AI safety has been selected to contain people who go to work in that field anyways.  Almost all of them are there to tackle problems on which they can appear to succeed and publish a paper

      What other method do we have of getting to a point where the problems are not out of reach?

    5. if you're fighting it in an incredibly complicated domain you understand poorly, like human minds, you should expect to be defeated by 'magic' in the sense that even if you saw its strategy you would not understand why that strategy worked. 

      This is a big extrapolation; it seems plausible that there is simply less and less room for magic (e.g. diminishing returns to astonishment) as our understanding of the universe progresses from medieval superstition to current physics to whatever an AGI would understand.

    6. Any pivotal act that is not something we can go do right now, will take advantage of the AGI

      How exhaustive has our own human search for such pivotal acts been?

    7. corrigibility runs actively counter to instrumentally convergent behaviors

      I see how this is true for corrigibility of an (otherwise accurate) world model, but why is it true for corrigibility of goals?

    8. CEV specifically, is unworkable because the complexity of what needs to be aligned or meta-aligned for our Real Actual Values is far out of reach for our FIRST TRY

      How do we know this? Do we expect CEV to be wildly far afield from our current average values, which presumably humans can comprehend?

    9. If you perfectly learn and perfectly maximize the referent of rewards assigned by human operators, that kills them.

      I understand how this leads to deception perhaps, but not killing them.

    10. lethal-to-us possibilities exist in some possible environments underlying every given sense input.

      Because, for example, webcam data can be hacked.

    11. When an outer optimization loop actually produced general intelligence, it broke alignment after it turned general, and did so relatively late in the game of that general intelligence accumulating capability and knowledge

      It's true that not all individual humans currently alive are optimizing for reproductive fitness, but aren't we still in a regime where humans who have more kids will indeed become dominant (e.g. the population of Mormons relative to the rest of the world will be substantial given current trends)?
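
      As a rough sanity check on "given current trends" (the growth rates below are my own made-up assumptions, not data):

      ```python
      # Purely illustrative compounding: a small high-fertility subgroup vs. the rest.

      def subgroup_share(years, share0=0.02, growth_high=0.025, growth_low=0.005):
          high = share0 * (1 + growth_high) ** years
          low = (1 - share0) * (1 + growth_low) ** years
          return high / (high + low)

      for y in (0, 100, 200, 400):
          print(y, round(subgroup_share(y), 3))  # 0.02, ~0.13, ~0.51, ~0.98
      ```

      Even with a persistent fertility advantage, a 2% subgroup takes on the order of two centuries to reach a majority under these made-up rates, so "dominant" is very much a long-run claim.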

    12. around half of the alignment problems of superintelligence will first naturally materialize after that one first starts to appear.

      Can we name what these are, or are we just assuming the existence of unknown unknowns?

    13. if you're starting with an unaligned system and labeling outputs in order to get it to learn alignment, the training regime or building regime must be operating at some lower level of intelligence*capability

      Is the thought here that a sufficiently detailed simulation to train in is not feasible or would necessarily contain sentient simulated humans whose fate we then care about?

    14. proposals for alignment fall apart as soon as you ask "How could you use this to align a system that you could use to shut down all the GPUs in the world?"

      This seems drastic and prone to the unilateralist's curse.

    15. a

      "with a", not "a"

    1. You cannot dissociate intelligence from the context in which it expresses itself.

      Sure, but artificial intelligence would presumably also have ways to sense and receive data from its environment; how is this relevant?

    2. all intelligent systems we know are highly specialized.

      Human intelligence seems very general; our ancestral environment did not have cars and yet teenagers can learn to drive them.

    1. This argument has the flaw of potentially conveying the beliefs of ‘reduce AI and bio x-risk’ without conveying the underlying generators of cause neutrality and carefully searching for the best ways of doing good.

      It seems like talking about cause neutrality only makes this stronger; when dealing with something as weird as human extinction, we need people to think about unfamiliar conceptions of what "doing good" means that might not be as viscerally motivating as, say, helping the homeless, and cause neutrality comes in very handy for this.

    2. when I look for technologies emerging now, still in their infancy but with a lot of potential, AI and synthetic biology stand well above the rest.

      Factory farming seems on par with nuclear weapons as a technology allowing an unprecedented amount of suffering; it would have been extremely high-leverage to prevent it from ever emerging, so it seems an innovation that could stop it from continuing would be hugely valuable.

  5. May 2022
    1. "what the hell do you then?"

      perhaps "what the hell do you do then?"

    2. of of

      "of", not "of of"

    3. facebook

      "Facebook", not "facebook"

    4. with show up on street corners with clipboards

      perhaps "with clipboards show up on street corners"

    5. is

      "its", not "is"

  6. Apr 2022
    1. One thinks that moral claims such as “X is wrong” just mean the same thing as, e.g., “X reduces net happiness,” and she thinks that some of these claims are true. Another thinks that moral claims such as “X is wrong” refer to an irreducible property of wrongness

      How are these different? Isn't the irreducible property of wrongness just "forfeited possible happiness" in the first case?