73 Matching Annotations
  1. Jul 2024
    1. Fortunately, better benchmarks reflecting consequential real-world tasks are being developed.

      Usually benchmarks are quickly followed by AIs beating that benchmark because of increased interest. Therefore, having a benchmark for danger is not necessarily something to be celebrated.

    2. There are many reasons why risk estimates may be systematically inflated

      You are now assigning your own probability range to P(doom), after writing half a post on how such ranges cannot be trusted. Do we know or not? Which is it, snake oil man?

    3. To the extent that we put any stock into these estimates, it should be the forecasting experts’ rather than the AI experts’ estimates.

      You stated that forecasting experts have little to go on during deductive reasoning. Something like mesa-optimisation, goal misgeneralization or adversarial attacks only really makes sense if you know the literature, and you can hardly expect a relative layman to anticipate such failure modes.

      Please cite sources so we can assess the contexts in which superforecasters can be expected to outperform domain experts.

    4. long-term forecasts won’t be resolved anytime soon

      This is also the point where you should have mentioned that nobody gets paid when everyone dies. Prediction markets cannot assess X-risk well, because only the people predicting no risk can ever collect, hence there is a strong bias towards predictions that are too low.

    5. the rational thing to do is to go with the higher end of their range of estimates.

      Being mistaken about reality is very irrational. Acting on expected value when there is a lot of expected value (loss) in tail probabilities is the rational thing to do.

    6. There is little evidence that can change one’s beliefs one way or another when it comes to AI x-risk

      Famously, everyone's timelines shortened by decades at the release of GPT-3.

    7. it is simply empirically undetectable.

      The author found a single metric for which the difference between two forecasters is relatively small.

      It's a bit early to say the entire problem of finding differences between two forecasters (if they only disagree about tail probabilities) is intractable.

    8. the 75th percentile AI expert forecast and the 25th percentile superforecaster forecast differ by at least a factor of 100

      There is some merit to this point, but it needs to be said that people tend not to work well with probabilities near 0 or near 1: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8043721/

      I know that some of the people here were superforecasters, but some aren't, and isn't it also hard to know whether someone is well calibrated on their 0.25%?

    9. forecasts are guaranteed to be guesses rather than the output of a model — after all, no model can be used to estimate the probability that the model itself is wrong, or what the risk would be if the model were wrong.

      This proves too much and would only suit a blog post arguing that forecasting should never be used to inform policy.

    10. These assumptions are far more tenuous than those involved in asteroid modeling

      Yes, we are orders of magnitude more sure about P(asteroid) than P(doom).

    11. AI risk is sufficiently different that quantitative estimates lack the kind of justification needed for legitimacy in policymaking.

      This conclusion is based on a single source. It also wasn't compared to any criteria for when something is sufficiently justified, though maybe I'm asking for too much from what is just a blog post.

    12. None of those tell us anything about the possibility of developing superintelligent AI or losing control over such AI, which are the central sources of uncertainty for AI x-risk forecasting.

      There are reference classes for that too:

      We have never seen animal species domesticate other species of higher intelligence, for a fairly broad space of definitions for what intelligence is.

      The possibility of developing superintelligent AI is arguably easier to assess. We can just look at other goals the AI community attempted to achieve and see whether they managed.

    13. Just look at the attempts to find reference classes for AI x-risk: animal extinction (as a reference class for human extinction), past global transformations such as the industrial revolution (as a reference class for socioeconomic transformation from AI), or accidents causing mass deaths (as a reference class for accidents causing global catastrophe).

      alright, great citation

    14. For existential risk from AI, there is no reference class, as it is an event like no other. To be clear, this is a matter of degree, not kind.

      So with "no" they actually mean "relatively well fitting"?

      I think past extinction events serve as a decent reference class.

      The AI-foom debate between Yudkowsky and Hanson dives into reference classes for a bit.

    15. There are basically only three known ways by which a forecaster can try to convince a skeptic: inductive, deductive, and subjective probability estimation.

      A source would have been nice here. I'm sure someone said this at some point, and perhaps for valid reasons. I'm just curious as to what those reasons are and their scope.

    16. The main aim of this essay is analyzing whether there is any justification for any of the specific x-risk probability estimates that have been cited in the policy debate.

      "any justification" is a really low bar

      I'm getting the sense that I shouldn't take this literally, though I lament that the "main aim" is not made explicit.

    17. A good example is restricting open releases of AI models. Can governments convince people and companies who stand to benefit from open models that they should make this sacrifice because of a speculative future risk?

      Very bad example. You are exploring whether it is beneficial for people to have access to open source models. You need to provide evidence first, before you state that people stand to gain from open source models! (I thought this was going to be evidence-based?)

      inb4: "No it's a good example because they state 'people that stand to benefit' " No, we are asking whether there even exist more than zero people who stand to benefit from open source AI.

    18. we have a strong cognitive bias to view quantified risk estimates as more valid than qualitative ones

      fair, I think people will need to get over this somewhat if we're ever going to use prediction markets more though

    19. You would ask to see our evidence. As obvious as this may seem, it seems to have been forgotten in the AI x-risk debate that probabilities carry no authority by themselves.

      Evidence only means "support for a proposition". It does not necessarily mean "past measurements" or "proof". This sentence immediately jumps to "this seems to have been forgotten in AI x-risk debate", which, considering how broad "evidence" is, is very dismissive.

      I'm starting to doubt the authors were ever searching for the truth in their research leading up to this.

    20. If the two of us predicted an 80% probability of aliens landing on earth in the next ten years, would you take this possibility seriously? Of course not.

      Aliens are brought up for the second time now. A sad state of affairs in AIS discourse is the fact that much is being dismissed simply because it sounds like science fiction.

      The authors aren't explicitly doing that, but I am growing to suspect they are doing this implicitly.

    21. unreliable to be useful for policy, and in fact highly misleading.

      "misleading" implies that you know the direction with regards to where they go wrong. I think they will make a case of "we cannot know" and also "I know better".

    22. An estimate of 10% over a few decades, for example, would obviously be high enough for the issue to be a top priority for society.

      ... well, the field has only existed for 10 years or so; only if people had thought about this 100 years earlier would you be able to get multiple decades of probabilities. Also, prediction markets are very new, so this bar is unreasonably high. I criticise this because anchoring is a known human bias, and if you anchor people on this point, they'll be dissuaded from any probability-based approach whatsoever.

    23. The AI safety community relies heavily on forecasting the probability of human extinction due to AI (in a given timeframe) in order to inform decision making and policy.

      That's controversial. Yudkowsky intentionally does not name a number, and within rationalist circles it's very much encouraged to "go with your gut" after naming the numbers. The consensus is that any p(doom) has large error bars, and if you claim yours does not, you're in the minority.

    24. evidence-based approach

      I googled the word to be sure, but "evidence-based" says very little. It only promises that you supply support for the statement, which is such a low bar that it makes me anxious thinking about the alternative.

    25. existential risks (x-risks) are necessarily somewhat speculative: by the time there is concrete evidence, it may be too late

      The AIS debate is not so good that this point always comes up. I'm grateful it already did early on here.

    26. I learned about this on Twitter through David Krueger's criticism.

      I was also looking to write a post where I explain my beliefs by contrasting them against others.

      I strongly disagree with this title.

      So you can say I am not coming in neutral, though I will try to fight my confirmation bias during my read.

    27. given the lack of consensus

      As late as 1960, only 1 in 3 doctors believed that cigarettes were harmful, even though evidence for that had already come out in the 1940s.

      Lack of consensus can arise for numerous reasons, not all of which are valid.

    1. Latent variables (ztrain) are extracted and concatenated for the first 31 layers of the hierarchy by passing training images (Ytrain) into the pretrained VDVAE Encoder. A ridge regression model (Regressor) is trained between fMRI patterns (Xtrain) and corresponding latent variables (ztrain). Testing Stage (right). Test fMRI data (Xtest) are passed through the trained Regressor to obtain predicted latent variables (ẑtest). These predicted latent variables are fed to the pretrained VDVAE Decoder to get the low-level reconstruction (Ŷlow) of the test images (Ytest), which will serve as a sort of “initial guess” for the second stage. Note that all VDVAE layers (encoder and decoder blocks) are pretrained and frozen, only the brain-to-latent regression layer (blue box) is trained.

      The VDVAE comes pre-trained. That seems plausible: a general image-reconstruction VDVAE could well exist, and if not, they could train it on the dataset. The only thing of note is the regressor, which seems to be a linear regressor. I wonder why they use this, and not just feed images through the VDVAE.
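
The brain-to-latent regression step in the caption above can be sketched roughly as follows. This is a minimal sketch on random stand-in data, not the paper's code: the array shapes, the regularization strength `alpha`, the absence of an intercept term, and the helper names `fit_ridge` and `predict` are all assumptions made for illustration, and the frozen VDVAE encoder and decoder are left out entirely. It only shows that the trained part is a linear map from fMRI patterns to latent variables, which is what gets applied at test time when only fMRI data is available as input.

```python
import numpy as np


def fit_ridge(X, Z, alpha=1.0):
    """Closed-form ridge regression: W = (X^T X + alpha * I)^-1 X^T Z."""
    n_features = X.shape[1]
    gram = X.T @ X + alpha * np.eye(n_features)
    return np.linalg.solve(gram, X.T @ Z)


def predict(X, W):
    """Map fMRI patterns to predicted latent variables."""
    return X @ W


# Hypothetical sizes and synthetic data standing in for real fMRI and latents.
rng = np.random.default_rng(0)
n_train, n_test, n_voxels, n_latent = 200, 20, 50, 8
true_W = rng.normal(size=(n_voxels, n_latent))

X_train = rng.normal(size=(n_train, n_voxels))  # stand-in for Xtrain
z_train = X_train @ true_W + 0.1 * rng.normal(size=(n_train, n_latent))
X_test = rng.normal(size=(n_test, n_voxels))    # stand-in for Xtest
z_test = X_test @ true_W                        # noise-free targets for checking

W = fit_ridge(X_train, z_train, alpha=1.0)      # the only trained component
z_hat_test = predict(X_test, W)                 # would be fed to the frozen decoder

rel_err = np.linalg.norm(z_hat_test - z_test) / np.linalg.norm(z_test)
print(f"relative error of predicted latents: {rel_err:.3f}")
```

In the real pipeline the predicted latents would go into the pretrained decoder; here they are only compared against the synthetic targets.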

  2. Dec 2023
    1. However, these abstractions are not directly learned, but emerge as a by-product of minimizing task losses. Therefore, the quality of these abstractions cannot be guaranteed, depriving neural networks of the benefits that real abstractions would offer.

      There is a long history of AI models with hard-coded grammar rules that got massively outperformed by transformers that have no prior knowledge about such rules.

    2. So how is human intelligence formed? This remains an open question, and we hypothesize that there are at least three key elements.

      I would trust statements like these most from neurologists, and I do not believe neurologists are this close to understanding the formation of human intelligence. The lack of citations here is damning. These researchers did not start off asking the right question.

    3. Specifically, we propose a partition structure that contains pre-allocated abstraction neurons; we formulate abstraction learning as a constrained optimization problem, which integrates abstraction properties; we develop a network evolution algorithm to solve this problem

      A bitter lesson that DL should have taught most people by now is that hardcoding desirable properties tends to work worse than letting those emerge. Without reading further, I think this paper will be forgotten by time, if it isn't already.

    4. This does not seem to pretend to be about alignment. A ctrl+f on safety or alignment or risk yields nothing. I have a high P on this not belonging on this site.

  3. Nov 2023
    1. level

      It's unclear what "interpretability level" means. From the previous context, I would expect that to mean the step number, but HEM seems to prescribe applying every step while choosing a specific method per step. It feels weird, however, to consider the methods per step to have a hierarchy such that the word "level" can be accurately ascribed.

    2. the decision-making process

      Which decision-making process? The use-cases for HEM are unknown to me. Is the successful application of HEM (how do you measure success?) sufficient for having your stock-trading bot trade with hundreds of dollars unsupervised? Probably not. Is it a way to evaluate whether some LLM can be released? HEM could incidentally find biases present in LLM output, but how much should you update based on that?

    3. HEM is a comprehensive approach

      Comprehensive in what way? The methods mentioned all have flaws and scoped domains. Using several of these methods is a standard ML-like way of partially dealing with the limitations of individual methods, but they are not at all comprehensively complementary.

      This would also not be the only way. Saliency methods only give results local to the input without detailing how the model will behave for other inputs. It's unclear whether saliency methods will still be relevant in 4 years.

    4. Use techniques like LIME or SHAP to generate explanations for individual predictions

      Why does this mention these methods again? This entire report feels like a first draft.

    1. "how can eroding part of oneself's away be okay for you to do, if you consider yourself a person/moral patient?"

      I did not realize people (angels?) asked this question.

    2. If you act like an anime character, you win in anime and you lose in real life. If you act like a ratfic character, you win in ratfic and you win in real life.

      i.e. only/mostly consume ratfic

  4. Oct 2023
    1. unless you can get over that grand hump, it looks to me like one of the key bottlenecks here is bureaucratic legibility of plausible solutions.

      A not-yet solution might be ai-plans.com which invites the public to criticize publications.

    2. Suppose further that everyone agreed that the task at hand was to fully and deeply understand the AI systems we’ve managed to develop so far, and understand how they work, to the point where people could reverse out the pertinent algorithms and data-structures and what-not.

      Nate Soares finds it important for us to deeply understand AI. I think a significant portion of his P(Doom) comes from our current lack of (predictive) understanding.

    3. when you grow minds, they don’t care about what you ask them to care about and they don’t care about what you train them to care about; instead, I expect them to care about a bunch of correlates of the training signal in weird and specific ways.

      Like how humans optimized for genetic fitness instead pursue weird, specific values like liking ice cream but only if it's cold.

    1. Relatively mundane changes in sensor technology, cyberweapons, and autonomous weapons could increase the risk of nuclear war

      ???

    2. we see that technology can produce social harms, or fail to have its benefits realized, because of a host of structural dynamics

      The split between the superintelligence perspective and the other perspectives seems to describe a temporal relation. Structural problems are slowly popping up now. But I consider tackling these contemporary problems to be different from X-risk, even though they bleed into each other. I fear future confusions where people preventing misaligned ASI are held accountable for not preventing the deployment of some company's LLM that sometimes doxxes people.

    3. a diverse, global, ecology of AI systems. Some may be like agents, but others may be more like complex services, systems, or corporations. These systems, individually or in collaboration with humans, could give rise to cognitive capabilities in strategically important tasks that exceed what humans are otherwise capable of

      The ecology perspective is still dangerous, though I honestly think that subhuman AI will not organize itself well enough to form a threat; that would be more a story of humans disempowering other humans. If the AIs forming the ecology are superhuman, I expect one to FOOM and disempower or assimilate the rest.

    4. Problems of building safe superintelligence are made all the more difficult if the researchers, labs, companies, and countries developing advanced AI perceive themselves to be in an intense winner-take-all race with each other, since then each developer will face a strong incentive to “cut corners”

      problem of managing AI competition

    5. A subsequent governance problem concerns how the developer should institutionalize control over and share the bounty from its superintelligence

      problem of constitution design

  5. Jan 2023
    1. What does it mean to "trade against" a computer program that outputs credences?

      The logical inductor sets prices on sentences between $0 and $1. "Buying a credence" (I think) means that you hand the inductor the amount of money it asks for, and the inductor promises to pay you back $1 if an observation occurs that confirms the sentence and $0 if an observation occurs that falsifies the sentence.

      So if an inductor has a high price on a sentence (a high credence), then that means the inductor considers it likely that the sentence will be confirmed at some point in the future.

      Personal note: the inductor might not immediately realize that a sentence has been confirmed even when it has according to someone with unbounded compute. Would the inductor refuse to immediately pay out in such cases?
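
Under the reading in the note above (which is itself hedged with "I think"), the payoff of buying one share of a sentence can be sketched like this. The function names `share_payoff` and `expected_profit`, and the treatment of an unresolved sentence as money simply being tied up, are assumptions for illustration rather than anything taken from the logical induction write-up.

```python
def share_payoff(price, outcome):
    """Net profit from buying one share of a sentence at `price` dollars.

    outcome is True if the sentence was confirmed, False if it was
    falsified, and None if it has not been resolved yet.
    """
    if not 0.0 <= price <= 1.0:
        raise ValueError("prices must lie between $0 and $1")
    if outcome is None:
        return -price  # assumption: nothing is paid back until resolution
    return (1.0 if outcome else 0.0) - price


def expected_profit(price, believed_probability):
    """Expected profit for a buyer who thinks the sentence holds with
    probability `believed_probability`; algebraically this is just
    believed_probability - price."""
    return (believed_probability * share_payoff(price, True)
            + (1.0 - believed_probability) * share_payoff(price, False))


print(share_payoff(0.8, True))     # small gain: the sentence was expensive
print(share_payoff(0.8, False))    # lose the full price paid
print(expected_profit(0.8, 0.9))   # positive only if you believe more than the price
```

The last line illustrates why a high price reads as a high credence: buying only looks worthwhile to someone who thinks the sentence is even more likely than the price suggests.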

    2. Probability theory says that credences are "reasonable" if it is impossible for someone to bet against you in a way that is expected to make money, independent of the true state of the world (a Dutch book). Logical induction says that credences are "reasonable" if it is impossible for someone to bet against you in a way that makes more and more money over time with no corresponding down-side risk.

      reasonability := a bound on how much a false credence can be exploited by an outside actor
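
A small worked example of the Dutch book idea from the highlighted passage: if someone's prices for a sentence A and for its negation do not sum to $1, an outside trader can lock in the same profit in every possible world. The function name `trader_profit` and the example prices are assumptions for illustration only.

```python
def trader_profit(price_a, price_not_a, a_is_true):
    """Profit of a trader exploiting an agent's prices on A and not-A.

    Exactly one of the two shares pays out $1 in any world. If the agent's
    prices sum to more than $1, the trader sells both shares to the agent;
    otherwise the trader buys both shares from the agent.
    """
    payout_a = 1.0 if a_is_true else 0.0
    payout_not_a = 0.0 if a_is_true else 1.0
    total_price = price_a + price_not_a
    total_payout = payout_a + payout_not_a  # always exactly $1
    if total_price > 1.0:
        return total_price - total_payout   # collect the prices, owe $1
    return total_payout - total_price       # pay the prices, receive $1


# Incoherent credences: 60% on A and 60% on not-A.
for world in (True, False):
    print(world, round(trader_profit(0.6, 0.6, world), 2))

# Coherent credences leave nothing to exploit with this strategy.
for world in (True, False):
    print(world, round(trader_profit(0.6, 0.4, world), 2))
```

The profit is the same whether A turns out true or false, which is the "independent of the true state of the world" part of the definition.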

    3. Probability theory and logical induction both provide concrete operationalizations of "quantified uncertainty" (henceforth "credence"), and what it means for a set of credences to be "reasonable".

      credence := quantified uncertainty

    4. In the remainder of this document we will refer to a binary-valued variable as an "atom" and to a logical statement about some variables as a "logical sentence". A logical sentence that we pass to our logical inductor as "observed true" will be referred to as an "observation".

      atom := binary-valued variable
      logical sentence := a logical statement about some variables
      observation := a logical sentence passed to the logical inductor as "observed true"

  6. Nov 2022
    1. BDSM didn’t exist in dath ilan. I don’t really know why. Maybe everyone in dath ilan who realized that they wanted to be hurt, categorized themselves as having the stereotypically nonvirtuous quality of self-destructiveness, and kept quiet about it,

      and eliminated those qualities by way of self-modification

    2. there would be centralized development of movies you watched on your own, and the training-games you played in what I won’t insult by calling it a school, and experiments to find out which variations worked.

      dath ilan's universities at scale. They object to testing being done by the same teachers, and to students paying up-front regardless of success.

    3. This Earth cannot resolve circular dependencies and almost always gets stuck in Nash equilibria.

      The example is that underground car tunnels aren't popular because of gas fumes. Those gas fumes would not be a problem for electric cars, but those electric cars could, with current-day tech, only safely ride in underground tunnels.

  7. Oct 2022
    1. Good (but hard) exercise: Code your own tiny GPT-2 and train it. If you can do this, I’d say that you basically fully understand the transformer architecture. Example of basic training boilerplate and train script. The EasyTransformer codebase is probably good to riff off of here

      todo

  8. Aug 2022
    1. cd /mnt/arch # or where you are preparing the chroot dir
       mount -t proc /proc proc/
       mount --rbind /sys sys/
       mount --rbind /dev dev/

      mandatory virtual filesystem mounts

    1. This worked for setting my locale to en_US.UTF-8. Before this, locale-gen would not generate any locales even though I checked it in dpkg-reconfigure locales

  9. Jul 2022
    1. Assigning $1 to $1 in $1=$1 modifies a field ($1 in this case) and that results in awk rebuilding the record $0. Rebuilding the record replaces the delimiters FS with OFS.

      You can replace a separator by setting FS and OFS and then prompting awk to rebuild the record.
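
The effect of the `$1=$1` trick can be emulated in a few lines of Python. This is only a toy model of the behaviour described in the highlight, not awk itself: the helper name `rebuild_record` is made up, and it only handles literal separators (no regular expressions). The corresponding awk one-liner would be something like `awk 'BEGIN{FS=":"; OFS="-"} {$1=$1; print}'`.

```python
def rebuild_record(record, fs=None, ofs=" "):
    """Toy emulation of awk's `$1=$1`: split the record on FS, then rejoin
    the fields with OFS.

    fs=None mimics awk's default FS: fields are separated by runs of
    whitespace, and leading/trailing whitespace is ignored.
    """
    fields = record.split(fs)
    return ofs.join(fields)


print(rebuild_record("a:b:c", fs=":", ofs="-"))   # separators swapped
print(rebuild_record("  a   b c ", ofs=","))      # default FS, whitespace collapsed
```

Without the assignment to a field, awk prints the record unchanged, so the separators are only replaced once the record has been rebuilt.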

    1. Each record is split by the FS delimiter, # which defaults to white-space

      Records are split by FS into "fields". Patterns are applied to records, not to individual fields. You can determine how the input is split into records using RS. Unlike FS, RS cannot be used to split on every individual character: in gawk an empty FS makes each character its own field, whereas an empty RS selects paragraph mode.

    1. :map and its friends are the key, :verbose adds info and :redir allows post-search refinement. They are a perfect mix to show what command is bound to what shortcut and vice versa, but if you want to search some keys and avoid temp files whenever you need to search mappings, take a look at scriptease and its :Verbose command. It is a wrapper on :verbose that shows the result in a preview window. This way you can search whatever you want inside the results without using temp files: type :Verbose map and use / ? as usual.

      finding vim commands

    1. It doesn't seem like you have any highlights of this type — yet! Start taking highlights and we'll automatically get started when you have some.

      testing testing