36 Matching Annotations
  1. Oct 2024
    1. doing more than fairly basic math

      Another apples/oranges comparison. We have software that is good at math; regular people call them spreadsheets. What we don't have, in algogens either, is software that understands what it is doing. 'My model can do sums' is not a useful comparison for whether it can do cognitive tasks.

    2. Essentially anything that a remote worker can do, AI will do better

      Weird notion of remote work as consisting only of screen interaction. My team works remotely, meaning they think independently of any screen tasks.

    3. Machine learning is a young field,

      Young? The author is in their 20s; a case of 'my first encounter with something means it is globally new'?

    4. I expect AI to get much better than it is today. Research on AI systems has shown that they predictably improve given better algorithms, more and better quality data, and more computational power. Labs are in the process of further scaling up their clusters—the groupings of computers that the algorithms run on.

      Ah, an article based on the assumption of future improvement. Compute and data are limiting factors, and you end up weighing whether the compute footprint is more efficient than doing the work yourself. Data is even more limiting, as the most meaningful stuff is qualitative rather than quantitative, and statistics on the qualitative stuff won't give you meaning (LLMs being a case in point).

    5. The shared goal of the field of artificial intelligence is to create a system that can do anything. I expect us to soon reach it.

      Is it though? W.r.t. AGI that is as far away as before, imo. The rainbow never gets nearer, because its position depends on your own.

    6. The economically and politically relevant comparison on most tasks is not whether the language model is better than the best human, it is whether they are better than the human who would otherwise do that task

      True, and that is where this fails outside of bullshit tasks. The unmentioned assumption here is that algogen output can have meaning, rather than just coherence and plausibility.

    7. The general reaction to language models among knowledge workers is one of denial.

      Equates 'content production' with knowledge work.

    8. my ability to write large amounts of content quickly

      Right: 'content production' where the actual meaning isn't relevant?

    9. it can competently generate cogent content on a wide range of topics. It can summarize and analyze texts passably well

      'Cogent content' and 'passably well' aren't the quality benchmark for knowledge work, though.

    1. https://web.archive.org/web/20241007071434/https://www.dbreunig.com/2024/10/03/we-need-help-with-discovery-more-than-generation.html

      The author says generation isn't a problem to solve for AI; there's enough 'content' as it is. Posits discovery as the bigger problem to solve. The issue there is that discovery is far more personal and less suited to VC-funded efforts to create a generic tool they can scale from the centre. Discovery is not a thing, it's an individual act. It requires local tools, tuned to my interests, networks etc. Curation is a personal thing, providing the intent behind discovery. Same reason why [[Algemene event discovery is moeilijk 20150926120836]], and [[Event discovery is sociale onderhandeling 20150926120120]]. Still, it's doable, but more as an agent than as a central tool.

  2. Sep 2024
    1. The field I know as "natural language processing" is hard to find these days. It's all being devoured by generative AI. Other techniques still exist but generative AI sucks up all the air in the room and gets all the money. It's rare to see NLP research that doesn't have a dependency on closed data controlled by OpenAI and Google

      Robyn Speer says that in his view natural language processing as a field has been taken over by #algogens, and most NLP research now depends on closed data from the #algogens providers.

    2. Reddit also stopped providing public data archives, and now they sell their archives at a price that only OpenAI will pay.

      Reddit was another key data source for wordfreq, but it too no longer provides public archives, and sells them at a price only the likes of the #algogens providers will pay.

    3. As one example, Philip Shapira reports that ChatGPT (OpenAI's popular brand of generative language model circa 2024) is obsessed with the word "delve" in a way that people never have been, and caused its overall frequency to increase by an order of magnitude.

      Example of how #algogens slop pollutes corpus data: ChatGPT uses the word 'delve' a lot, an order of magnitude above human usage. #openvraag Is this to do with the 'need' for #algogens to sound more human by switching words around (dial down the randomness and it will give the same output every time, but will also immediately stand out as computer-generated)?

    1. paywalled article.

      Wordfreq is shutting down because LLM output on the web is polluting its data to the point of uselessness. It tracked, longitudinally, the change in use of words across a variety of languages. Cf. the human-centipede epistemology in [[Talk The Expanding Dark Forest and Generative AI]] by [[Maggie Appleton]]
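
      For reference, this is the kind of lookup wordfreq provides; a minimal sketch (the word choices are illustrative only, not Speer's or Shapira's actual measurements):

      ```python
      # pip install wordfreq
      # Look up how common a word is in wordfreq's multi-source corpus data.
      from wordfreq import word_frequency, zipf_frequency

      for word in ["delve", "explore", "the"]:
          f = word_frequency(word, "en")   # proportion of all word occurrences
          z = zipf_frequency(word, "en")   # log scale: ~8 very common, ~1 rare
          print(f"{word:8s} frequency={f:.2e} zipf={z:.2f}")
      ```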

    1. there has been no serious attempt by digital media developers to engage in a constructive public dialogue with historians of information and leading librarians. There is, perhaps, a reason for this. As Geoffrey Nunberg starkly revealed in 2009 in the Chronicle of Higher Education, Google cannot celebrate the history of indexing and cataloguing because it would draw attention to its matrix of errors. As of yet, Google Books does not work as an accurate system of cataloguing and searching for books. Nunberg showed that the seemingly clunky nineteenth-century Library of Congress Classification system is still more accurate.

      A point worth repeating. I think there is a strong parallel here with algogens: the way 'progress' in released models is celebrated by e.g. Donald Clark. It beats a PhD exam, it does critical thinking (CT), etc. What would comparison with their deep roots yield, though? Keep history short so you may be the biggest giant of all time.

    1. You have to see this new approach as not providing simple solutions to single prompts but predicting and planning, multi-stage tasks, with far more penetrative judgement. You can get it to do the market research or needs analysis, then scenario analysis to evaluate potential outcomes

      How is this different from the prompt-chaining thing I run locally? You prompt it with a bunch of steps, and it then self-prompts each one and goes online to further detail or carry them out.
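
      For comparison, a minimal sketch of that prompt-chaining pattern, assuming a hypothetical call_llm() wrapper around whatever local model endpoint one uses:

      ```python
      def call_llm(prompt: str) -> str:
          # Hypothetical stand-in: wire this up to your local model's API.
          return f"[model output for: {prompt[:40]}...]"

      def run_chained(task: str) -> list[str]:
          # One prompt to produce a plan, split into steps.
          plan = call_llm(f"Break this task into numbered steps:\n{task}")
          steps = [s for s in plan.splitlines() if s.strip()]

          # Self-prompt each step, carrying the accumulated context along.
          context, results = task, []
          for step in steps:
              answer = call_llm(f"Context so far:\n{context}\n\nNow carry out: {step}")
              results.append(answer)
              context += f"\n{step}\n{answer}"
          return results

      print(run_chained("Do a market research scan for product X"))
      ```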

    2. We will now get hundreds of thousands of real use cases in the real world. The old days of release a perfect product are gone

      Yeah, externalising the cost of getting it wrong, at scale. Testing in real-world circumstances is extremely useful and needed, yet OpenAI's general-public customers will mostly not apply their own CT and will assume any result is true (I've seen it happen a lot), moving the cost externalisation further down the chain, where it is more likely to have negative real-world consequences.

    3. https://web.archive.org/web/20240916044350/https://donaldclarkplanb.blogspot.com/2024/09/critical-thinking-was-famous-21st.html

      Donald Clark on the latest closed OpenAI iteration. I can see how it may do the rule-based bits of CT (although OpenAI's stuff until now still gets very basic things wrong even before getting to CT, and at unpredictable times, making every output suspect and in need of checking). Anything rule-based is codable, or will stand out as a pattern for prediction.

      Says AGI is now more visible. Seems rather Chomsky-esque, assuming language is thinking. Sociocentric (CT!) too, w.r.t. English, while the rest of the linguistics world has a century of saying language is communication, and thinking a different thing.

  3. Mar 2024
      Next to the xz debacle, where a maintainer was psyops'd into backdooring servers, this is another new attack surface: AI tools make up software package names in what they generate, and those packages get downloaded. So introducing malware is a matter of publishing malicious packages under the names that AI tools repeatedly hallucinate.
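
      A minimal defensive sketch: before installing anything an AI tool suggested, check that the package actually exists on the registry. This uses PyPI's public JSON API; the function name pypi_exists is my own, and it's a sanity check only, since an attacker can pre-register a hallucinated name.

      ```python
      import json
      import urllib.error
      import urllib.request

      def pypi_exists(name: str) -> bool:
          # PyPI returns package metadata as JSON, or a 404 if it doesn't exist.
          url = f"https://pypi.org/pypi/{name}/json"
          try:
              with urllib.request.urlopen(url) as resp:
                  info = json.load(resp)["info"]
                  print(f"{name}: exists, latest version {info['version']}")
                  return True
          except urllib.error.HTTPError as e:
              if e.code == 404:
                  print(f"{name}: not on PyPI (hallucinated, or a typo?)")
                  return False
              raise

      pypi_exists("requests")                         # a real package
      pypi_exists("definitely-not-a-real-pkg-12345")  # likely hallucinated
      ```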

    1. https://web.archive.org/web/20240305082302/https://aiedusimplified.substack.com/p/on-not-using-generative-ai

      This seems an interesting piece on the use of algogens. It probably does not address the issues around transparency, labour, footprint etc., but it does seem to address the search for the spot where algogens are useful in one's own workflow. Like mine in [[Coding Personal Tools With GitHub Co-Pilot]]: there, getting to action faster and saving time are key, but only if you use it as an intermediate step, never as a result to be used as-is or as final output.

      Via [[Stephen Downes]] https://www.downes.ca/post/76336

  4. Feb 2024
    1. Broderick makes a more important point: AI search is about summarizing web results so you don't have to click links and read the pages yourself. If that's the future of the web, who the fuck is going to write those pages that the summarizer summarizes? What is the incentive, the business-model, the rational explanation for predicting a world in which millions of us go on writing web-pages, when the gatekeepers to the web have promised to rig the game so that no one will ever visit those pages, or read what we've written there, or even know it was us who wrote the underlying material the summarizer just summarized? If we stop writing the web, AIs will have to summarize each other, forming an inhuman centipede of botshit-ingestion. This is bad news, because there's pretty solid mathematical evidence that training a bot on botshit makes it absolutely useless. Or, as the authors of the paper – including the eminent cryptographer Ross Anderson – put it, "using model-generated content in training causes irreversible defects"

      Broderick: https://www.garbageday.email/p/ai-search-doomsday-cult, Anderson: https://arxiv.org/abs/2305.17493

      AI search hides the authors of the material it presents; summarising it abstracts the authors away. It doesn't bring readers to those authors, it just presents a summary to the searcher as the end result. Take it or leave it. At the same time, if you search for something you know about, you see those summaries are always off, leaving you guessing how far off they are when searching for something you don't know about. Search should never be the endpoint, always a starting point; I think that is my main aversion to AI search tools. Despite the clamour that 'it will get better over time', I don't think it easily will, because neither the tool nor its makers necessarily have any interest in the quality of the output, and they definitely can't assess it. So what's next, humans fact-checking AI output? Why not prevent the bs at its source? Nice ref to Maggie Appleton's centipede metaphor in [[The Expanding Dark Forest and Generative AI]]
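
      The 'irreversible defects' claim can be made concrete with a toy simulation (my own sketch, not the method from the Anderson paper): fit a model to samples drawn from the previous generation's model, repeat, and watch the distribution degrade.

      ```python
      # Toy model-collapse illustration: each 'generation' is a Gaussian fitted
      # to a finite sample drawn from the previous generation's fit. Estimation
      # noise compounds, so the fitted sigma does a noisy, downward-biased walk
      # and the original distribution's tails get lost.
      import numpy as np

      rng = np.random.default_rng(42)
      mu, sigma = 0.0, 1.0   # generation 0: the 'human' distribution
      n = 100                # finite training sample per generation

      for gen in range(1, 21):
          sample = rng.normal(mu, sigma, n)      # train only on model output
          mu, sigma = sample.mean(), sample.std()
          print(f"gen {gen:2d}: mu={mu:+.3f} sigma={sigma:.3f}")
      ```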

  5. Jan 2024
  6. Dec 2023
    1. "hadn’t seriously considered the future economic impact on illustrators" This sounds too much like the 'every illegal download is a misplaced sale' trope of the music industry. There are many reasons to not use algogens, or opt for different models for such generation than the most popular public facing tools. Missed income for illustrators by using them in blog posts isn't one. Like with music downloads there's a whole world of users underneath the Cosean floor. My blog or presentations will never use bought illustrations, I started making lots of digital photos for that reason way back in 2003, and have been using open Creative Commons licenses. And now may try to generate a few images, if it's not too work intensive. Not to say that outside the mentioned use case of blogs and other sites (the ones that already now are indistinguishable from generated texts and only have generating ad eyeballs as purpose), the lower end of the existing market will get eroded. I bet that at the same time there will be a growing market for clearly human made artefacts as status symbol too. The Reverse Turing effect in play. I've paid more for prints of artwork, both graphics and photos, made in the presence of the artist than one printed after their death for instance. They adorn the walls at home rather than my blog though.

  7. Nov 2023
    1. Creative Commons can be relied upon to take a generally pro-ownership and libertarian stance regarding rules and regulation

      This is bothersome, seen from my perspective as both a CC advocate (active in a European national chapter) and a CC-using maker. In my experience makers use CC because they want to limit the ownership that current international copyright laws and treaties bestow on them, seeing those laws as an obstacle and as greedy, generally serving not the maker but the later exploiters of artefacts. The perspective of contributing to the common good / the pool of culture is also frequent, and counters libertarian angles. I need to check, but I think it might also be a ways off from Lessig's original idea for CC as expressed in [[Free Culture by Lawrence Lessig]].

    2. https://web.archive.org/web/20231108095251/https://www.downes.ca/cgi-bin/page.cgi?post=75761

      [[Stephen Downes]] on CC and their answers to US copyright questions w.r.t. generative algorithms.

  8. Sep 2023
      https://www.filosofieinactie.nl/blog/2023/9/5/open-source-large-language-models-an-ethical-reflection (archive version not working) Follow-up w.r.t. the openness of LLMs, after the publication by the interprovincial ethics committee on ChatGPT usage within the provincial public sector in NL. At the end it mentions the work by Radboud University I pointed them to. What are their conclusions / propositions?

  9. Aug 2023
      Roland Barthes (1915-1980, France, literary critic/theorist) declared the death of the author (in English in 1967 and in French a year later). An author's intentions and biography are not the means to explain definitively what the meaning of a (fictional, I think) text is. [[Observator geeft betekenis 20210417124703]], i.e. the reader determines it.

      Barthes reduces the author to the scriptor, who does not exist beyond the production of the text. The work stands entirely apart from its maker. Came across it in [[Information edited by Ann Blair]], in the lemma on the Reader.

      I don't disagree with the notion that readers glean layers of meaning from a text that the author did not intend. But thinking about the author's intent is one of those layers. Separating the author from their work entirely cuts you off from one source of potential meaning.

      In [[Generative AI detectie doe je met context 20230407085245]] I posit that seeing the author through the text is a necessity as proof of human creation, as opposed to #algogens output. My point there is that in generated text there's only a scriptor, and no author whose own meaning, intention and existence becomes visible in the text.

  10. May 2023
      This clearly does not represent all human cultures and languages and ways of being. We are taking an already dominant way of seeing the world and generating even more content reinforcing that dominance

      Amplifying dominant perspectives: a feedback loop that ignores all of humanity falling outside the original training set, which is impoverishing itself, while likely also extending the societal inequality the data represents. Given how such early weaving errors determine the future (see fridges), I don't expect that to change even with more data in the future. The first discrepancy will not be overcome.

      This means they primarily represent the generalised views of a majority English-speaking, western population who have written a lot on Reddit and lived between about 1900 and 2023. Which in the grand scheme of history and geography, is an incredibly narrow slice of humanity.

      Appleton points to the inherently and severely limited training set, and hence perspective, that is embedded in LLMs. Most of current human society, of history and of the future is excluded. This goes back to my take on data and blind faith in using it: [[Data geeft klein deel werkelijkheid slecht weer 20201219122618]] and [[Check data against reality 20201219145507]]

      But a language model is not a person with a fixed identity. They know nothing about the cultural context of who they’re talking to. They take on different characters depending on how you prompt them and don’t hold fixed opinions. They are not speaking from one stable social position.

      Algogens aren't fixed social entities/identities, but mirrors of the prompts they're given.

      A big part of this limitation is that these models only deal with language. And language is only one small part of how a human understands and processes the world. We perceive and reason and interact with the world via spatial reasoning, embodiment, sense of time, touch, taste, memory, vision, and sound. These are all pre-linguistic. And they live in an entirely separate part of the brain from language. Generating text strings is not the end-all be-all of what it means to be intelligent or human.

      Algogens are disconnected from reality. And, this seems a key point: our own cognition and relation to reality is not just through language (and by extension not just through the language centre in our brain). Spatial awareness, embodiment, senses, time awareness are all not language. It is overly reductionist to treat intelligence or even humanity as language only.

    5. This disconnect between its superhuman intelligence and incompetence is one of the hardest things to reconcile.

      Generative AI as very smart and super incompetent at the same time, which is hard to reconcile. Is this a [[Monstertheorie 20030725114320]]-style cultural category challenge? Or is the basic one that of replacing human cognition?

      But there are a few key differences between content generated by models versus content made by humans. First is its connection to reality. Second, the social context they live within. And finally their potential for human relationships.

      Yes, all generated content is devoid of an author context, for example. It's flat and 2D in that sense, and usually fully self-contained: no references to actual experiences, experiments or things outside the scope of the immediate text. As I describe in https://hypothes.is/a/kpthXCuQEe2TcGOizzoJrQ

      Most of the tools and examples I’ve shown so far have a fairly simple architecture. They’re made by feeding a single input, or prompt, into the big black mystery box of a language model. (We call them black boxes because we don't know that much about how they reason or produce answers. It's a mystery to everyone, including their creators.) And we get a single output – an image, some text, or an article.

      Generative AI currently follows the pattern of one input and one output. There's no reason to expect it will stay that way. Outputs can scale: if you can generate one text supporting your viewpoint, you can generate 1,000 and spread them all as original content. The use of those outputs will get more clever.
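
      To illustrate the scaling point: the single-prompt pattern and its looped version differ by one line of code. A sketch with a hypothetical generate() stand-in for any model API:

      ```python
      def generate(prompt: str) -> str:
          # Hypothetical stand-in for a call to any generative model API.
          return f"[output for: {prompt}]"

      # Today's common pattern: one input, one output.
      one_text = generate("Argue that X is true")

      # The same call in a loop: 1,000 'original' variants for the same effort.
      variants = [generate(f"Argue that X is true, angle {i}") for i in range(1000)]
      print(len(variants))
      ```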

      By now language models have been turned into lots of easy-to-use products. You don't need any understanding of models or technical skills to use them. These are some popular copywriting apps out in the world: Jasper, Copy.ai, Moonbeam

      Mentioned copywriting algogens:

      * Jasper
      * Wordtune
      * copy.ai
      * quillbot
      * sudowrite
      * copysmith
      * moonbeam