the future of video generation may depend more on language models and agents than on diffusion alone
大多数人认为扩散模型(diffusion models)是视频生成的核心技术,并将持续主导这一领域,但作者认为未来视频生成的发展将更多地依赖于语言模型和代理技术,而非单纯的扩散方法。这挑战了当前AI生成领域的技术共识,暗示了语言模型可能在视频生成中扮演更重要的角色。
the future of video generation may depend more on language models and agents than on diffusion alone
大多数人认为扩散模型(diffusion models)是视频生成的核心技术,并将持续主导这一领域,但作者认为未来视频生成的发展将更多地依赖于语言模型和代理技术,而非单纯的扩散方法。这挑战了当前AI生成领域的技术共识,暗示了语言模型可能在视频生成中扮演更重要的角色。
First, if you’re using frontier models through the API or a pro subscription, you have significantly more control than most people realize. Your data generally isn’t feeding back into training. You’re using the model as a tool, not handing over your content to a platform. That’s a meaningfully different relationship than the one you have with social media companies, where you’re feeding them data, and their business model is based on monetizing that data.
This is only true if, and only if, you actually believe Sam Altman* is telling the truth and won't change his mind in the future. If you don't think you can win a a lawsuit against OpenAI, and I mean actually win it, not just be correct about them violating their contract with you, then this particular argument does not hold for you.
*or Dario Amodei, but they're cut from the same cloth
A small number of samples can poison LLMs of any size<br /> by [[Anthropic]] from 2025-10-09 accessed on 2025-12-30T15:26:42
Interesting position paper about how to have useful discussion about AGI and Foundation models.
Detailed explanation of what DeepSeek model is doing differently to improve performance and training time over ChatGPT.
when you want to use Google, you go into Google search, and you type in English, and it matches the English with the English. What if we could do this in FreeSpeech instead? I have a suspicion that if we did this, we'd find that algorithms like searching, like retrieval, all of these things, are much simpler and also more effective, because they don't process the data structure of speech. Instead they're processing the data structure of thought
for - indyweb dev - question - alternative to AI Large Language Models? - Is indyweb functionality the same as Freespeech functionality? - from TED Talk - YouTube - A word game to convey any language - Ajit Narayanan - data structure of thought - from TED Talk - YouTube - A word game to convey any language - Ajit Narayanan
https://metagov.org/projects/koi-pond
Metagov's KOI (Knowledge Organization Infrastructure) is a graph database that supports relationships between knowledge objects, users, and groups within Metagov. via JM
https://research.ibm.com/blog/retrieval-augmented-generation-RAG
PK indicates that folks using footnotes in AI are using rag methods.
Hubinger, et. al. "SLEEPER AGENTS: TRAINING DECEPTIVE LLMS THAT PERSIST THROUGH SAFETY TRAINING". Arxiv: 2401.05566v3. Jan 17, 2024.
Very disturbing and interesting results from team of researchers from Anthropic and elsewhere.
GPT-4 System CardOpenAIMarch 23, 2023
Introduction of the RoBERTa improved analysis and training approach to BERT NLP models.
Wu, Prabhumoye, Yeon Min, Bisk, Salakhutdinov, Azaria, Mitchell and Li. "SPRING: GPT-4 Out-performs RL Algorithms byStudying Papers and Reasoning". Arxiv preprint arXiv:2305.15486v2, May, 2023.
Zecevic, Willig, Singh Dhami and Kersting. "Causal Parrots: Large Language Models May Talk Causality But Are Not Causal". In Transactions on Machine Learning Research, Aug, 2023.
"The Age of AI has begun : Artificial intelligence is as revolutionary as mobile phones and the Internet." Bill Gates, March 21, 2023. GatesNotes
Minda Zetlin. "Bill Gates Says We're Witnessing a 'Stunning' New Technology Age. 5 Ways You Must Prepare Now". Inc.com, March 2023.
Feng, 2022. "Training-Free Structured Diffusion Guidance for Compositional Text-to-Image Synthesis"
Shared and found via: Gowthami Somepalli @gowthami@sigmoid.social Mastodon > Gowthami Somepalli @gowthami StructureDiffusion: Improve the compositional generation capabilities of text-to-image #diffusion models by modifying the text guidance by using a constituency tree or a scene graph.
Training language models to follow instructionswith human feedback
Original Paper for discussion of the Reinforcement Learning with Human Feedback algorithm.
GPT-2 Introduction paper
Language Models are Unsupervised Multitask Learners A. Radford, J. Wu, R. Child, D. Luan, D. Amodei, and I. Sutskever, (2019).
"Attention is All You Need" Foundational paper introducing the Transformer Architecture.
GPT-3 introduction paper
"Are Pre-trained Convolutions Better than Pre-trained Transformers?"
LaMDA: Language Models for Dialog Application
"LaMDA: Language Models for Dialog Application" Meta's introduction of LaMDA v1 Large Language Model.
Benyamin GhojoghAli Ghodsi. "Attention Mechanism, Transformers, BERT, and GPT: Tutorial and Survey"
LLAMA 2 Release Paper
Daniel Adiwardana Minh-Thang Luong David R. So Jamie Hall, Noah Fiedel Romal Thoppilan Zi Yang Apoorv Kulshreshtha, Gaurav Nemade Yifeng Lu Quoc V. Le "Towards a Human-like Open-Domain Chatbot" Google Research, Brain Team
Defined the SSI metric for chatbots used in LAMDA paper by google.
The Annotated S4 Efficiently Modeling Long Sequences with Structured State Spaces Albert Gu, Karan Goel, and Christopher Ré.
A new approach to transformers
Efficiently Modeling Long Sequences with Structured State SpacesAlbert Gu, Karan Goel, and Christopher R ́eDepartment of Computer Science, Stanford University
Bowman, Samuel R.. "Eight Things to Know about Large Language Models." arXiv, (2023). https://doi.org/https://arxiv.org/abs/2304.00612v1.
The widespread public deployment of large language models (LLMs) in recent months has prompted a wave of new attention and engagement from advocates, policymakers, and scholars from many fields. This attention is a timely response to the many urgent questions that this technology raises, but it can sometimes miss important considerations. This paper surveys the evidence for eight potentially surprising such points: 1. LLMs predictably get more capable with increasing investment, even without targeted innovation. 2. Many important LLM behaviors emerge unpredictably as a byproduct of increasing investment. 3. LLMs often appear to learn and use representations of the outside world. 4. There are no reliable techniques for steering the behavior of LLMs. 5. Experts are not yet able to interpret the inner workings of LLMs. 6. Human performance on a task isn't an upper bound on LLM performance. 7. LLMs need not express the values of their creators nor the values encoded in web text. 8. Brief interactions with LLMs are often misleading.
Found via: Taiwan's Gold Card draws startup founders, tech workers | Semafor
It was only by building an additional AI-powered safety mechanism that OpenAI would be able to rein in that harm, producing a chatbot suitable for everyday use.
This isn't true. The Stochastic Parrots paper outlines other avenues for reining in the harms of language models like GPT's.
Ganguli, Deep, Askell, Amanda, Schiefer, Nicholas, Liao, Thomas I., Lukošiūtė, Kamilė, Chen, Anna, Goldie, Anna et al. "The Capacity for Moral Self-Correction in Large Language Models." arXiv, (2023). https://doi.org/https://arxiv.org/abs/2302.07459v2.
We test the hypothesis that language models trained with reinforcement learning from human feedback (RLHF) have the capability to "morally self-correct" -- to avoid producing harmful outputs -- if instructed to do so. We find strong evidence in support of this hypothesis across three different experiments, each of which reveal different facets of moral self-correction. We find that the capability for moral self-correction emerges at 22B model parameters, and typically improves with increasing model size and RLHF training. We believe that at this level of scale, language models obtain two capabilities that they can use for moral self-correction: (1) they can follow instructions and (2) they can learn complex normative concepts of harm like stereotyping, bias, and discrimination. As such, they can follow instructions to avoid certain kinds of morally harmful outputs. We believe our results are cause for cautious optimism regarding the ability to train language models to abide by ethical principles.
Dass das ägyptische Wort p.t (sprich: pet) "Himmel" bedeutet, lernt jeder Ägyptologiestudent im ersten Semester. Die Belegsammlung im Archiv des Wörterbuches umfaßt ca. 6.000 Belegzettel. In der Ordnung dieses Materials erfährt man nun, dass der ägyptische Himmel Tore und Wege hat, Gewässer und Ufer, Seiten, Stützen und Kapellen. Damit wird greifbar, dass der Ägypter bei dem Wort "Himmel" an etwas vollkommen anderes dachte als der moderne westliche Mensch, an einen mythischen Raum nämlich, in dem Götter und Totengeister weilen. In der lexikographischen Auswertung eines so umfassenden Materials geht es also um weit mehr als darum, die Grundbedeutung eines banalen Wortes zu ermitteln. Hier entfaltet sich ein Ausschnitt des ägyptischen Weltbildes in seinem Reichtum und in seiner Fremdheit; und naturgemäß sind es gerade die häufigen Wörter, die Schlüsselbegriffe der pharaonischen Kultur bezeichnen. Das verbreitete Mißverständnis, das Häufige sei uninteressant, stellt die Dinge also gerade auf den Kopf.
Google translation:
Every Egyptology student learns in their first semester that the Egyptian word pt (pronounced pet) means "heaven". The collection of documents in the dictionary archive comprises around 6,000 document slips. In the order of this material one learns that the Egyptian heaven has gates and ways, waters and banks, sides, pillars and chapels. This makes it tangible that the Egyptians had something completely different in mind when they heard the word "heaven" than modern Westerners do, namely a mythical space in which gods and spirits of the dead dwell.
This is a fantastic example of context creation for a dead language as well as for creating proper historical context.
In looking at the uses of and similarities between Wb and TLL, I can't help but think that these two zettelkasten represented the state of the art for Large Language Models and some of the ideas behind ChatGPT
"There is a robust debate going on in the computing industry about how to create it, and whether it can even be created at all."
Is there? By whom? Why industry only and not government, academia and civil society?
Bender, Emily M., Timnit Gebru, Angelina McMillan-Major, and Shmargaret Shmitchell. “On the Dangers of Stochastic Parrots: Can Language Models Be Too Big? 🦜” In Proceedings of the 2021 ACM Conference on Fairness, Accountability, and Transparency, 610–23. FAccT ’21. New York, NY, USA: Association for Computing Machinery, 2021. https://doi.org/10.1145/3442188.3445922.
Would the argument here for stochastic parrots also potentially apply to or could it be abstracted to Markov monkeys?
language models are incredible "yes, and" machines, allowing writers to quickly explore seemingly unlimited variations on their ideas.
"Talking About Large Language Models" by Murray Shanahan
natural-language processing is going to force engineers and humanists together. They are going to need each other despite everything. Computer scientists will require basic, systematic education in general humanism: The philosophy of language, sociology, history, and ethics are not amusing questions of theoretical speculation anymore. They will be essential in determining the ethical and creative use of chatbots, to take only an obvious example.
Houston, we have a Capability Overhang problem: Because language models have a large capability surface, these cases of emergent capabilities are an indicator that we have a ‘capabilities overhang’ – today’s models are far more capable than we think, and our techniques available for exploring the models are very juvenile. We only know about these cases of emergence because people built benchmark datasets and tested models on them. What about all the capabilities we don’t know about because we haven’t thought to test for them? There are rich questions here about the science of evaluating the capabilities (and safety issues) of contemporary models.
tutorial article on BERT
It might be instructive to think about what it would take to create a program which has a model of eighth grade science sufficient to understand and answer questions about hundreds of different things like “growth is driven by cell division”, and “What can magnets be used for” that wasn’t NLP led. It would be a nightmare of many different (probably handcrafted) models. Speaking somewhat loosely, language allows for intellectual capacities to be greatly compressed. From this point of view, it shouldn’t be surprising that some of the first signs of really broad capacity- common sense reasoning, wide ranging problem solving etc., have been found in language based programs- words and their relationships are just a vastly more efficient way of representing knowledge than the alternatives.
DePonySum ask us to consider what you would need to program to be able to answer a wide range of eight grade science level questions (e.g. What can magnets be used for.) The answer is you would need a whole slew of separately trained and optimized models.
Language, they say, is a way to compress intellectual capacities.
It is then no surprise that common sense reasoning, and solving a wide range of problems, is first discovered through language models. Words and their relationships are probably a very efficient way of representing knowledge.