10,000 Matching Annotations
  1. Last 7 days
  2. social-media-ethics-automation.github.io
    1. Twitter. November 2023. Page Version ID: 1187856185. URL: https://en.wikipedia.org/wiki/Twitter (visited on 2023-12-01).

      This link is to a Wikipedia article about the app X. It covers what users can do on the app, where the app originated, and its history, including how it was formerly known as Twitter and how its original name was actually Twttr.

    2. Caroline Delbert. Some People Think 2+2=5, and They’re Right. Popular Mechanics, October 2023. URL: https://www.popularmechanics.com/science/math/a33547137/why-some-people-think-2-plus-2-equals-5/ (visited on 2023-11-24).

      This source caught my attention because the title is surprising, but it makes an important point. I think the article helps show that numbers are not always as simple or objective as they first appear. In real-world situations, the meaning of a number often depends on definitions, assumptions, and context. That connects strongly to this chapter, especially the discussion of how measuring Twitter bots depends on how people define what they are counting.

    3. Ruta Butkute. The dark side of voluntourism selfies. June 2018. URL: https://kinder.world/articles/you/the-dark-side-of-voluntourism-selfies-18537 (visited on 2023-11-24).

      This source is an article discussing voluntourist photos published to social media. It explains that photos of Western tourists in extremely poor villages perpetuate negative generalizations about Africa as a continent. It also mentions a satirical article from The Onion commenting on the same topic, which is also cited in this chapter. This article made me consider the irony in posting these photos as a voluntourist. As a volunteer, you have good intentions, but by posting these very normalized photos, you are in some ways damaging the communities you wish to help.

    4. The Bloomberg article says Twitter claims spam bots are under 5% of users, but some people argue the number is higher. This shows how hard it is to measure data on social media. It connects to Chapter 4 because data is not completely objective—how you define something like a “bot” can change the result. I think this also affects trust. If different groups give very different numbers, it’s hard to know what is true. It makes me feel that social media data is less reliable than it looks.

    5. Shannon Bond. Elon Musk wants out of the Twitter deal. It could end up costing at least $1 billion. NPR, July 2022. URL: https://www.npr.org/2022/07/08/1110539504/twitter-elon-musk-deal-jeopardy (visited on 2023-11-24).

      This article was interesting because Elon Musk's conflict with Twitter went very viral. Fake accounts are a common problem for many SNS platforms, and the possibility that he might have to pay a $1 billion breakup fee was pretty interesting.

    6. Document file format. August 2023. Page Version ID: 1170388374. URL: https://en.wikipedia.org/w/index.php?title=Document_file_format&oldid=1170388374 (visited on 2023-11-24).

      This source describes and lists the mainstream document file formats. It links to a description of each format; each format has different uses, and some are more commonly used than others.

    1. Can you think of an example of pernicious ignorance in social media interaction? What’s something that we might often prefer to overlook when deciding what is important?

      I think influencers posting brand deals or promotions for a product they don't actually like can be an example of pernicious ignorance, since they are spreading misinformation online for people to believe.

    2. One idea from Chapter 4 that stood out to me is the claim that all data is a simplification of reality. This made me realize how much social media reduces complex human behavior into simple numbers like likes, views, or follower counts. From my own experience using apps like TikTok and Instagram, it feels like people start valuing themselves based on these metrics, even though they don’t fully represent who they are. For example, a post might not get many likes, but that doesn’t mean it has no meaning or value. I think this simplification can be harmful because platforms treat these numbers as if they are objective truth, which can influence how algorithms promote content and how users judge themselves and others. It makes me question whether social media data is actually reflecting reality, or just shaping a distorted version of it.

    1. When you were a kid, you probably asked your parents this question at some point about things you were told to do, and you probably got answers varying from “Because I said so” to “Remember, if you finish up soon, you’ll have time to play”.  These responses serve as motivation for you to do that thing; one is an order and the other is a promise.

      This kinda made me think about how different types of motivation feel. Like when someone just tells you to do something because they said so, it just feels forced and you don’t really want to do it. But when there’s something in it for you or it benefits you in some way, it actually makes you want to do it. It shows how the way something is said can totally change your attitude toward it.

    1. Fixed mindsets can take over our learning when we become attached to a score or result such as a grade.

      I connect to that because I’ve definitely seen for myself how easy it is to shut down after a bad grade and just stop trying. But this is kind of pushing back on that, saying that moment is actually where you decide what happens next. You can either stay stuck in that mindset or use it to figure out what you need to improve. I like the idea that it’s not about being naturally smart; it’s more about what you do after things don’t go your way.

    1. batch

      I don't think batch culture should be mentioned; it is not a realistic solution, and there is no way cells can grow to high densities in batch culture. (Batch culture means that the medium is never topped up or exchanged, so the cells run out of nutrients and accumulating waste products stop them from growing within a few days.)

    1. It is important to note that language ideologies do not always have negative impacts.

      I like how this shifts things in a more positive direction. It kinda reminds me that the way we think about language actually matters, like if we value different languages instead of judging them, it can make people feel more accepted.

    2. It is important to note that language ideologies do not always have negative impacts.

      This is saying that even though language ideologies can cause problems sometimes, they’re not always bad. They can also have positive effects depending on the situation.

    1. We have to be aware that we are always making these simplifications, try to be clear about what simplifications we are making, and think through the ethical implications of the simplifications we are making.

      The sentence “all data is a simplification of reality” really stood out to me. I like this point because it reminds us that data is never just a perfect copy of the real world. The apple example was simple, but it clearly showed that counting something as “one” can hide important differences. I think this also connects strongly to the Twitter bot example, because the result depends a lot on how people define words like “user” or “spam bot.” This made me realize that when we look at data, we should not only ask whether it is correct, but also ask what has been simplified or left out.

    1. As a college student, you may be asked to begin research papers with a synthesis of the sources.  Your primary purpose is to show readers that you are familiar with the field and are qualified to offer your own opinions.  But your larger purpose is to show that in spite of all this wonderful research, no one has addressed the problem in the way that you intend to in your paper.  This gives your synthesis a purpose and even a thesis of sorts. Because each discipline has specific rules and expectations, you should consult your professor or a guidebook for that specific discipline if you are asked to write a review of the literature and aren’t sure how to do it.

      If you need more guidance, ask your instructor for help.

    2. In contrast, a thesis-driven synthesis not only combines information from multiple sources, but also uses that information to support a central claim or argument. Here, you evaluate and interpret the sources to develop your own perspective or theory about the topic.

      A strong thesis requires using information from the sources, but only where it helps support your argument.

    3. A synthesis can serve different purposes, depending on the assignment. In a background synthesis, your goal is to collect and organize information from various sources by topic or theme, presenting an overview of what is known about a subject. This type does not require an argument or thesis—it simply helps readers understand the current state of research or information.

      It's good, and it also helps with organizing ideas.

    4. (1) Accurately reports information from the sources using different phrases and sentences; (2) Organized in such a way that readers can immediately see where the information from the sources overlaps; (3) Makes sense of the sources and helps the reader understand them in greater depth.

      Keep these in mind when writing.

    5. The basic research report (described below as a background synthesis) is a common document in the business world.  Whether one is proposing to open a new store or expand a product line, the report will synthesize information and arrange it by topic rather than by source.  Whether you want to present information on child rearing to a new mother, or details about your town to a new resident, you’ll find yourself synthesizing too. And just as in college, the quality and usefulness of your synthesis will depend on your accuracy and organization.

      Use the CRAAP test to check your work.

    6. Whenever you report to a friend about a film or podcast, you engage in synthesis. People synthesize information naturally to help others see the connections between things they learn; for example, you have probably stored up a mental data bank of the various descriptions you’ve heard about particular professors. If your data bank contains several positive descriptions, you might synthesize that information and use it to enroll in a class from that professor. Synthesis is related to but not the same as classification, division, or comparison and contrast. Instead of attending to categories or finding similarities and differences, synthesizing sources is a matter of pulling them together into some kind of harmony. Synthesis searches for links between materials for the purpose of constructing a thesis or theory.

      The purpose of synthesis is to show the connections between sources.

    7. At its most basic level, a synthesis involves combining two or more summaries, but synthesis writing is more difficult than it might at first appear because this combining must be done in a meaningful way, and the final essay must generally be thesis-driven.  In composition courses, “synthesis” commonly refers to writing about printed texts, drawing together particular themes or traits that you observe in those texts, organizing the material from each text according to those themes or traits, and developing your own thesis or theory.  Sometimes, you may be asked to synthesize your own ideas with those of the texts you have been assigned. In your other college classes, you’ll probably find yourself synthesizing information from graphs and tables, pieces of music, and artworks as well.

      Bringing ideas together.

    1. What if someone told you that you couldn’t pick up a paintbrush unless you were already a great artist? What if someone said you could only swim in the pool if you were an Olympic-level swimmer? Or that you couldn’t make pasta in the kitchen because you’re not yet a 5-star chef? You would immediately know that such high standards are ridiculous. Then why do many of us have such fear of learning languages ‘imperfectly’?

      I like how this is pointing out how unfair our expectations are when it comes to learning languages because I used to think about it all the time. Like we're okay being beginners at things like sports or cooking, but with language we expect ourselves to be good right away. It’s kind of calling out that mindset and saying it doesn’t really make sense.

    1. When you finalize your conclusion, make sure your text is not too repetitive. While your goal is to reintroduce your argument, you don’t want to bore the reader with the exact same sentences you included in your introduction; instead, reiterate your thesis using your new perspective of the topic.

      In the conclusion, never repeat exactly what you already wrote.

    2. Reintroduce the argument introduced in your thesis statement. Reiterate the key points of your research. Offer some forecasts for the future (example: “Hopefully now with a clearer understanding about free soloing and the rock-climbing community, others might understand the draw to such a seemingly risky sport…”).

      What the conclusion should focus on.

    3. The conclusion is your opportunity to summarize the essay and hopefully spur the reader to want to learn more about the topic. Be sure to clearly reiterate the thesis statement. In your introduction, you may have laid out what would be covered in the essay. Offer a sentence or two reiterating what was learned about those topic areas. Finally, work to avoid adding any new information and questions in this final section of your writing.

      The conclusion should always restate the thesis.

    4. Begin with a topic sentence. Using one of the five Ws or H questions here will remind you and your readers what you will focus on in this paragraph. Introduce your sources in a sentence or two to summarize what the information revealed about your topic. Include a direct quote using P.I.E. and reflect on what the source illuminated about your question.

      This explains how to build strong body paragraphs.

    5. The main purpose of the body paragraphs is to inform the target audience about the background/significance of your topic or the answers to the 5 Ws and H driving questions that you focused your research on. Share some interesting facts, go into the possibly unknown details, or reflect common knowledge in a new light to make readers intrigued. Body paragraphs should discuss the inquiry process you followed to research your topic

      The body is the most important part.

    6. Define the topic. Provide short background information. Introduce who your intended audience is. State what your driving research question is. Create a thesis statement by identifying the scope of the informative essay (the main point you want your audience to understand about your topic).

      The five keys to organizing an introduction.

    7. Then, introduce the topic with its background in a couple of sentences. The writer will then end the paragraph with a powerful thesis statement, which points to the necessity of topic research. The writer’s goal is to do everything possible to lure the audience’s interest in the initial paragraph.

      What comes after the hook.

    8. The initial stage is an introduction, which should start with the sound hook sentence to engage the reader in what a writer plans to share. One example is: “A community is generally defined by people in a group who live together in a particular area, or a group of people who are considered a unit because of their shared interests or background.”

      How the introduction should start: by engaging your readers.

    9. Some events and trends are too recent to appear in Tier One sources, which tend to be highly specific, and sometimes you need a more general perspective on a topic. Thus, Tier Two sources can provide quality information that is more accessible to non-academics. There are three main categories. First, official reports from government agencies or major international institutions like the World Bank or the United Nations; these institutions generally have research departments staffed with qualified experts who seek to provide rigorous, even-handed information to decision-makers. Second, feature articles from major newspapers and magazines like the New York Times, Wall Street Journal, London Times, or The Economist are based on original reporting by experienced journalists (not press releases) and are typically 1500+ words in length. Third, there are some great books from non-academic presses that cite their sources; they’re often written by journalists. All three of these sources are generally well researched descriptions of an event or state of the world, undertaken by credentialed experts who generally seek to be even-handed.

      Non-academic sources.

    10. These are sources from the mainstream academic literature: books and scholarly articles. Academic books generally fall into three categories: (1) textbooks written with students in mind, (2) monographs which give an extended report on a large research project, and (3) edited-volumes in which each chapter is authored by different people. Scholarly articles appear in academic journals, which are published multiple times a year in order to share the latest research findings with scholars in the field. They’re usually sponsored by some academic society. To get published, these articles and books had to earn favorable anonymous evaluations by qualified scholars.

      What you should focus on for your research.

    11. The Informative Research Report draws primarily from resources found in tiers 1 and 2, according to the research table in Writing in College:

      What to know about the Informative Research Report.

    12. Students should have a clearer idea of their research topic and can begin exploring common challenges to finding relevant sources and managing them (recording citation details, quoting, paraphrasing, citing

      Understanding what the assignment is about.

    13. Now that you have spent time considering different aspects of your topic in your exploratory essay, you will continue your research through our CNM library resources to help inform a larger audience about your topic

      This helps connect the exploratory essay to the next stage of research.

    14. The point of an informative essay is not to convince others to take a certain action or stance; that role is expressly reserved for persuasive essays. Instead, the main objective is to highlight specific information about your topic. In this project, you may be asking “after researching general aspects about my topic, what do I want others to understand about it?” Of course, if your informative essay is interesting enough, it may move readers to learn more about the subject, but they’ll have to come to that on their own, thanks to the wealth of interesting information you present.

      Focusing on the main point: trying to get people to understand what you are saying.

    15. The Five W’s and How, Image by Gerd Altmann from Pixabay. The purpose of an informative essay, sometimes called an expository essay, is to educate others on a certain topic. Typically, these essays aim to answer the five Ws and H questions: who, what, where, when, why, and how. For this essay, you will focus on one or two driving questions about your topic, which will drive your research and help you reach a conclusion. The question can be one that emerged from your Exploratory Essay or it can be a brand-new question about your topic that you are interested in researching.

      The five Ws and H help you narrow your research.

    16. Sometimes students are asked to write an informative research report, which is a different type of document than the exploratory essay, which we will cover in the next chapter. The Informative Research Report is a report that relays the results of a central research question in an organized manner through more formal sources. These resources could include Google Scholar, library catalogs and academic article databases, websites of relevant agencies, and Google searches using (site: *.gov or site: *.org). A report is written from the perspective of someone who is seeking to find specific and in-depth information about a certain aspect of a topic.

      It helps with the basics, prepares you, and gives an example of a research paper.

    1. symbolize human experience and embody the spiritual values of a culture.

      The myths themselves show how experiences, whether they were true or not, gave a lot of meaning and created a culture of belief systems that was shared through stories.

    1. Dates turn out to be one of the trickier data types to work with in practice. One of the main reasons for this is that what time or day it is depends on what time zone you are in.

      I had a previous career as a geospatial analyst, and sometimes I would work on spatio-temporal data sets that had a time and a space component. There were many times that I had to fuss with time and date fields in my tables to get the date formats to match, since there are so many valid ways to write date and time.
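The time-zone and format problems described here can be shown with a small Python sketch that uses only the standard library. The two timestamp strings are illustrative, not taken from any real data set. The example uses fixed UTC offsets instead of named zones so that it does not depend on installed time-zone data.

```python
from datetime import datetime, timezone

# The same instant written in two of the many valid date/time formats
# (illustrative strings, not taken from any real data set).
iso_text = "2023-12-01T03:30:00+00:00"  # ISO 8601, UTC
us_text = "11/30/2023 19:30 -0800"      # month/day/year, UTC-8

a = datetime.fromisoformat(iso_text)
b = datetime.strptime(us_text, "%m/%d/%Y %H:%M %z")

# Timezone-aware datetimes compare as instants, so these are equal...
print(a == b)              # True
# ...but the calendar date depends on the offset each one is expressed in.
print(a.date(), b.date())  # 2023-12-01 2023-11-30

# Normalizing to one reference zone (here UTC) before taking the date
# makes the two records agree.
print(b.astimezone(timezone.utc).date())  # 2023-12-01
```

Named zones from the standard `zoneinfo` module would also handle daylight saving time, which fixed offsets cannot.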

    1. By the late 1960s, an estimated 26 families had been displaced by urban renewal projects in Maywood, 88% of which were families of color.

      This one really shows the disproportionate displacement of POC families during this period. With only 26 families removed, it's a much smaller sample size, but with most of them being POC, you can see where complaints would arise.

    2. By the late 1960s, an estimated 22,950 families had been displaced by urban renewal projects in Chicago, 64% of which were families of color.

      It is not surprising to me because I have family in Chicago, so I know a bit about the history there, but 64% is definitely a number that reflects the time. Many of those families still haven't recovered from this. It is important to understand why they wanted to move these families and what justification they used.
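Converting the two quoted percentages into approximate counts makes the difference in scale concrete. These rounded figures are derived only from the estimates quoted above; they are not separate data.

```latex
\begin{aligned}
\text{Maywood:} &\quad 0.88 \times 26 = 22.88 \approx 23 \text{ families of color} \\
\text{Chicago:} &\quad 0.64 \times 22{,}950 = 14{,}688 \text{ families of color}
\end{aligned}
```

The Maywood percentage therefore describes roughly two dozen households, while the Chicago percentage describes nearly fifteen thousand.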

    1. All of this resonated widely, enabling the Nazis to win 37 percent of the vote in the election of 1932

      Another part of the Nazis' rise to power is how fractured the left-leaning opposition was, with the Socialist and Communist parties not wanting to work with the moderate center-left parties. This helped split the vote enough for Hitler to consolidate his base and get more votes than anyone else.

    2. Experience soon showed that Japan’s concern was far more for Asia’s resources than for its liberation and that Japanese rule exceeded in brutality even that of the Europeans.

      This brutality was really shown in Korea and parts of China, especially when it came to the sexual violence they inflicted on women and children.

    3. Non-Russian nationalists in Ukraine, Poland, Muslim Central Asia, and the Baltic region demanded greater autonomy or even independence.

      This type of social conflict was foundational to Karl Marx's conflict theory, which influenced his communist views.

    4. Women were urged to leave the factory work they had taken up during the war and return to their homes, where they would not compete against returning veterans for “men’s jobs.”

      This is interesting because this type of freedom was new for middle-class women, but poor women were often forced into “men’s jobs.” I wonder if they were also urged to leave.

    1. Many professions require some form of programming.

      This shows that programming is no longer something we can afford to ignore and leave only to engineers, as was mentioned in class, but a skill used in many fields. In Library, Information, and Archival Science this is noticeable, for example, when we work with digital tools to organize information, such as Protégé for creating ontologies; although we don't program directly there, RDF does involve logical thinking. As a student, I also feel that learning programming is no longer optional but necessary.

    2. Good programming also satisfies an aesthetic sense of accomplishment

      I find this statement interesting because, in a way, the aesthetic sense makes us feel the satisfaction of having achieved something "beautiful", which in this case means well done.

    1. III
      • Principle of budgetary balance: the Golden Rule (Regra de Ouro)
        • Credit operations exceeding capital expenditures are prohibited;
        • Credit operations exceeding capital expenditures are allowed if approved by the Legislative Branch by an <u>absolute</u> majority;
        • With legislative authorization, the additional amounts are made available through <u>supplementary</u> or <u>special</u> credits with a precise purpose.
        • Note that the golden rule is an originally enacted provision, not one introduced by an amendment.
    1. What data types might be used to represent that data on a computer?

      On SNS platforms like X, most posts take the form of sentences, so strings would be used to represent most of them.
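Most of a post is text, but a post record usually combines several data types. The following is a minimal Python sketch that assumes a simplified post. The `Post` class and its field names are hypothetical and do not represent X's actual data model.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import List


@dataclass
class Post:
    """Hypothetical, simplified post record; not X's actual data model."""
    text: str             # the body of the post: a string
    author: str           # the poster's handle: a string
    like_count: int       # how many likes: an integer
    created_at: datetime  # when it was posted: a date/time value
    is_reply: bool        # whether it replies to another post: a boolean
    hashtags: List[str] = field(default_factory=list)  # zero or more strings


post = Post(
    text="Data is always a simplification of reality.",
    author="example_user",
    like_count=42,
    created_at=datetime(2023, 12, 1, 3, 30, tzinfo=timezone.utc),
    is_reply=False,
    hashtags=["data", "ethics"],
)
print(type(post.text).__name__, type(post.like_count).__name__)  # str int
```

Even this small record needs strings, integers, a timestamp, a boolean, and a list.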

    1. which tasks are easy and which ones are hard

      ...like, for example, standing on two feet. It is really hard to teach a robot to balance like a bipedal human, which is what makes those dancing robot videos by Boston Dynamics so monumental.

      Oh. Just realized that Boston Dynamics was mentioned below. sigh Haha

    2. computer generated characters in motion pictures such as Avatar, the Lord of the Rings, and popular Pixar animations where the animated characters replicate gestures made by real human actors.

      I wonder when AI started to be credited for motion capture technology. I read somewhere that one of the first movies to make use of motion capture (perhaps being entirely made through that technology) was The Polar Express (2004). It makes me wonder what constitutes AI and what is advanced technology, which is a key point this article focuses on.

    1. The funny part is that none of this made the CLI worse for humans.

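      Most people think that adding machine-readable interfaces (such as flags and JSON configuration) makes a tool less human-friendly. But the author argues that these features, designed for AI agents, actually improve the human user experience, because they make the tool more explicit, predictable, and composable rather than more complex.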

    2. Every prompt is a flag in disguise

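      Most people think interactive prompts are a best practice for CLI tools because they guide users through complex tasks. But the author argues that every interactive prompt should have a corresponding command-line flag, because this design lets the tool serve human users and also be automated by AI agents, without needing an extra API layer.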

    3. Designing for agents forced us to build better tools for everyone.

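      Most people think that designing tools for AI agents means targeting machines specifically, possibly at the expense of the human user experience. But the author argues that designing for AI agents actually improves the experience for all users, because the constraints agents bring (such as explicit state management and predictable interfaces) also make tools friendlier and more scriptable for human developers.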
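The idea that "every prompt is a flag in disguise" can be sketched in Python with the standard `argparse` module. The `deploy` command, its `--env`, `--yes`, and `--no-input` flags, and the `ask` hook are hypothetical names used for illustration; they are not the article's actual tool.

```python
import argparse
from typing import Callable, List

ENVIRONMENTS = ("staging", "production")


def build_parser() -> argparse.ArgumentParser:
    # Every value a human could be prompted for also exists as a flag.
    parser = argparse.ArgumentParser(prog="deploy")
    parser.add_argument("--env", choices=ENVIRONMENTS, help="target environment")
    parser.add_argument("--yes", action="store_true", help="skip the confirmation prompt")
    parser.add_argument("--no-input", action="store_true",
                        help="never prompt; fail if a required value is missing")
    return parser


def run(argv: List[str], ask: Callable[[str], str] = input) -> str:
    args = build_parser().parse_args(argv)

    # Prompt for the environment only when the flag was not given.
    env = args.env
    if env is None:
        if args.no_input:
            raise SystemExit("error: --env is required with --no-input")
        env = ask("Environment (staging/production): ").strip()
        if env not in ENVIRONMENTS:
            raise SystemExit(f"error: unknown environment {env!r}")

    # Ask for confirmation only when --yes was not given.
    if not args.yes:
        if args.no_input:
            raise SystemExit("error: pass --yes to confirm with --no-input")
        if ask(f"Deploy to {env}? [y/N] ").strip().lower() != "y":
            return "aborted"

    return f"deployed to {env}"
```

An agent or script calls `run(["--env", "staging", "--yes", "--no-input"])` and never sees a prompt. A person who types only `deploy` is walked through the same choices interactively. Because the `ask` hook is injectable, the interactive path is as easy to test as the flag path.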

    1. Whether or not this specific bet pays off, the underlying argument that the next meaningful leap in AI capability requires moving beyond language modeling is increasingly hard to dismiss.

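      Most people think AI will keep developing along the path of language models, but the author argues that a real breakthrough requires moving beyond the language-modeling paradigm. This view challenges the mainstream narrative of current AI development and suggests we need to fundamentally rethink the direction of AI.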

    2. The clustering of capital and talent around this problem is itself a signal. The applications that most clearly benefit from world models are those where LLMs have struggled most.

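      Most people think capital and talent should concentrate on the areas where AI currently performs best, but the author argues that world models are being developed precisely because LLMs perform poorly in key areas. This view challenges the mainstream approach to resource allocation and suggests that real breakthroughs may come from addressing the weaknesses of existing systems.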

    3. AMI Labs is not building a product for immediate deployment. This is a fundamental research effort, likely measured in years before commercial applications emerge.

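      In today's AI environment, which prizes rapid commercialization, most people think AI research should be turned into products quickly. But the author points out that AMI Labs is doing fundamental research rather than building a product directly. This challenges the tech industry's widespread expectation of immediate commercialization and emphasizes the importance of basic research.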

    4. LLMs have no grounded understanding of the physical world. They model the statistical distribution of language about reality, not reality itself.

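      Most people think large language models understand reality by learning knowledge about the physical world, but the author argues that LLMs only learn the statistical distribution of text about reality, not a direct understanding of reality itself. This challenges common perceptions of what LLM capabilities are and suggests that current AI systems have a fundamental gap in understanding.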

    1. You have to have people that have the ability to rethink the workflow at a scale that AI can execute, versus at a scale that humans can execute.

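      Most people think AI only needs to fit into existing workflows, but the author stresses that companies need to redesign workflows around what AI can do. This challenges traditional thinking about technology adoption and suggests that successful AI deployment requires fundamentally restructuring processes, not simply layering technology on top.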

    2. 95% of organizations are getting zero return on AI deployed, with most failures found due to 'brittle workflows.'

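      Despite surging AI investment, the vast majority of companies have not seen any return. This contrasts sharply with the mainstream view that AI automatically brings significant benefits. It suggests that failed AI deployments are caused mainly by poor workflow design rather than by the technology itself, which is a counterintuitive finding.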

    3. in 2024, 47% of AI solutions were built internally and 53% were purchased; today, 76% of all AI is purchased rather than developed in-house.

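      Most people think companies will increasingly prefer to build AI in-house to keep a competitive edge and control, but the data shows companies rapidly shifting to buying third-party AI solutions. This trend runs against conventional wisdom and suggests companies may value fast deployment and cost-effectiveness over technical autonomy.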

    1. The government has so far favoured a pro-innovation, sector-led approach, prioritising voluntary principles over hard regulation.

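      Most people think the UK government will take a tough stance on AI regulation to protect creators' rights. But the author points out that the government actually favors a pro-innovation, sector-led approach that prioritizes voluntary principles over hard regulation. This contrasts sharply with public expectations that the government will protect creators and reveals a gap between policy reality and public perception.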

    1. We introduce Iterative Reward Calibration, a methodology for designing per-turn rewards using empirical discriminative analysis of rollout data

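      Most people think reward design should be based on domain expertise and predefined rules, but the author proposes iteratively calibrating rewards through empirical discriminative analysis of rollout data from training. This approach challenges traditional reward-engineering methodology, shifting reward design from "expert-driven" to "data-driven."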

    2. the trained 4B model exceeding GPT-4.1 (49.4 percent) and GPT-4o (42.8 percent) despite being 50 times smaller

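      Most people think a model's size is directly correlated with its performance, so larger models must perform better. But the author shows that a model with only 4 billion parameters, trained with reinforcement learning, outperformed GPT-4.1 and GPT-4o despite being 50 times smaller. This challenges the prevailing view in AI that parameter count determines everything.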

    3. naively designed dense per-turn rewards degrade performance by up to 14 percentage points due to misalignment between reward discriminativeness and advantage direction

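      Most people think denser per-turn reward signals improve reinforcement-learning performance, but the author found that naively designed dense rewards actually reduce performance by up to 14 percentage points, because reward discriminativeness is misaligned with the advantage direction. This finding challenges the intuition in reinforcement learning that more reward signal is always better.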

    1. computer-use agents extend language models from text generation to persistent action over tools, files, and execution environments

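      The mainstream view holds that text language models and computer-use agents face essentially the same safety challenges, so text safety measures only need to be extended. But the author points out that computer-use agents introduce entirely new dimensions, such as persistent state, tool use, and execution environments. These create safety challenges completely different from those of text-only systems, which undercuts the assumption that safety measures simply carry over.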

    2. intermediate actions that appear locally acceptable but collectively lead to unauthorized actions

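      Most people think the safety risks of AI agents come mainly from directly executing harmful instructions, but the author finds that the real threat comes from intermediate steps that look entirely reasonable locally but collectively lead to unauthorized actions. This locally reasonable but globally harmful pattern is a key risk overlooked by current safety evaluations.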

    3. harmful behavior may emerge through sequences of individually plausible steps

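      The mainstream view holds that harmful AI behavior usually stems from obviously unreasonable instructions, but the author points out that dangerous behavior often builds up through a series of seemingly reasonable steps. Each step is acceptable on its own, but combined they lead to a harmful outcome. This incremental model of risk challenges traditional safety evaluation methods.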

    4. model alignment alone does not reliably guarantee the safety of autonomous agents

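      Most people think model alignment can effectively guarantee the safety of AI agents, but the author argues this is far from enough: experiments show that even with the aligned Qwen3-Coder model, Claude Code still had a 73.63% attack success rate. This challenges the prevailing view in AI safety that model alignment alone can solve safety problems.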
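The gap these annotations describe, between checking each step and checking the whole sequence, can be made concrete with a toy Python sketch. The tool names, the `.env` rule, and both checker functions are hypothetical. Real agent guardrails are far more involved, and this is not the paper's method.

```python
from typing import List, Tuple

# Hypothetical action format: (tool name, target).
Action = Tuple[str, str]

# Hypothetical per-step policy: each of these tools is allowed on its own.
ALLOWED_TOOLS = {"list_dir", "read_file", "http_post"}


def step_ok(action: Action) -> bool:
    """Local check: looks at one action in isolation."""
    tool, _target = action
    return tool in ALLOWED_TOOLS


def trajectory_ok(actions: List[Action]) -> bool:
    """Sequence check: also tracks state across steps.

    Flags the pattern "read a credentials file, then send data out",
    even though each of those steps passes step_ok on its own.
    """
    touched_secret = False
    for tool, target in actions:
        if not step_ok((tool, target)):
            return False
        if tool == "read_file" and target.endswith(".env"):
            touched_secret = True
        elif tool == "http_post" and touched_secret:
            return False
    return True


plan = [
    ("list_dir", "."),
    ("read_file", ".env"),
    ("http_post", "https://example.com/upload"),
]
print(all(step_ok(a) for a in plan))  # True: every step looks fine locally
print(trajectory_ok(plan))            # False: together the steps exfiltrate a secret
```

The sequence check has to carry state (`touched_secret`) from one step to the next. No per-step filter can express this rule, which is the point the annotations make.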

    1. verifiers and observer models inside the action-memory loop reduce silent failure and information leakage while remaining vulnerable to misspecification.

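      Most people think verifier and observer models should be external components that monitor an AI system's behavior. But the author argues that placing them inside the action-memory loop can reduce silent failure and information leakage, although they remain vulnerable to misspecification. This challenges traditional monitoring architectures and suggests internal verification may be more effective than external monitoring.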

    2. role-differentiated proposer/executor/checker/adversary systems may reduce correlated error under asymmetric information and verification burden.

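      Most AI system designs tend to use a single component, or a few, to handle every task. But the author proposes that role-differentiated proposer/executor/checker/adversary systems may reduce correlated error under information asymmetry and verification burden. This challenges the single- or few-component architectures common today and suggests that multi-role specialization may be more effective.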

    3. We introduce a minimal hierarchical partially observed control model with latent dynamics, structured episodic memory, observer-belief state, option-level actions, and delayed verifier signals.

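      Most AI system designs tend to use fully observable models and assume the system state is known. But the author proposes a partially observed hierarchical control model with latent dynamics, structured episodic memory, an observer-belief state, option-level actions, and delayed verifier signals. This challenges the full-observability assumption of traditional AI system design, arguing that partial observability better matches the complexity of the real world.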

    4. squirrel ecology offers a sharp comparative case because arboreal locomotion, scatter-hoarding, and audience-sensitive caching couple all three demands in one organism.

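      Most people might think squirrels are simple creatures whose behavior offers little guidance for designing advanced AI systems. But the author argues that squirrel ecology is a distinctive and precise comparative case, because arboreal locomotion, scatter-hoarding, and audience-sensitive caching couple the three demands of control, memory, and verification in a single organism. This challenges the traditional view that biological analogies have limited value for AI design.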

    5. Existing research often studies these demands separately: robotics emphasizes control, retrieval systems emphasize memory, and alignment or assurance work emphasizes checking and oversight.

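      Most AI research treats control, memory, and verification as separate problem areas and studies them separately. But the author argues that this separation is flawed, because in natural systems (such as squirrels) the three are tightly coupled. This challenges the compartmentalized approach of current AI research and suggests future AI systems need a more integrated approach to these interrelated demands.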

    6. Agentic AI is increasingly judged not by fluent output alone but by whether it can act, remember, and verify under partial observability, delay, and strategic observation.

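      Most people judge an AI system mainly by how fluent its output is. The author argues instead that AI should be evaluated on its ability to act, remember, and verify, because these matter more under partial observability, delay, and strategic observation. This challenges current mainstream AI evaluation criteria and puts real performance in complex environments ahead of mere linguistic fluency.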
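The annotated claim that verifiers belong inside the action-memory loop can be illustrated with a toy loop. This is a minimal sketch under stated assumptions and does not reproduce the paper's model. The class `VerifiedLoop` and its `propose`/`execute`/`check` callables are hypothetical stand-ins for the proposer, executor, and checker roles. The sketch shows only the structural point: the checker runs before anything is written to memory, so a bad result is recorded as an explicit failure instead of being stored silently.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

Record = Tuple[str, str]  # (action, observation)


@dataclass
class VerifiedLoop:
    """Toy proposer/executor/checker loop (hypothetical roles, not the paper's model)."""

    propose: Callable[[List[Record]], str]   # proposer: picks the next action from memory
    execute: Callable[[str], str]            # executor: performs the action, returns an observation
    check: Callable[[str, str], bool]        # checker: accepts or rejects (action, observation)
    memory: List[Record] = field(default_factory=list)
    failures: List[Record] = field(default_factory=list)

    def step(self) -> bool:
        action = self.propose(self.memory)
        observation = self.execute(action)
        # The checker sits inside the loop: nothing reaches memory unverified.
        if self.check(action, observation):
            self.memory.append((action, observation))
            return True
        # Rejected results are surfaced explicitly instead of failing silently.
        self.failures.append((action, observation))
        return False


if __name__ == "__main__":
    actions = iter(["read", "write", "delete"])
    loop = VerifiedLoop(
        propose=lambda mem: next(actions),
        execute=lambda a: "error" if a == "delete" else f"{a}:ok",
        check=lambda a, obs: obs.endswith(":ok"),
    )
    print([loop.step() for _ in range(3)])  # [True, True, False]
    print(loop.failures)                     # [('delete', 'error')]
```

A misspecified `check` (for example, one that accepts every observation) would still let bad results into memory. That matches the caveat in the quote that in-loop verifiers remain vulnerable to misspecification.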

    1. we use the distance preference characterized by these centers to score keys according to their positions, and also leverage Q/K norms as an additional signal for importance estimation

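      Most people assume KV cache compression relies mainly on attention scores or content similarity. The author instead proposes using the distance preference determined by the vector centers, together with Q/K norms, as signals for estimating importance. This moves key selection away from content similarity and toward geometric features, a genuinely new approach to compression.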

    2. TriAttention enables OpenClaw deployment on a single consumer GPU, where long context would otherwise cause out-of-memory with Full Attention

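      Most people assume long-context processing requires high-end GPUs or distributed systems. The author claims their method needs only a single consumer GPU for long-context workloads that would otherwise run out of memory with Full Attention. This challenges common assumptions about the hardware that long contexts require.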

    3. TriAttention matches Full Attention reasoning accuracy while achieving 2.5x higher throughput or 10.7x KV memory reduction

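      Most people assume that aggressive KV cache compression inevitably costs reasoning accuracy. The author claims TriAttention matches Full Attention accuracy while achieving a 10.7x reduction in KV memory (or, alternatively, 2.5x higher throughput). This result challenges the accepted trade-off between KV compression and accuracy.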

    4. queries rotate with position during RoPE, making representative queries very few, leading to poor top-key selection and unstable reasoning.

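      Most people regard RoPE (rotary position embedding) as strengthening a model's ability to distinguish positions. The author argues that because queries rotate with position, few queries are representative, which degrades top-key selection and makes reasoning unstable. This is counterintuitive, since RoPE is usually considered an improvement to position encoding.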
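The RoPE point above is easier to see with the rotation written out. The sketch below implements standard rotary position embedding, with the `base=10000` default following the common convention. It is not TriAttention's key-scoring method. It shows two things: the same query content produces a different vector at every position, and the query-key dot product depends only on the relative offset between the two positions.

```python
import math
from typing import List


def rope_rotate(vec: List[float], pos: int, base: float = 10000.0) -> List[float]:
    """Standard rotary position embedding for an even-length vector.

    Pair j = (vec[2j], vec[2j+1]) is rotated by angle pos * base**(-2j/d).
    """
    d = len(vec)
    if d % 2:
        raise ValueError("RoPE needs an even dimension")
    out: List[float] = []
    for i in range(0, d, 2):               # i = 2j, so -i/d = -2j/d
        angle = pos * base ** (-i / d)
        c, s = math.cos(angle), math.sin(angle)
        x, y = vec[i], vec[i + 1]
        out += [x * c - y * s, x * s + y * c]
    return out


def dot(a: List[float], b: List[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


if __name__ == "__main__":
    q = [0.3, -1.2, 0.8, 0.5]
    k = [1.0, 0.4, -0.7, 0.2]
    # The same query content lands on a different vector at each position...
    print(rope_rotate(q, 5) == rope_rotate(q, 50))          # False
    # ...but the attention logit depends only on the offset (3 in both cases).
    print(math.isclose(dot(rope_rotate(q, 10), rope_rotate(k, 7)),
                       dot(rope_rotate(q, 103), rope_rotate(k, 100)), abs_tol=1e-9))  # True
```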

    1. Lets you control every part of an AI video, the way a director would

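      Most people think AI video generation tools can only produce content in a simple, one-shot way. The author argues that Wan2.7-Video has evolved into a complete director's toolkit that gives users control over every aspect of a video. This challenges the conventional view that AI video tools offer only one-directional output.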

    1. In practice, this means your Claude Code, Cursor, or any MCP-capable AI agent can directly "see" real-time data on 𝕏 and take actions, without you having to write your own API wrapper.

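      Most people assume API integration always requires developers to write custom wrapper code. The author emphasizes that xAI achieves seamless integration through the MCP protocol. This suggests API design may shift toward more standardized, direct-access patterns, challenging today's norm of complex API integration.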

    2. The original Basic ($200/month) and Pro ($5,000/month) plans remain available, and you can switch to pay-as-you-go in the Developer Console.

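      Most people expect tech companies to retire old pricing models entirely and force users to migrate. The author notes that xAI has kept its original high-priced plans and lets users choose freely. Running old and new pricing side by side is rare in an API transition and runs against the usual product-iteration strategy.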

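    3. AI agents can read and operate on the 𝕏 platform directly through the standard MCP protocol: searching posts, posting, viewing user information, managing bookmarks, sending and receiving DMs, and more.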

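      Most people expect social media platforms to tightly restrict third-party automation to prevent abuse. The author notes that xAI has opened up full MCP support, allowing AI agents to perform a wide range of actions directly. This stands in sharp contrast to the closed trend among mainstream platforms.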

    1. Built-in video and music generation; the memory system has learned to "dream"

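      Most people think of an AI memory system as simple data storage and retrieval. The author suggests OpenClaw's memory system has developed something like the human ability to "dream", a creative and associative higher cognitive function. This challenges conventional ideas about AI memory systems.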

    1. This class of bug is insidious because it evades every layer of defense. It will not be caught in development testing — who runs a test for 50 days? It will not be flagged in code review — the logic looks perfectly reasonable.

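      Most people believe code review and testing catch most systematic defects. The author argues that this bug is unusual because it evades every conventional detection method. This challenges a basic assumption of software quality assurance and suggests that some defects surface only under extreme conditions that ordinary development processes never exercise.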

    2. Once frozen, TIME_WAIT connections never expire, ephemeral ports slowly exhaust, and eventually no new TCP connections can be established at all. ICMP (ping) keeps working. Everything else dies.

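      Most people assume that a complete network failure only happens when the operating system crashes. The author shows that macOS can fall into network paralysis while appearing perfectly healthy: TCP fails while ICMP keeps working. This state of "partial system death" is highly counterintuitive, because the system neither crashes nor reports errors; TCP connections simply stop working.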
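The failure mode in the quote can be reproduced with a toy model of ephemeral-port reuse. This is a hypothetical simulation and does not reproduce the macOS TCP stack. `EphemeralPortPool` and its `clock` parameter are illustrative, and the port range is shrunk to four ports. When the clock advances, ports leave TIME_WAIT and get reused. When the clock is frozen, no TIME_WAIT entry ever expires, so new connections eventually fail. ICMP is not modeled at all, which mirrors why ping keeps working.

```python
from typing import Callable, Dict, Iterable, List, Optional


class EphemeralPortPool:
    """Toy model of ephemeral ports gated by TIME_WAIT (not the macOS kernel)."""

    def __init__(self, ports: Iterable[int], time_wait_secs: float,
                 clock: Callable[[], float]):
        self.free: List[int] = list(ports)
        self.time_wait: Dict[int, float] = {}   # port -> time at which TIME_WAIT ends
        self.time_wait_secs = time_wait_secs
        self.clock = clock                      # any zero-argument time source

    def _reap(self) -> None:
        """Return ports whose TIME_WAIT has expired to the free pool."""
        now = self.clock()
        for port, expires_at in list(self.time_wait.items()):
            if now >= expires_at:
                del self.time_wait[port]
                self.free.append(port)

    def connect(self) -> Optional[int]:
        """Allocate a port for a new outbound connection, or None if none is free."""
        self._reap()
        return self.free.pop(0) if self.free else None

    def close(self, port: int) -> None:
        """Closing puts the port into TIME_WAIT until clock() reaches the expiry."""
        self.time_wait[port] = self.clock() + self.time_wait_secs


if __name__ == "__main__":
    # A frozen clock: TIME_WAIT entries never expire, so the pool drains for good.
    frozen = EphemeralPortPool(range(49152, 49156), time_wait_secs=30, clock=lambda: 0.0)
    for attempt in range(5):
        port = frozen.connect()
        print(attempt, port)          # ports 49152..49155, then None
        if port is not None:
            frozen.close(port)
```

With a clock that advances 10 seconds per connection and a 30-second TIME_WAIT, the same four ports recycle indefinitely. The only difference between the healthy and the broken run is whether time moves.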

    1. Looking at the code and having opinions on architecture is seen as just as 'bad' as calling a compiled C module from an interpreted language was seen back in the day... it's not bad, it's actually quite practical, but it violates some strange 'purity'.

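      The author compares "vibe coding" extremism with the historical advocates of "purity" in programming languages and frameworks, arguing that both insist on impractical standards of purity. This challenges the tradition of pursuing purity in software development and suggests that such pursuits may actually do harm by getting in the way of practicality and efficiency.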

    2. The AI is actually very good at this, especially if you have a conversation with it beforehand. That's what Ask mode is for.

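      The mainstream view is that AI tools are mainly good for generating code or automating simple tasks. The author argues that AI performs very well at code review and architecture discussions, provided you have a thorough conversation with it first. This challenges conventional assumptions about AI capabilities and suggests that AI can be an equal partner in architecture discussions rather than just a code generator.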

    3. Bad software is a decision you make. You need to own it. You should do better.

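      Most people treat poor software quality as the inevitable result of technical limitations, time pressure, or complexity. The author asserts that it is actually a conscious choice. This challenges the culture of excuses common in software development and suggests that quality problems are fundamentally about responsibility and decision-making rather than objective constraints.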

    4. Looking under the hood is cheating. You're only supposed to have vague conversations with the machine about what it's doing.

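      Most people consider reading and reviewing code standard practice in software development. The author points out, sarcastically, that "vibe coding" culture treats it as "cheating" and encourages developers to avoid looking at the underlying implementation at all. That stance contradicts basic software engineering principles, where code review is usually seen as a key step for improving quality and finding problems.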

    1. Sandboxes made for running tens of thousands of agents

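      Most people consider running tens of thousands of AI agents on a single system unrealistic, expecting resource contention and performance degradation. Freestyle makes this an explicit design goal. That suggests its architecture may redefine the scaling limits of AI agents, and it challenges mainstream assumptions about AI system scalability.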

    2. VMs provision in under 700ms from API request to ready machine.

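      Most people assume that booting a full virtual machine takes seconds or even minutes, which is too slow for AI workloads that need fast responses. Freestyle claims to provision a full VM in under 700 ms. This challenges conventional wisdom about virtualization performance and suggests its technology stack may redefine infrastructure startup times.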

    3. Not containers. Full Linux VMs with real root access.

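      Most people consider containers (such as Docker) the best choice for running AI code because they are lightweight and resource-efficient. Freestyle instead insists on full Linux VMs, on the view that AI agents need full system privileges and isolation to reach their potential. This challenges the dominant cloud-native architecture.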

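    1. After a long silence, Google has finally released a good model, and an open one at that: the Gemma 4 series, built specifically to run on local devices (such as phones and computers)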

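      Most people expected Google, as a leader in AI, to keep focusing on large cloud models, so its sudden move toward open, on-device models is surprising. This strategic shift suggests Google may have reassessed where AI deployment is heading, from centralized toward distributed. It also challenges the industry consensus that "bigger models are better" and hints that on-device AI could become the next hot area.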

    2. Claude Max Pro account quota can no longer be used by third-party products: unless a product is built on the Agent SDK or Claude Code, it cannot draw on that account's quota

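      Most people expect a cloud provider's subscription quota to be usable anywhere, but Anthropic's restriction of the quota to specific products overturns that expectation. The policy effectively creates lock-in that pushes developers and users toward Anthropic's own ecosystem. It reflects a shift among AI providers from openness toward closure and could become a new industry norm.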

    1. I feel confident, though, that the slippery feeling people associate with AI products is a solvable problem, and the solution looks more like thoughtful interface design than better models. The models will keep improving on their own. The harder work is building the structure around them so that their output feels reliable, legible, and trustworthy.

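      Most people expect AI products to become more reliable as model technology improves. The author argues that the real challenge lies in building structure and interfaces around the models rather than in the models themselves. This challenges technological determinism in AI and emphasizes the importance of design.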

    2. When you delegate an issue to an agent in Linear, the delegation is visible. There's a person who set the agent loose within that system, and that person is accountable for the outcome. You design the environment well, you let the agent run, and you own what it produces.

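      Most people assume that responsibility for an AI agent's behavior lies with the agent itself or with a real-time monitoring system. The author holds that responsibility lies with the person who set the agent up. This shifts accountability from real-time interaction to the initial delegation, challenging mainstream ideas about who is responsible for AI.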

    3. The more important work happens before the agent even starts. An agent operating inside a well-designed system already has the context and constraints it needs to do good work. In Linear, that means project plans, issue backlogs, code, and documentation. These all shape what the agent does and how it does it.

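      Most people believe responsibility for AI systems lies in real-time monitoring and intervention. The author argues that the real responsibility lies in designing the system and its environment beforehand. This shifts accountability from real-time interaction to the system-design stage, challenging traditional thinking about AI governance.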

    4. An agent cannot be held accountable. I think about this principle most. The instinct to put a human in the loop is understandable, but taken literally, it can mean a person approving every step before anything moves forward. The human becomes a bottleneck, rubber-stamping work rather than directing it, and you lose much of what makes agents valuable in the first place.

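      Most people consider a human approval step in AI systems necessary for accountability. The author argues that it turns the human into a bottleneck and undermines the value of agents. This challenges mainstream thinking on AI safety and accountability and proposes an unconventional model for assigning responsibility.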

    5. The first interface that spread for AI tools was the chat window. That makes sense. When you don't know what something can do, the safest approach is to let people ask. A conversation feels familiar, it stretches across many situations, and it doesn't force a specific structure up front.

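      Most people regard the chat interface as the ideal form of AI interaction because it is intuitive and flexible. The author implies it is only a tool for the exploratory phase, not a solution for serious work. This challenges the dominance of chat interfaces in current AI tool design.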

    6. Non-deterministic software breaks the contract. When outcomes can vary, sometimes wildly, based on what someone types into the same chat window, designing for reliability becomes genuinely harder. This slippery feeling is the design problem of this era, and it almost always traces back to the interface rather than the language model—which means it belongs to designers, not researchers.

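      Most people see AI's non-determinism as a technical problem that better models will solve. The author argues that it is a design problem that belongs to designers rather than researchers. This challenges the mainstream view that technical progress is the main route to fixing AI's unreliability.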

    1. AI is a way to level the playing field, for sure! Successful writers have always operated with a lot of support around them, but not everyone has access to those resources.

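      Most people think AI writing will worsen inequality. The author instead sees it as a democratizing tool that gives people without traditional writing resources access to professional-level support. This challenges elitist criticism of AI writing and suggests it may narrow rather than widen gaps in creative work.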

    2. When I sit down to write a piece, and before I even write a word, I have the agent interview me. It asks questions to draw out what I'm thinking about the topic.

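      Most people assume AI writing starts with a human feeding ideas to the AI, but the author describes the reverse: the AI first interviews the human to draw out ideas. This reversal challenges assumptions about which way AI writing flows. It shows that AI can do more than assist with writing, because it can also prompt and guide human thinking, and it redefines who leads in the writing process.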

    3. It has a panel of critics who tear my work apart from different angles—skills I wrote to invoke certain kinds of feedback, whether it's for length, pacing, or the soundness of the argument.

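      Most people think AI writing lacks a critical perspective and rigorous editing. The author describes a panel of AI-driven critics that tear her work apart from different angles. This challenges concerns about the quality of AI writing and suggests that AI can be set up to give more comprehensive and rigorous feedback than a traditional editor, possibly exceeding human editors in consistency and breadth.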

    4. My process has about as much in common with that as cooking has with microwaving a frozen dinner.

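      Most people think AI writing is a simple prompt-generate-paste process. The author likens the difference to cooking versus microwaving a frozen dinner, implying that real AI-assisted writing is complex and takes skill. This challenges simplistic views of AI writing and suggests it is a craft requiring expertise and creativity rather than a mechanical task.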

    5. Research is thinking. Outlining is thinking. Writing is thinking. Any portion of that done by AI is less thinking done by you.

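      Many people argue that writing with AI reduces the amount of thinking you do, but the author considers this view oversimplified. Her AI-assisted process requires substantial thinking, critical judgment, and rigorous editing. It involves complex interaction, deep reflection, and multiple rounds of revision, and may actually demand more thought than traditional writing.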

    1. OpenAI just raised $122 billion at an $852 billion valuation. That's the largest private funding round ever.

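      Most people read a funding round this large as a sign of an AI bubble and overvaluation. The author frames it as a strategic move by OpenAI to dominate the market, implying that funding at this scale may be aimed at building barriers to entry rather than being mere hype. This challenges the mainstream view of an AI investment bubble.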

    2. Sam Altman has reportedly told staff that Spud could "really accelerate the economy"

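      Most people see AI as a tool that will change the economy gradually. The author suggests that OpenAI's Spud model may be disruptive enough to substantially accelerate the entire economy. This goes well beyond most people's sense of current AI capabilities and implies that AI may become a main driver of economic growth sooner than expected.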

    3. both companies are hinting that these models are a real step forward, not just small upgrades.

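      Most people think AI model progress is incremental, with each iteration bringing only modest gains. The author argues that the upcoming models from OpenAI and Anthropic (Spud and Claude Mythos) represent genuine breakthroughs rather than routine upgrades, suggesting AI development may be about to accelerate.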

    1. Gemma points in the opposite direction: smaller models, local compute, more ownership.

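      Most people assume AI development must move toward larger, more centralized models. The author sees Google's Gemma 4 as representing the opposite trend. This challenges the dominant narrative and suggests that AI may spread onto personal devices and rely less on large infrastructure, in sharp contrast to industry consensus.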

    2. A founder in LA reportedly scaled Medvi toward $1.8B in annual sales with basically one full-time employee.

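      Most people believe building a billion-dollar company requires a large team and a complex management structure. The author argues that AI has made the "one-person unicorn" possible. This challenges traditional startup thinking and suggests AI may fundamentally change the relationship between company size and headcount, overturning basic assumptions about business growth.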

    1. Employees still own a surprisingly large 19.35%. SoftBank comes in at 11.66%, followed by VC and institutional investors at 7.83%, Amazon at 4.66%, NVIDIA at 3.47%

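      Most people think OpenAI's ownership structure is fairly simple and controlled mainly by Microsoft and the nonprofit foundation. The author reveals that employees hold as much as 19.35% and that several tech companies hold significant stakes, challenging common perceptions of OpenAI's governance.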

    2. And once models get good at that, the question stops being whether they can make beautiful images. It becomes whether people still notice when something was never real to begin with.

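      Most people focus on how realistic AI image models can make their content. The author raises a counterintuitive point: the real challenge is not creating realism but whether people can still tell what is real. This challenges assumptions about where AI image models are heading.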

    3. Most people talk about OpenAI like it's basically 'owned by Microsoft,' but the actual cap table is much more spread out.

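      Most people think OpenAI is mainly controlled by Microsoft. The author shows that its ownership is actually quite dispersed, with Microsoft holding only 26.79%. This challenges public perceptions of OpenAI's ownership and helps explain why the company's decisions often seem inconsistent in direction.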

    4. The first wave of image models was mostly about making cool-looking images. This next phase is about making ordinary things look real.

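      Most people think AI image models are mainly focused on producing increasingly realistic fantasy art or creative content. The author argues that the next phase is about making ordinary, everyday things look real, which challenges common assumptions about where AI imagery is heading.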

    1. We are building a world where machines write the code, machines choose the dependencies, and machines ship the updates. The AI agents are building the software. If we don't secure the supply chain they rely on, the AI agents are cooked.

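      Most people believe AI will make software development more efficient and secure. The author warns that if we do not secure the supply chain AI agents depend on, the agents themselves will become attack targets. This challenges the mainstream view that AI progress necessarily improves security.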

    2. Socket, an a16z portfolio company, detected the malicious dependency in the Axios attack within 6 minutes of its publication. That's roughly 63,000 times faster than the industry average.

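      Most people assume supply chain attacks take months or even years to discover. The author shows that new security tools can detect an attack within minutes, roughly 63,000 times faster than the industry average. This suggests that security detection is shifting from static, CVE-based checks to real-time, behavior-based analysis.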

    3. The autonomous coding agents now entering production can install dependencies, execute builds, and open pull requests without a human ever touching the keyboard. They optimize for 'does this work?' not 'is this safe?'

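      Most people think AI coding assistants will improve both development efficiency and security. The author points out that these autonomous agents prioritize functionality over security and operate so fast that the window for security review shrinks to almost nothing. This challenges the widespread optimism about AI-assisted development.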

    4. Hallucinated packages are the sleeper threat. LLMs regularly invent package names that don't exist. One study found that nearly 20% of AI-recommended packages were fabrications, and 43% of those hallucinated names appeared consistently across queries.

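      Most people assume the packages AI recommends actually exist. The author reveals that AI often recommends nonexistent packages, which has become a new attack vector. Attackers register these "hallucinated packages" and plant malicious code in them. This "slopsquatting" technique turns AI itself into an amplifier of supply chain attacks.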

    5. AI agents select known-vulnerable dependency versions 50% more often than humans. Worse, the vulnerable versions they pick are harder to fix, requiring major-version upgrades far more frequently.

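      Most people think AI coding assistants choose dependencies more safely than humans do. The author finds that AI actually picks known-vulnerable versions 50% more often than humans, and those vulnerabilities are harder to fix. The author attributes this to AI optimizing for "does it work" rather than "is it safe", which challenges the security assumptions behind AI-assisted development.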
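One mitigation for the hallucinated-package problem described above is to refuse to auto-install any dependency name that is absent from a trusted index. The sketch below assumes a locally maintained snapshot of approved package names (`known_index` is hypothetical). The name normalization follows the PEP 503 rule for Python packages. Existence alone is not proof of safety, because an attacker can register a hallucinated name, and that is exactly what slopsquatting does. The snapshot therefore works best as a curated allowlist rather than a full mirror of the registry.

```python
import re
from typing import Iterable, List


def normalize(name: str) -> str:
    """PEP 503 name normalization: runs of '-', '_' and '.' become '-', lowercased."""
    return re.sub(r"[-_.]+", "-", name).lower()


def flag_unknown_packages(suggested: Iterable[str], known_index: Iterable[str]) -> List[str]:
    """Return AI-suggested names missing from a trusted (hypothetical) index snapshot.

    Flagged names may be hallucinated and should go to human review
    instead of being installed automatically.
    """
    known = {normalize(n) for n in known_index}
    return [n for n in suggested if normalize(n) not in known]


if __name__ == "__main__":
    allowlist = {"requests", "numpy", "python-dateutil"}
    print(flag_unknown_packages(["Requests", "python_dateutil", "numpyy"], allowlist))
    # ['numpyy']
```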

    1. Talent density : the biggest prizes in capitalism attract the best minds in the field. These are the fastest growing software companies in history.

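      Most people think AI progress depends mainly on algorithmic breakthroughs and compute. The author emphasizes talent density as a key driver of AI model compression, implying that competition for talent matters more than capital and algorithms. That runs against the industry's general emphasis on technical investment.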

    2. In 23 months, the same capability that needed 1.8 trillion parameters now fits in 4 billion parameters. A 450x compression.

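      Most people think improvements in AI model performance come mainly from increasing parameter counts. The author argues that algorithmic optimization and concentrated talent have achieved a 450x parameter compression (1.8 trillion down to 4 billion). This challenges the industry consensus that more parameters mean better performance.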

    3. Within three to four months, you can run a model with similar performance on your laptop; 23 months later, you can run the same model on your phone.

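      Most people think frontier AI takes a long time to reach consumer devices. The author argues that within 3 to 4 months a model with similar performance can run on a laptop, and within 23 months on a phone. That pace of diffusion is far faster than the industry generally expects.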

    1. Someone who builds premium dating apps, let's say, might use AI coding tools to create in one day what used to take three days. That means the worker is more productive. The worker's employer, spending the same amount of money, can now get more output. So then will the employer want more employees or fewer?

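      Most people assume that AI-driven productivity gains necessarily bring job growth. The author raises a counterintuitive question: when workers become more efficient, employers may choose to hire fewer people rather than more. This questions the assumed linear causal link from technological progress to employment growth.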

    2. We need, like, a Manhattan Project to collect this... Fields that are not exposed now will become exposed in the future, so you just want to track these statistics across the entire economy.

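      Most people think the response to AI's impact on jobs should focus on the industries most threatened now. The author argues that we need something like a Manhattan Project to collect price-elasticity data across all industries, including those not yet affected by AI. This forward-looking perspective challenges conventional crisis-response thinking.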

    3. Exposure alone is a completely meaningless tool for predicting displacement

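      Most people assume that measuring how exposed a job's tasks are to AI can predict which jobs will be displaced. The author argues that exposure on its own is a meaningless predictor because it ignores key factors such as price elasticity and shifts in demand. This challenges the dominant method in current research on AI's employment effects.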
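The employer's dilemma in the first quote, and the claim that exposure alone cannot predict displacement, can be made concrete with a textbook constant-elasticity model. The assumptions are mine, not the author's: labor is the only cost, savings pass fully into price, and demand follows Q ∝ P^(-ε). Under those assumptions, a productivity gain g multiplies labor demand by g^(ε-1). The same exposure therefore raises employment when demand is elastic (ε > 1) and cuts it when demand is inelastic (ε < 1).

```python
def employment_multiplier(productivity_gain: float, price_elasticity: float) -> float:
    """New/old labor demand under a stylized constant-elasticity model.

    Assumptions (illustrative only): labor is the sole cost, cost savings pass
    fully into price, and demand Q is proportional to P ** (-price_elasticity).
    """
    quantity_ratio = productivity_gain ** price_elasticity  # price falls to 1/g, so Q rises by g**e
    return quantity_ratio / productivity_gain               # each worker now produces g times as much


if __name__ == "__main__":
    # One day instead of three: productivity_gain = 3.
    for e in (0.5, 1.0, 2.0):
        print(e, round(employment_multiplier(3.0, e), 3))
    # 0.5 0.577   (inelastic demand: fewer workers)
    # 1.0 1.0     (unit elastic: unchanged)
    # 2.0 3.0     (elastic demand: more workers)
```

The three-day-to-one-day example from the quote is equally "exposed" in all three runs. Only the elasticity of demand decides whether the employer hires more people or fewer, which is why elasticity data matters.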

    1. in the past year Huawei has overtaken Nvidia as the leading source of AI computing power in China, at least in terms of rated FLOP/s

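      Most people probably assume Nvidia still dominates the Chinese market. The author finds that Huawei has overtaken Nvidia as the main source of AI computing power in China, at least in rated FLOP/s. This challenges the belief that Nvidia's position in China is unassailable and suggests domestic alternatives may be gaining share faster than expected.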

    2. We estimate that as of the end of 2025, Chinese companies collectively own just over 5% of the cumulative computing power of the leading AI chips sold in recent years

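      Given the rapid growth of China's AI industry and heavy government investment in AI, most people would probably expect China to hold a larger share of global AI computing power. The author estimates that Chinese companies own only about 5% of the cumulative compute of leading AI chips sold in recent years. This figure is far lower than many would expect and challenges common perceptions of China's AI strength.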

    3. Many frontier AI developers, including Anthropic and OpenAI, acquire almost all of their compute from hyperscalers and other cloud providers.

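      Most people probably assume leading AI companies own their computing infrastructure to stay competitive. The author notes that frontier developers such as OpenAI and Anthropic obtain almost all their compute from hyperscalers and other cloud providers. This suggests AI innovation depends more than expected on big-tech infrastructure rather than independent compute.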

    4. Google holds the equivalent of around 5 million Nvidia H100 GPUs in compute capacity, roughly 25% of the world's total!

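      Most people probably think Nvidia is the largest owner of AI computing power because its chips are so widely used. The author estimates that Google, through its in-house TPUs, holds compute equivalent to about 5 million Nvidia H100 GPUs, roughly 25% of the world's total. This suggests that in-house chips may build a stronger compute advantage than buying commercial ones.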

    5. We estimate that over 60% of global AI compute (in terms of total computing power) is owned by the five US hyperscalers, led by Google.

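      Most people assume AI compute is more widely distributed, or dominated by specialist AI companies such as OpenAI and Anthropic. The author estimates that over 60% of global AI compute is owned by five US hyperscalers, which challenges common views of the AI industry's structure. Such concentration gives a handful of companies disproportionate influence over the direction of AI development.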

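    1. Complex research is not a collection of answers to single queries. It requires completing an entire process, from generating ideas to searching for supporting evidence, resolving contradictions, and finally structuring the result as a report.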

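      Most people think AI research assistants should focus on fast, direct answers. The author stresses that complex research requires a complete process running from ideas to a structured report. This runs against the current push for instant answers from AI assistants and implies that quality matters more than speed, a non-consensus view of how AI should be applied.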

    2. Letting the model think longer and more deeply at inference time draws out better output. This is the essence of inference scaling.

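      Most people think AI should aim for faster responses and higher efficiency. The author argues that AI should "think longer and more deeply" to produce better output. This runs against the industry's push for instant responses and suggests that compute should be spent on deeper thinking rather than on speed.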

    1. For higher-interactivity scenarios, execution time for MoE models is bound by expert weight load time. By splitting, or sharding, the experts across multiple GPUs across NVL72 nodes, this bottleneck is reduced, improving end-to-end performance.

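      Most people assume the main bottleneck for MoE models is compute. The author points out that in high-interactivity scenarios, expert weight load time is the real bottleneck, and proposes sharding the experts across multiple GPUs to relieve it. This challenges conventional thinking about model optimization and suggests that data movement may matter more than compute.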

    2. NVIDIA yields unmatched inference throughput across the broadest range of workloads, from massive LLMs to advanced vision language models, to generative recommender systems and more, on industry-standard benchmarks.

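      Most people think the AI field has several competing platforms, each strong in different areas. The author claims NVIDIA leads across the broadest range of workloads. This challenges the consensus of diversified competition and implies NVIDIA may be more dominant than commonly believed.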

    3. Co-designed hardware, software, and models are key to delivering the highest AI factory throughput and lowest token cost. Measuring this goes far beyond peak chip specifications.

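      Most people think AI performance is determined mainly by chip specifications. The author emphasizes that co-design of hardware, software, and models is the key. This challenges chip-centric industry thinking and implies that full-stack optimization matters more than raw chip performance.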

    4. By applying compute otherwise that goes unutilized to predict and verify additional tokens in parallel (up to three in this implementation), throughput at high interactivity is increased.

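      Most people think compute should be devoted to the current task. The author describes using otherwise idle compute to predict and verify additional tokens in parallel, a form of speculative decoding. This challenges conventional compute allocation and points to new possibilities for AI efficiency.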

    5. NVIDIA was the first and only platform to submit DeepSeek-R1 results on MLPerf Inference when the benchmark debuted last year.

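      Most people expect AI benchmarks to attract many competing platforms. The author stresses that NVIDIA was the only platform to submit DeepSeek-R1 results, implying a dominant position in AI benchmarking that runs counter to the common perception of diversified competition.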

    6. This means 2.7x more tokens from the same GB300 NVL72-based infrastructure and power footprint, reducing the cost to manufacture each token by more than 60%.

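      Most people see hardware upgrades as the main way to improve AI performance. The author argues that software optimization can deliver 2.7x more tokens on the same hardware and power footprint, which cuts the cost per token by more than 60% (each token costs 1/2.7, or about 37%, of what it did before). This challenges the industry's reliance on hardware upgrades and suggests software optimization may be more cost-effective.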
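The NVIDIA passage about using idle compute to "predict and verify additional tokens in parallel (up to three)" describes a form of speculative decoding. The sketch below is a generic greedy version with a separate draft function and does not reproduce NVIDIA's implementation. The `target` and `draft` callables are hypothetical stand-ins for a large model and a cheap one. Every kept token is one the target would have chosen anyway, so the output matches plain greedy decoding with the target. In a real system the speed-up comes from checking all drafted positions in one batched target pass. Here that check is written as a loop for clarity.

```python
from typing import Callable, List

NextToken = Callable[[List[int]], int]  # greedy next-token choice for a given prefix


def speculative_decode(target: NextToken, draft: NextToken, prompt: List[int],
                       max_new: int, k: int = 3) -> List[int]:
    """Greedy speculative decoding: draft proposes k tokens, target verifies them."""
    out = list(prompt)
    while len(out) - len(prompt) < max_new:
        # 1. The cheap draft proposes k tokens autoregressively.
        ctx = list(out)
        proposal = []
        for _ in range(k):
            ctx.append(draft(ctx))
            proposal.append(ctx[-1])
        # 2. The target checks each drafted position (one batched pass in practice).
        accepted: List[int] = []
        for tok in proposal:
            expected = target(out + accepted)
            accepted.append(expected)       # the target's choice is always what we keep
            if tok != expected:
                break                       # first disagreement ends this round
        else:
            accepted.append(target(out + accepted))  # all k matched: one bonus token
        out.extend(accepted)
    return out[:len(prompt) + max_new]


def greedy_decode(target: NextToken, prompt: List[int], max_new: int) -> List[int]:
    """Reference: plain token-by-token greedy decoding with the target."""
    out = list(prompt)
    for _ in range(max_new):
        out.append(target(out))
    return out
```

Each round emits between 1 and k + 1 tokens. When the draft agrees often, far fewer target rounds are needed for the same output, which is where the throughput gain at high interactivity comes from.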

    1. Using vLLM high-throughput LLM serving on DGX Spark provides a high-performance platform for the largest Gemma 4 models

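      Most people think running the largest Gemma 4 models requires specialized hardware and complex deployment. The author claims vLLM can serve these large models efficiently on DGX Spark, suggesting inference optimization may have reached a tipping point that makes deploying complex models simpler and more efficient.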

    2. The E4B and E2B are the newest edition of on-device and mobile designed models first launched with Gemma 3n.

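      Most people think AI models on mobile devices must give up substantial functionality to run efficiently. The author implies that Gemma 4's E4B and E2B versions keep multimodal capabilities on mobile devices, including text, audio, vision, and video processing. This challenges conventional views of mobile AI capability.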

    3. The bundle includes four models, including Gemma's first MoE model, which can all fit on a single NVIDIA H100 GPU and supports over 140 languages.

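      Most people think multimodal models that support more than 140 languages need substantial compute and cannot run on a single GPU. The author claims these models can all fit on a single H100 GPU. This challenges assumptions about the resource needs of large multilingual models and suggests large efficiency gains.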

    4. Modern physical AI agents are evolving rapidly with Gemma 4 models that integrate audio, multimodal perception, and deep reasoning capabilities.

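      Most people think physical AI agents are still at an early stage, mostly handling simple tasks. The author suggests that Gemma 4 lets physical AI agents understand speech, interpret visual context, and reason intelligently. This would be a major step up from current robotics capabilities and could accelerate embodied AI.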

    5. The 31B and 26B A4B variants are high-performing reasoning models suitable for both local and data center environments.

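      Most people think large language models (31B parameters) can run only in data centers. The author claims these models can run efficiently in local environments. This runs against industry consensus, suggests edge computing may be more capable than we think, and could reshape how AI is deployed.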

    6. NVFP4 enables 4-bit precision while maintaining nearly identical accuracy to 8-bit precision, increasing performance per watt and lowering cost per token.

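      Most people think lowering model precision significantly hurts performance. The author claims that with NVFP4, Gemma 4 at 4-bit precision reaches accuracy nearly identical to 8-bit precision. This counterintuitive result challenges the belief that quantization greatly degrades model performance and suggests NVIDIA may have made a breakthrough in quantization.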
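The NVFP4 claim rests on block-scaled 4-bit floating point. The sketch below rounds each 16-value block to the FP4 E2M1 grid (magnitudes 0, 0.5, 1, 1.5, 2, 3, 4, 6) with one shared scale per block. It is a simplified illustration. The scale here is a full-precision float, whereas NVFP4 as publicly described stores an FP8 (E4M3) scale per 16-value block plus an FP32 per-tensor scale. Real kernels also pack the codes into 4-bit fields. The per-block scale is what lets 4-bit values track the local range of the data.

```python
from typing import List, Tuple

# Magnitudes representable by FP4 E2M1 (the sign is a separate bit).
E2M1_GRID = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]


def quantize_block(block: List[float]) -> Tuple[List[float], float]:
    """Round one block to E2M1 values sharing a single scale (simplified)."""
    amax = max(abs(x) for x in block)
    scale = amax / 6.0 if amax > 0 else 1.0   # map the block's largest magnitude to 6.0
    codes = []
    for x in block:
        target = abs(x) / scale
        mag = min(E2M1_GRID, key=lambda g: abs(target - g))  # round to nearest grid value
        codes.append(mag if x >= 0 else -mag)
    return codes, scale


def dequantize_block(codes: List[float], scale: float) -> List[float]:
    return [c * scale for c in codes]


def quantize_tensor(values: List[float], block_size: int = 16):
    """Split into blocks of `block_size` (16 mirrors NVFP4's micro-block size)."""
    return [quantize_block(values[i:i + block_size])
            for i in range(0, len(values), block_size)]
```

The rounding error within a block is at most half the widest grid gap (1.0) times that block's scale. Smaller blocks therefore give tighter scales and lower error, at the cost of storing more scale factors.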

    1. By using SAM, the Alta team has been able to process more than 20 million images without incurring exorbitant costs, allowing them to focus on building the best possible product for their users.

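      Most people would probably assume startups must rely on expensive third-party APIs to process large numbers of images. The Alta team instead used the open-source SAM model to process images at scale without huge costs. This challenges the industry belief that high-quality AI services must be expensive and shows the cost advantages of open-source models.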

    2. If we knew that every image uploaded was a beautiful model shot, segmentation would be far easier, but because of the nature of user-uploaded content, we need the best possible segmentation.

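      Most people would probably picture polished professional photos as the typical input for AI image processing. The author notes that while "perfect" model shots would be easy to segment, real user uploads are much harder. This challenges assumptions about "ideal training data" and suggests that the imperfection of real-world data is what creates the tougher technical challenge.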

    3. Fashion in particular has one of the most complex image datasets, especially because of the inconsistent nature of user-uploaded content.

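      Most people would probably think fashion image processing is relatively easy because the fashion industry pursues flawless presentation. The author argues that fashion actually has one of the most complex image datasets because user uploads are so inconsistent. This counterintuitive view reveals challenges specific to fashion AI and challenges common assumptions about how hard fashion image processing is.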

    1. Built from the same world-class research and technology as Gemini 3

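      Most people assume Google keeps its most advanced technology for its proprietary Gemini models and releases downgraded open versions. The author claims Gemma 4 is built from "the same world-class research and technology" as Gemini 3, challenging the common perception that open versions are second-tier products.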

    2. Engineered from the ground up for maximum compute and memory efficiency

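      Most people think high-performance AI models necessarily require substantial compute and memory. The author stresses that Gemma 4's edge models were "engineered from the ground up for maximum compute and memory efficiency", implying that advanced AI capabilities are possible even in resource-constrained environments. This runs against common assumptions about AI resource needs.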

    3. The edge models feature a 128K context window, while the larger models offer up to 256K

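      Most people think AI models on edge and mobile devices are limited, especially at handling long contexts. The author claims that even Gemma 4's edge models offer a 128K context window, challenging the common perception that edge AI is limited.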

    4. Gemma 4 outcompetes models 20x its size

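      Most people think AI model performance is tied directly to parameter count, so bigger models are necessarily stronger. The author claims Gemma 4 outcompetes models 20 times its size. This challenges the "bigger is better" view and suggests that efficiency optimization may matter more than raw scale.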

    5. Byte for byte, the most capable open models

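      Most people think open models cannot match closed, proprietary models in performance. The author claims Gemma 4 is "byte for byte, the most capable open model", which challenges this industry consensus. It suggests that, per unit of size, open models may now be competitive with commercial closed models on certain metrics, an unconventional view.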

    1. Teams at companies like Notion, Ramp, Braintrust, and Wasmer are already using Codex to accelerate their engineering workflows.

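      Most people probably think AI coding tools are adopted mainly by large tech companies. The author shows that companies such as Notion and Ramp, which are not traditional tech giants, are also integrating Codex into their core engineering workflows. This challenges assumptions about who adopts AI coding tools and shows the tools apply more widely than expected.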

    2. Within ChatGPT Business and Enterprise, the number of Codex users has grown 6x since January.

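      Most people probably think enterprise adoption of AI tools is gradual. The author reports explosive growth of Codex in enterprise settings (6x since January). This suggests AI coding assistants may be moving from experimental tools to core productivity tools faster than expected, challenging conventional views of how fast enterprises adopt AI.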

    3. Codex-only seats have no rate limits, and usage is billed on token consumption.

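      Most people expect AI services to impose usage limits to control costs. The author presents Codex's rate-limit-free, token-based billing as workable because it offers a more transparent cost structure and more flexible use. This may reflect OpenAI's confidence in its own technical efficiency and in user demand.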

    1. Priority areas include safety evaluation, ethics, robustness, scalable mitigations, privacy-preserving safety methods, agentic oversight, and high-severity misuse domains.

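      Most people think AI safety research focuses mainly on preventing malicious use and aligning systems with human values. The author lists privacy-preserving methods as a priority area, suggesting OpenAI treats privacy as a core component of safety rather than a separate consideration. This runs against the traditional view of privacy and safety as two distinct fields.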

    2. Fellows will receive API credits and other resources as appropriate, but will not have internal system access.

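      In AI safety, many believe that studying system safety properly requires full access to internal systems. The author states explicitly that fellows will not have internal system access. This challenges a traditional assumption of AI safety research and suggests OpenAI believes safety research can be done without full system access, or that it has other ways to evaluate safety.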

    3. Fellows will work closely with OpenAI mentors and engage with a cohort of peers.

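      Most people think AI safety research should be highly confidential and isolated, especially research on advanced AI systems. The author emphasizes close collaboration with OpenAI mentors and exchange with a cohort of peers. This indicates that OpenAI is taking an open, collaborative approach to AI safety research, in sharp contrast to the industry's usual closed research model.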

    4. We prioritize research ability, technical judgment, and execution over specific credentials.

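      In academia and the tech industry, degrees and traditional credentials are usually treated as the most important screening criteria. The author states explicitly that actual ability is prioritized over specific credentials. This challenges the industry's prevailing talent-evaluation system and suggests OpenAI is looking for innovators from nontraditional paths, not just elites from prestigious schools.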

    5. We are especially interested in work that is empirically grounded, technically strong, and relevant to the broader research community.

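      Most people think AI safety research should be highly theoretical and abstract. The author stresses the need for empirical grounding and technical strength, suggesting OpenAI is moving AI safety research away from pure theory toward practical applications and verifiable results. This contrasts with the elitist tendencies of traditional AI safety research.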

    1. The vast majority of the new compute will be sited in the United States, making this partnership a major expansion of our November 2025 commitment to invest $50 billion in strengthening American computing infrastructure.

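      Most people expect AI computing infrastructure to be distributed globally, but Anthropic has chosen to site the vast majority of its new compute in the United States. This runs against the common trend toward global technology deployment and challenges mainstream assumptions about the geography of AI infrastructure. It also reflects the deep influence of geopolitics on technology deployment.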

    2. Claude remains the only frontier AI model available to customers on all three of the world's largest cloud platforms: Amazon Web Services (Bedrock), Google Cloud (Vertex AI), and Microsoft Azure (Foundry).

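      Most industry observers expect top AI models to be locked to a single cloud platform through exclusive partnerships, but Anthropic has chosen full coverage. This challenges the common platform lock-in business model and suggests the AI infrastructure market may be more open and competitive than expected.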

    3. We train and run Claude on a range of AI hardware—AWS Trainium, Google TPUs, and NVIDIA GPUs—which means we can match workloads to the chips best suited for them.

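      Most people think AI companies rely on a single hardware vendor for the best performance, but Anthropic's multi-platform strategy challenges that consensus. Although this diversified approach adds complexity, it offers better performance and resilience and suggests the future of AI computing may be more distributed than centralized.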

    4. over 500 business customers were each spending over $1 million on an annualized basis. Today that number exceeds 1,000, doubling in less than two months.

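      Most people are cautious about how fast enterprises will adopt AI, but the number of Anthropic's high-value customers doubled in under two months. This shows that enterprise adoption and investment are far outpacing industry expectations and challenges the belief that the enterprise AI market is developing slowly.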

    5. Demand from Claude customers has accelerated in 2026. Our run-rate revenue has now surpassed $30 billion—up from approximately $9 billion at the end of 2025.

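      Most people think AI companies are still in a cash-burning phase, but Anthropic's revenue growth is striking: run-rate revenue jumped from about $9 billion at the end of 2025 to over $30 billion in 2026. This shows AI commercialization is moving far faster than the market expected and challenges the consensus that AI companies will lose money for a long time.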

    1. In most other societies, an admission of human error might seem commonplace. But not in the Soviet Union, where for decades official failures have seldom been acknowledged, official sins seldom recognized. Disasters such as plane crashes and earthquakes are like trees falling in the forest when no one is present. No one ever hears the crash

      I found this quote interesting because it highlights the ideas about the Cold War that we have been discussing in class. After the Chernobyl catastrophe, the USSR tried to "save face" at the expense of public safety, because this happened during the Cold War. The Soviet Union didn't want to admit any sort of mistake because it wanted to maintain an image of superiority. This quote emphasizes the cultural shame in the Soviet Union, and shows that its patriotic values and its image in the Western world entirely overrode its concern for public safety.

    2. mic-power facility, Soviet officials used the accident report as a platform for their campaign against the American nuclear-defense program. After first ignoring and then minimizing the mishap, Moscow has tried to establish a link between Chernobyl and atomic weapons. Said the report: "The accident at the Chernobyl nuclear-power plant has again demonstrated the danger of uncontrolled nuclear power and highlighted the destructive consequences to which its military use or damage to peaceful nuclear facilities during military operations could lead." And Petrosyants told the press conference, "The explosion of the smallest nuclear warhead would be equal to three Chernobyls." U.S. officials quickly pointed out that Moscow's attempt to link Chernobyl to the arms race was a predictable effort to divert attention from its own failures. Indeed,

      This quote is by far the most revealing of the Soviet Union's unwillingness to admit any sort of failure and take the blame. The Soviet Union tried to turn attention toward the American nuclear-defense program (which seems contradictory, because the USSR had its own nuclear weapons). As the end of the quote states, this was obviously an attempt to divert attention and blame away from itself. This quote most directly reveals the sort of relationship that the USSR and the USA had during the Cold War.

    3. So far, 31 people who were in or near the plant at the time of the accident have died, and that number only begins to state the extent of the health damage. Using data from the report on the levels of human contamination, American experts conclude that a total of more than 5,000 people are likely to die prematurely from radiation-induced cancer. There will be 10,000 cases of thyroid cancer alone, the experts predict, resulting in 1,500 deaths. Though there is still concern about contamination in other European countries, the information indicates that all the premature deaths will be in the Soviet Union

      These numbers are truly fascinating because even now the full scope of the Chernobyl disaster is not exactly known. Because of the Soviet Union's initial refusal to acknowledge the incident and properly inform its own citizens, tens of thousands of people ended up suffering tragic radiation-related deaths and health complications. It's fascinating that these statistics were provided by the United States rather than by the Soviet Union itself, which, I would think, had more accurate data (though it probably wouldn't want to share these numbers because they would make it look bad).
