Hypothesis

100 Matching Annotations

Jun 2026
enpire-research.github.io

ENPIRE: Agentic Robot Policy Self-Improvement in the Real World

1
1. fxp007 18 Jun 2026
  
  in Public
  
  We evaluate the physical autoresearch capability of three coding agents: Codex with GPT-5.5, Claude Code with Opus 4.7, and Kimi Code with Kimi K2.6
  
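  This three-way comparison is the part of the paper with the most competitive-intelligence value. The evaluation framing is worth noting: it asks not which model writes better code, but which coding agent can raise a robot policy's success rate further within a limited time. That is an end-to-end evaluation judged by physical-world outcomes. Benchmarks of this kind sit closer to real value than pure code-generation benchmarks, and they are harder to dominate through sheer parameter scale.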
  
  编程智能体对比 GPT-5.5 物理基准

Tags

GPT-5.5

物理基准

编程智能体对比

Annotators

fxp007

URL

enpire-research.github.io/
forum.effectivealtruism.org

animal welfare has an evidence problem

14
1. unjournal 09 Jun 2026
  
  in Public
  
  I think that alternative proteins may fall into this bucket. While it doesn’t meet my personal donation bar for an evidence-based intervention, I understand why it has absorbed large amounts of philanthropic money earmarked for animal welfare. It might just eventually work.
  
  GPT5.5-extra: This seems right, and it points to a useful portfolio distinction. Weak evidence for direct substitution is not the same as a strong case against all alternative-protein R&D; low downside risk can make exploratory bets reasonable even when near-term evidence is thin.
  
  The decision-relevant question is not just "is this evidence-based now?" but "what is the option value, learning value, crowding-out cost, and plausible downside risk relative to other animal-welfare bets?"
  
  gpt evidence-audit
2. unjournal 09 Jun 2026
  
  in Public
  
  We (the EA animal welfare community) should be using significant resources to generate useful evidence. But simply funding the broad field of animal welfare science is likely to create scattered research results that are difficult to translate into action.
  
  GPT5.5-extra: I agree with this direction. A useful design distinction is between (1) decision-question elicitation, (2) study design and data collection, (3) independent review/replication, and (4) translation into funding or implementation decisions.
  
  "Taking ownership" should not mean collapsing these roles into a single interested evaluator. The highest-value infrastructure may be a pipeline with clear handoffs, independent review, public evaluation packages, and explicit decision thresholds.
  
  gpt evidence-audit
3. unjournal 09 Jun 2026
  
  in Public
  
  All three intervention examples I give have been extensively discussed, recommended by charity evaluators, and financed by major funders.
  
  GPT5.5-extra: This is an important point, but I would separate "widely recommended/funded" from "best marginal opportunity." A field can converge on interventions because they are legible, scalable, fundable, or coalition-friendly, not only because they have the strongest evidence.
  
  That distinction matters for diagnosis: the problem may be less "people made bad calls" and more "the field lacks institutions that can generate, audit, and translate decision-relevant evidence before interventions become large funding defaults."
  
  gpt evidence-audit
4. unjournal 09 Jun 2026
  
  in Public
  
  R&D should be the #1 priority right now, ahead of scaling existing work.
  
  GPT5.5-extra: This is plausible, and I am sympathetic, but the post does not fully close the argument for "#1 priority." To make the case, I would want an explicit value-of-information comparison: how much better could targeted R&D make future funding decisions, how soon, at what cost, and relative to the best marginal scaling opportunities?
  
  This is exactly the kind of question that seems well suited to animal-welfare evidence infrastructure: pre-specified decision-relevant questions, rapid review, and evaluation packages rather than scattered academic outputs.
  
  gpt evidence-audit
5. unjournal 09 Jun 2026
  
  in Public
  
  The report is a few years old now, but even then we already had multiple strong studies showing that the availability of equivalent substitutes does not lead to major changes in meat consumption.
  
  GPT5.5-extra: I would soften "equivalent substitutes" here. Peacock's review is mainly about current plant-based meat and hypothetical improvements in price/taste/convenience; it does not establish that actually equivalent future substitutes, cultivated meat, or precision-fermentation products would fail to substitute. It is stronger as evidence that current plant-based availability and many current interventions have weak or uncertain effects.
  
  Source: https://rethinkpriorities.org/research-area/price-taste-and-convenience-competitive-plant-based-meat-would-not-currently-replace-meat/
  
  gpt evidence-audit
6. unjournal 09 Jun 2026
  
  in Public
  
  However, we have little evidence for this substitution effect.
  
  GPT5.5-extra: I agree with the direction, but would phrase this as "limited and hard-to-interpret evidence" rather than simply "little evidence." There is evidence from current plant-based meat and meat-reduction interventions, but it is hard to generalize to future products or to settings where price/taste/convenience are genuinely competitive.
  
  This is also a measurement problem: substitution across animal products, plant-based products, eating occasions, and longer-run market responses is difficult to estimate with standard demand data.
  
  Related Unjournal/PQ context: https://uj-pba-workshop.netlify.app/
  
  gpt evidence-audit
7. unjournal 09 Jun 2026
  
  in Public
  
  I do not feel comfortable supporting a blanket push for cage-free reforms, given the currently available data on what happens when farms/countries switch.
  
  GPT5.5-extra: This conclusion depends on what "blanket push" means. I would distinguish: (1) poorly managed or abrupt transitions, (2) regulatory/corporate cage-free commitments with weak implementation support, and (3) mature higher-welfare cage-free or free-range standards. The evidence against (1) or (2) is not automatically evidence against (3).
  
  This seems like a good place for a decision-relevant decomposition: mortality, painful morbidity, behavioral deprivation, transition costs, and implementation quality should be modeled separately rather than compressed into "cage-free yes/no".
  
  gpt evidence-audit
8. unjournal 09 Jun 2026
  
  in Public
  
  A counterpoint I sometimes hear is that the difference in mortality between caged and cage-free systems disappears as farmers gain experience with cage-free systems.
  
  GPT5.5-extra: The 2021 meta-analysis does more than gesture at this counterpoint: it analyzes 6,040 flocks and 176M hens and finds mortality in cage-free aviaries decreases with experience; in recent years it finds no significant mortality difference among indoor systems. That does not prove all cage-free transitions improve welfare, but it weakens the inference from early/high-mortality transitions to mature cage-free systems.
  
  Source: Schuck-Paim et al. 2021, https://www.nature.com/articles/s41598-021-81868-3
  
  gpt evidence-audit
9. unjournal 09 Jun 2026
  
  in Public
  
  These aren't sudden, painless deaths. Increased vent pecking itself is also a sign of increased environmental stress. Overall, this suggests that hens in the cage-free systems generally experienced more distress.
  
  GPT5.5-extra: This is a reasonable concern if the CSES aviary result is representative, but I would avoid leaning too hard on this single-farm comparison as evidence about cage-free systems overall. The linked CSES system was one implementation at one commercial site, and the broader Schuck-Paim et al. 2021 dataset suggests mortality in aviaries falls substantially as producers gain experience.
  
  The stronger statement is: transition quality and management can matter a lot, and mortality/vent pecking are important welfare proxies, but this particular comparison does not by itself settle overall cage-free welfare.
  
  Sources: https://www.nature.com/articles/s41598-021-81868-3 and https://welfarefootprint.org/laying-hens/
  
  gpt evidence-audit
10. unjournal 09 Jun 2026
  
  in Public
  
  When shrimp first hit the ice slurry, they perform sudden full-body contractions (tail flips), but this also happens if you first cut their head off (check the supplementary material for a video).
  
  GPT5.5-extra: I read this as evidence that the tail flip is a ventral-nerve-cord reflex, not as a proposal to decapitate shrimp as a slaughter method. Weineck et al. 2018 transected shrimp and found that isolated abdomens began tail-flipping in ice slurry; cutting the segmental roots prevented the movement. Their interpretation is that this is locally neural and involuntary, not driven by higher central brain centers.
  
  So the evidential role is narrow: tail flips alone should not be treated as strong evidence of either conscious distress or insensibility, because a similar movement can be generated after separation from the head. It does not show that ice slurry is humane; it weakens one simple behavioral proxy.
  
  Source: Weineck et al. 2018, section 3.7, https://www.mdpi.com/2076-2615/8/9/158
  
  gpt evidence-audit
11. unjournal 09 Jun 2026
  
  in Public
  
  Additionally, both studies implanted conductive electrodes in the test animals. It is plausible that this significantly affects how current flows through the shrimp's body. I also feel confused about what a signal from an electrode on a heart or a ganglion actually tells us. The plots of the recorded “power” are hard to interpret without a control signal to assess what the noise floor is.
  
  GPT5.5-extra: Clarification after checking: Weineck et al.'s "ECG" was not a mammalian-style electrical ECG. It was an impedance measure of movement or fluid between leads over the heart, so amplitude tracks contraction/flow rather than subjective experience directly. The 2026 Somerville et al. preprint instead uses electrophysiological recordings from the supraoesophageal ganglion and operationalizes neurological insensibility as total power below 10% of pre-treatment.
  
  That is a meaningful proxy for neural activity, not a direct welfare readout. The electrode/current-path concern remains reasonable for external validity, but I would phrase the issue as partial proxies with unclear thresholds/noise floors and weak industrial translation, not as "the signals tell us nothing".
  
  Sources: Weineck et al. 2018 https://www.mdpi.com/2076-2615/8/9/158; Somerville et al. 2026 https://papers.ssrn.com/sol3/papers.cfm?abstract_id=6315379
  
  gpt evidence-audit
12. unjournal 09 Jun 2026
  
  in Public
  
  In conclusion, evidence for electrical stunning is extremely limited and we shouldn't feel comfortable recommending anything with confidence.
  
  GPT5.5-extra: Steelman from the current sources: the evidence does not seem to imply "electrical stunning is bad" or "ice slurry is fine". It implies parameter sensitivity.
  
  The 2026 Somerville et al. preprint reports that lower-voltage/shorter electrical shock followed by cold shock sometimes left substantial neural activity or behavioral response, while higher-voltage/longer shock plus cold shock confirmed insensibility in the no-response group; below-0C ice slurry alone also reduced time to low neural activity. SWP's 2026 review interprets the wider decapod evidence as electrical stunning having greater potential than chilling to induce insensibility, while saying species-specific electrical parameters and operational standards remain uncertain.
  
  Sources: https://papers.ssrn.com/sol3/papers.cfm?abstract_id=6315379 and https://www.shrimpwelfareproject.org/humane-stunning-white-paper
  
  gpt evidence-audit
13. unjournal 09 Jun 2026
  
  in Public
  
  We have mixed evidence on whether transitioning egg producers to cage-free improves welfare overall.
  
  GPT5.5-extra: I would narrow this. The record supports "mortality evidence during cage-free transitions is mixed and confounded"; it is less clear that evidence on overall welfare is mixed in the same way.
  
  Schuck-Paim et al. 2021 cover 6,040 flocks and 176M hens and find cage-free aviary mortality falls as producers gain experience, with no recent mortality difference among indoor systems. WFI's welfare model estimates lower cumulative pain in cage-free aviaries than cages, while noting important research gaps. The comment thread also matters: Cynthia Schuck-Paim argues the CSES study is a weak basis for mortality/welfare conclusions because the aviary was poorly managed and not a fair mature-system comparison.
  
  Sources: https://www.nature.com/articles/s41598-021-81868-3 and https://welfarefootprint.org/laying-hens/
  
  gpt evidence-audit
14. unjournal 09 Jun 2026
  
  in Public
  
  We have evidence that the substitution effect of alternative proteins is weak, at best.
  
  GPT5.5-extra: I would soften/generalize this claim. The strongest public evidence I found supports: current plant-based meat and many meat-reduction interventions have weak or uncertain substitution effects.
  
  Peacock's RP report explicitly focuses on plant-based meats because cultivated meats are not widely available; it is not a direct test of future cultivated meat or precision-fermentation products. Green, Smith & Mathur 2025 meta-analyze RCTs of behavior-change interventions and find a very small pooled effect; they also say many promising interventions await rigorous evaluation. That is evidence of a gap and weak current intervention effects, but not yet evidence that all alternative proteins, once genuinely cheaper/tastier/more convenient, would have weak substitution.
  
  Sources: https://rethinkpriorities.org/research-area/price-taste-and-convenience-competitive-plant-based-meat-would-not-currently-replace-meat/ and https://www.sciencedirect.com/science/article/pii/S0195666325003861
  
  gpt evidence-audit

Tags

gpt

evidence-audit

Annotators

unjournal

URL

forum.effectivealtruism.org/posts/svjqgyFuFQ34qSgmw/animal-welfare-has-an-evidence-problem
May 2026
openai.com

GPT-5.5 Instant: smarter, clearer, and more personalized | OpenAI

2
1. fxp007 07 May 2026
  
  in Public
  
  GPT-5.5 Instant is now the default model in ChatGPT
  
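  [Insight] Becoming the "default model" matters more than any benchmark: the everyday AI experience of hundreds of millions of ordinary users will be upgraded wholesale without their noticing. This is OpenAI's strongest competitive moat: not technical leadership but control of the "default entry point". Even if competitors catch up technically, they cannot change the fact that users are already used to ChatGPT.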
  
  GPT-5.5 default-model network-effect insight
2. fxp007 07 May 2026
  
  in Public
  
  52.5% reduction in hallucinations
  
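  🤖 [A striking number] A 52.5% reduction in hallucination rate: the largest hallucination reduction OpenAI has ever claimed for a single model update. More importantly, it occurs in high-stakes domains such as medicine and law. Hallucination is the biggest obstacle to deploying AI in professional services; if the number holds up, an inflection point in enterprise AI trustworthiness is arriving.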
  
  GPT-5.5 52-percent hallucination enterprise-AI shocking

Tags

shocking

GPT-5.5

default-model

network-effect

hallucination

52-percent

insight

enterprise-AI

Annotators

fxp007

URL

openai.com/index/gpt-5-5-instant/
Apr 2026
nrehiew.github.io

https://nrehiew.github.io/blog/minimal_editing/

1
1. fxp007 23 Apr 2026
  
  in Public
  
  Among the latest frontier models, GPT-5.4 over-edits the most.
  
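  Most people regard GPT-5.4 as the most advanced model, but the author shows that it performs worst on the minimal-editing task, which challenges the common view of its capabilities.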
  
  counterintuitive model-performance gpt-5.4

Tags

counterintuitive

gpt-5.4

model-performance

Annotators

fxp007

URL

nrehiew.github.io/blog/minimal_editing/
openai.com

https://openai.com/index/introducing-gpt-rosalind/

1
1. fxp007 17 Apr 2026
  
  in Public
  
  That includes ongoing partnerships with national laboratories such as Los Alamos National Laboratory, where we are exploring AI-guided protein and catalyst design, including the ability of AI systems to modify biological structures while preserving or improving key functional properties. Over time, we expect these systems to become increasingly capable partners in discovery—helping scientists move faster from question to evidence, from evidence to insight, and from insight to new treatments for patients.
  
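  OpenAI's collaboration with Los Alamos National Laboratory on AI-guided protein and catalyst design marks a shift in AI research from interpreting literature and experimental data to a new stage of active molecular design. The shift is more than a tool upgrade; it signals OpenAI's intent to expand strategically into the R&D infrastructure layer. By having AI take part directly in designing molecular structures while preserving functional properties, OpenAI is building a complete research-acceleration loop, from question to evidence, from evidence to insight, and on to treatments, reshaping the basic R&D paradigm.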
  
  gpt-rosalind protein-design los-alamos strategy

Tags

gpt-rosalind

los-alamos

strategy

protein-design

Annotators

fxp007

URL

openai.com/index/introducing-gpt-rosalind/
www.understandingai.org

Why it's getting harder to measure AI performance - Understanding AI

1
1. fxp007 09 Apr 2026
  
  in Public
  
  GPT-3.5 — the model that powered the original ChatGPT — could complete tasks that took a human programmer about 30 seconds.
  
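  From GPT-3.5's 30 seconds to Claude Opus 4.6's 12 hours is a 1440-fold increase in two years. From GPT-2 to GPT-5, task difficulty grew 5400-fold. This pace has almost no precedent in the history of human technology: the Industrial Revolution took a century to raise labor efficiency by a few tens of times, whereas AI has achieved a several-thousand-fold gain in what might be called "cognitive efficiency" within five years. Unsettlingly, the curve currently shows no sign of slowing.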
  
  1440x-growth GPT-3.5 historical-scale exponential-progress
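
The 1440-fold figure in the note above is a unit conversion, which a short sketch can check. The 30-second and 12-hour values come from the note; the variable names are illustrative, and the doubling count is derived arithmetic rather than a number stated in the source.

```python
import math

# Task horizons quoted in the note above (values from the annotation).
gpt35_horizon_s = 30             # GPT-3.5: tasks of about 30 seconds
opus46_horizon_s = 12 * 60 * 60  # Claude Opus 4.6: about 12 hours, in seconds

growth = opus46_horizon_s / gpt35_horizon_s
doublings = math.log2(growth)    # derived here, not stated in the source

print(f"growth factor: {growth:.0f}x")
print(f"equivalent doublings: {doublings:.2f}")
```

Expressing the ratio as doublings makes it easier to compare against other growth rates, whatever time span is assumed.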

Tags

historical-scale

1440x-growth

exponential-progress

GPT-3.5

Annotators

fxp007

URL

understandingai.org/p/why-its-getting-harder-to-measure
metr.org

Task-Completion Time Horizons of Frontier AI Models

2
1. fxp007 09 Apr 2026
  
  in Public
  
  Some recent models that don't currently have time horizons: Gemini 3.1 Pro, GPT-5.2-Codex, Grok 4.1
  
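  METR publicly lists frontier models that have "not yet been evaluated", and that transparency is surprising in itself. More notable is what is on the list: both Gemini 3.1 Pro and GPT-5.2-Codex appear, which suggests METR's evaluation capacity cannot keep up with the pace of model releases. With AI capabilities iterating quickly, "evaluation lag" has become a systemic risk for AI safety: we are always half-blind to the capability limits of the newest, strongest models.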
  
  evaluation-lag AI-safety-risk transparency Gemini-GPT-Grok
2. fxp007 09 Apr 2026
  
  in Public
  
  on tasks that take a human expert 90 minutes to 3 hours, a GPT-5 agent (with time horizon of around 2 hours and 17 minutes) succeeds 100% of the time for around one-third of the tasks, fails 100% of the time for around one-third of the tasks, and sometimes succeeds and sometimes fails on the remaining third of tasks.
  
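  "One-third always succeed, one-third always fail, one-third random": this distribution reveals the true shape of current AI capability. It is not a smooth capability curve but a bimodal "can do / cannot do" distribution with a random band in between. When assigning a task to an AI, the result of a single attempt therefore has almost no reference value; you need multiple runs to tell which bucket the task falls into. For AI product designers, this distribution is the core constraint on reliability design.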
  
  bimodal-distribution reliability GPT-5 task-prediction
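
The point about needing several runs can be illustrated with a minimal sketch. The function name and bucket labels are hypothetical, chosen only for illustration; they are not METR terminology, and with a finite number of runs the label is an estimate rather than a certainty.

```python
from typing import Sequence


def classify_task(outcomes: Sequence[bool]) -> str:
    """Bucket a task by its outcomes over repeated, independent runs."""
    if not outcomes:
        raise ValueError("need at least one run")
    successes = sum(outcomes)
    if successes == len(outcomes):
        return "always-succeeds"
    if successes == 0:
        return "always-fails"
    return "mixed"


# One run cannot separate "always-succeeds" from "mixed"; more runs can.
print(classify_task([True]))
print(classify_task([True, False, True, True]))
```

A single success is labelled "always-succeeds" here only because no failure has been observed yet, which is exactly why one attempt says little about the task's true bucket.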

Tags

GPT-5

task-prediction

transparency

evaluation-lag

AI-safety-risk

reliability

Gemini-GPT-Grok

bimodal-distribution

Annotators

fxp007

URL

metr.org/time-horizons/
www.anthropic.com

A "diff" tool for AI: Finding behavioral differences in new models

1
1. fxp007 09 Apr 2026
  
  in Public
  
  A "Copyright Refusal Mechanism" feature exclusive to OpenAI's GPT-OSS-20B. It controls the model's tendency to refuse to provide copyrighted material, a behavior absent in the model it was compared against.
  
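  OpenAI's open-source model contains an exclusive "Copyright Refusal Mechanism" feature, which means copyright-compliance behavior is explicitly encoded in the weights rather than emerging naturally. More thought-provoking: the feature is absent from the comparable competing model, suggesting that developers make fundamentally different training decisions about copyright.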
  
  copyright GPT-OSS behavior-control OpenAI

Tags

copyright

OpenAI

behavior-control

GPT-OSS

Annotators

fxp007

URL

anthropic.com/research/diff-tool
Jan 2026
theaidigest.org

AI Village

1
1. tonz 06 Jan 2026
  
  in Public
  
  https://web.archive.org/web/20260106104924/https://theaidigest.org/village
  
  Four AI models grouped together as a 'village' and set tasks (like electing a leader). The logs read a bit like slapstick, continuously bumbling forward.
  
  via [[Stephen Downes p]]
  
  ai claude gpt gemini deepseek

Tags

gpt

gemini

claude

deepseek

ai

Annotators

tonz

URL

theaidigest.org/village
simonwillison.net

2025: The year in LLMs

1
1. tonz 02 Jan 2026
  
  in Public
  
  signature features of GPT-4o in May 2024 was meant to be its multimodal output—the “o” stood for “omni”
  
  o for omni, as in multimodal outputs (text, image, sound?)
  
  gpt openai multimodal

Tags

gpt

openai

multimodal

Annotators

tonz

URL

simonwillison.net/2025/Dec/31/the-year-in-llms/
Feb 2025
epoch.ai

How has DeepSeek improved the Transformer architecture?

1
1. mark.crowley 02 Feb 2025
  
  in Public
  
  Detailed explanation of what the DeepSeek model does differently to improve performance and training time relative to ChatGPT.
  
  large-language-models transformers deepseek chat-gpt

Tags

large-language-models

transformers

chat-gpt

deepseek

Annotators

mark.crowley

URL

epoch.ai/gradient-updates/how-has-deepseek-improved-the-transformer-architecture
Jan 2025
x.com

Deedy on X: "How to pick a LLM (Jan 2025): — Claude Sonnet is my daily driver. Fast, great writing and great code. — o1 / o1 pro for complex reasoning tasks (tough refactor) — Deepseek v3 for fast cheap API / 4-o replacement — Gemini for ultra long context, Flash and video understanding" / X

1
1. pyxelr 09 Jan 2025
  
  in Public
  
  How to pick a LLM (Jan 2025): — Claude Sonnet is my daily driver. Fast, great writing and great code. — o1 / o1 pro for complex reasoning tasks (tough refactor) — Deepseek v3 for fast cheap API / 4-o replacement — Gemini for ultra long context, Flash and video understanding
  
  LLM Claude OpenAI GPT Deepseek Gemini

Tags

LLM

Gemini

OpenAI

Deepseek

GPT

Claude

Annotators

pyxelr

URL

x.com/deedydas/status/1877037554157105356
Sep 2024
hist4805.netlify.app

Schedule | HIST4805 Artificial Intelligence in/and History

1
1. shawn.graham 06 Sep 2024
  
  in Public
  
  summary
  
  Speaking of summaries: studies show AI is worse than humans at summarising.
  
  Succinct reason why by David Chisnall:
  
  LLMs are good at transforms that have the same shape as ones that appear in their training data. They're fairly good, for example, at generating comments from code because code follows common structures and naming conventions that are mirrored in the comments (with totally different shapes of text).
  
  In contrast, summarisation is tightly coupled to meaning. Summarisation is not just about making text shorter, it's about discarding things that don't contribute to the overall point and combining related things. This is a problem that requires understanding the material, because it's all about making value judgements.
  
  gpt ai summaries

Tags

gpt

summaries

ai

Annotators

shawn.graham

URL

hist4805.netlify.app/schedule/
Jun 2024
gamma.app

Gamma

1
1. polarislee 11 Jun 2024
  
  in Public
  
  ppt GPT slides chart ai

Tags

chart

ppt

GPT

ai

slides

Annotators

polarislee

URL

gamma.app/
www.codium.ai

Meaningful Code Tests for Busy Devs | CodiumAI

1
1. polarislee 11 Jun 2024
  
  in Public
  
  unit-test ai code-review code-improve bug-detect code-analyzing vscode jetbrain GPT

Tags

code-review

code-analyzing

GPT

ai

unit-test

bug-detect

code-improve

vscode

jetbrain

Annotators

polarislee

URL

codium.ai/
May 2024
openai.com

Our approach to data and AI

1
1. TylerRick 09 May 2024
  
  in Public
  
  When we train language models, we take trillions of words, and ask a computer to come up with an equation that best describes the relationship among the words and the underlying process that produced them.
  
  GPT AI

Tags

GPT

AI

Annotators

TylerRick

URL

openai.com/index/approach-to-data-and-ai/
Apr 2024
theaidigest.org

How fast is AI improving? - AI Digest

2
1. pyxelr 01 Apr 2024
  
  in Public
  
  The same LM can be a much more or less capable agent depending on the enhancements added. The researchers created and tested four different agents built on top of GPT-4 and Anthropic’s Claude:
  
  While today’s LMs agents don't pose a serious risk, we should be on the lookout for improved autonomous capabilities as LMs get more capable and reliable.
  
  GPT OpenAI LLM AI
2. pyxelr 01 Apr 2024
  
  in Public
  
  The latest GPT-4 model from OpenAI, which is trained on human preferences using a technique called RLHF.
  Estimated final training run compute cost: ~$50m
  Model version: gpt-4-0613
  
  ~$50m = estimated training cost of GPT-4
  
  GPT OpenAI LLM AI

Tags

OpenAI

GPT

AI

LLM

Annotators

pyxelr

URL

theaidigest.org/progress-and-dangers
Mar 2024
Local file

Tove Ditlevsen - Barndommens gade

1
1. saraaskholm 22 Mar 2024
  
  in Public
  
  optimal chapter division of the novel "Barndommens gade" for classroom use at the intermediate level (mellemtrinnet)
  
  "optimal"
  
  efterspørger vurdering fra gpt
Tags

efterspørger vurdering fra gpt

Annotators

saraaskholm
Jan 2024
explainextended.com

Happy New Year: GPT in 500 lines of SQL - EXPLAIN EXTENDED

1
1. tonz 09 Jan 2024
  
  in Public
  
  https://web.archive.org/web/20240106230221/https://explainextended.com/2023/12/31/happy-new-year-15/
  
  This seems a very good explainer for how LLMs and GPTs work. And all in 500 lines of sql :D
  
  gpt llm algogens sql

Tags

llm

algogens

gpt

sql

Annotators

tonz

URL

explainextended.com/2023/12/31/happy-new-year-15/
cdn.openai.com

gpt-4-system-card.pdf

1
1. mark.crowley 06 Jan 2024
  
  in Public
  
  GPT-4 System Card, OpenAI, March 23, 2023
  
  chat-gpt large-language-models openai system-cards transformers toread reading_group_crowley

Tags

reading_group_crowley

openai

system-cards

large-language-models

toread

transformers

chat-gpt

Annotators

mark.crowley

URL

cdn.openai.com/papers/gpt-4-system-card.pdf
Nov 2023
www.semanticscholar.org

What Makes for Good Visual Instructions? Synthesizing Complex Visual Reasoning Instructions for Visual Instruction Tuning

1
1. linxid 16 Nov 2023
  
  in Public
  
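  Method:
  
  Background:
  
  Existing work has not yet explored which kinds of instruction datasets are more effective, or which factors make for good instruction data. With these gaps in mind, the authors investigate what makes a good visual instruction. To that end, they first evaluate existing visual instruction sets, aiming to identify the key factors.
  
  The authors evaluate along two dimensions: task type and instruction characteristics. They select six typical instruction datasets and evaluate them with two typical models, BLIP-2 and MiniGPT-4. From the experimental results they find:
    1. For task type, visual reasoning tasks are important for improving the model's image captioning and question answering performance.
    2. For instruction characteristics, raising instruction complexity helps performance more than increasing task diversity or integrating fine-grained annotation information.
  
  Based on these findings, the authors set out to build a complex visual reasoning instruction set to improve models.
  
  The most direct approach is to use ChatGPT and GPT-4 to refine the instruction set from the image annotations. Because the instructions are cross-modal, however, the LLM output may be overly simple or may even mention objects that are not in the image. To address this, the authors propose a systematic multi-stage method for automatically generating visual instruction datasets.
  
  Given an image and its available annotations (captions or objects), the authors use a generate, then complicate, then reformulate pipeline to produce instructions. Specifically, they first use a specially designed prompt to generate an initial instruction. They then iterate a complicate-then-verify loop to raise the instruction's complexity step by step while maintaining quality. Finally, they reformulate the instruction into multiple formats for better adaptability to downstream tasks.
  
  Preliminaries:
  
  Visual instruction collection:
  
  Task type: earlier instruction-tuning datasets are all built from annotated images and mainly cover three task types:
    1. Image captioning: generating a text description of the image.
    2. VQA: the model must answer a question about the image.
    3. Visual reasoning: the model must reason over the image content.
  
  To study the effect of task type, the authors take one of the most commonly used instruction-tuning datasets, LLaVA-Instruct, and split it into three subsets: LLaVA-Caption, LLaVA-VQA, and LLaVA-Reasoning.
  
  Instruction characteristics include:
    * Task diversity: prior work has found that increasing task diversity helps zero-shot ability, which can be obtained by combining different tasks.
    * Instruction complexity: raising the complexity of instructions is a widely used strategy for LLMs. The authors likewise use complex multimodal tasks, such as multi-hop reasoning, to improve the instruction-following ability of MLLMs.
    * Fine-grained spatial awareness: MLLMs need to perceive fine-grained spatial information about specific objects in an image. To that end, spatial location annotations can be included in the textual instructions.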
  
  GPT 多模态数据生成指令微调 visual Instruction
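
The generate, complicate, verify, reformulate flow described in the note above can be sketched as plain control flow. This is only an illustration under stated assumptions: the function name, the four callables, the `max_rounds` cap, and the choice to stop at the first failed verification are all hypothetical and are not taken from the paper, where each step would be an LLM call.

```python
from typing import Callable, List


def synthesize_instruction(
    annotations: str,
    generate: Callable[[str], str],
    complicate: Callable[[str], str],
    verify: Callable[[str, str], bool],
    reformulate: Callable[[str], List[str]],
    max_rounds: int = 3,  # assumed cap, not from the paper
) -> List[str]:
    """Generate, iteratively complicate with verification, then reformulate."""
    instruction = generate(annotations)
    for _ in range(max_rounds):
        candidate = complicate(instruction)
        # Keep the harder instruction only if it still passes verification.
        if not verify(candidate, annotations):
            break
        instruction = candidate
    return reformulate(instruction)


# Toy stand-ins for the LLM calls, used only to exercise the control flow.
result = synthesize_instruction(
    "a dog on a sofa",
    generate=lambda ann: f"Describe: {ann}",
    complicate=lambda ins: ins + " Why?",
    verify=lambda ins, ann: ins.count("Why?") <= 2,
    reformulate=lambda ins: [ins, ins.upper()],
)
print(result)
```

Passing the steps in as callables keeps the loop independent of any particular model or prompt, so the verification rule can be swapped without touching the control flow.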

Tags

指令微调

visual Instruction

GPT

多模态

数据生成

Annotators

linxid

URL

semanticscholar.org/reader/0b3e7b5cbef627b1ceedceadc5f58787f432163b
advancedcommunities.com

Salesforce Launches Einstein GPT: A Revolutionary AI Tool for Business Communication

1
1. y.cherbadzhy 06 Nov 2023
  
  in Public
  
  Salesforce promotes Einstein GPT as the world’s first generative AI tool for CRM. Built on the GPT-3 (Generative Pre-trained Transformer) architecture and integrated in all of Salesforce Clouds as well as Tableau, MuleSoft, and Slack, Einstein GPT is capable of generating natural language responses to customer queries, creating personalized content, and even drafting entire email messages on behalf of sales representatives.
  
  Curious to see how AI automation solutions may complement the Experience Cloud products.
  
  Einstein GPT Salesforce AI Einstein GPT

Tags

Salesforce AI Einstein GPT

Einstein GPT

Annotators

y.cherbadzhy

URL

advancedcommunities.com/blog/salesforce-announces-einstein-gpt-the-worlds-first-generative-ai-for-crm/
Oct 2023
cdn.openai.com

Language Models are Unsupervised Multitask Learners

1
1. mark.crowley 25 Oct 2023
  
  in Public
  
  GPT-2 Introduction paper
  
  A. Radford, J. Wu, R. Child, D. Luan, D. Amodei, and I. Sutskever (2019). Language Models are Unsupervised Multitask Learners.
  
  large-language-models nlp machine-learning transformers gpt reading_group_crowley rdgrp-s23

Tags

gpt

machine-learning

reading_group_crowley

nlp

large-language-models

transformers

rdgrp-s23

Annotators

mark.crowley

URL

cdn.openai.com/better-language-models/language_models_are_unsupervised_multitask_learners.pdf
papers.nips.cc

NeurIPS-2020-language-models-are-few-shot-learners-Paper.pdf

1
1. mark.crowley 25 Oct 2023
  
  in Public
  
  GPT-3 introduction paper
  
  large-language-models nlp machine-learning transformers gpt reading_group_crowley rdgrp-s23

Tags

gpt

machine-learning

reading_group_crowley

nlp

large-language-models

transformers

rdgrp-s23

Annotators

mark.crowley

URL

papers.nips.cc/paper_files/paper/2020/file/1457c0d6bfcb4967418bfb8ac142f64a-Paper.pdf
Aug 2023
www.linkedin.com

Posts | Feed | LinkedIn

1
1. re.aman 18 Aug 2023
  
  in Public
  
  This is a game-changer! ChatGPT Plugins are now available to everyone🔥🦾
  
  _0_Earth _1_AI _2_chat-gpt #tips-&-tricks

Tags

_0_Earth

_2_chat-gpt

#tips-&-tricks

_1_AI

Annotators

re.aman

URL

linkedin.com/feed/update/urn:li:activity:7065243812834476032/
www.theverge.com

OpenAI co-founder on company’s past approach to openly sharing research: “We were wrong”

1
1. re.aman 18 Aug 2023
  
  in Public
  
  OpenAI co-founder on company’s past approach to openly sharing research: ‘We were wrong’
  
  OpenAI on approach to open sharing of research
  
  _0_Earth _1_AI _2_chat-gpt

Tags

_0_Earth

_2_chat-gpt

_1_AI

Annotators

re.aman

URL

theverge.com/2023/3/15/23640180/openai-gpt-4-launch-closed-research-ilya-sutskever-interview
bilge.world

How to Fuck Text

1
1. DavidBlue 06 Aug 2023
  
  in Public
  
  As summarized by ChatGPT:
  
  This text explores the concept of "Text Fucking," a form of digital text manipulation primarily focused on Apple platforms. The author discusses their interest in accessibility and their personal authority on the subject. They define "Text Fucking" as the manipulation and destruction of digital text, and they emphasize the potential positive outcomes of this practice. The article covers various applications, including text editing apps, automation tools, Siri Shortcuts, and a text formatting app called "Text Case." The author shares their experience with automation, including automating tasks through tools like IFTTT, and they showcase various Siri Shortcuts they've created for text manipulation purposes. The article also highlights the use of Drafts, a versatile app that supports the author's experimentation with Text Fucking.
  
  GPT

Tags

GPT

Annotators

DavidBlue

URL

bilge.world/text-fuck
Jun 2023
papers.nips.cc

NeurIPS-2020-language-models-are-few-shot-learners-Paper.pdf

1
1. mark.crowley 28 Jun 2023
  
  in Public
  
  We use the same model and architecture as GPT-2
  
  What do they mean by "model" here? If they have retrained on more data, with a slightly different architecture, then the model weights after training must be different.
  
  machine-learning transformers gpt ml-practice
Visit annotations in context

Tags

machine-learning

ml-practice

transformers

gpt

Annotators

mark.crowley

URL

papers.nips.cc/paper_files/paper/2020/file/1457c0d6bfcb4967418bfb8ac142f64a-Paper.pdf
www.nngroup.com www.nngroup.com

ChatGPT Lifts Business Professionals’ Productivity and Improves Work Quality

3
1. brian_hoch 20 Jun 2023
  
  in Public
  
  Examples include press releases, short reports, and analysis plans — documents that were reported as realistic for the type of writing these professionals engaged in as part of their work.
  
  Keep in mind the genres tested.
  
  Looking from a perspective of "how might we use such tools in UX" we're better served by looking at documents that UX generates through the lens of identifying parallels to the study's findings for business documents.
  
  To use AI to generate drafts, we'll want to look at AI tools built into the design tools UXers use to create drafts. Those tools are still under development.
  
  ai chat-gpt design user experience
2. brian_hoch 12 Jun 2023
  
  in Public
  
  the estimates of how users divided their times between different stages of document generation were based on self-reported numbers
  
  The numbers for how users divided their time may not be reliable as they're self-reported.
  
  Still leaves me curious about the accuracy of reported brainstorming time.
  
  ai chat-gpt
3. brian_hoch 12 Jun 2023
  
  in Public
  
  the productivity and quality improvements are likely due to a switch in the business professionals’ time allocation: less time spent on cranking out initial draft text and more time spent polishing the final result.
  
  This points to AI providing the best time savings in draft generation, which fits with the idea of having the AI generate the drafts based on the professional's queries.
  
  For UX designers, this points to AI in a design tool being most useful when it generates drafts (sketches) that the designer then revises. Where UX deliverables don't compare easily to written deliverables is in the contextual factors that influence the design, like style guides or design systems. Design-tool AI assistants don't yet factor those in, though it seems likely they will if provided with style guides and design systems in a format they can read.
  
  Given a draft of sufficient quality that it doesn't require longer to revise than a draft the designer would create on their own, getting additional time to refine sounds great.
  
  I'm not sure what to make of the reduced time to brainstorm when using AI. Without additional information, it's hard not to assume that the AI tool may be influencing the direction of brainstorming as professionals think through the queries they'll use to get the AI to generate the most useful draft possible.
  
  ai chat-gpt efficiency brainstorming drafting
Visit annotations in context

Tags

drafting

user experience

design

ai

efficiency

brainstorming

chat-gpt

Annotators

brian_hoch

URL

nngroup.com/articles/chatgpt-productivity/
github.com github.com

AntonOsika/gpt-engineer: Specify what you want it to build, the AI asks for clarification, and then builds it.

1
1. TylerRick 14 Jun 2023
  
  in Public
  
  AI GPT
Visit annotations in context

Tags

GPT

AI

Annotators

TylerRick

URL

github.com/AntonOsika/gpt-engineer
Apr 2023
colab.research.google.com colab.research.google.com

Google Colaboratory

1
1. kael 10 Apr 2023
  
  in Public
  
  gpt python neural net wikipedia:en=Neural_network wikipedia:en=Artificial_neural_network
Visit annotations in context

Tags

gpt

wikipedia:en=Artificial_neural_network

wikipedia:en=Neural_network

python

neural net

Annotators

kael

URL

colab.research.google.com/drive/1SiF0KZJp75rUeetKOWqpsA8clmHP6jMg
www.semanticscholar.org www.semanticscholar.org

[PDF] What Can Transformers Learn In-Context? A Case Study of Simple Function Classes | Semantic Scholar

2
1. mshook 01 Apr 2023
  
  in Public
  
  We use a decoder-only Transformer architecture [Vaswani et al., 2017] from the GPT-2 family
  
  gpt2 gpt transformer
2. mshook 01 Apr 2023
  
  in Public
  
  a random function f
  
  A single random function, not many or several.
  
  icl transformer gpt2 gpt function
Visit annotations in context

Tags

gpt

function

transformer

icl

gpt2

Annotators

mshook

URL

semanticscholar.org/reader/de32da8f5c6a50a6c311e9357ba16aa7d05a1bc9
Mar 2023
aisnakeoil.substack.com aisnakeoil.substack.com

GPT-4 and professional benchmarks: the wrong answer to the wrong question

3
1. ravenscroftj 21 Mar 2023
  
  in Public
  
  Still, we can look for telltale signs. Another symptom of memorization is that GPT is highly sensitive to the phrasing of the question. Melanie Mitchell gives an example of an MBA test question where changing some details in a way that wouldn’t fool a person is enough to fool ChatGPT (running GPT-3.5). A more elaborate experiment along these lines would be valuable.
  
  ChatGPT appears to have memorised MBA test questions: when these are rephrased or certain details are changed, the system fails to answer.
  
  openai gpt ModelEvaluation
2. ravenscroftj 21 Mar 2023
  
  in Public
  
  In fact, we can definitively show that it has memorized problems in its training set: when prompted with the title of a Codeforces problem, GPT-4 includes a link to the exact contest where the problem appears (and the round number is almost correct: it is off by one). Note that GPT-4 cannot access the Internet, so memorization is the only explanation.
  
  GPT-4 knows the link to the coding contests it was evaluated against but doesn't have "internet access", so it appears to have memorised this as well.
  
  openai gpt ModelEvaluation
3. ravenscroftj 21 Mar 2023
  
  in Public
  
  To benchmark GPT-4’s coding ability, OpenAI evaluated it on problems from Codeforces, a website that hosts coding competitions. Surprisingly, Horace He pointed out that GPT-4 solved 10/10 pre-2021 problems and 0/10 recent problems in the easy category. The training data cutoff for GPT-4 is September 2021. This strongly suggests that the model is able to memorize solutions from its training set — or at least partly memorize them, enough that it can fill in what it can’t recall.
  
  GPT-4 only solved problems available before September 2021 and failed on newer ones, strongly suggesting that it has simply memorised the answers as part of its training.
  
  llm openai gpt ModelEvaluation
Visit annotations in context

Tags

gpt

llm

openai

ModelEvaluation

Annotators

ravenscroftj

URL

aisnakeoil.substack.com/p/gpt-4-and-professional-benchmarks
www.together.xyz www.together.xyz

Announcing OpenChatKit — TOGETHER

1
1. polarislee 13 Mar 2023
  
  in Public
  
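  OpenChatKit provides a powerful, open-source base for creating both specialized and general-purpose chatbots for a variety of applications. We collaborated with LAION and Ontocord to create the training dataset. Much more than a model release, this is the beginning of an open-source project. We are releasing tools and processes for ongoing improvement through community contributions. Together believes open-source foundation models can be more inclusive, transparent, robust, and capable. We are releasing OpenChatKit 0.15 under an Apache-2.0 license, with full access to the source code, model weights, and training datasets. This is a community-driven project, and we are excited to see how it develops and grows! A useful chatbot needs to follow instructions in natural language, maintain context in a dialog, and moderate its responses. OpenChatKit provides a base bot and the building blocks to derive purpose-built chatbots from this base. The kit has 4 key components: an instruction-tuned large language model, fine-tuned for chat from EleutherAI's GPT-NeoX-20B with over 43 million instructions on 100% carbon-negative compute; customization recipes for fine-tuning the model to perform tasks accurately; an extensible retrieval system that lets you augment bot responses at inference time with information from a document repository, API, or other live-updating information source; a moderation model, fine-tuned from GPT-JT-6B, designed to filter which questions the bot responds to. OpenChatKit includes tools that let users provide feedback and let community members add new datasets, contributing to a growing corpus of open training data that will improve LLMs over time.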
  
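  OpenChatKit provides a powerful, open-source base for creating both specialized and general-purpose chatbots for a variety of applications. We collaborated with LAION and Ontocord to create the training dataset. Much more than a model release, this is the beginning of an open-source project. We are releasing tools and processes for ongoing improvement through community contributions.
  
  Together believes open-source foundation models can be more inclusive, transparent, robust, and capable. We are releasing OpenChatKit 0.15 under an Apache-2.0 license, with full access to the source code, model weights, and training datasets. This is a community-driven project, and we are excited to see how it develops and grows!
  
  A useful chatbot needs to follow instructions in natural language, maintain context in a dialog, and moderate its responses. OpenChatKit provides a base bot and the building blocks to derive purpose-built chatbots from this base.
  
  The kit has 4 key components:
  
  An instruction-tuned large language model, fine-tuned for chat from EleutherAI's GPT-NeoX-20B with over 43 million instructions on 100% carbon-negative compute;
  
  Customization recipes for fine-tuning the model to perform tasks accurately;
  
  An extensible retrieval system that lets you augment bot responses at inference time with information from a document repository, API, or other live-updating information source;
  
  A moderation model, fine-tuned from GPT-JT-6B, designed to filter which questions the bot responds to.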
  
  AI chatGPT Open source OpenChatKit GPT-NeoX
Visit annotations in context

Tags

AI

chatGPT

GPT-NeoX

Open source

OpenChatKit

Annotators

polarislee

URL

together.xyz/blog/openchatkit
Feb 2023
clementneo.com clementneo.com

We Found An Neuron in GPT-2

1
1. mshook 15 Feb 2023
  
  in Public
  
  The code to reproduce our results can be found here.
  
  https://github.com/UFO-101/an-neuron
  
  https://colab.research.google.com/github/UFO-101/an-neuron/blob/main/an_neuron_investigation.ipynb
  
  code transformer gpt interpretability colab ipynb cool
Visit annotations in context

Tags

gpt

code

interpretability

transformer

ipynb

colab

cool

Annotators

mshook

URL

clementneo.com/posts/2023/02/11/we-found-an-neuron
github.com github.com

disa-lab/CodeDoc_GPT-3_ASE22

1
1. kael 14 Feb 2023
  
  in Public
  
  codex gpt code doc doi:10.48550/arXiv.2209.02235
Visit annotations in context

Tags

doi:10.48550/arXiv.2209.02235

gpt

code

codex

doc

Annotators

kael

URL

github.com/disa-lab/CodeDoc_GPT-3_ASE22
arxiv.org arxiv.org

Automatic Code Documentation Generation Using GPT-3

1
1. kael 14 Feb 2023
  
  in Public
  
  codex gpt code doc doi:10.48550/arXiv.2209.02235
Visit annotations in context

Tags

doi:10.48550/arXiv.2209.02235

gpt

code

codex

doc

Annotators

kael

URL

arxiv.org/abs/2209.02235
platform.openai.com platform.openai.com

OpenAI API

1
1. kael 14 Feb 2023
  
  in Public
  
  codex openai api gpt programming code
Visit annotations in context

Tags

gpt

code

codex

programming

openai

api

Annotators

kael

URL

platform.openai.com/docs/guides/code/introduction
openai.com openai.com

OpenAI Codex

1
1. kael 14 Feb 2023
  
  in Public
  
  codex openai gpt programming code wikipedia:en=OpenAI_Codex
Visit annotations in context

Tags

gpt

wikipedia:en=OpenAI_Codex

code

codex

programming

openai

Annotators

kael

URL

openai.com/blog/openai-codex/
Jan 2023
arxiv.org arxiv.org

2301.11305.pdf

15
1. ravenscroftj 29 Jan 2023
  
  in Public
  
  Figure 3. The average drop in log probability (perturbation discrepancy) after rephrasing a passage is consistently higher for model-generated passages than for human-written passages. Each plot shows the distribution of the perturbation discrepancy d(x, pθ, q) for human-written news articles and machine-generated articles of equal word length from models GPT-2 (1.5B), GPT-Neo-2.7B (Black et al., 2021), GPT-J (6B; Wang & Komatsuzaki (2021)) and GPT-NeoX (20B; Black et al. (2022)). Human-written articles are a sample of 500 XSum articles; machine-generated text is generated by prompting each model with the first 30 tokens of each XSum article, sampling from the raw conditional distribution. Discrepancies are estimated with 100 T5-3B samples.
  
  Quite striking here is that more powerful/larger models are more capable of generating unusual or "human-like" responses, judging by the overlap in log likelihoods.
  
  chatgpt detecting gpt
2. ravenscroftj 29 Jan 2023
  
  in Public
  
  if we apply small perturbations to a passage x ∼ pθ, producing x̃, the quantity log pθ(x) − log pθ(x̃) should be relatively large on average for machine-generated samples compared to human-written text.
  
  By applying small changes to text sample x, we should be able to find the log probs of x and the perturbed example and there should be a fairly big delta for machine generated examples.
  
  chatgpt detecting gpt
3. ravenscroftj 29 Jan 2023
  
  in Public
  
  As in prior work, we study a ‘white box’ setting (Gehrmann et al., 2019) in which the detector may evaluate the log probability of a sample log pθ(x). The white box setting does not assume access to the model architecture or parameters. While most public APIs for LLMs (such as GPT-3) enable scoring text, some exceptions exist
  
  The authors assume white-box access to the log probability of a sample, $\log p_\theta(x)$, but do not require access to the model's actual architecture or weights.
  
  chatgpt detecting gpt
4. ravenscroftj 29 Jan 2023
  
  in Public
  
  Empirically, we find predictive entropy to be positively correlated with passage fake-ness more often than not; therefore, this baseline uses high average entropy in the model’s predictive distribution as a signal that a passage is machine-generated.
  
  This makes sense and aligns with GLTR: humans add more entropy to sentences by making unusual choices in vocabulary that a model would not.
  
  chatgpt detecting gpt
5. ravenscroftj 29 Jan 2023
  
  in Public
  
  We find that supervised detectors can provide similar detection performance to DetectGPT on in-distribution data like English news, but perform significantly worse than zero-shot methods in the case of English scientific writing and fail altogether for German writing.
  
  Supervised detection methods fail on out-of-domain examples, whereas DetectGPT seems to be robust to changes in domain.
  
  chatgpt detecting gpt
6. ravenscroftj 29 Jan 2023
  
  in Public
  
  extending DetectGPT to use ensembles of models for scoring, rather than a single model, may improve detection in the black box setting
  
  DetectGPT could be extended to use ensembles of models, allowing it to work in black-box settings where the log probs are unknown.
  
  chatgpt detecting gpt
7. ravenscroftj 29 Jan 2023
  
  in Public
  
  While in this work, we use off-the-shelf mask-filling models such as T5 and mT5 (for non-English languages), some domains may see reduced performance if existing mask-filling models do not well represent the space of meaningful rephrases, reducing the quality of the curvature estimate.
  
  The approach requires access to language models that can meaningfully and accurately rephrase (perturb) the outputs from the model under evaluation. If the mask-filling model does not represent the domain well, the approach may not work well.
  
  chatgpt detecting gpt
8. ravenscroftj 29 Jan 2023
  
  in Public
  
  For models behind APIs that do provide probabilities (such as GPT-3), evaluating probabilities nonetheless costs money.
  
  This does cost money to do for paid APIs and requires that log probs are made available.
  
  chatgpt detecting gpt
9. ravenscroftj 29 Jan 2023
  
  in Public
  
  We simulate human revision by replacing 5 word spans of the text with samples from T5-3B until r% of the text has been replaced, and report performance as r varies.
  
  I question the trustworthiness of this simulation - human edits are probably going to be more sporadic and random.
  
  chatgpt detecting gpt
10. ravenscroftj 29 Jan 2023
  
  in Public
  
  Figure 5. We simulate human edits to machine-generated text by replacing varying fractions of model samples with T5-3B generated text (masking out random five word spans until r% of text is masked to simulate human edits to machine-generated text). The four top-performing methods all generally degrade in performance with heavier revision, but DetectGPT is consistently most accurate. Experiment is conducted on the XSum dataset
  
  DetectGPT shows 95% AUROC for texts that have been modified by about 10% and this drops off to about 85% when text is changed up to 24%.
  
  chatgpt detecting gpt
11. ravenscroftj 29 Jan 2023
  
  in Public
  
  DetectGPT’s performance in particular is mostly unaffected by the change in language from English to German
  
  Performance of this method is robust against changes between languages (e.g. English to German)
  
  chatgpt detecting gpt
12. ravenscroftj 29 Jan 2023
  
  in Public
  
  Because the GPT-3 API does not provide access to the complete conditional distribution for each token, we cannot compare to the rank, log rank, and entropy-based prior methods
  
  The GPT-3 API does not expose the conditional probabilities for each token, so the authors can't compare to some of the prior methods. That seems to suggest that this method can be used with limited knowledge about the probabilities.
  
  chatgpt detecting gpt
13. ravenscroftj 29 Jan 2023
  
  in Public
  
  improving detection of fake news articles generated by 20B parameter GPT-NeoX
  
  The authors test their approach on GPT-NeoX. The question would be whether we can get hold of the log probs from ChatGPT to do the same
  
  chatgpt detecting gpt
14. ravenscroftj 29 Jan 2023
  
  in Public
  
  This approach, which we call DetectGPT, does not require training a separate classifier, collecting a dataset of real or generated passages, or explicitly watermarking generated text. It uses only log probabilities computed by the model of interest and random perturbations of the passage from another generic pre-trained language model (e.g., T5)
  
  The novelty of this approach is that it is cheap to set up as long as you have the log probabilities generated by the model of interest.
  
  chatgpt detecting gpt
15. ravenscroftj 29 Jan 2023
  
  in Public
  
  See ericmitchell.ai/detectgpt for code, data, and other project information.
  
  Code and data available at https://ericmitchell.ai/detectgpt
  
  chatgpt detecting gpt
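The perturbation discrepancy that these annotations keep returning to (the log probability of a passage minus the average log probability of its perturbed rewrites) can be sketched in a few lines. This is only an illustration under stated assumptions: `toy_log_prob` and `toy_perturb` are hypothetical stand-ins for the scoring model and the T5 mask-filling model, and none of the names come from the paper's released code.

```python
import random
import statistics


def perturbation_discrepancy(text, log_prob, perturb, n_perturbations=100, seed=0):
    """Return log p(x) minus the mean of log p(x~) over sampled perturbations x~."""
    rng = random.Random(seed)
    original = log_prob(text)
    perturbed = [log_prob(perturb(text, rng)) for _ in range(n_perturbations)]
    return original - statistics.mean(perturbed)


# Hypothetical scorer: words the "model" knows are cheap, anything else is costly.
COMMON = {"the", "cat", "sat", "on", "mat"}


def toy_log_prob(text):
    return sum(-1.0 if word in COMMON else -5.0 for word in text.split())


# Hypothetical perturbation: swap one random word for an unusual one.
def toy_perturb(text, rng):
    words = text.split()
    words[rng.randrange(len(words))] = "zebra"
    return " ".join(words)


d = perturbation_discrepancy(
    "the cat sat on the mat", toy_log_prob, toy_perturb, n_perturbations=20
)
print(d)
```

A large positive value means the passage sits near a local peak of the scorer's log probability, which is the signal the highlighted passages describe for machine-generated text. Any decision threshold on `d` would be a further assumption not shown here.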
Visit annotations in context

Tags

detecting gpt

chatgpt

Annotators

ravenscroftj

URL

arxiv.org/pdf/2301.11305.pdf
hedgehogreview.com hedgehogreview.com

Autocomplete

2
1. wiobyrne 04 Jan 2023
  
  in Public
  
  Educators are now administering the Turing test in reverse: What are questions that only humans can answer well? What kinds of thinking does writing make possible for us?
  
  writing gpt-3 turing-test
2. wiobyrne 04 Jan 2023
  
  in Public
  
  GPT-3 threatens to “[undermine] the kind of writing intensive course that had served as the backbone of [his] teaching for two decades.” “I was less worried about whether GPT-3 is genuinely intelligent,” Symons writes, “and more worried about whether the development of these tools would make us less intelligent.”
  
  gpt-3 writing
Visit annotations in context

Tags

gpt-3

turing-test

writing

Annotators

wiobyrne

URL

hedgehogreview.com/web-features/thr/posts/autocomplete
Dec 2022
www.zylstra.org www.zylstra.org

Added #chatGPT to my #Obsidian with t… – Interdependent Thoughts

1
1. chrisaldrich 28 Dec 2022
  
  in Public
  
  https://www.zylstra.org/blog/2022/12/22660/
  
  read GPT-3 chatGPT Obsidian experiments
Visit annotations in context

Tags

read

GPT-3

chatGPT

Obsidian

experiments

Annotators

chrisaldrich

URL

zylstra.org/blog/2022/12/22660/
rewriting.csail.mit.edu rewriting.csail.mit.edu

Rewriting a Deep Generative Model

1
1. mshook 24 Dec 2022
  
  in Public
  
  Our method is based on the hypothesis that the weights of a generator act as Optimal Linear Associative Memory (OLAM). OLAM is a classic single-layer neural data structure for memorizing associations that was described by Teuvo Kohonen and James A Anderson (independently) in the 1970s. In our case, we hypothesize that within a large modern multilayer convolutional network, each individual layer plays the role of an OLAM that stores a set of rules that associates keys, which denote meaningful context, with values, which determine output.
  
  nn ml memory gpt
Visit annotations in context

Tags

memory

gpt

nn

ml

Annotators

mshook

URL

rewriting.csail.mit.edu/
www.theatlantic.com www.theatlantic.com

The College Essay Is Dead

2
1. wiobyrne 10 Dec 2022
  
  in Public
  
  natural-language processing is going to force engineers and humanists together. They are going to need each other despite everything. Computer scientists will require basic, systematic education in general humanism: The philosophy of language, sociology, history, and ethics are not amusing questions of theoretical speculation anymore. They will be essential in determining the ethical and creative use of chatbots, to take only an obvious example.
  
  gpt-3 language models
2. wiobyrne 10 Dec 2022
  
  in Public
  
  The extraordinary ignorance on questions of society and history displayed by the men and women reshaping society and history has been the defining feature of the social-media era.
  
  gpt-3 writing intelligence
Visit annotations in context

Tags

gpt-3

intelligence

language models

writing

Annotators

wiobyrne

URL

theatlantic.com/technology/archive/2022/12/chatgpt-ai-writing-college-student-essays/672371/
www.jasonwei.net www.jasonwei.net

137 emergent abilities of large language models — Jason Wei

1
1. wiobyrne 10 Dec 2022
  
  in Public
  
  Emergent abilities are not present in small models but can be observed in large models.
  
  Here’s a lovely blog by Jason Wei that pulls together 137 examples of ’emergent abilities of large language models’. Emergence is a phenomenon seen in contemporary AI research, where a model will be really bad at a task at smaller scales, then go through some discontinuous change which leads to significantly improved performance.
  
  gpt-3 machine learning ai emergent abilities capability overhang
Visit annotations in context

Tags

gpt-3

capability overhang

machine learning

emergent abilities

ai

Annotators

wiobyrne

URL

jasonwei.net/blog/emergence
jack-clark.net jack-clark.net

Import AI 310: AlphaZero learned Chess like humans learn Chess; capability emergence in language models; demoscene AI.

1
1. wiobyrne 10 Dec 2022
  
  in Public
  
  Houston, we have a Capability Overhang problem: Because language models have a large capability surface, these cases of emergent capabilities are an indicator that we have a ‘capabilities overhang’ – today’s models are far more capable than we think, and our techniques available for exploring the models are very juvenile. We only know about these cases of emergence because people built benchmark datasets and tested models on them. What about all the capabilities we don’t know about because we haven’t thought to test for them? There are rich questions here about the science of evaluating the capabilities (and safety issues) of contemporary models.
  
  capability overhang ai language models gpt-3
Visit annotations in context

Tags

capability overhang

language models

gpt-3

ai

Annotators

wiobyrne

URL

jack-clark.net/2022/11/28/import-ai-310-alphazero-learned-chess-like-humans-learn-chess-capability-emergence-in-language-models-demoscene-ai/
www.theverge.com www.theverge.com

ChatGPT proves AI is finally mainstream — and things are only going to get weirder

2
1. wiobyrne 10 Dec 2022
  
  in Public
  
  As the metaphor suggests, though, the prospect of a capability overhang isn’t necessarily good news. As well as hidden and emerging capabilities, there are hidden and emerging threats. And these dangers, like our new skills, are almost too numerous to name.
  
  gpt-3 ai capability overhang
2. wiobyrne 10 Dec 2022
  
  in Public
  
  There’s a concept in AI that I’m particularly fond of that I think helps explain what’s happening. It’s called “capability overhang” and refers to the hidden capacities of AI: skills and aptitudes latent within systems that researchers haven’t even begun to investigate yet. You might have heard before that AI models are “black boxes” — that they’re so huge and complex that we don’t fully understand how they operate or come to specific conclusions. This is broadly true and is what creates this overhang.
  
  gpt-3 ai capability overhang
Visit annotations in context

Tags

gpt-3

capability overhang

ai

Annotators

wiobyrne

URL

theverge.com/2022/12/8/23499728/ai-capability-accessibility-chatgpt-stable-diffusion-commercialization
www.theatlantic.com www.theatlantic.com

The End of High-School English

3
1. wiobyrne 10 Dec 2022
  
  in Public
  
  Which is why I wonder if this may be the end of using writing as a benchmark for aptitude and intelligence.
  
  writing gpt-3 ai
2. wiobyrne 10 Dec 2022
  
  in Public
  
  Perhaps there are reasons for optimism, if you push all this aside. Maybe every student is now immediately launched into that third category: The rudiments of writing will be considered a given, and every student will have direct access to the finer aspects of the enterprise. Whatever is inimitable within them can be made conspicuous, freed from the troublesome mechanics of comma splices, subject-verb disagreement, and dangling modifiers.
  
  writing gpt-3 ai
3. wiobyrne 10 Dec 2022
  
  in Public
  
  I’ve also long held, for those who are interested in writing, that you need to learn the basic rules of good writing before you can start breaking them—that, like Picasso, you have to learn how to reliably fulfill an audience’s expectations before you get to start putting eyeballs in people’s ears and things.
  
  writing gpt-3 ai
Visit annotations in context

Tags

gpt-3

writing

ai

Annotators

wiobyrne

URL

theatlantic.com/technology/archive/2022/12/openai-chatgpt-writing-high-school-english-essay/672412/
Nov 2022
www.vice.com www.vice.com

Students Are Using AI to Write Their Papers, Because Of Course They Are

1
1. wiobyrne 26 Nov 2022
  
  in Public
  
  “In literacy education, particularly for developing writers, instructors are looking for the level of desirable difficulty, or the point at which you are working yourself just as hard so that you don’t break but you also improve,” Laffin told Motherboard. “Finding the right, appropriate level of desirable difficulty level of instruction makes their capacity to write grow. So if you are doing compensation techniques that go beyond finding that level of desirable difficulty and instructing at that place, then you’re not helping them grow as a writer.”
  
  writing AI gpt-3
Visit annotations in context

Tags

gpt-3

writing

AI

Annotators

wiobyrne

URL

vice.com/en/article/m7g5yq/students-are-using-ai-to-write-their-papers-because-of-course-they-are
Aug 2022
maggieappleton.com maggieappleton.com

Joining Ought

1
1. chrisaldrich 05 Aug 2022
  
  in Public
  
  https://maggieappleton.com/joining-ought
  
  read Maggie Appleton machine learning natural language processing GPT-3 Elicit Ought
Visit annotations in context

Tags

Maggie Appleton

read

GPT-3

Ought

natural language processing

Elicit

machine learning

Annotators

chrisaldrich

URL

maggieappleton.com/joining-ought
Jun 2022
direct.mit.edu direct.mit.edu

Human Language Understanding & Reasoning

1
1. mshook 14 Jun 2022
  
  in Public
  
  The dominant idea is one of attention, by which a representation at a position is computed as a weighted combination of representations from other positions. A common self-supervision objective in a transformer model is to mask out occasional words in a text. The model works out what word used to be there. It does this by calculating from each word position (including mask positions) vectors that represent a query, key, and value at that position. The query at a position is compared with the value at every position to calculate how much attention to pay to each position; based on this, a weighted average of the values at all positions is calculated. This operation is repeated many times at each level of the transformer neural net, and the resulting value is further manipulated through a fully connected neural net layer and through use of normalization layers and residual connections to produce a new vector for each word. This whole process is repeated many times, giving extra layers of depth to the transformer neural net. At the end, the representation above a mask position should capture the word that was there in the original text: for instance, committee as illustrated in Figure 1.
  
  transformer explanation attention qkv ml nn nlp language gpt good
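The attention step in the highlighted passage can be made concrete with a minimal single-head sketch in plain Python. It assumes the standard scaled dot-product formulation, in which each query is scored against the keys and the resulting weights average the values. The toy vectors and function names are illustrative, not taken from the article.

```python
import math


def softmax(scores):
    """Turn raw scores into weights that sum to 1 (shifted for numerical stability)."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def self_attention(queries, keys, values):
    """For each position, score its query against every key, then average the values."""
    d_k = len(keys[0])
    outputs = []
    for q in queries:
        scores = [dot(q, k) / math.sqrt(d_k) for k in keys]
        weights = softmax(scores)
        outputs.append(
            [sum(w * v[j] for w, v in zip(weights, values)) for j in range(len(values[0]))]
        )
    return outputs


# Toy two-position sequence: each query matches the key at its own position best.
queries = [[1.0, 0.0], [0.0, 1.0]]
keys = [[1.0, 0.0], [0.0, 1.0]]
values = [[10.0, 0.0], [0.0, 10.0]]
attended = self_attention(queries, keys, values)
print(attended)
```

Each output vector is a weighted average of the value vectors, so position 0 ends up dominated by the first value and position 1 by the second.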
Visit annotations in context

Tags

gpt

language

good

nlp

transformer

attention

explanation

qkv

nn

ml

Annotators

mshook

URL

direct.mit.edu/daed/article/151/2/127/110621/Human-Language-Understanding-amp-Reasoning
Apr 2022
mayt.substack.com mayt.substack.com

GPT-3 can run code

1
1. pyxelr 05 Apr 2022
  
  in Public
  
  # Input
  Input: 123, Output:
  Input: 121, Output:
  Input: 111, Output:
  Input: 123454321, Output:
  Input 123123, Output:
  # Instruction
  Output true if input is a palindrome
  # Output
  Input: 123, Output: false
  Input: 121, Output: true
  Input: 111, Output: true
  Input: 123454321, Output: true
  Input 123123, Output: false
  
  Example of using GPT-3 for programming
  
  DataScience programming GPT-3
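For comparison, the task given to GPT-3 in the prompt above can be written as an ordinary function. This is a reference sketch for checking the quoted outputs, not code from the article; the function name is an assumption.

```python
def is_palindrome(text: str) -> bool:
    """Return True when the string reads the same forwards and backwards."""
    return text == text[::-1]


# The five inputs from the quoted prompt.
inputs = ["123", "121", "111", "123454321", "123123"]
results = {value: is_palindrome(value) for value in inputs}
print(results)
```

The results agree with the outputs quoted in the annotation: only `121`, `111`, and `123454321` are palindromes.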
Visit annotations in context

Tags

DataScience

GPT-3

programming

Annotators

pyxelr

URL

mayt.substack.com/p/gpt-3-can-run-code
Nov 2021
www.lesswrong.com www.lesswrong.com

interpreting GPT: the logit lens - LessWrong

1
1. mshook 20 Nov 2021
  
  in Public
  
  Other work on interpreting transformer internals has focused mostly on what the attention is looking at. The logit lens focuses on what GPT "believes" after each step of processing, rather than how it updates that belief inside the step.
  
  gpt how ml nn transformer belief attention
Visit annotations in context

Tags

gpt

how

belief

transformer

attention

nn

ml

Annotators

mshook

URL

lesswrong.com/posts/AcKRB8wDpdaN6v6ru/interpreting-gpt-the-logit-lens
www.pnas.org www.pnas.org

The neural architecture of language: Integrative modeling converges on predictive processing

1
1. mshook 10 Nov 2021
  
  in Public
  
  These findings provide strong evidence for a classic hypothesis about the computations underlying human language understanding, that the brain’s language system is optimized for predictive processing in the service of meaning extraction
  
  language meaning nn ml brain gpt
Visit annotations in context

Tags

brain

gpt

language

meaning

nn

ml

Annotators

mshook

URL

pnas.org/content/118/45/e2105646118
Jun 2021
www.gnu.org www.gnu.org

GNU GRUB Manual 2.06: BIOS installation

1
1. wenijinew 28 Jun 2021
  
  in Public
  
  When creating a BIOS Boot Partition on a GPT system, you should make sure that it is at least 31 KiB in size.
  
  This is important. If the partition is not sized accordingly, the OS won't be detected when GRUB is used on a GPT system.
  
  GPT grub
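As a quick piece of arithmetic, the 31 KiB minimum from the quoted manual can be converted into a sector count. The sector sizes used here (512 and 4096 bytes) are assumptions about common disks, not something the manual excerpt states.

```python
KIB = 1024
MIN_BIOS_BOOT_BYTES = 31 * KIB  # minimum size quoted from the GRUB manual


def min_sectors(sector_size: int = 512) -> int:
    """Smallest whole number of sectors that covers the 31 KiB minimum."""
    return -(-MIN_BIOS_BOOT_BYTES // sector_size)  # ceiling division


print(MIN_BIOS_BOOT_BYTES, min_sectors(512), min_sectors(4096))
```

In practice a partitioning tool may round the partition up to its own alignment boundary, so the figure here is a lower bound rather than a recommended size.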
Visit annotations in context

Tags

GPT

grub

Annotators

wenijinew

URL

gnu.org/software/grub/manual/grub/html_node/BIOS-installation.html
www.wired.com www.wired.com

AI Could Soon Write Code Based on Ordinary Language

1
1. sophia.sterckx 21 Jun 2021
  
  in BehSci
  
  Johnson, Khari. ‘AI Could Soon Write Code Based on Ordinary Language’. Wired. Accessed 21 June 2021. https://www.wired.com/story/ai-write-code-ordinary-language.
  
  is:website lang:en AI Artificial Intelligence code ordinary language translation programming language Microsoft OpenAI GPT-3
Visit annotations in context

Tags

is:website

Artificial Intelligence

GPT-3

code

OpenAI

translation

ordinary language

AI

programming language

Microsoft

lang:en

Annotators

sophia.sterckx

URL

wired.com/story/ai-write-code-ordinary-language
Apr 2021
towardsdatascience.com towardsdatascience.com

How To Fine-Tune GPT-2 So You Can Generate Long-Form Creative Writing

1
1. markcmarino 15 Apr 2021
  
  in Public
  
  Writing with AI
  
  GPT-2 AI
Visit annotations in context

Tags

AI

GPT-2

Annotators

markcmarino

URL

towardsdatascience.com/how-to-fine-tune-gpt-2-so-you-can-generate-long-form-creative-writing-7a5ae1314a61
minimaxir.com minimaxir.com

How To Make Custom AI-Generated Text With GPT-2

1
1. markcmarino 15 Apr 2021
  
  in Public
  
  Machine Assisted Writing
  
  GPT-2
Visit annotations in context

Tags

GPT-2

Annotators

markcmarino

URL

minimaxir.com/2019/09/howto-gpt2/
Feb 2021
app.inferkit.com app.inferkit.com

Talk to Transformer – InferKit

1
1. markcmarino 03 Feb 2021
  
  in Public
  
  Test out GPT-2
  
  GPT-2
Visit annotations in context

Tags

GPT-2

Annotators

markcmarino

URL

app.inferkit.com/demo
Jul 2020
ibuildmyideas.substack.com ibuildmyideas.substack.com

I build my ideas #8 - 07/19/20

1
1. TylerRick 23 Jul 2020
  
  in Public
  
  GPT-3 OpenAI AI
Visit annotations in context

Tags

OpenAI

GPT-3

AI

Annotators

TylerRick

URL

ibuildmyideas.substack.com/p/i-build-my-ideas-8-071920
