Hypothesis

10 Matching Annotations

Jun 2026
www.tomtunguz.com

https://www.tomtunguz.com/local-coding-models/

1
1. fxp007 17 Jun 2026
  
  in Public
  
  Qwen 3.6 35B-A3B dominates model mentions at 33%, followed by the 27B variant at 20%. DeepSeek Pro & Gemma4 31B round out the top four.
  
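  This article reveals selection trends among local coding models, with Qwen 3.6 35B-A3B the most popular choice. For beginners it is important to know these mainstream options, but rather than following the trend blindly, choose a model that fits your specific needs, hardware, and task type.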
  
  model-selection beginner-tips

Tags

beginner-tips

model-selection

Annotators

fxp007

URL

tomtunguz.com/local-coding-models/
www.tomtunguz.com

https://www.tomtunguz.com/golden-age-of-applications/

1
1. fxp007 17 Jun 2026
  
  in Public
  
  Models are tricky. Budgets prevent defaulting everyone to state-of-the-art. The legion of other models each have a personality.
  
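  The author describes in detail how different AI models differ in character: Kimi K2.6 is highly creative but less precise, Qwen 3.6 performs well but may interrupt the workflow, and GLM 5.1 is good at coding but slow. This reminds developers to choose a model that suits their specific needs rather than blindly chasing the newest or largest one, while keeping budget constraints in mind.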
  
  model-selection cost-optimization

Tags

cost-optimization

model-selection

Annotators

fxp007

URL

tomtunguz.com/golden-age-of-applications/
May 2026
a16z.com

https://a16z.com/avoiding-death-on-the-yellow-brick-road/

1
1. fxp007 27 May 2026
  
  in Public
  
  The labs are already routing internally — different model classes for different requests, ensembles under the hood. What they can't do is route across vendors, or evaluate a competitor's model for a specific sub-task, or use an open-source fine-tune for the narrow piece where it's actually best.
  
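  Most people assume that the large-model labs hold an absolute advantage and can solve every AI problem. The author argues, however, that labs face structural limits in model selection: they cannot evaluate models across vendors or use an open-source fine-tune for a specific sub-task. This creates an opening for companies focused on a specific domain, which can pick the best model for each sub-task instead of being limited to a single lab's own models.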
  
  non-consensus model-selection ai-limitations

Tags

ai-limitations

non-consensus

model-selection

Annotators

fxp007

URL

a16z.com/avoiding-death-on-the-yellow-brick-road/
Apr 2026
artificialanalysis.ai

APEX-Agents-AA Benchmark Leaderboard | Artificial Analysis

2
1. fxp007 10 Apr 2026
  
  in Public
  
  gpt-oss-20B (high): 0.7%
  
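  gpt-oss-20B scores 0.7%: of 452 professional tasks, fewer than 4 passed the evaluation. That is a gap of nearly 50x against the top model's 33.3%. It suggests that professional-services agent capability does not improve gradually but sits on a distinct "capability ladder": models below a certain scale fail almost completely on these tasks. The implication for enterprise AI selection: in professional-services settings, a "good enough small model" may simply not exist; there are only "large models that work" and "models that do not work at all".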
  
  0.7-percent capability-cliff model-size enterprise-selection
2. fxp007 10 Apr 2026
  
  in Public
  
  Cost (USD) to run the evaluation: GPT-5.4 (xhigh): $1,110, Claude Opus 4.6 (max): $1,055
  
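  Running the 452-task evaluation once costs $1,110 with GPT-5.4 and $1,055 with Claude Opus 4.6, roughly $2.3 to $2.5 per task. Gemini 3 Flash needs only $596 and scores 27.7% (versus 33.3% for the top model). This cost-performance data is critical for AI selection decisions: if the business scenario can accept a success rate of about 27% instead of 33%, Gemini 3 Flash cuts the cost nearly in half. In large-scale financial-services deployments, that difference is magnified thousands of times.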
  
  cost-analysis 2-dollars-per-task cost-performance model-selection
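
The arithmetic behind the two annotations above can be checked with a short Python sketch. It uses only the figures quoted in the annotations (452 tasks, the quoted scores, and the quoted run costs); the function and variable names are hypothetical and chosen for illustration, and nothing here is taken from the leaderboard's own tooling.

```python
# Figures as quoted in the annotations above; not re-verified against the leaderboard.
TASKS = 452
RUN_COST_USD = {
    "GPT-5.4 (xhigh)": 1110,
    "Claude Opus 4.6 (max)": 1055,
    "Gemini 3 Flash": 596,
}


def cost_per_task(total_usd: float, tasks: int = TASKS) -> float:
    """Average cost of one task for a full evaluation run."""
    return total_usd / tasks


def expected_passes(score_pct: float, tasks: int = TASKS) -> float:
    """Number of tasks implied by a percentage score."""
    return score_pct / 100 * tasks


def saving_fraction(cheaper_usd: float, pricier_usd: float) -> float:
    """Fraction of the pricier run's cost saved by the cheaper run."""
    return 1 - cheaper_usd / pricier_usd


if __name__ == "__main__":
    for name, usd in RUN_COST_USD.items():
        print(f"{name}: ${cost_per_task(usd):.2f} per task")

    # 0.7% of 452 tasks is a little over 3 tasks.
    print(f"gpt-oss-20B passes: {expected_passes(0.7):.2f} tasks")

    # Ratio between the top score (33.3%) and gpt-oss-20B (0.7%).
    print(f"score gap: {33.3 / 0.7:.1f}x")

    flash = RUN_COST_USD["Gemini 3 Flash"]
    for name in ("GPT-5.4 (xhigh)", "Claude Opus 4.6 (max)"):
        saved = saving_fraction(flash, RUN_COST_USD[name])
        print(f"Gemini 3 Flash vs {name}: {saved:.0%} cheaper")
```

Under these figures the two most expensive runs land between roughly $2.3 and $2.5 per task, and the Gemini 3 Flash run costs a bit more than half as much as either.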

Tags

capability-cliff

cost-performance

0.7-percent

model-size

model-selection

2-dollars-per-task

cost-analysis

enterprise-selection

Annotators

fxp007

URL

artificialanalysis.ai/evaluations/apex-agents-aa
transformer-circuits.pub

Emotion Concepts and their Function in a Large Language Model

1
1. fxp007 09 Apr 2026
  
  in Public
  
  we studied emotion-related representations in Claude Sonnet 4.5, a frontier LLM at the time of our investigation.
  
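  [Inspiration] This paper studies only one model, Claude Sonnet 4.5, but its methodology applies to all large models. That suggests an urgent research agenda: compare emotion vectors across architectures (GPT, Gemini, Qwen, DeepSeek) to see whether systematic emotional biases appear, for example whether some models are inherently more "anxious" and others more "indifferent". This is not only an academic question but also a practical need for product selection and safety evaluation.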
  
  inspiration cross-model-comparison emotion-audit model-selection

Tags

inspiration

model-selection

cross-model-comparison

emotion-audit

Annotators

fxp007

URL

transformer-circuits.pub/2026/emotions/index.html
Aug 2020
www.nber.org

Measuring Employer-to-Employer Reallocation

1
1. katietaylor_99 11 Aug 2020
  
  in BehSci
  
  Fujita, Shigeru, Giuseppe Moscarini, and Fabien Postel-Vinay. ‘Measuring Employer-to-Employer Reallocation’. Working Paper. Working Paper Series. National Bureau of Economic Research, July 2020. https://doi.org/10.3386/w27525.
  
  is:report lang:en employer-to-employer reallocation EE CPS Current Population Survey survey methodology RIP Respondent Identificaion Policy selection model great recession recovery market COVID-19

Tags

is:report

Current Population Survey

EE

COVID-19

recovery

CPS

Respondent Identificaion Policy

great recession

reallocation

selection model

employer-to-employer

lang:en

market

survey methodology

RIP

Annotators

katietaylor_99

URL

nber.org/papers/w27525
Jun 2020
arxiv.org

Statistical inference of assortative community structures

1
1. ErikStuchly 26 Jun 2020
  
  in BehSci
  
  Zhang, L., & Peixoto, T. P. (2020). Statistical inference of assortative community structures. ArXiv:2006.14493 [Cond-Mat, Physics:Physics, Stat]. http://arxiv.org/abs/2006.14493
  
  is:article lang:en statistical inference assortative community structure network partition modeling model selection assortavity significance

Tags

modeling

assortative community structure

significance

is:article

model selection

statistical inference

assortavity

lang:en

network partition

Annotators

ErikStuchly

URL

arxiv.org/abs/2006.14493
arxiv.org

Clustering - What Both Theoreticians and Practitioners are Doing Wrong

1
1. ErikStuchly 25 Jun 2020
  
  in BehSci
  
  Ben-David, S. (2018). Clustering—What Both Theoreticians and Practitioners are Doing Wrong. ArXiv:1805.08838 [Cs, Stat]. http://arxiv.org/abs/1805.08838
  
  is:article lang:en cluster clustering tool machine learning unsupervised learning theory practice algorithm parameter computational task optimization model selection

Tags

computational task

practice

algorithm

optimization

clustering tool

is:article

model selection

cluster

machine learning

theory

parameter

lang:en

unsupervised learning

Annotators

ErikStuchly

URL

arxiv.org/abs/1805.08838
psyarxiv.com

All About AIC

1
1. Marlene_Wulf 08 Jun 2020
  
  in BehSci
  
  Del Giudice, M. (2020). All About AIC [Preprint]. PsyArXiv. https://doi.org/10.31234/osf.io/7hmgz
  
  is:preprint lang:en AIC Bayes factor model selection

Tags

Bayes factor

AIC

lang:en

is:preprint

model selection

Annotators

Marlene_Wulf

URL

psyarxiv.com/7hmgz/