Hypothesis

26 Matching Annotations

May 2026
www.huxiu.com www.huxiu.com

https://www.huxiu.com/article/4861200.html

1
1. fxp007 29 May 2026
  
  in Public
  
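  Training data was about 45 billion tokens, only one-tenth of what mainstream methods use.
  
  This is a notable data point suggesting that the continuous-space paradigm brings a large gain in data efficiency. At 45 billion tokens, only 10% of what conventional methods use, continuous-space models may achieve better performance on the same amount of data, or match results with less data, which would substantially reduce AI training costs and data dependence.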
  
  data-point efficiency training-data
Visit annotations in context

Tags

efficiency

training-data

data-point

Annotators

fxp007

URL

huxiu.com/article/4861200.html
openai.com openai.com

https://openai.com/index/building-self-improving-tax-agents-with-codex/

1
1. fxp007 29 May 2026
  
  in Public
  
  Rental properties took about six weeks and substantial engineering oversight to reach 90% precision and recall
  
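  This timeframe shows the AI training cycle for a complex tax-processing task. 90% precision and recall is a good benchmark for complex rental-property tax handling. The need for 'substantial engineering oversight' indicates that even advanced AI systems need guidance and supervision from human experts, especially in specialized domains.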
  
  data-point training-timeline precision-recall
Visit annotations in context

Tags

precision-recall

training-timeline

data-point

Annotators

fxp007

URL

openai.com/index/building-self-improving-tax-agents-with-codex/
www.anthropic.com www.anthropic.com

https://www.anthropic.com/news/gates-foundation-partnership

1
1. fxp007 19 May 2026
  
  in Public
  
  PwC will roll out Claude Code and Cowork starting with U.S. teams and expanding toward a global workforce of hundreds of thousands of professionals, establish a joint Center of Excellence, and train and certify 30,000 PwC professionals on Claude
  
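  This data point shows PwC's plan for large-scale adoption of Claude, including training 30,000 professionals. The phrase 'hundreds of thousands' is imprecise, but the 30,000 training figure shows the scale of the professional training effort. It indicates that professional services firms are actively integrating AI into their services, though the article gives no details on the training content or certification standards.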
  
  data-point professional-training enterprise-scale
Visit annotations in context

Tags

enterprise-scale

professional-training

data-point

Annotators

fxp007

URL

anthropic.com/news/gates-foundation-partnership
www.anthropic.com www.anthropic.com

https://www.anthropic.com/news/pwc-expanded-partnership

1
1. fxp007 19 May 2026
  
  in Public
  
  a program to train and certify 30,000 PwC professionals on Claude
  
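  It specifically mentions training and certifying 30,000 PwC professionals on Claude. This is a clear quantitative metric reflecting the scale of enterprise investment in AI talent training. A 30,000-person training program shows how much weight and resources PwC is putting into this partnership.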
  
  data-point training-program
Visit annotations in context

Tags

training-program

data-point

Annotators

fxp007

URL

anthropic.com/news/pwc-expanded-partnership
deepmind.google deepmind.google

https://deepmind.google/blog/alphaevolve-impact/

1
1. fxp007 19 May 2026
  
  in Public
  
  doubling its training speed whilst improving model quality
  
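  Klarna's reported doubling of training speed alongside improved model quality demonstrates AlphaEvolve's dual value in optimizing commercial AI models. The improvement both shortens the development cycle and raises final product performance, giving a direct competitive advantage in financial services.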
  
  data-point ai-training commercial-impact
Visit annotations in context

Tags

ai-training

commercial-impact

data-point

Annotators

fxp007

URL

deepmind.google/blog/alphaevolve-impact/
www.llmwatch.com www.llmwatch.com

https://www.llmwatch.com/p/ai-agents-of-the-week-papers-you-cbd

1
1. fxp007 01 May 2026
  
  in Public
  
  The quality and structure of training data matters more than its volume.
  
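  This view emphasizes the importance of data quality in model training and points to a new direction for data engineering and model training.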
  
  data-quality training-data
Visit annotations in context

Tags

data-quality

training-data

Annotators

fxp007

URL

llmwatch.com/p/ai-agents-of-the-week-papers-you-cbd
Apr 2026
www.anthropic.com www.anthropic.com

https://www.anthropic.com/news/anthropic-amazon-compute

2
1. fxp007 30 Apr 2026
  
  in Public
  
  over one million Trainium2 chips to train and serve Claude
  
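  The figure of more than one million Trainium2 chips shows the enormous scale of Anthropic's AI hardware deployment. The number reflects not only investment in compute but also deep collaboration with AWS on custom chips. For AI model training, a million-chip deployment is at the top of the industry, suggesting Claude may need large amounts of compute for training and inference.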
  
  data-point hardware-deployment ai-training
2. fxp007 25 Apr 2026
  
  in Public
  
  over one million Trainium2 chips to train and serve Claude
  
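  The use of one million Trainium2 chips shows the hardware scale of AI model training. This order of magnitude indicates that Anthropic is doing large-scale parallel computing, an infrastructure requirement for training large language models. Compared with adopting Nvidia GPUs, Trainium chips represent cloud providers' differentiated competitive strategy in AI hardware.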
  
  data-point hardware ai-training
Visit annotations in context

Tags

hardware

ai-training

hardware-deployment

data-point

Annotators

fxp007

URL

anthropic.com/news/anthropic-amazon-compute
epoch.ai epoch.ai

https://epoch.ai/blog/have-ai-capabilities-accelerated

1
1. fxp007 24 Apr 2026
  
  in Public
  
  The minimum training cutoffs are: ECI (June 2024), METR Time Horizon (January 2024), Combined Math (September 2024), and WeirdML V2 (January 2025).
  
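  These dates show the minimum training cutoff for each dataset, spanning January 2024 to January 2025. Notably, WeirdML V2 has the shortest data series (starting January 2025), which may explain why that metric shows no acceleration trend: there is not enough data to detect a change in trend.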
  
  data-point time-span training-cutoff
Visit annotations in context

Tags

time-span

training-cutoff

data-point

Annotators

fxp007

URL

epoch.ai/blog/have-ai-capabilities-accelerated
every.to every.to

https://every.to/playtesting/the-market-for-making-ai-better

1
1. fxp007 17 Apr 2026
  
  in Public
  
  Academic publishers, documentary archives, game studios, and companies sitting on years of enterprise data have all been courted for the seeds of intelligence needed to train the next generation of models.
  
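  The expansion of the AI training-data market is reshaping how value is positioned across several traditional industries. From academic publishing to game studios, seemingly unrelated data sources can all become 'seeds of intelligence' for AI training. This cross-industry data convergence is creating new business opportunities and market dynamics.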
  
  data-sources industry-transformation ai-training
Visit annotations in context

Tags

industry-transformation

data-sources

ai-training

Annotators

fxp007

URL

every.to/playtesting/the-market-for-making-ai-better
aphyr.com aphyr.com

https://aphyr.com/posts/419-the-future-of-everything-is-lies-i-guess-new-jobs

1
1. fxp007 17 Apr 2026
  
  in Public
  
  As slop takes over the Internet, labs may struggle to obtain high-quality corpuses for training models.
  
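  This observation reveals a crisis in AI training-data quality. As the quality of internet content declines, AI systems may face a 'garbage in, garbage out' risk. The author's 'low-background steel' metaphor neatly points to using clean pre-2023 data as a solution, and also hints at the severity of knowledge pollution in the digital age, which could have far-reaching effects on the reliability and bias of AI systems.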
  
  data-quality training-corpus ai-bias
Visit annotations in context

Tags

training-corpus

data-quality

ai-bias

Annotators

fxp007

URL

aphyr.com/posts/419-the-future-of-everything-is-lies-i-guess-new-jobs
a16z.com a16z.com

Where Enterprises are Actually Adopting AI - a16z

1
1. fxp007 10 Apr 2026
  
  in Public
  
  Support teams are high volume and high turnover, and thus need to train new reps in a fast and standardized way. To do so, they have clearly articulated standard operating procedures (SOPs) that guide the work of each rep. These SOPs create clear rules and guidelines that AI agents can model themselves off of.
  
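  The secret of AI's success in customer support turns out to be that the industry, to manage high turnover among human staff, was forced to build extremely clear SOP documents, which happen to be ideal material for training AI agents. It is an accidental historical coincidence: companies were forced by a human problem (high attrition) to document all their processes, and then AI arrived and turned those documents directly into its own 'training manual.' Low-value work was documented most thoroughly, and so is the easiest for AI to replace.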
  
  SOP-as-training-data customer-support ironic-automation surprising
Visit annotations in context

Tags

SOP-as-training-data

surprising

ironic-automation

customer-support

Annotators

fxp007

URL

a16z.com/where-enterprises-are-actually-adopting-ai/
huggingface.co huggingface.co

https://huggingface.co/papers/2604.04771

2
1. fxp007 08 Apr 2026
  
  in Public
  
  A three-stage progressive training strategy -- large-scale pre-training, hard sample fine-tuning, and GRPO alignment -- sequentially exploits these data at different quality tiers.
  
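  Most people assume a training strategy should be applied uniformly to all data, but the authors propose a staged, progressive training strategy that applies different methods to data at different quality tiers. This approach, tailored to differences in data quality, challenges the traditional 'one-size-fits-all' training paradigm and represents a new data-centric line of thinking in AI.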
  
  non-consensus training-strategy data-quality
2. fxp007 08 Apr 2026
  
  in Public
  
  SOTA models of different architectures and parameter scales exhibit highly consistent failure patterns on the same set of hard samples, suggesting that the performance bottleneck stems from shared deficiencies in training data rather than architecture itself.
  
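  Most people assume models with different architectures have different failure modes and weaknesses, but the authors find that, regardless of architecture and parameter scale, SOTA models show highly consistent failure patterns on the same hard samples. This suggests the performance bottleneck stems from shared deficiencies in training data rather than architectural differences, a finding that challenges the conventional view of model diversity.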
  
  non-consensus model-architecture training-data
Visit annotations in context

Tags

training-strategy

data-quality

non-consensus

training-data

model-architecture

Annotators

fxp007

URL

huggingface.co/papers/2604.04771
ai.meta.com ai.meta.com

https://ai.meta.com/blog/alta-daily-fashion-app-segment-anything/

1
1. fxp007 08 Apr 2026
  
  in Public
  
  If we knew that every image uploaded was a beautiful model shot, segmentation would be far easier, but because of the nature of user-uploaded content, we need the best possible segmentation.
  
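  Most people may assume high-quality professional photos are the ideal input for AI image processing. The author implies that such 'perfect' model shots are in fact the easy case, and that real user-uploaded content is harder to process. This challenges assumptions about 'ideal training data' and suggests the 'imperfection' of real-world data is the more serious technical challenge.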
  
  non-consensus counterintuitive ai-training-data
Visit annotations in context

Tags

non-consensus

ai-training-data

counterintuitive

Annotators

fxp007

URL

ai.meta.com/blog/alta-daily-fashion-app-segment-anything/
Nov 2024
www.nature.com www.nature.com

AI models collapse when trained on recursively generated data

1
1. chrisaldrich 06 Nov 2024
  
  in Public
  
  AI models collapse when trained on recursively generated data by Ilia Shumailov et al.
  
  ᔥ[[Mathew Lowry]] in AI4Communities post - MyHub Experiments Wiki (accessed:: 2024-11-06 09:43:23)
  
  artificial intelligence models collapse training data
Visit annotations in context

Tags

artificial intelligence models

collapse

training data

Annotators

chrisaldrich

URL

nature.com/articles/s41586-024-07566-y
experiments.myhub.ai experiments.myhub.ai

AI4Communities post - MyHub Experiments Wiki

1
1. chrisaldrich 06 Nov 2024
  
  in Public
  
  the model collapse paper now suggests that the training data created by well-managed communities could be the new currency of collective intelligence.
  
  artificial intelligence training data data ownership collective memory collective intelligence sense making zettelkasten ratchet
Visit annotations in context

Tags

artificial intelligence

training data

sense making

collective intelligence

zettelkasten ratchet

data ownership

collective memory

Annotators

chrisaldrich

URL

experiments.myhub.ai/ai4communities_post
Feb 2024
ukdataservice.ac.uk ukdataservice.ac.uk

UK Data Service

1
1. Saldner_DANS 29 Feb 2024
  
  in Public
  
  rda_graph social sciences data catalogue data archive Training UK Data Service research data
Visit annotations in context

Tags

rda_graph

social sciences

research data

Training

data catalogue

UK Data Service

data archive

Annotators

Saldner_DANS

URL

ukdataservice.ac.uk/
Mar 2022
intellipaat.com intellipaat.com

Best Data Science Courses Online - IIT Madras Certification Training

1
1. sandeep_intellipaat 23 Mar 2022
  
  in Public
  
  Learn Data Science from IIT Madras faculty & Industry experts and earn a Data Science certification from India's best Engineering College. Become a Data Scientist through multiple data Science courses covered in this 7-month data science certification program with hands-on exercises & Project work.
  
  This Data Science Course is offered by Intellipaat in collaboration with IIT Madras (one of the renowned institutes in India) to help you master Data Science skills like Python, programming, Data Visualization, Statistical analysis and computing, Deep Learning, etc.
  
  Eager to step into the field of Data Science? Explore the Page now!
  
  Data Science Course Data Science Certification Data Science Training Data Scientist Course Data Scientist Training Data Scientist Certification online Data Science Course
Visit annotations in context

Tags

Data Scientist Training

Data Science Certification

Data Science Course

Data Scientist Certification

Data Science Training

Data Scientist Course

online Data Science Course

Annotators

sandeep_intellipaat

URL

intellipaat.com/data-scientist-course-training/
Jan 2022
www.theguardian.com www.theguardian.com

I’m leading a long Covid trial – it’s clear Britain has underestimated its impact | Amitava Banerjee

1
1. lucyparfitt16 12 Jan 2022
  
  in BehSci
  
  Banerjee, A. (2022, January 12). I’m leading a long Covid trial – it’s clear Britain has underestimated its impact. The Guardian. https://www.theguardian.com/commentisfree/2022/jan/12/long-covid-trial-britain-short-term-virus
  
  is:news lang:en COVID-19 long covid UK science short-term chronic ongoing symptoms data government acute effects funding prevention policy morbidity mortality training research mental health
Visit annotations in context

Tags

long covid

mortality

morbidity

data

lang:en

acute effects

short-term

is:news

funding

COVID-19

mental health

prevention

chronic

UK

policy

training

science

ongoing symptoms

research

government

Annotators

lucyparfitt16

URL

theguardian.com/commentisfree/2022/jan/12/long-covid-trial-britain-short-term-virus
May 2021
opendatapolicylab.org opendatapolicylab.org

Developing a Data Reuse Strategy for Solving Public Problems - Open Data Policy Lab

1
1. mlenc 07 May 2021
  
  in Public
  
  open data data reuse govlab training
Visit annotations in context

Tags

data reuse

training

govlab

open data

Annotators

mlenc

URL

opendatapolicylab.org//academy/data-reuse-strategy/syllabus-2021/
Oct 2020
www.youtube.com www.youtube.com

ORWG virtual meeting 08/09/2020

1
1. amyhcurtis 29 Oct 2020
  
  in BehSci
  
  ORWG Virtual Meeting 08/09/2020 https://www.youtube.com/playlist?list=PLOA0aRJ90NxvXtMt5Si5ukmR9LYfvDueB (n.d.)
  
  is:youtube webinar lang:en poster publish conference data management open science research work code test model development training research excellence framework
Visit annotations in context

Tags

work

is:youtube

data

model

lang:en

open science

webinar

code

development

research excellence framework

publish

test

training

research

poster

conference

management

Annotators

amyhcurtis

URL

youtube.com/playlist
May 2020
hypothes.is hypothes.is

Hypothesis

1
1. ritikasingh0019 20 May 2020
  
  in Public
  
  Register Today For Data Science Certification. Learn the Best Data Science Course from our Top Tutors. Study and Get A Certified Data Science Course. Enroll For Data Science Certification and Get 24/7 support and all time study Material. Land in your Dream Job by registering to this Course.
  
  data science training data science data science certification
Visit annotations in context

Tags

data science certification

data science training

data science

Annotators

ritikasingh0019

URL

hypothes.is/welcome/c466f23343fc1bd4
May 2018
hypothes.is hypothes.is

Hypothesis

1
1. urmilakhanna074 08 May 2018
  
  in Public
  
  hi there get the full insights on MSBI tools training and tutorial with the Real time Examples and application on the Running Projects as well https://www.youtube.com/watch?v=OzmdY0zCw4g
  
  MSBI training Business intelligence Datawarehousing Data integration ETL testing data manipulation
Visit annotations in context

Tags

Datawarehousing

ETL testing

MSBI training

Data integration

Business intelligence

data manipulation

Annotators

urmilakhanna074

URL

hypothes.is/users/urmilakhanna074
Jul 2017
www.edx.org www.edx.org

Introduction to R for Data Science

1
1. rschulz 05 Jul 2017
  
  in Public
  
  Introduction to R for Data Science
  
  Data analysis course using R
  
  training R programming data-analysis
Visit annotations in context

Tags

data-analysis

programming

training

R

Annotators

rschulz

URL

edx.org/course/introduction-r-data-science-microsoft-dat204x-5
www.datacamp.com www.datacamp.com

Learn R, Python & Data Science Online | DataCamp

1
1. rschulz 05 Jul 2017
  
  in Public
  
  Learn Data Science Online
  
  Data analysis courses using R and Python
  
  training python R data-analysis programming
Visit annotations in context

Tags

python

training

R

programming

data-analysis

Annotators

rschulz

URL

datacamp.com/
