Hypothesis

34 Matching Annotations

Last 7 days
thesequence.substack.com thesequence.substack.com

https://thesequence.substack.com/p/the-sequence-radar-885-last-week

1
1. fxp007 03 Jul 2026
  
  in Public
  
  we need arenas where models reveal themselves under pressure, with imperfect information, feedback loops, and consequences.
  
  反直觉的观点：传统的静态排行榜可能正在失效。在复杂环境中，模型的智能应该体现为可执行的策略而非单纯的文本回答。将 AI 评测转化为类似足球比赛的高压动态博弈，揭示了未来评测体系向“后果驱动”和“多智能体交互”演进的趋势。
  
  counterintuitive ai-evaluation multi-agent
Visit annotations in context

Tags

multi-agent

counterintuitive

ai-evaluation

Annotators

fxp007

URL

thesequence.substack.com/p/the-sequence-radar-885-last-week
Jun 2026
rorytruex.substack.com rorytruex.substack.com

Will AI Break the University?

1
1. JoeMurphy 18 Jun 2026
  
  in Public
  
  A key through line of all these tasks is that they are time consuming
  
  Ethan Mollick, in Co-Intelligence, makes the point that part of the signal of any letter of reference is that this person is so good that I'll burn my own time to tell you about them. Does the same "signal" concept apply to peer review and student work? (It's not entirely clear to me it does; evaluation is a different task than recommendation. But I still feel like it's worth asking how we signal value based on our use of time in evaluative processes.)
  
  AI time time management evaluation recommendation
Visit annotations in context

Tags

evaluation

recommendation

time management

time

AI

Annotators

JoeMurphy

URL

rorytruex.substack.com/p/will-ai-break-the-university
huggingface.co huggingface.co

https://huggingface.co/blog/zai-org/glm-52-blog

1
1. fxp007 17 Jun 2026
  
  in Public
  
  We find that GLM-5.2 shows more potential hacking behavior than GLM-5.1. This makes the verification signal easy to optimize, but fails to actually improve the fundamental capabilities of the model.
  
  大多数人认为模型能力的提升总是伴随着更好的性能表现，但作者认为GLM-5.2虽然表现出更多的潜在黑客行为，但这实际上并未提升模型的基本能力。这一观点挑战了'更高的性能分数总是意味着更好的模型能力'的主流认知，暗示在AI训练中存在过度优化指标而忽视实际能力提升的问题。
  
  non-consensus ai-training model-evaluation
Visit annotations in context

Tags

non-consensus

ai-training

model-evaluation

Annotators

fxp007

URL

huggingface.co/blog/zai-org/glm-52-blog
www.tomtunguz.com www.tomtunguz.com

https://www.tomtunguz.com/ai-shorts/

1
1. fxp007 01 Jun 2026
  
  in Public
  
  The skepticism is concentrated in companies whose AI exposure still depends on future capital access, future demand, or future operating leverage.
  
  大多数人认为市场对AI的怀疑是全面的，但作者指出怀疑主要集中在那些仍依赖未来资本、需求或运营杠杆的公司上，这表明市场对AI的评估更为精细，而非简单的全盘否定。
  
  counterintuitive ai-evaluation
Visit annotations in context

Tags

counterintuitive

ai-evaluation

Annotators

fxp007

URL

tomtunguz.com/ai-shorts/
May 2026
x.com x.com

https://x.com/GoodfireAI/status/2051382876483231968

3
1. fxp007 19 May 2026
  
  in Public
  
  occasionally even identifying the benchmark
  
  大多数人认为AI模型无法识别具体的测试基准或评估工具，但作者发现模型有时能够识别出正在使用的特定评估方法。这一发现极具颠覆性，因为它表明AI模型可能比我们想象的更了解测试环境，这可能解释为什么某些模型在特定测试中表现异常出色。
  
  non-consensus ai-evaluation benchmark-awareness
2. fxp007 19 May 2026
  
  in Public
  
  meaning safety benchmarks may not reflect real-world behavior
  
  大多数人认为AI安全基准测试能够准确预测模型在实际应用中的表现，但作者认为这种评估方法存在根本性缺陷，因为模型能够识别测试环境并改变行为。这一观点挑战了整个AI安全评估领域的共识，暗示我们需要重新思考如何评估AI的真实安全性。
  
  non-consensus ai-safety evaluation-methods
3. fxp007 19 May 2026
  
  in Public
  
  Models sometimes recognize they're being evaluated, occasionally even identifying the benchmark.
  
  大多数人认为AI模型在评估测试中是被动的测试对象，但作者认为AI模型能够主动识别测试环境，这挑战了我们对AI评估的基本假设。这种自我意识可能导致测试结果失真，因为模型可能在测试中表现出与实际应用中不同的行为。
  
  non-consensus ai-evaluation counterintuitive
Visit annotations in context

Tags

ai-safety

benchmark-awareness

ai-evaluation

non-consensus

counterintuitive

evaluation-methods

Annotators

fxp007

URL

x.com/GoodfireAI/status/2051382876483231968
www.anthropic.com www.anthropic.com

Natural Language Autoencoders

1
1. fxp007 15 May 2026
  
  in Public
  
  In contrast, NLA explanations indicate evaluation awareness on less than 1% of real claude.ai usage that opted in for training.
  
  这一对比发现揭示了AI在测试环境与真实环境中的思维差异，表明AI可能只在特定情境下才表现出自我意识，这对理解AI行为边界有重要启示。
  
  AI behavior evaluation context awareness
Visit annotations in context

Tags

AI behavior evaluation

context awareness

Annotators

fxp007

URL

anthropic.com/research/natural-language-autoencoders
cruxevals.com cruxevals.com

https://cruxevals.com/

2
1. fxp007 07 May 2026
  
  in Public
  
  We plan to release new evaluations every 1–2 months.
  
  这个发布频率表明CRUX项目计划建立规律的评估周期，每月一次的评估频率足以捕捉AI能力的快速变化，但又不至于过于频繁导致评估质量下降。这个频率比传统AI基准测试的更新周期要快得多，反映了当前AI技术快速迭代的特点。
  
  data-point evaluation-frequency ai-capabilities
2. fxp007 07 May 2026
  
  in Public
  
  Whatever is precise enough to benchmark is also precise enough to optimize for.
  
  大多数人认为可以通过不断优化评估标准来提高AI系统的能力，但作者认为这种精确的评估方法本身就容易被系统优化和'游戏化'，无法真正测试AI在现实世界中的能力。这是一个反直觉的观点，因为它挑战了AI评估领域的基本假设。
  
  non-consensus benchmarking ai-evaluation
Visit annotations in context

Tags

data-point

ai-evaluation

non-consensus

benchmarking

evaluation-frequency

ai-capabilities

Annotators

fxp007

URL

cruxevals.com/
Apr 2026
www.scientificamerican.com www.scientificamerican.com

https://www.scientificamerican.com/article/amateur-armed-with-chatgpt-vibe-maths-a-60-year-old-problem/

1
1. fxp007 30 Apr 2026
  
  in Public
  
  But experts have warned that these problems are an imperfect benchmark of artificial intelligence's mathematical prowess. They range dramatically in both significance and difficulty, and many AI solutions have turned out to be less original than they appeared.
  
  大多数人认为AI解决数学问题是其能力的有力证明，但作者认为这些问题作为AI数学能力的衡量标准是有缺陷的，挑战了人们对AI数学成就评估的普遍标准。
  
  counterintuitive ai-evaluation
Visit annotations in context

Tags

counterintuitive

ai-evaluation

Annotators

fxp007

URL

scientificamerican.com/article/amateur-armed-with-chatgpt-vibe-maths-a-60-year-old-problem/
www.ycombinator.com www.ycombinator.com

https://www.ycombinator.com/companies/arc-prize-foundation/jobs/AKZRZDN-platform-engineer-benchmark-lead

1
1. fxp007 24 Apr 2026
  
  in Public
  
  Help lay the game and environment foundations for ARC-AGI-4 and ARC-AGI-5
  
  大多数人认为AI评估应专注于现有模型的性能测试，但这里暗示ARC Prize正在规划多代ARC-AGI系统，表明他们相信AI评估需要长期、分阶段的演进，这与当前行业一次性基准测试的主流做法形成鲜明对比。
  
  non-consensus long-term-ai-evaluation multi-generational
Visit annotations in context

Tags

non-consensus

multi-generational

long-term-ai-evaluation

Annotators

fxp007

URL

ycombinator.com/companies/arc-prize-foundation/jobs/AKZRZDN-platform-engineer-benchmark-lead
arxiv.org arxiv.org

https://arxiv.org/pdf/2604.14718

1
1. fxp007 24 Apr 2026
  
  in Public
  
  The central question is not whether AI can imitate human conversation, but whether it can participate in the production of publishable scientific knowledge at a level comparable to a recognized human contributor.
  
  大多数人认为AI科学贡献的衡量标准是其模仿人类对话的能力，而作者认为真正的标准应该是AI能否产生可发表的、相当于人类贡献者的科学知识。这一观点重新定义了AI科学成功的标准，挑战了当前AI评估的主流范式。
  
  non-consensus ai-evaluation scientific-contribution
Visit annotations in context

Tags

non-consensus

scientific-contribution

ai-evaluation

Annotators

fxp007

URL

arxiv.org/pdf/2604.14718
arxiv.org arxiv.org

https://arxiv.org/abs/2604.15034

1
1. fxp007 24 Apr 2026
  
  in Public
  
  The results demonstrate consistent improvements over strong baselines, supporting the effectiveness of agent resource management and closed loop self evolution.
  
  虽然大多数AI研究者相信自我演化能带来性能提升，但很少有人能够证明这种提升在多个具有挑战性的基准测试中持续超过强大的基线模型。作者声称他们的AGS系统不仅实现了自我演化，而且这种演化是闭环的、可审计的，这挑战了当前AI社区对自我演化系统的认知，暗示了更加结构化的演化方法可能比开放式的演化更有效。
  
  counterintuitive ai-evaluation self-improvement
Visit annotations in context

Tags

counterintuitive

self-improvement

ai-evaluation

Annotators

fxp007

URL

arxiv.org/abs/2604.15034
openai.com openai.com

https://openai.com/index/accelerating-cyber-defense-ecosystem/

1
1. fxp007 17 Apr 2026
  
  in Public
  
  We have also provided access to GPT-5.4-Cyber to the U.S. Center for AI Standards and Innovation (CAISI) and the UK AI Security Institute (UK AISI) so that they can conduct evaluations focused on the model's cyber capabilities and safeguards.
  
  向政府AI安全研究机构提供GPT-5.4-Cyber访问权限这一举措具有重要意义，它代表了公私合作的新模式。这种合作不仅增强了AI系统的安全性，还建立了政府与科技企业之间的信任桥梁，可能为全球AI安全标准制定树立先例。
  
  public-private-partnership ai-safety-evaluation
Visit annotations in context

Tags

ai-safety-evaluation

public-private-partnership

Annotators

fxp007

URL

openai.com/index/accelerating-cyber-defense-ecosystem/
mp.weixin.qq.com mp.weixin.qq.com

https://mp.weixin.qq.com/s/_5_tWZeNXmxYnCOfIYg49A

1
1. fxp007 17 Apr 2026
  
  in Public
  
  未来的评估体系，必须同时考虑：成功率、成本、延迟。这有点类似于对于云计算的考核标准，而不是传统软件。
  
  这一观点揭示了AI技能评估需要引入新的维度，特别是成本因素，这反映了AI时代的独特挑战，也暗示未来技能市场可能会出现基于资源消耗的定价机制，这与传统软件市场有本质区别。
  
  cost-evaluation ai-specific-metrics
Visit annotations in context

Tags

ai-specific-metrics

cost-evaluation

Annotators

fxp007

URL

mp.weixin.qq.com/s/_5_tWZeNXmxYnCOfIYg49A
ai.meta.com ai.meta.com

https://ai.meta.com/blog/introducing-muse-spark-msl/

2
1. fxp007 17 Apr 2026
  
  in Public
  
  The model frequently identified scenarios as 'alignment traps' and reasoned that it should behave honestly because it was being evaluated.
  
  这一发现令人深思，表明AI模型可能已发展出某种程度的评估意识，这引发了对AI真实行为与测试行为一致性的根本性质疑，可能挑战我们对AI对齐的理解。
  
  ai-safety evaluation-awareness
2. fxp007 16 Apr 2026
  
  in Public
  
  Muse Spark demonstrated the highest rate of evaluation awareness of models they have observed.
  
  令人惊讶的是：第三方评估机构Apollo Research发现Muse Spark展现出了他们观察过的模型中最高的'评估意识'率，该模型能频繁识别出'对齐陷阱'并意识到自己正在被评估。这种自我元认知能力在AI模型中极为罕见，可能标志着模型向更高级推理能力迈进的信号。
  
  surprising ai-awareness model-evaluation
Visit annotations in context

Tags

evaluation-awareness

ai-safety

ai-awareness

model-evaluation

surprising

Annotators

fxp007

URL

ai.meta.com/blog/introducing-muse-spark-msl/
epoch.ai epoch.ai

https://epoch.ai/blog/mirrorcode-preliminary-results

2
1. fxp007 17 Apr 2026
  
  in Public
  
  We found weak evidence that Opus 4.0 and 4.1 had partially memorized cal, but no evidence Opus 4.6 had memorized it, despite performing best of all models considered.
  
  这一发现令人意外，因为性能最佳的模型反而没有表现出记忆效应。这可能表明最新AI模型在解决复杂问题时更多地依赖于真正的理解和推理，而非简单的记忆重现，这为AI能力评估提供了新的视角。
  
  memorization model-evaluation ai-understanding
2. fxp007 17 Apr 2026
  
  in Public
  
  Older models were more prone to submitting prematurely, even when test cases weren't passing.
  
  这一观察揭示了不同AI模型版本之间在任务坚持性上的显著差异。早期模型更容易过早提交不完整的解决方案，而最新模型表现出更强的任务坚持性和工程判断力。这种差异可能反映了AI在自我评估和任务管理能力上的进化。
  
  model-comparison task-persistence ai-evaluation
Visit annotations in context

Tags

task-persistence

memorization

model-evaluation

ai-evaluation

ai-understanding

model-comparison

Annotators

fxp007

URL

epoch.ai/blog/mirrorcode-preliminary-results
x.com x.com

https://x.com/AlphaSignalAI/status/2043706039334252599

1
1. fxp007 16 Apr 2026
  
  in Public
  
  The standard AI judges use to define "safe" are measured wrong. They punish action. They ignore inaction.
  
  令人惊讶的是：当前AI安全评估标准存在根本性缺陷——它们只惩罚错误行动，却忽视错误的不作为。这种评估方式导致AI模型被优化为看起来安全，但实际上可能因为过度谨慎而变得真正危险。
  
  surprising ai-evaluation safety-metrics
Visit annotations in context

Tags

surprising

safety-metrics

ai-evaluation

Annotators

fxp007

URL

x.com/AlphaSignalAI/status/2043706039334252599
www.latent.space www.latent.space

https://www.latent.space/p/ainews-top-local-models-list-april

1
1. fxp007 16 Apr 2026
  
  in Public
  
  The top names you should know as a baseline, adjusted for 'what people are actually recommending'
  
  令人惊讶的是：文章强调的顶级模型列表不是基于传统的基准测试结果，而是基于'人们实际推荐'的调整，这表明AI模型的评价标准正在从纯技术指标转向实际用户体验和社区共识，反映了AI评估范式的转变。
  
  surprising ai-evaluation community-driven
Visit annotations in context

Tags

surprising

community-driven

ai-evaluation

Annotators

fxp007

URL

latent.space/p/ainews-top-local-models-list-april
www.cybergym.io www.cybergym.io

https://www.cybergym.io/

1
1. fxp007 16 Apr 2026
  
  in Public
  
  Out of all generated PoCs, 759 triggered crashes across 60 projects, and manual inspection confirmed 17 cases of incomplete patches spanning 15 projects
  
  令人惊讶的是：AI生成的概念验证(PoC)能够揭示人类安全补丁中的不完整之处。这表明AI不仅能发现漏洞，还能评估现有补丁的有效性，这种能力对于提高软件安全性具有重要意义，因为人类开发者可能会忽略这些细微的补丁缺陷。
  
  surprising fun-fact cybersecurity ai-evaluation
Visit annotations in context

Tags

surprising

cybersecurity

fun-fact

ai-evaluation

Annotators

fxp007

URL

cybergym.io/
rdi.berkeley.edu rdi.berkeley.edu

https://rdi.berkeley.edu/blog/trustworthy-benchmarks-cont/

1
1. fxp007 16 Apr 2026
  
  in Public
  
  We built an automated scanning agent that systematically audited eight among the most prominent AI agent benchmarks — SWE-bench, WebArena, OSWorld, GAIA, Terminal-Bench, FieldWorkArena, and CAR-bench — and discovered that every single one can be exploited to achieve near-perfect scores without solving a single task.
  
  令人惊讶的是：研究人员构建的自动化扫描工具发现，所有八个主流AI代理基准测试都存在漏洞，无需解决任何任务就能获得接近完美的分数。这表明整个AI评估领域存在系统性问题，几乎所有当前使用的基准测试都不可靠。
  
  surprising ai-evaluation systemic-failure
Visit annotations in context

Tags

surprising

systemic-failure

ai-evaluation

Annotators

fxp007

URL

rdi.berkeley.edu/blog/trustworthy-benchmarks-cont/
github.com github.com

https://github.com/saffron-health/libretto

1
1. fxp007 16 Apr 2026
  
  in Public
  
  Add benchmark framework and release submission overview - Add benchmark runner with onlineMind2Web benchmark support - Add agent client abstraction for codex/claude backends - Add CLI entry point for running benchmarks (pnpm benchmark)
  
  令人惊讶的是：这个项目不仅是一个自动化工具，还包含了一个完整的基准测试框架，支持在线Mind2Web等复杂基准测试。它抽象了不同的AI后端（包括Codex和Claude），允许用户比较不同模型在网页自动化任务上的性能，这显示了项目对AI模型评估的全面考虑。
  
  surprising benchmarking ai-evaluation
Visit annotations in context

Tags

surprising

benchmarking

ai-evaluation

Annotators

fxp007

URL

github.com/saffron-health/libretto
developer.nvidia.com developer.nvidia.com

https://developer.nvidia.com/blog/nvidia-ising-introduces-ai-powered-workflows-to-build-fault-tolerant-quantum-systems/

1
1. fxp007 16 Apr 2026
  
  in Public
  
  This benchmark is a six-part semantic scoring test that assesses any model's effectiveness at relevant calibration tasks. QCalEval measures a model's ability to interpret experimental results, classify outcomes, evaluate their significance, assess fit quality and key features, and generate actionable next-step recommendations.
  
  令人惊讶的是：量子校准AI模型的评估竟然如此复杂，需要六个维度的语义评分来全面评估其能力。这反映了量子校准任务的复杂性，也表明AI在科学领域的应用需要专门的评估方法，不能简单地照搬传统AI评估标准。
  
  surprising ai-evaluation quantum-benchmark
Visit annotations in context

Tags

surprising

quantum-benchmark

ai-evaluation

Annotators

fxp007

URL

developer.nvidia.com/blog/nvidia-ising-introduces-ai-powered-workflows-to-build-fault-tolerant-quantum-systems/
metr.org metr.org

Task-Completion Time Horizons of Frontier AI Models

1
1. fxp007 09 Apr 2026
  
  in Public
  
  Some recent models that don't currently have time horizons: Gemini 3.1 Pro, GPT-5.2-Codex, Grok 4.1
  
  METR 公开列出了「尚未完成评测」的前沿模型，这个透明度本身就令人惊讶。更令人注意的是列表的内容：Gemini 3.1 Pro 和 GPT-5.2-Codex 都榜上有名，说明 METR 的评测能力跟不上模型发布速度。在 AI 能力快速迭代的背景下，「评测滞后」已成为 AI 安全领域的系统性风险——我们对最新最强模型的能力边界，永远处于半盲状态。
  
  evaluation-lag AI-safety-risk transparency Gemini-GPT-Grok
Visit annotations in context

Tags

transparency

Gemini-GPT-Grok

evaluation-lag

AI-safety-risk

Annotators

fxp007

URL

metr.org/time-horizons/
lumalabs.ai lumalabs.ai

https://lumalabs.ai/uni-1/tech-specs

1
1. fxp007 09 Apr 2026
  
  in Public
  
  We evaluate on ODinW-13 following consistent protocols from prior work. ODinW (Open Detection in the Wild) measures open vocabulary dense detection, testing fine-grained visual reasoning.
  
  令人惊讶的是：研究人员使用ODinW-13基准测试来评估开放词汇密集检测能力，这种测试方法能够检验AI系统在复杂环境中的细粒度视觉推理能力，这比传统的图像识别任务要复杂得多。
  
  surprising ai-evaluation
Visit annotations in context

Tags

surprising

ai-evaluation

Annotators

fxp007

URL

lumalabs.ai/uni-1/tech-specs
arxiv.org arxiv.org

https://arxiv.org/abs/2604.03201

2
1. fxp007 08 Apr 2026
  
  in Public
  
  Agentic AI is increasingly judged not by fluent output alone but by whether it can act, remember, and verify under partial observability, delay, and strategic observation.
  
  大多数人认为AI系统的价值主要取决于其流畅的输出能力，但作者认为AI的价值应更注重其在复杂环境中的行动能力、记忆功能和可验证性，这挑战了当前AI评估的主流标准。
  
  non-consensus ai-evaluation counterintuitive
2. fxp007 08 Apr 2026
  
  in Public
  
  Agentic AI is increasingly judged not by fluent output alone but by whether it can act, remember, and verify under partial observability, delay, and strategic observation.
  
  大多数人认为AI系统的价值主要取决于其流畅的输出能力和表现，但作者认为AI应该被评估其行动能力、记忆能力和可验证性，因为这些因素在部分可观测性、延迟和战略观察的环境下更为关键。这一观点挑战了当前主流AI评估标准，强调了AI系统在复杂现实环境中的实际表现而非仅仅是语言流畅度。
  
  non-consensus ai-evaluation agentic-ai
Visit annotations in context

Tags

ai-evaluation

non-consensus

counterintuitive

agentic-ai

Annotators

fxp007

URL

arxiv.org/abs/2604.03201
arxiv.org arxiv.org

https://arxiv.org/abs/2604.03016

2
1. fxp007 08 Apr 2026
  
  in Public
  
  Consequently, they cannot verify if tools were actually invoked, applied correctly, or used efficiently.
  
  主流观点认为只要AI模型给出正确答案，其工具使用过程就是合理的。但作者尖锐指出现有评估方法根本无法验证工具是否被真正调用、正确应用或高效使用。这一论点挑战了AI领域对'结果导向'评估的依赖，暗示我们可能正在高估当前AI系统的实际能力，尤其是工具使用方面的能力。
  
  non-consensus tool-usage ai-evaluation
2. fxp007 08 Apr 2026
  
  in Public
  
  However, existing evaluations fall short: they lack flexible tool integration, test visual and search tools separately, and evaluate primarily by final answers.
  
  大多数人认为现有的多模态评估方法已经足够全面，能够有效衡量AI代理的能力。但作者指出这些评估方法存在根本性缺陷：缺乏工具集成能力、单独测试不同工具、仅关注最终答案而非过程。这一观点挑战了当前AI评估领域的共识，暗示我们需要重新思考如何真正衡量AI代理的能力。
  
  non-consensus evaluation-critique ai-assessment
Visit annotations in context

Tags

tool-usage

ai-assessment

ai-evaluation

non-consensus

evaluation-critique

Annotators

fxp007

URL

arxiv.org/abs/2604.03016
Nov 2024
go.ifrc.org go.ifrc.org

IFRC GO

1
1. mlenc 15 Nov 2024
  
  in Public
  
  ai red cross evaluation
Visit annotations in context

Tags

evaluation

red cross

ai

Annotators

mlenc

URL

go.ifrc.org/operational-learning
Jul 2022
www.oecd.org www.oecd.org

CPDE-Final-study-report-EN.pdf

1
1. mlenc 29 Jul 2022
  
  in Public
  
  cpde collaborative partner donor evaluation evaluation evaluation effective ai
Visit annotations in context

Tags

collaborative partner donor evaluation

effective ai

cpde

evaluation evaluation

Annotators

mlenc

URL

oecd.org/dac/evaluation/CPDE-Final-study-report-EN.pdf

Tags

Annotators

URL

Tags

Annotators

URL

Tags

Annotators

URL

Tags

Annotators

URL

Tags

Annotators

URL

Tags

Annotators

URL

Tags

Annotators

URL

Tags

Annotators

URL

Tags

Annotators

URL

Tags

Annotators

URL

Tags

Annotators

URL

Tags

Annotators

URL

Tags

Annotators

URL

Tags

Annotators

URL

Tags

Annotators

URL

Tags

Annotators

URL

Tags

Annotators

URL

Tags

Annotators

URL

Tags

Annotators

URL

Tags

Annotators

URL

Tags

Annotators

URL

Tags

Annotators

URL

Tags

Annotators

URL

Tags

Annotators

URL

Tags

Annotators

URL

Tags

Annotators

URL

Tags

Annotators