Hypothesis

5 Matching Annotations

May 2026
openai.com

https://openai.com/index/introducing-chatgpt-images-2-0/

1
1. fxp007 01 May 2026
  
  in Public
  
  Greater precision and control
  
  This claim may be biased; it is worth finding out how “Greater precision and control” is achieved and how users evaluate it.
  
  bias user-evaluation

Tags

bias

user-evaluation

Annotators

fxp007

URL

openai.com/index/introducing-chatgpt-images-2-0/
Apr 2026
openai.com

https://openai.com/index/why-we-no-longer-evaluate-swe-bench-verified/

1
1. fxp007 27 Apr 2026
  
  in Public
  
  We also found evidence that models that have seen the problems during training are more likely to succeed, because they have additional information needed to pass the underspecified tests.
  
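  Most people assume that gains in AI model performance come mainly from improvements in algorithms and architecture. The authors found, however, that a model's success on SWE-bench depends more on whether it saw these problems during training than on genuine gains in programming ability. This runs counter to the industry's prevailing 'model progress' narrative and suggests that current evaluations of AI progress may be seriously biased.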
  
  counterintuitive model-progress evaluation-bias

Tags

model-progress

evaluation-bias

counterintuitive

Annotators

fxp007

URL

openai.com/index/why-we-no-longer-evaluate-swe-bench-verified/
Aug 2021
psyarxiv.com

Studying science denial with a complex problem-solving task

1
1. lucyparfitt16 04 Aug 2021
  
  in BehSci
  
  Sulik, J., & McKay, R. (2021). Studying science denial with a complex problem-solving task [Preprint]. PsyArXiv. https://doi.org/10.31234/osf.io/huxm7
  
  is:preprint lang:en science denial cognitive style problem solving analytic style hypothesis generation cognitive bias hypothesis evaluation trait conspiracy belief formation science research behavioral science

Tags

cognitive bias

science

trait

analytic style

cognitive style

is:preprint

conspiracy

research

behavioral science

lang:en

science denial

hypothesis evaluation

belief formation

hypothesis generation

problem solving

Annotators

lucyparfitt16

URL

psyarxiv.com/huxm7/
Apr 2021
psyarxiv.com

Single- or double-blind review? A field study of system preference, reliability, bias, and validity

1
1. n.parfitt 29 Apr 2021
  
  in BehSci
  
  Pleskac, T. J., Kyung, E., Chapman, G. B., & Urminsky, O. (2021, April 23). Single- or double-blind review? A field study of system preference, reliability, bias, and validity. https://doi.org/10.31234/osf.io/q2tkw
  
  lang:en is:preprint single-blind double-blind review field study system preference reliability bias validity scientists peer review evaluation fair quality popularity publication lottery

Tags

review

publication

reliability

single-blind

is:preprint

popularity

peer review

scientists

field study

preference

evaluation

system

lottery

bias

double-blind

lang:en

fair

quality

validity

Annotators

n.parfitt

URL

psyarxiv.com/q2tkw/
Apr 2020
psyarxiv.com

Forensic Mental Health Practitioners’ Use of Structured Risk Assessment Instruments, Views About Bias in Risk Evaluations, and Strategies to Counteract It

1
1. edampf 28 Apr 2020
  
  in BehSci
  
  Kamorowski, J., de Ruiter, C., Schreuder, M., Ask, K., & Jelicic, M. (2020, April 16). Forensic Mental Health Practitioners’ Use of Structured Risk Assessment Instruments, Views About Bias in Risk Evaluations, and Strategies to Counteract It. https://doi.org/10.31234/osf.io/te5c2
  
  is:preprint lang:en cognitive bias debiasing strategy risk assessment forensics mental health evaluation the Netherlands bias blind spot

Tags

cognitive bias

debiasing strategy

forensics

is:preprint

mental health

the Netherlands

lang:en

risk assessment

bias blind spot

evaluation

Annotators

edampf

URL

psyarxiv.com/te5c2/