Hypothesis

4 Matching Annotations

Apr 2026
aisle.com

https://aisle.com/blog/ai-cybersecurity-after-mythos-the-jagged-frontier

1
1. fxp007 17 Apr 2026
  
  in Public
  
  Only GPT-OSS-120b is perfectly reliable in both directions (in our 3 re-runs of each setup). Most models that find the bug also false-positive on the fix, fabricating arguments about signed-integer bypasses that are technically wrong.
  
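  This result reveals a limitation of AI models in recognizing patched code: many models can detect the vulnerability but wrongly flag the fixed code as still vulnerable. It underscores the need for additional validation and human review layers in AI security systems to ensure accurate and reliable results.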
  
  model-reliability false-positives security-validation

Tags

model-reliability

false-positives

security-validation

Annotators

fxp007

URL

aisle.com/blog/ai-cybersecurity-after-mythos-the-jagged-frontier
Dec 2020
psyarxiv.com

Putting psychology to the test: Rethinking model evaluation through benchmarking and prediction

1
1. marta_radosevic 01 Dec 2020
  
  in BehSci
  
  Rocca, R., & Yarkoni, T. (2020). Putting psychology to the test: Rethinking model evaluation through benchmarking and prediction. PsyArXiv. https://doi.org/10.31234/osf.io/e437b
  
  is:preprint lang:en psychology model evaluation benchmarking machine learning reliability modelling utility prediction

Tags

prediction

psychology

lang:en

reliability

utility

model evaluation

is:preprint

benchmarking

machine learning

modelling

Annotators

marta_radosevic

URL

psyarxiv.com/e437b/
Oct 2020
www.nature.com

Three questions to ask before using model outputs for decision support

1
1. ErikStuchly 08 Oct 2020
  
  in BehSci
  
  Grimm, V., Johnston, A. S. A., Thulke, H.-H., Forbes, V. E., & Thorbek, P. (2020). Three questions to ask before using model outputs for decision support. Nature Communications, 11(1), 4959. https://doi.org/10.1038/s41467-020-17785-2
  
  is:article lang:en COVID-19 modeling screening model output assisted decision making transparency robustness reliability metascience empirical evidence

Tags

metascience

model output

assisted decision making

is:article

lang:en

reliability

transparency

robustness

empirical evidence

modeling

COVID-19

screening

Annotators

ErikStuchly

URL

nature.com/articles/s41467-020-17785-2
Apr 2020
psyarxiv.com

Improving the Utility of Non-Significant Results for Educational Research

1
1. edampf 30 Apr 2020
  
  in BehSci
  
  Edelsbrunner, P. A., & Thurn, C. (2020, April 22). Improving the Utility of Non-Significant Results for Educational Research. https://doi.org/10.31234/osf.io/j93a2
  
  is:preprint lang:en competence model equivalence testing framework misinterpretation non-significant results research data analysis education policy theory practice reliability

Tags

policy

non-significant

education

equivalence testing

practice

lang:en

misinterpretation

reliability

theory

results

competence model

framework

is:preprint

research

data analysis

Annotators

edampf

URL

psyarxiv.com/j93a2/