4 Matching Annotations
  1. Last 7 days
    1. Zhang and Abernethy (2025) propose deploying LLMs as quality checkers to surface critical problems instead of

      Is this the only empirical work? I thought there were others underway. Worth digging into. FWIW, I can run an elicit.org query.