Hypothesis

We formalise a metric, the Pluralistic Repair Score (PRS), that distinguishes principled revision from capitulation.

This is a surprisingly pragmatic turn. Instead of measuring only the diversity of outputs, which can be gamed, the hypothesis proposes measuring the quality of disagreement. That sets a normative standard for how an AI should change its mind (on principle, not under pressure), which is a radical departure from the typical RLHF goal of user satisfaction.

Tags: Novel Metric, Principled Revision
