- Jan 2023
-
arxiv.org
-
Figure 3. The average drop in log probability (perturbation discrepancy) after rephrasing a passage is consistently higher for model-generated passages than for human-written passages. Each plot shows the distribution of the perturbation discrepancy d(x, pθ, q) for human-written news articles and machine-generated articles of equal word length from models GPT-2 (1.5B), GPT-Neo-2.7B (Black et al., 2021), GPT-J (6B; Wang & Komatsuzaki (2021)) and GPT-NeoX (20B; Black et al. (2022)). Human-written articles are a sample of 500 XSum articles; machine-generated text is generated by prompting each model with the first 30 tokens of each XSum article, sampling from the raw conditional distribution. Discrepancies are estimated with 100 T5-3B samples.
What is quite striking here is that larger, more powerful models are more capable of generating unusual or "human-like" text, judging by how much the human and machine distributions overlap in the plots.
-
if we apply small perturbations to a passage x ∼ pθ, producing x̃, the quantity log pθ(x) − log pθ(x̃) should be relatively large on average for machine-generated samples compared to human-written text.
By applying small changes to a text sample x, we can compare the log probs of x and of the perturbed example. On average, the drop should be noticeably larger for machine-generated examples than for human-written ones.
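To make the quantity above concrete for myself, here is a minimal sketch of the discrepancy for a single passage. The `log_prob` callable and the toy scores are my own stand-ins (hypothetical, not from the paper); in practice the scores would come from the model under test and the rephrasings from a mask-filling model such as T5.

```python
from statistics import mean
from typing import Callable, Sequence


def perturbation_discrepancy(
    x: str,
    perturbed: Sequence[str],
    log_prob: Callable[[str], float],
) -> float:
    """Return log p(x) minus the mean log p over the perturbed versions of x."""
    if not perturbed:
        raise ValueError("need at least one perturbed sample")
    return log_prob(x) - mean(log_prob(x_tilde) for x_tilde in perturbed)


# Toy stand-in for a real model's scoring function (hypothetical values).
toy_scores = {"original": -10.0, "rephrase a": -14.0, "rephrase b": -12.0}

d = perturbation_discrepancy(
    "original", ["rephrase a", "rephrase b"], toy_scores.__getitem__
)
# d is 3.0 here: -10.0 - mean(-14.0, -12.0)
```

A larger value of `d` would point towards machine-generated text; where to put the threshold is a separate choice that this sketch does not cover.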
-
As in prior work, we study a ‘white box’ setting (Gehrmann et al., 2019) in which the detector may evaluate the log probability of a sample log pθ(x). The white box setting does not assume access to the model architecture or parameters. While most public APIs for LLMs (such as GPT-3) enable scoring text, some exceptions exist
The authors assume white-box access to the log probability of a sample, \(\log p_{\theta}(x)\), but do not require access to the model's actual architecture or weights.
-
Empirically, we find predictive entropy to be positively correlated with passage fake-ness more often that not; therefore, this baseline uses high average entropy in the model’s predictive distribution as a signal that a passage is machine-generated.
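This is broadly in line with GLTR, which relies on humans making unusual vocabulary choices that a model would not. Note, though, that the baseline here treats high entropy in the model's predictive distribution as the sign of machine-generated text, which is a different quantity from how surprising the chosen words are.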
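A minimal sketch of what "average entropy in the model's predictive distribution" means, assuming we already have the next-token distribution at every position. The function names and the toy distributions are my own illustration, not the paper's code.

```python
import math
from typing import Sequence


def token_entropy(probs: Sequence[float]) -> float:
    """Shannon entropy (in nats) of one next-token distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def average_entropy(distributions: Sequence[Sequence[float]]) -> float:
    """Mean predictive entropy across all token positions in a passage."""
    if not distributions:
        raise ValueError("need at least one token position")
    return sum(token_entropy(d) for d in distributions) / len(distributions)


# Toy 4-word vocabulary: the model is confident at every position...
peaked = [[0.97, 0.01, 0.01, 0.01]] * 3
# ...versus maximally uncertain at every position.
flat = [[0.25, 0.25, 0.25, 0.25]] * 3

low = average_entropy(peaked)
high = average_entropy(flat)
```

Under the baseline in the quote, the passage with the higher average entropy would be the one flagged as more likely machine-generated.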
-
We find that supervised detectors can provide similar detection performance to DetectGPT on in-distribution data like English news, but perform significantly worse than zero-shot methods in the case of English scientific writing and fail altogether for German writing.
Supervised detection methods fail on out-of-domain examples, whereas DetectGPT seems to be robust to changes in domain.
-
extending DetectGPT to use ensembles of models for scoring, rather than a single model, may improve detection in the black box setting
DetectGPT could be extended to use ensembles of models for scoring, which may allow it to work in black-box settings where the log probs are unknown.
-
While in this work, we use off-the-shelf mask-filling models such as T5 and mT5 (for non-English languages), some domains may see reduced performance if existing mask-filling models do not well represent the space of meaningful rephrases, reducing the quality of the curvature estimate.
The approach requires access to a language model that can meaningfully and accurately rephrase (perturb) the outputs of the model under evaluation. If the mask-filling model does not represent the domain well, the method may not work well.
-
For models behind APIs that do provide probabilities (such as GPT-3), evaluating probabilities nonetheless costs money.
Running this against paid APIs costs money, and it requires that log probs are made available.
-
We simulate human revision by replacing 5 word spans of the text with samples from T5-3B until r% of the text has been replaced, and report performance as r varies.
I question the trustworthiness of this simulation: human edits are probably going to be more sporadic and random.
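To see how mechanical the simulated revision is, here is a rough sketch of the span-replacement loop as I read it. Everything here is my own assumption rather than the authors' code: `fill_span` is a hypothetical stand-in for sampling from T5-3B, it is assumed to return the same number of words it was given, and the loop stops once at least a fraction `r` of word positions has been replaced.

```python
import random
from typing import Callable, List


def simulate_revision(
    words: List[str],
    r: float,
    fill_span: Callable[[List[str]], List[str]],
    span_len: int = 5,
    seed: int = 0,
) -> List[str]:
    """Replace random span_len-word spans until a fraction r of positions is replaced."""
    if not 0.0 <= r <= 1.0:
        raise ValueError("r must be between 0 and 1")
    out = list(words)
    max_start = len(out) - span_len
    if max_start < 0:
        return out  # passage shorter than one span: nothing to replace
    rng = random.Random(seed)
    replaced = [False] * len(out)
    target = r * len(out)
    while sum(replaced) < target:
        # Only consider spans that still contain an untouched word,
        # so every iteration makes progress.
        candidates = [
            s for s in range(max_start + 1)
            if not all(replaced[s:s + span_len])
        ]
        start = rng.choice(candidates)
        new_span = fill_span(out[start:start + span_len])
        assert len(new_span) == span_len, "fill_span must preserve length"
        out[start:start + span_len] = new_span
        for i in range(start, start + span_len):
            replaced[i] = True
    return out


# Hypothetical filler: overwrite each word with a placeholder token.
original = [f"w{i}" for i in range(40)]
revised = simulate_revision(original, 0.25, lambda span: ["X"] * len(span))
changed = sum(a != b for a, b in zip(original, revised))
```

The edits land in fixed-size contiguous blocks chosen uniformly at random, which is the part that looks unlike how a person would actually revise a text.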
-
Figure 5. We simulate human edits to machine-generated text by replacing varying fractions of model samples with T5-3B generated text (masking out random five word spans until r% of text is masked to simulate human edits to machine-generated text). The four top-performing methods all generally degrade in performance with heavier revision, but DetectGPT is consistently most accurate. Experiment is conducted on the XSum dataset
DetectGPT shows about 95% AUROC for texts that have been modified by about 10%, and this drops off to about 85% when up to 24% of the text is changed.
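As a reminder of what these numbers measure: AUROC can be read as the probability that a randomly chosen machine-generated passage gets a higher detector score than a randomly chosen human-written one. A small sketch of that pairwise definition follows; the scores are made-up toy values, not results from the paper.

```python
from typing import Sequence


def auroc(machine_scores: Sequence[float], human_scores: Sequence[float]) -> float:
    """Fraction of (machine, human) pairs ranked correctly; ties count as half."""
    if not machine_scores or not human_scores:
        raise ValueError("need at least one score in each group")
    wins = 0.0
    for m in machine_scores:
        for h in human_scores:
            if m > h:
                wins += 1.0
            elif m == h:
                wins += 0.5
    return wins / (len(machine_scores) * len(human_scores))


# Toy detector scores (hypothetical): higher means "looks machine-generated".
machine = [3.0, 2.0, 1.5]
human = [1.0, 2.0, 0.0]
score = auroc(machine, human)
```

A value of 0.5 corresponds to random guessing and 1.0 to perfect separation, so a drop from 95% to 85% still leaves the detector well above chance.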
-
DetectGPT’s performance in particular is mostly unaffected by the change in language from English to German
Performance of this method is robust to a change of language (e.g. English to German).
-
Because the GPT-3 API does not provide access to the complete conditional distribution for each token, we cannot compare to the rank, log rank, and entropy-based prior methods
The GPT-3 API does not expose the full conditional distribution for each token, so the authors cannot compare against some of the prior methods. That seems to suggest that this method can be used with limited knowledge of the probabilities.
-
improving detection of fake news articles generated by 20B parameter GPT-NeoX
The authors test their approach on GPT-NeoX. The question is whether we can get hold of the log probs from ChatGPT to do the same.
-
This approach, which we call DetectGPT, does not require training a separate classifier, collecting a dataset of real or generated passages, or explicitly watermarking generated text. It uses only log probabilities computed by the model of interest and random perturbations of the passage from another generic pre-trained language model (e.g., T5)
The novelty of this approach is that it is cheap to set up as long as you have the log probabilities generated by the model of interest.
-
See ericmitchell.ai/detectgpt for code, data, and other project information.
Code and data available at https://ericmitchell.ai/detectgpt