wrp 10 hours ago

TFA should be compared with Tufts (2025), "A Practical Examination of AI-Generated Text Detectors for Large Language Models" [0]. Tufts found that automated detection is very unreliable, while Russell found the opposite for human evaluators.

The explanation for the difference is that automated discrimination has relied mainly on structural factors such as average sentence/paragraph length and frequency of stock words/phrases and certain parts of speech. Human evaluators look at content factors such as repetition of ideas, less precise wording, generalizations rather than concrete examples, overall conceptual coherence, and factual errors.
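
To make "structural factors" concrete, here is a rough sketch of the kind of surface features I mean. It is not taken from either paper; the function name, the stock-phrase list, and the feature names are all made up for illustration, and a real detector would use many more signals than this.

```python
import re

# Hypothetical stock-phrase list, chosen only for illustration.
STOCK_PHRASES = ("delve into", "it is important to note", "in conclusion")


def structural_features(text):
    """Compute two surface-level features of a text (illustrative only)."""
    # Naive sentence split on terminal punctuation; empty pieces are dropped.
    sentences = [s for s in re.split(r"[.!?]+\s*", text) if s.strip()]
    # Lowercased word tokens, punctuation stripped.
    words = re.findall(r"[A-Za-z']+", text.lower())
    # Rejoin the tokens so phrase matching ignores punctuation and case.
    normalized = " ".join(words)
    stock_hits = sum(normalized.count(phrase) for phrase in STOCK_PHRASES)
    return {
        "avg_sentence_len": len(words) / len(sentences) if sentences else 0.0,
        "stock_per_100_words": 100 * stock_hits / len(words) if words else 0.0,
    }
```

Note that nothing here looks at what the text actually says, which is the point: none of the content factors in the list above (repeated ideas, vague wording, factual errors) can be captured by counts like these.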

[0] https://arxiv.org/abs/2412.05139