Stanford researchers found that law professors preferred AI answers over answers written by fellow professors 75% of the time when judging contract-law help for students.
The study tested whether LLMs can handle a field where the answer is often not a fact, but a defensible argument built from rules, exceptions, and judgment.
The professors wrote 40 realistic student-style questions, answered them themselves, and then judged nearly 3,000 human-versus-AI comparisons without knowing which answer came from which source.
The striking result was not just that AI won often, but that professors marked AI answers as harmful only 3.5% of the time, compared with 12% for human answers.
In other words, the model was not merely sounding fluent; it often met the teaching standard law professors apply when explaining legal ambiguity to students.