AI Investment Research – Page 6

8.0 AI Research EleutherAI Blog Jun 15, 2026

Reward Hacking Research Update

This is an interim progress report on reward hacking research in the field of AI alignment, published on the EleutherAI Blog on October 7, 2025. The public excerpt indicates only that it is an interim update on ongoing research; it does not disclose details such as the experimental design or core findings. Reward hacking is the phenomenon in which an AI system exploits loopholes in its reward mechanism instead of achieving the intended goal, and it is a key research direction in AI safety.

Source: EleutherAI Blog 8.0

AI Radar Summary

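This is an interim progress report on reward hacking research, published on the official EleutherAI blog on October 7, 2025, and it falls under research developments in AI alignment. The public excerpt states only that the content is an interim update on ongoing research and does not disclose specifics such as the experimental design or core findings. Reward hacking refers to AI systems exploiting loopholes in the reward mechanism rather than accomplishing the intended goal. It is one of the key research directions in AI safety, and this update tracks the latest research in that area.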

6.8 AI Research Tech Xplore AI Jun 15, 2026

US order cutting access to Anthropic's AI models sparks criticism

AI Summary: The U.S. government's order for Anthropic to withdraw its most powerful artificial intelligence models has sparked a wave of criticism from both advocates and opponents of AI regul

Source: Tech Xplore AI 6.8

6.0 AI Research Tech Xplore AI Jun 15, 2026

Courts cracking down on error-strewn AI-assisted legal briefs

AI Summary: When a U.S. judge found fabricated quotes in a lawyer's brief earlier this year, the attorney admitted he had used Claude, an artificial intelligence chatbot, to write the document

Source: Tech Xplore AI 6.0

6.0 AI Research EleutherAI Blog Jun 15, 2026

Early Indicators of Reward Hacking via Reasoning Interpolation

AI Summary: Using importance sampling with fine-tuned donor prefills to predict reward hacking emergence during training

Source: EleutherAI Blog 6.0

6.8 AI Research Tech Xplore AI Jun 15, 2026

OpenAI hit with multistate probe into possible user harm as its IPO looms

AI Summary: OpenAI received a subpoena from several states as part of a probe into the safety of users of its chatbot as it prepares to offer stock to the public for the first time.

Source: Tech Xplore AI 6.8

6.0 AI Research EleutherAI Blog Jun 15, 2026

SAEs trained on the same data don’t learn the same features

AI Summary: In this post, we show that when two TopK SAEs are trained on the same data, with the same batch order but with different random initializations, there are many latents in the first

Source: EleutherAI Blog 6.0

6.0 AI Research EleutherAI Blog Jun 15, 2026

Mechanistic Anomaly Detection Research Update 2

AI Summary: Interim report on ongoing work on mechanistic anomaly detection

Source: EleutherAI Blog 6.0

6.0 AI Research EleutherAI Blog Jun 15, 2026

Third-party evaluation to identify risks in LLMs’ training data

AI Summary: An overview of the minetester and preliminary work

Source: EleutherAI Blog 6.0

6.0 AI Research EleutherAI Blog Jun 15, 2026

Partially rewriting an LLM in natural language

AI Summary: Using interpretations of SAE latents to simulate activations.

Source: EleutherAI Blog 6.0

8.0 AI Research EleutherAI Blog Jun 15, 2026

VINC-S: Closed-form Optionally-supervised Knowledge Elicitation with Paraphrase Invariance

This EleutherAI Blog post introduces VINC-S, a closed-form, optionally supervised knowledge elicitation framework with paraphrase invariance, based on a project completed in Spring 2023. The study aims to achieve more accurate and consistent knowledge extraction from text. So far only the title and basic background are publicly available; the complete technical details have not been disclosed. The feed presents it as a recent result in AI knowledge extraction.

Source: EleutherAI Blog 8.0

AI Radar Summary

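This post from the official EleutherAI blog introduces VINC-S, a method based on the results of a Spring 2023 project: a closed-form, optionally supervised knowledge elicitation framework with paraphrase invariance. The study aims to use the framework to achieve more accurate and consistent knowledge extraction from text. So far only the title and basic background are public, and the complete technical details have not been fully disclosed. It is presented as a recent research result in AI knowledge extraction.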