Channel

AI Investment Research

In-depth AI sector coverage, company breakdowns, concept explainers, and weekly and monthly reports.

8.0 AI Investment Research EleutherAI Blog 2026-06-15

Reward Hacking Research Update

Source: EleutherAI Blog 8.0

AI Radar Summary

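This is an interim progress report on reward hacking research, published on the official EleutherAI blog on October 7, 2025, and belongs to current work on AI alignment. The public excerpt says only that it is a periodic update on ongoing research; it does not disclose the experimental design, core findings, or other details. Reward hacking is the phenomenon in which an AI system exploits flaws in its reward mechanism instead of accomplishing the intended goal, and it is one of the main research directions in AI safety today. This update is the latest tracking item in that area.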

6.8 AI Investment Research Tech Xplore AI 2026-06-15

US order cutting access to Anthropic’s AI models sparks criticism

AI summary: The U.S. government's order for Anthropic to withdraw its most powerful artificial intelligence models h

Source: Tech Xplore AI 6.8

6.0 AI Investment Research Tech Xplore AI 2026-06-15

Courts cracking down on error-strewn AI-assisted legal briefs

AI summary: When a U.S. judge found fabricated quotes in a lawyer's brief earlier this year, the attorney admitted h

Source: Tech Xplore AI 6.0

6.0 AI Investment Research EleutherAI Blog 2026-06-15

Early Indicators of Reward Hacking via Reasoning Interpolation

AI summary: Using importance sampling with fine-tuned donor prefills to predict reward hacking emergence during trai

Source: EleutherAI Blog 6.0

6.8 AI Investment Research Tech Xplore AI 2026-06-15

OpenAI hit with multistate probe into possible user harm as its IPO looms

AI summary: OpenAI received a subpoena from several states as part of a probe into the safety of users of its chatbo

Source: Tech Xplore AI 6.8

6.0 AI Investment Research EleutherAI Blog 2026-06-15

SAEs trained on the same data don’t learn the same features

AI summary: In this post, we show that when two TopK SAEs are trained on the same data, with the same batch order bu

Source: EleutherAI Blog 6.0

6.0 AI Investment Research EleutherAI Blog 2026-06-15

Mechanistic Anomaly Detection Research Update 2

AI summary: Interim report on ongoing work on mechanistic anomaly detection

Source: EleutherAI Blog 6.0

6.0 AI Investment Research EleutherAI Blog 2026-06-15

Third-party evaluation to identify risks in LLMs’ training data

AI summary: An overview of the minetester and preliminary work

Source: EleutherAI Blog 6.0

6.0 AI Investment Research EleutherAI Blog 2026-06-15

Partially rewriting an LLM in natural language

AI summary: Using interpretations of SAE latents to simulate activations.

Source: EleutherAI Blog 6.0

8.0 AI Investment Research EleutherAI Blog 2026-06-15

VINC-S: Closed-form Optionally-supervised Knowledge Elicitation with Paraphrase Invariance

Source: EleutherAI Blog 8.0

AI Radar Summary

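This post from the official EleutherAI blog introduces VINC-S, a method built on results from a spring 2023 project: a closed-form, optionally supervised knowledge elicitation framework with paraphrase invariance. The research aims to use the framework for more accurate and consistent knowledge extraction from text. So far only the title and basic background have been made public, and the full technical details have not been disclosed. It is a recent result in AI knowledge elicitation.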