Ilya was right and predicted much of this
译Ilya 是对的,并且预测了其中很多。
This is so sad. I'm doomscrolling and everyone agrees it's horrible. So many people just want to build strong AI and safely deploy it. The government should facilitate this not axe it. I'm going to get some rest and hopefully can resume this goal tomorrow. Thanks all.
译这太让人难过了。 我一边刷屏一边看到所有人都觉得这很糟糕。 那么多人只是想打造强大的AI并安全地部署它。 政府应该为此提供便利,而不是砍掉它。 我要去休息一下,希望明天能继续这个目标。 谢谢大家。
Protesters wrote “STOP MUSK” on the road today in protest of SpaceX’s IPO. Stop Elon Musk from doing what exactly? • Helping people with paralysis regain independence and working to restore vision through Neuralink? • Accelerating the world’s transition to electric vehicles, solar energy and battery storage through Tesla? • Making rockets reusable and working to make humanity multiplanetary through SpaceX? • Connecting remote communities and restoring communications during disasters through Starlink? • Building maximum truth-seeking AI through xAI? • Restoring free speech on 𝕏? Elon Musk is dedicating his companies to advancing humanity and building a better future for civilization. Who is funding these paid protests against Elon and what are they so afraid of?
译针对抗议者在路面涂写“STOP MUSK”反对SpaceX IPO,推主逐一列举Elon Musk旗下公司的正面贡献:Neuralink帮助瘫痪者恢复独立与视力;Tesla加速电动汽车、太阳能及储能推广;SpaceX实现火箭可重复使用并推动人类多行星化;Starlink连接偏远社区并在灾害中恢复通信;xAI构建追求最大真相的AI;𝕏恢复言论自由。推主质疑这些抗议由谁资助,以及对方究竟在害怕什么。
So @Anthropic about to learn the @SpaceX ITAR/EAR lessons. Will be very hard for non-nationals to work there and at @OpenAI on frontier models. Suppose AGI is the ultimate dual purpose technology.
译所以 @Anthropic 即将领教 @SpaceX 的 ITAR/EAR 教训。非本国公民将很难在那里以及 @OpenAI 从事前沿模型工作。看来 AGI 就是终极的军民两用技术。
Today is the first time our Intelligence Frontier chart has moved backward.
译今天是我们 Intelligence Frontier 图表首次出现回退。
Good news: Claude has reset everyone's usage limits, go check. Bad news: mine was due to reset today anyway. Damn it.
译好消息 Claude 重置了所有人的用量 快去看看 坏消息 我本来就是今天要重置的 特么的
Not even much to say, I think the government way overstepped but we’ll see if they can substantiate the evidence (in which case Anthropic would tell us). Anthropic’s messaging was pushing government action, but this is insane and a bad action by USG for the AI trajectory.
译没什么好说的,我觉得政府过度干预了,但要看他们能否拿出证据(那样的话 Anthropic 会告诉我们)。 Anthropic 的消息曾推动政府行动,但这次太疯狂了,对 AI 发展而言是美国政府的一次糟糕举动。
A good time to remind people that in my time doing LLM research I feel like a minority of my colleagues are American citizens. It would be industry destroying to have to rebuild with segregation for frontier ai research to be legal.
译借此机会提醒大家:在我从事 LLM 研究期间,我感觉同事中只有少数是美国公民。如果前沿 AI 研究必须按国籍隔离、重新组建团队才能合法进行,那将摧毁整个行业。
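Someone has open-sourced a one-click tool that enables Siri AI on China-market Macs; it works by changing the region setting to spoof the US region. Repo: https://github.com/SkyBlue997/enableMacosAI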
译开发者 SkyBlue997 在 GitHub 开源 enableMacosAI 工具,通过修改系统地区伪装美区来开启国行 Mac 的 Siri AI。此前有用户发现 macOS 的 GenerativeModels.plist 文件中存在 EnhancedSiriWaitlist 开关,关闭 SIP、挂载系统卷、修改键值并重启即可解锁 WWDC 新发布的 Siri AI 增强版。社区已整理出详细步骤,证明该 AI 能力早已内置,仅被等候名单屏蔽。
A walkthrough of the official handbook and guides is actually the best material to learn from.
译官方的手册指南解析,其实最适合学习的。 [引用 @xiaohu]:http://x.com/i/article/2065389944034775040
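Whoa! We have been using the Fable 5 model wrong from the start! The original article is worth a few minutes; it is useful and thought-provoking. Most people use Claude Fable 5 as a Sonnet 4.6 with a bigger context window: ask a question, use it for five minutes, close the tab. 90% of users have never run an agent system that truly compounds: every run makes the next one smarter, state files keep accumulating, and skills keep getting refined. Fable 5 is a model designed to run continuously for days. You only used it for a few minutes. (I'd add that the damn quota isn't enough either!) 😆 The author builds a self-improving system in 14 steps that can make your Fable 5 take off~
I. What Fable 5 actually unlocks
1. Mythos-class model: released June 9, 2026, the first public Mythos-class model (one tier above Opus). Core capabilities:
• Multi-day autonomous sessions
• Built-in self-verification
• The most complex coding work
• Multi-stage knowledge work
2. Self-improvement ≠ self-learning: the model weights do not change, but the system environment gets smarter. Every session writes down lessons learned, skills are refined on edge cases, and state files accumulate verified facts.
3. The compounding stack: a four-layer architecture
• Layer 1: primitives (Fable 5 itself, sub-agents, tools)
• Layer 2: orchestration (goal loops, dynamic workflows, routines)
• Layer 3: memory (state files, skill library, knowledge base)
• Layer 4: self-improvement (visual self-checks, evaluation loops, rule distillation)
4. Which model to use when: route by task complexity
• Fable 5: heavy orchestration roles
• Opus 4.8: complex but bounded subtasks
• Sonnet 4.6: high-frequency worker tasks
• Haiku 4.5: grading sub-agents
II. Three key pattern designs
5. /goal vs Outcomes + verifier sub-agent: an independent verifier beats self-critique.
6. A model evaluating its own output is biased toward the conclusions it has already written.
7. Dynamic workflows: three key patterns (fan-out and synthesize, adversarial verification, loop until done)
8. Worktrees for parallel safety: avoid file conflicts when multiple agents work in parallel
9. Routines for long-running orchestration: the laptop lid closes, Fable 5 keeps working
III. The self-improvement layer
10. Five-stage memory evolution: failure → investigation → verification → distillation → lookup
• Sonnet 4.6 stops at stage 1
• Opus 4.7 stops at stage 3
• Fable 5 completes the whole flow
11. State files: where memory actually lives, with 5 sections matching the 5 stages
12. Skill compounding: write lessons learned into the skill itself, not just the chat log
13. Visual self-verification: Fable 5 uses vision to check whether UI output matches the goal
14. Mythos safety boundary: in cybersecurity, biology, chemistry, and model distillation it automatically downgrades to Opus 4.
Putting the model's capability where it is truly needed and into projects that suit you, and tuning it to its best state, is the best way to squeeze out every last token 😄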
译大多数用户将Claude Fable 5(首个公开Mythos级模型,2026年6月9日发布)当作更大上下文窗口的Sonnet 4.6单次提问使用,但Fable 5专为连续数天的Agent系统设计,支持自我改进:每次运行让下次更聪明,状态文件积累,技能持续打磨。文章提出14步构建自我改进系统,涵盖四层架构(原语、编排、记忆、自我改进)、任务路由(Fable 5用于重型编排,Opus 4.8负责复杂子任务,Sonnet 4.6高频工人,Haiku 4.5评分)、动态工作流模式以及5阶段记忆进化(失败→调查→验证→提炼→查阅)。在网络安全、生物、化学、模型蒸馏领域会自动降级到Opus 4。
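The routing table and the five-section state file described in the post can be sketched roughly as follows. This is a minimal illustration, not the article's implementation: the task-kind keys, the JSON layout, and the function names `route`, `load_state`, and `record` are all assumptions made for the example; only the model tiers and the five stage names come from the post.

```python
import json
from pathlib import Path

# Hypothetical routing table: the task kinds are invented labels,
# the model names are the tiers listed in the post.
ROUTES = {
    "orchestration": "Fable 5",
    "bounded_complex": "Opus 4.8",
    "high_volume": "Sonnet 4.6",
    "grading": "Haiku 4.5",
}

# The five memory stages named in the post, used as state-file sections.
STAGES = ["failure", "investigation", "verification", "distillation", "lookup"]


def route(task_kind: str) -> str:
    """Pick a model for a task kind; raises KeyError for unknown kinds."""
    return ROUTES[task_kind]


def load_state(path: Path) -> dict:
    """Read the state file, or start with one empty section per stage."""
    if path.exists():
        return json.loads(path.read_text(encoding="utf-8"))
    return {stage: [] for stage in STAGES}


def record(path: Path, stage: str, note: str) -> dict:
    """Append a note to one section and persist it for the next run."""
    if stage not in STAGES:
        raise ValueError(f"unknown stage: {stage}")
    state = load_state(path)
    state[stage].append(note)
    path.write_text(json.dumps(state, ensure_ascii=False, indent=2), encoding="utf-8")
    return state
```

Because the state lives in a file and not in the chat, a later session can call `load_state` and start from what earlier runs verified, which is the compounding effect the post describes.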
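To train Composer models at scale, the Cursor team built an always-running agent fleet system. At its core it is a loop that lets thousands of agents work together and manage themselves.
# System architecture and how it works
Main agent (Fleet Manager):
· Runs on a large remote machine with the usual local tools plus a file on disk that serves as the "inbox" (the fleet's shared inbox)
· Connects to hundreds of child agent machines over SSH, collects their statuses, and writes them to the inbox
· Checks fleet health on every loop iteration:
  · Keeps healthy tasks running in the background
  · Pushes failures and anomalies to Slack or PagerDuty
· Can actively control the fleet: terminate or restart processes and handle transient failures
Child agents: hundreds of research-task agents running in parallel, each focused on a specific experiment.
Foundation: builds on Cursor's previously published research on long-running agents and gives the main agent a number of Skills that encode tacit knowledge for running ML experiments, reviewing and monitoring results, and more.
Key design: it uses Cursor's own product; the inbox file plus good skills provide state sharing and coordination.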
译Cursor 团队为训练 Composer 模型构建了一个始终运行的 Agent 舰队系统。主 Agent(Fleet Manager)在远程机器上运行,通过 SSH 连接数百台子 Agent 机器,利用本地工具和磁盘文件“inbox”实现状态共享与协调。每轮循环检查舰队健康,将故障推送至 Slack/PagerDuty,并主动终止或重启进程。子 Agent 并行执行研究实验。系统基于此前长运行 Agent 研究,主 Agent 拥有编码 ML 实验隐性知识的 Skills。核心是使用 Cursor 自身产品,通过 inbox 文件与 Skills 实现大规模 Agent 协同与自我管理。
lmk👀
译引用推文调侃至少价格包含了数据线,并询问该设备能否运行 MiniMax M3。主推文仅以“lmk👀”回应。
In ONE year, AI went from being able to solve ~none of the hardest math problems to solving almost ALL of them
译一年之内,AI 从几乎解不出任何最难的数学题,发展到几乎全部都能解决。
NVIDIA just posted the first agentic AI benchmark results where GB300 NVL72 runs up to 20x more coding agents per megawatt than H200. Older inference benchmarks mostly ask how fast a system can produce tokens after one prompt. AgentPerf from Artificial Analysis, asks a harder question: how many agents can run at the same time while still feeling responsive. It tests a harder workload than normal LLM serving because an agent is not one request and one answer, but a long chain of model calls, code edits, command runs, tool delays, and growing context. The benchmark replays real coding-agent paths from public repos across 12+ programming languages, with request lengths from 5K to 131K tokens and an average near 27K tokens. NVIDIA says GB300 NVL72 reaches 61.4K concurrent agents per megawatt at the lowest service tier, while H200 reaches 2.6K. The gain comes from 72 GPUs acting like one rack-scale machine through NVLink, plus software that spreads MoE expert work, overlaps communication with compute, and keeps batches large. @NVIDIAAIDev
译NVIDIA 首次在 AgentPerf(由 Artificial Analysis 开发)中评测智能体 AI。该基准测试的不是传统 token 生成速度,而是每兆瓦可同时运行且保持响应性的编码智能体数量。工作负载模拟真实编码智能体路径(长链模型调用、代码编辑、命令运行、工具延迟、增长上下文),涵盖 12+ 编程语言,请求长度 5K–131K tokens(平均 27K)。结果:GB300 NVL72 在最低服务层每兆瓦达 61.4K 并发智能体,H200 仅为 2.6K(20 倍提升)。性能提升源于 72 GPU 通过 NVLink 组成的机架级系统,配合软件优化(MoE 专家分布、通信与计算重叠、大批量保持)。
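As a quick arithmetic check on the two per-megawatt figures quoted in the post, the ratio can be computed directly. The variable names are assumptions for the example; the numbers are the ones the post gives for the lowest service tier. The post does not say which tier the "up to 20x" headline refers to, so the result is not expected to match it exactly.

```python
# Concurrent coding agents per megawatt at the lowest service tier,
# as quoted in the post.
AGENTS_PER_MW = {"GB300 NVL72": 61_400, "H200": 2_600}

ratio = AGENTS_PER_MW["GB300 NVL72"] / AGENTS_PER_MW["H200"]
print(f"GB300 NVL72 vs H200: {ratio:.1f}x agents per megawatt")
```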
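At five this morning I asked AI to polish a piece of copy for me in three passes and show me each one. After the edits, I found that each pass was more refined than the last, but each also had less of a human touch. I'm already using the most expensive Claude Fable 5 and it still does this, which made me angry. In the end I told the AI: after your edits, the human touch is fading. I can't say exactly what the human touch is; I only know I can no longer feel the person behind the words. We discussed it for a long time and concluded that what AI writes is missing one thing behind it: presence. Behind words written by a human stands a specific person, in a specific position, who has paid a specific price. I then turned the key points of the conversation into a skill: 《人味儿写作心法.skill》 (roughly, "The Human-Touch Writing Method"). It is especially suited to cases where you write or dictate an article yourself and then have AI edit the draft. Open source and free. Install it on your agent and give your writing a human touch: http://github.com/orange2ai/renwei-writing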
译Oran Ge 让 Claude Fable 5 打磨文案三遍,发现改稿越来越讲究却缺“人味儿”。他与 AI 讨论后得出结论:人写的文字背后有“存在感”——作者在具体位置付出过具体代价,而 AI 无法复现。为此他制作了《人味儿写作心法.skill》,专用于自写文章或口述后让 AI 改稿的场景,旨在保留文字的人味。该技能已开源免费发布在 GitHub。
Yeah I'm going to have fun with this.
译我正在尝试一个智能体流程,将 Hyperframes 与 Gemini 视频分析结合起来,制作有趣的注释视频。是啊,这会很有意思。
How am I only now finding out about appshots? I was dragging screenshots into codex like a caveman.
译我怎么现在才发现 appshots?我之前还像个穴居人一样把截图拖进 codex。
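Chatting with 藏师傅 lately, we have both felt a deep resonance. The public assumes AI brings equality, but what it actually brings is a K-shaped divergence. Power users already take the anatomy of an agent for granted: docs, rules, memory, loop, MCP, CLI, tool calls, permissions, security sandboxes, context engineering, scheduled tasks, heartbeats, file systems, code execution, and Skills. Ordinary users only know that "agents can write code." What to do? Building skills well is the only way across the chasm. We are working with 藏师傅 on something practical: having Cola help the general public truly cross that chasm.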
译AI带来的并非平权,而是K型分化。头部用户已默认理解Agent的组成:文档、规则、memory、loop、MCP、CLI、工具调用、权限、安全沙箱、上下文工程、定时任务、心跳、文件系统、代码执行和Skill;普通用户只知道"Agent能写代码"。做好Skill是跨越鸿沟的唯一解法。作者正与藏师傅一起通过Cola帮助大众真正跨越鸿沟。
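I've noticed the ADHD among my friends and colleagues getting worse and worse. They are easily distracted by small, trivial things, yet blind to the big problems. Turning off notifications and immersing yourself alone in one complete, substantial piece of work is becoming less and less possible. Getting into flow is getting harder too. AI's high-speed execution makes the problem worse. A conversation turn every two or three minutes is a repeating cycle of focusing and losing focus. How are we supposed to save our prefrontal cortex?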
译观察到身边朋友同事的ADHD(注意力缺陷多动障碍)越来越严重:容易被琐事分散注意力,对大问题视而不见,关掉通知、沉浸大事变得不可能,进入心流也变难。AI的高速执行加重了这一问题——每两三分钟一次的对话,形成注意力集中与涣散的交替循环。推文最终发问:该如何拯救自己的前额叶?
IMO sth that is a bit overlooked but will become far more important in the future. GPT is 10-20x more token+cost effective for ~similar outcome.
译Peter Steinberger 指出 GPT 在 token 消耗和成本上比 Fable 高效 10-20 倍,且能达到相似结果。@thorstenball 的对比测试印证:让 Fable 和 deep^2 完成相同的 CLI、Web 服务器等多端功能,deep^2 花费 $20(首次未通过但可修复),Fable 运行 1 小时 40 分、花费 $350(首次成功)。后续追问后 Fable 总花费达 $457,deep^2 预计最多 $40,差距约 17 倍。
10 months later, I gave Claude Code with Fable the same brief, asking it to construct SimRefinery from surviving screenshots and documentation. Fully playable, with a learning mode & all sorts of sophistication. Look at the difference from the old version! https://simrefinery.netlify.app/
译10个月后,Ethan Mollick 再次向 Claude Code 和 Fable 下达同一指令——根据幸存截图和文档重建失传的 Maxis 模拟游戏 SimRefinery。新版本完全可玩,包含学习模式等多种复杂功能,与10个月前 ChatGPT Codex 仅凭一篇文章和截图快速搭建的可玩原型形成鲜明对比。当时他未写一行代码,仅偶尔提小修改请求。
How Lay Bankz turned a few keyboard notes into a psychedelic rock sample
译Lay Bankz 如何将几个键盘音符转变为一段迷幻摇滚采样。
I had already wondered how Apple manages to perform inference at Google while simultaneously protecting their privacy, essentially their unique selling point. The answer: the heaviest requests run on Blackwell B200s inside Google Cloud, with NVIDIA's Confidential Computing encrypting the data while it's processed, so neither Google nor Apple can see it. "NVIDIA Confidential Computing provides a hardware-based security layer for accelerated AI workloads. The technology protects data while it’s being processed by isolating workloads in trusted execution environments and enabling systems to cryptographically verify that the infrastructure has not been tampered with before any sensitive data is sent to the server."
译Kim解释Apple如何在Google Cloud上执行推理时保护隐私:最重的请求运行在Google Cloud的Blackwell B200s上,利用NVIDIA Confidential Computing提供基于硬件的安全层,将工作负载隔离在可信执行环境中加密处理数据,确保Google和Apple都无法看到数据。
Looking at the graph, I think Fable 5 will only maintain its lead up to GPT-5.6. And secondly, I think the benchmark will soon be completely saturated.
译观察图表,我认为 Fable 5 只会保持领先直到 GPT-5.6。 其次,我认为该基准测试很快就会完全饱和。
I'm messing around with an agent flow for combining Hyperframes with Gemini video analysis to make interesting annotated videos.
译我正在尝试一种智能体流程,将Hyperframes与Gemini视频分析相结合,制作有趣的注释视频。
The shape of the graph is getting very familiar.
译Claude Fable 5 在 FrontierMath 基准测试(Tiers 1-4, v2)中表现优异,Tiers 1-3 得分 87%,Tier 4 得分 88%,延续了 Anthropic 模型数学能力快速提升的趋势。主推文评论道:“图形的形状越来越熟悉了。”
Claude Fable 5 scores very well on FrontierMath: Tiers 1–4 (v2), reaching 87% on Tiers 1–3 and 88% on Tier 4. This continues a streak of Anthropic models improving rapidly at math.
译Claude Fable 5 在 FrontierMath(Tiers 1–4,v2)上得分很高,在 Tiers 1–3 上达到 87%,在 Tier 4 上达到 88%。这延续了 Anthropic 模型在数学上快速提升的趋势。
Fine-grained 3D motion control in AI video just got a little bit closer
译@andrew_n_carr 宣布“编辑视频运动!放弃提示开始导演”,并展示其“通用视频编辑器”工作流:先用 comic 4 捕捉视频,再用运动编辑器修改动作,最后用视频到视频模型(如 Runway、Gemini)重新渲染。他以时装片段为例,希望模特展现高抬腿活力,无需重拍。主推文 fofr 表示,AI视频中精细的3D运动控制已更近一步。
How to effectively run autonomous long-running coding agents? This is one of the most exciting discussions on agents I've ever had. I recorded it and am making it freely available. (bookmark it) The idea of autonomous long-running agents is a real thing. We talk about lots of things like /goal, /loop, and dynamic workflows, and what comes next. One interesting discussion was around how to make the agent run for longer while ensuring it stays on track. Most models today will struggle to coordinate work effectively. They sometimes pause the work early. Lots of mistakes happen, and lots of weird shortcuts (reward hacking). What helps is to be extremely clear about the goals it needs to achieve. To clarify the dos and don'ts clearly. Eliminate any assumptions you think the model would make. Deep expertise matters so much in this. But you can get far through careful planning. My formula currently is to use Opus 4.8 for planning carefully and GPT-5.5 for all executions. For the evaluator (via /goal), I am often using something like Deepseek or the latest models from Qwen, Kimi, and MiniMax, etc. Another insight we discussed to enforce goals is to provide strong visual cues for the agent to compare with. I found that a multimodal goal is a much stronger goal than a plain text one. And use agents to help you set clear goals. Watch here: https://academy.dair.ai/events/cmplo7v3b000e04l1pxprat4d
译DAIR.AI创始人Elvis Saravia分享如何有效运行长期自主编码智能体。他指出当前多数模型难以协调工作,会过早暂停、犯错或走捷径(reward hacking)。关键在于明确目标、消除假设,避免模型自行推断。他的实践公式:用Opus 4.8进行细致规划,GPT-5.5执行所有步骤,评估器(通过/goal)则使用Deepseek及Qwen、Kimi、MiniMax等最新模型。另一关键洞察是提供多模态视觉线索作为目标,比纯文本目标更强,能更好地约束智能体。完整讨论已录制并免费开放。
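The planner / executor / independent-evaluator split described in the post can be sketched as a small loop. This is only an illustration under stated assumptions: `Goal`, `run_until_done`, and the three callables are hypothetical names, and in practice each callable would wrap a call to a different model (the post mentions a planning model, an execution model, and a separate evaluator); no real agent API is used here.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Goal:
    """An explicit goal with dos and don'ts, so nothing is left to assumption."""
    description: str
    dos: list[str] = field(default_factory=list)
    donts: list[str] = field(default_factory=list)


def run_until_done(
    goal: Goal,
    plan: Callable[[Goal], list[str]],
    execute: Callable[[Goal, list[str], str], str],
    evaluate: Callable[[Goal, str], tuple[bool, str]],
    max_rounds: int = 5,
) -> tuple[str, int]:
    """Plan once, then execute and evaluate until the evaluator accepts.

    The evaluator is a separate callable, so the executor never grades
    its own output. Returns the accepted result and the round number.
    """
    steps = plan(goal)
    feedback = ""
    for round_no in range(1, max_rounds + 1):
        result = execute(goal, steps, feedback)
        accepted, feedback = evaluate(goal, result)
        if accepted:
            return result, round_no
    raise RuntimeError(f"goal not met after {max_rounds} rounds: {feedback}")
```

The `max_rounds` cap is a design choice for the sketch: it keeps a long-running loop from spinning forever when the evaluator never accepts.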
Victorian gothic nightmares, one Canvas workflow. See how @Shanzyin_ai built THE DREAM EATERS on PixVerse Canvas — nodes, shots, and the full project file, open to explore.
译PixVerse 展示 AI 电影制作人 @Shanzyin_ai 使用 Canvas 工作流创作的维多利亚哥特风格短片《THE DREAM EATERS》。短片包含完整节点、多个镜头及项目文件,开放探索。剧情设定为古老庄园中青少年被迫吞噬权贵噩梦,一名有缺陷的新兵将黑暗拖回现实。PixVerse 推出限时活动:转发+关注+回复“DREAM”,72 小时内可获得 150 Credits 及该工作流。
Google DeepMind published a 60-page paper mapping the road from AGI to superintelligence, written by Hutter, Legg, and Genewein. No hype, just a sober analysis The paper uses three levels. AGI = roughly average human performance across most cognitive tasks. ASI = a system that beats large, well-coordinated groups of human experts across virtually everything (their bar: tens of thousands of experts working ten years on one problem). Universal AI / AIXI = the theoretical ceiling, uncomputable, only approachable from below. Then they explore the question of how this could be achieved: Scaling compute, models, and data, the continuation of the trend that drove the breakthrough so far. It is the only path with historical data available for extrapolation. The core question: Does quantity transform into quality? Even if individual models plateau, the sheer act of running millions of faster AGI instances could trigger the leap. (A quick aside: that is a fascinating philosophical idea. It always reminds me of Hegel’s dialectic, the notion that quantity transforms into quality. We ought to start drawing on philosophical theories to make sense of the future.) Algorithmic paradigm shifts: a genuine break from the transformer pretraining paradigm. New architectures, new learning methods. However, hard to predict by definition. Recursive self-improvement: AI accelerates AI research, which produces better AI, which accelerates research further. Multi-agent coordination: superintelligence emerges from large collectives of AGI agents working together, like automated corporations or AI economies. Collective intelligence potentially far exceeding any individual model. The authors naturally point to what I repeatedly describe as the biggest bottleneck: energy. I recently linked to a few graphs showing, on the one hand, the extent to which energy is already becoming a problem and, on the other, how China dominates the expansion of both nuclear and solar energy in the global race. 
But the authors also address a profound shift in the world of work in a post-AGI era. I would say this is a reality we must face. So, it is not just about scaling, but also about whether the underlying conditions - such as energy and hardware - can be effectively established. Six things that could slow or stop all of this: The data wall. Quality training data runs out, possibly before the end of this decade. Resource demand grows too fast. Energy, chips, rare earths, investment. The physical infrastructure can't scale arbitrarily. The neural paradigm hits a ceiling. Pretrained transformers plus fine-tuning may not be enough to reach AGI, let alone go beyond it. Research gets harder. Keeping Moore's law going already needs 18x more researchers than in the 1970s. Ideas are genuinely harder to find as fields mature. The abstraction barrier. Models trained on human concepts may never invent new ones from scratch. Saturating GPQA or SWE-bench shows mastery of what humans already worked out, not the ability to go beyond it. Train only on pre-Newtonian physics and you won't reason your way to relativity. Deliberate slowdown. Regulation, accidents, public backlash. Real, but likely countered by the competitive pressure between companies and nations. I think it’s great that Google is addressing questions such as which paths they believe lead to AGI, what the road to ASI might look like, what challenges will arise, and much more. Overall, however, it sounds to me like all of this could actually succeed, making it, in that sense, a call to discuss and reflect on the consequences.
译Google DeepMind发表60页论文,由Hutter、Legg、Genewein撰写,定义AGI(多数认知任务达平均人类水平)、ASI(超越大量专家协作)和不可计算的AIXI三个层级。实现路径包括规模扩展、算法突破、递归自我改进和多智能体协调,瓶颈在于能源与硬件。六种阻碍:高质量数据可能本十年内耗尽、资源需求过快、神经范式天花板、研究难度激增(维持摩尔定律需18倍于1970年代的研究者)、模型无法创造全新概念、人为放缓。作者认为这是对AGI后果的严肃反思呼吁。
I asked Claude Fable 5 to reverse engineer a 1993 DOS game with no source code. It read the raw machine code, rewrote the engine in C, and gave me a fully editable port for every platform. 30 min from EXE to iPhone. Sharing it all so you can revive your own childhood games!
译我让Claude Fable 5逆向工程了一款1993年的DOS游戏,没有源代码。 它读取了原始机器码,用C重写了引擎,并给了我一个完全可编辑的移植版,适用于每个平台。 从EXE到iPhone,30分钟。 分享这一切,让你也能复活自己的童年游戏!
derivation of policy gradient: https://rlhfbook.com/c/06-policy-gradients#deriving-the-policy-gradient
译策略梯度推导: https://rlhfbook.com/c/06-policy-gradients#deriving-the-policy-gradient
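For readers who want the outline before following the link, here is the standard log-derivative (REINFORCE) argument. The notation is an assumption and may differ from the linked chapter: $\tau$ is a trajectory, $R(\tau)$ its total return, $\pi_\theta$ the policy, $\rho$ the initial-state distribution, and $P$ the environment dynamics.

```latex
\begin{align}
J(\theta) &= \mathbb{E}_{\tau \sim p_\theta}\big[R(\tau)\big]
           = \int p_\theta(\tau)\, R(\tau)\, d\tau \\
\nabla_\theta J(\theta)
  &= \int \nabla_\theta p_\theta(\tau)\, R(\tau)\, d\tau
   = \int p_\theta(\tau)\, \nabla_\theta \log p_\theta(\tau)\, R(\tau)\, d\tau \\
  &= \mathbb{E}_{\tau \sim p_\theta}\big[\nabla_\theta \log p_\theta(\tau)\, R(\tau)\big] \\
p_\theta(\tau) &= \rho(s_0) \prod_{t=0}^{T-1} \pi_\theta(a_t \mid s_t)\, P(s_{t+1} \mid s_t, a_t) \\
\nabla_\theta \log p_\theta(\tau)
  &= \sum_{t=0}^{T-1} \nabla_\theta \log \pi_\theta(a_t \mid s_t) \\
\nabla_\theta J(\theta)
  &= \mathbb{E}_{\tau \sim p_\theta}\Big[\sum_{t=0}^{T-1} \nabla_\theta \log \pi_\theta(a_t \mid s_t)\, R(\tau)\Big]
\end{align}
```

The second line uses the identity $\nabla_\theta p_\theta = p_\theta \nabla_\theta \log p_\theta$; the fifth follows because $\rho$ and $P$ do not depend on $\theta$, so their log terms vanish under the gradient.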
Holy, no way! (/s)
译据 The Information 报道,OpenAI 正在准备一个新 AI 模型。主推文回应:“天哪,不会吧!(/s)”
codex users! how have you found codex's ability to use (correctly) computer use / chrome extension / in app browser? if you want to give us feedback leave a comment and I'll organize it for the team!
译codex 用户们! 你们觉得 codex 在(正确)使用电脑/Chrome 扩展/应用内浏览器方面的能力怎么样?如果想给我们反馈,请留下评论,我会整理给团队的!
World models can now create imagined experiences for AI—environments where agents continuously learn, adapt, and improve. We suspect multi-agent interaction may be a critical ingredient for recursive AI and general intelligence. https://odyssey.ml/the-era-of-multi-agent-imagined-experience
译世界模型现在可以为AI创造想象体验——智能体在其中持续学习、适应和提升的环境。 我们推测多智能体交互可能是递归AI和通用智能的关键要素。
http://x.com/i/article/2065439304785039360

# Building recursive agent systems

At Cursor, we run thousands of agents to help us train the next version of Composer. We give them research tasks, and if they aren't succeeding or run into issues, they DM us on Slack or page us via PagerDuty.

## Scaling training for Composer

We’ve built an org chart of agents that work together. As we’ve scaled training for Composer, we’ve wanted to run thousands more experiments. This was possible before, but it was slow and hard to keep track of every experiment’s status. To speed things up and parallelize work, we built an always-running agent system (yes, it's a loop).

## An agent system for research

Here’s how the system works:

1. The main agent runs on a massive remote machine with all the tools you'd use locally, plus a file on disk acting as an “inbox” for the fleet.
2. It SSHes into machines running hundreds of child agents and collects their statuses into the inbox.
3. On every loop, it checks fleet health, keeps healthy tasks running in the background, and surfaces anything broken to the team on Slack.
4. Like all infra, the agents occasionally hit transient issues or need to be poked, so the main agent can control the whole fleet, quitting or restarting processes as needed.

This “fleet manager” builds on our previously published research on long-running agents. We’ve given the manager many different skills that encode tacit knowledge for how to run ML experiments, review and monitor results, and more.

## Researchers with superpowers

Training a great model means trying a bunch of ideas for creating useful RL data. A single laptop is not enough here, you really want an army of computers in the cloud to run experiments in parallel.

And since we aren't compute-constrained, we rolled out this infra for everyone in ML. Researcher time is our scarcest resource and we’ve found a way to scale their leverage by orders of magnitude.

Imagine if you had a human manager with 10,000 direct reports. Obviously that wouldn’t work well, but this human → agent “org” kind of does!

If you have a problem that is verifiable, where throwing more tokens at it will solve it faster or better, it’s worth considering building a system like this. It’s enabled us to have swarms of agents crawling through Composer’s data to recursively improve itself for future versions.

And if this sounds exciting, we’re hiring!
译Cursor 为训练下一代 Composer,构建了一个始终运行的递归智能体系统。主智能体在远程机器上通过 SSH 管理数百个子智能体,将状态收集到磁盘“收件箱”,循环检查集群健康并保持任务运行,通过 Slack 向团队报告问题。主智能体具备多种技能用于运行和监控 ML 实验。研究人员可并行运行数千个实验,大幅提升效率。对于可验证的问题,投入更多 tokens 能更快解决。
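One iteration of the fleet-manager loop described in the article can be sketched as below. This is a hedged illustration, not Cursor's code: the function name `fleet_tick`, the status strings, and the JSON inbox format are assumptions, and the SSH, Slack, and process-control steps are passed in as plain callables so the sketch stays self-contained.

```python
import json
from pathlib import Path
from typing import Callable, Iterable


def fleet_tick(
    hosts: Iterable[str],
    get_status: Callable[[str], str],   # stand-in for an SSH status query
    restart: Callable[[str], None],     # stand-in for restarting a child process
    alert: Callable[[str], None],       # stand-in for a Slack or PagerDuty message
    inbox: Path,
) -> dict:
    """Collect child statuses into the inbox file and act on unhealthy ones."""
    statuses = {host: get_status(host) for host in hosts}

    # The inbox is a plain file on disk that holds the shared fleet state.
    inbox.write_text(json.dumps(statuses, indent=2), encoding="utf-8")

    for host, status in statuses.items():
        if status == "healthy":
            continue                      # leave healthy tasks running
        if status == "transient_error":
            restart(host)                 # poke the child without bothering humans
        else:
            alert(f"{host}: {status}")    # surface anything else to the team
    return statuses
```

An always-running manager would call `fleet_tick` in a loop with a sleep between iterations; the split between "restart quietly" and "alert a human" is the part that keeps researcher attention for failures the agent cannot fix itself.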
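Damn! Apple's security is really bad these days! The world really is one big makeshift troupe, and nobody is an exception! A waitlist for Apple's new Siri AI? An overseas Mac power user simply went ahead and force-unlocked the enhanced AI version, and the waitlist instantly became a joke. WWDC had only just shown off the new Siri and everyone was still queuing for the official rollout, but then...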
http://x.com/i/article/2065077530571264000
At least the cable is included in the price! Now can it run MiniMax M3?
http://x.com/i/article/2065096982310567936
Day 3 with Fable. Gave a huge prompt to implement a feature across CLI, web server, and another server to both Fable and...
I gave ChatGPT Codex an article & screenshot from a famous, lost Maxis simulation, SimRefinery, and asked it to create i...
EDIT MOTION IN VIDEOS!!! Quit prompting and start directing I've been shouting for YEARS about 3D as the control layer. ...
An ancient estate. Teenagers forced to devour the nightmares of the powerful. One defective recruit who drags the darkne...
derivation of Policy Gradient.
OpenAI is preparing a new AI model, per The Information