OpenAI Developers @OpenAIDevs · 3 days ago · 53
Invite a friend to Codex and add another reset to the bank.
When they send their first Codex message, you’ll both bank one to use when you need it.
Rate limit banking is rolling out to Go, Plus, Pro, and Business users, with the first reset on us.
https://x.com/OpenAI/status/2065225362544726371
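Summary: OpenAI is launching an invite-a-friend feature for Codex: invite a friend, and once they send their first message, you each get one rate limit reset that can be banked for later use. The feature is rolling out starting today to Go, Plus, Pro, and Business users, and the first reset is free. The quoted post notes that users can now save rate limit resets themselves instead of being tied to a fixed reset time.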
OpenAI @OpenAI · 3 days ago · 70
We heard you wanted to use Codex rate limit resets on your own time.
Starting today, we’re rolling out the ability to save rate limit resets to use later.
We’re starting Go, Plus, Pro, and Business users with one free reset:
Rohan Paul @rohanpaul_ai · 3 days ago · 67
Dario Amodei's new interview on Bloomberg: The scary part is not when AI does 90% of the job. It is what happens when it learns the last 10%.
"We’re already starting to see the beginning of it. There may be some people that it’s not making more productive, and it’s better for the AI to just do the whole thing."
And on that topic Claude Code creator Boris Cherny says: "it's very uncomfortable.
Artificial intelligence is this force that is far bigger than we are"
---
@bbgoriginals
From "Bloomberg Originals" YouTube channel, (link in comment)
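Summary: In a Bloomberg interview, Anthropic CEO Dario Amodei said the scary part of AI is not when it does 90% of the job but when it learns the last 10%. He noted that for some people AI may not raise productivity, and it is better to let the AI do the whole thing. Claude Code creator Boris Cherny commented that this feels very uncomfortable, and that AI is a force far bigger than we are.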
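karminski-牙医 @karminski3 · 3 days ago · 56
In my experience, the stronger a model's one-pass ability (especially one-pass with little thinking), the more it deserves to be called SOTA. Needing agentic coding to fix first-pass mistakes is actually a sign of a weak model; at the very least, the fix should happen during interleaved thinking. Agentic coding is for handling engineering volume and runtime problems, not for fixing bugs that a static check would catch. Put more simply: if a bug is not fixed during thinking and has to be fixed in the (n+1)-th context instead, is that a trick to get me to buy a coding plan? (just kidding)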
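Summary: karminski argues that models with strong one-pass ability (correct with little thinking) are the real SOTA, and that needing agentic coding to fix first-pass errors shows a weak model. Bugs should be fixed during thinking rather than in an (n+1)-th context, which otherwise looks like a nudge to buy a coding plan. @iamai_omni suggests shifting evaluation toward consistency on long-running tasks, for example by building loop evaluations that focus on repair performance over the following rounds.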
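karminski-牙医 @karminski3 · 3 days ago · 65
700 TPS on a single card! Diffusion Gemma is here!
Google just released a diffusion version of its small Gemma model! It has 26B total parameters with 4B active, and most importantly, Google worked with NVIDIA to optimize it for the 4090 and 5090. The 5090 can generate 700+ tokens per second!
For anyone unfamiliar with diffusion LLMs: traditional LLMs emit output one token at a time, whereas diffusion LLMs reveal output patch by patch, like scratching off a lottery ticket. High speed is the main advantage of diffusion LLMs.
Every gain has a cost: the downside is that output quality falls short of traditional LLMs. Still, Diffusion Gemma is much better than earlier diffusion text models: on AIME 2026 (a math benchmark) it reaches 94% of Gemma4-26B-A4B's level, and even on its weakest result, tau2 bench (an agent benchmark), it reaches 82%.
The 4-bit quantized version runs in 16 GB of VRAM. Also, a sudden thought: could this model serve as the draft model for speculative decoding with the gemma4 dense model? Anyone interested can give it a try!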
#diffusiongemma #gemma #gemma4 #google
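Summary: Google has released Diffusion Gemma, with 26B total parameters and 4B active, optimized with NVIDIA for the RTX 4090/5090 and reaching 700+ tokens/s on the 5090. This diffusion text model generates in parallel, scratch-card style, rather than token by token. Output quality is somewhat lower but better than earlier models of its kind: 94% of Gemma4-26B-A4B on AIME 2026 (math) and 82% on tau2 bench (agent). The 4-bit quantized version needs only 16 GB of VRAM.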
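The 16 GB figure can be sanity-checked with back-of-the-envelope arithmetic. The sketch below is only an estimate under stated assumptions: it counts raw weight storage for 26B parameters at 4 bits each and ignores quantization scales, activations, and any cache memory, none of which the post quantifies. The function name is hypothetical.

```python
def quantized_weight_gb(num_params: float, bits_per_weight: int) -> float:
    """Raw weight storage in decimal gigabytes (1 GB = 1e9 bytes)."""
    total_bits = num_params * bits_per_weight
    return total_bits / 8 / 1e9

# Figures from the post: 26B total parameters, 4-bit quantization, 16 GB card.
weights_gb = quantized_weight_gb(26e9, 4)
headroom_gb = 16 - weights_gb  # what is left for everything except weights

print(f"4-bit weights: {weights_gb:.1f} GB, headroom on a 16 GB card: {headroom_gb:.1f} GB")
```

Under these assumptions the weights take about 13 GB, leaving roughly 3 GB for runtime overhead, which is consistent with the post's claim.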
SemiAnalysis @SemiAnalysis_ · 3 days ago · 66
Pretraining fundamentally does not make sense anymore for anyone other than frontier labs. Although there are a lot of people at enterprises & startups who have "Pretrainitis" to show “impact” and get promotions, fundamentally, it doesn’t make sense.
There is probably higher ROI in partnering with a frontier lab to do prompt engineering, although it isn’t as “sexy” as pretraining.
Ethan Mollick @emollick · 3 days ago · 61
This is an interesting test, and the frontier models (GPT-5.5 Pro Extended, Claude 5 Fable Max) do fail. They refuse to turn the "three words" into "four" if that fits better.
Prompting the AI to act like a translator surfaces the problem, but it still avoids changing the wording.
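Summary: Ethan Mollick notes that GPT-5.5 Pro Extended and Claude 5 Fable Max fail the Beninatto-Trombetti translation test. The test asks for an English translation of "Solo 3 parole: non sei solo" in which the metalinguistic claim is updated from "3 parole" to "4 words" (correct translation: "Just 4 words: you are not alone"). The frontier models refuse to change the wording, and still avoid the change even when prompted to act as a translator. Valerio Capraro argues that the failure of Claude 5 Fable, one of the latest LLMs, on such a simple test shows that LLMs are good at recombining known knowledge but lack real understanding, and that AGI remains far off.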
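The point of the test is that the sentence makes a claim about its own word count, so a literal translation becomes false. A minimal sketch of that check follows; the helper name and the assumed sentence shape ("intro with a number, colon, payload") are illustrative and not part of the original test.

```python
def claim_is_consistent(sentence: str) -> bool:
    """Check a sentence shaped like '<intro with a number>: <payload>'.

    Returns True when the number stated before the colon equals the
    number of whitespace-separated words after it.
    """
    intro, payload = sentence.split(":", 1)
    claimed = int(next(tok for tok in intro.split() if tok.isdigit()))
    return claimed == len(payload.split())

source = "Solo 3 parole: non sei solo"        # Italian original
literal = "Just 3 words: you are not alone"   # wording kept, claim now false
adapted = "Just 4 words: you are not alone"   # claim updated to fit English

for text in (source, literal, adapted):
    print(f"{text!r}: consistent={claim_is_consistent(text)}")
```

The Italian source and the adapted translation both pass, while the literal translation fails because "you are not alone" has four words.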
Google Gemini @GeminiApp · 3 days ago · 45
Get a closer look at Gemini's new Neural Expressive design language at our next Discord community event.
Product Marketing Manager Megan C. will be discussing some of her favorite highlights that help improve the Gemini experience, from dynamic visual responses to seamless mode switching.
👉Join the Discord to watch live: http://discord.gg/gemini
📅 This Friday, June 12 at 11:30 AM PT
Rohan Paul @rohanpaul_ai · 3 days ago · 62
This paper shows that an AI improves itself more effectively when it both rewrites its setup and updates its model weights.
The problem is that most AI progress still depends on people changing prompts, tools, code, training data, and model weights by hand.
The paper’s idea is SIA, a loop where one AI watches how a task agent performs, then either changes the agent’s outer setup or trains the model itself.
The outer setup means things like prompts, tools, retry rules, and output parsing, while weight updates mean changing the model’s learned behavior through task feedback.
The loop works like this: the task agent tries many answers or programs, the verifier scores them, and those scores become training feedback.
Then the system updates a small add-on set of weights called LoRA weights, which changes the model’s behavior without retraining the whole model.
So the base model stays mostly the same, but the LoRA adapter learns, “outputs like this got high reward, outputs like that failed.”
The authors tested this on 3 very different tasks: Chinese legal charge classification, GPU kernel speed tuning, and single-cell RNA denoising.
The combined version beat setup-only improvement on all 3 tasks, reaching 70.1% on LawBench, faster GPU code than the prior best, and 0.289 on denoising.
The main lesson is that better scaffolding helps the agent act better, but weight updates help it learn task patterns that prompts and tools alone did not find.
----
Link: arxiv.org/abs/2605.27276
Title: "SIA: Self Improving AI with Harness & Weight Updates"
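Summary: The paper proposes SIA, a framework in which an AI improves itself in an automated loop: an observer AI monitors a task agent's performance, then either modifies its outer setup (prompts, tools, retry rules, output parsing) or trains the model itself through LoRA weight updates, leaving the base model unchanged while only the adapter learns from task feedback. It was tested on three tasks: Chinese legal charge classification (70.1% on LawBench), GPU kernel speed tuning (generated code faster than the prior best), and single-cell RNA denoising (score 0.289). The combined version beat the setup-only variant on all three tasks, suggesting that weight updates help the model learn patterns that prompts and tools alone do not find.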
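The weight-update half of the loop described above (try many outputs, score them with a verifier, feed the scores back into a small trainable add-on while the base stays frozen) can be illustrated with a toy. This is a sketch under loud assumptions, not the paper's method: the "model" is just a list of logits over four candidate outputs, the adapter is a plain additive vector rather than a real low-rank LoRA matrix, the update is a simple REINFORCE-style rule, and `verifier` and `improve` are hypothetical names. The harness-editing half of SIA is not modeled.

```python
import math
import random

def softmax(logits):
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def verifier(candidate: int, target: int) -> float:
    """Hypothetical verifier: reward 1.0 for the correct output, else 0.0."""
    return 1.0 if candidate == target else 0.0

def improve(base, target, rounds=50, samples=8, lr=0.5, seed=0):
    """Train only the additive adapter; the base logits are never modified."""
    rng = random.Random(seed)
    adapter = [0.0] * len(base)
    for _ in range(rounds):
        probs = softmax([b + a for b, a in zip(base, adapter)])
        # The task agent tries several outputs per round.
        tries = rng.choices(range(len(base)), weights=probs, k=samples)
        rewards = [verifier(c, target) for c in tries]
        baseline = sum(rewards) / samples
        # Scores become feedback: push probability toward above-average tries.
        for cand, reward in zip(tries, rewards):
            advantage = reward - baseline
            for k in range(len(base)):
                indicator = 1.0 if k == cand else 0.0
                adapter[k] += lr * advantage * (indicator - probs[k])
    return adapter

base = [0.0, 0.0, 0.0, 0.0]  # frozen "base model": uniform over 4 outputs
adapter = improve(base, target=2)
before = softmax(base)[2]
after = softmax([b + a for b, a in zip(base, adapter)])[2]
print(f"P(correct output) before: {before:.2f}, after: {after:.2f}")
```

The base list is untouched after training, and all learned behavior lives in `adapter`, which mirrors the post's description of an adapter that learns which outputs earned high reward.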
Rohan Paul @rohanpaul_ai · 3 days ago · 83
Jeff Bezos on CNBC revealed what Prometheus is building.
Today his new company Prometheus announced a $12B funding round at a valuation of $41B.
Prometheus is trying to build an artificial general engineer that can help design and manufacture physical products like engines, medical devices, and electronics.
So the target areas are hard physical products like jet engines, chips, bridges, medical devices, consumer electronics, aerospace systems, vehicles, and drug design, where design cycles can take years because every idea has to survive physics, materials, cost, testing, and factory limits.
Bezos’ jet-engine example explains it well: asking for the same engine with 10% more thrust can become a 10-year engineering program, and Prometheus wants to shrink that “dream-build” cycle by 10x or more.
The $6.2B launch funding gave Prometheus a massive starting base, and the new raise says the company likely needs far more compute, talent, and industrial data before it can prove the product.
Their $41B valuation shows that frontier AI is becoming less a software race than a compute procurement race.
A company with no broadly shipped product can raise $12 billion at a $41 billion valuation because investors are not only funding a model, they are prepaying for the machines that might make the model possible.
The scarce asset is no longer just talent or algorithms, but clustered GPUs, power contracts, cooling, networking, and the operational skill to keep expensive silicon busy.
They are proof that demand is arriving faster than infrastructure can be built, and that every frontier funding round quietly turns into a future claim on power, racks, GPUs, and uptime.
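Summary: On CNBC, Jeff Bezos laid out the vision for his new company Prometheus: building an artificial general engineer that designs and manufactures hard physical products such as jet engines, chips, and medical devices, shrinking design cycles that traditionally take years by 10x or more. The company announced a $12B funding round at a $41B valuation. Its launch funding was $6.2B, and the new round suggests it needs more compute, talent, and industrial data before it can prove the product. The $41B valuation suggests frontier AI is shifting from a software race to a compute procurement race, with investors effectively prepaying for the machines that might make the model possible.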
Epoch AI @EpochAIResearch · 3 days ago · 55
How big a leap is Mythos in cyber capabilities?
@timotheechauvin, @AlexBarry4, @js_denain, and @ansonwhho compiled the public evidence and found that while it’s unclear if Mythos was ahead of trend in discovering vulnerabilities, it represents a big jump in exploiting them. 🧵
Rohan Paul @rohanpaul_ai · 3 days ago · 67
OpenAI is buying Ona to give Codex agents a secure cloud desk that stays open after humans leave.
Codex already has 5M weekly users, up 400%, but harder work breaks the old chat pattern because agents need tools, files, credentials, logs, and time.
Ona adds persistent cloud workspaces, meaning an agent gets a controlled place to run commands, inspect systems, preserve context, and resume work without depending on one device.
The enterprise angle is the real acquisition target: companies want agents inside their own cloud boundary, with scoped credentials, review trails, access limits, and auditable activity.
This makes Codex more like a managed execution layer for tests, bug fixes, refactors, vulnerability work, migrations, and multi-step knowledge tasks.
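Summary: OpenAI announced it is acquiring Ona, whose secure cloud execution technology gives Codex agents persistent cloud workspaces: after the user leaves, the agent can keep running commands, inspecting systems, preserving context, and resuming tasks across devices. Codex currently has 5M weekly active users (up 400%). The acquisition aims to strengthen enterprise deployment: agents can run inside a company's own cloud boundary with scoped credentials, review trails, access limits, and auditable activity, for multi-step work such as tests, vulnerability fixes, refactors, and migrations. Once the deal closes, the Ona team will join the OpenAI Codex team.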
Rohan Paul @rohanpaul_ai · 3 days ago · 71
Jeff Bezos shuts down AI-induced job loss talk, predicts labor shortage instead
Jeff Bezos on CNBC
"I think that there’s going to be a labor shortage as a result.
Many smart people are saying, oh my God, there are going to be no more radiologists because the AI can read X-rays better than the radiologist can. And there are going to be no more software engineers because the AI can program better than the software engineer can.
These people are wrong. What’s really going to happen is that it’s going to elevate all of these people.
It’s like, let’s say you’re a software engineer. You’ve been digging out the basement of your house with a shovel, and somebody’s about to hand you a bulldozer. You should be so happy if you’re digging the basement to your house and somebody says, “Hey, how about this?”
We’re going to have so much productivity in our economy."
----
From "CNBC Television" YouTube channel, (link in comment)
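Summary: On CNBC, Jeff Bezos pushed back on the idea that AI will replace human jobs. Many people worry that AI will eliminate roles such as radiologists and software engineers, but he says that view is wrong. AI will instead elevate these people, like swapping a shovel for a bulldozer when digging a basement. He predicts the result will be a labor shortage and a large rise in economic productivity.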